### Prepping Data Challenge: C&BS Co Absence Monitoring (week 24)

Chin & Beard Suds Co is just like any other company: people take unscheduled time off. Whilst this is expected in any organisation, it can be difficult to manage. At C&BS Co, we have had a rough start to our financial year, with lots of people off due to illness or accident. How bad has it been, and do we have people off every single day?

Looking at the day-to-day reality can be tough in BI tools when absences are recorded with just a start date and the number of days taken off. This week's challenge is to produce a simple data set that gives us this view.

We are analysing the period 1st April to 31st May 2021.

### Requirements
 - Input data
 - Build a data set that lists each date from 1st April to 31st May 2021
 - Build a data set containing each date someone will be off work
 - Merge these two data sets together 
 - Work out the number of people off each day
 - Output the data

In [1]:
import pandas as pd

In [2]:
#Input the data

with pd.ExcelFile('WK24-Absenteeism Scaffold.xlsx') as xlsx:
    df = pd.read_excel(xlsx, 'Reasons') 

In [3]:
df.head()

| | Name | Start Date | Days Off | Reason |
|---|---|---|---|---|
| 0 | Andy | 2021-04-01 | 4.0 | Illness |
| 1 | Carl | 2021-04-04 | 5.0 | Illness |
| 2 | Luke | 2021-04-05 | 7.0 | Accident |
| 3 | Tom | 2021-04-07 | 2.0 | Illness |
| 4 | Craig | 2021-04-08 | 3.0 | Accident |


In [4]:
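# Build a data set containing each date someone will be off work:
# one run of consecutive days per absence, starting at 'Start Date'

# 'Days Off' is read as a float, so cast it to int before using it as a period count
df['Date'] = [pd.date_range(d, periods=int(p), freq='D') for d, p in zip(df['Start Date'], df['Days Off'])]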

<div class="alert alert-block alert-info">
    
Python's built-in <strong>zip()</strong> function pairs up items from several iterables and yields one tuple at a time. Here it pairs each <strong>'Start Date'</strong> with its <strong>'Days Off'</strong>, so that <strong>pd.date_range()</strong> can generate the run of dates covered by each absence.

    
</div>
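
As a rough illustration of the pairing described above, the sketch below runs the same pattern on a two-row frame. The `sample` data and the names `pairs` and `ranges` are made up for the example and are not part of the challenge input.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the 'Start Date' and 'Days Off' columns
sample = pd.DataFrame({
    'Start Date': pd.to_datetime(['2021-04-01', '2021-04-04']),
    'Days Off': [2.0, 3.0],
})

# zip() yields one (start, days) tuple per row
pairs = list(zip(sample['Start Date'], sample['Days Off']))

# Each pair becomes a run of consecutive days; int() handles the float counts
ranges = [pd.date_range(start, periods=int(days), freq='D') for start, days in pairs]

print([len(r) for r in ranges])   # [2, 3]
print(ranges[1][-1].date())       # 2021-04-06
```

The second absence starts on 4th April and lasts three days, so its last day off is 6th April.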

In [5]:
# explode() gives one row per person per day off; counting names per date gives the daily total
df = df.explode('Date').groupby('Date')['Name'].count().reset_index()
df.rename(columns={'Name' : 'Number of people off each day'}, inplace=True)
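
To see what the explode and count step does, here is a minimal sketch on two overlapping absences. The `sample` frame and the shortened column name `People Off` are assumptions for the example only.

```python
import pandas as pd

# Hypothetical sample: Andy is off 1st-4th April, Carl is off 4th-5th April
sample = pd.DataFrame({
    'Name': ['Andy', 'Carl'],
    'Start Date': pd.to_datetime(['2021-04-01', '2021-04-04']),
    'Days Off': [4, 2],
})
sample['Date'] = [
    pd.date_range(d, periods=p, freq='D')
    for d, p in zip(sample['Start Date'], sample['Days Off'])
]

# explode() turns each run of dates into one row per date
long_form = sample.explode('Date')

# Count how many names appear on each date
counts = (
    long_form.groupby('Date')['Name']
    .count()
    .reset_index()
    .rename(columns={'Name': 'People Off'})
)

print(len(long_form))                 # 6
print(counts['People Off'].tolist())  # [1, 1, 1, 2, 1]
```

Only 4th April has both people off, which is why it is the only date with a count of 2.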

In [6]:
# generate a list of days in the range of interest
start_date = '2021-04-01'
end_date = '2021-05-31'

df_dates = pd.DataFrame({'Date' : pd.date_range(start=start_date, end=end_date)})
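
A quick sanity check on the date scaffold can be sketched as follows. The name `scaffold` is hypothetical; the window is the one stated in the challenge.

```python
import pandas as pd

# Same window as the challenge: 1st April to 31st May 2021, both ends included
scaffold = pd.DataFrame({'Date': pd.date_range(start='2021-04-01', end='2021-05-31')})

# April has 30 days and May has 31, so 61 rows are expected
print(len(scaffold))  # 61
print(scaffold['Date'].min().date(), scaffold['Date'].max().date())
```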

In [7]:
# join to dates of interest and fill in zeroes
df = df_dates.merge(df, on='Date', how='left').fillna(0)
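
The effect of the left join can be sketched on a four-day window. The `counts` and `scaffold` frames and the `People Off` column are invented for the example. The final integer cast is an optional extra, not something the notebook does.

```python
import pandas as pd

# Hypothetical counts that cover only two of the four days
counts = pd.DataFrame({
    'Date': pd.to_datetime(['2021-04-01', '2021-04-03']),
    'People Off': [1, 2],
})

# Scaffold of every date in a short window
scaffold = pd.DataFrame({'Date': pd.date_range('2021-04-01', '2021-04-04')})

# A left join keeps every scaffold date; dates with no match get NaN, then 0
daily = scaffold.merge(counts, on='Date', how='left').fillna(0)

# The NaN values force the column to float, so cast back to int if whole numbers are preferred
daily['People Off'] = daily['People Off'].astype(int)

print(daily['People Off'].tolist())  # [1, 0, 2, 0]
```

Without the cast the counts stay as floats, which is why the notebook output shows values such as 1.0.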

In [8]:
df.head()

| | Date | Number of people off each day |
|---|---|---|
| 0 | 2021-04-01 | 1.0 |
| 1 | 2021-04-02 | 1.0 |
| 2 | 2021-04-03 | 1.0 |
| 3 | 2021-04-04 | 2.0 |
| 4 | 2021-04-05 | 2.0 |


In [9]:
df.to_csv('WK24-output.csv', index=False)