### Prepping Data Challenge: Calendar Conundrum (week 36)

### Requirements
- Input the data
- Create a Calendar Table
    - Create a date range for the calendar
          - This should be dynamic to handle new data
          - The start of the range should be based on the year of the earliest date
                 - If earliest date is 06/01/2021, the start date should be 01/01/2021
          - The end of the range should be the last day of the year for the latest date in the data set 
                 - If the latest date is 06/01/2022, the end date should be 31/12/2022
          - Generate a row for every day between the start and end date to get a calendar table
- Create a field containing the full name for each employee
- Get a unique list of employees with their full name, first/last name fields, and employee id
- Join the list to the calendar table
    - You should have a table with one row per employee per day
- Join the new calendar table to the main table
    - One row per employee per day, even on days where the employee wasn’t scheduled
- Create a flag if the employee was scheduled on the day
- Handle any null values
- Output the data
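
The date-range rules above can be sketched in isolation using the two example dates from the list. This is a minimal sketch rather than the notebook's own code; `calendar_bounds` is a hypothetical helper name, and the sample dates are read as day-first (dd/mm/yyyy), as the requirements imply.

```python
import pandas as pd

def calendar_bounds(dates: pd.Series) -> tuple[pd.Timestamp, pd.Timestamp]:
    """Return (1 January of the earliest year, 31 December of the latest year)."""
    start = pd.Timestamp(year=dates.min().year, month=1, day=1)
    end = pd.Timestamp(year=dates.max().year, month=12, day=31)
    return start, end

# The two example dates from the requirements, parsed day-first.
sample = pd.to_datetime(pd.Series(["06/01/2021", "06/01/2022"]), format="%d/%m/%Y")
start, end = calendar_bounds(sample)

# One row per day between the bounds, both ends included.
calendar = pd.DataFrame({"scheduled_date": pd.date_range(start, end, freq="D")})
print(start.date(), end.date(), len(calendar))
```

Because the bounds are derived from the minimum and maximum of the data, the range extends on its own when later dates are loaded.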

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
# Input the data
df = pd.read_csv(r"\Dataprep\2022\employee_data.csv", parse_dates=['scheduled_date'])

In [3]:
df.head(10)

Unnamed: 0,emp_id,first_name,last_name,scheduled_date
0,3,Maressa,Dearell,2022-08-30
1,4,Micheil,Manklow,2022-08-30
2,8,Nikola,Baszkiewicz,2022-08-30
3,3,Maressa,Dearell,2022-08-29
4,4,Micheil,Manklow,2022-08-29
5,8,Nikola,Baszkiewicz,2022-08-29
6,3,Maressa,Dearell,2022-08-28
7,4,Micheil,Manklow,2022-08-28
8,8,Nikola,Baszkiewicz,2022-08-28
9,3,Maressa,Dearell,2022-08-24


In [4]:
# Create a Calendar Table
# generate the date range
dates = pd.date_range(
    start=datetime(df['scheduled_date'].min().year, 1, 1),   # 1 January of the earliest year
    end=datetime(df['scheduled_date'].max().year, 12, 31),   # 31 December of the latest year
    freq='D',
)

In [5]:
dates

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
               '2022-01-09', '2022-01-10',
               ...
               '2023-12-22', '2023-12-23', '2023-12-24', '2023-12-25',
               '2023-12-26', '2023-12-27', '2023-12-28', '2023-12-29',
               '2023-12-30', '2023-12-31'],
              dtype='datetime64[ns]', length=730, freq='D')

In [6]:
# Get a unique list of employees (the full name field is added in cell In [8])
# Cross join the list to the calendar table: one row per employee per day
# Left join the schedule to flag scheduled days; unmatched days are filled with False
dfjoin = ( pd.DataFrame(dates, columns=['scheduled_date'])
             .merge(df.drop(columns='scheduled_date').drop_duplicates(), how='cross')
             .merge(df[['scheduled_date', 'emp_id']].assign(scheduled=True), 
                    on=['scheduled_date', 'emp_id'], how='left')
             .fillna(False))



In [7]:
dfjoin.head()

Unnamed: 0,scheduled_date,emp_id,first_name,last_name,scheduled
0,2022-01-01,3,Maressa,Dearell,False
1,2022-01-01,4,Micheil,Manklow,False
2,2022-01-01,8,Nikola,Baszkiewicz,False
3,2022-01-01,6,Graig,Colvin,False
4,2022-01-01,7,Jennilee,Brimson,False
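
The cross join followed by a left join is easier to follow on a toy table. The sketch below uses made-up employees and dates (none of them come from the challenge data) and, as a variation on the cell above, derives the flag with `notna()` so that only the `scheduled` column is touched instead of filling every null in the frame.

```python
import pandas as pd

# Toy schedule: two hypothetical employees and three scheduled shifts.
schedule = pd.DataFrame({
    "emp_id": [1, 2, 1],
    "first_name": ["Ann", "Bob", "Ann"],
    "last_name": ["Lee", "Ray", "Lee"],
    "scheduled_date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-03"]),
})

calendar = pd.DataFrame(
    {"scheduled_date": pd.date_range("2022-01-01", "2022-01-03", freq="D")}
)
employees = schedule.drop(columns="scheduled_date").drop_duplicates()

# Cross join: every day paired with every employee (3 days x 2 employees = 6 rows).
grid = calendar.merge(employees, how="cross")

# Left join keeps all grid rows; days without a shift get a null in `scheduled`.
shifts = schedule[["scheduled_date", "emp_id"]].drop_duplicates().assign(scheduled=True)
result = grid.merge(shifts, on=["scheduled_date", "emp_id"], how="left")

# Matched rows become True and unmatched rows False; other columns are left alone.
result["scheduled"] = result["scheduled"].notna()
print(result)
```

Deduplicating `shifts` before the left join is a precaution: a repeated (date, employee) pair in the schedule would otherwise multiply rows in the result.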


In [8]:
# Create a field containing the full name for each employee
dfjoin.insert(loc=2, column='Full_name', value=dfjoin['first_name'] + ' ' + dfjoin['last_name'])

In [11]:
# Select and order the columns for the output
output = dfjoin[['scheduled_date', 'emp_id', 'Full_name', 'first_name', 'last_name', 'scheduled']]

In [12]:
output.head(20)

Unnamed: 0,scheduled_date,emp_id,Full_name,first_name,last_name,scheduled
0,2022-01-01,3,Maressa Dearell,Maressa,Dearell,False
1,2022-01-01,4,Micheil Manklow,Micheil,Manklow,False
2,2022-01-01,8,Nikola Baszkiewicz,Nikola,Baszkiewicz,False
3,2022-01-01,6,Graig Colvin,Graig,Colvin,False
4,2022-01-01,7,Jennilee Brimson,Jennilee,Brimson,False
5,2022-01-01,5,Burtie Connar,Burtie,Connar,False
6,2022-01-01,9,Chet Fitzharris,Chet,Fitzharris,False
7,2022-01-01,2,Curtis Utterson,Curtis,Utterson,False
8,2022-01-01,10,Aile Antonazzi,Aile,Antonazzi,False
9,2022-01-01,1,Brooke Spurgeon,Brooke,Spurgeon,False
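
Before writing the file, it may be worth confirming that the table really has one row per employee per day. The helper below is a hypothetical sketch (`check_calendar_grain` is not part of the notebook) and is shown on a small stand-in frame; the column names are assumed to match `output`.

```python
import pandas as pd

def check_calendar_grain(frame: pd.DataFrame,
                         date_col: str = "scheduled_date",
                         id_col: str = "emp_id") -> bool:
    """True when each (date, employee) pair appears exactly once and none is missing."""
    no_duplicates = not frame.duplicated(subset=[date_col, id_col]).any()
    expected_rows = frame[date_col].nunique() * frame[id_col].nunique()
    return no_duplicates and len(frame) == expected_rows

# Tiny stand-in for `output`: 2 days x 2 employees.
demo = pd.DataFrame({
    "scheduled_date": pd.to_datetime(
        ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"]
    ),
    "emp_id": [1, 2, 1, 2],
})
is_complete = check_calendar_grain(demo)        # full grid
has_gap = check_calendar_grain(demo.iloc[:3])   # one employee-day removed
print(is_complete, has_gap)
```

The check counts only the dates present in the frame, so a day that is missing for every employee would go unnoticed; comparing the dates against the generated calendar would cover that case.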


In [13]:
# Output the data
output.to_csv('wk36-output.csv', index=False)