# Chapter 8 Code Examples (Dates & Time Series in HR)

This Colab notebook consolidates the code examples from **Chapter 8** (working with dates and time-based analysis in HR data).

**Required file:** `Absenteeism_data.csv` (upload it to the Colab session or mount Google Drive).

> Source: user-provided PDF (Chapter 8).

## 1) Environment setup
We start by importing the core libraries used across the chapter.

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta


## 2) Load the Absenteeism dataset
This reads the CSV and previews the first rows.

In [2]:
# Load data
df = pd.read_csv('Absenteeism_data.csv')
print(df.head())


   ID  Reason for Absence        Date  Transportation Expense  \
0  11                  26  07/07/2015                     289   
1  36                   0  14/07/2015                     118   
2   3                  23  15/07/2015                     179   
3   7                   7  16/07/2015                     279   
4  11                  23  23/07/2015                     289   

   Distance to Work  Age  Daily Work Load Average  Body Mass Index  Education  \
0                36   33                  239.554               30          1   
1                13   50                  239.554               31          1   
2                51   38                  239.554               31          1   
3                 5   39                  239.554               24          1   
4                36   33                  239.554               30          1   

   Children  Pets  Absenteeism Time in Hours  
0         2     1                          4  
1         1     0           

## 3) Inspect and convert the Date column
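In many HR datasets, dates are stored as text (`object`). Convert them to `datetime64[ns]` before doing any time-based operations. The values here are written day/month/year, so passing an explicit `format="%d/%m/%Y"` keeps a value such as `06/07/2015` from being read as June 7 instead of July 6.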

In [8]:
# Show the data type of the Date column
print(df['Date'].dtype)

# Show a few values from the Date column
print(df['Date'].head(10))


# Convert the Date column to datetime
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")

# Check the data type after conversion
print(df['Date'].dtype)

# Show the data after conversion
print(df['Date'].head())

object
0    07/07/2015
1    14/07/2015
2    15/07/2015
3    16/07/2015
4    23/07/2015
5    10/07/2015
6    17/07/2015
7    24/07/2015
8    06/07/2015
9    13/07/2015
Name: Date, dtype: object
datetime64[ns]
0   2015-07-07
1   2015-07-14
2   2015-07-15
3   2015-07-16
4   2015-07-23
Name: Date, dtype: datetime64[ns]
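
Real HR exports often contain a few malformed dates, and a strict conversion stops at the first one. The following is a minimal sketch of one way to handle that, assuming you would rather flag bad values than abort; the sample values and the names `raw_dates`, `parsed`, and `bad_rows` are hypothetical and are not part of the chapter's dataset.

```python
import pandas as pd

# Hypothetical sample with two unparseable entries (not from Absenteeism_data.csv)
raw_dates = pd.Series(['07/07/2015', '14/07/2015', '31/02/2016', 'unknown'])

# errors='coerce' turns values that do not match the format into NaT
parsed = pd.to_datetime(raw_dates, format='%d/%m/%Y', errors='coerce')

# Inspect the original text of the rows that failed to parse
bad_rows = raw_dates[parsed.isna()]

print(parsed)
print(bad_rows.tolist())
```

Reviewing `bad_rows` before dropping or correcting them keeps silent data loss out of later aggregations.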


## 4) Extract date components
Create new columns (year, month, day, day-of-week, and day name) to support seasonal and weekly pattern analysis.

In [9]:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Day_of_Week'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6
df['Day_Name'] = df['Date'].dt.day_name()

print(df[['Date', 'Year', 'Month', 'Day', 'Day_of_Week', 'Day_Name']].head())


        Date  Year  Month  Day  Day_of_Week   Day_Name
0 2015-07-07  2015      7    7            1    Tuesday
1 2015-07-14  2015      7   14            1    Tuesday
2 2015-07-15  2015      7   15            2  Wednesday
3 2015-07-16  2015      7   16            3   Thursday
4 2015-07-23  2015      7   23            3   Thursday


## 5) Absenteeism analysis by month and weekday
Compute (i) total absenteeism hours by calendar month, pooled across all years in the data, and (ii) average absenteeism hours by weekday.

In [11]:
# Total hours absent by month
monthly_absence = df.groupby('Month')['Absenteeism Time in Hours'].sum()
print('Total absenteeism hours by month:')
print(monthly_absence)

# Average hours absent by weekday
weekly_absence = df.groupby('Day_Name')['Absenteeism Time in Hours'].mean()
print('Average absenteeism hours by weekday:')
print(weekly_absence)


Total absenteeism hours by month:
Month
1     222
2     294
3     765
4     482
5     371
6     313
7     470
8     288
9     292
10    349
11    473
12    414
Name: Absenteeism Time in Hours, dtype: int64
Average absenteeism hours by weekday:
Day_Name
Friday       5.659091
Monday       9.397163
Saturday     8.500000
Sunday       3.666667
Thursday     4.000000
Tuesday      7.524823
Wednesday    6.863636
Name: Absenteeism Time in Hours, dtype: float64
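
Grouping by `Day_Name` sorts the result alphabetically, which makes weekly patterns harder to read. One possible way to restore calendar order is to reindex against an explicit weekday list. This is a sketch on hypothetical values; `weekly_absence_demo`, `weekday_order`, and `ordered` are illustrative names.

```python
import pandas as pd

# Hypothetical averages in the alphabetical order that groupby produces
weekly_absence_demo = pd.Series(
    {'Friday': 5.7, 'Monday': 9.4, 'Thursday': 4.0, 'Tuesday': 7.5, 'Wednesday': 6.9}
)

weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# reindex puts the days in calendar order; days with no data become NaN
ordered = weekly_absence_demo.reindex(weekday_order)
print(ordered)
```

Days that never occur in the data show up as `NaN` rather than disappearing, which makes gaps visible.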


## 6) Seasonal/quarter analysis
Create a quarter column and aggregate absenteeism hours by year and quarter.

In [12]:
df['Quarter'] = df['Date'].dt.quarter

quarterly_absence = df.groupby(['Year', 'Quarter'])['Absenteeism Time in Hours'].sum()
print('Total absenteeism hours by year and quarter:')
print(quarterly_absence)


Total absenteeism hours by year and quarter:
Year  Quarter
2015  3          357
      4          412
2016  1          428
      2          476
      3          372
      4          450
2017  1          307
      2          371
      3          321
      4          374
2018  1          546
      2          319
Name: Absenteeism Time in Hours, dtype: int64
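
As an alternative to separate `Year` and `Quarter` columns, a single period label can carry both. The sketch below uses `dt.to_period('Q')` on a small hypothetical frame (`demo`, `Year_Quarter`, and `by_period` are illustrative names, and the hours are made up).

```python
import pandas as pd

# Hypothetical records, not taken from the chapter's dataset
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2015-07-07', '2015-10-02', '2016-01-15', '2016-02-01']),
    'Absenteeism Time in Hours': [4, 8, 2, 3],
})

# One label per calendar quarter, e.g. 2015Q3
demo['Year_Quarter'] = demo['Date'].dt.to_period('Q')

by_period = demo.groupby('Year_Quarter')['Absenteeism Time in Hours'].sum()
print(by_period)
```

A period label sorts chronologically and reads naturally in tables and charts.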


## 7) Filter a specific date range
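Filter the dataset to a specific time interval. The example uses the second half of 2010, which lies outside the dataset's coverage (July 2015 to May 2018), so it returns zero records; pick dates inside the coverage period to get matches.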

In [13]:
start_date = '2010-07-01'
end_date = '2010-12-31'

filtered_data = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]
print(f'Number of records from {start_date} to {end_date}: {len(filtered_data)}')


Number of records from 2010-07-01 to 2010-12-31: 0
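
The same kind of filter can be written with `Series.between`, which is inclusive on both ends by default. This sketch runs on a small hypothetical frame whose dates fall in 2015 and 2016; `demo`, `mask`, and `filtered` are illustrative names.

```python
import pandas as pd

# Hypothetical records around the second half of 2015
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2015-06-30', '2015-07-07', '2015-12-31', '2016-01-04']),
    'Absenteeism Time in Hours': [1, 4, 8, 2],
})

start_date = '2015-07-01'
end_date = '2015-12-31'

# between() keeps rows where start_date <= Date <= end_date
mask = demo['Date'].between(start_date, end_date)
filtered = demo[mask]
print(f'Number of records from {start_date} to {end_date}: {len(filtered)}')
```

If the timestamps carried a time of day, an end date of `'2015-12-31'` would mean midnight at the start of that day, so later records on December 31 would be excluded.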


## 8) Identify coverage period (first/last date)
Compute the first and last date in the dataset, and the total covered time span.

In [14]:
first_date = df['Date'].min()
last_date = df['Date'].max()

total_period = last_date - first_date

print(f'First date in data: {first_date}')
print(f'Last date in data: {last_date}')
print(f'Total period (days): {total_period.days}')


First date in data: 2015-07-06 00:00:00
Last date in data: 2018-05-31 00:00:00
Total period (days): 1060


## 9) Resampling: monthly summary
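Set `Date` as the index and resample to monthly frequency, computing the sum, mean, and count for each month. In pandas 2.2 and later, the `'M'` alias is deprecated in favor of `'ME'` (month end).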

In [15]:
df_indexed = df.set_index('Date')

monthly_summary = df_indexed['Absenteeism Time in Hours'].resample('M').agg(['sum', 'mean', 'count'])
print('Monthly summary of absenteeism hours:')
print(monthly_summary.head())


Monthly summary of absenteeism hours:
            sum      mean  count
Date                            
2015-07-31  129  6.789474     19
2015-08-31  140  6.666667     21
2015-09-30   88  4.000000     22
2015-10-31   74  4.111111     18
2015-11-30  161  8.944444     18



## 10) Workforce example dataset (synthetic)
The chapter uses a small monthly headcount series to demonstrate resampling, interpolation, and rolling statistics.

In [16]:
workforce_data = pd.DataFrame({
    'date': pd.date_range('2015-01-01', periods=36, freq='M'),
    'employees': [100, 105, 102, 108, 110, 115, 112, 118, 120, 125,
                  122, 128, 130, 135, 132, 138, 140, 145, 142, 148,
                  150, 155, 152, 158, 160, 165, 162, 168, 170, 175,
                  172, 178, 180, 185, 182, 188]
})

workforce_data.set_index('date', inplace=True)
print('First 5 rows:')
print(workforce_data.head())


First 5 rows:
            employees
date                 
2015-01-31        100
2015-02-28        105
2015-03-31        102
2015-04-30        108
2015-05-31        110



## 11) Resample monthly → quarterly (downsample)
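Compute the quarterly mean and quarterly sum of the monthly headcount. The sum is included to illustrate the API; for a point-in-time measure such as headcount, the mean is usually the easier figure to interpret. In pandas 2.2 and later, the `'Q'` alias is deprecated in favor of `'QE'` (quarter end).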

In [18]:
quarterly_avg = workforce_data.resample('Q').mean()
print('Quarterly average headcount:')
print(quarterly_avg.head())

quarterly_sum = workforce_data.resample('Q').sum()
print('Quarterly sum headcount:')
print(quarterly_sum.head())


Quarterly average headcount:
             employees
date                  
2015-03-31  102.333333
2015-06-30  111.000000
2015-09-30  116.666667
2015-12-31  125.000000
2016-03-31  132.333333
Quarterly sum headcount:
            employees
date                 
2015-03-31        307
2015-06-30        333
2015-09-30        350
2015-12-31        375
2016-03-31        397
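
For headcount, the value at the end of each quarter is often of interest alongside the average. One way to sketch this, without depending on the resampling alias, is to group by a quarterly period and take the last observation. The series below repeats the first six values of the workforce example; `headcount` and `quarter_end` are illustrative names.

```python
import pandas as pd

# First six monthly values from the workforce example, with explicit month-end dates
headcount = pd.Series(
    [100, 105, 102, 108, 110, 115],
    index=pd.to_datetime(['2015-01-31', '2015-02-28', '2015-03-31',
                          '2015-04-30', '2015-05-31', '2015-06-30']),
    name='employees',
)

# Group by calendar quarter and keep the last monthly observation in each
quarter_end = headcount.groupby(headcount.index.to_period('Q')).last()
print(quarter_end)
```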



## 12) Resample monthly → weekly (upsample) with interpolation
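Increase the frequency to weekly and fill missing values by linear interpolation. In the pandas version used for this output, the plain call returns `NaN` for the first rows: weekly labels fall on Sundays, the month-end timestamps rarely coincide with them, and only original values that land exactly on a weekly label survive the upsampling. The first month-end that falls on a Sunday is 2015-05-31 (110 employees), which is why `limit_direction='both'` back-fills 110 into the earlier weeks instead of producing a gradual trend.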

In [19]:
weekly_data = workforce_data.resample('W').interpolate(method='linear')
print('Weekly data (first 10 rows):')
print(weekly_data.head(10))

# Optional: fill both directions to reduce initial NaN when needed
weekly_data_both = workforce_data.resample('W').interpolate(method='linear', limit_direction='both')
print('Weekly data with bidirectional fill (first 10 rows):')
print(weekly_data_both.head(10))


Weekly data (first 10 rows):
            employees
date                 
2015-02-01        NaN
2015-02-08        NaN
2015-02-15        NaN
2015-02-22        NaN
2015-03-01        NaN
2015-03-08        NaN
2015-03-15        NaN
2015-03-22        NaN
2015-03-29        NaN
2015-04-05        NaN
Weekly data with bidirectional fill (first 10 rows):
            employees
date                 
2015-02-01      110.0
2015-02-08      110.0
2015-02-15      110.0
2015-02-22      110.0
2015-03-01      110.0
2015-03-08      110.0
2015-03-15      110.0
2015-03-22      110.0
2015-03-29      110.0
2015-04-05      110.0
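
To get weekly values that actually follow the monthly trend, one possible approach is to interpolate over the union of the monthly and weekly timestamps and then keep only the weekly ones. This is a sketch under that assumption, not the chapter's method; `monthly`, `weekly_index`, `combined`, and `weekly` are illustrative names, and the series uses the first four values of the workforce example.

```python
import pandas as pd

# First four monthly values from the workforce example, with explicit month-end dates
monthly = pd.Series(
    [100, 105, 102, 108],
    index=pd.to_datetime(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30']),
    name='employees',
)

# Weekly (Sunday) timestamps that fall inside the monthly range
weekly_index = pd.date_range(monthly.index.min(), monthly.index.max(), freq='W')

# Keep the known monthly points, add the weekly slots, and interpolate by elapsed time
combined = monthly.reindex(monthly.index.union(weekly_index)).interpolate(method='time')

# Keep only the weekly timestamps
weekly = combined.reindex(weekly_index)
print(weekly.head())
```

With `method='time'`, each weekly value is weighted by how many days separate it from the surrounding month-end observations.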


## 13) Rolling statistics
Compute rolling mean (3 and 6 months), rolling standard deviation (3 months), and rolling min/max (3 months).

In [21]:
rolling_mean_3 = workforce_data['employees'].rolling(window=3).mean()
rolling_mean_6 = workforce_data['employees'].rolling(window=6).mean()

print('Rolling mean (3 months) - first 10:')
print(rolling_mean_3.head(10))

print('Rolling mean (6 months) - first 10:')
print(rolling_mean_6.head(10))

rolling_std_3 = workforce_data['employees'].rolling(window=3).std()
print('Rolling std (3 months) - first 10:')
print(rolling_std_3.head(10))

rolling_min_3 = workforce_data['employees'].rolling(window=3).min()
rolling_max_3 = workforce_data['employees'].rolling(window=3).max()

print('Rolling min (3 months) - first 10:')
print(rolling_min_3.head(10))

print('Rolling max (3 months) - first 10:')
print(rolling_max_3.head(10))


Rolling mean (3 months) - first 10:
date
2015-01-31           NaN
2015-02-28           NaN
2015-03-31    102.333333
2015-04-30    105.000000
2015-05-31    106.666667
2015-06-30    111.000000
2015-07-31    112.333333
2015-08-31    115.000000
2015-09-30    116.666667
2015-10-31    121.000000
Name: employees, dtype: float64
Rolling mean (6 months) - first 10:
date
2015-01-31           NaN
2015-02-28           NaN
2015-03-31           NaN
2015-04-30           NaN
2015-05-31           NaN
2015-06-30    106.666667
2015-07-31    108.666667
2015-08-31    110.833333
2015-09-30    113.833333
2015-10-31    116.666667
Name: employees, dtype: float64
Rolling std (3 months) - first 10:
date
2015-01-31         NaN
2015-02-28         NaN
2015-03-31    2.516611
2015-04-30    3.000000
2015-05-31    4.163332
2015-06-30    3.605551
2015-07-31    2.516611
2015-08-31    3.000000
2015-09-30    4.163332
2015-10-31    3.605551
Name: employees, dtype: float64
Rolling min (3 months) - first 10:
date
2015-01-31  

## Appendix) Rolling with `min_periods`
If you want rolling statistics to start earlier (with fewer points), set `min_periods`.

In [22]:
rolling_mean_3_min1 = workforce_data['employees'].rolling(window=3, min_periods=1).mean()
print(rolling_mean_3_min1.head(10))


date
2015-01-31    100.000000
2015-02-28    102.500000
2015-03-31    102.333333
2015-04-30    105.000000
2015-05-31    106.666667
2015-06-30    111.000000
2015-07-31    112.333333
2015-08-31    115.000000
2015-09-30    116.666667
2015-10-31    121.000000
Name: employees, dtype: float64
