# 🐼 Pandas Handbook

## 09 - Dates and Time Series

Check out the official [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)  

This notebook uses the [Weather Prediction dataset](https://www.kaggle.com/datasets/ananthr1/weather-prediction/data) from Kaggle to demonstrate how to work with dates, times and time-based operations in pandas.

## 📚 Table of Contents

---

📅 **Datetime Conversion**  
🕰️ **Creating Date Ranges**  
🕓 **Accessing Time Components**  
🕓 **Accessing Time with .dt**  
🕓 **Accessing Date Properties**  
🕓 **Time Differences with Timedelta**  
📆 **Working with Periods and Offsets**  
🌍 **Timezones and Localization**  
🔍 **Filtering and Slicing Dates**  
📊 **Frequency Conversion, Resampling and Rolling**  
🔧 **Handling Missing Data with .interpolate()**  
📈 **Calculating Lagged Values and Differences**  

---


In [1]:
import pandas as pd
import os

In [2]:
data_raw = "../data/raw/"
csv_file = "weather.csv"
import_path = os.path.join(data_raw, csv_file)
df = pd.read_csv(import_path)
df.head()

,date,precipitation,temp_max,temp_min,wind,weather
0,2012-01-01,0.0,12.8,5.0,4.7,drizzle
1,2012-01-02,10.9,10.6,2.8,4.5,rain
2,2012-01-03,0.8,11.7,7.2,2.3,rain
3,2012-01-04,20.3,12.2,5.6,4.7,rain
4,2012-01-05,1.3,8.9,2.8,6.1,rain


In [3]:
df.dtypes

date              object
precipitation    float64
temp_max         float64
temp_min         float64
wind             float64
weather           object
dtype: object

### 📅 Datetime Conversion

```pd.read_csv(PATH, parse_dates=['COLUMN'], date_format='FORMAT')``` – Reads a CSV file and parses the specified column as datetime (the date_format argument requires pandas 2.0 or later).  
```pd.to_datetime(df['COLUMN'], format='FORMAT')``` – Converts a string-formatted date column to datetime.  

In [4]:
df = pd.read_csv(import_path, parse_dates=['date'], date_format='%Y-%m-%d')
df.dtypes

date             datetime64[ns]
precipitation           float64
temp_max                float64
temp_min                float64
wind                    float64
weather                  object
dtype: object

In [5]:
type(df.loc[0, 'date'])

pandas._libs.tslibs.timestamps.Timestamp

In [6]:
df = pd.read_csv(import_path)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df.dtypes

date             datetime64[ns]
precipitation           float64
temp_max                float64
temp_min                float64
wind                    float64
weather                  object
dtype: object

In [7]:
type(df.loc[0, 'date'])

pandas._libs.tslibs.timestamps.Timestamp
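
Real files often contain malformed or missing dates. The sketch below is not part of the weather dataset workflow; it uses a small made-up Series (the name `raw` and its values are assumptions for illustration) to show how `errors='coerce'` makes `pd.to_datetime` return `NaT` for entries that do not match the format instead of raising an error.

```python
import pandas as pd

# Hypothetical raw values: two valid dates, one malformed entry, one missing value
raw = pd.Series(['2012-01-01', '2012-01-02', 'not a date', None])

# errors='coerce' turns anything that cannot be parsed with the format into NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

print(pd.api.types.is_datetime64_any_dtype(parsed))  # True
print(parsed.isna().sum())                           # 2
```

Counting the `NaT` values right after conversion is a quick way to see how many rows failed to parse.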

### 🕰️ Creating Date Ranges

```pd.date_range(START_DATE, periods=NUM, freq='FREQ')``` – Generates a sequence of NUM dates at the specified frequency.  
```pd.DataFrame(data=SEQUENCE, columns=['COLUMN'])``` – Creates a DataFrame from a date sequence.  

In [8]:
time_range = pd.date_range('1/1/2012', periods=30, freq='d')
time_range_df = pd.DataFrame(data=time_range, columns=['date'])
time_range_df.set_index('date', inplace=True)
time_range_df.head()

2012-01-01
2012-01-02
2012-01-03
2012-01-04
2012-01-05


In [9]:
time_range = pd.date_range('1/1/2012', periods=100, freq='h')
time_range_df = pd.DataFrame(data=time_range, columns=['date'])
time_range_df.set_index('date', inplace=True)
time_range_df.head()

2012-01-01 00:00:00
2012-01-01 01:00:00
2012-01-01 02:00:00
2012-01-01 03:00:00
2012-01-01 04:00:00
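
Besides `periods`, `pd.date_range` also accepts an explicit `end` date, and the frequency can be anchored to a weekday or restricted to business days. A minimal sketch, with variable names chosen only for illustration:

```python
import pandas as pd

# Every Monday in January 2012, given explicit start and end dates
mondays = pd.date_range(start='2012-01-01', end='2012-01-31', freq='W-MON')

# Business days (Monday to Friday) in the first week of 2012
business_days = pd.date_range(start='2012-01-01', end='2012-01-07', freq='B')

print(mondays.day.tolist())        # [2, 9, 16, 23, 30]
print(business_days.day.tolist())  # [2, 3, 4, 5, 6]
```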


### 🕓 Accessing Time Components

```df.at_time('HH:MM')``` – Selects rows at a specific time of day (requires a DatetimeIndex).  
```df.between_time('START', 'END')``` – Selects rows between two times of day, both endpoints included by default.

In [10]:
time_range_df.at_time('09:00')

2012-01-01 09:00:00
2012-01-02 09:00:00
2012-01-03 09:00:00
2012-01-04 09:00:00


In [11]:
time_range_df.between_time('00:00','02:00')

2012-01-01 00:00:00
2012-01-01 01:00:00
2012-01-01 02:00:00
2012-01-02 00:00:00
2012-01-02 01:00:00
2012-01-02 02:00:00
2012-01-03 00:00:00
2012-01-03 01:00:00
2012-01-03 02:00:00
2012-01-04 00:00:00
2012-01-04 01:00:00
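
`between_time` can also select a window that wraps around midnight: when the start time is later than the end time, pandas keeps the rows outside the daytime gap. A small sketch on a made-up hourly frame (`hourly_df` and its `value` column are illustrative assumptions):

```python
import pandas as pd

# Two full days of hourly timestamps as the index
hours = pd.date_range('2012-01-01', periods=48, freq='h')
hourly_df = pd.DataFrame({'value': range(48)}, index=hours)

# Start later than end: selects 22:00 through 02:00 across midnight
night_df = hourly_df.between_time('22:00', '02:00')

print(sorted(night_df.index.hour.unique().tolist()))  # [0, 1, 2, 22, 23]
print(len(night_df))                                  # 10
```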


### 🕓 Accessing Time with .dt

```df['COLUMN'].dt.year``` – Extracts the year from datetime.  
```df['COLUMN'].dt.month``` – Extracts the month from datetime.  
```df['COLUMN'].dt.quarter``` – Extracts the quarter of the year.  
```df['COLUMN'].dt.day``` – Extracts the day of the month.  
```df['COLUMN'].dt.hour``` – Extracts the hour.  
```df['COLUMN'].dt.minute``` – Extracts the minute.  
```df['COLUMN'].dt.second``` – Extracts the second.  
```df['COLUMN'].dt.day_name()``` – Gets the day name (e.g., Monday).

In [12]:
df['date'].dt.year

0       2012
1       2012
2       2012
3       2012
4       2012
        ... 
1456    2015
1457    2015
1458    2015
1459    2015
1460    2015
Name: date, Length: 1461, dtype: int32

In [13]:
df['date'].dt.month

0        1
1        1
2        1
3        1
4        1
        ..
1456    12
1457    12
1458    12
1459    12
1460    12
Name: date, Length: 1461, dtype: int32

In [14]:
df['date'].dt.quarter

0       1
1       1
2       1
3       1
4       1
       ..
1456    4
1457    4
1458    4
1459    4
1460    4
Name: date, Length: 1461, dtype: int32

In [15]:
df['date'].dt.day

0        1
1        2
2        3
3        4
4        5
        ..
1456    27
1457    28
1458    29
1459    30
1460    31
Name: date, Length: 1461, dtype: int32

In [16]:
df['date'].dt.hour

0       0
1       0
2       0
3       0
4       0
       ..
1456    0
1457    0
1458    0
1459    0
1460    0
Name: date, Length: 1461, dtype: int32

In [17]:
df['date'].dt.minute

0       0
1       0
2       0
3       0
4       0
       ..
1456    0
1457    0
1458    0
1459    0
1460    0
Name: date, Length: 1461, dtype: int32

In [18]:
df['date'].dt.second

0       0
1       0
2       0
3       0
4       0
       ..
1456    0
1457    0
1458    0
1459    0
1460    0
Name: date, Length: 1461, dtype: int32

In [19]:
df['date'].dt.day_name()

0          Sunday
1          Monday
2         Tuesday
3       Wednesday
4        Thursday
          ...    
1456       Sunday
1457       Monday
1458      Tuesday
1459    Wednesday
1460     Thursday
Name: date, Length: 1461, dtype: object
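
The `.dt` components are most useful as inputs to filters and group-bys. The sketch below uses a small synthetic frame (`demo_df` and its values are assumptions, not the weather data) to flag weekends and average a column per weekday name.

```python
import pandas as pd

# Two weeks of daily dates starting on Sunday 2012-01-01, with made-up values
dates = pd.date_range('2012-01-01', periods=14, freq='D')
demo_df = pd.DataFrame({'date': dates, 'temp_max': [float(i) for i in range(14)]})

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend
demo_df['is_weekend'] = demo_df['date'].dt.dayofweek >= 5

# Group by the weekday name derived from the date column
mean_by_day = demo_df.groupby(demo_df['date'].dt.day_name())['temp_max'].mean()

print(demo_df['is_weekend'].sum())  # 4
print(mean_by_day['Sunday'])        # 3.5
```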

### 🕓 Accessing Date Properties

```df.loc[ROW, 'COLUMN'].date()``` – Extracts the date part (no time).  
```df.loc[ROW, 'COLUMN'].day_name()``` – Gets the name of the weekday.  
```df.loc[ROW, 'COLUMN'].month_name()``` – Gets the name of the month.  
```df.loc[ROW, 'COLUMN'].day``` – Extracts the day of the month.  
```df.loc[ROW, 'COLUMN'].week``` – Extracts the week number of the year.  
```df.loc[ROW, 'COLUMN'].dayofweek``` – Extracts the day of the week as an integer (Monday = 0).  
```df.loc[ROW, 'COLUMN'].month``` – Extracts the numeric month.  
```df.loc[ROW, 'COLUMN'].year``` – Extracts the year.

In [20]:
df.loc[8, 'date']

Timestamp('2012-01-09 00:00:00')

In [21]:
df.loc[8, 'date'].date()

datetime.date(2012, 1, 9)

In [22]:
df.loc[8, 'date'].day_name()

'Monday'

In [23]:
df.loc[8, 'date'].month_name()

'January'

In [24]:
df.loc[8, 'date'].day

9

In [25]:
df.loc[8, 'date'].week

2

In [26]:
df.loc[8, 'date'].dayofweek

0

In [27]:
df.loc[0, 'date'].month

1

In [28]:
df.loc[0, 'date'].year

2012
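
The properties above belong to a single `Timestamp`, while `.dt` exposes the same information for a whole column. A short sketch showing that the two agree, using a standalone timestamp rather than the dataset:

```python
import pandas as pd

# A single Timestamp and a one-element Series holding the same value
ts = pd.Timestamp('2012-01-09')
as_series = pd.Series([ts])

# Scalar property versus the vectorized .dt accessor
print(ts.day_name(), as_series.dt.day_name().iloc[0])  # Monday Monday
print(ts.dayofweek == as_series.dt.dayofweek.iloc[0])  # True
print(ts.isoformat())                                  # 2012-01-09T00:00:00
```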

### 🕓 Time Differences with Timedelta

```df['COLUMN'].min()``` – Returns the earliest datetime.  
```df['COLUMN'].max()``` – Returns the latest datetime.  
```df['COLUMN'].max() - df['COLUMN'].min()``` – Computes the Timedelta between the latest and earliest dates.

In [29]:
df['date'].min()

Timestamp('2012-01-01 00:00:00')

In [30]:
df['date'].max()

Timestamp('2015-12-31 00:00:00')

In [31]:
timedelta = df['date'].max() - df['date'].min()
timedelta

Timedelta('1460 days 00:00:00')
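
A `Timedelta` can be inspected, converted to other units, and added to a `Timestamp`. The sketch below rebuilds the same 1460-day span from the two boundary dates shown above, so it runs without the dataset:

```python
import pandas as pd

start = pd.Timestamp('2012-01-01')
end = pd.Timestamp('2015-12-31')
span = end - start

print(span.days)                               # 1460
print(span.total_seconds())                    # 126144000.0
# Dividing by another Timedelta gives a plain float (here: number of weeks)
print(round(span / pd.Timedelta(weeks=1), 2))  # 208.57
# Adding a Timedelta to a Timestamp moves it forward in time
print(start + pd.Timedelta(days=30))           # 2012-01-31 00:00:00
```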

### 📆 Working with Periods and Offsets

```pd.Period('VALUE')``` – Creates a period object (e.g., year, month).  
```period.start_time``` – Returns the start time of the period.  
```period.end_time``` – Returns the end time of the period.  
```period += pd.offsets.FREQ(AMOUNT)``` – Shifts a period forward or backward in time by an offset.  

In [32]:
year = pd.Period('2021')
year.start_time

Timestamp('2021-01-01 00:00:00')

In [33]:
year.end_time

Timestamp('2021-12-31 23:59:59.999999999')

In [34]:
month = pd.Period('2022-01')
day = pd.Period('2022-01', freq='d')
hour = pd.Period('2022-02-09 16:00:00', freq='h')
hour

Period('2022-02-09 16:00', 'h')

In [35]:
hour += pd.offsets.Hour(+2)
hour

Period('2022-02-09 18:00', 'h')

In [36]:
week = pd.date_range('2022-2-7', periods=7)
for day in week:
    print(f'{day.day_of_week}-{day.day_name()}\t{day.date()}')

0-Monday	2022-02-07
1-Tuesday	2022-02-08
2-Wednesday	2022-02-09
3-Thursday	2022-02-10
4-Friday	2022-02-11
5-Saturday	2022-02-12
6-Sunday	2022-02-13
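
Offsets also work on `Timestamp` objects, and a datetime column can be converted to periods with `.dt.to_period()`. A minimal sketch with illustrative dates (the variable names are assumptions):

```python
import pandas as pd

# Period arithmetic: adding an integer moves by whole periods
month = pd.Period('2022-01', freq='M')
print(month + 1)             # 2022-02
print(month.days_in_month)   # 31

# Offsets applied to a Timestamp (2022-02-11 is a Friday)
friday = pd.Timestamp('2022-02-11')
print(friday + pd.offsets.BDay(1))      # 2022-02-14 00:00:00
print(friday + pd.offsets.MonthEnd(1))  # 2022-02-28 00:00:00

# Converting a datetime Series to monthly periods
dates = pd.Series(pd.date_range('2022-01-30', periods=3, freq='D'))
print(dates.dt.to_period('M').astype(str).tolist())  # ['2022-01', '2022-01', '2022-02']
```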


### 🌍 Timezones and Localization

```df['COLUMN'].dt.tz_localize('TIMEZONE')``` – Attaches a timezone to a timezone-naive datetime column.  
```df['COLUMN'].dt.tz_convert('TIMEZONE')``` – Converts a timezone-aware datetime column to another timezone.

In [37]:
timezone_df =  df.copy()
timezone_df['date_utc'] = timezone_df['date'].dt.tz_localize('UTC')
timezone_df['date_utc'].head()

0   2012-01-01 00:00:00+00:00
1   2012-01-02 00:00:00+00:00
2   2012-01-03 00:00:00+00:00
3   2012-01-04 00:00:00+00:00
4   2012-01-05 00:00:00+00:00
Name: date_utc, dtype: datetime64[ns, UTC]

In [38]:
timezone_df['date_pacific'] = timezone_df['date_utc'].dt.tz_convert('US/Pacific')
timezone_df[['date_utc', 'date_pacific']].head()

,date_utc,date_pacific
0,2012-01-01 00:00:00+00:00,2011-12-31 16:00:00-08:00
1,2012-01-02 00:00:00+00:00,2012-01-01 16:00:00-08:00
2,2012-01-03 00:00:00+00:00,2012-01-02 16:00:00-08:00
3,2012-01-04 00:00:00+00:00,2012-01-03 16:00:00-08:00
4,2012-01-05 00:00:00+00:00,2012-01-04 16:00:00-08:00
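
Conversion accounts for daylight saving time, and a timezone can be dropped again with `tz_localize(None)`. The sketch below uses two made-up timestamps and the IANA name `America/Los_Angeles` (an assumption chosen for illustration in place of the `US/Pacific` alias used above):

```python
import pandas as pd

# One winter and one summer timestamp, both timezone-naive
naive = pd.Series(pd.to_datetime(['2012-01-01 00:00', '2012-07-01 00:00']))

utc = naive.dt.tz_localize('UTC')                    # attach UTC
pacific = utc.dt.tz_convert('America/Los_Angeles')   # convert to Pacific time

# UTC-8 in winter, UTC-7 in summer because of daylight saving time
print(pacific.dt.hour.tolist())  # [16, 17]

# Convert back to UTC, then drop the timezone to recover the naive values
back_to_naive = pacific.dt.tz_convert('UTC').dt.tz_localize(None)
print(back_to_naive.equals(naive))  # True
```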


### 🔍 Filtering and Slicing Dates

```df['COLUMN'] > pd.to_datetime('DATE')``` – Builds a boolean mask by comparing a datetime column with a date.  
```df.index > pd.Timestamp('DATE')``` – Builds a boolean mask by comparing a DatetimeIndex with a Timestamp.  
```df.loc['YEAR']``` – Selects all rows for a specific year (requires a DatetimeIndex).  
```df.loc['START_DATE' : 'END_DATE']``` – Selects rows between two dates, both endpoints included.

In [39]:
time_filter = (df['date'] > pd.to_datetime('2013'))
df.loc[time_filter].head()

,date,precipitation,temp_max,temp_min,wind,weather
367,2013-01-02,0.0,6.1,-1.1,3.2,sun
368,2013-01-03,4.1,6.7,-1.7,3.0,rain
369,2013-01-04,2.5,10.0,2.2,2.8,rain
370,2013-01-05,3.0,6.7,4.4,3.1,rain
371,2013-01-06,2.0,7.2,2.8,3.0,rain


In [40]:
filter_df = df.copy()
filter_df.set_index('date', inplace=True)
filter_df.sort_index(inplace=True)

period_filter = (filter_df.index >= pd.Timestamp('2013-01-02')) & (filter_df.index < pd.Timestamp('2013-01-07'))
filter_df.loc[period_filter].head()

date,precipitation,temp_max,temp_min,wind,weather
2013-01-02,0.0,6.1,-1.1,3.2,sun
2013-01-03,4.1,6.7,-1.7,3.0,rain
2013-01-04,2.5,10.0,2.2,2.8,rain
2013-01-05,3.0,6.7,4.4,3.1,rain
2013-01-06,2.0,7.2,2.8,3.0,rain


In [41]:
slice_df = df.copy()
slice_df.set_index('date', inplace=True)
slice_df.sort_index(inplace=True)
slice_df.loc['2013'].head()

date,precipitation,temp_max,temp_min,wind,weather
2013-01-01,0.0,5.0,-2.8,2.7,sun
2013-01-02,0.0,6.1,-1.1,3.2,sun
2013-01-03,4.1,6.7,-1.7,3.0,rain
2013-01-04,2.5,10.0,2.2,2.8,rain
2013-01-05,3.0,6.7,4.4,3.1,rain


In [42]:
slice_df.loc['2013-01-02' : '2013-01-09']

date,precipitation,temp_max,temp_min,wind,weather
2013-01-02,0.0,6.1,-1.1,3.2,sun
2013-01-03,4.1,6.7,-1.7,3.0,rain
2013-01-04,2.5,10.0,2.2,2.8,rain
2013-01-05,3.0,6.7,4.4,3.1,rain
2013-01-06,2.0,7.2,2.8,3.0,rain
2013-01-07,2.3,10.0,4.4,7.3,rain
2013-01-08,16.3,11.7,5.6,6.3,rain
2013-01-09,38.4,10.0,1.7,5.1,rain
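
Partial-string indexing also works at month resolution, and label slices on a DatetimeIndex include both endpoints. A small sketch on a synthetic daily frame (`daily_df` is an assumption standing in for the indexed weather data):

```python
import pandas as pd

# Daily index covering the first quarter of 2013
idx = pd.date_range('2013-01-01', '2013-03-31', freq='D')
daily_df = pd.DataFrame({'value': range(len(idx))}, index=idx)

# A 'YYYY-MM' string selects the whole month
february = daily_df.loc['2013-02']

# Label slicing keeps both the start and the end date
window = daily_df.loc['2013-01-30':'2013-02-02']

print(len(february))  # 28
print(len(window))    # 4

# Equivalent filter when the dates live in a regular column
as_column = daily_df.rename_axis('date').reset_index()
mask = as_column['date'].between('2013-01-30', '2013-02-02')
print(mask.sum())     # 4
```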


### 📊 Frequency Conversion, Resampling and Rolling

```df.asfreq('FREQ')``` – Converts to a new frequency by selecting existing values, without aggregation.  
```df.resample('FREQ')['COLUMN'].mean()``` – Groups data into FREQ-sized bins and calculates the mean of each bin.  
```df['COLUMN'].rolling(window=N).mean()``` – Computes the rolling mean over a window of N rows.

In [43]:
weekly_df = df.copy()
weekly_df.set_index('date', inplace=True)
weekly_df.sort_index(inplace=True)

weekly_df.asfreq('W').head()

date,precipitation,temp_max,temp_min,wind,weather
2012-01-01,0.0,12.8,5.0,4.7,drizzle
2012-01-08,0.0,10.0,2.8,2.0,sun
2012-01-15,5.3,1.1,-3.3,3.2,snow
2012-01-22,6.1,6.7,2.2,4.8,rain
2012-01-29,27.7,9.4,3.9,4.5,rain


In [44]:
resample_df = df.copy()
resample_df.set_index('date', inplace=True)
resample_df.sort_index(inplace=True)

monthly_temp = resample_df.resample('ME')['temp_max'].mean()
monthly_temp.head()

date
2012-01-31     7.054839
2012-02-29     9.275862
2012-03-31     9.554839
2012-04-30    14.873333
2012-05-31    17.661290
Freq: ME, Name: temp_max, dtype: float64

In [45]:
resample_df = df.copy()
resample_df.set_index('date', inplace=True)
resample_df.sort_index(inplace=True)

yearly_temp = resample_df.resample('YE')['temp_min'].mean()
yearly_temp.head()

date
2012-12-31    7.289617
2013-12-31    8.153973
2014-12-31    8.662466
2015-12-31    8.835616
Freq: YE-DEC, Name: temp_min, dtype: float64

In [46]:
rolling_df = df.copy()
rolling_df.set_index('date', inplace=True)
rolling_df.sort_index(inplace=True)

rolling_df['temp_max'] = rolling_df['temp_max'].rolling(window=3).mean()
rolling_df['temp_max'].head()

date
2012-01-01          NaN
2012-01-02          NaN
2012-01-03    11.700000
2012-01-04    11.500000
2012-01-05    10.933333
Name: temp_max, dtype: float64
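
Resampling can compute several statistics at once, and a rolling window can be defined by row count or by time span. The sketch below uses a synthetic daily series (`values` is an assumption) and the month-start alias `'MS'`, chosen here only because it is accepted across pandas versions, whereas `'ME'` needs pandas 2.2 or later.

```python
import pandas as pd

# 60 consecutive days with values 0.0 .. 59.0 (January and February 2012)
idx = pd.date_range('2012-01-01', periods=60, freq='D')
values = pd.Series(range(60), index=idx, dtype='float64')

# Several aggregations per month in a single call
monthly_stats = values.resample('MS').agg(['min', 'max', 'mean'])
print(monthly_stats)

# min_periods=1 avoids the leading NaN values of a fixed-size window
rolling_partial = values.rolling(window=3, min_periods=1).mean()
print(rolling_partial.head(3).tolist())  # [0.0, 0.5, 1.0]

# A time-based window covers the 7 days up to and including each timestamp
rolling_week = values.rolling('7D').mean()
print(rolling_week.iloc[6])  # 3.0
```

With `min_periods=1`, the first rows are averaged over however many values are available, which avoids having to interpolate them afterwards.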

### 🔧 Handling Missing Data with .interpolate()

```df['COLUMN'].interpolate(limit_direction='both')``` – Fills missing values by linear interpolation; with limit_direction='both', leading and trailing gaps are filled as well.

In [47]:
interpolate_df = rolling_df.copy()
interpolate_df['temp_max'].isna().sum()

np.int64(2)

In [48]:
interpolate_df['temp_max'] = interpolate_df['temp_max'].interpolate(limit_direction='both')
interpolate_df.head()

date,precipitation,temp_max,temp_min,wind,weather
2012-01-01,0.0,11.7,5.0,4.7,drizzle
2012-01-02,10.9,11.7,2.8,4.5,rain
2012-01-03,0.8,11.7,7.2,2.3,rain
2012-01-04,20.3,11.5,5.6,4.7,rain
2012-01-05,1.3,10.933333,2.8,6.1,rain
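
The default linear method treats rows as equally spaced. When timestamps are irregular, `method='time'` weights the fill by the actual time gaps. A minimal sketch on a made-up series with a missing value and an uneven index:

```python
import pandas as pd

# Irregular index: one day between the first two rows, three days before the last
idx = pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-05'])
gappy = pd.Series([0.0, None, 4.0], index=idx)

# 'linear' ignores the index: midpoint of 0.0 and 4.0
print(gappy.interpolate(method='linear').iloc[1])  # 2.0

# 'time' uses the index: one quarter of the way from 0.0 to 4.0
print(gappy.interpolate(method='time').iloc[1])    # 1.0
```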


### 📈 Calculating Lagged Values and Differences

```df['COLUMN'].shift(N)``` – Shifts values by N rows (e.g., N=1 gives the previous day's value on daily data).  
```df['COLUMN'].diff()``` – Calculates the difference between the current and previous value.  
```df['COLUMN'].pct_change()``` – Calculates the fractional change from the previous value (multiply by 100 for percent).

In [49]:
shift_df = df.copy()
shift_df.set_index('date', inplace=True)
shift_df.sort_index(inplace=True)

shift_df['temp_max_prev_day'] = shift_df['temp_max'].shift(1)
shift_df[['temp_max', 'temp_max_prev_day']].head()

date,temp_max,temp_max_prev_day
2012-01-01,12.8,
2012-01-02,10.6,12.8
2012-01-03,11.7,10.6
2012-01-04,12.2,11.7
2012-01-05,8.9,12.2


In [50]:
diff_df = df.copy()
diff_df.set_index('date', inplace=True)
diff_df.sort_index(inplace=True)

diff_df['temp_max_diff'] = diff_df['temp_max'].diff()
diff_df[['temp_max', 'temp_max_diff']].head(3)

date,temp_max,temp_max_diff
2012-01-01,12.8,
2012-01-02,10.6,-2.2
2012-01-03,11.7,1.1


In [51]:
pctchange_df = df.copy()
pctchange_df.set_index('date', inplace=True)
pctchange_df.sort_index(inplace=True)

pctchange_df['temp_max_diff_pct'] = pctchange_df['temp_max'].pct_change()
pctchange_df[['temp_max', 'temp_max_diff_pct']].head()

date,temp_max,temp_max_diff_pct
2012-01-01,12.8,
2012-01-02,10.6,-0.171875
2012-01-03,11.7,0.103774
2012-01-04,12.2,0.042735
2012-01-05,8.9,-0.270492
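
Both `diff()` and `pct_change()` can be understood in terms of `shift()`. The sketch below checks this on the first three `temp_max` values shown above, rebuilt by hand so it runs without the dataset, and also shows `shift` with a `freq` argument, which moves the index instead of the values.

```python
import pandas as pd

idx = pd.date_range('2012-01-01', periods=3, freq='D')
temp_max = pd.Series([12.8, 10.6, 11.7], index=idx)

previous = temp_max.shift(1)

# diff() is the current value minus the previous one
manual_diff = temp_max - previous
# pct_change() is the current value divided by the previous one, minus 1
manual_pct = temp_max / previous - 1

print(manual_diff.round(6).equals(temp_max.diff().round(6)))       # True
print(manual_pct.round(6).equals(temp_max.pct_change().round(6)))  # True

# With freq, the values stay put and the index moves forward one day
shifted_index = temp_max.shift(1, freq='D')
print(shifted_index.index[0])  # 2012-01-02 00:00:00
```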


### 👉 Next Topic: [Plotting](./10-plotting-visualization.ipynb)

Learn how to visualize DataFrames in pandas.