# Working with Dates and Times

In [2]:
import pandas as pd
import datetime as dt

## Review of Python's datetime Module
- The `datetime` module is built into the core Python programming language.
- The common alias for the `datetime` module is `dt`.
- A module is a Python source file; think of it as an internal library that Python loads on demand.
- The `datetime` module includes `date` and `datetime` classes for representing dates and datetimes.
- The `date` constructor accepts arguments for year, month, and day. All three are required; omitting one raises a `TypeError`.
- The `datetime` constructor accepts arguments for year, month, day, hour, minute, second, and microsecond. Year, month, and day are required; the time components default to 0.
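
As a minimal sketch of which constructor arguments are required and which fall back to 0, the snippet below uses only the standard library. The variable names `missing_day_error` and `midnight` are illustrative.

```python
import datetime as dt

# date() needs year, month and day; leaving one out raises a TypeError.
try:
    dt.date(2025, 1)
    missing_day_error = None
except TypeError as exc:
    missing_day_error = exc

print(type(missing_day_error).__name__)  # TypeError

# datetime() also needs year, month and day, but the time parts default to 0.
midnight = dt.datetime(2025, 1, 28)
print(midnight.hour, midnight.minute, midnight.second, midnight.microsecond)  # 0 0 0 0
```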

In [19]:
someday= dt.date(2025, 1, 28)

print(someday.day)
print(someday.month)
print(someday.year)
print('-'*10, end= '\n\n')

dt.datetime(2025, 1, 28)
dt.datetime(2025, 1, 28, 12, 9)
dt.datetime(2025, 1, 28, 12, 9, 43, 11)

somedatetime= dt.datetime(2025, 1, 28, 12, 9, 43, 11)
print(somedatetime.year)
print(somedatetime.month)
print(somedatetime.day)
print(somedatetime.hour)
print(somedatetime.minute)
print(somedatetime.second)
print(somedatetime.microsecond)

28
1
2025
----------

2025
1
28
12
9
43
11


## The Timestamp and DatetimeIndex Objects

- Pandas ships with several classes related to datetimes.
- The **Timestamp** is similar to Python's **datetime** object (but with expanded functionality).
- A **DatetimeIndex** is an index of **Timestamp** objects.
- The **Timestamp** constructor accepts a string, a **datetime** object, or the same arguments as the **datetime** class.

In [28]:
pd.Timestamp(2024, 3, 12)
pd.Timestamp(2025, 1, 31, 13, 47, 23)
pd.Timestamp(dt.date(2015, 11, 10))
pd.Timestamp(dt.datetime(2018, 10, 15, 16, 12, 48))
pd.Timestamp('2023-12-25')
pd.Timestamp('05/11/2001') # MM/dd/yyyy
pd.Timestamp('2024-10-29 15:42:33')

Timestamp('2024-10-29 15:42:33')

In [33]:
pd.Series([pd.Timestamp('2024-10-29 15:42:33')]).iloc[0]

Timestamp('2024-10-29 15:42:33')

In [37]:
pd.DatetimeIndex(['2005-10-12', '2010-11-24', '2029-06-23'])

index= pd.DatetimeIndex(['2005-10-12', '2010-11-24', '2029-06-23'])
index

DatetimeIndex(['2005-10-12', '2010-11-24', '2029-06-23'], dtype='datetime64[ns]', freq=None)

In [39]:
type(index)
type(index[0])

pandas._libs.tslibs.timestamps.Timestamp

- A **DatetimeIndex** can be thought of as a list of **Timestamp** objects. The same values can also be stored in a regular **Series**.
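
The relationship between the three objects can be sketched as follows. This is an illustrative example that reuses the dates from the cell above; the names `first` and `as_series` are made up for the demonstration.

```python
import datetime as dt
import pandas as pd

index = pd.DatetimeIndex(['2005-10-12', '2010-11-24', '2029-06-23'])

# Each element comes back as a pandas Timestamp ...
first = index[0]
print(type(first).__name__)            # Timestamp

# ... and Timestamp is itself a subclass of Python's datetime.
print(isinstance(first, dt.datetime))  # True

# Wrapping the index in a Series keeps a datetime64 dtype.
as_series = pd.Series(index)
print(as_series.dtype)
```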

## Create a Range of Dates with the pd.date_range Function
- The `date_range` function generates and returns a **DatetimeIndex** holding a sequence of dates.
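- The function requires exactly 3 of the 4 following parameters: `start`, `end`, `periods`, and `freq`. Because `freq` defaults to daily, passing any 2 of `start`, `end`, and `periods` is enough.
- With only `start` and `end`, Pandas will assume a daily frequency.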
- Every element within a **DatetimeIndex** is a **Timestamp**.
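
A short sketch of the three common parameter combinations, assuming the default daily frequency. The variable names are illustrative only.

```python
import pandas as pd

# start + end: one entry per day (the default frequency), both ends included.
week = pd.date_range(start='2025-01-01', end='2025-01-07')

# start + periods: count forward from the start date.
forward = pd.date_range(start='2025-01-28', periods=5, freq='D')

# end + periods: count backward so that the range finishes on the end date.
backward = pd.date_range(end='2025-01-28', periods=5, freq='D')

print(len(week))     # 7
print(forward[-1])   # 2025-02-01 00:00:00
print(backward[0])   # 2025-01-24 00:00:00
```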

In [63]:
pd.date_range(start= '2025-01-01', end= '2025-01-07')
pd.date_range(start= '2025-01-01', end= '2025-01-07', freq= 'D') # same result as above: 'D' (daily) is the default
pd.date_range(start= '2025-01-01', end= '2025-01-07', freq= '2d') # every 2 days
pd.date_range(start= '2014-11-20', end= '2014-12-31', freq= 'B') # business days (Monday-Friday)
pd.date_range(start= '2025-01-01', end= '2025-01-31', freq= 'W') # weekly, one date per week falling on Sunday
pd.date_range(start= '2025-01-01', end= '2025-01-31', freq= 'W-TUE') # weekly, one date per week falling on Tuesday

pd.date_range(start= '2024-01-01', end= '2024-01-31', freq='H') # hourly (pandas 2.2+ prefers the lowercase alias 'h')
pd.date_range(start= '2024-01-01', end= '2024-01-31', freq='6H') # every 6 hours

pd.date_range(start= '2025-01-01', end= '2025-12-31', freq= 'M') # last day of each month (pandas 2.2+ prefers 'ME')
pd.date_range(start= '2025-01-01', end= '2025-12-31', freq= 'MS') # first day of each month
pd.date_range(start= '2025-01-01', end= '2050-12-31', freq='A') # last day of each year (pandas 2.2+ prefers 'YE')
pd.date_range(start= '2025-01-01', end= '2050-12-31', freq='AS-FEB') # yearly, on the first day of February (pandas 2.2+ prefers 'YS-FEB')

pd.date_range(start='2025-01-28', freq= 'D', periods= 25) # 25 consecutive calendar days, beginning at the start date
pd.date_range(start= '2025-01-28', freq='B', periods= 50) # 50 consecutive business days, beginning at the start date
pd.date_range(end= '2025-01-28', freq='D', periods= 28) # 28 consecutive calendar days, finishing at the end date

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
               '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08',
               '2025-01-09', '2025-01-10', '2025-01-11', '2025-01-12',
               '2025-01-13', '2025-01-14', '2025-01-15', '2025-01-16',
               '2025-01-17', '2025-01-18', '2025-01-19', '2025-01-20',
               '2025-01-21', '2025-01-22', '2025-01-23', '2025-01-24',
               '2025-01-25', '2025-01-26', '2025-01-27', '2025-01-28'],
              dtype='datetime64[ns]', freq='D')

## The dt Attribute
- The `dt` attribute reveals a `DatetimeProperties` object with attributes/methods for working with datetimes. It is similar to the `str` attribute for string methods.
- The `DatetimeProperties` object has attributes like `day`, `month`, and `year` to reveal information about each date in the **Series**.
- The `day_name` method returns the written day of the week.
- Attributes like `is_month_end` and `is_quarter_start` return Boolean **Series**.
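
The Boolean attributes are easiest to see on a handful of hand-picked dates. The sketch below uses four made-up dates in 2024; the Series name `dates` is illustrative.

```python
import pandas as pd

dates = pd.Series(pd.DatetimeIndex(['2024-01-01', '2024-01-31', '2024-04-01', '2024-06-15']))

print(dates.dt.day_name().tolist())        # ['Monday', 'Wednesday', 'Monday', 'Saturday']
print(dates.dt.is_month_end.tolist())      # [False, True, False, False]
print(dates.dt.is_quarter_start.tolist())  # [True, False, True, False]

# A Boolean Series can be used directly as a filter.
print(dates[dates.dt.is_quarter_start].tolist())
```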

In [69]:
bunch_of_dates= pd.Series(pd.date_range(start= '2000-01-01', end= '2020-12-31', freq= '24D 4.5H'))
bunch_of_dates

0     2000-01-01 00:00:00
1     2000-01-25 04:30:00
2     2000-02-18 09:00:00
3     2000-03-13 13:30:00
4     2000-04-06 18:00:00
              ...        
313   2020-09-22 16:30:00
314   2020-10-16 21:00:00
315   2020-11-10 01:30:00
316   2020-12-04 06:00:00
317   2020-12-28 10:30:00
Length: 318, dtype: datetime64[ns]

In [78]:
bunch_of_dates.dt.day
bunch_of_dates.dt.month
bunch_of_dates.dt.year
bunch_of_dates.dt.hour
bunch_of_dates.dt.day_of_year
bunch_of_dates.dt.day_of_week.map({0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}) # pandas counts Monday as 0 and Sunday as 6
bunch_of_dates.dt.day_name()

0      Saturday
1       Tuesday
2        Friday
3        Monday
4      Thursday
         ...   
313     Tuesday
314      Friday
315     Tuesday
316      Friday
317      Monday
Length: 318, dtype: object

In [82]:
bunch_of_dates.dt.day_of_week

0      5
1      1
2      4
3      0
4      3
      ..
313    1
314    4
315    1
316    4
317    0
Length: 318, dtype: int32

In [81]:
sorted(bunch_of_dates.dt.day_of_week.unique())

[0, 1, 2, 3, 4, 5, 6]
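
The integers returned by `day_of_week` run from Monday (0) to Sunday (6). One way to check the numbering, sketched here with an illustrative one-week Series, is to pair each integer with the matching `day_name` value:

```python
import pandas as pd

# One full week starting on a Monday (2024-01-01 was a Monday).
week = pd.Series(pd.date_range(start='2024-01-01', periods=7, freq='D'))

# Pair each day_of_week integer with the name pandas reports for the same date.
lookup = dict(zip(week.dt.day_of_week.tolist(), week.dt.day_name().tolist()))
print(lookup)
# {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday', 5: 'Saturday', 6: 'Sunday'}
```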

## Selecting Rows from a DataFrame with a DatetimeIndex
- The `iloc` accessor is available for index position-based extraction.
- The `loc` accessor accepts strings or **Timestamps** to extract by index label/value. Python's `datetime.datetime` objects generally work as well in recent pandas versions, but strings and **Timestamps** are the most reliable choice.
- Use slice syntax inside `loc` to extract a range of dates; both endpoints are included. The `truncate` method is an alternative.
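
The selection techniques above can be tried without the CSV file. The sketch below builds a small hypothetical DataFrame (`prices`, with made-up values) that stands in for the stock data.

```python
import pandas as pd

# Hypothetical stand-in for the stock data: ten business days with made-up prices.
prices = pd.DataFrame(
    {'Close': [float(n) for n in range(10)]},
    index=pd.date_range(start='2024-01-01', periods=10, freq='B'),
)

by_position = prices.iloc[0]                         # first row, by position
by_string = prices.loc['2024-01-03']                 # label given as a string
by_timestamp = prices.loc[pd.Timestamp(2024, 1, 3)]  # same label as a Timestamp
window = prices.loc['2024-01-03':'2024-01-08']       # both endpoints are included
same_window = prices.truncate(before='2024-01-03', after='2024-01-08')

print(by_timestamp['Close'])       # 2.0
print(len(window))                 # 4
print(window.equals(same_window))  # True
```

Note that `truncate` expects a sorted index, which is one reason the cells in this section call `sort_index()` right after loading the data.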

In [93]:
stocks= pd.read_csv('ibm.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col= 'Date').sort_index()
stocks.head()

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679
1962-01-05,4.97389,4.97389,4.87511,4.88166,559321.480565
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393


In [99]:
stocks.iloc[300]
stocks.loc['1962-01-03']
stocks.loc[pd.Timestamp(1962, 1, 3)]
stocks.loc['2010-04-13':'2013-11-25']
stocks.loc[pd.Timestamp(2010, 4, 13):pd.Timestamp(2013, 11, 25)]
stocks.truncate('2010-04-13','2013-11-25')

Date,Open,High,Low,Close,Volume
2010-04-13,84.2767,85.0419,83.9908,84.7773,1.037824e+07
2010-04-14,85.2351,86.3632,85.0565,86.2335,1.300633e+07
2010-04-15,85.7553,86.1651,85.5503,85.9923,9.776484e+06
2010-04-16,85.8636,86.8307,85.5854,85.8187,1.453137e+07
2010-04-19,85.6703,86.8990,85.6703,86.8550,1.715547e+07
...,...,...,...,...,...
2013-11-19,127.3670,128.4510,127.0350,127.7970,6.634746e+06
2013-11-20,127.7680,128.4700,127.3770,127.7380,5.285105e+06
2013-11-21,128.0020,128.1580,126.5190,127.0170,6.464296e+06
2013-11-22,126.5970,127.6120,124.1080,125.0940,1.103185e+07


## The DateOffset Object
- A **DateOffset** object adds time to a **Timestamp** to arrive at a new **Timestamp**.
- The **DateOffset** constructor accepts `days`, `weeks`, `months`, `years` parameters, and more.
- We can pass a **DateOffset** object to the `freq` parameter of the `pd.date_range` function.
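
Before applying an offset to a whole index, it can help to see the effect on a single **Timestamp**. This is a minimal sketch with illustrative names (`start`, `every_two_weeks`):

```python
import pandas as pd

start = pd.Timestamp('2024-01-31')

# Calendar-aware arithmetic: adding one month lands on the last valid day of February.
print(start + pd.DateOffset(months=1))  # 2024-02-29 00:00:00
print(start + pd.DateOffset(days=5))    # 2024-02-05 00:00:00
print(start - pd.DateOffset(years=3))   # 2021-01-31 00:00:00

# A DateOffset can also serve as the step of a date range.
every_two_weeks = pd.date_range(start='2024-01-01', periods=3, freq=pd.DateOffset(weeks=2))
print(every_two_weeks.tolist())
```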

In [111]:
stocks= pd.read_csv('ibm.csv', parse_dates=['Date'], date_format='%Y-%m-%d', index_col= 'Date').sort_index()
stocks.head()

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679
1962-01-05,4.97389,4.97389,4.87511,4.88166,559321.480565
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393


In [102]:
stocks.index

DatetimeIndex(['1962-01-02', '1962-01-03', '1962-01-04', '1962-01-05',
               '1962-01-08', '1962-01-09', '1962-01-10', '1962-01-11',
               '1962-01-12', '1962-01-15',
               ...
               '2023-09-28', '2023-09-29', '2023-10-02', '2023-10-03',
               '2023-10-04', '2023-10-05', '2023-10-06', '2023-10-09',
               '2023-10-10', '2023-10-11'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [110]:
stocks.index + pd.DateOffset(days= 5)
stocks.index - pd.DateOffset(days= 5)
stocks.index - pd.DateOffset(months= 5)
stocks.index - pd.DateOffset(years= 3)
stocks.index + pd.DateOffset(hours= 3)
stocks.index - pd.DateOffset(years= 1, months= 2, hours= 1.57)

DatetimeIndex(['1960-11-01 22:25:48', '1960-11-02 22:25:48',
               '1960-11-03 22:25:48', '1960-11-04 22:25:48',
               '1960-11-07 22:25:48', '1960-11-08 22:25:48',
               '1960-11-09 22:25:48', '1960-11-10 22:25:48',
               '1960-11-11 22:25:48', '1960-11-14 22:25:48',
               ...
               '2022-07-27 22:25:48', '2022-07-28 22:25:48',
               '2022-08-01 22:25:48', '2022-08-02 22:25:48',
               '2022-08-03 22:25:48', '2022-08-04 22:25:48',
               '2022-08-05 22:25:48', '2022-08-08 22:25:48',
               '2022-08-09 22:25:48', '2022-08-10 22:25:48'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [118]:
stocks.index.month.unique()

Index([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], dtype='int32', name='Date')

In [122]:
# Find the IBM stock price on each of my birthdays, starting from November 5, 2001

stocks.loc[ (stocks.index.day == 5) & (stocks.index.month == 11) & (stocks.index.year >= 2001)]

Date,Open,High,Low,Close,Volume
2001-11-05,72.2475,72.6691,71.6746,72.2631,8673211.0
2002-11-05,53.7128,53.8836,52.8725,53.6679,15800260.0
2003-11-05,58.1611,58.3972,57.8214,58.1445,9581367.0
2004-11-05,60.7112,61.445,60.7112,61.2899,10210090.0
2007-11-05,75.6291,75.6291,74.1427,74.5087,10888740.0
2008-11-05,61.0566,61.3659,58.9359,59.093,13250010.0
2009-11-05,80.2383,81.1459,80.1437,80.8795,10179210.0
2010-11-05,95.9136,96.548,95.7125,96.5382,7603387.0
2012-11-05,129.593,131.359,129.369,130.812,4061326.0
2013-11-05,123.162,123.376,121.941,122.059,8894517.0


In [128]:
# another way to solve it: build the list of birthdays first, then keep the rows whose date is in that list
birthdays_datetimeindexes= pd.Series(pd.date_range(start= '2001-11-05', end= '2025-01-01', freq= pd.DateOffset(years=1)))
stocks.loc[stocks.index.isin(birthdays_datetimeindexes)]

Date,Open,High,Low,Close,Volume
2001-11-05,72.2475,72.6691,71.6746,72.2631,8673211.0
2002-11-05,53.7128,53.8836,52.8725,53.6679,15800260.0
2003-11-05,58.1611,58.3972,57.8214,58.1445,9581367.0
2004-11-05,60.7112,61.445,60.7112,61.2899,10210090.0
2007-11-05,75.6291,75.6291,74.1427,74.5087,10888740.0
2008-11-05,61.0566,61.3659,58.9359,59.093,13250010.0
2009-11-05,80.2383,81.1459,80.1437,80.8795,10179210.0
2010-11-05,95.9136,96.548,95.7125,96.5382,7603387.0
2012-11-05,129.593,131.359,129.369,130.812,4061326.0
2013-11-05,123.162,123.376,121.941,122.059,8894517.0


## Specialized Date Offsets
- Pandas nests more specialized date offsets in `pd.tseries.offsets`.
- These offsets add a different amount of time to each date, moving it to the next anchor point (for example, a month end, quarter end, or year start).
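
The behaviour is easiest to follow on individual dates. The following sketch imports the offsets directly from `pandas.tseries.offsets`; the two example dates are arbitrary.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd, QuarterBegin, YearEnd

mid_month = pd.Timestamp('2024-01-15')
month_end = pd.Timestamp('2024-01-31')

print(mid_month + MonthEnd())  # 2024-01-31 00:00:00
print(month_end + MonthEnd())  # 2024-02-29 00:00:00 (already on a month end, so it moves on)
print(mid_month - MonthEnd())  # 2023-12-31 00:00:00
print(mid_month + YearEnd())   # 2024-12-31 00:00:00

# startingMonth=1 makes quarters begin in January, April, July and October.
print(mid_month + QuarterBegin(startingMonth=1))  # 2024-04-01 00:00:00

# rollforward leaves a date untouched when it already sits on the offset.
print(MonthEnd().rollforward(month_end))  # 2024-01-31 00:00:00
```

Adding an offset always moves the date, even when it is already on an anchor point. `rollforward` is the alternative when dates that already sit on the anchor should stay put.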

In [1]:
import pandas as pd

In [2]:
stocks= pd.read_csv('ibm.csv', parse_dates=['Date'], index_col='Date').sort_index()
stocks.head()

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679
1962-01-05,4.97389,4.97389,4.87511,4.88166,559321.480565
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393


In [13]:
stocks.index + pd.tseries.offsets.MonthEnd() # move each date forward to the next month end (a date already on a month end jumps to the following one)
stocks.index - pd.tseries.offsets.MonthEnd() # move each date back to the end of the previous month

stocks.index - pd.tseries.offsets.QuarterEnd() # move each date back to the previous quarter end
stocks.index + pd.tseries.offsets.QuarterEnd() # move each date forward to the next quarter end

stocks.index + pd.tseries.offsets.QuarterBegin(startingMonth= 1) # move each date forward to the next quarter start; startingMonth=1 makes quarters begin in January, April, July, and October (the default of 3 gives March, June, September, and December)
stocks.index - pd.tseries.offsets.QuarterBegin(startingMonth= 1) # move each date back to the previous quarter start

stocks.index + pd.tseries.offsets.YearEnd() # move each date forward to the next December 31
stocks.index - pd.tseries.offsets.YearEnd() # move each date back to December 31 of the previous year

DatetimeIndex(['1961-12-31', '1961-12-31', '1961-12-31', '1961-12-31',
               '1961-12-31', '1961-12-31', '1961-12-31', '1961-12-31',
               '1961-12-31', '1961-12-31',
               ...
               '2022-12-31', '2022-12-31', '2022-12-31', '2022-12-31',
               '2022-12-31', '2022-12-31', '2022-12-31', '2022-12-31',
               '2022-12-31', '2022-12-31'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

## Timedeltas
- A **Timedelta** is a pandas object that represents a duration (an amount of time).
- Subtracting two **Timestamp** objects yields a **Timedelta** object. The same applies to subtracting one datetime **Series** from another, which yields a **Series** of **Timedelta** values.
- The **Timedelta** constructor accepts parameters for time as well as string descriptions.
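
A small sketch of what can be done with a **Timedelta** once it exists. The names `gap` and `duration` are illustrative.

```python
import pandas as pd

# Subtracting two Timestamps gives a Timedelta.
gap = pd.Timestamp('2025-01-29') - pd.Timestamp('2024-01-31')
print(gap)       # 364 days 00:00:00
print(gap.days)  # 364

# Keyword arguments and string descriptions build the same kind of object.
duration = pd.Timedelta(days=1, hours=6)
print(pd.Timedelta('1 days 6 hours') == duration)  # True

# Dividing by another Timedelta converts the duration to a plain number.
print(duration.total_seconds())          # 108000.0
print(duration / pd.Timedelta(hours=1))  # 30.0
```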

In [22]:
pd.Timestamp('2025-01-29') - pd.Timestamp('2024-01-31')
pd.Timestamp(2025,12,31, 12, 43, 23) - pd.Timestamp(2032,3,4)

pd.Timedelta(days= 3, hours= 31, minutes= 12, seconds= 43, milliseconds= 31, microseconds= 12)
pd.Timedelta('3 days 5 hours 49.342 minutes')

Timedelta('3 days 05:49:20.520000')

In [44]:
ecommerce= pd.read_csv('ecommerce.csv', parse_dates=['order_date', 'delivery_date'], date_format= '%m/%d/%y', index_col= 'ID').sort_index()
ecommerce.head()

ID,order_date,delivery_date
1,1998-05-24,1999-02-05
2,1992-04-22,1998-03-06
4,1991-02-10,1992-08-26
5,1992-07-21,1997-11-20
7,1993-09-02,1998-06-10


In [40]:
ecommerce.dtypes

ID                        int64
order_date       datetime64[ns]
delivery_date    datetime64[ns]
dtype: object

In [53]:
ecommerce['delivery_time'] = ecommerce['delivery_date'] - ecommerce['order_date']
# To see the result in (approximate) years instead, divide by a Timedelta: ecommerce['delivery_time'] / pd.Timedelta(days=365)
ecommerce.head()

ID,order_date,delivery_date,delivery_time
1,1998-05-24,1999-02-05,257 days
2,1992-04-22,1998-03-06,2144 days
4,1991-02-10,1992-08-26,563 days
5,1992-07-21,1997-11-20,1948 days
7,1993-09-02,1998-06-10,1742 days


In [54]:
ecommerce['order_date'] + ecommerce['delivery_time']

ID
1     1999-02-05
2     1998-03-06
4     1992-08-26
5     1997-11-20
7     1998-06-10
         ...    
990   1996-02-02
991   1998-03-30
993   1998-04-27
994   1993-06-13
997   1991-10-03
Length: 501, dtype: datetime64[ns]

In [57]:
ecommerce['delivery_time'].max()
ecommerce['delivery_time'].min()
ecommerce['delivery_time'].mean()

Timedelta('1217 days 22:53:53.532934128')
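
A **Series** of **Timedelta** values supports the `dt` accessor and comparisons against a single **Timedelta**. The sketch below assumes a small hypothetical `orders` DataFrame with made-up dates, standing in for `ecommerce.csv`.

```python
import pandas as pd

# Hypothetical orders (made-up dates) standing in for ecommerce.csv.
orders = pd.DataFrame({
    'order_date': pd.DatetimeIndex(['2024-01-01', '2024-01-10', '2024-02-01']),
    'delivery_date': pd.DatetimeIndex(['2024-01-04', '2024-01-20', '2024-02-02']),
})
orders['delivery_time'] = orders['delivery_date'] - orders['order_date']

# Whole days as integers, via the dt accessor.
print(orders['delivery_time'].dt.days.tolist())  # [3, 10, 1]

# Aggregations return a single Timedelta.
print(orders['delivery_time'].mean())            # 4 days 16:00:00

# Comparing against a Timedelta gives a Boolean Series usable as a filter.
slow_orders = orders[orders['delivery_time'] > pd.Timedelta(days=2)]
print(len(slow_orders))                          # 2
```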