# 9. Working with Dates and Times I


In [1]:
import pandas as pd
import datetime as dt

### Table of Contents

1. Review of Python's datetime Module
2. The Timestamp and DatetimeIndex Objects
3. Create Range of Dates with pd.date_range Function
4. The dt Attribute
5. Selecting Rows from a DataFrame with a DatetimeIndex
6. The DateOffset Object
7. Specialized Date Offsets
8. Timedeltas

## 1. Review of Python's datetime Module

- The `datetime` module is part of Python's standard library, so no installation is needed.
- The common alias for the `datetime` module is `dt`.
- A module is a Python source file; think of it like an internal library that Python loads on demand.
- The `datetime` module includes `date` and `datetime` classes for representing dates and datetimes.
- The `date` constructor requires arguments for year, month, and day.
- The `datetime` constructor accepts arguments for year, month, day, hour, minute, and second. The date arguments are required; Python defaults to 0 for any missing time values.
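
As a minimal sketch of the two constructors (the variable names `birthday` and `midnight` are illustrative and not part of the original notebook), the snippet below builds one object of each class and shows that time components left out of the `datetime` call come back as 0.

```python
import datetime as dt

# A date holds only a year, a month, and a day
birthday = dt.date(1990, 4, 27)

# Only the date parts are passed here, so hour, minute, and second default to 0
midnight = dt.datetime(1990, 4, 27)

print(birthday.year, birthday.month, birthday.day)
print(midnight.hour, midnight.minute, midnight.second)
print(midnight)
```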

In [2]:
someday = dt.date(1990, 4, 27)

In [3]:
someday.year
someday.day
someday.month

4

In [4]:
str(someday)

'1990-04-27'

In [5]:
str(dt.datetime(2010, 1, 10, 8, 13, 57))

'2010-01-10 08:13:57'

In [6]:
sometime = dt.datetime(2018, 4, 14, 8, 50, 44)

In [7]:
sometime.day
sometime.month
sometime.year
sometime.hour
sometime.minute

50

## 2. The Timestamp and DatetimeIndex Objects

- Pandas ships with several classes related to datetimes.
- The **Timestamp** is similar to Python's `datetime` object (but with expanded functionality).
- A **DatetimeIndex** is an index of **Timestamp** objects.
- The **Timestamp** constructor accepts a string, a `datetime` object, or the same positional arguments as the `datetime` constructor.
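
The three input styles can be compared side by side. This is a small sketch with illustrative variable names; it assumes nothing beyond `pandas` and the standard library, and it checks that all three calls describe the same moment.

```python
import datetime as dt
import pandas as pd

# Same moment expressed three ways
from_string = pd.Timestamp("2015-03-31 08:14:58")
from_datetime = pd.Timestamp(dt.datetime(2015, 3, 31, 8, 14, 58))
from_parts = pd.Timestamp(2015, 3, 31, 8, 14, 58)

print(from_string == from_datetime)
print(from_datetime == from_parts)

# A Timestamp is also a datetime, so datetime attributes are available
print(isinstance(from_string, dt.datetime), from_string.hour)
```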


In [8]:
pd.Timestamp('2015-03-31')
pd.Timestamp('2015/03/31')
pd.Timestamp('2015, 3, 31')
pd.Timestamp('1,1,2010')
pd.Timestamp('19/12/2015')
pd.Timestamp('2010-03-08 08:14:58')
pd.Timestamp('2021-04-07 06:14:50 PM')

Timestamp('2021-04-07 18:14:50')

In [9]:
pd.Timestamp(dt.date(2015, 1,1))

Timestamp('2015-01-01 00:00:00')

In [10]:
pd.Timestamp(dt.datetime(2000, 2, 3, 21, 35, 59))

Timestamp('2000-02-03 21:35:59')

In [11]:
dates = ['2015-1-6','2015-5-6','2015-2-2']
dtIndex = pd.DatetimeIndex(dates)

In [12]:
values = [100, 200, 400]

In [13]:
pd.Series(data=values, index=dtIndex)

2015-01-06    100
2015-05-06    200
2015-02-02    400
dtype: int64

## 3. Create Range of Dates with pd.date_range Function

- The `date_range` function generates and returns a **DatetimeIndex** holding a sequence of dates.
- The function requires 2 of the 3 following parameters: `start`, `end`, and `periods`.
- With `start` and `end`, Pandas assumes a daily frequency unless the `freq` parameter says otherwise.
- Every element within a **DatetimeIndex** is a **Timestamp**.
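
To illustrate the "2 of 3" rule, the sketch below (variable names are illustrative) builds the same three-day index from each pairing of `start`, `end`, and `periods`, relying on the default daily frequency.

```python
import pandas as pd

# Each pair of parameters pins down the same three daily dates
by_start = pd.date_range(start="2016-01-01", periods=3)
by_end = pd.date_range(end="2016-01-03", periods=3)
by_both = pd.date_range(start="2016-01-01", end="2016-01-03")

print(by_start)
print(by_start.equals(by_end), by_end.equals(by_both))

# Every element is a Timestamp
print(type(by_both[0]).__name__)
```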


In [14]:
times = pd.date_range(start='2016', end='2018',freq='D')

In [15]:
type(times[0])

pandas._libs.tslibs.timestamps.Timestamp

In [16]:
times = pd.date_range(start='2016', end='2018', freq='W')


In [17]:
times

DatetimeIndex(['2016-01-03', '2016-01-10', '2016-01-17', '2016-01-24',
               '2016-01-31', '2016-02-07', '2016-02-14', '2016-02-21',
               '2016-02-28', '2016-03-06',
               ...
               '2017-10-29', '2017-11-05', '2017-11-12', '2017-11-19',
               '2017-11-26', '2017-12-03', '2017-12-10', '2017-12-17',
               '2017-12-24', '2017-12-31'],
              dtype='datetime64[ns]', length=105, freq='W-SUN')

In [18]:
pd.date_range(start='2016', end='2018', freq='W-FRI')

DatetimeIndex(['2016-01-01', '2016-01-08', '2016-01-15', '2016-01-22',
               '2016-01-29', '2016-02-05', '2016-02-12', '2016-02-19',
               '2016-02-26', '2016-03-04',
               ...
               '2017-10-27', '2017-11-03', '2017-11-10', '2017-11-17',
               '2017-11-24', '2017-12-01', '2017-12-08', '2017-12-15',
               '2017-12-22', '2017-12-29'],
              dtype='datetime64[ns]', length=105, freq='W-FRI')
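
A quick way to confirm what an anchored weekly alias does is to ask each generated date for its weekday name. The sketch below uses a short, illustrative range rather than the two-year span above.

```python
import pandas as pd

# 'W-FRI' means weekly, anchored on Fridays
fridays = pd.date_range(start="2016-01-01", periods=4, freq="W-FRI")

print(fridays)
print(fridays.day_name().tolist())
```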

In [19]:
pd.date_range(start='2016', end='2018', freq='5h')


DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 05:00:00',
               '2016-01-01 10:00:00', '2016-01-01 15:00:00',
               '2016-01-01 20:00:00', '2016-01-02 01:00:00',
               '2016-01-02 06:00:00', '2016-01-02 11:00:00',
               '2016-01-02 16:00:00', '2016-01-02 21:00:00',
               ...
               '2017-12-29 23:00:00', '2017-12-30 04:00:00',
               '2017-12-30 09:00:00', '2017-12-30 14:00:00',
               '2017-12-30 19:00:00', '2017-12-31 00:00:00',
               '2017-12-31 05:00:00', '2017-12-31 10:00:00',
               '2017-12-31 15:00:00', '2017-12-31 20:00:00'],
              dtype='datetime64[ns]', length=3509, freq='5h')

In [20]:
pd.date_range(start='2016', end='2018', freq='ME')


DatetimeIndex(['2016-01-31', '2016-02-29', '2016-03-31', '2016-04-30',
               '2016-05-31', '2016-06-30', '2016-07-31', '2016-08-31',
               '2016-09-30', '2016-10-31', '2016-11-30', '2016-12-31',
               '2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
               '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
               '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
              dtype='datetime64[ns]', freq='ME')

In [21]:
pd.date_range(start='2016', end='2018', freq='YE')



DatetimeIndex(['2016-12-31', '2017-12-31'], dtype='datetime64[ns]', freq='YE-DEC')

In [22]:
pd.date_range(start='2014', periods=25)

DatetimeIndex(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04',
               '2014-01-05', '2014-01-06', '2014-01-07', '2014-01-08',
               '2014-01-09', '2014-01-10', '2014-01-11', '2014-01-12',
               '2014-01-13', '2014-01-14', '2014-01-15', '2014-01-16',
               '2014-01-17', '2014-01-18', '2014-01-19', '2014-01-20',
               '2014-01-21', '2014-01-22', '2014-01-23', '2014-01-24',
               '2014-01-25'],
              dtype='datetime64[ns]', freq='D')

## 4. The dt Attribute

- The `dt` attribute reveals a **DatetimeProperties** object with attributes/methods for working with datetimes. It is similar to the `str` attribute for string methods.
- The **DatetimeProperties** object has attributes like `day`, `month`, and `year` to reveal information about each date in the **Series**.
- The `day_name` method returns the written day of the week.
- Attributes like `is_month_end` and `is_quarter_start` return Boolean **Series**.
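
The accessor is easier to follow on a handful of dates. The three dates below are chosen for illustration only; the sketch shows one integer attribute, one method, and two Boolean attributes.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2021-01-01", "2021-01-31", "2021-02-15"]))

print(dates.dt.day.tolist())               # day of the month as integers
print(dates.dt.day_name().tolist())        # weekday names as strings
print(dates.dt.is_month_end.tolist())      # True only for the last day of a month
print(dates.dt.is_quarter_start.tolist())  # True only for the first day of a quarter

# A Boolean Series can be used directly as a filter
print(dates[dates.dt.is_month_end])
```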


In [23]:
bunch_of_dates = pd.Series(pd.date_range(start='2000',end='2016', freq='2D'))

In [24]:
bunch_of_dates.head()

0   2000-01-01
1   2000-01-03
2   2000-01-05
3   2000-01-07
4   2000-01-09
dtype: datetime64[ns]

In [25]:
bunch_of_dates.dt.day

0        1
1        3
2        5
3        7
4        9
        ..
2918    24
2919    26
2920    28
2921    30
2922     1
Length: 2923, dtype: int32

In [26]:
bunch_of_dates.dt.day               # integer Series
bunch_of_dates.dt.month
bunch_of_dates.dt.year
bunch_of_dates.dt.hour
bunch_of_dates.dt.day_of_year
bunch_of_dates.dt.day_name()        # Series of strings
bunch_of_dates.dt.is_month_end      # Boolean Series
bunch_of_dates.dt.is_month_start
bunch_of_dates.dt.is_quarter_start
bunch_of_dates[bunch_of_dates.dt.is_quarter_start]  # use the Boolean Series to subset

0      2000-01-01
91     2000-07-01
137    2000-10-01
183    2001-01-01
228    2001-04-01
456    2002-07-01
502    2002-10-01
548    2003-01-01
593    2003-04-01
776    2004-04-01
1004   2005-07-01
1050   2005-10-01
1096   2006-01-01
1141   2006-04-01
1369   2007-07-01
1415   2007-10-01
1461   2008-01-01
1552   2008-07-01
1598   2008-10-01
1644   2009-01-01
1689   2009-04-01
1917   2010-07-01
1963   2010-10-01
2009   2011-01-01
2054   2011-04-01
2237   2012-04-01
2465   2013-07-01
2511   2013-10-01
2557   2014-01-01
2602   2014-04-01
2830   2015-07-01
2876   2015-10-01
2922   2016-01-01
dtype: datetime64[ns]

## 5. Selecting Rows from a DataFrame with a DatetimeIndex

- The `iloc` accessor is available for index position-based extraction.
- The `loc` accessor accepts strings or **Timestamps** to extract by index label/value. Python's `datetime.datetime` objects are also accepted.
- Use slice syntax (`start:end`) inside `loc` to extract a range of dates; both endpoints are included. The `truncate` method is an alternative.
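
Because `ibm.csv` is not reproduced here, the sketch below uses a small hand-built frame (the `prices` name and its values are made up for illustration) to show the same selection styles: by position, by string label, by **Timestamp**, by slice, and with `truncate`.

```python
import pandas as pd

# Hypothetical five-day price table standing in for the stock data
index = pd.date_range("2020-01-01", periods=5, freq="D")
prices = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]}, index=index)

by_position = prices.iloc[2]                       # third row
by_string = prices.loc["2020-01-03"]               # same row, by label
by_timestamp = prices.loc[pd.Timestamp(2020, 1, 3)]

# Label slices include both endpoints
window = prices.loc["2020-01-02":"2020-01-04"]
truncated = prices.truncate("2020-01-02", "2020-01-04")

print(by_string["Close"], by_timestamp["Close"])
print(window)
```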


In [27]:
stocks = pd.read_csv("ibm.csv", parse_dates=["Date"],index_col="Date").sort_index()
stocks.head()

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-01-02 | 129.063095 | 129.942642 | 128.843216 | 129.46463 | 105.346336 | 3293436 |
| 2020-01-03 | 127.695984 | 128.92926 | 127.686424 | 128.432129 | 104.506195 | 2482890 |
| 2020-01-06 | 127.552582 | 128.336517 | 127.342255 | 128.202682 | 104.319489 | 2537073 |
| 2020-01-07 | 127.810707 | 129.024857 | 127.533463 | 128.288712 | 104.389488 | 3232977 |
| 2020-01-08 | 128.59465 | 129.885284 | 128.030594 | 129.359467 | 105.260765 | 4545916 |


In [28]:
stocks.iloc[300]
stocks.loc["2020-01-07"]
stocks.loc[pd.Timestamp(2020, 1, 8)]
stocks.loc["2020-01-07":"2020-12-31"]
stocks.loc[pd.Timestamp(2020, 1, 7):pd.Timestamp(2020, 12, 31)]
stocks.truncate("2020-01-07", "2020-12-31")

stocks.loc["2020-01-07", "Close"]
stocks.loc["2020-01-07", "High":"Close"]

stocks.loc[pd.Timestamp(2020, 1, 7):pd.Timestamp(2020, 12, 31), "High":"Close"]

| Date | High | Low | Close |
|---|---|---|---|
| 2020-01-07 | 129.024857 | 127.533463 | 128.288712 |
| 2020-01-08 | 129.885284 | 128.030594 | 129.359467 |
| 2020-01-09 | 130.774384 | 129.359467 | 130.726578 |
| 2020-01-10 | 131.806885 | 130.315491 | 130.678772 |
| 2020-01-13 | 130.630981 | 129.130020 | 130.592728 |
| ... | ... | ... | ... |
| 2020-12-24 | 119.598473 | 118.747612 | 119.206497 |
| 2020-12-28 | 121.032501 | 118.986618 | 119.330788 |
| 2020-12-29 | 119.961761 | 117.820267 | 118.355644 |
| 2020-12-30 | 119.359467 | 118.193115 | 118.871895 |


## 6. The DateOffset Object

- A **DateOffset** object adds time to a **Timestamp** to arrive at a new **Timestamp**.
- The **DateOffset** constructor accepts `days`, `weeks`, `months`, `years` parameters, and more.
- We can pass a **DateOffset** object to the `freq` parameter of the `pd.date_range` function.
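
A single **Timestamp** makes the arithmetic easy to check by hand. The sketch below uses an illustrative starting date at the end of January to show that `DateOffset` follows the calendar: adding one month lands on the last valid day of February rather than overflowing into March.

```python
import pandas as pd

start = pd.Timestamp("2020-01-31")

plus_five_days = start + pd.DateOffset(days=5)
plus_one_month = start + pd.DateOffset(months=1)   # clipped to the end of February
minus_one_year = start - pd.DateOffset(years=1)

print(plus_five_days)
print(plus_one_month)
print(minus_one_year)
```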


In [29]:
stocks = pd.read_csv("ibm.csv", parse_dates=["Date"],index_col="Date").sort_index()
stocks.head()

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-01-02 | 129.063095 | 129.942642 | 128.843216 | 129.46463 | 105.346336 | 3293436 |
| 2020-01-03 | 127.695984 | 128.92926 | 127.686424 | 128.432129 | 104.506195 | 2482890 |
| 2020-01-06 | 127.552582 | 128.336517 | 127.342255 | 128.202682 | 104.319489 | 2537073 |
| 2020-01-07 | 127.810707 | 129.024857 | 127.533463 | 128.288712 | 104.389488 | 3232977 |
| 2020-01-08 | 128.59465 | 129.885284 | 128.030594 | 129.359467 | 105.260765 | 4545916 |


In [30]:
stocks.index

DatetimeIndex(['2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07',
               '2020-01-08', '2020-01-09', '2020-01-10', '2020-01-13',
               '2020-01-14', '2020-01-15',
               ...
               '2024-04-17', '2024-04-18', '2024-04-19', '2024-04-22',
               '2024-04-23', '2024-04-24', '2024-04-25', '2024-04-26',
               '2024-04-29', '2024-04-30'],
              dtype='datetime64[ns]', name='Date', length=1089, freq=None)

In [31]:
stocks.index + pd.DateOffset(days=5)
stocks.index - pd.DateOffset(days=5)
stocks.index + pd.DateOffset(months=3)
stocks.index + pd.DateOffset(years=1)
stocks.index + pd.DateOffset(hours=7)

stocks.index + pd.DateOffset(years=1, months=3, days=2, hours=14, minutes=23, seconds=12)

DatetimeIndex(['2021-04-04 14:23:12', '2021-04-05 14:23:12',
               '2021-04-08 14:23:12', '2021-04-09 14:23:12',
               '2021-04-10 14:23:12', '2021-04-11 14:23:12',
               '2021-04-12 14:23:12', '2021-04-15 14:23:12',
               '2021-04-16 14:23:12', '2021-04-17 14:23:12',
               ...
               '2025-07-19 14:23:12', '2025-07-20 14:23:12',
               '2025-07-21 14:23:12', '2025-07-24 14:23:12',
               '2025-07-25 14:23:12', '2025-07-26 14:23:12',
               '2025-07-27 14:23:12', '2025-07-28 14:23:12',
               '2025-07-31 14:23:12', '2025-08-01 14:23:12'],
              dtype='datetime64[ns]', name='Date', length=1089, freq=None)

In [32]:
# Find the IBM stock price on every one of my birthdays since 2020
birthdays = pd.date_range(start="2020-02-13", end="2024-02-13", freq=pd.DateOffset(years=1))
birthdays

DatetimeIndex(['2020-02-13', '2021-02-13', '2022-02-13', '2023-02-13',
               '2024-02-13'],
              dtype='datetime64[ns]', freq='<DateOffset: years=1>')

In [33]:
stocks[stocks.index.isin(birthdays)]

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-02-13 | 147.581268 | 148.021027 | 146.663483 | 147.523895 | 121.2948 | 3996452 |
| 2023-02-13 | 136.0 | 137.389999 | 135.850006 | 137.350006 | 131.261826 | 4403000 |
| 2024-02-13 | 184.279999 | 184.770004 | 182.360001 | 183.699997 | 183.699997 | 4290500 |
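
Only three of the five birthdays appear in the result. One way to see why is to look up the weekday of the dates that did not match. The sketch below is an approximation: it uses `pd.bdate_range` (Monday to Friday, ignoring market holidays) as a stand-in for the real trading calendar in `ibm.csv`, and the names `trading_days`, `matched`, and `missed` are illustrative.

```python
import pandas as pd

birthdays = pd.date_range(start="2020-02-13", end="2024-02-13",
                          freq=pd.DateOffset(years=1))

# Assumption: weekdays approximate the trading days found in the stock index
trading_days = pd.bdate_range("2020-01-01", "2024-03-01")

on_trading_day = birthdays.isin(trading_days)
matched = birthdays[on_trading_day]
missed = birthdays[~on_trading_day]

print(matched)
print(missed.day_name().tolist())  # weekdays of the birthdays with no match
```

The 2021 and 2022 birthdays fall on a weekend, which is consistent with their absence from the filtered stock data.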


## 7. Specialized Date Offsets

- Pandas nests more specialized date offsets in `pd.tseries.offsets`.
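- Adding one of these offsets rolls each date forward to the next anchor point (for example, month end, quarter end, year begin), and subtracting rolls it back, so the amount of time added differs from date to date.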
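
The sketch below applies two of these offsets to three illustrative dates and then measures how far each date moved. Note that a date already sitting on a month end moves to the following month end.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2020-01-02", "2020-01-31", "2020-02-10"])

# Roll forward to the next month end
month_ends = dates + pd.tseries.offsets.MonthEnd()

# Roll back to the previous year start
year_starts = dates - pd.tseries.offsets.YearBegin()

print(month_ends)
print(year_starts)

# The number of days added is different for each date
print((month_ends - dates).days.tolist())
```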


In [34]:
stocks = pd.read_csv("ibm.csv", parse_dates=["Date"],index_col="Date").sort_index()
stocks.head()

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-01-02 | 129.063095 | 129.942642 | 128.843216 | 129.46463 | 105.346336 | 3293436 |
| 2020-01-03 | 127.695984 | 128.92926 | 127.686424 | 128.432129 | 104.506195 | 2482890 |
| 2020-01-06 | 127.552582 | 128.336517 | 127.342255 | 128.202682 | 104.319489 | 2537073 |
| 2020-01-07 | 127.810707 | 129.024857 | 127.533463 | 128.288712 | 104.389488 | 3232977 |
| 2020-01-08 | 128.59465 | 129.885284 | 128.030594 | 129.359467 | 105.260765 | 4545916 |


In [35]:
stocks.index + pd.tseries.offsets.MonthEnd()
stocks.index - pd.tseries.offsets.MonthEnd()

stocks.index + pd.tseries.offsets.QuarterEnd()
stocks.index - pd.tseries.offsets.QuarterEnd()

stocks.index + pd.tseries.offsets.QuarterBegin(startingMonth=1)
stocks.index - pd.tseries.offsets.QuarterBegin(startingMonth=1)

stocks.index + pd.tseries.offsets.YearEnd()
stocks.index - pd.tseries.offsets.YearBegin()


DatetimeIndex(['2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01',
               '2020-01-01', '2020-01-01', '2020-01-01', '2020-01-01',
               '2020-01-01', '2020-01-01',
               ...
               '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',
               '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',
               '2024-01-01', '2024-01-01'],
              dtype='datetime64[ns]', name='Date', length=1089, freq=None)

## 8. Timedeltas

- A **Timedelta** is a pandas object that represents a duration (an amount of time).
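- Subtracting two **Timestamp** objects yields a **Timedelta** object. The same applies element-wise when subtracting one datetime **Series** from another, which produces a **Series** of timedeltas.
- The **Timedelta** constructor accepts keyword arguments for units of time (such as `days`, `weeks`, and `hours`) as well as string descriptions (such as `"6 hours 12 minutes"`).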
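
As a small sketch of both directions (the `ordered` and `delivered` timestamps are made up for illustration), the snippet below derives a **Timedelta** by subtraction, reads its components, and adds a **Timedelta** back to a **Timestamp**.

```python
import pandas as pd

ordered = pd.Timestamp("2021-03-01 09:00")
delivered = pd.Timestamp("2021-03-04 15:30")

# Subtraction gives a duration
elapsed = delivered - ordered
print(elapsed)
print(elapsed.days, elapsed.total_seconds())

# Adding a duration gives a new Timestamp
follow_up = ordered + pd.Timedelta(days=2, hours=3)
print(follow_up)

# Keyword and string forms describe the same duration
print(pd.Timedelta("6 hours 12 minutes") == pd.Timedelta(hours=6, minutes=12))
```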


In [36]:
timeA = pd.Timestamp('2010-01-31')
timeB = pd.Timestamp('2016-01-31')
timeA - timeB

Timedelta('-2191 days +00:00:00')

In [37]:
type(timeA - timeB)

pandas._libs.tslibs.timedeltas.Timedelta

In [38]:
pd.Timedelta(days = 4, weeks = 8, hours = 12)

Timedelta('60 days 12:00:00')

In [39]:
pd.Timedelta("6 hours 12 minutes")

Timedelta('0 days 06:12:00')

In [40]:
shipping = pd.read_csv('ecommerce.csv', index_col = 'ID', parse_dates = ['order_date', 'delivery_date'])
shipping.head(3)


| ID | order_date | delivery_date |
|---|---|---|
| 1 | 1998-05-24 | 1999-02-05 |
| 2 | 1992-04-22 | 1998-03-06 |
| 4 | 1991-02-10 | 1992-08-26 |


In [41]:
shipping['Delivery Time'] = shipping['delivery_date'] - shipping['order_date']

In [42]:
shipping.head(3)

| ID | order_date | delivery_date | Delivery Time |
|---|---|---|---|
| 1 | 1998-05-24 | 1999-02-05 | 257 days |
| 2 | 1992-04-22 | 1998-03-06 | 2144 days |
| 4 | 1991-02-10 | 1992-08-26 | 563 days |


In [43]:
shipping.dtypes

order_date        datetime64[ns]
delivery_date     datetime64[ns]
Delivery Time    timedelta64[ns]
dtype: object
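
The same steps can be reproduced without `ecommerce.csv`. The sketch below rebuilds only the first three rows shown above by hand (the `orders` frame and the `delivery_time` column name are illustrative), then shows two common follow-ups: reading the duration as whole days with `dt.days`, and filtering against an explicit **Timedelta**.

```python
import pandas as pd

# Hand-built stand-in for the first three rows of the shipping data
orders = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["1998-05-24", "1992-04-22", "1991-02-10"]),
        "delivery_date": pd.to_datetime(["1999-02-05", "1998-03-06", "1992-08-26"]),
    },
    index=pd.Index([1, 2, 4], name="ID"),
)

# Element-wise subtraction produces a timedelta64 column
orders["delivery_time"] = orders["delivery_date"] - orders["order_date"]

# Whole days as integers
print(orders["delivery_time"].dt.days.tolist())

# Rows that took longer than a year
slow = orders[orders["delivery_time"] > pd.Timedelta(days=365)]
print(slow.index.tolist())
```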

In [44]:
mask = shipping['Delivery Time'] > '365 Days'
shipping[mask]

| ID | order_date | delivery_date | Delivery Time |
|---|---|---|---|
| 2 | 1992-04-22 | 1998-03-06 | 2144 days |
| 4 | 1991-02-10 | 1992-08-26 | 563 days |
| 5 | 1992-07-21 | 1997-11-20 | 1948 days |
| 7 | 1993-09-02 | 1998-06-10 | 1742 days |
| 9 | 1990-01-25 | 1994-10-02 | 1711 days |
| ... | ... | ... | ... |
| 986 | 1990-12-10 | 1992-12-16 | 737 days |
| 990 | 1991-06-24 | 1996-02-02 | 1684 days |
| 991 | 1991-09-09 | 1998-03-30 | 2394 days |
| 993 | 1990-11-16 | 1998-04-27 | 2719 days |
