# Working with Dates and Times

In [2]:
import pandas as pd
import datetime as dt

## Review of Python's datetime Module
- The `datetime` module is part of Python's standard library, so it needs no separate installation.
- The common alias for the `datetime` module is `dt`.
- A module is a Python source file; think of it as an internal library that Python loads on demand.
- The `datetime` module includes `date` and `datetime` classes for representing dates and datetimes.
- The `date` constructor accepts arguments for year, month, and day. All three are required.
- The `datetime` constructor accepts arguments for year, month, day, hour, minute, and second. Year, month, and day are required; Python defaults to 0 for any missing time values.

In [3]:
today = dt.date(2024, 5, 25)

In [4]:
today.year

2024
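As a small sketch of the constructor rules above, the cell below builds one `date` and two `datetime` objects. The variable names and dates are made up for illustration.

```python
import datetime as dt

# A date needs all three parts: year, month, and day.
birthday = dt.date(1991, 4, 12)

# A datetime needs year, month, and day; omitted time parts default to 0.
midnight = dt.datetime(2024, 5, 25)
evening = dt.datetime(2024, 5, 25, 18, 30, 15)

print(birthday.month)                                   # 4
print(midnight.hour, midnight.minute, midnight.second)  # 0 0 0
print(evening.hour, evening.minute, evening.second)     # 18 30 15
```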

## The Timestamp and DatetimeIndex Objects

- Pandas ships with several classes related to datetimes.
- The **Timestamp** is similar to Python's **datetime** object (but with expanded functionality).
- A **DatetimeIndex** is an index of **Timestamp** objects.
- The **Timestamp** constructor accepts a string, a **datetime** object, or the same arguments as the **datetime** class.
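A minimal sketch of the three ways to build a **Timestamp** described above, plus a **DatetimeIndex** built from a list of strings. The dates and variable names are illustrative only.

```python
import datetime as dt
import pandas as pd

# Three equivalent ways to construct the same Timestamp.
from_string = pd.Timestamp("2024-05-25 18:30:00")
from_datetime = pd.Timestamp(dt.datetime(2024, 5, 25, 18, 30))
from_parts = pd.Timestamp(2024, 5, 25, 18, 30)

print(from_string == from_datetime == from_parts)  # True

# A DatetimeIndex stores Timestamp objects.
index = pd.DatetimeIndex(["2024-05-25", "2024-05-26", "2024-05-27"])
print(type(index[0]))
```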

## Create Range of Dates with pd.date_range Function
- The `date_range` function generates and returns a **DatetimeIndex** holding a sequence of dates.
- The function requires 2 of the 3 following parameters: `start`, `end`, and `periods`.
- With only `start` and `end`, pandas assumes a daily frequency. The `freq` parameter changes the interval.
- Every element within a **DatetimeIndex** is a **Timestamp**.
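The parameter combinations above might be used as in the following sketch. The dates are arbitrary, and `"7D"` is just one example of a `freq` value.

```python
import pandas as pd

# start + end: one date per day, both endpoints included.
daily = pd.date_range(start="2024-01-01", end="2024-01-10")

# start + periods: five consecutive days beginning at start.
five_days = pd.date_range(start="2024-01-01", periods=5)

# end + periods with a custom frequency: four dates, 7 days apart, ending at end.
every_7_days = pd.date_range(end="2024-01-29", periods=4, freq="7D")

print(len(daily))       # 10
print(five_days[-1])    # 2024-01-05 00:00:00
print(every_7_days[0])  # 2024-01-08 00:00:00
```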

## The dt Attribute
- The `dt` attribute reveals a `DatetimeProperties` object with attributes/methods for working with datetimes. It is similar to the `str` attribute for string methods.
- The `DatetimeProperties` object has attributes like `day`, `month`, and `year` to reveal information about each date in the **Series**.
- The `day_name` method returns the written day of the week.
- Attributes like `is_month_end` and `is_quarter_start` return Boolean **Series**.
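A short sketch of the `dt` accessor on a small, made-up **Series** of dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2024-01-31", "2024-04-01", "2024-05-25"]))

years = dates.dt.year                  # 2024, 2024, 2024
day_names = dates.dt.day_name()        # Wednesday, Monday, Saturday
month_ends = dates.dt.is_month_end     # True, False, False
quarter_starts = dates.dt.is_quarter_start  # False, True, False

print(day_names)
print(month_ends)
```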

## Selecting Rows from a DataFrame with a DatetimeIndex
- The `iloc` accessor is available for index position-based extraction.
- The `loc` accessor accepts strings, **Timestamps**, or Python `datetime` objects to extract by index label/value.
- Use slice syntax with `loc` to extract a range of dates; both endpoints are included. The `truncate` method is an alternative.
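The selection options can be tried on a tiny stand-in frame before running them on real data. The `prices` frame below is hypothetical and is not the IBM dataset.

```python
import datetime as dt
import pandas as pd

# Hypothetical five-day price table with a DatetimeIndex.
prices = pd.DataFrame(
    {"Close": [10.0, 10.5, 11.0, 10.8, 11.2]},
    index=pd.date_range("2024-01-01", periods=5, name="Date"),
)

by_position = prices.iloc[2]                         # third row
by_string = prices.loc["2024-01-03"]                 # same row, by label
by_timestamp = prices.loc[pd.Timestamp("2024-01-03")]
by_datetime = prices.loc[dt.datetime(2024, 1, 3)]

# Label slices on a DatetimeIndex include both endpoints.
window = prices.loc["2024-01-02":"2024-01-04"]
same_window = prices.truncate("2024-01-02", "2024-01-04")

print(len(window))  # 3
```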

In [8]:
ibm = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()

In [10]:
ibm.head()

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679
1962-01-05,4.97389,4.97389,4.87511,4.88166,559321.480565
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393


In [12]:
ibm.iloc[32]
ibm.loc["1962-01-08"]

Open           4.881660
High           4.881660
Low            4.750590
Close          4.789720
Volume    833273.771393
Name: 1962-01-08 00:00:00, dtype: float64

In [13]:
ibm.loc["1962-01-08":"1967-01-08"]

Date,Open,High,Low,Close,Volume
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393
1962-01-09,4.81618,4.90820,4.81618,4.84867,753373.923140
1962-01-10,4.85531,4.88166,4.85531,4.85531,456586.809823
1962-01-11,4.86878,4.90820,4.86878,4.90820,490831.358513
1962-01-12,4.92119,4.95417,4.92119,4.92119,673473.050214
...,...,...,...,...,...
1966-12-30,6.03832,6.09726,6.03832,6.07775,334831.146942
1967-01-03,6.07775,6.13015,6.03159,6.03159,347002.204672
1967-01-04,6.03159,6.04496,5.92638,5.96561,773158.296153
1967-01-05,5.96561,6.02495,5.96561,6.02495,474856.717158


In [14]:
ibm.truncate("1962-01-08", "1967-01-08") #same as slice

Date,Open,High,Low,Close,Volume
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393
1962-01-09,4.81618,4.90820,4.81618,4.84867,753373.923140
1962-01-10,4.85531,4.88166,4.85531,4.85531,456586.809823
1962-01-11,4.86878,4.90820,4.86878,4.90820,490831.358513
1962-01-12,4.92119,4.95417,4.92119,4.92119,673473.050214
...,...,...,...,...,...
1966-12-30,6.03832,6.09726,6.03832,6.07775,334831.146942
1967-01-03,6.07775,6.13015,6.03159,6.03159,347002.204672
1967-01-04,6.03159,6.04496,5.92638,5.96561,773158.296153
1967-01-05,5.96561,6.02495,5.96561,6.02495,474856.717158


In [20]:
ibm.loc["1966-06-23", ["High", "Low", "Close"]] # the first argument selects the row; the second selects the columns

High     5.93321
Low      5.82157
Close    5.82157
Name: 1966-06-23 00:00:00, dtype: float64

In [21]:
ibm = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()
ibm.head()

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679
1962-01-05,4.97389,4.97389,4.87511,4.88166,559321.480565
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393


In [25]:
ibm.index + pd.DateOffset(days=5) # adds five days to each date; use - to subtract instead

DatetimeIndex(['1962-01-07', '1962-01-08', '1962-01-09', '1962-01-10',
               '1962-01-13', '1962-01-14', '1962-01-15', '1962-01-16',
               '1962-01-17', '1962-01-20',
               ...
               '2023-10-03', '2023-10-04', '2023-10-07', '2023-10-08',
               '2023-10-09', '2023-10-10', '2023-10-11', '2023-10-14',
               '2023-10-15', '2023-10-16'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [26]:
ibm.index + pd.DateOffset(years=1, months=2, days=3, hours=4, minutes=5, seconds=6) # DateOffset accepts years, months, weeks, days, hours, minutes, and seconds

DatetimeIndex(['1963-03-05 04:05:06', '1963-03-06 04:05:06',
               '1963-03-07 04:05:06', '1963-03-08 04:05:06',
               '1963-03-11 04:05:06', '1963-03-12 04:05:06',
               '1963-03-13 04:05:06', '1963-03-14 04:05:06',
               '1963-03-15 04:05:06', '1963-03-18 04:05:06',
               ...
               '2024-12-01 04:05:06', '2024-12-02 04:05:06',
               '2024-12-05 04:05:06', '2024-12-06 04:05:06',
               '2024-12-07 04:05:06', '2024-12-08 04:05:06',
               '2024-12-09 04:05:06', '2024-12-12 04:05:06',
               '2024-12-13 04:05:06', '2024-12-14 04:05:06'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [28]:
# build the yearly June 23 dates on which to look up the IBM stock price
bday = pd.date_range("1966-06-23", end="2023-06-23", freq=pd.DateOffset(years=1))
bday

DatetimeIndex(['1966-06-23', '1967-06-23', '1968-06-23', '1969-06-23',
               '1970-06-23', '1971-06-23', '1972-06-23', '1973-06-23',
               '1974-06-23', '1975-06-23', '1976-06-23', '1977-06-23',
               '1978-06-23', '1979-06-23', '1980-06-23', '1981-06-23',
               '1982-06-23', '1983-06-23', '1984-06-23', '1985-06-23',
               '1986-06-23', '1987-06-23', '1988-06-23', '1989-06-23',
               '1990-06-23', '1991-06-23', '1992-06-23', '1993-06-23',
               '1994-06-23', '1995-06-23', '1996-06-23', '1997-06-23',
               '1998-06-23', '1999-06-23', '2000-06-23', '2001-06-23',
               '2002-06-23', '2003-06-23', '2004-06-23', '2005-06-23',
               '2006-06-23', '2007-06-23', '2008-06-23', '2009-06-23',
               '2010-06-23', '2011-06-23', '2012-06-23', '2013-06-23',
               '2014-06-23', '2015-06-23', '2016-06-23', '2017-06-23',
               '2018-06-23', '2019-06-23', '2020-06-23', '2021-06-23',
      

In [55]:
ibm = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()
bday = pd.date_range(start="1966-06-23", end="2023-06-23", freq="QS") # quarter-start dates that fall within the range
print(bday)

DatetimeIndex(['1966-07-01', '1966-10-01', '1967-01-01', '1967-04-01',
               '1967-07-01', '1967-10-01', '1968-01-01', '1968-04-01',
               '1968-07-01', '1968-10-01',
               ...
               '2021-01-01', '2021-04-01', '2021-07-01', '2021-10-01',
               '2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01',
               '2023-01-01', '2023-04-01'],
              dtype='datetime64[ns]', length=228, freq='QS-JAN')
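The cells above build the target dates but do not show the lookup itself. One possible way to finish the job, sketched here on a hypothetical `prices` frame rather than the IBM data, is to keep only the rows whose index value appears in the generated dates. Dates that fall on non-trading days simply produce no match.

```python
import pandas as pd

# Hypothetical prices on ten business days starting Monday 2024-06-17.
prices = pd.DataFrame(
    {"Close": range(10)},
    index=pd.date_range("2024-06-17", periods=10, freq="B", name="Date"),
)

# Target dates: Saturday, Sunday, and Monday.
targets = pd.date_range("2024-06-22", periods=3, freq="D")

# isin returns a Boolean array marking the index values found in targets.
matches = prices[prices.index.isin(targets)]
print(matches)  # only the Monday 2024-06-24 row remains
```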


## Specialized Date Offsets
- Pandas nests more specialized date offsets in `pd.tseries.offsets`.
- Adding one of these offsets moves each date by a different amount of time (for example, to the next month end, quarter end, or year start).

In [56]:
ibm = pd.read_csv("ibm.csv", parse_dates=["Date"], index_col="Date").sort_index()
ibm.head()

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679
1962-01-05,4.97389,4.97389,4.87511,4.88166,559321.480565
1962-01-08,4.88166,4.88166,4.75059,4.78972,833273.771393


In [57]:
ibm.index + pd.tseries.offsets.MonthEnd() # rolls each date forward to the next month end

DatetimeIndex(['1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31',
               ...
               '2023-09-30', '2023-09-30', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [59]:
ibmquarter = ibm.index + pd.tseries.offsets.QuarterEnd() # rolls each date forward to the next quarter end

In [60]:
ibmquarter

DatetimeIndex(['1962-03-31', '1962-03-31', '1962-03-31', '1962-03-31',
               '1962-03-31', '1962-03-31', '1962-03-31', '1962-03-31',
               '1962-03-31', '1962-03-31',
               ...
               '2023-09-30', '2023-09-30', '2023-12-31', '2023-12-31',
               '2023-12-31', '2023-12-31', '2023-12-31', '2023-12-31',
               '2023-12-31', '2023-12-31'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)
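One detail worth checking on single dates: adding an offset always moves forward, so a date that already sits on a month end jumps to the following month end. The sketch below uses arbitrary dates to show this, along with `rollforward`, which leaves such dates in place.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd, QuarterEnd, YearBegin

mid_month = pd.Timestamp("2024-01-15")
on_month_end = pd.Timestamp("2024-01-31")

next_end = mid_month + MonthEnd()                # 2024-01-31
skipped = on_month_end + MonthEnd()              # 2024-02-29, a full month later
stayed = MonthEnd().rollforward(on_month_end)    # 2024-01-31, unchanged

quarter_end = mid_month + QuarterEnd()           # 2024-03-31
year_begin = mid_month + YearBegin()             # 2025-01-01
previous_end = mid_month - MonthEnd()            # 2023-12-31

print(next_end, skipped, stayed)
```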

## Timedeltas
- A **Timedelta** is a pandas object that represents a duration (an amount of time).
- Subtracting one **Timestamp** from another yields a **Timedelta**; the same applies when subtracting one datetime **Series** from another.
- The **Timedelta** constructor accepts keyword arguments for units of time (such as `days`, `hours`, and `minutes`) as well as string descriptions.

In [62]:
# find the number of days in a date range
pd.Timestamp("2023-06-23") - pd.Timestamp("2023-01-01")

Timedelta('173 days 00:00:00')

In [63]:
pd.Timedelta(days=3, hours=2, minutes=5)

Timedelta('3 days 02:05:00')
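The string form of the constructor and a few common operations can be sketched as follows. The values are illustrative.

```python
import pandas as pd

# The same duration, built from a string and from keyword arguments.
from_string = pd.Timedelta("3 days 02:05:00")
from_kwargs = pd.Timedelta(days=3, hours=2, minutes=5)
print(from_string == from_kwargs)  # True

# A Timedelta can be added to a Timestamp to get a new Timestamp.
deadline = pd.Timestamp("2023-06-23") + from_kwargs
print(deadline)  # 2023-06-26 02:05:00

# Whole days and the full duration in seconds.
whole_days = from_kwargs.days                # 3
total_seconds = from_kwargs.total_seconds()  # 266700.0
```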

In [70]:
ecommerce = pd.read_csv("ecommerce.csv", index_col="ID", parse_dates=["order_date", "delivery_date"], date_format="%m/%d/%y")
ecommerce.head()

ID,order_date,delivery_date
1,1998-05-24,1999-02-05
2,1992-04-22,1998-03-06
4,1991-02-10,1992-08-26
5,1992-07-21,1997-11-20
7,1993-09-02,1998-06-10


In [76]:
ecommerce["delivery days"] = ecommerce["delivery_date"] - ecommerce["order_date"]
ecommerce.head()

ID,order_date,delivery_date,days,delivery days
1,1998-05-24,1999-02-05,257 days,257 days
2,1992-04-22,1998-03-06,2144 days,2144 days
4,1991-02-10,1992-08-26,563 days,563 days
5,1992-07-21,1997-11-20,1948 days,1948 days
7,1993-09-02,1998-06-10,1742 days,1742 days


In [86]:
ecommerce["delivery days"].max()
ecommerce["delivery days"].min()
ecommerce["delivery days"].std()
avg_delivery_days = ecommerce["delivery days"].mean()
avg_delivery_days

Timedelta('1217 days 22:53:53.532934128')
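A column of **Timedelta** values also supports the `dt` accessor and comparisons against a **Timedelta**. The sketch below uses a small hypothetical `orders` frame, not the `ecommerce.csv` data, to pull out whole days and filter slow deliveries.

```python
import pandas as pd

# Hypothetical orders with made-up dates.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-02-01"]),
    "delivery_date": pd.to_datetime(["2024-01-04", "2024-01-20", "2024-02-03"]),
})

# Subtracting two datetime columns gives a Timedelta column.
orders["delivery_time"] = orders["delivery_date"] - orders["order_date"]

# dt.days converts each Timedelta to an integer number of days.
orders["delivery_days"] = orders["delivery_time"].dt.days  # 3, 10, 2

# Keep only deliveries that took longer than a week.
slow = orders[orders["delivery_time"] > pd.Timedelta(days=7)]

average = orders["delivery_time"].mean()
print(average)  # 5 days 00:00:00
```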