# Working with Dates and Times

In [71]:
import pandas as pd
import datetime as dt

## Review of Python's datetime Module
- The `datetime` module is built into the core Python programming language.
- The common alias for the `datetime` module is `dt`.
- A module is a Python source file; think of it as an internal library that Python loads on demand.
- The `datetime` module includes `date` and `datetime` classes for representing dates and datetimes.
- The `date` constructor requires arguments for year, month, and day.
- The `datetime` constructor accepts arguments for year, month, day, hour, minute, and second. Python defaults to 0 for any missing time values (hour, minute, second).
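
As a quick illustration of the two constructors described above, here is a minimal sketch using only the standard library. The variable names `release_day` and `kickoff` are made up for this example.

```python
import datetime as dt

# date requires year, month, and day
release_day = dt.date(2025, 5, 5)

# datetime also requires year, month, and day; the time parts are optional
kickoff = dt.datetime(2025, 5, 5, 15, 30)

print(release_day)
print(kickoff)
print(kickoff.second)  # seconds were not supplied, so they default to 0

# Leaving out a required date component raises a TypeError
try:
    dt.date(2025, 5)
except TypeError as err:
    print("date needs all three arguments:", err)
```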

In [72]:
pd.Timestamp(2025,5,5)
pd.Timestamp(2025,5,5,15,30,00)

Timestamp('2025-05-05 15:30:00')

In [73]:
# We can also pass a python datetime object
pd.Timestamp(dt.date(2025,5,5))
pd.Timestamp(dt.datetime(2025,5,5,14,30))

Timestamp('2025-05-05 14:30:00')

In [74]:
pd.Timestamp("2025-05-05")
pd.Timestamp("2025/05/05")
pd.Timestamp("2025-05-05 15:30:20")

Timestamp('2025-05-05 15:30:20')

##### Loading a `Timestamp` object into a Series

In [75]:
pd.Series(pd.Timestamp("2025-05-05 15:30:20"))

0   2025-05-05 15:30:20
dtype: datetime64[ns]

## The Timestamp and DatetimeIndex Objects

- Pandas ships with several classes related to datetimes.
- The **Timestamp** is similar to Python's **datetime** object (but with expanded functionality).
- A **DatetimeIndex** is an index of **Timestamp** objects.
- The **Timestamp** constructor accepts a string, a **datetime** object, or equivalent arguments to the **datetime** class.
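
The sketch below checks the points in this list: the three ways of building a **Timestamp** give the same value, a **Timestamp** behaves like a Python `datetime`, and a **DatetimeIndex** hands back **Timestamp** elements. The variable names are illustrative only.

```python
import datetime as dt
import pandas as pd

# Three equivalent ways to build the same Timestamp
from_string = pd.Timestamp("2025-05-05 15:30:00")
from_datetime = pd.Timestamp(dt.datetime(2025, 5, 5, 15, 30))
from_parts = pd.Timestamp(2025, 5, 5, 15, 30)

same = from_string == from_datetime == from_parts
print(same)

# Timestamp is a subclass of datetime with extra conveniences
print(isinstance(from_string, dt.datetime))
print(from_string.day_name(), from_string.quarter)

# Elements pulled out of a DatetimeIndex are Timestamps
idx = pd.DatetimeIndex(["2025-05-05", "2025-03-20"])
print(type(idx[0]).__name__)
```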

In [76]:
# When a timestamp is used as an index to a dataframe
pd.DatetimeIndex(["2025-05-05","2025-03-20","2025-01-10"])

DatetimeIndex(['2025-05-05', '2025-03-20', '2025-01-10'], dtype='datetime64[ns]', freq=None)

In [77]:
dt_index = pd.DatetimeIndex([dt.date(2025,3,20),
                  "2025-01-10"])
dt_index[0]
dt_index[1]

Timestamp('2025-01-10 00:00:00')

In [78]:
type(dt_index)

pandas.core.indexes.datetimes.DatetimeIndex

## Create Range of Dates with pd.date_range Function
- The `date_range` function generates and returns a **DatetimeIndex** holding a sequence of dates.
- The function requires 2 of the 3 following parameters: `start`, `end`, and `periods`.
- With only `start` and `end`, pandas assumes a daily frequency (`freq="D"`).
- Every element within a **DatetimeIndex** is a **Timestamp**.
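
To make the "2 of 3" rule concrete, the following sketch builds the same ten-day range in three ways and relies on the default daily frequency. It is only an illustration; the variable names are not part of the original notebook.

```python
import pandas as pd

# start + end: frequency defaults to daily
by_bounds = pd.date_range(start="2025-05-01", end="2025-05-10")

# start + periods: count forwards from the start date
by_count = pd.date_range(start="2025-05-01", periods=10)

# end + periods: count backwards from the end date
backwards = pd.date_range(end="2025-05-10", periods=10)

print(by_bounds.freqstr)
print(by_bounds.equals(by_count), by_bounds.equals(backwards))

# Every element of the resulting DatetimeIndex is a Timestamp
print(all(isinstance(ts, pd.Timestamp) for ts in by_bounds))
```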

In [79]:
# Generate a simple date range spanning 1 May to 10 May
pd.date_range(start="2025-05-01",end="2025-05-10",freq="D")
# Generate a date range from 1 May to 10 May containing every second day
pd.date_range(start="2025-05-01",end="2025-05-10",freq="2D")

DatetimeIndex(['2025-05-01', '2025-05-03', '2025-05-05', '2025-05-07',
               '2025-05-09'],
              dtype='datetime64[ns]', freq='2D')

In [80]:
# Generate a date range containing only business days (weekdays) from 1 May to 10 May
pd.date_range(start="2025-05-01",end="2025-05-10",freq="B")
# The skipped days are the weekends

DatetimeIndex(['2025-05-01', '2025-05-02', '2025-05-05', '2025-05-06',
               '2025-05-07', '2025-05-08', '2025-05-09'],
              dtype='datetime64[ns]', freq='B')

In [81]:
# Generate a date range with weekly frequency; weeks are anchored on Sunday by default, but the anchor day can be changed
pd.date_range(start="2025-05-01",end="2025-05-30",freq="W")

DatetimeIndex(['2025-05-04', '2025-05-11', '2025-05-18', '2025-05-25'], dtype='datetime64[ns]', freq='W-SUN')

In [82]:
pd.date_range(start="2025-05-01",end="2025-05-30",freq="W-FRI")

DatetimeIndex(['2025-05-02', '2025-05-09', '2025-05-16', '2025-05-23',
               '2025-05-30'],
              dtype='datetime64[ns]', freq='W-FRI')

In [83]:
# Generate a date range with hourly frequency, the finer granularity often required for time-series analysis

pd.date_range(start="2025-05-01",end="2025-05-10",freq="h")

pd.date_range(start="2025-05-01",end="2025-05-10",freq="4h")

DatetimeIndex(['2025-05-01 00:00:00', '2025-05-01 04:00:00',
               '2025-05-01 08:00:00', '2025-05-01 12:00:00',
               '2025-05-01 16:00:00', '2025-05-01 20:00:00',
               '2025-05-02 00:00:00', '2025-05-02 04:00:00',
               '2025-05-02 08:00:00', '2025-05-02 12:00:00',
               '2025-05-02 16:00:00', '2025-05-02 20:00:00',
               '2025-05-03 00:00:00', '2025-05-03 04:00:00',
               '2025-05-03 08:00:00', '2025-05-03 12:00:00',
               '2025-05-03 16:00:00', '2025-05-03 20:00:00',
               '2025-05-04 00:00:00', '2025-05-04 04:00:00',
               '2025-05-04 08:00:00', '2025-05-04 12:00:00',
               '2025-05-04 16:00:00', '2025-05-04 20:00:00',
               '2025-05-05 00:00:00', '2025-05-05 04:00:00',
               '2025-05-05 08:00:00', '2025-05-05 12:00:00',
               '2025-05-05 16:00:00', '2025-05-05 20:00:00',
               '2025-05-06 00:00:00', '2025-05-06 04:00:00',
               '2025-05-

In [84]:
# Generate a date range with monthly frequency, anchored on month starts
pd.date_range(start="2025-01-01",end="2025-12-31",freq="MS")

DatetimeIndex(['2025-01-01', '2025-02-01', '2025-03-01', '2025-04-01',
               '2025-05-01', '2025-06-01', '2025-07-01', '2025-08-01',
               '2025-09-01', '2025-10-01', '2025-11-01', '2025-12-01'],
              dtype='datetime64[ns]', freq='MS')

In [85]:
# Generate a date range with monthly frequency, anchored on month ends
pd.date_range(start="2025-01-01",end="2025-12-31",freq="ME")


DatetimeIndex(['2025-01-31', '2025-02-28', '2025-03-31', '2025-04-30',
               '2025-05-31', '2025-06-30', '2025-07-31', '2025-08-31',
               '2025-09-30', '2025-10-31', '2025-11-30', '2025-12-31'],
              dtype='datetime64[ns]', freq='ME')

In [86]:
# another way of generating the datespan -- using periods parameter
pd.date_range(start="2025-01-01",periods=25,freq="D")

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
               '2025-01-05', '2025-01-06', '2025-01-07', '2025-01-08',
               '2025-01-09', '2025-01-10', '2025-01-11', '2025-01-12',
               '2025-01-13', '2025-01-14', '2025-01-15', '2025-01-16',
               '2025-01-17', '2025-01-18', '2025-01-19', '2025-01-20',
               '2025-01-21', '2025-01-22', '2025-01-23', '2025-01-24',
               '2025-01-25'],
              dtype='datetime64[ns]', freq='D')

In [87]:
# The periods param can also be used backwards
pd.date_range(end="2025-01-01",periods=25,freq="D")

DatetimeIndex(['2024-12-08', '2024-12-09', '2024-12-10', '2024-12-11',
               '2024-12-12', '2024-12-13', '2024-12-14', '2024-12-15',
               '2024-12-16', '2024-12-17', '2024-12-18', '2024-12-19',
               '2024-12-20', '2024-12-21', '2024-12-22', '2024-12-23',
               '2024-12-24', '2024-12-25', '2024-12-26', '2024-12-27',
               '2024-12-28', '2024-12-29', '2024-12-30', '2024-12-31',
               '2025-01-01'],
              dtype='datetime64[ns]', freq='D')

## The dt Attribute
- The `dt` attribute reveals a `DatetimeProperties` object with attributes/methods for working with datetimes. It is similar to the `str` attribute for string methods.
- The `DatetimeProperties` object has attributes like `day`, `month`, and `year` to reveal information about each date in the **Series**.
- The `day_name` method returns the name of the day of the week (for example, "Monday").
- Attributes like `is_month_end` and `is_quarter_start` return Boolean **Series**.
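
The attributes and methods listed above can be combined into a small summary table or used as filters. The sketch below uses a made-up `visits` Series purely for illustration.

```python
import pandas as pd

# Hypothetical dates chosen to include a month end and a quarter start
visits = pd.Series(
    pd.to_datetime(["2025-03-31", "2025-04-01", "2025-05-05", "2025-05-12"])
)

# Pull several components out at once through the dt accessor
parts = pd.DataFrame(
    {
        "date": visits,
        "year": visits.dt.year,
        "month": visits.dt.month,
        "weekday": visits.dt.day_name(),
        "month_end": visits.dt.is_month_end,
        "quarter_start": visits.dt.is_quarter_start,
    }
)
print(parts)

# The Boolean and string results work directly as filters
mondays = visits[visits.dt.day_name() == "Monday"]
print(mondays)
```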

In [88]:
sample = pd.Series(pd.date_range(start="2025-01-01", end="2025-05-30",freq="10D 4h"))
sample

0    2025-01-01 00:00:00
1    2025-01-11 04:00:00
2    2025-01-21 08:00:00
3    2025-01-31 12:00:00
4    2025-02-10 16:00:00
5    2025-02-20 20:00:00
6    2025-03-03 00:00:00
7    2025-03-13 04:00:00
8    2025-03-23 08:00:00
9    2025-04-02 12:00:00
10   2025-04-12 16:00:00
11   2025-04-22 20:00:00
12   2025-05-03 00:00:00
13   2025-05-13 04:00:00
14   2025-05-23 08:00:00
dtype: datetime64[ns]

##### We can now access various components of the date-time object. This can be useful in filtering records or when applying functions

In [89]:
sample.dt.day
sample.dt.month
sample.dt.day_name()
sample.dt.month_name()

0      January
1      January
2      January
3      January
4     February
5     February
6        March
7        March
8        March
9        April
10       April
11       April
12         May
13         May
14         May
dtype: object

In [90]:
sample.dt.is_month_end
sample.dt.is_month_start


0      True
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
dtype: bool

In [91]:
sample[sample.dt.is_month_start]

0   2025-01-01
dtype: datetime64[ns]

In [92]:
sample[sample.dt.is_quarter_start]

0   2025-01-01
dtype: datetime64[ns]

## Selecting Rows from a DataFrame with a DatetimeIndex
- The `iloc` accessor is available for index position-based extraction.
- The `loc` accessor accepts strings or **Timestamps** to extract by index label/value. Python's `datetime.datetime` objects are generally accepted as well, but plain `datetime.date` objects may not match, so strings or **Timestamps** are the safer choice.
- Use slice syntax to extract a sequence of dates. The `truncate` method is an alternative.
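
Because `ibm.csv` may not be at hand, here is a self-contained sketch of the same selection techniques on a synthetic frame. The `prices` DataFrame and its `Close` values are invented for illustration. The last line uses partial-string indexing (a year-month string selecting a whole month), which is a pandas feature not covered elsewhere in this notebook.

```python
import pandas as pd

# Synthetic daily data standing in for a real price history
idx = pd.date_range(start="2025-01-01", periods=60, freq="D")
prices = pd.DataFrame({"Close": range(60)}, index=idx)

# Single row by label, using a Timestamp
one_day = prices.loc[pd.Timestamp(2025, 1, 10)]

# Label slices on a DatetimeIndex include both endpoints
window = prices.loc["2025-01-10":"2025-01-12"]

# truncate keeps rows between `before` and `after` (index must be sorted)
trimmed = prices.truncate(before="2025-01-10", after="2025-01-12")

# Partial-string indexing: every row in January 2025
january = prices.loc["2025-01"]

print(one_day["Close"], len(window), len(trimmed), len(january))
```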

In [93]:
stocks = pd.read_csv("ibm.csv",parse_dates=["Date"],index_col=["Date"]).sort_index()
stocks.head(3)

Date,Open,High,Low,Close,Volume
1962-01-02,5.0461,5.0461,4.98716,4.98716,593562.955237
1962-01-03,4.98716,5.03292,4.98716,5.03292,445175.034277
1962-01-04,5.03292,5.03292,4.98052,4.98052,399513.586679


In [94]:
stocks.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15546 entries, 1962-01-02 to 2023-10-11
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Open    15546 non-null  float64
 1   High    15546 non-null  float64
 2   Low     15546 non-null  float64
 3   Close   15546 non-null  float64
 4   Volume  15545 non-null  float64
dtypes: float64(5)
memory usage: 728.7 KB


##### We can still fetch values purely by index position using `iloc`. However, label-based lookups on the DatetimeIndex with `loc` are usually more convenient for date queries

In [95]:
stocks.iloc[200]

Open      3.199750e+00
High      3.213030e+00
Low       3.121300e+00
Close     3.121300e+00
Volume    1.917670e+06
Name: 1962-10-16 00:00:00, dtype: float64

In [96]:
# stocks.loc["2023-01-10"]
stocks.loc[pd.Timestamp(2023,1,10)]

Open          143.61
High          144.85
Low           142.90
Close         144.80
Volume    2152172.00
Name: 2023-01-10 00:00:00, dtype: float64

In [97]:
# we can also fetch over a range of dates
stocks.loc["2022-10-05":"2023-01-05"]

Date,Open,High,Low,Close,Volume
2022-10-05,123.241,124.970,122.7660,124.258,3.251179e+06
2022-10-06,123.409,123.824,120.3350,120.790,5.135128e+06
2022-10-07,120.068,120.366,116.6790,117.420,4.553321e+06
2022-10-10,118.379,118.547,115.6610,116.363,6.061426e+06
2022-10-11,116.076,117.825,115.5620,116.412,4.091472e+06
...,...,...,...,...,...
2022-12-29,140.580,142.260,140.4500,141.060,2.337207e+06
2022-12-30,140.540,140.900,139.4500,140.890,2.858110e+06
2023-01-03,141.100,141.900,140.4800,141.550,3.338829e+06
2023-01-04,142.070,143.615,141.3675,142.600,3.869236e+06


In [98]:
stocks.loc["2022-10-05":"2023-01-05","Open"]
# to return as a dataframe
stocks.loc["2022-10-05":"2023-01-05",["Open"]]


Date,Open
2022-10-05,123.241
2022-10-06,123.409
2022-10-07,120.068
2022-10-10,118.379
2022-10-11,116.076
...,...
2022-12-29,140.580
2022-12-30,140.540
2023-01-03,141.100
2023-01-04,142.070


In [99]:
stocks.truncate("2022-10-05","2023-01-05")

Date,Open,High,Low,Close,Volume
2022-10-05,123.241,124.970,122.7660,124.258,3.251179e+06
2022-10-06,123.409,123.824,120.3350,120.790,5.135128e+06
2022-10-07,120.068,120.366,116.6790,117.420,4.553321e+06
2022-10-10,118.379,118.547,115.6610,116.363,6.061426e+06
2022-10-11,116.076,117.825,115.5620,116.412,4.091472e+06
...,...,...,...,...,...
2022-12-29,140.580,142.260,140.4500,141.060,2.337207e+06
2022-12-30,140.540,140.900,139.4500,140.890,2.858110e+06
2023-01-03,141.100,141.900,140.4800,141.550,3.338829e+06
2023-01-04,142.070,143.615,141.3675,142.600,3.869236e+06


## The DateOffset Object
- A **DateOffset** object adds time to a **Timestamp** to arrive at a new **Timestamp**.
- The **DateOffset** constructor accepts `days`, `weeks`, `months`, `years` parameters, and more.
- We can pass a **DateOffset** object to the `freq` parameter of the `pd.date_range` function.
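
One point worth seeing in code is that a **DateOffset** is calendar-aware, whereas a fixed number of days is not. The sketch below is illustrative; the dates were picked to land on month-length and leap-year edge cases.

```python
import pandas as pd

jan_31 = pd.Timestamp("2024-01-31")

# One calendar month later is clipped to the last valid day of February
plus_month = jan_31 + pd.DateOffset(months=1)

# A fixed 30-day duration overshoots into March instead
plus_30_days = jan_31 + pd.Timedelta(days=30)

# Adding a year to a leap day falls back to 28 February
leap_day = pd.Timestamp("2024-02-29")
plus_year = leap_day + pd.DateOffset(years=1)

print(plus_month, plus_30_days, plus_year)

# A DateOffset can also drive date_range through the freq parameter
monthly = pd.date_range(start="2025-01-15", periods=3, freq=pd.DateOffset(months=1))
print(monthly)
```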

In [100]:
# Adding an offset of 2 days to indexes
stocks.index + pd.DateOffset(days=2)
# Moving back in time with an offset of 2 years
stocks.index - pd.DateOffset(years=2)
# Subtracting an offset of 6 hours
stocks.index - pd.DateOffset(hours=6)
# Adding multiple parameters as offset
stocks.index + pd.DateOffset(years=1,months=2,hours=6)

DatetimeIndex(['1963-03-02 06:00:00', '1963-03-03 06:00:00',
               '1963-03-04 06:00:00', '1963-03-05 06:00:00',
               '1963-03-08 06:00:00', '1963-03-09 06:00:00',
               '1963-03-10 06:00:00', '1963-03-11 06:00:00',
               '1963-03-12 06:00:00', '1963-03-15 06:00:00',
               ...
               '2024-11-28 06:00:00', '2024-11-29 06:00:00',
               '2024-12-02 06:00:00', '2024-12-03 06:00:00',
               '2024-12-04 06:00:00', '2024-12-05 06:00:00',
               '2024-12-06 06:00:00', '2024-12-09 06:00:00',
               '2024-12-10 06:00:00', '2024-12-11 06:00:00'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

### Find the IBM stock price on 15th Jan of each year
Using `DateOffset`, first create a sample datetime index. This will later be used as a filter condition.<br>
For the `freq` param, we'll pass a `DateOffset` object with a yearly interval so that leap years in the date range are handled correctly.

In [101]:
# Creating a sample datetime index that will be used as a filtering condition
pd.date_range(start="1962-01-15",end="2023-09-15",freq=pd.DateOffset(years=1))
type(pd.date_range(start="1962-01-15",end="2023-09-15",freq=pd.DateOffset(years=1)))

pandas.core.indexes.datetimes.DatetimeIndex

In [102]:
# Assigning to a variable to be used as a condition
jan = pd.date_range(start="1962-01-15",end="2023-09-15",freq=pd.DateOffset(years=1))
jan

DatetimeIndex(['1962-01-15', '1963-01-15', '1964-01-15', '1965-01-15',
               '1966-01-15', '1967-01-15', '1968-01-15', '1969-01-15',
               '1970-01-15', '1971-01-15', '1972-01-15', '1973-01-15',
               '1974-01-15', '1975-01-15', '1976-01-15', '1977-01-15',
               '1978-01-15', '1979-01-15', '1980-01-15', '1981-01-15',
               '1982-01-15', '1983-01-15', '1984-01-15', '1985-01-15',
               '1986-01-15', '1987-01-15', '1988-01-15', '1989-01-15',
               '1990-01-15', '1991-01-15', '1992-01-15', '1993-01-15',
               '1994-01-15', '1995-01-15', '1996-01-15', '1997-01-15',
               '1998-01-15', '1999-01-15', '2000-01-15', '2001-01-15',
               '2002-01-15', '2003-01-15', '2004-01-15', '2005-01-15',
               '2006-01-15', '2007-01-15', '2008-01-15', '2009-01-15',
               '2010-01-15', '2011-01-15', '2012-01-15', '2013-01-15',
               '2014-01-15', '2015-01-15', '2016-01-15', '2017-01-15',
      

In [103]:
stocks.index.isin(jan)

array([False, False, False, ..., False, False, False], shape=(15546,))

In [104]:
# This pulls every matching record, where one exists, into the DataFrame
# Note that some years are missing (e.g. 1966 and 1967), most likely because 15 Jan fell on a non-trading day such as a weekend
stocks[stocks.index.isin(jan)]

Date,Open,High,Low,Close,Volume
1962-01-15,4.93436,4.95417,4.93436,4.94099,388098.7
1963-01-15,3.61374,3.64643,3.60076,3.62018,3173292.0
1964-01-15,4.73761,4.89503,4.73761,4.89503,2374262.0
1965-01-15,4.67837,4.74376,4.67837,4.73761,876649.2
1968-01-15,10.1047,10.2355,10.1047,10.1906,736627.7
1969-01-15,10.094,10.1262,10.0383,10.0998,633135.8
1970-01-15,12.2693,12.4761,12.2693,12.4761,642269.7
1971-01-15,10.2891,10.2999,10.1398,10.1398,1095810.0
1973-01-15,13.8249,14.0376,13.8249,13.8454,2091176.0
1974-01-15,9.69188,9.89977,9.67138,9.77776,988664.3
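
As a cross-check on the `isin` approach, the same rows can be selected with a Boolean mask on the index's `month` and `day` attributes. The sketch below uses synthetic business-day data rather than the IBM file, so the `prices` frame and its values are assumptions made for illustration. It also lists the target dates that have no matching row.

```python
import pandas as pd

# Synthetic business-day index standing in for real trading days
trading_days = pd.date_range(start="2020-01-01", end="2023-12-31", freq="B")
prices = pd.DataFrame({"Close": range(len(trading_days))}, index=trading_days)

# Approach from the notebook: build the target dates, then filter with isin
jan = pd.date_range(start="2020-01-15", end="2023-12-31", freq=pd.DateOffset(years=1))
via_isin = prices[prices.index.isin(jan)]

# Alternative: compare the month and day components of the index directly
mask = (prices.index.month == 1) & (prices.index.day == 15)
via_mask = prices[mask]

# Target dates with no row, here because 15 Jan fell on a weekend
missing = jan.difference(prices.index)

print(via_isin.index)
print(missing)
```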


## Specialized Date Offsets
- Pandas nests more specialized date offsets in `pd.tseries.offsets`.
- These offsets shift each date by a varying amount of time to reach a calendar anchor (for example, the month end, quarter end, or year beginning).
- They are useful when we want to add or subtract a calendar-based amount of time rather than a fixed duration.
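
One behaviour that is easy to miss: adding an anchored offset to a date that already sits on the anchor moves it to the *next* anchor. The sketch below shows this for `MonthEnd`, along with `rollforward`, which leaves such dates unchanged. The two sample dates are chosen for illustration.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid_month = pd.Timestamp("2025-01-15")
month_end = pd.Timestamp("2025-01-31")

# From mid-month, the offset lands on the end of the same month
from_mid = mid_month + MonthEnd()

# From a date already at month end, it jumps to the following month end
from_end = month_end + MonthEnd()

# rollforward only moves dates that are not already on the anchor
rolled = MonthEnd().rollforward(month_end)

print(from_mid, from_end, rolled)
```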

In [105]:
stocks.index + pd.tseries.offsets.MonthEnd()

DatetimeIndex(['1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31',
               ...
               '2023-09-30', '2023-09-30', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [106]:
stocks.index + pd.tseries.offsets.QuarterEnd()
stocks.index + pd.tseries.offsets.QuarterEnd(startingMonth=1)

DatetimeIndex(['1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31', '1962-01-31', '1962-01-31',
               '1962-01-31', '1962-01-31',
               ...
               '2023-10-31', '2023-10-31', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31', '2023-10-31', '2023-10-31',
               '2023-10-31', '2023-10-31'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

In [107]:
stocks.index + pd.tseries.offsets.YearBegin()

DatetimeIndex(['1963-01-01', '1963-01-01', '1963-01-01', '1963-01-01',
               '1963-01-01', '1963-01-01', '1963-01-01', '1963-01-01',
               '1963-01-01', '1963-01-01',
               ...
               '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',
               '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',
               '2024-01-01', '2024-01-01'],
              dtype='datetime64[ns]', name='Date', length=15546, freq=None)

## Timedeltas
- A **Timedelta** is a pandas object that represents a duration (an amount of time).
- Subtracting two **Timestamp** objects yields a **Timedelta** object (this also applies when subtracting one datetime **Series** from another).
- The **Timedelta** constructor accepts parameters for time as well as string descriptions.
##### A Timedelta represents a time span, i.e. the difference between two points in time
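
Once a **Timedelta** exists, it can be broken into components or converted to a plain number. The following sketch shows a few common conversions; the variable names are illustrative.

```python
import pandas as pd

gap = pd.Timedelta(days=2, hours=12)

# .days holds only the whole-day component
whole_days = gap.days

# total_seconds() converts the full duration to a float
seconds = gap.total_seconds()

# Dividing by another Timedelta expresses the duration in that unit
hours = gap / pd.Timedelta("1 hour")

# Negative durations keep their sign; abs() gives the magnitude
backwards = pd.Timestamp(2025, 5, 13) - pd.Timestamp(2025, 5, 25)

print(whole_days, seconds, hours)
print(backwards.days, abs(backwards))
```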

In [108]:
# Timedelta between two dates (25 May and 13 May)
pd.Timestamp(2025,5,25) - pd.Timestamp(2025,5,13)

Timedelta('12 days 00:00:00')

In [109]:
pd.Timestamp(2025,5,13) - pd.Timestamp(2025,5,25)

Timedelta('-12 days +00:00:00')

In [110]:
type(pd.Timestamp(2025,5,25) - pd.Timestamp(2025,5,13))

pandas._libs.tslibs.timedeltas.Timedelta

In [111]:
# Timedelta between timestamps on the same day (9:30AM to 3:30PM)
pd.Timestamp(2025,5,5,15,30,00) - pd.Timestamp(2025,5,5,9,30,00)

Timedelta('0 days 06:00:00')

##### Just like DateOffset, we can create a Timedelta object that represents a difference of some time (or date, or a combination of both)

In [112]:
# Timedelta object representing a difference of 2 days 12 hours
pd.Timedelta(days=2,hours=12)

Timedelta('2 days 12:00:00')

In [113]:
pd.Timedelta("5 hours")

Timedelta('0 days 05:00:00')

#### In the ecommerce dataset, we can use Timedelta objects to find the difference between a customer ordering an item and receiving it. Orders with a large Timedelta value would be a focus area for the delivery/logistics team

In [114]:
pd.read_csv("ecommerce.csv")

,ID,order_date,delivery_date
0,1,5/24/98,2/5/99
1,2,4/22/92,3/6/98
2,4,2/10/91,8/26/92
3,5,7/21/92,11/20/97
4,7,9/2/93,6/10/98
...,...,...,...
496,990,6/24/91,2/2/96
497,991,9/9/91,3/30/98
498,993,11/16/90,4/27/98
499,994,6/3/93,6/13/93


In [115]:
# A preliminary review of the dataset shows we can use the order ID as the row index
# and convert the date columns to datetime objects
pd.read_csv("ecommerce.csv").info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 501 entries, 0 to 500
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   ID             501 non-null    int64 
 1   order_date     501 non-null    object
 2   delivery_date  501 non-null    object
dtypes: int64(1), object(2)
memory usage: 11.9+ KB


In [116]:
ecom = pd.read_csv("ecommerce.csv",index_col="ID",parse_dates=["order_date","delivery_date"],date_format="%m/%d/%y")
ecom.info()

<class 'pandas.core.frame.DataFrame'>
Index: 501 entries, 1 to 997
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   order_date     501 non-null    datetime64[ns]
 1   delivery_date  501 non-null    datetime64[ns]
dtypes: datetime64[ns](2)
memory usage: 11.7 KB
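
If the file has already been loaded with string date columns, the same conversion can be done afterwards with `pd.to_datetime`. This is an alternative to `parse_dates`, sketched here on a few values copied from the dataset preview. The `errors="coerce"` line is an extra illustration on a deliberately bad value that is not in the original data.

```python
import pandas as pd

# A few raw values in the month/day/two-digit-year layout used by the file
raw = pd.Series(["5/24/98", "2/5/99", "6/3/93"])

# An explicit format avoids day/month ambiguity
parsed = pd.to_datetime(raw, format="%m/%d/%y")
print(parsed)

# errors="coerce" turns unparseable values into NaT instead of raising
messy = pd.Series(["5/24/98", "not a date"])
coerced = pd.to_datetime(messy, format="%m/%d/%y", errors="coerce")
print(coerced)
```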


In [117]:
ecom

ID,order_date,delivery_date
1,1998-05-24,1999-02-05
2,1992-04-22,1998-03-06
4,1991-02-10,1992-08-26
5,1992-07-21,1997-11-20
7,1993-09-02,1998-06-10
...,...,...
990,1991-06-24,1996-02-02
991,1991-09-09,1998-03-30
993,1990-11-16,1998-04-27
994,1993-06-03,1993-06-13


#### First we see an example of just adding a new column that shows the delivery time delta.

In [119]:
ecom["delivery_date"] - ecom["order_date"]
ecom["delivery_delta"] = ecom["delivery_date"] - ecom["order_date"]
ecom

ID,order_date,delivery_date,delivery_delta
1,1998-05-24,1999-02-05,257 days
2,1992-04-22,1998-03-06,2144 days
4,1991-02-10,1992-08-26,563 days
5,1992-07-21,1997-11-20,1948 days
7,1993-09-02,1998-06-10,1742 days
...,...,...,...
990,1991-06-24,1996-02-02,1684 days
991,1991-09-09,1998-03-30,2394 days
993,1990-11-16,1998-04-27,2719 days
994,1993-06-03,1993-06-13,10 days


#### Next we'll project a later date by adding 1.5 times the delivery delta to the delivery date. To model a delivery that takes 1.5 times as long instead, add the scaled delta to `order_date`

In [123]:
ecom["delivery_date"] + (ecom["delivery_delta"]*1.5)
ecom["delivery_1.5x"] = ecom["delivery_date"] + (ecom["delivery_delta"]*1.5)
ecom

ID,order_date,delivery_date,delivery_delta,delivery_1.5x
1,1998-05-24,1999-02-05,257 days,2000-02-25 12:00:00
2,1992-04-22,1998-03-06,2144 days,2006-12-25 00:00:00
4,1991-02-10,1992-08-26,563 days,1994-12-18 12:00:00
5,1992-07-21,1997-11-20,1948 days,2005-11-20 00:00:00
7,1993-09-02,1998-06-10,1742 days,2005-08-05 00:00:00
...,...,...,...,...
990,1991-06-24,1996-02-02,1684 days,2003-01-02 00:00:00
991,1991-09-09,1998-03-30,2394 days,2008-01-28 00:00:00
993,1990-11-16,1998-04-27,2719 days,2009-06-26 12:00:00
994,1993-06-03,1993-06-13,10 days,1993-06-28 00:00:00


##### Now we investigate the data that came with the original dataset as well as the added columns

In [124]:
# First, find the longest delivery delta in the dataset
ecom["delivery_delta"].max()

Timedelta('3583 days 00:00:00')

In [128]:
ecom["delivery_delta"].min()
ecom["delivery_delta"].mean()


Timedelta('1217 days 22:53:53.532934128')

In [136]:
# top 5 orders with shortest delivery times
ecom["delivery_delta"].nsmallest(n=5)

ID
898    8 days
19     9 days
612    9 days
994   10 days
310   16 days
Name: delivery_delta, dtype: timedelta64[ns]

In [141]:
ecom.nsmallest(n=5,columns="delivery_delta")

ID,order_date,delivery_date,delivery_delta,delivery_1.5x
898,1990-05-24,1990-06-01,8 days,1990-06-13 00:00:00
19,1998-05-10,1998-05-19,9 days,1998-06-01 12:00:00
612,1994-08-11,1994-08-20,9 days,1994-09-02 12:00:00
994,1993-06-03,1993-06-13,10 days,1993-06-28 00:00:00
310,1997-09-20,1997-10-06,16 days,1997-10-30 00:00:00
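
To follow up on the idea of flagging slow deliveries for the logistics team, here is a minimal sketch that filters on a **Timedelta** threshold and converts the delta to whole days. It rebuilds three rows shown in the outputs above so it can run on its own. The 30-day `threshold` and the `delivery_days` column are hypothetical choices for illustration.

```python
import pandas as pd

# Three orders copied from the outputs above (IDs 1, 994, 898)
orders = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["1998-05-24", "1993-06-03", "1990-05-24"]),
        "delivery_date": pd.to_datetime(["1999-02-05", "1993-06-13", "1990-06-01"]),
    },
    index=pd.Index([1, 994, 898], name="ID"),
)
orders["delivery_delta"] = orders["delivery_date"] - orders["order_date"]

# Whole days as integers, handy for plotting or numeric summaries
orders["delivery_days"] = orders["delivery_delta"].dt.days

# Hypothetical cutoff: anything slower than 30 days is flagged
threshold = pd.Timedelta(days=30)
slow = orders[orders["delivery_delta"] > threshold]

print(orders)
print(slow)
```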
