* Time series data consists of data points attached to sequential timestamps. Examples:
    * Stock prices over time
    * Daily, weekly, or monthly sales
    * Periodic measurements in a process
    * Power or gas consumption rates over time
* Pandas is well suited to working with dates and times. It provides many functions for analyzing and manipulating time series data.

In [1]:
import pandas as pd

df = pd.read_csv("Data/sales.csv")

df.head()

Unnamed: 0,purchase_date,sold_date,qty
0,2021-05-31 00:00:00,2021-07-15,316
1,2021-06-30 07:00:00,2022-01-12,495
2,2021-07-30 13:00:00,2021-09-14,312
3,2021-08-31 13:00:00,2022-02-05,416
4,2021-09-28 07:00:00,2021-12-30,349


## 1. Data types

Data types to represent time series data:

* **datetime64[ns]**: This data type represents timestamps. A value can be as coarse as a single day, such as '2022-05-20', or as precise as a nanosecond.

* **timedelta64[ns]**: This data type can be used for expressing differences in times. The units can be days, hours, minutes, and so on. It can take on both positive and negative values. For instance, if we subtract a future date from today, we will end up having a negative timedelta value.

* **period[freq]**: This data type represents fixed calendar spans such as a month, a quarter, or a year. It resembles timedelta in that it describes a span of time, but a period is anchored to the calendar. For instance, a period[M] value can be 2022-01, but it cannot be a length such as 1 month or 2 months.
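As a minimal sketch of the three data types described above, the following builds one small Series of each kind. The dates are made up for illustration and are not taken from the sales data.

```python
import pandas as pd

# Timestamps (illustrative dates, not from sales.csv)
stamps = pd.Series(pd.to_datetime(["2022-05-20", "2022-05-01"]))

# Subtracting a timestamp yields time differences, which can be negative
deltas = stamps - pd.Timestamp("2022-05-10")

# Converting to monthly periods yields calendar-anchored spans
months = stamps.dt.to_period("M")

print(stamps.dtype)
print(deltas.dtype)
print(months.dtype)
print(deltas.dt.days.tolist())  # [10, -9]
```

The second difference is negative because 2022-05-01 falls before the reference date.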

In [2]:
df.dtypes

purchase_date    object
sold_date        object
qty               int64
dtype: object

In [3]:
df = df.astype({
    "purchase_date": "datetime64[ns]",
    "sold_date": "datetime64[ns]",
})

df.dtypes

purchase_date    datetime64[ns]
sold_date        datetime64[ns]
qty                       int64
dtype: object
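An alternative to converting after loading is to parse the date columns while reading the file. The sketch below assumes a CSV with the same column names as the sales data; the inline text is a two-row stand-in used only so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for Data/sales.csv with the same column names
csv_text = """purchase_date,sold_date,qty
2021-05-31 00:00:00,2021-07-15,316
2021-06-30 07:00:00,2022-01-12,495
"""

# parse_dates converts the listed columns to datetime while reading
sales = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["purchase_date", "sold_date"],
)

print(sales.dtypes)
```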

## 2. Day

* Date parts and date manipulation methods are available on a datetime column via the dt accessor.

In [4]:
df

Unnamed: 0,purchase_date,sold_date,qty
0,2021-05-31 00:00:00,2021-07-15,316
1,2021-06-30 07:00:00,2022-01-12,495
2,2021-07-30 13:00:00,2021-09-14,312
3,2021-08-31 13:00:00,2022-02-05,416
4,2021-09-28 07:00:00,2021-12-30,349


In [5]:
df["sold_date"].dt.day

0    15
1    12
2    14
3     5
4    30
Name: sold_date, dtype: int32

## 3. Year

In [6]:
df["sold_date"].dt.year

0    2021
1    2022
2    2021
3    2022
4    2021
Name: sold_date, dtype: int32

## 4. Month - 1

In [7]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty
0,2021-05-31 00:00:00,2021-07-15,316
1,2021-06-30 07:00:00,2022-01-12,495
2,2021-07-30 13:00:00,2021-09-14,312
3,2021-08-31 13:00:00,2022-02-05,416
4,2021-09-28 07:00:00,2021-12-30,349


In [8]:
df["sold_date"].dt.month

0     7
1     1
2     9
3     2
4    12
Name: sold_date, dtype: int32

## 5. Month - 2

In [9]:
df["sold_date"].dt.to_period("M")

0    2021-07
1    2022-01
2    2021-09
3    2022-02
4    2021-12
Name: sold_date, dtype: period[M]

In [10]:
df["sold_month"] = df["sold_date"].dt.to_period("M")

df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [11]:
df.dtypes

purchase_date    datetime64[ns]
sold_date        datetime64[ns]
qty                       int64
sold_month            period[M]
dtype: object
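One common use of a monthly period column is grouping. The sketch below sums quantities per month; the frame is a small made-up example (the second July row is hypothetical) rather than the sales data itself.

```python
import pandas as pd

# Illustrative data; the 2021-07-20 row is invented for the example
sales = pd.DataFrame({
    "sold_date": pd.to_datetime(["2021-07-15", "2021-07-20", "2021-09-14"]),
    "qty": [316, 100, 312],
})

# Group by the monthly period derived from the date, then sum quantities
monthly_qty = sales.groupby(sales["sold_date"].dt.to_period("M"))["qty"].sum()

print(monthly_qty)
```

Both July rows fall into the 2021-07 period, so their quantities are added together.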

## 6. Month name

In [12]:
df["sold_date"].dt.month_name()

0         July
1      January
2    September
3     February
4     December
Name: sold_date, dtype: object

## 7. Weekday

In [13]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [14]:
df["sold_date"].dt.dayofweek

0    3
1    2
2    1
3    5
4    3
Name: sold_date, dtype: int32

<img src="Assets/2021-July.png" class="juno_ui_theme_light" style="width:300px">

## 8. Day name

In [15]:
df["sold_date"].dt.day_name()

0     Thursday
1    Wednesday
2      Tuesday
3     Saturday
4     Thursday
Name: sold_date, dtype: object

## 9. isocalendar - 1

The isocalendar method returns the ISO year, week number, and weekday of a date in a single step. The result is a DataFrame that holds these pieces of information in separate columns.

In [16]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [17]:
df["sold_date"].dt.isocalendar()

Unnamed: 0,year,week,day
0,2021,28,4
1,2022,2,3
2,2021,37,2
3,2022,5,6
4,2021,52,4


* The first value of the sold date column is "2021-07-15" which is in the 28th week of 2021 and is the 4th day of the week. The weekday value for Monday is 1 so the 4th day is Thursday.

* The dayofweek attribute counts from 0 for Monday, so it returns 3 for Thursday.

In [18]:
df["sold_date"].dt.dayofweek

0    3
1    2
2    1
3    5
4    3
Name: sold_date, dtype: int32
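The relationship between the two weekday conventions can be checked directly. This sketch uses three of the sold dates shown above and confirms that the ISO day is always the dayofweek value plus one.

```python
import pandas as pd

sold = pd.Series(pd.to_datetime(["2021-07-15", "2022-01-12", "2021-09-14"]))

# ISO weekday: Monday is 1; cast from UInt32 for a plain integer comparison
iso_day = sold.dt.isocalendar()["day"].astype("int64")

# dayofweek: Monday is 0
zero_based = sold.dt.dayofweek

print(iso_day.tolist())     # [4, 3, 2]
print(zero_based.tolist())  # [3, 2, 1]
print(bool((iso_day == zero_based + 1).all()))  # True
```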

## 10. isocalendar - 2

In [19]:
df["sold_date"].dt.isocalendar()

Unnamed: 0,year,week,day
0,2021,28,4
1,2022,2,3
2,2021,37,2
3,2022,5,6
4,2021,52,4


In [20]:
df["sold_date"].dt.isocalendar().week

0    28
1     2
2    37
3     5
4    52
Name: week, dtype: UInt32

## 11. Hour

In [21]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [22]:
df["purchase_date"].dt.hour

0     0
1     7
2    13
3    13
4     7
Name: purchase_date, dtype: int32

## 12. Minute

In [23]:
df["purchase_date"].dt.minute

0    0
1    0
2    0
3    0
4    0
Name: purchase_date, dtype: int32

* Depending on the precision of the time value, we can use the second, microsecond, and nanosecond attributes to extract the relevant part of the time.
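A short sketch of the finer-grained attributes follows. The timestamp is made up, since the purchase dates in the sales data have no sub-minute component.

```python
import pandas as pd

# Illustrative timestamp with seconds and a fractional part
ts = pd.Series(pd.to_datetime(["2021-05-31 00:30:15.250000"]))

print(ts.dt.second.tolist())       # [15]
print(ts.dt.microsecond.tolist())  # [250000]
print(ts.dt.nanosecond.tolist())   # [0]
```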

## 13. Month end

In [24]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [25]:
df["purchase_date"].dt.is_month_end

0     True
1     True
2    False
3     True
4    False
Name: purchase_date, dtype: bool

## 14. Weekend

In [26]:
df["sold_date"].dt.day_name()

0     Thursday
1    Wednesday
2      Tuesday
3     Saturday
4     Thursday
Name: sold_date, dtype: object

In [27]:
df["sold_date"].dt.day_name().isin(["Saturday", "Sunday"])

0    False
1    False
2    False
3     True
4    False
Name: sold_date, dtype: bool

In [28]:
df[df["sold_date"].dt.day_name().isin(["Saturday", "Sunday"])]

Unnamed: 0,purchase_date,sold_date,qty,sold_month
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
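The same weekend filter can also be written with the numeric weekday, which avoids comparing strings. This is a sketch on three of the sold dates, not a replacement for the approach above.

```python
import pandas as pd

sold = pd.Series(pd.to_datetime(["2021-07-15", "2022-02-05", "2021-12-30"]))

# dayofweek is 5 for Saturday and 6 for Sunday
is_weekend = sold.dt.dayofweek >= 5

print(is_weekend.tolist())         # [False, True, False]
print(sold[is_weekend].tolist())
```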


## 15. Time difference between dates

In [29]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month
0,2021-05-31 00:00:00,2021-07-15,316,2021-07
1,2021-06-30 07:00:00,2022-01-12,495,2022-01
2,2021-07-30 13:00:00,2021-09-14,312,2021-09
3,2021-08-31 13:00:00,2022-02-05,416,2022-02
4,2021-09-28 07:00:00,2021-12-30,349,2021-12


In [30]:
df["sold_date"] - df["purchase_date"]

0    45 days 00:00:00
1   195 days 17:00:00
2    45 days 11:00:00
3   157 days 11:00:00
4    92 days 17:00:00
dtype: timedelta64[ns]

In [31]:
df["time_difference"] = df["sold_date"] - df["purchase_date"]

df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,45 days 00:00:00
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,195 days 17:00:00
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,45 days 11:00:00
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,157 days 11:00:00
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,92 days 17:00:00


In [32]:
df.dtypes

purchase_date       datetime64[ns]
sold_date           datetime64[ns]
qty                          int64
sold_month               period[M]
time_difference    timedelta64[ns]
dtype: object

## 16. Time difference in days

In [33]:
df["time_difference"].dt.days

0     45
1    195
2     45
3    157
4     92
Name: time_difference, dtype: int64
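Note that days returns only the whole-day part of each difference. As a sketch, the components attribute of the dt accessor shows the remaining parts side by side; the two values below are copied from the time_difference column.

```python
import pandas as pd

diff = pd.Series(pd.to_timedelta(["195 days 17:00:00", "45 days 00:00:00"]))

# components splits each timedelta into days, hours, minutes, and so on
parts = diff.dt.components

print(parts[["days", "hours"]])
print(diff.dt.days.tolist())  # [195, 45]
```

The 17 hours of the first row are dropped by days, which matters when fractional days are needed.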

## 17. Time difference with Timedelta - 1

In [34]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,45 days 00:00:00
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,195 days 17:00:00
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,45 days 11:00:00
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,157 days 11:00:00
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,92 days 17:00:00


In [35]:
df["time_difference"] / pd.Timedelta(days=1)

0     45.000000
1    195.708333
2     45.458333
3    157.458333
4     92.708333
Name: time_difference, dtype: float64

## 18. Time difference with Timedelta - 2

In [36]:
df["time_difference"] / pd.Timedelta(hours=1)

0    1080.0
1    4697.0
2    1091.0
3    3779.0
4    2225.0
Name: time_difference, dtype: float64

## 19. Time difference with NumPy timedelta - 1

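* Pandas Timedelta does not support month and year units because their lengths vary. NumPy's timedelta64 accepts 'M' and 'Y', but the result is only an approximation: the outputs in this section and the next correspond to a 31-day month and a 365-day year (for example, 45 / 31 is about 1.4516 and 45 / 365 is about 0.1233).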

In [37]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,45 days 00:00:00
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,195 days 17:00:00
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,45 days 11:00:00
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,157 days 11:00:00
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,92 days 17:00:00


In [38]:
import numpy as np

df["time_difference"] / np.timedelta64(1, 'M')

0    1.451613
1    6.313172
2    1.466398
3    5.079301
4    2.990591
Name: time_difference, dtype: float64

## 20. Time difference with NumPy timedelta - 2

In [39]:
df["time_difference"] / np.timedelta64(1, 'Y')

0    0.123288
1    0.536187
2    0.124543
3    0.431393
4    0.253995
Name: time_difference, dtype: float64

In [40]:
df["time_difference"] / np.timedelta64(1, 'D')

0     45.000000
1    195.708333
2     45.458333
3    157.458333
4     92.708333
Name: time_difference, dtype: float64
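Because a month has no fixed length, an alternative is to count calendar months from the year and month parts. This is a sketch of one possible definition (month boundaries crossed), not the method used in the cells above; the dates are the first two rows of the sales data without the time of day.

```python
import pandas as pd

purchase = pd.Series(pd.to_datetime(["2021-05-31", "2021-06-30"]))
sold = pd.Series(pd.to_datetime(["2021-07-15", "2022-01-12"]))

# Count month boundaries between the two dates, ignoring the day of month
months_between = (
    (sold.dt.year - purchase.dt.year) * 12
    + (sold.dt.month - purchase.dt.month)
)

print(months_between.tolist())  # [2, 7]
```

This definition reports 2 months for May 31 to July 15 even though only 45 days have elapsed, so it suits calendar-based reporting more than duration measurement.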

## 21. Add interval with DateOffset - 1

In [41]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:00:00,2021-07-15,316,2021-07,45 days 00:00:00
1,2021-06-30 07:00:00,2022-01-12,495,2022-01,195 days 17:00:00
2,2021-07-30 13:00:00,2021-09-14,312,2021-09,45 days 11:00:00
3,2021-08-31 13:00:00,2022-02-05,416,2022-02,157 days 11:00:00
4,2021-09-28 07:00:00,2021-12-30,349,2021-12,92 days 17:00:00


In [42]:
df["sold_date"] + pd.DateOffset(days=2)

0   2021-07-17
1   2022-01-14
2   2021-09-16
3   2022-02-07
4   2022-01-01
Name: sold_date, dtype: datetime64[ns]

## 22. Add interval with DateOffset - 2

In [43]:
df["sold_date"] + pd.DateOffset(hours=4)

0   2021-07-15 04:00:00
1   2022-01-12 04:00:00
2   2021-09-14 04:00:00
3   2022-02-05 04:00:00
4   2021-12-30 04:00:00
Name: sold_date, dtype: datetime64[ns]

In [44]:
df["sold_date"] = df["sold_date"] + pd.DateOffset(hours=4)

df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:00:00,2021-07-15 04:00:00,316,2021-07,45 days 00:00:00
1,2021-06-30 07:00:00,2022-01-12 04:00:00,495,2022-01,195 days 17:00:00
2,2021-07-30 13:00:00,2021-09-14 04:00:00,312,2021-09,45 days 11:00:00
3,2021-08-31 13:00:00,2022-02-05 04:00:00,416,2022-02,157 days 11:00:00
4,2021-09-28 07:00:00,2021-12-30 04:00:00,349,2021-12,92 days 17:00:00


## 23. Add interval with DateOffset - 3

In [45]:
df["purchase_date"] = df["purchase_date"] + pd.DateOffset(minutes=30)

df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:30:00,2021-07-15 04:00:00,316,2021-07,45 days 00:00:00
1,2021-06-30 07:30:00,2022-01-12 04:00:00,495,2022-01,195 days 17:00:00
2,2021-07-30 13:30:00,2021-09-14 04:00:00,312,2021-09,45 days 11:00:00
3,2021-08-31 13:30:00,2022-02-05 04:00:00,416,2022-02,157 days 11:00:00
4,2021-09-28 07:30:00,2021-12-30 04:00:00,349,2021-12,92 days 17:00:00


In [46]:
df["time_difference"] = df["sold_date"] - df["purchase_date"]

df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:30:00,2021-07-15 04:00:00,316,2021-07,45 days 03:30:00
1,2021-06-30 07:30:00,2022-01-12 04:00:00,495,2022-01,195 days 20:30:00
2,2021-07-30 13:30:00,2021-09-14 04:00:00,312,2021-09,45 days 14:30:00
3,2021-08-31 13:30:00,2022-02-05 04:00:00,416,2022-02,157 days 14:30:00
4,2021-09-28 07:30:00,2021-12-30 04:00:00,349,2021-12,92 days 20:30:00


## 24. Subtraction with DateOffset

* To subtract an interval instead of adding one, we can either use the subtraction operator or pass a negative value to DateOffset.

In [47]:
df["purchase_date"] - pd.DateOffset(hours=5)

0   2021-05-30 19:30:00
1   2021-06-30 02:30:00
2   2021-07-30 08:30:00
3   2021-08-31 08:30:00
4   2021-09-28 02:30:00
Name: purchase_date, dtype: datetime64[ns]

In [48]:
df["purchase_date"] + pd.DateOffset(hours=-5)

0   2021-05-30 19:30:00
1   2021-06-30 02:30:00
2   2021-07-30 08:30:00
3   2021-08-31 08:30:00
4   2021-09-28 02:30:00
Name: purchase_date, dtype: datetime64[ns]

DateOffset supports the following units, passed as keyword arguments:

* years
* months
* weeks
* days
* hours
* minutes
* seconds
* microseconds
* nanoseconds
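Calendar-aware units such as months are where DateOffset differs from a fixed-length Timedelta. The sketch below, using a made-up start date, adds one month with DateOffset and 30 days with Timedelta.

```python
import pandas as pd

start = pd.Timestamp("2021-01-31")

# DateOffset moves by calendar month and clips to the last valid day
print(start + pd.DateOffset(months=1))  # 2021-02-28 00:00:00

# Timedelta always adds a fixed duration
print(start + pd.Timedelta(days=30))    # 2021-03-02 00:00:00
```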

## 25. Add interval with Timedelta - 1

In [49]:
df.head()

Unnamed: 0,purchase_date,sold_date,qty,sold_month,time_difference
0,2021-05-31 00:30:00,2021-07-15 04:00:00,316,2021-07,45 days 03:30:00
1,2021-06-30 07:30:00,2022-01-12 04:00:00,495,2022-01,195 days 20:30:00
2,2021-07-30 13:30:00,2021-09-14 04:00:00,312,2021-09,45 days 14:30:00
3,2021-08-31 13:30:00,2022-02-05 04:00:00,416,2022-02,157 days 14:30:00
4,2021-09-28 07:30:00,2021-12-30 04:00:00,349,2021-12,92 days 20:30:00


In [50]:
df["sold_date"] + pd.Timedelta(value=10, unit="D")

0   2021-07-25 04:00:00
1   2022-01-22 04:00:00
2   2021-09-24 04:00:00
3   2022-02-15 04:00:00
4   2022-01-09 04:00:00
Name: sold_date, dtype: datetime64[ns]

## 26. Add interval with Timedelta - 2

In [51]:
df["purchase_date"] + pd.Timedelta(value=5, unit="h")

0   2021-05-31 05:30:00
1   2021-06-30 12:30:00
2   2021-07-30 18:30:00
3   2021-08-31 18:30:00
4   2021-09-28 12:30:00
Name: purchase_date, dtype: datetime64[ns]

The Timedelta function supports the following unit aliases:

* W and w represent a week
* D and d represent a day
* H and h represent an hour
* T and t represent a minute
* S and s represent a second
* L and l represent a millisecond
* U and u represent a microsecond
* N and n represent a nanosecond

In pandas 2.2 and later, the aliases H, T, S, L, U, and N are deprecated in favor of h, min, s, ms, us, and ns.
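A Timedelta can also be built from keyword arguments or from a string, which sidesteps the question of which unit alias to use. The sketch below shows three equivalent ways to express five hours, using the lowercase "h" alias.

```python
import pandas as pd

five_hours = pd.Timedelta(hours=5)

# Equivalent constructions of the same duration
same_by_unit = pd.Timedelta(5, unit="h")
same_by_string = pd.Timedelta("5 hours")

print(five_hours == same_by_unit == same_by_string)        # True
print(pd.Timestamp("2021-05-31 00:30:00") + five_hours)    # 2021-05-31 05:30:00
```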

## 27. The to_datetime function - 1

* The to_datetime function can be used for converting a scalar, a Pandas Series, or a Pandas DataFrame with year, month, and day columns to datetime values.

In [52]:
sample_df = pd.DataFrame({
    "year": [2021, 2022],
    "month": [12, 4],
    "day": [28, 19]
})

sample_df

Unnamed: 0,year,month,day
0,2021,12,28
1,2022,4,19


In [53]:
sample_df["date"] = pd.to_datetime(sample_df)

sample_df

Unnamed: 0,year,month,day,date
0,2021,12,28,2021-12-28
1,2022,4,19,2022-04-19


In [54]:
sample_df.dtypes

year              int64
month             int64
day               int64
date     datetime64[ns]
dtype: object

## 28. The to_datetime function - 2

In [55]:
pd.to_datetime("2022-11-28")

Timestamp('2022-11-28 00:00:00')

In [56]:
new_date = pd.to_datetime("2022-11-28")

sample_df["new_date"] = new_date

sample_df

Unnamed: 0,year,month,day,date,new_date
0,2021,12,28,2021-12-28,2022-11-28
1,2022,4,19,2022-04-19,2022-11-28


In [57]:
sample_df.dtypes

year                 int64
month                int64
day                  int64
date        datetime64[ns]
new_date    datetime64[ns]
dtype: object
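The to_datetime function also accepts a Series of strings. As a sketch, the example below parses day-first strings with an explicit format and turns unparseable entries into NaT; the input values are invented for illustration.

```python
import pandas as pd

# Illustrative raw strings in day/month/year order, plus one bad value
raw = pd.Series(["28/11/2022", "19/04/2022", "not a date"])

# format states the expected layout; errors="coerce" yields NaT on failure
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
print(parsed.isna().tolist())  # [False, False, True]
```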

## 29. The date_range function - 1

In [58]:
pd.date_range(start="2021-12-28", periods=5, freq="D")

DatetimeIndex(['2021-12-28', '2021-12-29', '2021-12-30', '2021-12-31',
               '2022-01-01'],
              dtype='datetime64[ns]', freq='D')

In [59]:
df = pd.DataFrame({
    "Date": pd.date_range(start="2022-11-28", periods=5, freq="D"),
    "Measurement": [1, 10, 25, 7, 12],
})

df

Unnamed: 0,Date,Measurement
0,2022-11-28,1
1,2022-11-29,10
2,2022-11-30,25
3,2022-12-01,7
4,2022-12-02,12


## 30. The date_range function - 2

In [60]:
pd.date_range(start="2022-11-28", end="2022-12-10", freq="D")

DatetimeIndex(['2022-11-28', '2022-11-29', '2022-11-30', '2022-12-01',
               '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05',
               '2022-12-06', '2022-12-07', '2022-12-08', '2022-12-09',
               '2022-12-10'],
              dtype='datetime64[ns]', freq='D')

In [61]:
pd.date_range(start="2022-11-28", end="2022-12-10", freq="2D")

DatetimeIndex(['2022-11-28', '2022-11-30', '2022-12-02', '2022-12-04',
               '2022-12-06', '2022-12-08', '2022-12-10'],
              dtype='datetime64[ns]', freq='2D')
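The freq argument accepts other aliases besides multiples of a day. As a sketch, "B" generates business days only, so the weekend of December 3rd and 4th is skipped.

```python
import pandas as pd

# Seven business days starting on Monday, 2022-11-28
business_days = pd.date_range(start="2022-11-28", periods=7, freq="B")

print(business_days.strftime("%Y-%m-%d").tolist())
# ['2022-11-28', '2022-11-29', '2022-11-30', '2022-12-01',
#  '2022-12-02', '2022-12-05', '2022-12-06']
```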