In this lecture, we'll look at the time series and date functionality in pandas.

**Manipulating dates and times is quite flexible** in pandas, which **allows us to carry out more advanced analysis**, such as time series analysis.

Pandas was originally created by Wes McKinney **to handle financial time series data**, so working with dates and times is one of its core features.

In [2]:
import pandas as pd
import numpy as np

**Pandas has four main classes related to times and dates: Timestamp, DatetimeIndex, Period, and PeriodIndex**.

# 1. Timestamp :

The **Timestamp class represents a single point in time**; when used as an index, timestamps **associate values** with points in time.

**Timestamp is interchangeable** with Python's **datetime** in most cases.
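A small sketch of this interchangeability: `pd.Timestamp` is a subclass of Python's `datetime.datetime`, so it passes `isinstance` checks, compares equal to an equivalent `datetime`, and can be converted back with `to_pydatetime()`.

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp('9/29/2022 4:45PM')

# Timestamp is a subclass of datetime.datetime
print(isinstance(ts, datetime))              # True

# Convert to a plain Python datetime and back again
py_dt = ts.to_pydatetime()
print(py_dt)                                 # 2022-09-29 16:45:00
print(pd.Timestamp(py_dt) == ts)             # True

# Comparisons with plain datetime objects also work
print(ts == datetime(2022, 9, 29, 16, 45))   # True
```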

# S1

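We can **create a timestamp by passing a string containing a date and/or time**, such as '9/29/2022 4:45PM'. If only a time is given (as in '6:14PM' below), the date defaults to today; if only a date is given, the time defaults to midnight.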

In [2]:
pd.Timestamp('9/29/2022 4:45PM')

Timestamp('2022-09-29 16:45:00')

In [3]:
type(pd.Timestamp('6:14PM'))

pandas._libs.tslibs.timestamps.Timestamp

In [4]:
pd.Timestamp('9/29/2022')

Timestamp('2022-09-29 00:00:00')

# S2

We can also **create a timestamp by passing its components** (**year, month, day, hour, minute, and second**) as separate arguments.

In [5]:
pd.Timestamp(2022, 9, 29, 16, 45)

Timestamp('2022-09-29 16:45:00')

In [6]:
pd.Timestamp(2022, 9, 29, 16, 59, 23)

Timestamp('2022-09-29 16:59:23')
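The same components can also be passed as keyword arguments, which makes longer calls easier to read. A minimal sketch:

```python
import pandas as pd

# Same moment as pd.Timestamp(2022, 9, 29, 16, 45), written with keyword arguments
ts = pd.Timestamp(year=2022, month=9, day=29, hour=16, minute=45)
print(ts)                                        # 2022-09-29 16:45:00
print(ts == pd.Timestamp(2022, 9, 29, 16, 45))   # True
```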

**Timestamp** also has some useful **methods**, such as **isoweekday()**, which **returns the day of the week as an integer**, where 1 represents Monday and 7 represents Sunday.

In [7]:
pd.Timestamp('9/29/2022 4:55PM').isoweekday() # Thursday

4

In [8]:
pd.Timestamp(2022, 9, 29).isoweekday()

4
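When a readable label is more useful than the number returned by `isoweekday()`, a Timestamp can also report the names directly. A small sketch using `day_name()` and `month_name()` (English names by default):

```python
import pandas as pd

ts = pd.Timestamp('9/29/2022')

# Names instead of numbers
print(ts.day_name())    # Thursday
print(ts.month_name())  # September
```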

We can **extract the year, month, day, hour, minute, or second from a timestamp using the corresponding attributes**.

In [9]:
pd.Timestamp("9/29/2022 4:59:23PM").second

23

In [10]:
pd.Timestamp("9/29/2022 4:59:23PM").minute

59

In [11]:
pd.Timestamp(2022, 9, 29, 16, 59, 23).month

9
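Putting the attributes side by side, plus a few derived ones (`quarter`, `dayofyear`, and the `date()` method that drops the time part), as a quick sketch:

```python
import pandas as pd

ts = pd.Timestamp('9/29/2022 4:59:23PM')

print(ts.year, ts.month, ts.day)      # 2022 9 29
print(ts.hour, ts.minute, ts.second)  # 16 59 23
print(ts.quarter)                     # 3
print(ts.dayofyear)                   # 272
print(ts.date())                      # 2022-09-29
```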

# 2. Period :

**Period** class **represents** a single **time span, such as a specific day or month**.

In [12]:
pd.Period('2/2019')

Period('2019-02', 'M')

In [13]:
type(pd.Period("9/2022"))

pandas._libs.tslibs.period.Period

In [14]:
pd.Period('2/23/2021')

Period('2021-02-23', 'D')
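Because a Period covers a whole span, it has a start and an end. A small sketch using `start_time`, `end_time`, and `Timestamp.to_period()` to check which period a timestamp belongs to:

```python
import pandas as pd

p = pd.Period('2/2019')   # monthly period: February 2019

print(p.start_time)  # 2019-02-01 00:00:00
print(p.end_time)    # 2019-02-28 23:59:59.999999999

# A timestamp falls inside the period if it lies between the two bounds
ts = pd.Timestamp('2019-02-15 12:00')
print(p.start_time <= ts <= p.end_time)  # True

# Equivalently, convert the timestamp to a monthly period and compare
print(ts.to_period('M') == p)            # True
```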

**A Period object represents the entire time span** that you specify. **Arithmetic on Period objects is easy and intuitive**: for instance, to find the month that is 5 months after September 2022, we simply add 5.

In [15]:
pd.Period('9/2022') + 5

Period('2023-02', 'M')

Similarly, to find the date two days before March 5th, 2022, we simply subtract 2.

In [16]:
pd.Period('3/5/2022') - 2

Period('2022-03-03', 'D')

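The key here is that **each Period object carries its own frequency, which sets the unit for arithmetic**: adding 5 to a monthly period moves 5 months, while subtracting 2 from a daily period moves 2 days.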
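A short sketch of that frequency at work: the same `+ 1` moves a monthly period by a month and a daily period by a day, and `asfreq()` re-expresses a period at a different granularity.

```python
import pandas as pd

monthly = pd.Period('2022-09', freq='M')
daily = pd.Period('2022-09-30', freq='D')

# Adding 1 moves by one unit of each period's own frequency
print(monthly + 1)  # 2022-10
print(daily + 1)    # 2022-10-01

# asfreq converts the monthly period to its first or last day
print(monthly.asfreq('D', how='start'))  # 2022-09-01
print(monthly.asfreq('D', how='end'))    # 2022-09-30
```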

# 3. DatetimeIndex :

**In a DatetimeIndex**:

* the **index label values** are **Timestamp objects**.
* **each Timestamp is an index label** and **has a value associated with it**.

In [17]:
s1 = pd.Series(data= list('abc'), index= [pd.Timestamp('2022-09-29'),
                                          pd.Timestamp('2022-09-30'),
                                          pd.Timestamp('2022-10-30')])
s1

2022-09-29    a
2022-09-30    b
2022-10-30    c
dtype: object

In [18]:
type(s1.index)

pandas.core.indexes.datetimes.DatetimeIndex
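To confirm that each label really is a Timestamp, and to look up a value by one, here is a small sketch that builds the same Series with `pd.to_datetime` on a list of strings:

```python
import pandas as pd

s1 = pd.Series(list('abc'),
               index=pd.to_datetime(['2022-09-29', '2022-09-30', '2022-10-30']))

# Each label in a DatetimeIndex comes back as a Timestamp
print(type(s1.index[0]))  # <class 'pandas._libs.tslibs.timestamps.Timestamp'>

# Look up a value by its Timestamp label
print(s1.loc[pd.Timestamp('2022-09-30')])  # b
```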

# 4. PeriodIndex :

**In a PeriodIndex**:

* the **index label values** are **Period objects**.
* **each Period (time span) is an index label** and **has a value associated with it**.

In [19]:
s2 = pd.Series(list('def'), [pd.Period('9/2022'),
                            pd.Period('2022-10'),
                            pd.Period('11/2022')])
s2

2022-09    d
2022-10    e
2022-11    f
Freq: M, dtype: object

In [20]:
type(s2.index)

pandas.core.indexes.period.PeriodIndex
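A PeriodIndex can also be generated with `pd.period_range()` instead of listing each Period, and `to_timestamp()` converts it to a DatetimeIndex (using the start of each period by default). A sketch:

```python
import pandas as pd

# period_range builds a PeriodIndex of consecutive months
pidx = pd.period_range('2022-09', periods=3, freq='M')
s2 = pd.Series(list('def'), index=pidx)
print(type(s2.index))  # <class 'pandas.core.indexes.period.PeriodIndex'>

# Convert each period to a Timestamp marking its first day
print(s2.index.to_timestamp())
```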

# Converting to DatetimeIndex :

Using the pandas function **pd.to_datetime()**, we can **convert the index labels of a DataFrame or Series to a DatetimeIndex**, which **parses date strings written in different formats into one standard format**.

In [21]:
dates = ['2 March 2017', 'Aug 20, 2019', '2022-9-29', '12/2/21']

series = pd.Series(np.random.randint(0, 100, 4), index= dates)
series

2 March 2017    94
Aug 20, 2019    84
2022-9-29       64
12/2/21         11
dtype: int32

In [22]:
type(series.index)

pandas.core.indexes.base.Index

In [23]:
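# pandas >= 2.0 infers one format from the first entry; for mixed formats like these, pass format='mixed'
series.index = pd.to_datetime(series.index)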
series

2017-03-02    94
2019-08-20    84
2022-09-29    64
2021-12-02    11
dtype: int32

In [24]:
type(series.index)

pandas.core.indexes.datetimes.DatetimeIndex

In [25]:
dates = ['2 March 2017', 'Aug 20, 2019', '2022-9-29', '12/2/21']

df = pd.DataFrame(np.random.randint(0, 100, (4, 2)),
                 index= dates, columns= ['v1', 'v2'])

df

| | v1 | v2 |
|---|---|---|
| 2 March 2017 | 40 | 59 |
| Aug 20, 2019 | 26 | 85 |
| 2022-9-29 | 28 | 37 |
| 12/2/21 | 39 | 14 |


In [26]:
type(df.index)

pandas.core.indexes.base.Index

In [27]:
# on pandas >= 2.0, add format='mixed' when the strings use different formats
df.index = pd.to_datetime(df.index)
df

| | v1 | v2 |
|---|---|---|
| 2017-03-02 | 40 | 59 |
| 2019-08-20 | 26 | 85 |
| 2022-09-29 | 28 | 37 |
| 2021-12-02 | 39 | 14 |


In [28]:
type(df.index)

pandas.core.indexes.datetimes.DatetimeIndex
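Real data often contains entries that cannot be parsed. By default `pd.to_datetime()` raises an error on them; a sketch of the `errors='coerce'` option, which turns such entries into `NaT` (not a time) instead:

```python
import pandas as pd

raw = ['2022-09-29', 'not a date', '2021-12-02']

# errors='coerce' replaces unparseable entries with NaT instead of raising
idx = pd.to_datetime(raw, errors='coerce')
print(idx)
print(idx.isna().tolist())  # [False, True, False]
```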

# Converting to Timestamp :

We can also **pass a string containing a date and/or time** to **pd.to_datetime()** **to convert it to a Timestamp object**.

In [29]:
pd.to_datetime('9.29.22')

Timestamp('2022-09-29 00:00:00')

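**pd.to_datetime()** also has options **to change the date parsing order**. For example, passing **dayfirst=True parses dates in the European (day-first) order**, so '9.10.22' becomes October 9th. Note that dayfirst is a preference, not a strict rule: when a day-first reading is impossible, as with '9.29.22' (there is no 29th month), pandas falls back to month-first parsing.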

In [30]:
pd.to_datetime('9.10.22', dayfirst= True)

Timestamp('2022-10-09 00:00:00')

In [31]:
pd.to_datetime('9.29.22', dayfirst= True)

Timestamp('2022-09-29 00:00:00')

In [32]:
type(pd.to_datetime('9.29.22', dayfirst= True))

pandas._libs.tslibs.timestamps.Timestamp
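When the day/month order must not be guessed, an explicit `format` string (standard `strftime` codes) is stricter than `dayfirst`. A sketch: the first call parses day-first, and the second raises because '29' cannot be a month.

```python
import pandas as pd

# %d.%m.%y means day.month.two-digit-year
print(pd.to_datetime('9.10.22', format='%d.%m.%y'))  # 2022-10-09 00:00:00

# Unlike dayfirst=True, a string that does not fit the format raises an error
try:
    pd.to_datetime('9.29.22', format='%d.%m.%y')
except ValueError as err:
    print('ValueError:', err)
```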

# Timedelta :

**A Timedelta represents a duration, i.e. a difference between two times**. It is not the same as a Period, which is a span anchored at a specific date, but **the two are conceptually similar**.

**Subtracting one date and/or time from another** gives us **a Timedelta object**.

In [33]:
diff = pd.Timestamp('9/29/2022') - pd.Timestamp('9/4/2022')
print(diff)
print(type(diff))

25 days 00:00:00
<class 'pandas._libs.tslibs.timedeltas.Timedelta'>
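A Timedelta can be broken down or converted into plain numbers. A small sketch with the same 25-day difference, using `days`, `total_seconds()`, and division by another Timedelta:

```python
import pandas as pd

diff = pd.Timestamp('9/29/2022') - pd.Timestamp('9/4/2022')

print(diff.days)                    # 25
print(diff.total_seconds())         # 2160000.0
# Dividing by another Timedelta gives a plain float ratio
print(diff / pd.Timedelta(days=1))  # 25.0
```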


Using **pd.Timedelta()**, we can also find **the date and time 12 days and 3 hours after September 29th, 2022 at 8:10 AM**.

In [34]:
pd.Timestamp('9.29.22 8:10AM') + pd.Timedelta('12D 3H')

Timestamp('2022-10-11 11:10:00')
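The same duration can be built from keyword arguments instead of a string, and it can be subtracted as well as added. A sketch:

```python
import pandas as pd

# Keyword arguments instead of a duration string
delta = pd.Timedelta(days=12, hours=3)
print(delta)  # 12 days 03:00:00

start = pd.Timestamp('9.29.22 8:10AM')
print(start + delta)  # 2022-10-11 11:10:00
print(start - delta)  # 2022-09-17 05:10:00
```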

# Offset :

An __offset__ (a DateOffset object):

* is **similar to a Timedelta**,
* but follows **calendar rules** (such as month ends or business days) instead of adding a fixed duration,
* allows flexibility in the **kinds of time intervals** we can use, and
* **comes in many classes, such as Hour, Day, Week, MonthBegin, MonthEnd, SemiMonthBegin, SemiMonthEnd, and BusinessDay**.
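The difference between a fixed duration and a calendar rule is easiest to see at the end of a month. A sketch starting from January 31st, 2021: 30 days overshoots February, while `MonthEnd` and `DateOffset(months=1)` respect the shorter month.

```python
import pandas as pd

jan31 = pd.Timestamp('2021-01-31')

# A fixed 30-day Timedelta ignores month lengths...
print(jan31 + pd.Timedelta(days=30))    # 2021-03-02 00:00:00
# ...while calendar-aware offsets follow month boundaries
print(jan31 + pd.offsets.MonthEnd())    # 2021-02-28 00:00:00
print(jan31 + pd.DateOffset(months=1))  # 2021-02-28 00:00:00
```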

In [35]:
pd.Timestamp('1/4/2021')

Timestamp('2021-01-04 00:00:00')

In [36]:
pd.Timestamp('1/4/2021').isoweekday() # Monday

1

We can now add **one week** to the timestamp using **pd.offsets.Week()**.

In [37]:
pd.Timestamp('1.4.21') + pd.offsets.Week()

Timestamp('2021-01-11 00:00:00')

In [38]:
(pd.Timestamp('1.4.21') + pd.Timedelta('7D')) == (pd.Timestamp('1.4.21') + pd.offsets.Week())

True

In [39]:
pd.Timestamp('1.4.21') + pd.offsets.SemiMonthBegin()

Timestamp('2021-01-15 00:00:00')

In [40]:
pd.Timestamp('1.4.21') + pd.offsets.SemiMonthEnd()

Timestamp('2021-01-15 00:00:00')

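Unlike isoweekday(), the **weekday()** method counts from 0 (Monday) to 6 (Sunday). Here **MonthEnd** moves Friday, November 29th, 2019 to the last day of that month, Saturday the 30th.

In [8]: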
pd.Timestamp('11/29/2019').weekday()

4

In [9]:
(pd.Timestamp('11/29/2019') + pd.offsets.MonthEnd()).weekday()

5

In [41]:
print(type(pd.offsets.BusinessDay()))

<class 'pandas._libs.tslibs.offsets.BusinessDay'>


In [42]:
pd.Timestamp(2022, 9, 30).isoweekday()

5

Adding **one business day** to a Friday skips the weekend and lands on the following Monday.

In [43]:
pd.Timestamp(2022, 9, 30) + pd.offsets.BusinessDay()

Timestamp('2022-10-03 00:00:00')
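Offsets also accept a multiple `n`, and they can be subtracted. A sketch; note that `BusinessDay` only skips Saturdays and Sundays, not public holidays (pandas has a separate `CustomBusinessDay` for holiday calendars).

```python
import pandas as pd

friday = pd.Timestamp(2022, 9, 30)

# Three business days after Friday is the following Wednesday
print(friday + pd.offsets.BusinessDay(3))                    # 2022-10-05 00:00:00
# One business day before Monday is the previous Friday
print(pd.Timestamp(2022, 10, 3) - pd.offsets.BusinessDay())  # 2022-09-30 00:00:00
```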

In [44]:
pd.Timestamp(2022, 9, 30) + pd.offsets.MonthBegin()

Timestamp('2022-10-01 00:00:00')

In [45]:
pd.Timestamp(2022, 9, 30) + pd.offsets.MonthEnd()

Timestamp('2022-10-31 00:00:00')

In [46]:
pd.Timestamp(2022, 9, 30) + pd.offsets.SemiMonthEnd()

Timestamp('2022-10-15 00:00:00')

In [47]:
pd.Timestamp(2022, 9, 30) + pd.offsets.SemiMonthBegin()

Timestamp('2022-10-01 00:00:00')
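Notice that September 30th is already a month end, yet adding `MonthEnd()` jumped to October 31st: adding an offset always moves to the *next* matching date. When we only want to snap a date onto the offset, the offset's `rollforward()` and `rollback()` methods leave dates that are already on it unchanged. A sketch:

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
ts = pd.Timestamp(2022, 9, 30)   # already the last day of September

# Adding the offset moves to the next month end
print(ts + month_end)                                    # 2022-10-31 00:00:00

# rollforward keeps a date that is already on the offset
print(month_end.is_on_offset(ts))                        # True
print(month_end.rollforward(ts))                         # 2022-09-30 00:00:00

# Dates in the middle of the month snap forward or back
print(month_end.rollforward(pd.Timestamp(2022, 9, 15)))  # 2022-09-30 00:00:00
print(pd.offsets.MonthBegin().rollback(pd.Timestamp(2022, 9, 15)))  # 2022-09-01 00:00:00
```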

# Working with Dates in a DataFrame :

In [48]:
pd.Timestamp('9-30-2022').isoweekday()

5

Suppose we want to look at nine measurements, taken **bi-weekly (every two weeks) on Sunday**, starting in October 2022.


**pd.date_range()** is a **built-in pandas function** that **creates a DatetimeIndex**.

In **date_range**:
* we specify **the start or the end date**. A date passed as the first positional argument is the start date; to count backwards from a final date, pass it as **end=** instead.
* we specify **the number of periods and a frequency** (freq defaults to daily, 'D'). Here, we set __freq__ to __"2W-SUN"__, which means every other Sunday. Since 9/30/2022 is a Friday, the range begins on the next Sunday, October 2nd.

In [49]:
dates = pd.date_range('9-30-2022', periods= 9, freq= '2W-SUN')
dates

DatetimeIndex(['2022-10-02', '2022-10-16', '2022-10-30', '2022-11-13',
               '2022-11-27', '2022-12-11', '2022-12-25', '2023-01-08',
               '2023-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')
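A sketch of the other ways to pin down a range, relying on the default daily frequency: `end=` with `periods=` counts backwards from the end date, and `start=` with `end=` includes both endpoints.

```python
import pandas as pd

# end + periods: the last three days of October 2022
last_three = pd.date_range(end='2022-10-31', periods=3)
print(last_three)

# start + end: every day between the two dates, inclusive
print(len(pd.date_range(start='2022-10-01', end='2022-10-05')))  # 5
```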

There are many other frequencies we can use. For example, for **business days**, we set __freq__ to __'B'__.

In [50]:
pd.date_range('9.30.22', periods= 9, freq= 'B')

DatetimeIndex(['2022-09-30', '2022-10-03', '2022-10-04', '2022-10-05',
               '2022-10-06', '2022-10-07', '2022-10-10', '2022-10-11',
               '2022-10-12'],
              dtype='datetime64[ns]', freq='B')

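We can also generate quarterly dates, with **quarters anchored on September** (so quarters start in September, December, March, and June). To do this, we set __freq__ to __'QS-SEP'__. Since 9/30/2022 is past September 1st, the range begins on December 1st.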

In [51]:
pd.date_range('9.30.22', periods= 9, freq= 'QS-SEP')

DatetimeIndex(['2022-12-01', '2023-03-01', '2023-06-01', '2023-09-01',
               '2023-12-01', '2024-03-01', '2024-06-01', '2024-09-01',
               '2024-12-01'],
              dtype='datetime64[ns]', freq='QS-SEP')

For **quarters starting in October** (and therefore also January, April, and July), we set **freq** to **'QS-OCT'**.

In [52]:
pd.date_range('9.30.22', periods= 9, freq= 'QS-OCT')

DatetimeIndex(['2022-10-01', '2023-01-01', '2023-04-01', '2023-07-01',
               '2023-10-01', '2024-01-01', '2024-04-01', '2024-07-01',
               '2024-10-01'],
              dtype='datetime64[ns]', freq='QS-OCT')

Let's go back to our **bi-weekly on Sunday** example, **create a DataFrame** from these dates and **some random data, and see what we can do with it.**

In [53]:
dates = pd.date_range('9/30/2022', periods= 9, freq= '2W-SUN')
df = pd.DataFrame({'v1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                  'v2': 120 + np.random.randint(-5, 10, 9)}, index= dates)
df

| | v1 | v2 |
|---|---|---|
| 2022-10-02 | 100 | 117 |
| 2022-10-16 | 108 | 126 |
| 2022-10-30 | 105 | 124 |
| 2022-11-13 | 100 | 127 |
| 2022-11-27 | 108 | 122 |
| 2022-12-11 | 105 | 123 |
| 2022-12-25 | 106 | 116 |
| 2023-01-08 | 105 | 115 |
| 2023-01-22 | 111 | 127 |


We can check which day of the week each date falls on. Here, all the dates in our index are Sundays, which matches the frequency that we set.

In [54]:
# day_name() returns the weekday name of every date in the index
print(list(df.index.day_name()))

['Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday']


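We can also use **.diff() to find the difference between each row's values and those of the previous row**. The first row has no previous row, so its result is NaN.

For example, for 2022-10-16:
* v1: 108 - 100 = 8
* v2: 126 - 117 = 9

and for 2022-10-30:
* v1: 105 - 108 = -3
* v2: 124 - 126 = -2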

In [55]:
df.diff()

| | v1 | v2 |
|---|---|---|
| 2022-10-02 | NaN | NaN |
| 2022-10-16 | 8.0 | 9.0 |
| 2022-10-30 | -3.0 | -2.0 |
| 2022-11-13 | -5.0 | 3.0 |
| 2022-11-27 | 8.0 | -5.0 |
| 2022-12-11 | -3.0 | 1.0 |
| 2022-12-25 | 1.0 | -7.0 |
| 2023-01-08 | -1.0 | -1.0 |
| 2023-01-22 | 6.0 | 12.0 |
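Conceptually, `diff()` is the same as subtracting a copy of the DataFrame shifted down by one row, and its `periods` argument changes how far back it looks. A sketch using the first three rows of the table above, typed in by hand:

```python
import pandas as pd

# First three rows of the DataFrame above
df = pd.DataFrame({'v1': [100, 108, 105], 'v2': [117, 126, 124]},
                  index=pd.to_datetime(['2022-10-02', '2022-10-16', '2022-10-30']))

# diff() subtracts each row's predecessor, i.e. df - df.shift(1)
print(df.diff().equals(df - df.shift(1)))  # True

# periods=2 compares each row with the row two steps earlier
print(df.diff(periods=2))
```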


Suppose we want **to know the mean value for each month in our DataFrame**. We can do this using **.resample('M').mean()**, where 'M' groups the rows by calendar month and labels each group with the month's last day.

**Converting from a higher frequency to a lower frequency** is called **downsampling**.

For example, for **October (the 10th month)**:

* (100 + 108 + 105) / 3 = 104.333333
* (117 + 126 + 124) / 3 = 122.333333

In [56]:
df.resample('M').mean()  # pandas >= 2.2 prefers 'ME' (month end) over the deprecated 'M'

| | v1 | v2 |
|---|---|---|
| 2022-10-31 | 104.333333 | 122.333333 |
| 2022-11-30 | 104.0 | 124.5 |
| 2022-12-31 | 105.5 | 119.5 |
| 2023-01-31 | 108.0 | 121.0 |
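Other resampling rules and aggregations follow the same pattern. A sketch using 'MS', which labels each monthly bin by its first day instead of its last, plus `count()` to see how many measurements fell into each month; the values are the v1/v2 columns from the table above, typed in by hand.

```python
import pandas as pd

# The bi-weekly data from above
dates = pd.date_range('9/30/2022', periods=9, freq='2W-SUN')
df = pd.DataFrame({'v1': [100, 108, 105, 100, 108, 105, 106, 105, 111],
                   'v2': [117, 126, 124, 127, 122, 123, 116, 115, 127]}, index=dates)

# 'MS' = month start: same monthly groups, labelled by their first day
monthly = df.resample('MS').mean()
print(monthly)

# Number of measurements per month
counts = df.resample('MS').count()['v1'].tolist()
print(counts)  # [3, 2, 2, 2]
```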


# datetime indexing :

We can **use partial string indexing to select values** from a **particular year or month**.

In [57]:
df.loc['2022']

| | v1 | v2 |
|---|---|---|
| 2022-10-02 | 100 | 117 |
| 2022-10-16 | 108 | 126 |
| 2022-10-30 | 105 | 124 |
| 2022-11-13 | 100 | 127 |
| 2022-11-27 | 108 | 122 |
| 2022-12-11 | 105 | 123 |
| 2022-12-25 | 106 | 116 |


In [58]:
df.loc['2023', :]

| | v1 | v2 |
|---|---|---|
| 2023-01-08 | 105 | 115 |
| 2023-01-22 | 111 | 127 |


We can also select **a particular month**.

In [59]:
df.loc['2022-10']

| | v1 | v2 |
|---|---|---|
| 2022-10-02 | 100 | 117 |
| 2022-10-16 | 108 | 126 |
| 2022-10-30 | 105 | 124 |


# datetime slicing :

We can even **slice on a range of dates**. For example, here **we only want the values from December 2022 onwards**.

In [60]:
df.loc['2022-12':]

| | v1 | v2 |
|---|---|---|
| 2022-12-11 | 105 | 123 |
| 2022-12-25 | 106 | 116 |
| 2023-01-08 | 105 | 115 |
| 2023-01-22 | 111 | 127 |
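A slice can also have two endpoints, and unlike positional slicing, both ends of a date-label slice are inclusive. Partial strings work as endpoints too. A sketch with the v1 column from above, typed in by hand:

```python
import pandas as pd

dates = pd.date_range('9/30/2022', periods=9, freq='2W-SUN')
df = pd.DataFrame({'v1': [100, 108, 105, 100, 108, 105, 106, 105, 111]}, index=dates)

# Both endpoints are included: 2022-10-16, 2022-10-30, and 2022-11-13
mid_autumn = df.loc['2022-10-16':'2022-11-13']
print(mid_autumn)

# Partial strings as endpoints: all of October through November 2022
print(len(df.loc['2022-10':'2022-11']))  # 5
```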
