
# Date/Time Functionality

Pandas has four main time-related classes: Timestamp, DatetimeIndex, Period and PeriodIndex.

In [1]:
import pandas as pd
import numpy as np

## Timestamp

A Timestamp represents a single point in time and lets us associate values with that point. It is interchangeable with Python's datetime in most cases.
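As a minimal sketch of that interchangeability, the snippet below builds a Timestamp from a standard-library datetime and converts it back. The variable names are illustrative only and are not part of the original notebook.

```python
import datetime as dt
import pandas as pd

# A plain Python datetime (illustrative value)
py_dt = dt.datetime(2004, 9, 29, 9, 15)

# Build a Timestamp from the datetime
ts = pd.Timestamp(py_dt)

# Timestamp is a subclass of datetime.datetime, so isinstance checks pass
is_datetime = isinstance(ts, dt.datetime)

# The two objects compare equal because they describe the same instant
same_instant = ts == py_dt

# Convert back to a plain datetime when a library expects one
back_to_py = ts.to_pydatetime()

print(is_datetime, same_instant, type(back_to_py).__name__)
```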

In [2]:
pd.Timestamp('09/29/2004 09:15AM')

Timestamp('2004-09-29 09:15:00')

In [4]:
# Passing the year, month, day, hour and minute as separate arguments
pd.Timestamp(2004, 9, 29, 9, 15)

Timestamp('2004-09-29 09:15:00')

In [5]:
# Function that shows the weekday of the timestamp
# One represents Monday and seven represents Sunday
pd.Timestamp(2004, 9, 29, 9, 15).isoweekday()

3

In [6]:
# It's also possible to extract the specific year, month, day, hour, minute or second from a timestamp
pd.Timestamp(2023, 11, 2, 23, 30, 59).second

59
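The weekday helpers and attribute access shown above can be compared side by side. This is a small sketch with illustrative variable names; it only uses the same timestamp as the earlier cells.

```python
import pandas as pd

ts = pd.Timestamp(2004, 9, 29, 9, 15)

iso_day = ts.isoweekday()   # Monday=1 ... Sunday=7
zero_day = ts.weekday()     # Monday=0 ... Sunday=6
day_name = ts.day_name()    # English name of the weekday

# Individual components are plain attributes
parts = (ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second)

print(iso_day, zero_day, day_name, parts)
```

Note that `weekday()` and `isoweekday()` differ by one, which is easy to overlook when checking for a specific day.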

## Period

A Period represents a single time span, such as a specific day or month.

In [7]:
# Period that's October 2019
pd.Period('10/2019')

Period('2019-10', 'M')

In [8]:
# Period that's October 31 2021
pd.Period('10/31/2021')

Period('2021-10-31', 'D')

In [9]:
# Period objects represent the full specified time span. Arithmetic on a Period works in units
# of its frequency: here, subtracting 3 moves back three months.
pd.Period('09/2020') - 3

Period('2020-06', 'M')

In [12]:
pd.Period('04/19/2022') + 60

Period('2022-06-18', 'D')
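To make the idea of a "full time span" concrete, the sketch below inspects the boundaries of a monthly Period and repeats the arithmetic from the cells above. The variable names are illustrative.

```python
import pandas as pd

p = pd.Period('2020-09', freq='M')

start = p.start_time        # first instant covered by the period
end = p.end_time            # last instant covered by the period
n_days = p.days_in_month    # number of days in the period's month

# Integer arithmetic moves in units of the period's frequency (months here)
earlier = p - 3

print(start, end, n_days, earlier)
```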

## DatetimeIndex and PeriodIndex

A Series or DataFrame indexed by Timestamps has a DatetimeIndex, and one indexed by Periods has a PeriodIndex.

In [13]:
t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'),
                             pd.Timestamp('2016-09-03')])
t1, type(t1.index)

(2016-09-01    a
 2016-09-02    b
 2016-09-03    c
 dtype: object,
 pandas.core.indexes.datetimes.DatetimeIndex)

In [14]:
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'),
                             pd.Period('2016-11')])
t2, type(t2.index)

(2016-09    d
 2016-10    e
 2016-11    f
 Freq: M, dtype: object,
 pandas.core.indexes.period.PeriodIndex)
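The two index types can also be converted into each other. The following is a minimal sketch using `to_period` and `to_timestamp`; the index values mirror the cells above, and the variable names are illustrative.

```python
import pandas as pd

# DatetimeIndex -> PeriodIndex with daily frequency
dt_index = pd.DatetimeIndex(['2016-09-01', '2016-09-02', '2016-09-03'])
as_periods = dt_index.to_period('D')

# PeriodIndex -> DatetimeIndex (by default, the start of each period)
p_index = pd.PeriodIndex(['2016-09', '2016-10', '2016-11'], freq='M')
as_stamps = p_index.to_timestamp()

print(as_periods)
print(as_stamps)
```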

## Converting to Datetime

In [15]:
# Random dates with different date formats
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']

# Random data
ran = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1,
                   columns=list('ab'))
ran

               a   b
2 June 2013   79  49
Aug 29, 2014  26  53
2015-06-26    54  52
7/12/16       28  67


In [16]:
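# Pandas has a to_datetime function that tries to parse these strings as datetimes
# and puts them in a standard format.
# Note: pandas 2.0 and later infer a single format from the first value, so mixed
# formats like these may need format='mixed' there.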
ran.index = pd.to_datetime(ran.index)
ran

             a   b
2013-06-02  79  49
2014-08-29  26  53
2015-06-26  54  52
2016-07-12  28  67


In [17]:
# Pandas' to_datetime also has options to change the date parse order.
pd.to_datetime('4.7.12', dayfirst=True)

Timestamp('2012-07-04 00:00:00')
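When the layout of a date string is known, passing an explicit `format` removes any ambiguity about parse order, and `errors='coerce'` turns unparseable values into `NaT` instead of raising. This is a hedged sketch; the sample strings and variable names are illustrative and not taken from the original data.

```python
import pandas as pd

# The same string read as month/day/year and as day/month/year
explicit = pd.to_datetime('7/12/16', format='%m/%d/%y')
day_first = pd.to_datetime('7/12/16', format='%d/%m/%y')

# Values that cannot be parsed become NaT rather than raising an error
coerced = pd.to_datetime(['2015-06-26', 'not a date'], errors='coerce')

print(explicit, day_first)
print(coerced)
```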

## Timedelta

Timedeltas are differences in time. A Timedelta is not the same as a Period, but it is conceptually similar. For instance, if we take the difference between September 3rd and September 1st, we get a Timedelta of two days.

In [18]:
pd.Timestamp('9/3/2016')-pd.Timestamp('9/1/2016')

Timedelta('2 days 00:00:00')

In [19]:
# Finding what date and time is 12 days and 3 hours past September 29th at 9:15AM
pd.Timestamp('9/29/2023 9:15AM') + pd.Timedelta('12D 3H')

Timestamp('2023-10-11 12:15:00')
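A Timedelta can also be built from keyword arguments and broken down into its parts. The sketch below repeats the 12 days and 3 hours example; the variable names are illustrative.

```python
import pandas as pd

# 12 days and 3 hours, built from keyword arguments instead of a string
td = pd.Timedelta(days=12, hours=3)

total = td.total_seconds()   # whole duration expressed in seconds
comp = td.components         # named tuple: days, hours, minutes, ...

# Adding the Timedelta to a Timestamp shifts it by that fixed duration
later = pd.Timestamp('2023-09-29 09:15') + td

print(total, comp.days, comp.hours, later)
```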

## Offset

An offset is similar to a Timedelta, but it follows calendar rules rather than representing a fixed duration. Offsets allow flexibility in the type of time interval: besides hour, day, week and month, there are also offsets such as business day, month end and semi-month begin.

In [20]:
# An example timestamp. weekday() counts from Monday=0, so 6 means Sunday
pd.Timestamp('9/4/2016').weekday()

6

In [21]:
# Adding one week to the timestamp
pd.Timestamp('9/4/2016') + pd.offsets.Week()

Timestamp('2016-09-11 00:00:00')

In [22]:
# Adding a MonthEnd offset rolls forward to the month end, which here is the last day of September
pd.Timestamp('9/4/2016') + pd.offsets.MonthEnd()

Timestamp('2016-09-30 00:00:00')
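The difference between a fixed duration and a calendar rule is easiest to see next to each other. The sketch below is illustrative: it compares adding one day with adding one business day to a Friday, and shows what `MonthEnd` does when the date is already a month end. Note that the plain business-day offset knows about weekends only, not holidays.

```python
import pandas as pd

friday = pd.Timestamp('2016-09-02')   # a Friday

# Fixed duration: exactly 24 hours later, which is a Saturday
plus_one_day = friday + pd.Timedelta(days=1)

# Calendar rule: the next business day, which skips the weekend
plus_one_bday = friday + pd.offsets.BDay(1)

# Starting on a month end, MonthEnd moves to the next month's end
already_end = pd.Timestamp('2016-09-30') + pd.offsets.MonthEnd()

print(plus_one_day, plus_one_bday, already_end)
```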

## Working with Dates in DataFrame



In [23]:
# Suppose we want to look at nine measurements, taken biweekly, every Sunday,
# starting in October 2016. Using date_range, we can create this DatetimeIndex.
# In date_range, we specify a start or an end date; a date passed positionally
# is treated as the start date. We then give the number of periods and a
# frequency. Here the frequency is "2W-SUN", which means every two weeks on Sunday.

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')
dates

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')

In [24]:
# Frequency in business days
pd.date_range('10-01-2016', periods=9, freq='B')

DatetimeIndex(['2016-10-03', '2016-10-04', '2016-10-05', '2016-10-06',
               '2016-10-07', '2016-10-10', '2016-10-11', '2016-10-12',
               '2016-10-13'],
              dtype='datetime64[ns]', freq='B')

In [26]:
# Quarterly frequency, with quarters starting in June, September, December and March
pd.date_range('04-01-2016', periods=12, freq='QS-JUN')

DatetimeIndex(['2016-06-01', '2016-09-01', '2016-12-01', '2017-03-01',
               '2017-06-01', '2017-09-01', '2017-12-01', '2018-03-01',
               '2018-06-01', '2018-09-01', '2018-12-01', '2019-03-01'],
              dtype='datetime64[ns]', freq='QS-JUN')
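Instead of a start date plus a number of periods, `date_range` also accepts a start and an end, or an end plus a number of periods. The sketch below shows both variants; the dates and variable names are illustrative.

```python
import pandas as pd

# Every Sunday between two dates (both endpoints are considered)
sundays = pd.date_range(start='2016-10-01', end='2016-10-31', freq='W-SUN')

# The last three days up to and including an end date
last_days = pd.date_range(end='2016-10-31', periods=3, freq='D')

print(sundays)
print(last_days)
```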

In [27]:
# Now, let's go back to our biweekly-on-Sunday example and create a DataFrame using these dates and some random data

dates = pd.date_range('10-01-2016', periods=9, freq='2W-SUN')

df = pd.DataFrame({'Count 1': 100 + np.random.randint(-5, 10, 9).cumsum(),
                  'Count 2': 120 + np.random.randint(-5, 10, 9)}, index=dates)
df

            Count 1  Count 2
2016-10-02       97      126
2016-10-16       99      129
2016-10-30       96      123
2016-11-13      103      126
2016-11-27      112      123
2016-12-11      114      124
2016-12-25      112      128
2017-01-08      117      124
2017-01-22      119      128


In [30]:
# Checking if the frequency is correct (Frequency is biweekly on Sundays)
df.index.weekday

Int64Index([6, 6, 6, 6, 6, 6, 6, 6, 6], dtype='int64')

In [31]:
# Finding the difference between each row's values and those of the previous date
df.diff()

            Count 1  Count 2
2016-10-02      NaN      NaN
2016-10-16      2.0      3.0
2016-10-30     -3.0     -6.0
2016-11-13      7.0      3.0
2016-11-27      9.0     -3.0
2016-12-11      2.0      1.0
2016-12-25     -2.0      4.0
2017-01-08      5.0     -4.0
2017-01-22      2.0      4.0


In [32]:
# Suppose we want to know the mean count for each month in our DataFrame. We can do this using
# resample. Converting from a higher frequency to a lower frequency is called downsampling.
# Note: 'M' labels each group by its month end; pandas 2.2 and later prefer the alias 'ME'.
df.resample('M').mean()

               Count 1  Count 2
2016-10-31   97.333333    126.0
2016-11-30  107.5         124.5
2016-12-31  113.0         126.0
2017-01-31  118.0         126.0
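Downsampling can be checked by hand on deterministic data. The sketch below uses fixed values instead of random ones, and the month-start alias `'MS'` rather than the month-end label used in the cell above; that choice is an assumption made here to keep the example stable across pandas versions. It also shows an equivalent grouping by monthly Period. Variable names are illustrative.

```python
import pandas as pd

dates = pd.date_range('2016-10-01', periods=9, freq='2W-SUN')
counts = pd.Series(range(10, 100, 10), index=dates)   # 10, 20, ..., 90

# Downsample to one value per month, labelled by the first day of the month
monthly_mean = counts.resample('MS').mean()

# The same grouping expressed with monthly Periods
by_period = counts.groupby(counts.index.to_period('M')).mean()

print(monthly_mean)
print(by_period)
```

October contains three of the biweekly dates and every other month two, so the October mean averages three values.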


In [35]:
# With a DatetimeIndex, we can use partial string indexing to find values from a particular year
df.loc['2017']

            Count 1  Count 2
2017-01-08      117      124
2017-01-22      119      128


In [37]:
# Or we can do it from a particular month
df.loc['2016-12']

            Count 1  Count 2
2016-12-11      114      124
2016-12-25      112      128


In [38]:
# Or we can even slice on a range of dates. For example, here we only want the values from
# December 2016 onwards.
df['2016-12':]

            Count 1  Count 2
2016-12-11      114      124
2016-12-25      112      128
2017-01-08      117      124
2017-01-22      119      128
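Date slices can also be bounded on both sides, and with `.loc` both endpoints are included. The following sketch uses deterministic values and illustrative variable names to show a month-to-month slice and a slice between two exact dates.

```python
import pandas as pd

dates = pd.date_range('2016-10-01', periods=9, freq='2W-SUN')
counts = pd.Series(range(9), index=dates)

# Partial strings on both sides: everything in November and December 2016
nov_dec = counts.loc['2016-11':'2016-12']

# Exact dates on both sides: both endpoints are part of the result
exact = counts.loc['2016-10-16':'2016-11-13']

print(nov_dec)
print(exact)
```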
