## Time Methods

In [1]:
from datetime import datetime

In [2]:
# To illustrate the order of arguments
my_year = 2022
my_month = 1
my_day = 1
my_hour = 1
my_minute = 8
my_second = 0

In [3]:
my_date = datetime(my_year, my_month, my_day)

In [4]:
my_date

datetime.datetime(2022, 1, 1, 0, 0)

In [5]:
my_date_time = datetime(my_year, my_month, my_day, my_hour, my_minute, my_second)

In [6]:
my_date_time

datetime.datetime(2022, 1, 1, 1, 8)

In [7]:
my_date.day

1

In [8]:
my_date_time.hour

1
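As a small sketch of the argument order shown above, the same objects can be built with keyword arguments, which makes each position explicit. The variable names here are illustrative only; any time fields left out default to zero.

```python
from datetime import datetime

# Keyword arguments spell out the positional order: year, month, day, hour, minute, second.
date_only = datetime(year=2022, month=1, day=1)
date_and_time = datetime(year=2022, month=1, day=1, hour=1, minute=8, second=0)

# Omitted time fields default to 0.
print(date_only.hour, date_only.minute)  # 0 0
print(date_and_time.isoformat())         # 2022-01-01T01:08:00
```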

## Pandas

## Converting to datetime

When data sets are stored, the time component is often saved as a string. Pandas can convert such strings to datetime objects.

In [9]:
import pandas as pd

In [10]:
myser = pd.Series(['jan 1, 2022', '2022-01-01', None])

In [11]:
myser

0    jan 1, 2022
1     2022-01-01
2           None
dtype: object

In [12]:
myser[0]

'jan 1, 2022'

## pd.to_datetime()

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#converting-to-timestamps

In [13]:
pd.to_datetime(myser)

0   2022-01-01
1   2022-01-01
2          NaT
dtype: datetime64[ns]

In [14]:
pd.to_datetime(myser)[0]

Timestamp('2022-01-01 00:00:00')

In [15]:
# Day 31 cannot be a month, so this string is unambiguously day-first
obvi_euro_date = '31-12-2022'

In [16]:
# 10th of Dec OR 12th of October?
# We may need to tell pandas
euro_date = '10-12-2022'

In [17]:
pd.to_datetime(euro_date) 

Timestamp('2022-10-12 00:00:00')

In [18]:
pd.to_datetime(euro_date,dayfirst=True) 

Timestamp('2022-12-10 00:00:00')
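The same dayfirst argument can be applied to a whole Series. The sketch below uses made-up values (the two strings from the cells above) and assumes every row follows the day-first convention.

```python
import pandas as pd

# Illustrative values: both strings are written day-month-year.
euro_dates = pd.Series(['10-12-2022', '31-12-2022'])

# dayfirst=True tells pandas to read the first number as the day for every row.
parsed = pd.to_datetime(euro_dates, dayfirst=True)

print(parsed.dt.day.tolist())    # [10, 31]
print(parsed.dt.month.tolist())  # [12, 12]
```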

## Custom Time String Formatting

Sometimes dates come in a non-standard format. Luckily, you can always tell pandas the format explicitly. Specifying the format can also speed up the conversion, so it may be worth doing even when pandas can parse the strings on its own.

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

In [19]:
style_date = '12--Dec--2022'

In [20]:
pd.to_datetime(style_date, format='%d--%b--%Y')

Timestamp('2022-12-12 00:00:00')

In [21]:
strange_date = '12th of Dec 2022'

In [22]:
pd.to_datetime(strange_date)

Timestamp('2022-12-12 00:00:00')
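To make the format codes a little more concrete, here is a minimal sketch on a small Series with made-up values. It assumes an English locale, since %b matches abbreviated month names such as "Dec". The same codes can be used in the other direction with .dt.strftime.

```python
import pandas as pd

# Illustrative values in the same style as style_date above.
raw = pd.Series(['12--Dec--2022', '25--Dec--2022'])

# %d = day of month, %b = abbreviated month name, %Y = four-digit year
parsed = pd.to_datetime(raw, format='%d--%b--%Y')

# Format the parsed datetimes back into strings with a different layout.
print(parsed.dt.strftime('%Y/%m/%d').tolist())  # ['2022/12/12', '2022/12/25']
```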

## Data

In [23]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv')

In [24]:
sales.head(3)

         DATE  MRTSSM4453USN
0  1992-01-01           1509
1  1992-02-01           1541
2  1992-03-01           1597


In [25]:
sales.iloc[0]['DATE']

'1992-01-01'

In [26]:
type(sales.iloc[0]['DATE'])

str

In [27]:
sales['DATE'] = pd.to_datetime(sales['DATE'])

In [28]:
sales.head(3)

        DATE  MRTSSM4453USN
0 1992-01-01           1509
1 1992-02-01           1541
2 1992-03-01           1597


In [29]:
sales.iloc[0]['DATE']

Timestamp('1992-01-01 00:00:00')

In [30]:
type(sales.iloc[0]['DATE'])

pandas._libs.tslibs.timestamps.Timestamp

## Attempt to Parse Dates Automatically

parse_dates: bool, list of int or names, list of lists, or dict; default False. Notes from the pandas documentation:

* If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type.
* For non-standard datetime parsing, use pd.to_datetime after pd.read_csv.
* To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See "Parsing a CSV with mixed timezones" in the pandas documentation for more.

In [31]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv',parse_dates=[0])

In [32]:
sales.head(3)

        DATE  MRTSSM4453USN
0 1992-01-01           1509
1 1992-02-01           1541
2 1992-03-01           1597


In [33]:
type(sales.iloc[0]['DATE'])

pandas._libs.tslibs.timestamps.Timestamp
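parse_dates also accepts column names, and it can be combined with index_col to get a datetime index in one step. The sketch below uses a hypothetical in-memory CSV (with a made-up VALUE column) in place of the real file, so it runs without any data on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "DATE,VALUE\n1992-01-01,1509\n1992-02-01,1541\n"

# Parse the DATE column by name and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['DATE'], index_col='DATE')

print(type(df.index).__name__)  # DatetimeIndex
```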

## Resample

A common operation with time series data is resampling based on the time series index.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html

In [34]:
# Our index
sales.index

RangeIndex(start=0, stop=340, step=1)

In [35]:
sales = sales.set_index("DATE")

In [36]:
sales.head(3)

            MRTSSM4453USN
DATE                     
1992-01-01           1509
1992-02-01           1541
1992-03-01           1597


* When calling .resample(), you first pass in a rule parameter, then call an aggregation function.
* The rule parameter describes the frequency at which to apply the aggregation function (daily, monthly, yearly, etc.).
* The aggregation function is needed because resampling groups several rows into each time bin, so we need a rule for combining them into a single value (mean, sum, count, etc.).
* https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

In [37]:
# Yearly Means
sales.resample(rule='A').mean().head(3)

            MRTSSM4453USN
DATE                     
1992-12-31    1807.250000
1993-12-31    1794.833333
1994-12-31    1841.750000


Resampling rule 'A' takes all of the data points in a given year, applies the aggregation function (in this case we calculate the mean), and reports the result as the last day of that year. Note that 2020 is incomplete in this data set, so its yearly mean covers only part of the year.
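The rule-plus-aggregation pattern can be checked by hand on a tiny made-up series. This sketch uses the year-start alias 'YS' rather than 'A', on the assumption that it behaves the same across pandas versions (newer releases appear to be renaming the year-end alias from 'A' to 'YE'). With 'YS', each result is labeled with the first day of the year instead of the last.

```python
import pandas as pd

# Hypothetical monthly data: 24 month-start dates with values 1..24.
idx = pd.date_range('2020-01-01', periods=24, freq='MS')
toy = pd.DataFrame({'value': range(1, 25)}, index=idx)

# rule='YS' groups rows by calendar year and labels each group with January 1.
yearly_mean = toy.resample(rule='YS').mean()
print(yearly_mean['value'].tolist())  # [6.5, 18.5]

# Same grouping, different aggregation function.
yearly_sum = toy.resample(rule='YS').sum()
print(yearly_sum['value'].tolist())   # [78, 222]
```

The means are easy to verify: the first year holds the values 1 through 12 (mean 6.5) and the second holds 13 through 24 (mean 18.5).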

## .dt Method Calls

* Once a column holds datetime values, you can access a variety of datetime attributes and methods through the .dt accessor (a DatetimeIndex exposes the same attributes directly, without .dt)
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html

In [38]:
sales = sales.reset_index()

In [39]:
sales.head(3)

        DATE  MRTSSM4453USN
0 1992-01-01           1509
1 1992-02-01           1541
2 1992-03-01           1597


In [40]:
#help(sales['DATE'].dt)

In [41]:
sales['DATE'].dt.month.head(3)

0    1
1    2
2    3
Name: DATE, dtype: int64

In [42]:
sales['DATE'].dt.is_leap_year.head(3)

0    True
1    True
2    True
Name: DATE, dtype: bool
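A few more .dt calls, sketched on a small made-up Series so the results are easy to check. The dates are illustrative and are not taken from the sales data.

```python
import pandas as pd

# Illustrative dates spanning leap and non-leap years.
dates = pd.Series(pd.to_datetime(['1992-01-01', '1993-02-01', '1996-03-01']))

print(dates.dt.year.tolist())          # [1992, 1993, 1996]
print(dates.dt.day_name().tolist())    # ['Wednesday', 'Monday', 'Friday']
print(dates.dt.is_leap_year.tolist())  # [True, False, True]

# .dt attributes also work as boolean filters.
leap_only = dates[dates.dt.is_leap_year]
print(len(leap_only))                  # 2
```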

## Practice, Practice, and Practice