# ADVANCED PANDAS: TIME-SERIES ANALYSIS

## Today's Outline:
- Basics of Time-series Analysis
- Bonus: The Python *datetime* Module
- Pandas Time-series Functions
- Case-study

==========

# Basics of Time-series Analysis

- What & Why? (Features & Objectives)
    - Examples of Time-series Applications
- Basic Components of a Time Series
- Types & Formats

In [None]:
from IPython.display import Image
Image("data/ts-components.png")

==========

# Bonus: The Python *datetime* Module

| Type      	| Description                                                                                	|
|-----------	|--------------------------------------------------------------------------------------------	|
| date      	| Stores calendar date (year, month, day) using the Gregorian calendar                       	|
| time      	| Stores time of day as hours, minutes, seconds, and microseconds                            	|
| datetime  	| Stores both date and time                                                                  	|
| timedelta 	| Represents the difference between two datetime values (as days, seconds, and microseconds) 	|
| tzinfo    	| Base type for storing time zone information                                                	|

##### Basic Operations in the datetime Module

In [None]:
# Importing the library
import datetime as dt

In [None]:
# Getting the current date & time
dt.datetime.now()

In [None]:
# Using the 'date' class (year, month, day)
new_date = dt.date(2021,5,23)

In [None]:
new_date.day

In [None]:
# Using the 'time' class (hour, minute, second, microsecond)
new_time = dt.time(12,22,35,2345)

In [None]:
new_time.microsecond

In [None]:
# Using the 'datetime' class (date and time together)
new_datetime = dt.datetime(2021,5,23,11,30,15)

In [None]:
new_datetime.second

In [None]:
# Subtracting two datetime objects returns a 'timedelta' object
delta = dt.datetime(2011,1,7) - dt.datetime(2008,6,24,8,15)

In [None]:
delta.days

In [None]:
# The first positional argument of 'timedelta' is a number of days
dt.datetime(2011,1,7) + dt.timedelta(12)

##### Convert a String into a datetime Object

In [None]:
# Use the 'strptime()' function to convert a string into a datetime object
dt.datetime.strptime('2011-1-3', '%Y-%m-%d')

| Type  	| Description                                                                                                                                 	|
|-------	|---------------------------------------------------------------------------------------------------------------------------------------------	|
| %Y    	| Four-digit year                                                                                                                             	|
| %y    	| Two-digit year                                                                                                                              	|
| %m    	| Two-digit month [01, 12]                                                                                                                    	|
| %d    	| Two-digit day [01, 31]                                                                                                                      	|
| %H    	| Hour (24-hour clock) [00, 23]                                                                                                               	|
| %I    	| Hour (12-hour clock) [01, 12]                                                                                                               	|
| %M    	| Two-digit minute [00, 59]                                                                                                                   	|
| %S    	| Second [00, 61] (seconds 60, 61 account for leap seconds)                                                                                   	|
| %w    	| Weekday as integer [0 (Sunday), 6]                                                                                                          	|
| %U    	| Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0” 	|
| %W    	| Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0” 	|
| %z    	| UTC time zone offset as +HHMM or -HHMM; empty if time zone naive                                                                            	|
| %F    	| Shortcut for %Y-%m-%d (e.g., 2012-04-18)                                                                                                    	|
| %D    	| Shortcut for %m/%d/%y (e.g., 04/18/12)                                                                                                      	|

In [None]:
# The inverse of 'strptime()' is 'strftime()', which formats a date or datetime object as a string
type(dt.date(2015,5,22).strftime('%Y-%m-%d'))

In [None]:
# Can you convert the string 'feb 28 2021 3pm'?
dt.datetime.strptime('feb 28 2021 3pm', '%b %d %Y %I%p')
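The format codes in the table above work in both directions. A minimal sketch, using only the standard library, that parses one string and renders it back in a few other layouts (the sample timestamp is made up for illustration):

```python
import datetime as dt

# Parse a string with an explicit format (string -> datetime).
parsed = dt.datetime.strptime("2021-05-23 14:05:09", "%Y-%m-%d %H:%M:%S")

# Render the same moment with different format codes (datetime -> string).
iso_date = parsed.strftime("%Y-%m-%d")   # four-digit year, zero-padded month and day
us_style = parsed.strftime("%m/%d/%y")   # month/day/two-digit year
clock_12h = parsed.strftime("%I:%M")     # hour on a 12-hour clock, then minutes

print(iso_date, us_style, clock_12h)
```

The same format string that parsed a value can always be passed to `strftime()` to write it back out unchanged.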

In [None]:
# You can also use the 'parse()' function from the 'dateutil' package, which infers the format for you
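A short sketch of `dateutil`'s `parse()` next to the explicit `strptime()` call, assuming the `python-dateutil` package is installed (it ships as a pandas dependency). The `dayfirst` example string is made up for illustration:

```python
import datetime as dt
from dateutil.parser import parse

# parse() guesses the layout, so no format string is needed.
inferred = parse("feb 28 2021 3pm")

# The equivalent explicit call with strptime().
explicit = dt.datetime.strptime("feb 28 2021 3pm", "%b %d %Y %I%p")

# Ambiguous numeric dates default to month-first; dayfirst=True flips that.
day_first = parse("03/04/2021", dayfirst=True)

print(inferred, explicit, day_first)
```

Guessing is convenient for exploration, but an explicit format is safer when the input layout is known, because ambiguous strings are never silently misread.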

==========

## Pandas Time-series Functions
- Pandas Timestamp Functions
- Date Parsing
    - Using ***DatetimeIndex*** Object
    - Using pd.read_csv()'s ***'parse_dates'*** parameter
    - Using the ***pd.to_datetime()*** function
- Indexing & Slicing Time-series
    - Custom Indexing using ***pd.date_range()***
    - Shifting Dates with ***pd.DateOffset()***
- Resampling Time Series Data
    - Resampling with ***resample()***
    - Changing the frequency Using ***asfreq()***

Documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

### Importing Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

##### Pandas Timestamp Functions

In [None]:
# Creating a Timestamp object using the 'datetime' object
pd.Timestamp(dt.datetime(2012, 5, 1))

In [None]:
pd.Timestamp("2012-05-01")

In [None]:
pd.Timestamp(2012, 5, 1)

In [None]:
# A Timestamp object can be created from many common date string formats
ts = pd.Timestamp('3rd of April 2021')

In [None]:
type(ts)

In [None]:
ts.day_name()

In [None]:
ts.days_in_month

In [None]:
ts.quarter

### Date Parsing

##### Indexing a Dataset with a DatetimeIndex Object

Timestamp objects can serve as an index. A list of Timestamp objects is automatically coerced to a DatetimeIndex object.

In [None]:
dates = [pd.Timestamp("2012-05-01"), pd.Timestamp("2012-05-02"), pd.Timestamp("2012-05-03")]
dates

In [None]:
ts = pd.Series(np.random.randn(3), dates)
ts

In [None]:
ts.index

In [None]:
type(ts.index)
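A self-contained sketch of the coercion described above, with fixed values instead of random ones so the result is reproducible (the variable names are illustrative):

```python
import pandas as pd

dates = [pd.Timestamp("2012-05-01"), pd.Timestamp("2012-05-02"), pd.Timestamp("2012-05-03")]

# Passing a plain list of Timestamps as the index yields a DatetimeIndex.
readings = pd.Series([10.0, 12.5, 11.0], index=dates)
index_is_datetime = isinstance(readings.index, pd.DatetimeIndex)

# The same index can also be built explicitly.
explicit_index = pd.DatetimeIndex(dates)

# Look up one value by its Timestamp label.
may_second = readings.loc[pd.Timestamp("2012-05-02")]

print(index_is_datetime, may_second)
```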

Let's Check Out a Real-life Dataset

In [None]:
# Let's load a real-life dataset
temp = pd.read_csv('data/temps.csv')

In [None]:
temp.head()

In [None]:
temp.info()

In [None]:
# Converting the column to the 'datetime64[ns]' dtype (newer pandas versions reject the unit-less np.datetime64)
temp['datetime'] = temp['datetime'].astype('datetime64[ns]')

In [None]:
temp.info()

In [None]:
# Setting the index with the date column is considered to be one of the best-practices when dealing with time-series data
temp.set_index('datetime', inplace=True)

In [None]:
temp.head()

In [None]:
temp.index

In [None]:
temp.index.day_name()

In [None]:
# Creating a new column to represent the respective name of the day
temp['Day_Name'] = temp.index.day_name()

In [None]:
temp.head()

In [None]:
temp.groupby('Day_Name').mean().round(2).idxmax()

In [None]:
temp.drop(columns='Day_Name', inplace=True)

In [None]:
temp.head()

##### Parsing dates while importing data with pd.read_csv()

In [None]:
# Use the 'parse_dates' parameter to parse date strings on import; parse_dates=True parses the index column
temp = pd.read_csv('data/temps.csv', parse_dates=True, index_col=['datetime'])

In [None]:
temp.head()

In [None]:
temp.info()

In [None]:
temp.index[0]

In [None]:
type(temp.index[0])
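The effect of `parse_dates` can be checked without the course data file. The sketch below reads a tiny in-memory CSV; its column names and values are hypothetical and are not taken from 'temps.csv':

```python
import io
import pandas as pd

# Hypothetical sample; the real 'temps.csv' may use different columns.
csv_text = (
    "datetime,city_a,city_b\n"
    "2013-01-01 00:00:00,11.7,-1.1\n"
    "2013-01-01 01:00:00,10.7,-1.7\n"
)

# Without parse_dates, the date column is read as plain text.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates and index_col, the column becomes a DatetimeIndex.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"], index_col="datetime")

print(as_text["datetime"].dtype)
print(type(as_dates.index))
```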

##### Parsing dates with pd.to_datetime()

In [None]:
# Let's re-import 'temps.csv' and use pd.to_datetime() this time
temp = pd.read_csv('data/temps.csv')

In [None]:
temp.head()

In [None]:
temp.info()

In [None]:
# Parsing date with 'pd.to_datetime()' function
pd.to_datetime(temp['datetime'])

In [None]:
temp = temp.set_index(pd.to_datetime(temp['datetime'])).drop('datetime', axis=1)

In [None]:
temp.head()

More about the pd.to_datetime() function

In [None]:
pd.to_datetime("2015-05-20")

In [None]:
pd.to_datetime("2015-05-20 10:30:20")

In [None]:
pd.to_datetime("20150520")

In [None]:
pd.to_datetime("2015/05/20")

In [None]:
pd.to_datetime("2015 05 20")

In [None]:
#pd.to_datetime("2015-20-05")

In [None]:
pd.to_datetime("2015 May 20")

In [None]:
pd.to_datetime("May 2015 20")

In [None]:
pd.to_datetime("2015 20th may")

In [None]:
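# pandas 2.0+ infers a single format from the first element, so mixed formats need format="mixed"
pd.to_datetime(["2015-05-20", "Feb 20 2015"], format="mixed")

In [None]:
# errors="coerce" turns unparseable values into NaT instead of raising (format="mixed" requires pandas 2.0+)
pd.to_datetime(["2015-05-20", "Feb 20 2015", "Elephant"], errors="coerce", format="mixed")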
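When the layout of the input is known, passing it explicitly avoids any guessing. A minimal sketch with made-up input values, combining an explicit `format` with `errors="coerce"`:

```python
import pandas as pd

raw = ["2015-05-20", "2015-05-21", "not a date"]

# Values that do not match the format become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
n_bad = int(parsed.isna().sum())

# An explicit format also removes the day/month ambiguity.
day_first = pd.to_datetime("20/05/2015", format="%d/%m/%Y")

print(parsed)
print(n_bad, day_first)
```

Counting the `NaT` values after coercion is a quick way to see how many rows failed to parse before deciding whether to drop or repair them.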

### Date Indexing & Slicing

##### Indexing & Slicing Time-series

In [None]:
temp = pd.read_csv('data/temps.csv', parse_dates=['datetime'], index_col=['datetime'])

In [None]:
temp.head()

In [None]:
temp.info()

In [None]:
temp.loc["2013-01-01 01:00:00"]

In [None]:
temp.loc["2015"]

In [None]:
temp.loc["2015-05"]

In [None]:
temp.loc["2015-05-20"]

In [None]:
temp.loc["2015-01-01" : "2015-12-31"]

In [None]:
temp.loc["2015/05/20":]

In [None]:
temp.loc[:"2015-05-20"]

In [None]:
temp.loc["20FEBRUARY2015"]

In [None]:
temp.iloc[0:2]
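The selections above rely on partial string indexing. The sketch below reproduces the idea on a synthetic daily series (not the course dataset) so that the row counts are easy to verify:

```python
import numpy as np
import pandas as pd

# One value per day for the whole of 2015.
idx = pd.date_range("2015-01-01", "2015-12-31", freq="D")
daily = pd.Series(np.arange(len(idx)), index=idx)

# A partial date string selects every row inside that period.
may = daily.loc["2015-05"]

# Label-based slices on dates include both endpoints.
window = daily.loc["2015-05-20":"2015-05-22"]

# Position-based slices with iloc exclude the end position, as usual.
first_two = daily.iloc[0:2]

print(len(may), len(window), len(first_two))
```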

##### Custom Indexing using *pd.date_range()*

In [None]:
start = dt.datetime(2011, 1, 1)
end = dt.datetime(2012, 1, 1)
index = pd.date_range(start, end)
index

In [None]:
rng = pd.date_range(start, end)
rng

In [None]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

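More about the *pd.date_range()* function: its *freq* argument accepts the offset aliases below. Note that recent pandas releases (2.2 and later) deprecate several of the older spellings, for example *H* (now *h*), *T* (now *min*), *S* (now *s*), *M* (now *ME*), *Q* (now *QE*), *A* (now *YE*) and *AS* (now *YS*).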

| Alias                    	| Offset type          	| Description                                                                                                                                                     	|
|--------------------------	|----------------------	|-----------------------------------------------------------------------------------------------------------------------------------------------------------------	|
| D                        	| Day                  	| Calendar daily                                                                                                                                                  	|
| B                        	| BusinessDay          	| Business daily                                                                                                                                                  	|
| H                        	| Hour                 	| Hourly                                                                                                                                                          	|
| T or min                 	| Minute               	| Minutely                                                                                                                                                        	|
| S                        	| Second               	| Secondly                                                                                                                                                        	|
| L or ms                  	| Milli                	| Millisecond (1/1,000 of 1 second)                                                                                                                               	|
| U                        	| Micro                	| Microsecond (1/1,000,000 of 1 second)                                                                                                                           	|
| M                        	| MonthEnd             	| Last calendar day of month                                                                                                                                      	|
| BM                       	| BusinessMonthEnd     	| Last business day (weekday) of month                                                                                                                            	|
| MS                       	| MonthBegin           	| First calendar day of month                                                                                                                                     	|
| BMS                      	| BusinessMonthBegin   	| First weekday of month                                                                                                                                          	|
| W-MON, W-TUE, ...        	| Week                 	| Weekly on given day of week (MON, TUE, WED, THU, FRI, SAT, or SUN)                                                                                              	|
| WOM-1MON, WOM-2MON, ...  	| WeekOfMonth          	| Generate weekly dates in the first, second, third, or fourth week of the month (e.g., WOM-3FRI for the third Friday of each month)                              	|
| Q-JAN, Q-FEB, ...        	| QuarterEnd           	| Quarterly dates anchored on last calendar day of each month, for year ending in indicated month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC) 	|
| BQ-JAN, BQ-FEB, ...      	| BusinessQuarterEnd   	| Quarterly dates anchored on last weekday of each month, for year ending in indicated month                                                                      	|
| QS-JAN, QS-FEB, ...      	| QuarterBegin         	| Quarterly dates anchored on first calendar day of each month, for year ending in indicated month                                                                	|
| BQS-JAN, BQS-FEB, ...    	| BusinessQuarterBegin 	| Quarterly dates anchored on first weekday of each month, for year ending in indicated month                                                                     	|
| A-JAN, A-FEB, ...        	| YearEnd              	| Annual dates anchored on last calendar day of given month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC)                                       	|
| BA-JAN, BA-FEB, ...      	| BusinessYearEnd      	| Annual dates anchored on last weekday of given month                                                                                                            	|
| AS-JAN, AS-FEB, ...      	| YearBegin            	| Annual dates anchored on first day of given month                                                                                                               	|
| BAS-JAN, BAS-FEB, ...    	| BusinessYearBegin    	| Annual dates anchored on first weekday of given month                                                                                                           	|

In [None]:
pd.date_range(start = "2015-07-01", end = "2015-07-31", freq= "D")

In [None]:
pd.date_range(start = "2015-07-01", periods = 31, freq = "D")

In [None]:
pd.date_range(end = "2015-07-31", periods = 31, freq = "D")

In [None]:
pd.date_range(start = "2015-07-01", end = "2015-07-31", freq = "B")

In [None]:
pd.date_range(start = "2015-07-31", periods = 10, freq = "H")

In [None]:
pd.date_range(start = "2015-07-01", periods = 6,  freq = "W")

In [None]:
pd.date_range(start = "2015-07-01", periods = 6,  freq = "W-Wed")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "M")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "MS")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = pd.DateOffset(months = 1))

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "Q")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "QS")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "QS-May")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "A")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "AS")

In [None]:
pd.date_range(start = "2015-07-14", periods = 6,  freq = "AS-Jul")

In [None]:
pd.date_range(end = "2018-11-24", periods = 10,  freq = pd.DateOffset(years = 1))

In [None]:
pd.date_range(start = "2015-07-01", periods = 10, freq = "3D8H")
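A compact recap of how the frequency alias shapes the result of `pd.date_range()`. The sketch uses only aliases that are spelled the same way in older and newer pandas releases:

```python
import pandas as pd

# Business days skip weekends (holidays are not taken into account).
business_days = pd.date_range(start="2015-07-01", end="2015-07-31", freq="B")

# Weekly dates anchored on Wednesdays.
wednesdays = pd.date_range(start="2015-07-01", periods=6, freq="W-WED")

# A start date that is not on the anchor rolls forward to the next month start.
month_starts = pd.date_range(start="2015-07-14", periods=3, freq="MS")

print(len(business_days))
print(wednesdays)
print(month_starts)
```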

##### Shifting Dates with pd.DateOffset()

In [None]:
do = pd.Timestamp('2016-07-22')

In [None]:
# do - 10  # raises a TypeError in current pandas: a plain integer cannot be subtracted from a Timestamp

In [None]:
do - pd.DateOffset(days = 10)

In [None]:
temp.set_index(temp.index + pd.DateOffset(minutes=30))
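`pd.DateOffset` follows the calendar, while `pd.Timedelta` is a fixed duration. A small sketch of the difference, using a month-end date chosen for illustration:

```python
import pandas as pd

day = pd.Timestamp("2016-07-22")
ten_days_earlier = day - pd.DateOffset(days=10)

month_end = pd.Timestamp("2016-01-31")

# Calendar-aware: one month after 31 January lands on the last day of February.
plus_calendar_month = month_end + pd.DateOffset(months=1)

# Fixed duration: exactly 30 days later, whatever the month lengths are.
plus_30_days = month_end + pd.Timedelta(days=30)

print(ten_days_earlier, plus_calendar_month, plus_30_days)
```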

### Resampling Time Series Data

##### Resampling Time Series with *resample()*

In [None]:
rng = pd.date_range("1/1/2012", periods=100, freq="S")
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
ts

In [None]:
ts.resample("5Min").sum()

In [None]:
ts.resample("5Min").mean()

In [None]:
# An open-high-low-close chart (also OHLC) is used to illustrate movements in the price of a financial instrument over time
ts.resample("5Min").ohlc()

Let's Check Out a Real-life Dataset

In [None]:
temp = pd.read_csv("data/temps.csv", parse_dates= ["datetime"], index_col = "datetime")

In [None]:
temp.head()

In [None]:
temp.info()

In [None]:
# resample() is a time-based groupby, followed by a reduction method on each of its groups.
list(temp.resample("D"))[1][1]
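To make the "time-based groupby" idea concrete, the sketch below resamples a synthetic hourly series (not the course dataset) and compares the result with an ordinary `groupby()` on the calendar day:

```python
import pandas as pd

# Two full days of hourly observations, each with the value 1.
idx = pd.date_range("2021-01-01", periods=48, freq="60min")
hourly = pd.Series(1, index=idx)

# Daily totals via resample().
by_resample = hourly.resample("D").sum()

# The same totals via groupby() on the index truncated to midnight.
by_groupby = hourly.groupby(hourly.index.normalize()).sum()

# Iterating over the resampler yields (bin label, chunk of rows) pairs.
chunk_sizes = [len(chunk) for _, chunk in hourly.resample("D")]

print(by_resample)
print(chunk_sizes)
```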

In [None]:
temp.head(25)

In [None]:
temp.resample("D").sum()

In [None]:
temp.resample("2H").first()

In [None]:
temp.resample("W").mean()

In [None]:
temp.resample("W-Wed").mean()

In [None]:
temp.resample("M").mean()

In [None]:
temp.resample("MS").mean()

In [None]:
# 'loffset' was removed in pandas 2.0; shift the resampled index by 14 days instead
temp.resample("MS").mean().shift(14, freq="D")

In [None]:
temp.resample("Q").mean()

In [None]:
temp.resample("Q-Feb").mean()

In [None]:
temp.resample("Y").mean()

In [None]:
temp.resample("YS").mean()

In [None]:
temp.resample("M", kind = "period").mean()

In [None]:
temp.resample("W", kind = "period").mean()

In [None]:
temp.resample("Q", kind = "period").mean()

In [None]:
temp.resample("A", kind = "period").mean()

In [None]:
temp_m = temp.resample("M", kind = "period").mean()

In [None]:
temp_m

In [None]:
temp_m.info()

In [None]:
temp_m.index[0]

In [None]:
temp_m.plot(figsize = (15, 8), fontsize = 15)
plt.show()

In [None]:
temp.plot(figsize = (15, 8), fontsize = 15)
plt.show()

##### Changing the frequency Using *asfreq()* 

In [None]:
index = pd.date_range('1/1/2000', periods=4, freq='T')
series = pd.Series([0.0, None, 2.0, 3.0], index=index)
df = pd.DataFrame({'s':series})
df

In [None]:
df.asfreq(freq='30S')

In [None]:
df.asfreq(freq='30S', fill_value=9.0)
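One detail of `asfreq()` is easy to miss: `fill_value` only fills the rows created by the frequency change, not missing values that were already in the data. A minimal sketch with a three-point series, using the lowercase *30s* and *min* spellings:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="min")
minutely = pd.Series([0.0, np.nan, 2.0], index=idx)

# Upsampling to 30 seconds inserts new rows, which are NaN by default.
upsampled = minutely.asfreq("30s")

# fill_value is applied to the inserted rows only; the original NaN stays.
filled_const = minutely.asfreq("30s", fill_value=9.0)

print(upsampled)
print(filled_const)
```

Unlike `resample()`, `asfreq()` does not aggregate anything: it simply selects or creates rows at the new frequency.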

==========

# Case-study [Time-series Analysis]

- Note that this tutorial is inspired by: https://fivethirtyeight.com/features/how-fast-youll-abandon-your-new-years-resolutions/
- Google Trends Search: https://trends.google.com/trends/explore?date=all&q=diet,gym,finance
- DataCamp Full Tutorial: https://www.datacamp.com/community/tutorials/time-series-analysis-tutorial

### Importing Packages and Data

In [None]:
# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

In [None]:
df = pd.read_csv('data/multiTimeline.csv', skiprows=1)
df.head()

In [None]:
df.info()

### Wrangle Your Data

In [None]:
# The first thing that you want to do is rename the columns of your DataFrame
df.columns = ['month', 'diet', 'gym', 'finance']
df.head()

In [None]:
# Next, you'll turn the 'month' column into a DateTime data type and make it the index of the DataFrame
df.month = pd.to_datetime(df.month)

In [None]:
df.set_index('month', inplace=True)

In [None]:
df.head()

### Exploratory Data Analysis (EDA)

In [None]:
# Now it's time to explore your DataFrame visually
df.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('Year', fontsize=20)

In [None]:
# you can also plot the 'diet' column by itself as a time series
df[['diet']].plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('Year', fontsize=20)

### Trends and Seasonality in Time Series Data

##### Identifying Trends in Time Series

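There are several ways to think about identifying trends in time series. One popular way is to take a *rolling average*: for each time point, you average the values in a window of neighbouring points. The number of points is set by a *window size*, which you need to choose. Note that pandas' *rolling()* uses a trailing window by default (the current point and the points before it); pass *center=True* to average the points on either side instead.

*To identify the trend, we first need to smooth out the seasonality. The data here is monthly, so a window of 12 averages over one full year.*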
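A toy illustration of how the window is positioned in a rolling average, using five made-up values and a window of 3 rather than the real search data:

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: trailing window (the current point and the two before it).
trailing = values.rolling(3).mean()

# center=True: the window is centred on each point instead.
centered = values.rolling(3, center=True).mean()

print(pd.DataFrame({"value": values, "trailing": trailing, "centered": centered}))
```

Positions without a full window are left as NaN, which is why a rolling plot starts later than the raw series.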

In [None]:
diet = df[['diet']]
diet.rolling(12).mean().plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('Year', fontsize=20);

In [None]:
gym = df[['gym']]
gym.rolling(12).mean().plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('Year', fontsize=20);

In [None]:
df_rm = pd.concat([diet.rolling(12).mean(), gym.rolling(12).mean()], axis=1)
df_rm.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('Year', fontsize=20);

==========

# THANK YOU!