# Introduction to Time Series with Pandas

Most of our data will have a datetime index, so let's learn how to deal with this sort of data with pandas!

## Python Datetime Review
In the course introduction section we discussed Python datetime objects.

In [1]:
from datetime import datetime

In [2]:
# To illustrate the order of arguments
my_year = 2017
my_month = 1
my_day = 2
my_hour = 13
my_minute = 30
my_second = 15

In [3]:
# January 2nd, 2017
my_date = datetime(my_year,my_month,my_day)

In [4]:
# Time components default to 0 (midnight)
my_date 

datetime.datetime(2017, 1, 2, 0, 0)

In [5]:
# January 2nd, 2017 at 13:30:15
my_date_time = datetime(my_year,my_month,my_day,my_hour,my_minute,my_second)

In [6]:
my_date_time

datetime.datetime(2017, 1, 2, 13, 30, 15)
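
A datetime can also be parsed from text and shifted with a `timedelta`. Here is a minimal sketch that uses only the standard library. The format string is an assumption about how the incoming text is laid out:

```python
from datetime import datetime, timedelta

# Parse text into a datetime; the format codes must match the string exactly
parsed = datetime.strptime('2017-01-02 13:30:15', '%Y-%m-%d %H:%M:%S')

# Adding a timedelta shifts the datetime forward
later = parsed + timedelta(days=1, hours=2)

# Subtracting two datetimes gives a timedelta
elapsed = later - parsed
print(parsed, later, elapsed.total_seconds())
```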

You can access any component of a datetime object as an attribute:

In [7]:
my_date.day

2

In [8]:
my_date_time.hour

13
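
The remaining components work the same way. Methods such as `weekday()` and `strftime()` derive further information from them. The short sketch below reuses the same date and time as above:

```python
from datetime import datetime

my_date_time = datetime(2017, 1, 2, 13, 30, 15)

# Individual components are plain integer attributes
parts = (my_date_time.year, my_date_time.month, my_date_time.minute, my_date_time.second)

# weekday() counts from Monday = 0 to Sunday = 6
day_of_week = my_date_time.weekday()

# strftime() formats the datetime back into text
label = my_date_time.strftime('%Y-%m-%d %H:%M')
print(parts, day_of_week, label)
```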

## NumPy Datetime Arrays
We mentioned that NumPy handles dates more efficiently than Python's datetime format.<br>
The NumPy data type is called <em>datetime64</em> to distinguish it from Python's datetime.

In this section we'll show how to set up datetime arrays in NumPy. These will become useful later on in the course.<br>
For more info on NumPy visit https://docs.scipy.org/doc/numpy-1.15.4/reference/arrays.datetime.html
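
Part of the reason for that efficiency is storage. A `datetime64` value is a single 64-bit integer that counts units (here, days) from the 1970-01-01 epoch, so whole arrays can be processed with vectorized integer operations. This small sketch makes the underlying integers visible:

```python
import numpy as np

# A day-precision datetime64 is an integer count of days since 1970-01-01
d = np.datetime64('1970-01-11', 'D')
days_since_epoch = d.astype('int64')

# Every element occupies 8 bytes, like an int64
arr = np.array(['2018-06-01', '2018-06-02'], dtype='datetime64[D]')
print(days_since_epoch, arr.itemsize)
```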

In [9]:
import numpy as np

In [10]:
# CREATE AN ARRAY FROM THREE DATES
np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64')

array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')

<div class="alert alert-info"><strong>NOTE:</strong> We see the dtype listed as <tt>'datetime64[D]'</tt>. This tells us that NumPy applied a day-level date precision.<br>
    If we want, we can pass in a different unit, such as <TT>[h]</TT> for hour or <TT>[Y]</TT> for year.</div>

In [11]:
np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[h]')

array(['2016-03-15T00', '2017-05-24T00', '2018-08-09T00'],
      dtype='datetime64[h]')

The date and hour are now separated by a <tt>T</tt>. The hour is <tt>00</tt> because no time was given.

In [12]:
np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[Y]')

array(['2016', '2017', '2018'], dtype='datetime64[Y]')
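
Each element is a fixed-unit count, so `datetime64` arrays support arithmetic and unit conversion directly. This sketch reuses the three dates above. Subtracting dates gives `timedelta64` values, and `astype()` to a coarser unit truncates each date:

```python
import numpy as np

dates = np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')

# Differences between consecutive dates come back as timedelta64[D]
gaps = np.diff(dates)

# Converting to month precision drops the day component
months = dates.astype('datetime64[M]')
print(gaps, months)
```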

## NumPy Date Ranges
Just as <tt>np.arange(start,stop,step)</tt> can be used to produce an array of evenly-spaced integers, we can pass a <tt>dtype</tt> argument to obtain an array of dates. Remember that the stop date is <em>exclusive</em>.

In [13]:
# AN ARRAY OF DATES FROM 6/1/18 TO 6/22/18 SPACED ONE WEEK APART
np.arange('2018-06-01', '2018-06-23', 7, dtype='datetime64[D]')

array(['2018-06-01', '2018-06-08', '2018-06-15', '2018-06-22'],
      dtype='datetime64[D]')

If we omit the step value, we get every value at the array's precision. With <tt>datetime64[Y]</tt>, that means one entry per year.

In [14]:
# AN ARRAY OF DATES FOR EVERY YEAR FROM 1968 TO 1975
np.arange('1968', '1976', dtype='datetime64[Y]')

array(['1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975'],
      dtype='datetime64[Y]')
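
The same ranges can also be built from `datetime64` and `timedelta64` scalars. `np.arange` also accepts other units, such as months (`'M'`). In the sketch below the stop value is still exclusive:

```python
import numpy as np

# Start date plus multiples of a 7-day timedelta reproduces the weekly range above
start = np.datetime64('2018-06-01')
weekly = start + np.arange(4) * np.timedelta64(7, 'D')

# Month precision: January through April 2018 (May is excluded)
months = np.arange('2018-01', '2018-05', dtype='datetime64[M]')
print(weekly, months)
```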

## Pandas Datetime Index

We'll usually deal with time series as a datetime index when working with pandas dataframes. Fortunately pandas has a lot of functions and methods to work with time series!<br>
For more on the pandas DatetimeIndex visit https://pandas.pydata.org/pandas-docs/stable/timeseries.html

In [15]:
# Random sample data: 7 rows (one per date) and 2 columns

data=np.random.randn(7,2)
cols=['A','B']
print(data)

[[ 2.21272770e-01  1.32192261e+00]
 [-8.78660507e-02  2.13159013e-02]
 [ 6.93947297e-01  1.32769163e+00]
 [-1.37674492e-03  1.31370701e+00]
 [-7.72437271e-01 -9.12022901e-01]
 [ 4.70644704e-01 -8.58539911e-01]
 [-2.26840682e-01 -1.55421297e+00]]


The simplest way to build a DatetimeIndex is with the <tt><strong>pd.date_range()</strong></tt> method:

In [16]:
import pandas as pd
idx=pd.date_range('1/1/2020',periods=7,freq='D')

In [17]:
df = pd.DataFrame(data,columns=cols,index=idx)

In [18]:
df

|            | A         | B         |
|------------|-----------|-----------|
| 2020-01-01 | 0.221273  | 1.321923  |
| 2020-01-02 | -0.087866 | 0.021316  |
| 2020-01-03 | 0.693947  | 1.327692  |
| 2020-01-04 | -0.001377 | 1.313707  |
| 2020-01-05 | -0.772437 | -0.912023 |
| 2020-01-06 | 0.470645  | -0.858540 |
| 2020-01-07 | -0.226841 | -1.554213 |


In [19]:
df.index

DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07'],
              dtype='datetime64[ns]', freq='D')

In [20]:
df.index.max()

Timestamp('2020-01-07 00:00:00')

In [21]:
# Latest date index location
df.index.argmax()

6

<div class="alert alert-info"><strong>DatetimeIndex Frequencies:</strong> When we used <tt>pd.date_range()</tt> above, we passed in a frequency parameter of <tt>'D'</tt> (which is also the default). This created a series of 7 dates spaced one day apart. We'll cover this topic in depth in upcoming lectures. For now, a list of time series offset aliases like <tt>'D'</tt> can be found <a href='http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases'>here</a>.</div>
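
Two details about `pd.date_range()` are worth knowing. Unlike `np.arange`, it includes an explicit end date in the result. The frequency alias also changes which dates are generated. The sketch below uses two long-standing aliases: `'B'` (business days, which skips weekends but not holidays) and `'W'` (weekly, anchored on Sundays). The dates themselves are arbitrary examples:

```python
import pandas as pd

# Daily range with an explicit end: both endpoints are included
daily = pd.date_range('2020-01-01', '2020-01-07')

# Business days only: the weekend of Jan 4-5 is skipped
business = pd.date_range('2020-01-01', '2020-01-07', freq='B')

# Weekly dates roll forward to the first Sunday on or after the start
weekly = pd.date_range('2020-01-01', periods=3, freq='W')
print(len(daily), business, weekly, sep='\n')
```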

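Another way is to convert incoming data with the <tt><strong>pd.to_datetime()</strong></tt> function. The input can be date strings or, as here, DataFrame columns named <tt>year</tt>, <tt>month</tt> and <tt>day</tt>: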

In [22]:
df = pd.DataFrame({'year': [2015, 2016],
                  'month': [2, 3],
                  'day': [4, 5]})

In [23]:
pd.to_datetime(df)

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
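
Passing a DataFrame returned a Series. Passing a list of strings to `pd.to_datetime()` returns a `DatetimeIndex` instead, which can be used directly as an index. The sketch below uses made-up date strings. `format` spells out an otherwise ambiguous layout, and `errors='coerce'` turns unparseable entries into `NaT` instead of raising an error:

```python
import pandas as pd

# A list of ISO date strings becomes a DatetimeIndex
idx = pd.to_datetime(['2020-01-15', '2020-02-20'])

# An explicit format removes month/day ambiguity ('03/04/2021' is read as March 4)
us_style = pd.to_datetime(['03/04/2021'], format='%m/%d/%Y')

# errors='coerce' replaces values that cannot be parsed with NaT
mixed = pd.to_datetime(['2021-01-01', 'not a date'], errors='coerce')
print(idx, us_style, mixed, sep='\n')
```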

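A third way is to pass a list or an array of datetime-like values (datetime objects or date strings) into the <tt><strong>pd.DatetimeIndex()</strong></tt> constructor. Older pandas versions also accepted <tt>start</tt>, <tt>freq</tt> and <tt>periods</tt> arguments here. That usage was removed in pandas 1.0, and <tt>pd.date_range()</tt> now takes those arguments instead:

In [24]:
# Create the DatetimeIndex from a list of date strings
# 'W' represents weekly frequency (anchored on Sunday) and is checked against the data
# tz treats the times as local to Asia/Calcutta (UTC+05:30)
didx = pd.DatetimeIndex(['2000-01-16 06:30', '2000-01-23 06:30', '2000-01-30 06:30'],
                        freq='W', tz='Asia/Calcutta')

# Print the DatetimeIndex
print(didx)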

DatetimeIndex(['2000-01-16 06:30:00+05:30', '2000-01-23 06:30:00+05:30',
               '2000-01-30 06:30:00+05:30'],
              dtype='datetime64[ns, Asia/Calcutta]', freq='W-SUN')




Notice that even though the dates came into pandas at day (or minute) precision, pandas stores them at nanosecond precision (<tt>datetime64[ns]</tt>), its default resolution. Finer-grained times can therefore be represented later on.
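
The `tz` argument attaches a time zone. A tz-aware index can then be converted to another zone with `tz_convert()`, while a naive index is labeled with a zone using `tz_localize()`. The sketch below uses `'Asia/Kolkata'`, the current IANA name for the zone written as `'Asia/Calcutta'` above:

```python
import pandas as pd

# Naive wall-clock times, then labeled as India Standard Time (UTC+05:30)
local = pd.DatetimeIndex(['2000-01-16 06:30', '2000-01-23 06:30']).tz_localize('Asia/Kolkata')

# Same instants expressed in UTC: 06:30 IST is 01:00 UTC
in_utc = local.tz_convert('UTC')
print(local, in_utc, sep='\n')
```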

To set an existing column as the index, use <tt>.set_index()</tt><br>
><tt>df.set_index('Date',inplace=True)</tt>
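
A common workflow is to parse a text column with `pd.to_datetime()` first and then promote it to the index. After that, rows can be selected with date strings. The sketch below uses a hypothetical `Date`/`Sales` table. Sorting the index first keeps label slicing well-defined:

```python
import pandas as pd

# Hypothetical data with dates stored as text, out of order
df = pd.DataFrame({'Date': ['2020-01-03', '2020-01-01', '2020-01-02'],
                   'Sales': [30, 10, 20]})

# Convert the text column, then make it the index
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df = df.sort_index()

# Select Jan 2 onward using a date string (the start label is included)
from_jan2 = df.loc['2020-01-02':]
print(from_jan2)
```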

<div class="alert alert-info"><strong>NOTE:</strong> To find the index <em>label</em> of a column's minimum or maximum, run <tt>.idxmin()</tt> or <tt>.idxmax()</tt> on <tt>df['column']</tt>. In older pandas releases, <tt>Series.argmin()</tt> and <tt>.argmax()</tt> were deprecated because they also returned labels. Since pandas 1.0 they return integer positions instead, which is what <tt>.argmax()</tt> on the index returned above.</div>
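
The two families differ in what they return: `idxmax()` gives the index label, and in pandas 1.0 and later `argmax()` gives the integer position. Here is a sketch on a small made-up Series:

```python
import pandas as pd

idx = pd.date_range('2020-01-01', periods=4, freq='D')
s = pd.Series([3.0, 9.0, 1.0, 5.0], index=idx)

# Label of the largest value (a Timestamp)
label = s.idxmax()

# Integer position of the largest value
pos = s.argmax()

# The position maps back to the same label
print(label, pos, s.index[pos])
```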

## Done, let's move on!