# Week 15: Time Series Data

A time series is a data set whose observations are indexed by time. It is an important form of structured data in many fields, such as finance, economics, ecology, neuroscience, and physics.

Reading:
- Textbook, Chapter 11

In [1]:
# ! pip install pandas --upgrade --user

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
pd.__version__

'1.3.4'

## 1. Date and Time Data Types and Tools

In Python, the `datetime.datetime` class is widely used to represent date and time data.

In [5]:
from datetime import datetime

datetime.now()

datetime.datetime(2021, 12, 8, 12, 22, 5, 856542)

In [3]:
datetime.now().year

2021

In [4]:
datetime.now().day

8

In [5]:
datetime.now().month

12

We can use `datetime.timedelta` to represent the temporal difference between two `datetime` objects.

In [11]:
from datetime import timedelta

delta = timedelta(weeks=10)

datetime.now() + delta

datetime.datetime(2022, 2, 16, 11, 53, 40, 539807)

In [12]:
date1 = datetime(2019, 12, 12)
date2 = datetime.now()
date2 - date1

datetime.timedelta(days=727, seconds=42855, microseconds=642547)
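As a small sketch of how `timedelta` arithmetic behaves, the cell below uses two fixed dates instead of `datetime.now()`, so the result does not depend on when it is run. The variable names (`start`, `end`, `gap`) and the end time of 12:00 are chosen here purely for illustration.

```python
from datetime import datetime, timedelta

# Fixed, illustrative dates so the result is reproducible.
start = datetime(2019, 12, 12)
end = datetime(2021, 12, 8, 12, 0, 0)

# Subtracting two datetimes gives a timedelta.
gap = end - start
print(gap.days)             # whole days in the gap: 727
print(gap.seconds)          # leftover seconds within the final day: 43200
print(gap.total_seconds())  # the entire gap expressed in seconds

# Adding a timedelta to a datetime gives a new datetime.
ten_weeks_later = start + timedelta(weeks=10)
print(ten_weeks_later)
```

Note that `.seconds` holds only the part of the gap that is shorter than one day; `total_seconds()` is usually the safer choice when the full duration is needed.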

**Convert between string and datetime**

In [13]:
# datetime to string
date = datetime(2011, 1, 3, 23, 30, 45)
str(date)

'2011-01-03 23:30:45'

In [14]:
# Format as "YYYY/MM/DD HH:MM, Weekday"
date.strftime("%Y/%m/%d %H:%M, %A")

'2011/01/03 23:30, Monday'

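Datetime format codes:
- %Y: Four-digit year
- %y: Two-digit year
- %m: Two-digit month (01 - 12)
- %d: Two-digit day of the month (01 - 31)
- %H: Two-digit hour, 24-hour clock (00 - 23)
- %I: Two-digit hour, 12-hour clock (01 - 12)
- %M: Two-digit minute (00 - 59)
- %S: Two-digit second
- %A: Full weekday name (e.g., Monday)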

[More on this](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
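The format codes above can be combined freely. The cell below is a minimal sketch using the same `date` value as the earlier cells; the `%p` code (AM/PM marker) is not in the list above and is included here as an extra, and both `%A` and `%p` produce locale-dependent text.

```python
from datetime import datetime

date = datetime(2011, 1, 3, 23, 30, 45)

iso_day = date.strftime("%Y-%m-%d")      # year-month-day
short_year = date.strftime("%d/%m/%y")   # day first, two-digit year
twelve_hour = date.strftime("%I:%M %p")  # 12-hour clock with AM/PM marker
inline = f"{date:%Y-%m-%d %H:%M:%S}"     # the same codes work inside f-strings

print(iso_day)
print(short_year)
print(twelve_hour)
print(inline)
```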

In [15]:
# Exercise: convert date to "01/03/2011"

date.strftime("%m/%d/%Y")

'01/03/2011'

In [17]:
# Exercise: convert date to "01-03-2011 23:30"

date.strftime("%m-%d-%Y %H:%M")

'01-03-2011 23:30'

**Parse a datetime string**

In [18]:
# String to datetime
from dateutil.parser import parse
parse("2011-01-03")

datetime.datetime(2011, 1, 3, 0, 0)

In [19]:
parse("Jan 31, 1997 10:45 PM")

datetime.datetime(1997, 1, 31, 22, 45)

In [20]:
# Many countries use format "DD/MM/YYYY". We need to set dayfirst=True
# so that the date is correctly recognized.
parse("06/12/2011", dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)

In [21]:
parse("06/12/2011")

datetime.datetime(2011, 6, 12, 0, 0)
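`dateutil` guesses the layout of the string. When the layout is known in advance, the standard library's `datetime.strptime` takes the format explicitly, which removes the day-first ambiguity shown above. The following is a sketch of that alternative, not a replacement for the `parse` calls in this section.

```python
from datetime import datetime

# Explicit format: day, then month, then four-digit year.
d = datetime.strptime("06/12/2011", "%d/%m/%Y")
print(d)

# strftime and strptime are inverses when given the same format.
text = d.strftime("%Y-%m-%d")
round_trip = datetime.strptime(text, "%Y-%m-%d")
print(text, round_trip == d)

# A string that does not match the format raises ValueError.
try:
    datetime.strptime("2011-12-06", "%d/%m/%Y")
    failed = False
except ValueError as err:
    failed = True
    print("could not parse:", err)
```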

## 2. Time Series Basics

In [6]:
# Create a list of datetime objects
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02    1.093962
2011-01-05   -0.343157
2011-02-07    0.836853
2011-02-08   -0.381102
2011-03-10    1.658184
2011-03-12    1.525816
dtype: float64
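When a Series is indexed by `datetime` objects, pandas stores the index as a `DatetimeIndex` whose elements are `Timestamp` values. The sketch below builds a similar index from strings with `pd.to_datetime`; the values 1.0 to 3.0 are placeholders rather than the random numbers used above.

```python
import pandas as pd

# Convert date strings into a DatetimeIndex.
idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-02-07"])
small_ts = pd.Series([1.0, 2.0, 3.0], index=idx)

print(type(small_ts.index).__name__)  # the index class
print(small_ts.index[0])              # a single element is a Timestamp
print(small_ts.index.month)           # date fields are available on the whole index
```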

In [23]:
# Select 01/05
ts['2011-01-05']

-1.7054351006893498

In [25]:
ts.iloc[1]  # select by position rather than by date

-1.7054351006893498

In [24]:
ts['01/05/2011']

-1.7054351006893498

In [26]:
ts['20110105']

-1.7054351006893498

In [27]:
# Select a range of dates
ts['2011-02']

2011-02-07   -0.140924
2011-02-08    1.342997
dtype: float64

In [28]:
ts['2011-02-01':'2011-02-08'] # unlike positional slices, the end date is included

2011-02-07   -0.140924
2011-02-08    1.342997
dtype: float64

In [29]:
ts['2011-02-01':]

2011-02-07   -0.140924
2011-02-08    1.342997
2011-03-10    2.214361
2011-03-12    0.448840
dtype: float64

In [30]:
ts[:"2011-03-10"]

2011-01-02    0.233410
2011-01-05   -1.705435
2011-02-07   -0.140924
2011-02-08    1.342997
2011-03-10    2.214361
dtype: float64
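The selection patterns in this section can be checked more easily on a series with known values. The cell below is a sketch that reuses the same dates but replaces the random numbers with 0.0 to 5.0 (an assumption made only so the results are predictable), and uses `.loc` and `.iloc` to make label-based and position-based access explicit.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
# Values 0.0 .. 5.0 instead of random numbers, so results are easy to verify.
ts_demo = pd.Series(np.arange(6, dtype=float), index=dates)

one_day = ts_demo.loc["2011-01-05"]                  # a full date selects one value
second = ts_demo.iloc[1]                             # the same element, by position
february = ts_demo.loc["2011-02"]                    # a partial string selects the whole month
window = ts_demo.loc["2011-02-01":"2011-03-10"]      # label slice, both ends included

print(one_day, second)
print(february)
print(window)
```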

## 3. Date Ranges

In [7]:
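# Manually build a list of dates; note that 2011-04-01 is not in ts.index
dates = [datetime(2011, 1, 2), datetime(2011, 3, 10), datetime(2011, 4, 1)]
# ts[dates] raises a KeyError because one of the labels is missing from the index,
# so keep only the dates that are present instead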
ts[ts.index.isin(dates)]

2011-01-02    1.093962
2011-03-10    1.658184
dtype: float64
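Filtering with `isin` silently drops the dates that are not in the index. If the missing dates should instead appear in the result as `NaN`, `Series.reindex` is one option. The sketch below compares the two on a series with placeholder values 0.0 to 5.0; the names `wanted`, `present`, and `aligned` are illustrative.

```python
from datetime import datetime

import numpy as np
import pandas as pd

index = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
ts_demo = pd.Series(np.arange(6, dtype=float), index=index)

wanted = [datetime(2011, 1, 2), datetime(2011, 3, 10), datetime(2011, 4, 1)]

present = ts_demo[ts_demo.index.isin(wanted)]  # only labels that exist in the index
aligned = ts_demo.reindex(wanted)              # every requested label, NaN where absent

print(present)
print(aligned)
```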

In [33]:
# Create a range of dates
daterange = pd.date_range('2011-01-01', periods=8)
print(daterange)

DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08'],
              dtype='datetime64[ns]', freq='D')


In [34]:
daterange = pd.date_range('2011-01-01', periods=5, freq='2D')
print(daterange)

DatetimeIndex(['2011-01-01', '2011-01-03', '2011-01-05', '2011-01-07',
               '2011-01-09'],
              dtype='datetime64[ns]', freq='2D')


In [35]:
daterange = pd.date_range("2011-01-01", periods=5, freq="10H")
print(daterange)

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 10:00:00',
               '2011-01-01 20:00:00', '2011-01-02 06:00:00',
               '2011-01-02 16:00:00'],
              dtype='datetime64[ns]', freq='10H')


In [8]:
# Generate business days only (Monday to Friday)
daterange = pd.date_range("2011-01-01", periods=10, freq="B")
print(daterange)

DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14'],
              dtype='datetime64[ns]', freq='B')
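The cells above fix the start date and the number of periods. `pd.date_range` also accepts a start and an end date, both inclusive. The sketch below shows this for January 2011 with three frequencies; `"W-MON"` (weekly, anchored on Monday) does not appear earlier in these notes and is included as an extra illustration.

```python
import pandas as pd

# Start and end dates (both inclusive) instead of start and periods.
jan_days = pd.date_range("2011-01-01", "2011-01-31")                   # daily by default
jan_bdays = pd.date_range("2011-01-01", "2011-01-31", freq="B")        # Monday to Friday
jan_mondays = pd.date_range("2011-01-01", "2011-01-31", freq="W-MON")  # Mondays only

print(len(jan_days), len(jan_bdays))
print(jan_mondays)
```

The `"B"` frequency only skips weekends; it has no knowledge of public holidays.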


In [10]:
# ts[daterange]  # raises a KeyError: most of these dates are not in ts.index

In [11]:
ts[ts.index.isin(daterange)]

2011-01-05   -0.343157
dtype: float64

## 4. Shifting Data


In [12]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

| | Price |
|---|---|
| 2019-11-01 | 100.014337 |
| 2019-11-02 | 100.318233 |
| 2019-11-03 | 100.530102 |
| 2019-11-04 | 100.670527 |


In [None]:
prices - 100

In [14]:
from datetime import timedelta
# One way to store yesterday's price in a new column:
# for each date, look up the price on the previous calendar day (if it exists)
for date in prices.index:
    yesterday = date - timedelta(days=1)
    if yesterday in prices.index:
        prices.loc[date, "Yesterday's Price"] = prices.loc[yesterday, "Price"]
prices

| | Price | Yesterday's Price |
|---|---|---|
| 2019-11-01 | 100.014337 | NaN |
| 2019-11-02 | 100.318233 | 100.014337 |
| 2019-11-03 | 100.530102 | 100.318233 |
| 2019-11-04 | 100.670527 | 100.530102 |


In [15]:
prices['Price Change'] = prices['Price'] - prices["Yesterday's Price"]
prices

| | Price | Yesterday's Price | Price Change |
|---|---|---|---|
| 2019-11-01 | 100.014337 | NaN | NaN |
| 2019-11-02 | 100.318233 | 100.014337 | 0.303896 |
| 2019-11-03 | 100.530102 | 100.318233 | 0.211869 |
| 2019-11-04 | 100.670527 | 100.530102 | 0.140425 |


In [16]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

| | Price |
|---|---|
| 2019-11-01 | 100.630773 |
| 2019-11-02 | 100.023517 |
| 2019-11-03 | 100.76473 |
| 2019-11-04 | 100.404824 |


In [17]:
prices_yesterday = prices.shift(1)
prices_yesterday

| | Price |
|---|---|
| 2019-11-01 | NaN |
| 2019-11-02 | 100.630773 |
| 2019-11-03 | 100.023517 |
| 2019-11-04 | 100.76473 |
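`shift(1)` moves values down by one row, so "yesterday" really means "the previous row". On a daily index with no gaps the two coincide, but on an irregular index they can differ. The sketch below uses three made-up prices with a weekend gap to contrast the row-based shift with a calendar-based one built from `shift(1, freq="D")` followed by `reindex`; the data and variable names are illustrative only.

```python
import pandas as pd

# Hypothetical prices with a gap between 2019-11-01 and 2019-11-04.
idx = pd.to_datetime(["2019-11-01", "2019-11-04", "2019-11-05"])
price = pd.Series([100.0, 101.0, 102.0], index=idx)

# Row-based lag: the previous row's value, whatever its date.
prev_row = price.shift(1)

# Calendar-based lag: move every label forward by one day,
# then align the result back onto the original dates.
prev_day = price.shift(1, freq="D").reindex(price.index)

print(pd.DataFrame({"price": price, "prev_row": prev_row, "prev_day": prev_day}))
```

On 2019-11-04 the row-based lag reports the price from 2019-11-01, while the calendar-based lag is `NaN` because there is no observation for 2019-11-03.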


In [18]:
prices = pd.merge(prices, prices_yesterday, left_index=True, right_index=True,
                  suffixes=["Today", "Yesterday"])
prices

| | PriceToday | PriceYesterday |
|---|---|---|
| 2019-11-01 | 100.630773 | NaN |
| 2019-11-02 | 100.023517 | 100.630773 |
| 2019-11-03 | 100.76473 | 100.023517 |
| 2019-11-04 | 100.404824 | 100.76473 |
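Because the shifted frame keeps the same index as the original, the merge is not strictly required: the shifted column can also be placed next to the original one directly. The sketch below builds the same two-column layout both ways on made-up prices, assuming the same column names as the merge above.

```python
import pandas as pd

demo = pd.DataFrame({"Price": [100.0, 100.5, 100.25, 101.0]},
                    index=pd.date_range("2019-11-01", periods=4))

# Approach 1: merge the frame with its shifted copy on the index.
merged = pd.merge(demo, demo.shift(1), left_index=True, right_index=True,
                  suffixes=("Today", "Yesterday"))

# Approach 2: build the two columns directly; pandas aligns them on the index.
direct = pd.DataFrame({"PriceToday": demo["Price"],
                       "PriceYesterday": demo["Price"].shift(1)})

print(merged)
print(direct)
```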


In [19]:
# Exercise: Compute the relative change from yesterday's price to today's price
# Formula: change = (today's price - yesterday's price) / yesterday's price
# (multiply by 100 to express it as a percentage)

prices['Computed'] = (prices['PriceToday'] - prices['PriceYesterday']) / prices['PriceYesterday']
prices

| | PriceToday | PriceYesterday | Computed |
|---|---|---|---|
| 2019-11-01 | 100.630773 | NaN | NaN |
| 2019-11-02 | 100.023517 | 100.630773 | -0.006034 |
| 2019-11-03 | 100.76473 | 100.023517 | 0.00741 |
| 2019-11-04 | 100.404824 | 100.76473 | -0.003572 |
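pandas also provides built-in shortcuts for these row-to-row comparisons: `diff()` for the absolute change and `pct_change()` for the relative change. The sketch below checks them against the shift-based formulas on made-up prices; the results should agree up to floating-point rounding, which is why the comparison uses `np.allclose`.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"Price": [100.0, 110.0, 99.0]},
                    index=pd.date_range("2019-11-01", periods=3))

yesterday = demo["Price"].shift(1)

# Shift-based formulas, as in the exercise above.
manual_change = demo["Price"] - yesterday
manual_relative = (demo["Price"] - yesterday) / yesterday

# Built-in equivalents.
builtin_change = demo["Price"].diff()
builtin_relative = demo["Price"].pct_change()

print(np.allclose(manual_change, builtin_change, equal_nan=True))
print(np.allclose(manual_relative, builtin_relative, equal_nan=True))
print(builtin_relative * 100)  # expressed as percentages
```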
