# Week 14
# Time Series Data

A time series is a data set whose observations are indexed by time. It is an important form of structured data in fields such as finance, economics, ecology, neuroscience, and physics.

Reading:
- Textbook, Chapter 11

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Date and Time Data Types and Tools

In Python, the `datetime.datetime` class is widely used to represent date and time data.

In [2]:
from datetime import datetime

datetime.now()

datetime.datetime(2020, 11, 30, 11, 11, 11, 38574)

In [3]:
datetime.now().year

2020

In [4]:
datetime.now().day

30

In [5]:
datetime.now().month

11

We can use `datetime.timedelta` to represent the temporal difference between two `datetime` objects.

In [7]:
from datetime import timedelta

delta = timedelta(days=10)  # a bare first argument, timedelta(10), also means days

datetime.now() + delta

datetime.datetime(2020, 12, 10, 11, 13, 4, 132939)

In [8]:
date1 = datetime(2019, 12, 12)
date2 = datetime.now()
date2 - date1

datetime.timedelta(days=354, seconds=40472, microseconds=495459)
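The arithmetic above depends on `datetime.now()`, so its output changes on every run. A minimal sketch with fixed dates makes the result reproducible; the names `start`, `later`, and `gap` are chosen here for illustration only.

```python
from datetime import datetime, timedelta

start = datetime(2020, 11, 30, 11, 0, 0)

# timedelta accepts keyword arguments such as days, hours, and minutes
delta = timedelta(days=10, hours=3)
later = start + delta
print(later)

# Subtracting two datetime objects yields a timedelta
gap = later - datetime(2019, 12, 12)
print(gap.days, gap.seconds)   # whole days and the leftover seconds
print(gap.total_seconds())     # the full difference expressed in seconds
```

A `timedelta` stores only days, seconds, and microseconds, which is why the hours show up in `gap.seconds`.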

**Convert between string and datetime**

In [9]:
# datetime to string
date = datetime(2011, 1, 3, 23, 30, 45)
str(date)

'2011-01-03 23:30:45'

In [12]:
# Convert to format "YYYY/MM/DD HH:MM, Weekday"
date.strftime("%Y/%m/%d %H:%M, %A")

'2011/01/03 23:30, Monday'

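Datetime format codes:
- %Y: Four-digit year
- %y: Two-digit year
- %m: Two-digit month (01 - 12)
- %d: Two-digit day of the month (01 - 31)
- %H: Hour, 24-hour clock (00 - 23)
- %I: Hour, 12-hour clock (01 - 12)
- %M: Two-digit minute (00 - 59)
- %S: Two-digit second
- %A: Full weekday name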

[More on this](https://docs.python.org/2/library/datetime.html)
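A short sketch of a few of the codes listed above, applied to the same timestamp as the earlier cell. The variable names are illustrative, and the weekday name assumes the default (English) locale.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 23, 30, 45)

# %y keeps only the last two digits of the year
two_digit_year = stamp.strftime("%y-%m-%d")

# %I switches to the 12-hour clock, so 23:30:45 becomes 11:30:45
twelve_hour = stamp.strftime("%I:%M:%S")

# %A spells out the weekday
weekday = stamp.strftime("%A")

print(two_digit_year, twelve_hour, weekday)
```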

In [None]:
# Exercise: convert date to "01/03/2011"



In [None]:
# Exercise: convert date to "01-03-2011 23:30"



**Parse a datetime string**

In [13]:
# String to datetime
from dateutil.parser import parse
parse("2011-01-03")

datetime.datetime(2011, 1, 3, 0, 0)

In [14]:
parse("Jan 31, 1997 10:45 PM")

datetime.datetime(1997, 1, 31, 22, 45)

In [15]:
# Many countries use format "DD/MM/YYYY". We need to set dayfirst=True
# so that the date is correctly recognized.
parse("06/12/2011", dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)

In [16]:
parse("06/12/2011")

datetime.datetime(2011, 6, 12, 0, 0)
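`dateutil` guesses the layout of the string. When the layout is known in advance, `datetime.strptime` from the standard library takes the format codes explicitly, so nothing is left to guessing. A minimal sketch (variable names are illustrative):

```python
from datetime import datetime

# The same string read under two different, explicitly stated layouts
month_first = datetime.strptime("06/12/2011", "%m/%d/%Y")
day_first = datetime.strptime("06/12/2011", "%d/%m/%Y")
print(month_first)
print(day_first)

# strptime reverses strftime when both use the same format string
fmt = "%Y/%m/%d %H:%M"
round_trip = datetime.strptime(datetime(2011, 1, 3, 23, 30).strftime(fmt), fmt)
print(round_trip)
```

`strptime` raises a `ValueError` if the string does not match the format, which can be preferable to a silently wrong guess.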

## 2. Time Series Basics

In [17]:
# Create a list of datetime objects
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02   -0.874433
2011-01-05    0.949954
2011-02-07    0.271973
2011-02-08    1.349570
2011-03-10    0.677004
2011-03-12   -0.484539
dtype: float64

In [18]:
# Select 01/05
ts['2011-01-05']

0.9499538788672598

In [19]:
# Select by position; .iloc makes the positional lookup explicit
ts.iloc[1]

0.9499538788672598

In [20]:
ts['01/05/2011']

0.9499538788672598

In [21]:
ts['20110105']

0.9499538788672598

In [22]:
# Select a range of dates
ts['2011-02']

2011-02-07    0.271973
2011-02-08    1.349570
dtype: float64

In [23]:
ts['2011-02-01':'2011-02-08'] # the end date is included in the slice

2011-02-07    0.271973
2011-02-08    1.349570
dtype: float64

In [24]:
ts['2011-02-01':]

2011-02-07    0.271973
2011-02-08    1.349570
2011-03-10    0.677004
2011-03-12   -0.484539
dtype: float64

In [25]:
ts[:"2011-03-10"]

2011-01-02   -0.874433
2011-01-05    0.949954
2011-02-07    0.271973
2011-02-08    1.349570
2011-03-10    0.677004
dtype: float64
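Because `ts` holds random values, the selections above are hard to verify by eye. The sketch below repeats them on a series with fixed, made-up values (`demo` is an illustrative name) and uses `.loc` / `.iloc` to make label-based and position-based access explicit.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
demo = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=dates)

by_label = demo.loc["2011-01-05"]    # exact date given as a string
by_position = demo.iloc[1]           # second row, regardless of its label
february = demo.loc["2011-02"]       # partial string selects the whole month
closed = demo.loc["2011-02-01":"2011-02-08"]  # both endpoints are included

print(by_label, by_position)
print(february)
print(closed)
```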

## 3. Date Ranges

In [28]:
# Manually build a list of dates to select
dates = [datetime(2011, 1, 2), datetime(2011, 3, 10), datetime(2011, 4, 1)]
# ts[dates] raises a KeyError in recent pandas versions because 2011-04-01 is not in the index
ts[ts.index.isin(dates)]

2011-01-02   -0.874433
2011-03-10    0.677004
dtype: float64

In [29]:
# Create a range of dates
daterange = pd.date_range('2011-01-01', periods=8)
print(daterange)

DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08'],
              dtype='datetime64[ns]', freq='D')


In [30]:
daterange = pd.date_range('2011-01-01', periods=5, freq='2D')
print(daterange)

DatetimeIndex(['2011-01-01', '2011-01-03', '2011-01-05', '2011-01-07',
               '2011-01-09'],
              dtype='datetime64[ns]', freq='2D')


In [31]:
# Every 10 hours; recent pandas versions prefer the lowercase alias "10h"
daterange = pd.date_range("2011-01-01", periods=5, freq="10H")
print(daterange)

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 10:00:00',
               '2011-01-01 20:00:00', '2011-01-02 06:00:00',
               '2011-01-02 16:00:00'],
              dtype='datetime64[ns]', freq='10H')


In [32]:
# Sample business days only
daterange = pd.date_range("2011-01-01", periods=10, freq="B")
print(daterange)

DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14'],
              dtype='datetime64[ns]', freq='B')
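The cells above fix the start date and the number of periods. `pd.date_range` also accepts an explicit `end` instead of `periods`; a minimal sketch (variable names are illustrative):

```python
import pandas as pd

# Business days between two dates; 2011-01-01 is a Saturday, so the range
# starts on the following Monday
business_days = pd.date_range(start="2011-01-01", end="2011-01-14", freq="B")

# Every second day between two dates; the end date is included when it
# falls on the frequency
every_other_day = pd.date_range(start="2011-01-01", end="2011-01-09", freq="2D")

print(business_days)
print(every_other_day)
```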


In [33]:
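# ts[daterange] raises a KeyError in recent pandas versions when dates are missing;
# reindex returns NaN for the dates that are not in ts
ts.reindex(daterange)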

2011-01-03         NaN
2011-01-04         NaN
2011-01-05    0.949954
2011-01-06         NaN
2011-01-07         NaN
2011-01-10         NaN
2011-01-11         NaN
2011-01-12         NaN
2011-01-13         NaN
2011-01-14         NaN
Freq: B, dtype: float64

In [34]:
ts[ts.index.isin(daterange)]

2011-01-05    0.949954
dtype: float64
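A small sketch contrasting `reindex` with the `isin` mask used above. The series `demo` and its values are made up for illustration.

```python
import pandas as pd

demo = pd.Series([1.0, 2.0, 3.0],
                 index=pd.to_datetime(["2011-01-02", "2011-01-05", "2011-02-07"]))
wanted = pd.date_range("2011-01-03", periods=5, freq="B")  # Jan 3 through Jan 7

# reindex returns one row per requested date and fills absent dates with NaN
aligned = demo.reindex(wanted)

# The isin mask keeps only the rows of demo whose date is in the requested range
present = demo[demo.index.isin(wanted)]

print(aligned)
print(present)
```

`reindex` is the better fit when the result must line up with a fixed calendar; the mask is the better fit when only existing observations matter.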

## 4. Shifting Data


In [35]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

| | Price |
|---|---|
| 2019-11-01 | 100.09627 |
| 2019-11-02 | 100.954459 |
| 2019-11-03 | 100.523332 |
| 2019-11-04 | 100.269862 |


In [None]:
prices - 100

In [36]:
# How to create a column storing yesterday's price?
for date in prices.index:
    yesterday = date - timedelta(days=1)
    if yesterday in prices.index:
        prices.loc[date, "Yesterday's Price"] = prices.loc[yesterday, "Price"]
prices

| | Price | Yesterday's Price |
|---|---|---|
| 2019-11-01 | 100.09627 | NaN |
| 2019-11-02 | 100.954459 | 100.09627 |
| 2019-11-03 | 100.523332 | 100.954459 |
| 2019-11-04 | 100.269862 | 100.523332 |
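The loop above looks up the previous calendar day, which is not always the previous row. A hedged sketch of the difference on a made-up series with a missing day (`demo` and its values are illustrative): `shift(1)` moves values by one row, while `shift(1, freq="D")` moves the index by one day, and reindexing back to the original dates reproduces the calendar-day lookup without a loop.

```python
import pandas as pd

# 2019-11-03 is deliberately missing
demo = pd.Series([100.0, 101.0, 102.0],
                 index=pd.to_datetime(["2019-11-01", "2019-11-02", "2019-11-04"]),
                 name="Price")

# Previous observation, whatever its date
previous_row = demo.shift(1)

# Previous calendar day: shift the index by one day, then align to the original dates
previous_day = demo.shift(1, freq="D").reindex(demo.index)

print(previous_row)
print(previous_day)
```

For 2019-11-04, `previous_row` reports the price from 2019-11-02, whereas `previous_day` is NaN because there is no observation for 2019-11-03.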


In [40]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

| | Price |
|---|---|
| 2019-11-01 | 100.402957 |
| 2019-11-02 | 100.927311 |
| 2019-11-03 | 100.862845 |
| 2019-11-04 | 100.304667 |


In [41]:
prices_yesterday = prices.shift(1)
prices_yesterday

| | Price |
|---|---|
| 2019-11-01 | NaN |
| 2019-11-02 | 100.402957 |
| 2019-11-03 | 100.927311 |
| 2019-11-04 | 100.862845 |


In [42]:
prices = pd.merge(prices, prices_yesterday, left_index=True, right_index=True,
                  suffixes=["Today", "Yesterday"])
prices

| | PriceToday | PriceYesterday |
|---|---|---|
| 2019-11-01 | 100.402957 | NaN |
| 2019-11-02 | 100.927311 | 100.402957 |
| 2019-11-03 | 100.862845 | 100.927311 |
| 2019-11-04 | 100.304667 | 100.862845 |
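Since `shift` keeps the original index, the shifted values can also be assigned directly as a new column, without a merge. A minimal sketch with made-up prices (`demo` and the column names `PriceYesterday` / `PriceTomorrow` are illustrative):

```python
import pandas as pd

demo = pd.DataFrame({"Price": [100.0, 102.0, 101.0, 103.0]},
                    index=pd.date_range("2019-11-01", periods=4))

# shift(1) moves each value down one row: today's row holds yesterday's price
demo["PriceYesterday"] = demo["Price"].shift(1)

# shift(-1) moves each value up one row: today's row holds tomorrow's price
demo["PriceTomorrow"] = demo["Price"].shift(-1)

print(demo)
```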


In [43]:
# Exercise: Compute the relative change between yesterday's and today's prices
# Formula: change = (today's price - yesterday's price) / yesterday's price
# (multiply by 100 to express it as a percentage)

prices['Computed'] = (prices['PriceToday'] - prices['PriceYesterday']) / prices['PriceYesterday']
prices

| | PriceToday | PriceYesterday | Computed |
|---|---|---|---|
| 2019-11-01 | 100.402957 | NaN | NaN |
| 2019-11-02 | 100.927311 | 100.402957 | 0.005223 |
| 2019-11-03 | 100.862845 | 100.927311 | -0.000639 |
| 2019-11-04 | 100.304667 | 100.862845 | -0.005534 |
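pandas also provides `pct_change`, which computes the same quantity in one call. The sketch below checks it against the manual formula on made-up prices (`demo`, `manual`, and `builtin` are illustrative names). Like the formula above, it returns a fraction rather than a percentage.

```python
import numpy as np
import pandas as pd

demo = pd.Series([100.0, 110.0, 99.0],
                 index=pd.date_range("2019-11-01", periods=3))

# Manual formula: (today - yesterday) / yesterday
manual = (demo - demo.shift(1)) / demo.shift(1)

# Built-in equivalent
builtin = demo.pct_change()

print(manual)
print(builtin)
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```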
