# Chapter 11 Time Series

A time series is a data set whose observations are indexed by time. It is an important form of structured data in many fields, such as finance, economics, ecology, neuroscience, and physics.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Date and Time Data Types and Tools

In Python, the `datetime.datetime` class is widely used to represent date and time data.

In [None]:
from datetime import datetime

datetime.now()

In [None]:
datetime.now().year

In [None]:
datetime.now().day

In [None]:
datetime.now().month

We can use `datetime.timedelta` to represent the temporal difference between two `datetime` objects.

In [None]:
from datetime import timedelta

delta = timedelta(days=10)  # the first positional argument of timedelta is days

datetime.now() + delta

In [None]:
date1 = datetime(2019, 12, 12)
date2 = datetime.now()
date1 - date2

**Convert between string and datetime**

In [None]:
# datetime to string
date = datetime(2011, 1, 3, 23, 30, 45)
str(date)

In [None]:
# Convert to a custom format: "YYYY/MM/DD HH:MM, weekday name"
date.strftime("%Y/%m/%d %H:%M, %A")

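Datetime formats:
- %Y: Four-digit year
- %y: Two-digit year
- %m: Two-digit month
- %d: Two-digit day
- %H: Hour on a 24-hour clock (00 - 23)
- %I: Hour on a 12-hour clock (01 - 12)
- %M: Two-digit minute
- %S: Two-digit second
- %w: Weekday as a number (0 = Sunday, 6 = Saturday)
- %A: Full weekday name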

[More on this](https://docs.python.org/3/library/datetime.html)
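
The format codes above work in both directions: `strftime` turns a `datetime` into a string, and `datetime.strptime` parses a string with an explicit format. The snippet below is a minimal sketch; the variable names are only illustrative.

```python
from datetime import datetime

# Parse a string with an explicit format; the codes must match the text exactly.
stamp = datetime.strptime("2011-01-03 23:30:45", "%Y-%m-%d %H:%M:%S")

# Format the same moment with a two-digit year and a 12-hour clock.
formatted = stamp.strftime("%m/%d/%y %I:%M")

# %w gives the weekday as a number, with 0 standing for Sunday.
weekday_number = stamp.strftime("%w")

print(stamp, formatted, weekday_number)
```

Unlike `dateutil.parser.parse`, `strptime` does not guess the layout, which makes it a reasonable choice when the input format is known in advance.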

In [None]:
# Exercise: convert date to "01/03/2011"
date = datetime(2011, 1, 3)
date.strftime("%m/%d/%Y") # the formats are case-sensitive

In [None]:
# Exercise: convert date to "01-03-2011 00:00"
date.strftime("%m-%d-%Y %H:%M")

In [None]:
# String to datetime
from dateutil.parser import parse
parse("2011-01-03").second

In [None]:
parse("Jan 31, 1997 10:45 PM")

In [None]:
# Many countries use format "DD/MM/YYYY". We need to set dayfirst=True
# so that the date is correctly recognized.
parse("06/12/2011", dayfirst=True)

In [None]:
parse("06/12/2011")

## 2. Time Series Basics

In [None]:
# Create a list of datetime objects
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

In [None]:
# Select 01/05
ts['2011-01-05']

In [None]:
ts.iloc[1]  # select by position; plain ts[1] is deprecated for a datetime-indexed Series

In [None]:
ts['01/05/2011']

In [None]:
ts['20110105']

In [None]:
# Select a range of dates
ts['2011-02']

In [None]:
ts['2011-02-01':'2011-02-8'] # the end datetime is also included

In [None]:
ts['2011-02-01':]

In [None]:
ts[:"2011-03-10"]

## 3. Date Ranges

In [None]:
# manually populate a list of dates
dates = [datetime(2011, 1, 2), datetime(2011, 3, 10), datetime(2011, 4, 1)]
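# 2011-04-01 is not in ts, so ts[dates] raises a KeyError in recent pandas;
# reindex returns NaN for the missing date instead.
ts.reindex(dates)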

In [None]:
# Create a range of dates
daterange = pd.date_range('2011-01-01', periods=8)
print(daterange)

In [None]:
daterange = pd.date_range('2011-01-01', periods=5, freq='2D')
print(daterange)

In [None]:
daterange = pd.date_range("2011-01-01", periods=5, freq="10H")
print(daterange)

In [None]:
# Sample business days only
daterange = pd.date_range("2011-01-01", periods=10, freq="B")
print(daterange)

In [None]:
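# Most of these business days are not in ts, so ts[daterange] raises a KeyError
# in recent pandas; reindex returns NaN for the missing dates instead.
ts.reindex(daterange)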

In [None]:
ts[ts.index.isin(daterange)]

## 4. Shifting Data


In [None]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

In [None]:
prices - 100

In [None]:
# How to create a column storing yesterday's price?
# A manual approach: look up the previous day's row for each date.
for date in prices.index:
    yesterday = date - timedelta(days=1)
    if yesterday in prices.index:
        prices.loc[date, "Yesterday's Price"] = prices.loc[yesterday, "Price"]
prices

In [None]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices_yesterday = prices.shift(1)  # move every value down one row; the index stays put
prices_yesterday

In [None]:
prices = pd.merge(prices, prices_yesterday, left_index=True, right_index=True,
                  suffixes=("Today", "Yesterday"))
prices

In [None]:
# Exercise: Compute the relative change between yesterday's and today's prices
# Formula: change = (today's price - yesterday's price) / yesterday's price
prices['PercentOfChange'] = (prices['PriceToday'] - prices['PriceYesterday']) \
                            / prices['PriceYesterday']
prices

## 5. Analyzing Stock Prices

In [None]:
# Install pandas-datareader to download data
# https://pydata.github.io/pandas-datareader/devel/remote_data.html#tiingo
!python -m pip install --upgrade pip # upgrade pip
!pip install pandas-datareader # install package

In [None]:
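import os
import pandas_datareader as pdr
# Read the Tiingo API key from an environment variable instead of hard-coding it.
# The variable name TIINGO_API_KEY is an assumption; use whatever name you set.
api_key = os.environ.get("TIINGO_API_KEY", "Find-your-key-on-tiingo")
df = pdr.get_data_tiingo('GOOG', api_key=api_key)
df.head()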

In [None]:
# Download the daily prices of Apple ("AAPL") from 2014-01-01 to 2019-12-01
aapl = pdr.get_data_tiingo("AAPL",
                           api_key=api_key,
                           start="2014-01-01",
                           end="2019-12-01")
aapl.head()

In [None]:
aapl.tail()

In [None]:
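# Move the index into ordinary columns so the dates can be cleaned up.
# The next two cells drop the all-zero time component from each date.
aapl = aapl.reset_index()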

In [None]:
aapl['date'] = aapl['date'].apply(lambda x: x.strftime("%Y-%m-%d"))
aapl.head()

In [None]:
aapl['date'] = aapl['date'].apply(parse)
aapl.head()

In [None]:
aapl = aapl.set_index('date')
aapl.head()

In [None]:
# Draw a line chart of close prices
aapl['adjClose'].plot(figsize=(15, 5))

**1. Check for missing values**
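
A minimal sketch of this check, using a small hypothetical frame named `sample` in place of `aapl` so that it runs without an API key. The column names mirror the Tiingo columns used in this chapter; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `aapl`: five business days with one missing close.
idx = pd.date_range("2019-11-01", periods=5, freq="B")
sample = pd.DataFrame({"adjOpen": [100.0, 101.0, 102.0, 103.0, 104.0],
                       "adjClose": [100.5, np.nan, 102.5, 103.5, 104.5]},
                      index=idx)

# Count the missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# One common remedy (an assumption, not the only option):
# carry the last known value forward.
filled = sample.ffill()
```

The same two calls, `isna().sum()` and `ffill()`, should apply unchanged to the downloaded data frame.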

**2. Daily Change**

- Absolute daily change is the difference between the close price and the open price.
- Relative daily change is the ratio of the absolute daily change to the open price, expressed as a percentage. This is especially helpful if you want to compare the daily price changes of multiple stocks.
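
The two definitions above can be sketched as follows. The frame `sample` and the new column names `DailyChange` and `DailyChangePct` are hypothetical; only `adjOpen` and `adjClose` come from the downloaded data.

```python
import pandas as pd

# Hypothetical prices for three business days.
idx = pd.date_range("2019-11-01", periods=3, freq="B")
sample = pd.DataFrame({"adjOpen": [100.0, 102.0, 101.0],
                       "adjClose": [102.0, 100.98, 101.0]},
                      index=idx)

# Absolute daily change: close minus open.
sample["DailyChange"] = sample["adjClose"] - sample["adjOpen"]

# Relative daily change: absolute change as a percentage of the open price.
sample["DailyChangePct"] = sample["DailyChange"] / sample["adjOpen"] * 100

print(sample)
```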

**3. Day-To-Day Change**

The open price does not necessarily coincide with the previous close price, possibly because price expectations change overnight. Thus it is helpful to calculate the percent change of each close price over the previous close price.
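
A minimal sketch of the day-to-day change on a hypothetical series of close prices. It uses `Series.pct_change` and, for comparison, the `shift`-based formula from the previous section.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices for three business days.
idx = pd.date_range("2019-11-01", periods=3, freq="B")
close = pd.Series([100.0, 102.0, 99.96], index=idx, name="adjClose")

# Percent change of each close over the previous close.
day_to_day = close.pct_change() * 100

# The same calculation written out with shift().
manual = (close - close.shift(1)) / close.shift(1) * 100

print(day_to_day)
```

The first value is `NaN` because the first day has no previous close to compare with.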

**4. Monthly Performance**

Long-term investors may prefer monthly performance data. Create a new data frame containing adjOpen, adjHigh, adjLow, adjClose, and adjVolume for each month between 2014 and 2019.
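
One possible way to build the monthly frame is to group the daily rows by calendar month and aggregate each column with a suitable rule. The sketch below runs on synthetic data named `daily`. The aggregation rules (first open, highest high, lowest low, last close, total volume) are an assumption about what "monthly performance" should mean here.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for three months of business days.
idx = pd.date_range("2019-01-01", "2019-03-31", freq="B")
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, len(idx)).cumsum()
daily = pd.DataFrame({"adjOpen": close - 0.5,
                      "adjHigh": close + 1.0,
                      "adjLow": close - 1.0,
                      "adjClose": close,
                      "adjVolume": np.full(len(idx), 1000)},
                     index=idx)

# Group by calendar month and aggregate each column with its own rule.
monthly = daily.groupby(daily.index.to_period("M")).agg({
    "adjOpen": "first",   # open of the first trading day
    "adjHigh": "max",     # highest price in the month
    "adjLow": "min",      # lowest price in the month
    "adjClose": "last",   # close of the last trading day
    "adjVolume": "sum",   # total traded volume
})
print(monthly)
```

`DataFrame.resample` with a month-end frequency is an alternative, but the spelling of that frequency alias differs between pandas versions, so the sketch groups by period instead.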

**5. Moving Average**

The daily price is full of random ups and downs, making it difficult to see the long-term trend. It helps to look at the average performance over a longer period (10 days, 30 days, etc.). A moving average (or rolling average) is a calculation that finds the average price over a sliding window of fixed length.
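
A minimal sketch of a moving average with `Series.rolling`. It uses a 3-day window on made-up prices so the result is easy to verify by hand; a 10-day or 30-day window works the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices for five consecutive days.
idx = pd.date_range("2019-11-01", periods=5)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=idx)

# Average of the current day and the two days before it.
ma3 = close.rolling(window=3).mean()
print(ma3)
```

The first two values are `NaN` because a full 3-day window is not yet available.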

**6. Volatility**

Volatility measures the risk of the stock over a period of time. It is often measured as the standard deviation of prices over each period. The higher the volatility, the riskier the stock.
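
Following the definition above, a rolling standard deviation gives one simple volatility estimate. The sketch uses a 3-day window on made-up prices. Applying the same call to daily returns instead of prices is a common variant, mentioned here only as an alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical close prices: rising at first, then flat.
idx = pd.date_range("2019-11-01", periods=5)
close = pd.Series([10.0, 12.0, 14.0, 14.0, 14.0], index=idx)

# Sample standard deviation of prices over each 3-day window.
vol3 = close.rolling(window=3).std()
print(vol3)
```

The estimate drops to zero once the price stops moving, which matches the intuition that a flat price carries no volatility.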

**7. Change Points**

Changepoints occur when the price goes from increasing to decreasing (or vice versa). These times are extremely important because knowing when a stock will reach a peak or is about to take off could have significant economic benefits. However, it is tricky to recognize changepoints properly, since we cannot make decisions based on future data. For simplicity, let's say a day is a positive changepoint if its day-to-day change is positive while the three previous day-to-day changes are all negative. Similarly, we recognize a day as a negative changepoint if its day-to-day change is negative while all three previous day-to-day changes are positive.
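
The simplified rule above can be sketched with boolean masks and `shift`. The prices are made up so that exactly one positive and one negative changepoint occur; all variable names are illustrative.

```python
import pandas as pd

# Hypothetical close prices: three falls, three rises, then a fall.
idx = pd.date_range("2019-11-01", periods=8)
close = pd.Series([100.0, 99.0, 98.0, 97.0, 98.0, 99.0, 100.0, 99.0], index=idx)

change = close.diff()   # day-to-day change (NaN on the first day)
up = change > 0
down = change < 0

# True where the three previous day-to-day changes were all negative / all positive.
prev3_down = (down.shift(1, fill_value=False)
              & down.shift(2, fill_value=False)
              & down.shift(3, fill_value=False))
prev3_up = (up.shift(1, fill_value=False)
            & up.shift(2, fill_value=False)
            & up.shift(3, fill_value=False))

positive_cp = up & prev3_down
negative_cp = down & prev3_up

print(close.index[positive_cp])
print(close.index[negative_cp])
```

Because each mask only looks at the current and earlier days, the rule never relies on future data.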

**8. Prediction**

Being able to predict future stock prices is every investor's dream. While it is very challenging to do so, data analysis does help. Today let's test a basic modeling technique called linear regression.
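
A minimal sketch of linear regression on a price series. It fits a straight line to the price as a function of the day number with `np.polyfit` and extrapolates one day ahead. The prices are synthetic and perfectly linear, so the fit is exact. Using the day number as the only feature is an assumption made for illustration; real prices will not follow a line this closely.

```python
import numpy as np
import pandas as pd

# Synthetic close prices that rise by 0.5 per day.
idx = pd.date_range("2019-11-01", periods=10)
close = pd.Series(100.0 + 0.5 * np.arange(10), index=idx)

# Use the day number (0, 1, 2, ...) as the single feature.
x = np.arange(len(close))
slope, intercept = np.polyfit(x, close.to_numpy(), deg=1)

# Extrapolate the fitted line to the next day.
next_day_prediction = slope * len(close) + intercept
print(slope, intercept, next_day_prediction)
```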