# Week 14
# Time Series Data

Time series data is a data set where instances are indexed by time. It is an important form of structured data in many fields such as finance, economics, ecology, neuroscience, and physics. 

Reading:
- Textbook, Chapter 11

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Date and Time Data Types and Tools

In Python, the `datetime.datetime` class is widely used to represent date and time data.

In [None]:
from datetime import datetime

datetime.now()

In [None]:
datetime.now().year

In [None]:
datetime.now().day

In [None]:
datetime.now().month

We can use `datetime.timedelta` to represent the temporal difference between two `datetime` objects.

In [None]:
from datetime import timedelta

delta = timedelta(days=10)  # timedelta(10) also means 10 days: the first positional argument is days

datetime.now() + delta

In [None]:
date1 = datetime(2022, 12, 12)
date2 = datetime.now()
date2 - date1
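Subtracting two `datetime` objects returns a `timedelta`, which stores its length as days, seconds, and microseconds. The sketch below uses made-up fixed dates (so the output does not depend on when it runs) to show how to read those parts back and how to build a `timedelta` from other units such as hours.

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 3)
end = datetime(2011, 1, 7, 12)   # noon on January 7

gap = end - start                # subtracting two datetimes gives a timedelta
print(gap)                       # 4 days, 12:00:00
print(gap.days, gap.seconds)     # 4 43200 (the seconds part excludes whole days)
print(gap.total_seconds())       # 388800.0 (the whole duration in seconds)

# timedelta also accepts keyword arguments for other units
print(start + timedelta(hours=36))  # 2011-01-04 12:00:00
```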

**Convert between string and datetime**

In [None]:
# datetime to string
date = datetime(2011, 1, 3, 23, 30, 45)
str(date)

In [None]:
# Format as "YYYY/MM/DD HH:MM, Weekday"
date.strftime("%Y/%m/%d %H:%M, %A")

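Common format codes:
- %Y: Four-digit year
- %y: Two-digit year
- %m: Two-digit month (01-12)
- %d: Two-digit day (01-31)
- %H: Two-digit hour, 24-hour clock (00-23)
- %I: Two-digit hour, 12-hour clock (01-12)
- %M: Two-digit minute (00-59)
- %S: Two-digit second
- %A: Full weekday name (e.g. Monday)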

[More format codes in the Python `datetime` documentation](https://docs.python.org/2/library/datetime.html)
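The reverse direction, turning a string with a known layout into a `datetime`, uses `datetime.strptime` with the same format codes. A minimal sketch reusing the example timestamp above; `%p` (AM/PM) is an extra code not in the list, and its text may vary with the system locale.

```python
from datetime import datetime

# strptime is the inverse of strftime: it parses a string using an explicit format
parsed = datetime.strptime("2011-01-03 23:30:45", "%Y-%m-%d %H:%M:%S")
print(parsed)  # 2011-01-03 23:30:45

# Formatting the parsed value with the same codes gives back the original string
roundtrip = parsed.strftime("%Y-%m-%d %H:%M:%S")
print(roundtrip)  # 2011-01-03 23:30:45

# %I (12-hour clock) is usually paired with %p (AM/PM)
print(parsed.strftime("%I:%M %p"))
```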

In [None]:
# Exercise: convert date to "01/03/2011"



In [None]:
# Exercise: convert date to "01-03-2011 23:30"



**Parse a datetime string**

Besides `pd.to_datetime()`, we can use the `dateutil` package to parse strings that represent dates and times without specifying their format in advance.

In [None]:
# String to datetime
from dateutil.parser import parse
parse("2011-01-03")

In [None]:
parse("Jan 31, 1997 10:45 PM")

In [None]:
# Many countries use format "DD/MM/YYYY". We need to set dayfirst=True
# so that the date is correctly recognized.
parse("06/12/2011", dayfirst=True)

In [None]:
# Without dayfirst=True, the month is assumed to come first: June 12, 2011
parse("06/12/2011")
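`pd.to_datetime()` covers the same ground inside pandas. The sketch below reuses the ambiguous string from above; passing an explicit `format` is usually the safest choice when the layout is known, because nothing has to be guessed.

```python
import pandas as pd

# pd.to_datetime accepts the same dayfirst flag as dateutil's parse
d1 = pd.to_datetime("06/12/2011", dayfirst=True)
print(d1)  # 2011-12-06 00:00:00

# An explicit format string avoids guessing altogether
d2 = pd.to_datetime("06/12/2011", format="%d/%m/%Y")
print(d2)  # 2011-12-06 00:00:00

# Passing a list returns a DatetimeIndex, ready to be used as a Series index
idx = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-02-07"])
print(idx)
```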

## 2. Time Series Basics

In [None]:
# Create a list of datetime objects
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

In [None]:
# Select January 5, 2011
ts['2011-01-05']

In [None]:
ts.iloc[1]  # select by position; plain ts[1] is deprecated for positional access in recent pandas

In [None]:
ts['01/05/2011']

In [None]:
ts['20110105']

In [None]:
# Select a range of dates
ts['2011-02']

In [None]:
ts['2011-02-01':'2011-02-08']  # unlike positional slicing, label slicing includes the end date

In [None]:
ts['2011-02-01':]

In [None]:
ts[:"2011-03-10"]
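The slices above follow pandas' label-based rules. A small sketch with fixed, made-up values (rather than the random `ts`) makes the difference between `.loc` and `.iloc` visible; the name `s` is only for illustration.

```python
import pandas as pd

# A small series with known values (hypothetical data, not the random ts above)
s = pd.Series([10, 20, 30, 40],
              index=pd.to_datetime(["2011-01-02", "2011-01-05",
                                    "2011-02-07", "2011-02-08"]))

# Label-based slicing with .loc includes BOTH endpoints
print(s.loc["2011-01-05":"2011-02-07"])   # 20, 30

# Position-based slicing with .iloc excludes the end position, like a Python list
print(s.iloc[1:2])                        # 20 only

# Slice bounds do not have to exist in the index (the index must be sorted)
print(s.loc["2011-01-03":"2011-01-31"])   # 20
```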

As a real-world example, let's retrieve Tesla's stock price from Yahoo Finance.

In [None]:
# If the Yahoo Finance library (yfinance) is not installed yet, run:
!pip install yfinance

In [None]:
import yfinance as yf

In [None]:
# Download Tesla's ("TSLA") daily prices from 2023-01-01 up to, but not including,
# 2023-04-30 (yfinance treats `end` as exclusive)
tsla = yf.download('TSLA', start="2023-01-01", end="2023-04-30")

tsla.head()

In [None]:
# Extract the prices for the first week of January 2023 (both ends of the slice are included)
tsla.loc["20230101":"20230107"]

In [None]:
# Exercise: Draw a line chart of close prices


## 3. Date Ranges

In [None]:
# Manually populate a list of dates
dates = [datetime(2011, 1, 2), datetime(2011, 3, 10), datetime(2011, 4, 1)]
# ts[dates] raises a KeyError because 2011-04-01 is not in ts's index
# (recent pandas versions no longer allow list lookups with missing labels)
ts[ts.index.isin(dates)]

In [None]:
# Create a range of 8 consecutive days (the default frequency is daily)
daterange = pd.date_range('2011-01-01', periods=8)
print(daterange)

In [None]:
daterange = pd.date_range('2011-01-01', periods=5, freq='2D')
print(daterange)

In [None]:
daterange = pd.date_range("2011-01-01", periods=5, freq="10h")  # every 10 hours; "H" is deprecated in recent pandas
print(daterange)

In [None]:
# Business days only (Monday to Friday)
daterange = pd.date_range("2011-01-01", periods=10, freq="B")
print(daterange)
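`pd.date_range` can also be given an end date instead of `periods`, and `freq` accepts anchored aliases. A short sketch, assuming the `"W-MON"` (weekly on Mondays) and `"MS"` (month start) aliases from the pandas offset-alias table:

```python
import pandas as pd

# Give both start and end instead of periods; the range covers the whole interval
weekly = pd.date_range("2011-01-01", "2011-01-31", freq="W-MON")
print(weekly)  # every Monday in January 2011: Jan 3, 10, 17, 24, 31

# "MS" means the first day of each month
month_starts = pd.date_range("2011-01-01", periods=3, freq="MS")
print(month_starts)  # 2011-01-01, 2011-02-01, 2011-03-01
```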

In [None]:
# This raises a KeyError because most of these business days are not in ts's index.
# What can we do about this?
ts[daterange]

In [None]:
ts[ts.index.isin(daterange)]
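`isin` silently drops the dates that are missing. If every requested date should appear in the result, `reindex` is one alternative: missing dates come back as `NaN` and can then be filled on purpose. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical series observed on only two of the business days below
s = pd.Series([1.5, 2.5], index=pd.to_datetime(["2011-01-05", "2011-01-07"]))
bdays = pd.date_range("2011-01-03", periods=5, freq="B")  # Mon 3 to Fri 7 Jan 2011

# reindex keeps every requested date and inserts NaN where s has no value
aligned = s.reindex(bdays)
print(aligned)

# Gaps can then be filled explicitly, e.g. by carrying the last value forward
filled = aligned.ffill()
print(filled)  # leading NaNs stay NaN: there is no earlier value to carry
```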

## 4. Shifting Data


In [None]:
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

In [None]:
prices - 100

In [None]:
# How to create a column storing yesterday's price?
for date in prices.index:
    yesterday = date - timedelta(days=1)
    if yesterday in prices.index:
        prices.loc[date, "Yesterday's Price"] = prices.loc[yesterday, "Price"]
prices

In [None]:
# Start again from a fresh DataFrame (the loop above added a column)
prices = pd.DataFrame(np.random.rand(4) + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

In [None]:
prices_yesterday = prices.shift(1)
prices_yesterday

In [None]:
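# Both frames have a "Price" column, so the suffixes rename them to
# "PriceToday" and "PriceYesterday"
prices = pd.merge(prices, prices_yesterday, left_index=True, right_index=True,
                  suffixes=["Today", "Yesterday"])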
prices
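`shift` can move either the values or the index. The sketch below uses hypothetical prices to contrast the two and to build the day-over-day change and percentage change from `shift(1)`; pandas also provides `diff()` and `pct_change()` for these.

```python
import pandas as pd

# Hypothetical prices on four consecutive days
p = pd.Series([100.0, 102.0, 99.0, 101.0],
              index=pd.date_range("2019-11-01", periods=4))

# shift(1) moves the VALUES down one row; the index stays the same
print(p.shift(1))            # NaN, 100.0, 102.0, 99.0

# shift(1, freq="D") keeps the values and moves the INDEX forward one day
print(p.shift(1, freq="D"))  # same values, dated 2019-11-02 to 2019-11-05

# Derived quantities built on shift(1)
change = p - p.shift(1)      # same as p.diff()
pct = p / p.shift(1) - 1     # same idea as p.pct_change()
print(change)
print(pct)
```

With trading-day data such as the Tesla prices, weekends have no rows, so `shift(1)` pairs each Monday with the previous Friday, whereas the loop above (which looks up `date - timedelta(days=1)`) would find no row for Mondays.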

In [None]:
# Exercise: Calculate how much Tesla's closing price increased or decreased
# each day, compared with the previous day's closing price.



## 5. Analysis of Time Series Data

In [None]:
# Calculate the percentage change in Tesla's closing price each day.



In [None]:
# On which days did Tesla's price increase the most?
# By how much did it increase on those days?



In [None]:
# On which day did Tesla's price decrease the most? By how much?



**Monthly Performance**

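Long-term investors may prefer monthly performance data. Create a new data frame containing the Open, High, Low, Close, and Volume values for each month in the downloaded period (January to April 2023): the month's first Open, highest High, lowest Low, last Close, and total Volume.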
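One way to follow the hint in the next cell is `groupby` on `df.index.month` with a per-column aggregation rule. The sketch below runs on a tiny hypothetical OHLCV frame rather than the Tesla data; the column rules (first Open, max High, min Low, last Close, summed Volume) are the usual OHLC convention, assumed here.

```python
import pandas as pd

# Hypothetical OHLCV rows straddling a month boundary (Jan 30 to Feb 2, 2023)
df = pd.DataFrame(
    {"Open": [1.0, 2.0, 3.0, 4.0],
     "High": [5.0, 6.0, 7.0, 8.0],
     "Low": [0.0, 1.0, 2.0, 3.0],
     "Close": [2.0, 3.0, 4.0, 5.0],
     "Volume": [10, 20, 30, 40]},
    index=pd.date_range("2023-01-30", periods=4),
)

# How each column is summarized over a month
rules = {"Open": "first", "High": "max", "Low": "min",
         "Close": "last", "Volume": "sum"}

# Group by the month number of each index entry, then aggregate
monthly = df.groupby(df.index.month).agg(rules)
print(monthly)

# resample("MS") does the same, but labels each group with the month-start date
by_month_start = df.resample("MS").agg(rules)
print(by_month_start)
```

Grouping by month number alone would merge, say, January 2023 with January 2024; `resample` keeps different years apart, which matters for longer downloads.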

In [None]:
# Group the rows by the month value of their indices, then aggregate the data.



**Moving Average**

Daily prices are full of random ups and downs, which makes the long-term trend hard to see. Averaging over a longer window (for example 10 or 30 trading days) smooths out this noise. A moving average (or rolling average) computes, for each day, the mean price over the most recent N observations, so the window moves forward one step at a time.
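A toy example makes the window mechanics concrete. With a window of 3, each output is the mean of the current value and the two before it, so the first two positions have no full window; `min_periods` relaxes that requirement.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each value is the mean of the current value and the two before it
full = s.rolling(3).mean()
print(full)     # NaN, NaN, 2.0, 3.0, 4.0

# min_periods=1 averages over whatever is available at the start
partial = s.rolling(3, min_periods=1).mean()
print(partial)  # 1.0, 1.5, 2.0, 3.0, 4.0
```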

In [None]:
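# Create a 10-day moving average (10 rows, i.e. 10 trading days, not 10 calendar days)
moving_avg = tsla.rolling(10).mean()

# The first 9 rows are NaN because a full 10-row window is not yet available
moving_avg.dropna().head()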
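`rolling(10)` counts rows, so for stock data it means 10 trading days. pandas also accepts a time-based window such as `"3D"` on a `DatetimeIndex`, which counts calendar time instead. A sketch with hypothetical prices around a weekend shows that the two can differ:

```python
import pandas as pd

# Hypothetical closes on Thu, Fri, Mon, Tue (no weekend rows, as with stock data)
close = pd.Series([1.0, 2.0, 3.0, 4.0],
                  index=pd.to_datetime(["2023-01-05", "2023-01-06",
                                        "2023-01-09", "2023-01-10"]))

by_rows = close.rolling(2).mean()     # last 2 rows, regardless of the weekend gap
by_time = close.rolling("3D").mean()  # rows in the window (t - 3 days, t]
print(pd.DataFrame({"2 rows": by_rows, "3 days": by_time}))
```

On Monday the row-based window still reaches back to Friday, while the 3-day window contains only Monday itself.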

In [None]:
# Plot daily prices and the moving average.

