# Week 12
# Time Series Data

Time series data is a data set whose observations are indexed by time. It is an important form of structured data in fields such as finance, economics, ecology, neuroscience, and physics.

Reading:
- Textbook, Chapter 11

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Date and Time Data Types and Tools

In Python, the `datetime.datetime` class is widely used to represent date and time data.

In [None]:
from datetime import datetime

datetime.now()

In [None]:
datetime.now().year

In [None]:
datetime.now().day

In [None]:
datetime.now().month

We can use `datetime.timedelta` to represent the temporal difference between two `datetime` objects.

In [None]:
from datetime import timedelta

delta = timedelta(10)

datetime.now() + delta

In [None]:
date1 = datetime(2022, 12, 12)
date2 = datetime.now()
date2 - date1

In [None]:
# Define a timedelta of 10 seconds
ten_seconds = timedelta(seconds=10)
print(ten_seconds)

In [None]:
# timedelta has no "months" unit, so approximate 10 months as 10 * 30 days
ten_months = timedelta(days=10*30)
print(ten_months)
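Because `timedelta` only counts fixed units such as days and seconds, "10 months" as 300 days is an approximation. The sketch below shows two ways to move by calendar months instead, assuming `dateutil` (installed alongside pandas) is available; the variable names are illustrative only.

```python
from datetime import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

start = datetime(2022, 12, 12)

# relativedelta moves by calendar months rather than a fixed number of days
ten_months_later = start + relativedelta(months=10)
print(ten_months_later)

# pandas offers the same idea through DateOffset
ten_months_later_pd = pd.Timestamp(start) + pd.DateOffset(months=10)
print(ten_months_later_pd)
```

Both results keep the day of the month (the 12th) and only change the month and year.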

**Convert between string and datetime**

In [None]:
# datetime to string
date = datetime(2011, 1, 3, 23, 30, 45)
str(date)

In [None]:
# Convert to format "YYYY-MM-DD"
date.strftime("%Y-%m-%d")

Datetime formats:
- %Y: Four-digit year
- %y: Two-digit year
- %m: Two-digit month
- %d: Two-digit day
- %H: Hour on a 24-hour clock (00 - 23)
- %I: Hour on a 12-hour clock (01 - 12)
- %M: Two-digit minute
- %S: Two-digit second
- %A: Full weekday name (e.g., Monday)

[More on this](https://docs.python.org/2/library/datetime.html)
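The same format codes work in the other direction: `datetime.strptime` parses a string when the format is known in advance. A minimal sketch, using an example string chosen for illustration:

```python
from datetime import datetime

text = "01/03/2011 23:30"
fmt = "%m/%d/%Y %H:%M"

# Parse the string using an explicit format
parsed = datetime.strptime(text, fmt)
print(parsed)

# Formatting with the same codes reproduces the original string
print(parsed.strftime(fmt))
```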

In [None]:
# Exercise: convert date to "01/03/2011"

date.strftime("%m/%d/%Y")

In [None]:
# Exercise: convert date to "01-03-2011 23:30"

date.strftime("%m-%d-%Y %H:%M")

**Parse a datetime string**

Besides `pd.to_datetime()`, we can use `dateutil` to parse a string representing date and time.

In [None]:
# String to datetime
from dateutil.parser import parse
parse("2011-01-03")

In [None]:
parse("Jan 31, 1997 10:45 PM")

In [None]:
# Many countries use format "DD/MM/YYYY". We need to set dayfirst=True
# so that the date is correctly recognized.
parse("06/12/2011", dayfirst=True)

In [None]:
parse("06/12/2011")
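For comparison, here is a small sketch of `pd.to_datetime()`, which can parse a single string or a whole list at once. The sample strings are made up for illustration; `errors="coerce"` is used so that unparseable entries become `NaT` instead of raising an error.

```python
import pandas as pd

# A single day-first string
single = pd.to_datetime("06/12/2011", dayfirst=True)
print(single)

# A list of strings with an explicit format; unparseable entries become NaT
raw = ["06/12/2011", "31/01/1997", "not a date"]
stamps = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(stamps)
```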

## 2. Time Series Basics

In [None]:
# Create a list of datetime objects
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 2, 7), datetime(2011, 2, 8),
         datetime(2011, 3, 10), datetime(2011, 3, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

In [None]:
# Select 01/05
ts['2011-01-05']

In [None]:
# Select by integer position; .iloc avoids ambiguity with label-based lookup
ts.iloc[1]

In [None]:
ts['01/05/2011']

In [None]:
ts['20110105']

In [None]:
# Select a range of dates
ts['2011-02']

In [None]:
ts['2011-02-01':'2011-02-8'] # the end datetime is also included

In [None]:
ts['2011-02-01':]

In [None]:
ts[:"2011-03-10"]

As a real-world example, let's retrieve Tesla's stock price from Yahoo Finance.

In [None]:
# If you haven't installed the yfinance library, run the command below.
!pip install yfinance

In [None]:
import yfinance as yf

In [None]:
# Download the daily prices of Tesla ("TSLA") from 2023-01-01 to 2023-04-30
tsla = yf.download('TSLA', start="2023-01-01", end="2023-04-30")

tsla.head()

In [None]:
# Extract the stock prices up to and including 2023-01-07.
tsla.loc[:"2023-01-07"]

In [None]:
# Exercise: Draw a line chart of close prices.
# How about showing the prices between a start date and an end date?
startdate = "2023-03-01"
enddate = "2023-04-30"
plt.figure(figsize=(10, 4))
plt.plot(tsla[startdate:enddate].index,
         tsla.loc[startdate:enddate, 'Close'], 'b-')
plt.title("Tesla Stock Performance")

## 3. Date Ranges

In [None]:
# manually populate a list of dates
dates = [datetime(2011, 1, 2), datetime(2011, 3, 10), datetime(2011, 4, 1)]
# ts[dates] # Pandas no longer supports missing indices
ts[ts.index.isin(dates)]

In [None]:
# Create a range of dates
daterange = pd.date_range('2011-01-01', periods=8)
print(daterange)

In [None]:
daterange = pd.date_range('2011-01-01', periods=5, freq='2D')
print(daterange)

In [None]:
# 10-hour steps; pandas 2.2+ prefers the lowercase alias "10h"
daterange = pd.date_range("2011-01-01", periods=5, freq="10H")
print(daterange)

In [None]:
# Sample business days only
daterange = pd.date_range("2011-01-01", periods=10, freq="B")
print(daterange)

In [None]:
# Selecting with dates that are missing from the index raises a KeyError.
# What can we do about this?
ts[daterange]

In [None]:
ts[ts.index.isin(daterange)]
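Filtering with `isin()` drops the dates that are missing from the series. If every requested date should be kept, `reindex()` is one alternative. The sketch below uses a small hand-made series (not the random `ts` above) so the output is predictable.

```python
import pandas as pd

dates = pd.to_datetime(["2011-01-03", "2011-01-05", "2011-01-07"])
sample = pd.Series([1.0, 2.0, 3.0], index=dates)

# Five business days starting on Monday, 2011-01-03
business_days = pd.date_range("2011-01-03", periods=5, freq="B")

# reindex keeps every requested date and fills the gaps with NaN
aligned = sample.reindex(business_days)
print(aligned)

# method="ffill" carries the last known value forward into the gaps
filled = sample.reindex(business_days, method="ffill")
print(filled)
```

Whether to leave the gaps as `NaN` or fill them forward depends on the analysis; forward filling assumes the last observed value is still valid.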

## 4. Shifting Data


In [None]:
prices = pd.DataFrame(np.random.randn(4) * 10 + 100,
                      index=pd.date_range('2019-11-01', periods=4),
                      columns=['Price'])
prices

In [None]:
# How to create a column storing yesterday's price?
for date in prices.index:
    yesterday = date - timedelta(days=1)
    if yesterday in prices.index:
        prices.loc[date, "Yesterday's Price"] = prices.loc[yesterday, "Price"]
prices

In [None]:
prices_yesterday = prices['Price'].shift(1)
prices_yesterday

In [None]:
prices = pd.merge(prices, prices_yesterday, left_index=True, right_index=True,
                  suffixes=["Today", "Yesterday"])
prices
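Since `shift()` returns a series with the same index, the shifted values can also be assigned directly as a new column, without a merge. A minimal sketch with fixed prices (chosen for illustration) so the result is easy to check:

```python
import pandas as pd

sample_prices = pd.DataFrame(
    {"Price": [100.0, 102.0, 99.0, 105.0]},
    index=pd.date_range("2019-11-01", periods=4),
)

# shift(1) moves values down one row: each row sees the previous day's price
sample_prices["Yesterday"] = sample_prices["Price"].shift(1)

# shift(-1) moves values up one row: each row sees the next day's price
sample_prices["Tomorrow"] = sample_prices["Price"].shift(-1)

print(sample_prices)
```

The first row has no "Yesterday" value and the last row has no "Tomorrow" value, so both are `NaN`.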

In [None]:
# Exercise: Calculate how much Tesla's stock price has increased or decreased
# at the end of each day, compared to previous day's closing price.
# Formula: difference = today's close price - yesterday's close price
tsla.head()

In [None]:
# Step 1: Add a column of the previous day's close price
tsla['Previous Close'] = tsla['Close'].shift(1)
tsla.head()

In [None]:
# Step 2: Calculate the price difference
tsla['Price Change'] = tsla['Close'] - tsla['Previous Close']
tsla.head()

In [None]:
# On how many days did the price increase or decrease?

# Solution 1: Using a for loop

num_days_increase = 0
num_days_decrease = 0
for day in tsla.index:
    change = tsla.loc[day, "Price Change"]
    if change > 0:
        num_days_increase += 1
    elif change < 0:  # skips NaN (the first day) and unchanged days
        num_days_decrease += 1
print("Number of days increased:", num_days_increase)
print("Number of days decreased:", num_days_decrease)

In [None]:
# Solution 2: Use a vectorized comparison
# Note: False also covers unchanged days and the first day (NaN)
(tsla["Price Change"] > 0).value_counts()
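A comparison such as `> 0` treats `NaN` and zero as `False`, so a single `value_counts()` mixes falling, flat, and missing days together. The sketch below counts each case separately on a small made-up series of price changes.

```python
import numpy as np
import pandas as pd

# Illustrative price changes: the first value is NaN, as after shift(1)
change = pd.Series([np.nan, 1.5, -0.5, 0.0, 2.0, -3.0])

days_up = int((change > 0).sum())
days_down = int((change < 0).sum())
days_flat = int((change == 0).sum())
days_missing = int(change.isna().sum())

print("Up:", days_up, "Down:", days_down,
      "Flat:", days_flat, "Missing:", days_missing)
```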

In [None]:
# On which day did the stock price make its largest jump?
int_index = tsla['Price Change'].argmax()
tsla.index[int_index]

In [None]:
tsla.loc['2023-01-24':'2023-01-29']

In [None]:
# Solution from ChatGPT
import yfinance as yf
from datetime import datetime, timedelta

# Set the start and end dates
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 4, 1)

# Retrieve Tesla's stock prices from Yahoo Finance
tesla_data = yf.download("TSLA", start=start_date, end=end_date)

# Calculate the daily price changes
tesla_data['Price Change'] = tesla_data['Close'].diff()

# Find the day with the highest price jump
max_price_jump_date = tesla_data['Price Change'].idxmax()

# Print the result
print("The day with the highest price jump was:", max_price_jump_date.date())

## 5. Analysis of Time Series Data

In [None]:
# Calculate the percentage change in Tesla's price for each day.

# tsla.head()

# Convert the dollar amount into a percentage.
# percentage = price change / previous close * 100

tsla['% Change'] = tsla['Price Change'] / tsla['Previous Close'] * 100
tsla.head()
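pandas also provides `diff()` and `pct_change()`, which compute the same quantities as the manual `shift(1)` approach. The sketch below checks this on a small made-up series of closing prices rather than the downloaded data.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [100.0, 110.0, 99.0, 99.0],
    index=pd.date_range("2023-01-02", periods=4, freq="B"),
)

# Manual approach: compare each day with the previous day's close
manual_change = close - close.shift(1)
manual_pct = manual_change / close.shift(1) * 100

# Built-in equivalents
builtin_change = close.diff()
builtin_pct = close.pct_change() * 100

print(pd.DataFrame({"Change": builtin_change, "% Change": builtin_pct}))
```

The first row is `NaN` in both versions because there is no previous day to compare against.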

In [None]:
# On which day did Tesla's price increase the most?
# How much did the price increase on that day?

tsla['% Change'].idxmax()

In [None]:
# On which day did Tesla's price decrease the most? By how much?
day_of_most_decrease = tsla['% Change'].idxmin()
print(day_of_most_decrease)
print("Price of the day:", tsla.loc[day_of_most_decrease, 'Close'])
print("% of change:", tsla.loc[day_of_most_decrease, '% Change'])

In [None]:
# This is how ChatGPT handles this task:
import yfinance as yf

def find_max_drop_date(symbol, year):
    # Fetch historical stock data for the given symbol and year
    data = yf.download(symbol, start=f"{year}-01-01", end=f"{year}-12-31")

    # Calculate the daily price drops
    data['Drop'] = data['Close'].diff()

    # Find the date when the stock dropped the most
    max_drop_date = data['Drop'].idxmin().date()

    return max_drop_date

# Specify the stock symbol and year
symbol = 'TSLA'
year = 2023

# Find the day when Tesla stock dropped the most in 2023
max_drop_date = find_max_drop_date(symbol, year)

print(f"The day when Tesla stock dropped the most in 2023: {max_drop_date}")

**Monthly Performance**

Long-term investors may prefer monthly performance data. Create a new data frame containing the Open, High, Low, Close, and Volume for each month in 2020.

In [None]:
# Group the rows by the month value of their indices, then aggregate the data.

tsla_2020 = yf.download("TSLA", start="2020-01-01", end="2020-12-31")
tsla_2020.head()

In [None]:
# If we only want to create the record for 2020-01, we can extract the data from
# that month.
tsla_202001 = tsla_2020.loc['2020-01']

# Get the monthly open price
first_day = tsla_202001.index[0]
print("First trading day in 2020-01:", first_day)
first_price = tsla_202001.loc[first_day, 'Open']
print("First price in 2020-01:", first_price)

# Get the monthly close price
last_day = tsla_202001.index[-1]
print("Last trading day in 2020-01:", last_day)
last_price = tsla_202001.loc[last_day, 'Close']
print("Last price in 2020-01:", last_price)

highest_price = tsla_202001['High'].max()
lowest_price = tsla_202001['Low'].min()
total_volume = tsla_202001['Volume'].sum()

print("Highest price in 2020-01:", highest_price)
print("Lowest price in 2020-01:", lowest_price)
print("Total trading volume in 2020-01:", total_volume)

# Let's add these data to a new data frame
tsla_monthly = pd.DataFrame(columns=['Open', 'High', 'Low', 'Close', 'Volume'])
# tsla_monthly # It is empty at this moment

# Let's add a row for 2020-01:
tsla_monthly.loc["2020-01", :] = [first_price, highest_price,
                            lowest_price, last_price, total_volume]

tsla_monthly

In [None]:
# Use groupby() to split the data frame into 12 groups
groups = tsla_2020.groupby(tsla_2020.index.month)

# for idx, group in groups:
#     print(idx)
#     print(group)
#     break

first_prices = groups['Open'].first()
last_prices = groups['Close'].last()
highest_prices = groups['High'].max()
lowest_prices = groups['Low'].min()
total_volumes = groups['Volume'].sum()

tsla_monthly = pd.DataFrame({
    'Open': first_prices,
    'High': highest_prices,
    'Low': lowest_prices,
    'Close': last_prices,
    'Volume': total_volumes
})

# Label each row with its month end ("M"; pandas 2.2+ prefers "ME")
tsla_monthly.set_index(pd.date_range("2020-01-01", periods=12, freq="M"),
                       inplace=True)

tsla_monthly

In [None]:
# Solution by ChatGPT:
import yfinance as yf

# Specify the stock symbol and year
symbol = "TSLA"
year = 2020

# Retrieve the daily stock data for Tesla in 2020
data = yf.download(symbol, start=f"{year}-01-01", end=f"{year}-12-31")

# Aggregate the dataset to monthly data
monthly_data = data.resample('M').agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'})

monthly_data.index = monthly_data.index.strftime('%Y-%m')

monthly_data
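The monthly aggregation can also be tried without a network connection. The sketch below builds synthetic daily data (the values are made up, not real prices) and groups by calendar month with `to_period("M")`, one option that avoids choosing between the `"M"` and `"ME"` resample aliases.

```python
import numpy as np
import pandas as pd

# Synthetic business-day data for the first quarter of 2020
days = pd.bdate_range("2020-01-01", "2020-03-31")
n = len(days)
daily = pd.DataFrame(
    {
        "Open": np.arange(n, dtype=float) + 100,
        "High": np.arange(n, dtype=float) + 102,
        "Low": np.arange(n, dtype=float) + 99,
        "Close": np.arange(n, dtype=float) + 101,
        "Volume": np.full(n, 10),
    },
    index=days,
)

# Group rows by calendar month and aggregate each column appropriately
monthly = daily.groupby(daily.index.to_period("M")).agg(
    {"Open": "first", "High": "max", "Low": "min",
     "Close": "last", "Volume": "sum"}
)
print(monthly)
```

Unlike grouping by `index.month`, grouping by period keeps the year in the label, so the same code would still work on data spanning several years.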

**Moving Average**

Daily prices are full of random ups and downs, which makes the long-term trend hard to see. It helps to look at the average over a longer window (10 days, 30 days, etc.). A moving average (or rolling average) computes, for each day, the mean price over the most recent window of days.

In [None]:
# Create 10-day moving average
moving_avg = tsla.rolling(10)["Close"].mean()

moving_avg.head(15)

In [None]:
# Plot daily prices and the moving average.

plt.plot(tsla.index, tsla['Close'], 'b-')
plt.plot(moving_avg.index, moving_avg, 'y--')
plt.legend(['Daily prices', '10-day average'])
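To see exactly what `rolling()` computes, here is a minimal sketch on five made-up prices with a 3-day window. It also shows the `min_periods` parameter, which controls how many observations are required before a value is produced.

```python
import pandas as pd

close = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.date_range("2023-01-02", periods=5, freq="B"),
)

# Default: the first window - 1 rows are NaN because the window is incomplete
ma3 = close.rolling(window=3).mean()

# min_periods=1 averages whatever is available at the start of the series
ma3_partial = close.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"Close": close, "MA3": ma3, "MA3 (partial)": ma3_partial}))
```

This explains why the 10-day moving average above starts with nine missing values.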

In [None]:
# Solution by ChatGPT
import yfinance as yf
import matplotlib.pyplot as plt

# Specify the stock symbol and year
symbol = 'TSLA'
year = 2023

# Retrieve the daily stock data for Tesla in 2023
data = yf.download(symbol, start=f"{year}-01-01", end=f"{year}-12-31")

# Calculate the 10-day moving average
data['MA10'] = data['Close'].rolling(window=10).mean()

# Plotting the daily prices and the 10-day moving average
plt.figure(figsize=(12, 6))
plt.plot(data.index, data['Close'], label='Daily Prices')
plt.plot(data.index, data['MA10'], label='10-day Moving Average')
plt.xlabel('Date')
plt.ylabel('Price')
plt.title('Tesla Stock Prices in 2023')
plt.legend()
plt.show()