# Time Series Analysis in Python

Welcome to your live training workspace! You can follow along as we go through an introduction to time series analysis in Python. To consult the solution, head over to the file browser and select `notebook-solution.ipynb`.


This first cell imports the main packages we will use and sets the visualization theme for our plots.

In [158]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from datetime import datetime

# Set colors
dc_colors = ["#2B3A64", "#96aae3", "#C3681D", "#EFBD95", "#E73F74", "#80BA5A", "#E68310", "#008695", "#CF1C90", "#f97b72", "#4b4b8f", "#A5AA99"]

# Set template
pio.templates["dc"] = go.layout.Template(
    layout=dict(
        font={"family": "Poppins, Sans-serif", "color": "#505050"},
        title={"font": {"family": "Poppins, Sans-serif", "color": "black"}, "yanchor": "top", "y": 0.92, "xanchor": "left", "x": 0.025},
        plot_bgcolor="white",
        paper_bgcolor="white",
        hoverlabel=dict(bgcolor="white"),
        margin=dict(l=100, r=50, t=75, b=70),
        colorway=dc_colors,
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=True,
                   gridwidth=0.1,
                   gridcolor='lightgrey',
                   showline=True,
                   nticks=10,
                   linewidth=1,
                   linecolor='black',
                   rangemode="tozero")
    )
)

## Loading and Inspecting the Data
The first thing we will do is use the [`yfinance`](https://pypi.org/project/yfinance/) package to download market data from the Yahoo! Finance API.

We will define the date range that we want to use, as well as the ticker we want to download.
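As a rough sketch of what this step could look like, the helper below wraps the `Ticker(...).history(...)` call from `yfinance`. The function name `load_prices` and the date range are assumptions made for illustration; they are not the values used in the solution notebook.

```python
from datetime import datetime


def load_prices(ticker, start, end):
    """Download daily price history for one ticker (requires network access)."""
    # Imported lazily so that defining the helper does not need yfinance installed
    import yfinance as yf

    return yf.Ticker(ticker).history(start=start, end=end)


# Placeholder date range (an assumption, not the workspace's actual range)
start_date = datetime(2020, 1, 1)
end_date = datetime(2021, 12, 31)

# GameStop's ticker symbol
ticker = "GME"
```

Calling `load_prices(ticker, start_date, end_date)` should return a DataFrame indexed by date with columns such as `Open`, `High`, `Low`, `Close`, and `Volume`.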

In [159]:
# Import yfinance


# Set the date range



# Set the ticker we want to use (GameStop)


# Get the data for the ticker GME


# Preview DataFrame


We can also use the `.describe()` method to get a sense of the data over the period.

In [160]:
# Get a numeric summary of the data


## Visualizing the data
Next, we can use a [Plotly line plot](https://plotly.com/python/line-charts/) to examine the data over time.

In [161]:
# Create a Plotly figure


# Show the plot


Let's add an annotation to make it clear when key events happened. We will cover [three key events](https://www.reuters.com/article/us-retail-trading-gamestop-timeline-idUSKBN2AI0IQ) in the timeline:
- The date that the new board was announced, and r/wallstreetbets began hyping the stock.
- The date when the trading app Robinhood restricted trading for GameStop (and some other stocks).
- A late-February surge fueled by more activity on r/wallstreetbets.

_Note: due to a bug with Plotly, we need to parse each date with [`strptime()`](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior) and convert the result to milliseconds since the Unix epoch for our annotations to work._
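The conversion in the note can be sketched with the standard library alone. The helper name `to_milliseconds` is hypothetical, and the event dates are illustrative values that should be checked against the linked timeline before use.

```python
from datetime import datetime

# Illustrative event dates (assumed; verify against the linked timeline)
events = {
    "New board announced": "2021-01-11",
    "Robinhood restricts trading": "2021-01-28",
    "Late-February surge": "2021-02-24",
}


def to_milliseconds(date_string):
    """Parse a YYYY-MM-DD string and return milliseconds since the Unix epoch."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    # timestamp() returns seconds (in local time for naive datetimes)
    return parsed.timestamp() * 1000


# Map each event label to its position on a Plotly date axis
event_ms = {label: to_milliseconds(date) for label, date in events.items()}
```

Each value in `event_ms` could then be passed as the `x` argument of `fig.add_vline()`, with the label as `annotation_text`.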

In [162]:
# Create a filtered DataFrame for early 2021


# Create a Plotly figure


# Define three key events


# Add these as lines


# Show the plot


Alternatively, we can use a [candlestick chart](https://plotly.com/python/candlestick-charts/) to get a good sense of price action.

In [163]:
# Define the candlestick data


# Create a candlestick figure   


# Show the plot


## Rolling averages

The data is quite noisy. We can use a window function to calculate the rolling mean over a fixed number of periods. In our case, we'll use a window covering the past 28 days of data.

This smooths out the line while still giving a value for each day.
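A minimal sketch of a rolling mean on synthetic data is shown here; the randomly generated prices and the column name `rolling_28` are stand-ins, not the GameStop data.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (random walk around 100), used only for illustration
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2021-01-01", periods=60, freq="D")
prices = pd.DataFrame(
    {"Close": 100 + rng.normal(0, 5, size=60).cumsum()}, index=dates
)

# Mean of the current row and the 27 rows before it
prices["rolling_28"] = prices["Close"].rolling(window=28).mean()
```

Note that an integer `window` counts rows, so with stock data it covers 28 trading days rather than 28 calendar days. The first 27 rows are `NaN` because the window is not yet full.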

In [164]:
# Calculate the 28 day rolling mean price


# Plot the rolling average


# Show the plot


## Comparing to a benchmark
It would be nice to be able to compare the performance of GameStop against a stock market index such as the S&P 500 (an index tracking the performance of 500 large US companies).

In [165]:
# Get the data for the ticker ^GSPC (S&P 500)


# Rename close columns


# Concatenate the data


# Preview the data


As you can see, the S&P 500 prices are on a very different scale from GameStop's. Let's normalize the prices so that both series start at 100. To do this, we will:
- Divide all prices by the first price in the series.
- Multiply them by 100.

All prices will then be relative to the starting point. This way, we can compare how large the change is between the two time series, regardless of their starting values.
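The two steps above can be sketched on a toy DataFrame. The column names `GME` and `SP500` and the prices themselves are made up for illustration.

```python
import pandas as pd

# Toy closing prices on very different scales (made-up values)
prices = pd.DataFrame(
    {"GME": [20.0, 30.0, 10.0], "SP500": [4000.0, 5000.0, 3000.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

# Select the first price of each series
first_prices = prices.iloc[0]

# Divide every row by the first prices, then scale so each series starts at 100
normalized = prices.div(first_prices).mul(100)
```

After normalization, a value of 150 means the price is 50% above its starting point, whichever series it belongs to.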

In [166]:
# Select first prices


# Create normalized_prices


# Normalized


We will [`.melt()`](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) the DataFrame to make it easier to plot the two time series.
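As a small illustration of the reshaping, the sketch below melts a toy wide DataFrame into long format. The column names (`Date`, `ticker`, `normalized_price`) are assumptions chosen for readability.

```python
import pandas as pd

# Wide format: one column per series (made-up values)
normalized = pd.DataFrame(
    {"GME": [100.0, 150.0], "SP500": [100.0, 125.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)
normalized.index.name = "Date"

# Long format: one row per (date, series) pair
melted = normalized.reset_index().melt(
    id_vars="Date", var_name="ticker", value_name="normalized_price"
)
```

The long format suits Plotly Express, where a single column (here `ticker`) can be mapped to the `color` argument to draw one line per series.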

In [167]:
# Melt the DataFrame to assist with plotting


# Preview the newly formatted data


In [168]:
# Create a plot of the melted data


# Show the plot


## Plotting the Autocorrelation Function

Autocorrelation is the correlation of a time series with a lagged version of itself. Plotting it can give you an idea of how lagged periods correlate to the present period.
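To build some intuition, the sketch below compares the lag-1 autocorrelation of pure noise with that of a random walk, using the pandas method `Series.autocorr()` on synthetic data rather than the GameStop prices.

```python
import numpy as np
import pandas as pd

# Synthetic series: white noise and its cumulative sum (a random walk)
rng = np.random.default_rng(seed=1)
noise = pd.Series(rng.normal(size=500))
random_walk = noise.cumsum()

# Correlation of each series with itself shifted by one period
lag1_noise = noise.autocorr(lag=1)
lag1_walk = random_walk.autocorr(lag=1)
```

The noise series should have a lag-1 autocorrelation near zero, while the random walk should be close to one, because each value is the previous value plus a small step.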

First, let's get some recent data from when GameStop seems to have stabilized.

In [169]:
# Get recent data for GME


# Preview the data


We will use the [`acf()`](https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.acf.html) function to generate the autocorrelation function for the most recent GameStop data.

In [170]:
# Import acf


# Calculate the acf array for the recent GameStop data


# Generate a scatter plot


# Fix the range and layout


# Show the plot


Before making forecasts, we need to fix the index.
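One possible way to do this is sketched below on made-up prices: convert the dates to daily periods, then reindex over the full range so that weekends are no longer missing. Forward-filling the gaps is an assumption made here, and the solution notebook may handle them differently.

```python
import pandas as pd

# Made-up closing prices with a weekend gap (March 6 and 7 are missing)
dates = pd.to_datetime(["2021-03-04", "2021-03-05", "2021-03-08", "2021-03-09"])
prices = pd.Series([10.0, 11.0, 12.0, 13.0], index=dates, name="Close")

# Convert the DatetimeIndex to a PeriodIndex with daily frequency
prices.index = prices.index.to_period("D")

# Build a gap-free daily index and carry the last known price forward
full_range = pd.period_range(prices.index.min(), prices.index.max(), freq="D")
filled = prices.reindex(full_range).ffill()
```

A regular index with an explicit frequency lets the model attach proper dates to its forecasts.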

In [171]:
# Set the index to the correct period


# Set a new date index to handle the gaps




## Making Simple Forecasts
Finally, we are going to fit a model to the GameStop data up until the first of February and make a forecast. We are going to use an AR(1) model. 

$\large R_t = \mu + \phi R_{t-1} + \epsilon_t$

An AR(1) model calculates the current value as a mean plus a fraction ($\phi$) of yesterday's value and some noise.
- If $\phi$ is 0, then the process is just noise.
- If $\phi$ is 1, then the process is a random walk.
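The effect of $\phi$ can be explored with a short simulation of the equation above. The helper names `simulate_ar1` and `lag1_correlation` are hypothetical, and the noise is assumed to be standard normal.

```python
import numpy as np


def simulate_ar1(mu, phi, n_steps, seed=0):
    """Simulate R_t = mu + phi * R_{t-1} + eps_t with standard normal noise."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=n_steps)
    values = np.empty(n_steps)
    values[0] = mu
    for t in range(1, n_steps):
        values[t] = mu + phi * values[t - 1] + noise[t]
    return values


def lag1_correlation(values):
    """Correlation between the series and itself shifted by one step."""
    return np.corrcoef(values[:-1], values[1:])[0, 1]


pure_noise = simulate_ar1(mu=0.0, phi=0.0, n_steps=1000)
persistent = simulate_ar1(mu=0.0, phi=0.9, n_steps=1000)
random_walk = simulate_ar1(mu=0.0, phi=1.0, n_steps=1000)
```

Comparing `lag1_correlation(pure_noise)` with `lag1_correlation(persistent)` shows how a larger $\phi$ makes each value depend more strongly on the previous one.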

In [172]:
# Import the ARIMA class


# Fit an AR(1) model to the data


# Print the model summary


## Comparing different models
We ran the model with one lagged parameter. But how does our model compare to one with a different order? We can use the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to compare models of different orders. Both balance goodness of fit against model complexity, and lower values indicate a better trade-off.

In [173]:
# Initialize an empty array


# Loop through a range of AR models and get the BIC

    
# Plot the BIC



It looks like the lowest BIC occurs at lag 1. We can now use the [`get_forecast`](https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.get_forecast.html) method to make estimates out of sample (i.e., past the range of our data).

In [182]:
# Get data up until a week ago


# Estimate an AR(1) model


# Create the forecasts as a DataFrame


# View the forecasts


## Bonus: Plot the forecast
Finally, we can create a Plotly chart to visualize the forecasts with the confidence intervals.

In [183]:
# Create a figure containing predicted, real, and CI values
fig = go.Figure([
    go.Scatter(
        name='True value',
        x=gme_recent.index.to_timestamp(),
        y=gme_recent["Close"],
        mode='lines'
    ),
    go.Scatter(
        name='Predicted value',
        x=preds.index.to_timestamp(),
        y=preds["mean"],
        mode='lines'
    ),
    go.Scatter(
        name='Upper',
        x=preds.index.to_timestamp(),
        y=preds["mean_ci_upper"],
        mode='lines',
        line=dict(color='lightblue', width=0)
    ),
    go.Scatter(
        name='Lower',
        x=preds.index.to_timestamp(),
        y=preds["mean_ci_lower"],
        mode='lines',
        line=dict(color='lightblue', width=0),
        fill="tonexty"
    ),
])

# Update the layout and show the plot
fig.update_layout(
    yaxis_title='Price',
    title='GameStop Price and Forecast over Time',
    showlegend=False,
    template="dc"
) 

fig.show()