# Predicting Data Using Time-Series Correlation

In this activity, you will practice creating an hvPlot dual-axis plot to juxtapose time-series data. You'll then use this and other data to analyze lead-lag relationships using time-series correlation.
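
As a rough illustration of what a lead-lag relationship looks like in code, the sketch below builds two synthetic series (not the SPY data) in which `leader` drives `follower` one step later. It then compares the same-time correlation with the correlation after shifting `leader` by one step. The series names, noise level, and random seed are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration, not the SPY data):
# follower at time t equals leader at time t-1 plus a little noise.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=500, freq=pd.Timedelta(hours=1))
leader = pd.Series(rng.normal(size=500), index=index)
follower = leader.shift(1) + 0.1 * pd.Series(rng.normal(size=500), index=index)

# Same-time correlation: leader[t] vs. follower[t]
same_time_corr = leader.corr(follower)

# Lagged correlation: leader[t-1] vs. follower[t]
lagged_corr = leader.shift(1).corr(follower)

print(f"same-time: {same_time_corr:.3f}, lagged: {lagged_corr:.3f}")
```

If the lagged correlation is much stronger than the same-time one, the first series is said to lead the second. The rest of the activity asks the same question of trading volume, volatility, and returns.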

### Import Libraries and Dependencies

In [11]:
# Import necessary libraries and dependencies
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
from pathlib import Path
import hvplot.pandas
%matplotlib inline


## 1. Read in the S&P 500 stock volume and price data.

### Read in Files

In [24]:
# Read in data and index by date
file_path = Path("../Resources/spy_stock_volume.csv")
spy_df = pd.read_csv(file_path, index_col="Date", parse_dates=True)
spy_df.sort_index(inplace=True)

# Preview the first five rows
spy_df.head()


Date,close,volume
2020-03-12 08:00:00,258.6,229683
2020-03-12 09:00:00,257.76,457488
2020-03-12 10:00:00,252.81,291881
2020-03-12 11:00:00,259.99,353484
2020-03-12 12:00:00,257.12,520699
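
To see what `index_col`, `parse_dates`, and `sort_index` contribute without needing the file on disk, here is a minimal sketch that reads a few of the rows previewed above from an in-memory string. The header line `Date,close,volume` is an assumption about the CSV layout, and the rows are deliberately shuffled so the effect of sorting is visible.

```python
import io
import pandas as pd

# Rows copied from the preview above, deliberately out of order.
# The header line "Date,close,volume" is an assumption about the CSV layout.
csv_text = """Date,close,volume
2020-03-12 10:00:00,252.81,291881
2020-03-12 08:00:00,258.6,229683
2020-03-12 09:00:00,257.76,457488
"""

# index_col + parse_dates turn the Date column into a DatetimeIndex
sample_df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Sorting the index puts the rows in chronological order
sample_df = sample_df.sort_index()

print(type(sample_df.index).__name__)
print(sample_df.index.is_monotonic_increasing)
```

A sorted `DatetimeIndex` is what makes the date-string slicing and the `shift`/`rolling` steps later in the activity behave as expected.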


## 2. Plot S&P 500's performance over time using an hvPlot line plot.

In [25]:
# Use hvPlot to visualize the closing price of the S&P 500 over time
spy_df['close'].hvplot.line(xlabel="Date", ylabel="Closing Price", title="S&P 500 Closing Price")


## 3. Based on this plot, slice the data to the period in which the market seems to have suffered a big decline.

* This is meant to be a little subjective; pick the period you think is the most volatile or downward-trending.

In [26]:
# Slice to a volatile/downward trend period
df_slice = spy_df.loc["2020-07-01":"2020-07-31"]


# Preview the result (your results may vary depending on period selected)
df_slice.head()



Date,close,volume
2020-07-01 08:00:00,309.81,75130
2020-07-01 09:00:00,310.6,92639
2020-07-01 10:00:00,310.26,62858
2020-07-01 11:00:00,310.78,62496
2020-07-01 12:00:00,310.74,37215
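
The slice above relies on pandas' date-string indexing. As a small sketch on a hypothetical hourly frame (the name `demo_df` and its values are made up for the example), the code below shows that a day-level end label such as `"2020-07-31"` includes every hour of that day, and that a month string selects the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame spanning late June to early August 2020
index = pd.date_range("2020-06-30", "2020-08-01 23:00", freq=pd.Timedelta(hours=1))
demo_df = pd.DataFrame({"close": np.arange(len(index), dtype=float)}, index=index)

# Explicit start and end dates; both endpoints are inclusive
by_range = demo_df.loc["2020-07-01":"2020-07-31"]

# Partial-string shorthand for the whole month
by_month = demo_df.loc["2020-07"]

print(by_range.index.min(), by_range.index.max())
print(by_range.index.equals(by_month.index))
```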


## 4. Using this downward sub-period, use hvPlot to create two graphs stacked one on top of the other.

* Specifically, plot the hourly close price and hourly volume of shares traded.
* Looking at this visual, does it appear that there is any relationship between volume and close?

In [27]:
# Use hvPlot to visualize the close and volume data for the sliced period
# subplots=True draws each column on its own axes; .cols(1) stacks the plots vertically
df_slice[['close', 'volume']].hvplot(shared_axes=False, subplots=True).cols(1)


## 5. Create a column called Lagged Volume, which is the volume column shifted by one hour, so each row holds the previous hour's volume.

In [30]:
# Create a new volume column 
# This column should shift the volume back by one hour
spy_df['volume_shift'] = spy_df['volume'].shift(1)
spy_df.head()



Date,close,volume,volume_shift,Stock Volatility
2020-03-12 08:00:00,258.6,229683,,
2020-03-12 09:00:00,257.76,457488,229683.0,
2020-03-12 10:00:00,252.81,291881,457488.0,
2020-03-12 11:00:00,259.99,353484,291881.0,
2020-03-12 12:00:00,257.12,520699,353484.0,0.020827
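
The direction of `shift` is easy to get backwards, so here is a minimal sketch using the five volumes previewed above. The variable names `lagged` and `leading` are illustrative only.

```python
import pandas as pd

# First five hourly volumes from the preview above
index = pd.date_range("2020-03-12 08:00", periods=5, freq=pd.Timedelta(hours=1))
volume = pd.Series([229683, 457488, 291881, 353484, 520699], index=index)

lagged = volume.shift(1)    # row t holds the volume from hour t-1
leading = volume.shift(-1)  # row t holds the volume from hour t+1 (look-ahead)

print(pd.DataFrame({"volume": volume, "shift(1)": lagged, "shift(-1)": leading}))
```

For a predictive question, `shift(1)` is the safe choice: each row only sees information that was already available in the previous hour, and the first row is `NaN` because it has no predecessor.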


## 6. Create another column called Stock Volatility, which is the rolling standard deviation of SPY's stock price returns.

* Consider using a 4-hour rolling window, or experiment with your own horizon to see if it impacts predictability.

In [31]:
# Create a new variable called Stock Volatility
# This column should calculate the standard deviation of the closing stock price return data over a 4 period rolling window

spy_df['Stock Volatility'] = spy_df['close'].pct_change().rolling(window=4).std()
spy_df.head()


Date,close,volume,volume_shift,Stock Volatility
2020-03-12 08:00:00,258.6,229683,,
2020-03-12 09:00:00,257.76,457488,229683.0,
2020-03-12 10:00:00,252.81,291881,457488.0,
2020-03-12 11:00:00,259.99,353484,291881.0,
2020-03-12 12:00:00,257.12,520699,353484.0,0.020827
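
To make the rolling calculation less of a black box, the sketch below recomputes the one non-missing value in the preview above from the five closing prices shown. It assumes pandas' default sample standard deviation (`ddof=1`), and the name `manual_vol` is illustrative.

```python
import numpy as np
import pandas as pd

# First five hourly closes from the preview above
close = pd.Series([258.6, 257.76, 252.81, 259.99, 257.12])

returns = close.pct_change()
rolling_vol = returns.rolling(window=4).std()

# Recompute the last window by hand: sample standard deviation (ddof=1)
# of the four most recent returns
manual_vol = np.std(returns.iloc[1:5].to_numpy(), ddof=1)

print(round(rolling_vol.iloc[-1], 6), round(manual_vol, 6))
```

Both numbers should agree with the 0.020827 in the table. The first four rows are `NaN` because a 4-period window over returns needs five prices before it is full.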


In [32]:
# Use hvPlot to visualize the stock volatility

spy_df['Stock Volatility'].hvplot(xlabel="Date", ylabel="Stock Volatility", title="S&P 500 Stock Volatility")


## 7. Construct a column called Hourly Stock Return, which is the percentage return on the S&P at each hour.

In [33]:
# Create a new column called Hourly Stock Return.
# This column should calculate hourly return percentage of the closing price

spy_df['Hourly Stock Return'] = spy_df['close'].pct_change()
spy_df.head()


Date,close,volume,volume_shift,Stock Volatility,Hourly Stock Return
2020-03-12 08:00:00,258.6,229683,,,
2020-03-12 09:00:00,257.76,457488,229683.0,,-0.003248
2020-03-12 10:00:00,252.81,291881,457488.0,,-0.019204
2020-03-12 11:00:00,259.99,353484,291881.0,,0.028401
2020-03-12 12:00:00,257.12,520699,353484.0,0.020827,-0.011039
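
As a quick check on what `pct_change()` computes, this sketch compares it with the simple-return formula `close[t] / close[t-1] - 1` on the five closes previewed above. The names `via_method` and `via_formula` are illustrative.

```python
import pandas as pd

# First five hourly closes from the preview above
close = pd.Series([258.6, 257.76, 252.81, 259.99, 257.12])

# pct_change() is the simple return: close[t] / close[t-1] - 1
via_method = close.pct_change()
via_formula = close / close.shift(1) - 1

print(via_method.round(6).tolist())
```

The values are fractions rather than percentages, so -0.003248 means a drop of roughly 0.32% over the hour.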


## 8. Using these three columns, construct a correlation table, and answer the following questions:

* Does this hour's trading volume predict the next hour's market volatility?
* Does this hour's trading volume predict the next hour's market return?



In [35]:
# Slice to the same volatile period, now including the new columns
downward_period = spy_df.loc["2020-07-01":"2020-07-31"]

# Construct correlation table of Stock Volatility, Lagged Volume, and Hourly Stock Return
# display() is used because only the last expression in a cell is shown automatically
display(downward_period[['Stock Volatility', 'volume_shift', 'Hourly Stock Return']].corr())

# Construct a scatter plot of Hourly Stock Return vs. Lagged Volume
downward_period.hvplot.scatter(x='Hourly Stock Return', y='volume_shift', xlabel='Hourly Stock Return', ylabel='Lagged Volume', title='Hourly Stock Return vs. Lagged Volume')
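
The shifted and rolling columns start with missing values, which raises the question of whether rows must be dropped before calling `corr()`. The sketch below uses synthetic prices and volumes (assumptions for illustration, not the SPY data) to show that `DataFrame.corr()` already excludes missing values pair by pair.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices and volumes (assumptions for illustration only)
rng = np.random.default_rng(1)
n = 200
demo = pd.DataFrame({
    "close": 300 + rng.normal(scale=0.5, size=n).cumsum(),
    "volume": rng.integers(50_000, 500_000, size=n),
})
demo["volume_shift"] = demo["volume"].shift(1)
demo["Hourly Stock Return"] = demo["close"].pct_change()
demo["Stock Volatility"] = demo["Hourly Stock Return"].rolling(window=4).std()

cols = ["Stock Volatility", "volume_shift", "Hourly Stock Return"]
corr_table = demo[cols].corr()

# Same number computed by hand after dropping incomplete rows for one pair
pair = demo[["volume_shift", "Hourly Stock Return"]].dropna()
manual = pair["volume_shift"].corr(pair["Hourly Stock Return"])

print(corr_table.round(3))
```

To answer the two questions, read along the `volume_shift` row. A value near zero suggests the previous hour's volume carries little linear information about that column, while a value further from zero suggests some predictive relationship.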
