# **S&P 500 vs. Hang Seng**


**1) Research Question**

I will use year-to-date data for the Hang Seng (the Hong Kong stock index) and the S&P 500 (the United States stock index), looking only at how the two compare on daily percentage change.
My null hypothesis is that the Hang Seng and the S&P 500 have the same mean daily percentage change; the alternative is that their means differ.

In [4]:
#imports to run code
import pandas as pd 
import pandas_datareader as web
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import os

**Using datetime to mark the start and end of the period covered by the data frames**

In [5]:
#starting and ending dates using only this year because the S&P 500 changed in January 2021
start = datetime(2021,1,2)
end = datetime.today() 
sp500 =[]
hang_seng =[]

In [6]:
# ticker symbols for the S&P 500 and the Hang Seng
stock1 = ["^GSPC"]
stock2 = ["^HSI"]

**2) Importing Data**

Importing year-to-date data on both indexes from Yahoo Finance.
This is the year-to-date data on the S&P 500.

In [8]:
#getting data from yahoo and creating dataframe
df_sp = web.DataReader(stock1,"yahoo",start,end)
df_sp = df_sp.reset_index()

RemoteDataError: No data fetched using 'YahooDailyReader'

In [None]:
#getting data from yahoo and creating dataframe
df_hs = web.DataReader(stock2, "yahoo",start,end)
df_hs = df_hs.reset_index()

**3) Cleaning Data**

I wrote each data frame to a CSV file and read it back to make the information more readable.
I dropped the uninformative first row and added a new top row that labels each column with the index name.
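
As an alternative to the CSV round trip, the columns can be flattened directly in pandas. The sketch below assumes the downloaded frame has two-level columns (price field, ticker symbol), which is what a list of tickers typically produces. The frame `raw`, its level names, and its values are made up for illustration and are not real index data.

```python
import pandas as pd

# Hypothetical stand-in for a downloaded frame (made-up values, not real prices).
# Assumption: columns have two levels, the price field and the ticker symbol.
dates = pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"])
columns = pd.MultiIndex.from_product(
    [["Open", "Close"], ["^GSPC"]], names=["Attributes", "Symbols"]
)
raw = pd.DataFrame(
    [[100.0, 101.0], [101.0, 100.5], [100.5, 102.0]],
    index=pd.Index(dates, name="Date"),
    columns=columns,
)

# Drop the ticker level so each column is named by its price field only,
# then move the dates from the index into an ordinary "Date" column.
flat = raw.droplevel("Symbols", axis=1).reset_index()
flat.columns.name = None

print(flat)
```

This keeps the numeric columns numeric, whereas a label row inserted at the top of a frame turns every column into mixed text and numbers.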

In [None]:
#writing the data frame to a csv file
df_sp.to_csv('sp500.csv', index=False)

In [None]:
#cleaning data to make more readable
pc_sp = pd.read_csv('sp500.csv')
pc_sp = pc_sp.drop(pc_sp.index[0])
top_row = pd.DataFrame({'Date':[''],'Adj Close':['S&P 500'],'Close':['S&P 500'],'High':['S&P 500'],'Low':['S&P 500'],'Open':['S&P 500'],'Volume':['S&P 500']})
pc_sp = pd.concat([top_row, pc_sp]).reset_index(drop = True)
pc_sp

In [None]:
#writing the data frame to a csv file
df_hs.to_csv('hang_seng.csv', index=False)


In [None]:
#cleaning data to make more readable
pc_hs = pd.read_csv('hang_seng.csv')
pc_hs = pc_hs.drop(pc_hs.index[0])
top_row = pd.DataFrame({'Date':[''],'Adj Close':['Hang Seng'],'Close':['Hang Seng'],'High':['Hang Seng'],'Low':['Hang Seng'],'Open':['Hang Seng'],'Volume':['Hang Seng']})
pc_hs = pd.concat([top_row, pc_hs]).reset_index(drop = True)
pc_hs

**4) Data Visualization**

Scatter plots of the year-to-date data for the S&P 500 and the Hang Seng. They simply show every data point charted by date and price.

In [None]:
#sp500["Adj Close"] price YTD
plt.figure(figsize=(11,4))

y_sp= df_sp["Adj Close"]
x_sp = df_sp["Date"]
x = plt.gca().xaxis

# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)


plt.title('S&P 500', size = 40)
plt.xlabel('Date', size = 20)
plt.ylabel('Price', size = 20)
plt.scatter(x_sp, y_sp, c='blue')

In [None]:
#hang seng["Adj Close"] price YTD
plt.figure(figsize=(11,4))

y_hs=df_hs["Adj Close"]
x_hs=df_hs["Date"]
x = plt.gca().xaxis

# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)

plt.title('Hang Seng', size = 40)
plt.xlabel('Date', size = 20)
plt.ylabel('Price', size = 20)
plt.scatter(x_hs, y_hs, c='red')

**4) Data Visualization**

Line plots of the year-to-date data for the S&P 500 and the Hang Seng, showing the price trend of each index.

In [None]:
#simple line chart of the S&P 500 price action
plt.figure(figsize= (12,2.8))

plt.plot(x_sp, y_sp, label= "S&P 500 Index",c='blue', linestyle="-.")

x = plt.gca().xaxis

# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)
plt.legend(['S&P500'], prop={'size': 20})
plt.xlabel('Date', size = 18)
plt.ylabel('Price', size = 18)
plt.show()
#simple line charting the Hang Seng price action
plt.figure(figsize= (12,2.8))
plt.plot(x_hs, y_hs, label= "Hang Seng Index",c='red', linestyle=":")

x = plt.gca().xaxis

# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)
plt.legend(['Hang Seng'], prop={'size': 20})
plt.xlabel('Date', size = 18)
plt.ylabel('Price', size = 18)
plt.show()

**5) A Model**

Plotting the daily percentage change that will be used in our t-test. The plot gives a general idea of why we will get the p-value we receive.

In [None]:
#percentage change to give an even metric for t-test 
plt.figure(figsize=(14,6))
a = ((df_hs["Close"]- df_hs["Open"])/(df_hs["Open"]))*100
b = ((df_sp["Close"]- df_sp["Open"])/(df_sp["Open"]))*100
plt.xlabel('Date', size =20)
plt.ylabel('Daily Percentage change YTD', size = 15)

x = plt.gca().xaxis

# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)


plt.plot(x_hs, a, c='red', linestyle=":")
plt.plot(x_sp, b, c='blue', linestyle="-.")
#legend to show which represents which data
plt.legend(['Hang Seng','S&P 500'], prop={'size': 20})
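
The percentage used above is measured from each day's open to that day's close. A minimal sketch of that calculation on made-up prices follows, next to the close-to-close change that is another common definition of a daily move. The frame `prices` and the helper `open_to_close_pct` are hypothetical names used only for illustration.

```python
import pandas as pd

# Made-up prices for illustration only; not real index data.
prices = pd.DataFrame(
    {"Open": [100.0, 200.0, 50.0], "Close": [102.0, 199.0, 50.0]}
)


def open_to_close_pct(frame):
    """Percentage move from each day's open to that same day's close."""
    return (frame["Close"] - frame["Open"]) / frame["Open"] * 100


intraday = open_to_close_pct(prices)

# Close-to-close alternative: change relative to the previous day's close.
# The first value is NaN because there is no earlier close to compare with.
close_to_close = prices["Close"].pct_change() * 100

print(pd.DataFrame({"open_to_close": intraday, "close_to_close": close_to_close}))
```

The two definitions can give different numbers on the same day, so whichever one is chosen should be applied to both indexes.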

**5) A Model**

Running a paired t-test to get our p-value. We use the paired t-test because the two samples are related. If the p-value is greater than .05, we fail to reject the null hypothesis that the mean daily percentage change year to date is the same for the S&P 500 and the Hang Seng. If the p-value is less than .05, we reject the null hypothesis and conclude that the two means differ by a statistically significant amount.
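
For reference, the quantities behind a paired t-test can be written out in the standard textbook form. Here $a_i$ and $b_i$ denote the daily percentage changes of the Hang Seng and the S&P 500 for the $i$-th pair and $n$ is the number of pairs. The symbols $d_i$, $\bar{d}$, and $s_d$ are notation introduced here for illustration.

$$
d_i = a_i - b_i, \qquad
H_0: \mu_d = 0, \qquad
H_1: \mu_d \neq 0, \qquad
t = \frac{\bar{d}}{s_d / \sqrt{n}}
$$

$\bar{d}$ is the sample mean of the differences and $s_d$ is their sample standard deviation. Under $H_0$ the statistic follows a t-distribution with $n - 1$ degrees of freedom.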

In [None]:
tStat, pValue =  stats.ttest_rel(a, b)
print("P-Value:{0} T-Statistic:{1}".format(pValue,tStat))
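
A paired test needs each Hang Seng value matched with the S&P 500 value for the same day. Hong Kong and US markets observe different holidays, so the two series may differ in length or line up by position rather than by date, and `stats.ttest_rel` requires inputs of equal length. The sketch below joins on `Date` before testing. It is one possible approach, not necessarily the one used here; the frames `hs` and `sp`, their column names, and their values are made up for illustration.

```python
import pandas as pd
from scipy import stats

# Made-up daily percentage changes for illustration only; not real index data.
# The calendars differ: each index has one trading day the other lacks.
hs = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07"]),
    "pct_hs": [0.5, -0.2, 0.1, 0.3],
})
sp = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-05", "2021-01-06", "2021-01-07", "2021-01-08"]),
    "pct_sp": [0.4, -0.1, 0.2, 0.6],
})

# An inner join keeps only the dates on which both markets traded.
paired = hs.merge(sp, on="Date", how="inner")

# Paired t-test on the date-aligned observations.
t_stat, p_value = stats.ttest_rel(paired["pct_hs"], paired["pct_sp"])
print("Paired days:{0} P-Value:{1} T-Statistic:{2}".format(len(paired), p_value, t_stat))
```

Even after the join, the two markets trade at different hours, so a shared calendar date does not mean the sessions overlapped.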


# **Conclusion**

Based on our p-value, we fail to reject our null hypothesis.
The p-value does not show a statistically significant difference between the mean daily percentage changes of the two samples. Working with the mean daily percentage change lets us assess both samples on an equal footing. Our next step would be to extend the time horizon and run the same t-test again to see whether this result holds only for the year-to-date samples. If a longer history shows the same statistical relationship, it would merit further inspection.
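
If the follow-up question is whether the two indexes move together, a correlation coefficient addresses that more directly than a t-test on means. A minimal sketch using `scipy.stats.pearsonr` follows. The lists `hs_pct` and `sp_pct` are made-up, already date-aligned values used only for illustration.

```python
from scipy import stats

# Made-up, date-aligned daily percentage changes; not real index data.
hs_pct = [0.5, -0.2, 0.1, 0.3, -0.4]
sp_pct = [0.4, -0.1, 0.2, 0.6, -0.3]

# Pearson correlation coefficient and the p-value for the null of no correlation.
r, corr_p_value = stats.pearsonr(hs_pct, sp_pct)
print("Correlation:{0} P-Value:{1}".format(r, corr_p_value))
```

The t-test and the correlation answer different questions: one compares average moves, the other measures how closely the daily moves track each other.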