# S&P 500 Data Project

## Project overview
This project retrieves monthly price data for all stocks in the S&P 500 index using the Alpha Vantage API. We then select a handful of big tech stocks and present their prices in an interactive figure. Lastly, we identify stocks exhibiting short-term momentum.

**Note:** 
Please don't run the cell that calls the API, since it takes about 2 hours to run. We have already completed that step and saved the resulting dataset locally.

## Imports and set magics:

In [None]:
# a. Import

import numpy as np
import pandas as pd
from alpha_vantage.timeseries import TimeSeries # Enables the use of Alpha Vantage stock API
import time
import matplotlib.pyplot as plt
import bs4 as bs
import pickle
import plotly.graph_objects as go
import requests

# b. Autoreload

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# local modules
# import dataproject

# Call, read, clean and present data

## Getting all S&P500 tickers

**Read all S&P500 tickers** from ``http://en.wikipedia.org/wiki/List_of_S%26P_500_companies`` and **save them** as a variable:

In [None]:
# a. Read

def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
        
    with open("sp500tickers.pickle","wb") as f:
        pickle.dump(tickers,f)
        
    return tickers

# b. Save

tickers = save_sp500_tickers()
tickers = [x.strip() for x in tickers] # Remove the trailing newline from each ticker

# c. Replace ticker names that would later produce errors:

tickers = [w.replace('BF.B', 'BF-B') for w in tickers]
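
The scraping logic above can be tried without a network connection. The following is a minimal sketch, assuming a small hand-written HTML snippet (`SAMPLE_HTML`, hypothetical) that mimics the layout of the Wikipedia table, and using Python's built-in `html.parser` instead of `lxml`. The helper name `parse_tickers` is our own and not part of the notebook.

```python
import bs4 as bs

# Hypothetical stand-in for the Wikipedia page: same table class, two rows.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM
</td><td>3M</td></tr>
  <tr><td>BF.B
</td><td>Brown-Forman</td></tr>
</table>
"""

def parse_tickers(html):
    """Return the first-column text of every data row, cleaned up."""
    soup = bs.BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable sortable"})
    found = []
    for row in table.find_all("tr")[1:]:      # skip the header row
        cells = row.find_all("td")
        if not cells:                         # ignore rows without data cells
            continue
        ticker = cells[0].text.strip()        # drop surrounding whitespace/newlines
        found.append(ticker.replace("BF.B", "BF-B"))
    return found

sample_tickers = parse_tickers(SAMPLE_HTML)
print(sample_tickers)
```

Stripping whitespace inside the parser means the result does not depend on whether each cell happens to end with a newline.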

The first 5 ticker names now look like this:

In [None]:
print(tickers[0:5])

## Call API to get all S&P500 data

**Call Alpha Vantage API to get all S&P500 monthly stock prices** and **save them** as a dataset:

In [None]:
api_key = 'IHXRIDZAXOXUHF8T' # PERSONAL AlphaVantage API key. Please, don't abuse/redistribute.

ETFs = np.array(tickers) # All S&P 500 tickers used.

ts = TimeSeries(key = api_key, output_format = 'pandas', indexing_type='date') # From alpha_vantage.
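
A possible alternative to hard-coding the key is to read it from an environment variable. This is only a sketch: the variable name `ALPHAVANTAGE_API_KEY` and the helper `load_api_key` are assumptions made for illustration, and the value set below is a placeholder, not a real key.

```python
import os

def load_api_key(env_var="ALPHAVANTAGE_API_KEY"):
    """Read the API key from the environment (variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# For demonstration only: normally the variable is set outside the notebook.
os.environ["ALPHAVANTAGE_API_KEY"] = "placeholder-key"
demo_key = load_api_key()
print(demo_key)
```

The returned string could then be passed as `key` to `TimeSeries` in the same way as `api_key` above.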

**DON'T RUN THE FOLLOWING CELL**:

In [None]:
price_data = pd.DataFrame()

# Loop through requested stocks.

print('Querying Securities, Estimated time: ' + str(round(len(ETFs)/5)) + ' minutes') # Message for estimated time left.
for x in range(len(ETFs)):
    print(str(ETFs[x]))

    if (x + 1) % 5 == 0:
        time.sleep(60) # The free version of the API is limited to 5 calls per minute.

    data, meta_data = ts.get_monthly_adjusted(symbol=str(ETFs[x])) 
    data = data['5. adjusted close'].iloc[::-1]             
    data = pd.DataFrame(data).rename(index=str, columns={'5. adjusted close' : str(ETFs[x])})
    price_data = pd.concat([price_data,data], axis=1, sort=False)
    
price_data = price_data.iloc[:300].iloc[::-1]

plt.plot(price_data)
plt.xticks([])
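
The rate-limit handling in the loop above (5 calls, then a 60-second pause) can be separated from the download logic. The sketch below assumes a hypothetical generator `throttled`; the `sleep` argument is injectable so that the behaviour can be checked without actually waiting.

```python
import time

def throttled(symbols, calls_per_batch=5, pause_seconds=60, sleep=time.sleep):
    """Yield symbols one by one, pausing after every full batch of calls."""
    for position, symbol in enumerate(symbols, start=1):
        yield symbol
        # Pause only between batches, not after the very last symbol.
        if position % calls_per_batch == 0 and position < len(symbols):
            sleep(pause_seconds)

# Dry run with a fake sleep that just records the requested pauses.
pauses = []
seen = list(throttled(["A", "B", "C", "D", "E", "F", "G"], sleep=pauses.append))
print(seen, pauses)
```

In the notebook, the body of the loop would stay the same, for example `for symbol in throttled(list(ETFs)): data, meta_data = ts.get_monthly_adjusted(symbol=symbol)`.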

Create a **local copy** for future reference, so that the API call, which takes hours to run, does not have to be repeated:

In [None]:
localdata = price_data.copy()

localdata.to_csv(r'C:\Users\07anl_000\Desktop\Polit - KU\Kandidat\3. semester\Seminar Anv. Corp Fin\localdata.csv')

**Loading dataset from local path**:

In [None]:
df = pd.read_csv(r'C:\Users\07anl_000\Desktop\Polit - KU\Kandidat\3. semester\Seminar Anv. Corp Fin\localdata.csv', delimiter = ",", index_col = 'Unnamed: 0')
df = df.reindex(index = df.index[::-1]) # Changing the order of rows.
print(df.head())
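
The column name `'Unnamed: 0'` appears because the index had no name when it was written to CSV. As a minimal sketch of the save/load round trip, the example below uses made-up prices and an in-memory buffer instead of a file on disk; `index_col=0` with `parse_dates=True` is one way to restore the dates as the index directly.

```python
import io
import pandas as pd

# Made-up prices for two hypothetical tickers.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.5], "BBB": [20.0, 19.0, 20.5]},
    index=pd.to_datetime(["2019-09-30", "2019-10-31", "2019-11-29"]),
)

buffer = io.StringIO()          # stands in for the CSV file
prices.to_csv(buffer)
buffer.seek(0)

# The first column holds the unnamed index; parse it back into dates.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(restored)
```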

**Subsetting** a couple of big tech company stocks to focus on in the analysis:

In [None]:
df_big = df[['MSFT', 'AMZN', 'AAPL', 'GOOG', 'FB']].copy() # Choosing certain big tech stocks for the interactive figure; .copy() avoids modifying a view of df.

df_big['Date'] = df.index

df_big['Date'] = pd.to_datetime(df_big.Date)

df_big['Date'] = df_big['Date'].dt.strftime('%m/%Y')

df_big = df_big.drop(df.index[0 : 169]) # Dropping the first 169 rows so the graph only shows prices from the most recent years.

print(df_big.head())

## Present stock data

**Creating** an **interactive figure** to show the development of stock prices:

In [None]:
# a. Initialize figure

fig = go.Figure()

# b. Add scatters for each big tech ticker

fig.add_trace(
    go.Scatter(x = list(df_big.Date), y = list(df_big.MSFT) , name = 'Microsoft'))

fig.add_trace(
    go.Scatter(x = list(df_big.Date), y = list(df_big.AMZN) , name = 'Amazon'))

fig.add_trace(
    go.Scatter(x = list(df_big.Date), y = list(df_big.AAPL) , name = 'Apple'))

fig.add_trace(
    go.Scatter(x = list(df_big.Date), y = list(df_big.FB) , name ='Facebook'))

fig.add_trace(
    go.Scatter(x = list(df_big.Date), y = list(df_big.GOOG) , name = 'Google'))

# c. Add dropdown menu ("None" applies no filter and shows all stocks)

fig.update_layout(
    updatemenus=[
        dict(
            active=0,
            buttons=list([
                dict(label="None",
                     method="update",
                     args=[{"visible": [True, True, True, True, True]}]),
                dict(label="Microsoft",
                     method="update",
                     args=[{"visible": [True, False, False, False, False]}]),
                dict(label="Amazon",
                     method="update",
                     args=[{"visible": [False, True, False, False, False]}]),
                dict(label="Apple",
                     method="update",
                     args=[{"visible": [False, False, True, False, False]}]),
                dict(label="Facebook",
                     method="update",
                     args=[{"visible": [False, False, False, True, False]}]),
                dict(label="Google",
                     method="update",
                     args=[{"visible": [False, False, False, False, True]}])
            ]),
        )
    ])

# d. Add creative title and display

fig.update_layout(title_text="STONKS" , yaxis_title = 'Adj. Close')


fig.show()
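
The dropdown buttons above follow a regular pattern: one button that shows every trace, then one button per trace. As a sketch, they could be generated by a small helper. The function `visibility_buttons` is hypothetical; it only builds plain dictionaries and assumes the labels are given in the same order as the traces were added.

```python
def visibility_buttons(labels, all_label="None"):
    """Build one 'show all' button plus one button per trace label."""
    buttons = [dict(label=all_label,
                    method="update",
                    args=[{"visible": [True] * len(labels)}])]
    for position, label in enumerate(labels):
        # Only the trace at this position stays visible.
        mask = [i == position for i in range(len(labels))]
        buttons.append(dict(label=label, method="update", args=[{"visible": mask}]))
    return buttons

company_names = ["Microsoft", "Amazon", "Apple", "Facebook", "Google"]
buttons = visibility_buttons(company_names)
print(buttons[2])
```

The resulting list could be passed as `buttons` inside `updatemenus` in the `fig.update_layout` call.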

# Analyze the data for momentum signals

## Calculating returns and identifying momentum stocks

In [None]:
returns_df = df.copy() # Copy of dataset, to make return calculations

returns_df = returns_df.apply(lambda x: x.shift(-1)/x - 1, axis = 0) # Monthly returns 

nreturns_df = returns_df.shift(1, axis = 0) # Shifts the rows down by 1, so each row holds the return over the month ending on that date
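
With rows sorted from oldest to newest, the two steps above amount to the ordinary simple return, price at t divided by price at t-1, minus 1. The sketch below uses made-up prices for a hypothetical ticker `AAA` to illustrate this, and compares the result with pandas' built-in `pct_change`.

```python
import pandas as pd

# Made-up month-end prices, oldest first.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0, 108.9]},
    index=pd.to_datetime(["2019-08-30", "2019-09-30", "2019-10-31", "2019-11-29"]),
)

# Same two steps as in the notebook: forward-looking ratio, then shift down by one row.
manual = prices.apply(lambda x: x.shift(-1) / x - 1, axis=0).shift(1, axis=0)

# Built-in equivalent: percentage change relative to the previous row.
builtin = prices.pct_change()

print(pd.concat([manual, builtin], axis=1, keys=["manual", "pct_change"]))
```

The first row is missing in both versions, because there is no earlier price to compare with.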

**Accumulating returns to define winners and losers**:

In [None]:
creturns_df = nreturns_df.copy()

creturns_df = creturns_df.apply(lambda x : x + x.shift(1) + x.shift(2) + x.shift(3) + x.shift(4) + x.shift(5), axis = 0) 
# Sum of the monthly returns over the previous 6 months

creturns_df = creturns_df.shift(1, axis = 0) # Offsets by one month, so the signal on a given date only uses earlier returns

print(creturns_df.iloc[0:10,:])
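
The accumulation above is a simple sum of six monthly returns (not a compounded return), lagged by one month. A sketch with made-up returns shows that a rolling window should give the same numbers; the variable names are our own.

```python
import pandas as pd

# Made-up monthly returns, oldest first.
returns = pd.Series([0.01, 0.02, -0.01, 0.03, 0.00, 0.02, 0.01, -0.02])

# Notebook approach: add the current and the five previous returns, then lag by one month.
manual = (returns + returns.shift(1) + returns.shift(2)
          + returns.shift(3) + returns.shift(4) + returns.shift(5)).shift(1)

# Rolling-window equivalent.
rolling = returns.rolling(window=6).sum().shift(1)

print(pd.DataFrame({"manual": manual, "rolling": rolling}))
```

The first six entries are missing, since a full six-month history plus the one-month lag is needed before the first signal is available.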

**Defining winners and losers based on distribution**

Stocks whose accumulated 6-month return lies above the 70th percentile (the top 30 percent) show a strong positive momentum signal and should therefore be bought.

Likewise, stocks below the 30th percentile (the bottom 30 percent) show a strong negative momentum signal and should therefore be shorted.

In [None]:
dreturns_df = creturns_df.copy()

dreturns_df = dreturns_df.transpose() # Transposes the dataset, so .describe() summarizes each date across all stocks

perc = [0.3, 0.7] # Defining lower and upper bounds

dreturns_df.describe(percentiles = perc)

In [None]:
# Creating list of percentiles on different dates

percentiles = pd.DataFrame()

percentiles['lower bound'] = dreturns_df.quantile(0.3)
percentiles['upper bound'] = dreturns_df.quantile(0.7)

print(percentiles.iloc[[-4], [0,1]]) # Bounds for the fourth-to-last date

# Adding bounds to the dataframe to quickly compare whether stocks are winners or losers

creturns_df['lower bound'] = percentiles['lower bound']
creturns_df['upper bound'] = percentiles['upper bound']

print(creturns_df.tail())
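
The winner/loser rule can also be applied to a whole cross-section at once. The following sketch uses made-up 6-month returns for ten hypothetical tickers on a single date; the labels "winner", "loser" and "neutral" are our own naming.

```python
import pandas as pd

# Made-up accumulated 6-month returns for one date.
signal = pd.Series(
    [-0.20, -0.15, -0.10, -0.05, 0.00, 0.05, 0.10, 0.15, 0.20, 0.25],
    index=["T0", "T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9"],
)

lower = signal.quantile(0.3)   # 30th percentile across stocks
upper = signal.quantile(0.7)   # 70th percentile across stocks

labels = pd.Series("neutral", index=signal.index)
labels[signal > upper] = "winner"   # long signal
labels[signal < lower] = "loser"    # short signal

print(lower, upper)
print(labels)
```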

## Comparing stock performance and bounds to see if they exhibit momentum

In [None]:
# Checks whether the big tech stocks are "winners" or "losers"

print(creturns_df.loc['29-11-2019 00:00' , ['MSFT', 'AMZN', 'AAPL', 'GOOG', 'FB' , 'lower bound' , 'upper bound']])

**Results:**

**Microsoft**, **Google** and **Facebook** do not exhibit momentum, since their accumulated returns fall *within* the bounds.

**However**, the **Apple** stock exhibits **positive short-term momentum** (*long signal*), whilst the **Amazon** stock exhibits **negative short-term momentum** (*short signal*).

# Conclusion

We have shown how stock price data can be retrieved from an API and visualized with little effort. Furthermore, we have analyzed the data and offered a trading recommendation based on a simple version of a momentum strategy (evaluate stocks over 6 months and hold them for 6 months).