<a href="https://colab.research.google.com/github/byunsy/equal-weight-index-fund/blob/main/Equal_Weight_S%26P.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Equal-Weight S&P 500 Index Fund

This notebook is a short study of how to calculate the number of shares of each S&P 500 constituent to purchase in order to build an equal-weight index fund.

In an equal-weight index fund, each stock carries the same weight in the index regardless of the size of the company. 
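
To make the contrast with a capitalization-weighted index concrete, here is a minimal sketch comparing the two weighting schemes for three tickers. The market caps are the sandbox values that appear later in this notebook, so they are illustrative rather than real.

```python
# Sandbox (not real) market caps for three constituents, used only for illustration
market_caps = {
    "AAPL": 2374028206604,
    "ABBV": 203867293572,
    "ZION": 8013721567,
}

total_cap = sum(market_caps.values())

# Cap-weighted: each weight is proportional to company size
cap_weights = {symbol: cap / total_cap for symbol, cap in market_caps.items()}

# Equal-weighted: every constituent gets the same weight, regardless of size
equal_weights = {symbol: 1 / len(market_caps) for symbol in market_caps}

for symbol in market_caps:
    print(f"{symbol}: cap weight {cap_weights[symbol]:.3f}, "
          f"equal weight {equal_weights[symbol]:.3f}")
```

Under cap weighting the largest company dominates the three-stock basket, while under equal weighting each stock holds one third.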

## Import Necessary Modules

In [1]:
import numpy as np 
import pandas as pd 
import requests 
import math 

In [None]:
from google.colab import files
uploaded = files.upload()

## Obtain the S&P 500 Stock Listing

Get a list of all the companies in the S&P 500. 

In [3]:
sp500 = pd.read_csv('sp_500_stocks.csv')
sp500

Unnamed: 0,Symbol
0,A
1,AAL
2,AAP
3,AAPL
4,ABBV
...,...
500,YUM
501,ZBH
502,ZBRA
503,ZION


## API Call

We first need a test API token to use the IEX Cloud APIs (the token itself is kept private). You can obtain sandbox test API tokens from the IEX Cloud website.

In [4]:
from iex_api import IEX_CLOUD_API_TOKEN

In [5]:
# As an example of what IEX Cloud returns, request a quote for Microsoft
symbol='MSFT'
api_url = f'https://sandbox.iexapis.com/stable/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}'
ms_data = requests.get(api_url).json()

ms_data

{'avgTotalVolume': 27033452,
 'calculationPrice': 'close',
 'change': 0.99,
 'changePercent': 0.00457,
 'close': 235.04,
 'closeSource': 'lcoiffia',
 'closeTime': 1646162305010,
 'companyName': 'Microsoft Corporation',
 'delayedPrice': 234.49,
 'delayedPriceTime': 1614659009947,
 'extendedChange': 0.76,
 'extendedChangePercent': 0.00337,
 'extendedPrice': 236.9,
 'extendedPriceTime': 1656436944033,
 'high': 236.68,
 'highSource': 'n  imede15arplui dyetec',
 'highTime': 1689210112930,
 'iexAskPrice': None,
 'iexAskSize': None,
 'iexBidPrice': None,
 'iexBidSize': None,
 'iexClose': 230.92,
 'iexCloseTime': 1669257949211,
 'iexLastUpdated': None,
 'iexMarketPercent': None,
 'iexOpen': 235.09,
 'iexOpenTime': 1671765145240,
 'iexRealtimePrice': None,
 'iexRealtimeSize': None,
 'iexVolume': None,
 'isUSMarketOpen': False,
 'lastTradeTime': 1651196572907,
 'latestPrice': 233.93,
 'latestSource': 'Close',
 'latestTime': 'January 22, 2021',
 'latestUpdate': 1653502668518,
 'latestVolume': 309

The response is a dictionary, so we can look up specific fields by key.

In [6]:
print("LATEST PRICE:", ms_data['latestPrice'])
print("MARKET CAPITALIZATION:", ms_data['marketCap'])

LATEST PRICE: 233.93
MARKET CAPITALIZATION: 1775547353116
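
Several fields in the response above are `None`, so a direct lookup can silently hand back a missing value. As a hedged sketch, the hypothetical helper `extract_fields` below (not part of the original notebook) pulls the two fields we need and fails loudly if either is absent. It runs on a small hand-written dictionary rather than a live API response.

```python
def extract_fields(quote):
    """Return (latestPrice, marketCap) from a quote dict, or raise if either is missing.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    price = quote.get("latestPrice")
    market_cap = quote.get("marketCap")
    if price is None or market_cap is None:
        raise ValueError("quote is missing latestPrice or marketCap")
    return price, market_cap


# Hand-written sample using the sandbox values printed above
sample_quote = {"latestPrice": 233.93, "marketCap": 1775547353116, "iexAskPrice": None}
price, market_cap = extract_fields(sample_quote)
print(price, market_cap)
```

Checking for `None` up front keeps a missing price from surfacing later as a confusing arithmetic error.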


**NOTE:**

Since we are using sandbox test APIs, the values returned are not real. 

## Data Preprocessing


We will now load the retrieved data into a pandas DataFrame.

In [None]:
df_columns = ['Symbol', 'Latest Price', 'Market Capitalization', 'Number Of Shares to Purchase']

# Collect one row per symbol, then build the DataFrame once at the end.
# (DataFrame.append was removed in pandas 2.0 and was slow inside loops.)
rows = []
for symbol in sp500['Symbol']:
    api_url = f'https://sandbox.iexapis.com/stable/stock/{symbol}/quote?token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(api_url).json()
    rows.append([symbol, data['latestPrice'], data['marketCap'], 'N/A'])

df = pd.DataFrame(rows, columns=df_columns)
df

However, this loop is slow because it makes one HTTP request per symbol, which means roughly 500 requests in total. Batch API calls reduce this by requesting quotes for many symbols (here, 100 at a time) in a single request.

In [7]:
def chunks(lst, n):
    """Yield successive slices of lst, each with at most n items."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
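
To see what the batching helper produces, the sketch below runs it on 505 placeholder symbols (the names `SYM0`, `SYM1`, ... are made up) and joins each batch into a comma-separated string of the kind passed in the `symbols` query parameter. The helper is repeated so the block runs on its own.

```python
def chunks(lst, n):
    """Yield successive slices of lst, each with at most n items."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# Placeholder tickers; real code would use the symbols from the CSV
symbols = [f"SYM{i}" for i in range(505)]

batches = list(chunks(symbols, 100))
symbol_strings = [",".join(batch) for batch in batches]

print(len(batches))                        # number of requests needed
print([len(batch) for batch in batches])   # size of each batch
print(symbol_strings[-1])                  # the final, shorter batch
```

With batches of 100, these 505 symbols need 6 requests instead of 505.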

In [9]:
df_columns = ['Symbol', 'Latest Price', 'Market Capitalization', 'Number Of Shares to Purchase']

# Split the symbols into batches of 100 and join each batch into a comma-separated string
symbol_batch = list(chunks(sp500['Symbol'], 100))
symbol_strings = [','.join(batch) for batch in symbol_batch]

# Collect rows in a list and build the DataFrame once at the end
# (DataFrame.append was removed in pandas 2.0)
rows = []
for symbol_string in symbol_strings:

    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch/?types=quote&symbols={symbol_string}&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()

    for symbol in symbol_string.split(','):
        quote = data[symbol]['quote']
        rows.append([symbol, quote['latestPrice'], quote['marketCap'], 'N/A'])

df = pd.DataFrame(rows, columns=df_columns)

# print the data frame
df

Unnamed: 0,Symbol,Latest Price,Market Capitalization,Number Of Shares to Purchase
0,A,128.71,38943562989,
1,AAL,16.09,9972159354,
2,AAP,164.37,11540042434,
3,AAPL,143.56,2374028206604,
4,ABBV,114.91,203867293572,
...,...,...,...,...
500,YUM,109.48,32913535941,
501,ZBH,163.02,34448537292,
502,ZBRA,414.75,22872921709,
503,ZION,49.72,8013721567,


## Calculate the Number of Shares to Purchase

Given a value of our portfolio, we can now calculate the number of shares of each constituent to purchase.




In [10]:
PORTFOLIO_SIZE = 1000000

position_size = float(PORTFOLIO_SIZE) / len(df.index)
print(position_size)

1980.1980198019803


In [11]:
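# Equal dollar allocation per stock
position_size = float(PORTFOLIO_SIZE) / len(df.index)

# Round down to whole shares so no position exceeds its allocation
for i in range(len(df['Symbol'])):
    df.loc[i, 'Number Of Shares to Purchase'] = math.floor(position_size / df.loc[i, 'Latest Price'])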

# print the data frame
df

Unnamed: 0,Symbol,Latest Price,Market Capitalization,Number Of Shares to Purchase
0,A,128.71,38943562989,15
1,AAL,16.09,9972159354,123
2,AAP,164.37,11540042434,12
3,AAPL,143.56,2374028206604,13
4,ABBV,114.91,203867293572,17
...,...,...,...,...
500,YUM,109.48,32913535941,18
501,ZBH,163.02,34448537292,12
502,ZBRA,414.75,22872921709,4
503,ZION,49.72,8013721567,39
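
The share counts in the table can be checked by hand. The sketch below assumes 505 rows in the data frame, which is what the printed position size of 1980.198... implies, and uses three sandbox prices copied from the table.

```python
import math

PORTFOLIO_SIZE = 1000000
NUM_CONSTITUENTS = 505  # assumption: inferred from the printed position size

position_size = PORTFOLIO_SIZE / NUM_CONSTITUENTS

# Sandbox prices copied from the table above
prices = {"A": 128.71, "AAL": 16.09, "ZBRA": 414.75}

# Whole shares only: round down so a position never exceeds its allocation
shares = {symbol: math.floor(position_size / price) for symbol, price in prices.items()}
print(shares)
```

Because of the rounding down, each position ends up slightly below its target. ZBRA, for example, uses 4 × 414.75 = 1659.00 of its roughly 1980.20 allocation.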
