# Overview:

This notebook is used to get the index and constituent price data for the S&P 500 Equal Weight Consumer Staples Index (SPXEWCS) from Yahoo Finance. The data is then saved as csv files

The end result is daily price data for the index and its constituent stocks that were in the index at each time step from 2006 to 2023. The data is saved in the data folder as constituent_prices.csv and index_prices.csv. The data is saved in the data folder as constituent_prices.csv and index_prices.csv.

# 1) Get Index Constituent Stocks

- The following code uses data from Bloomberg giving stock weights for the S&P 500 Equal Weight Consumer Staples Index (SPXEWCS) at each rebalancing period (Quarter: June, Sep, Dec, March).

- At each time step, we find the stocks that will be a part of the index for the following quarter. We store the results in a dictionary titled time_constituent_dict that takes the form {time1: [ticker1, ticker2, ... ], time2: [ticker1, ticker2, ... ], ...}

- Note that the number and composition of constituent stocks is not the same at each period, as it changes over time. As a result, some stocks will be included at only some time steps. We will only consider stocks which are in the index at each time step.

In [1]:
import numpy as np
import pandas as pd
import yfinance as yf

In [2]:
# read bloomberg index data from 2006-2014
index_cos = pd.read_excel('data/06_14_index.XLSX', skiprows=10)
index_cos

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,6/30/2006,9/30/2006,12/31/2006,3/31/2007,6/30/2007,9/30/2007,12/31/2007,...,12/31/2011,3/31/2012,6/30/2012,9/30/2012,12/31/2012,3/31/2013,6/30/2013,9/30/2013,12/31/2013,3/31/2014
0,SPXEWCS,,,100.0,100.0,100.0,100.0,100.0,100.0,100.0,...,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0
1,,Consumer Staples,,82.09,82.18,84.19,84.3,84.66,84.53,84.48,...,92.71,92.91,92.76,92.72,92.88,92.92,95.08,95.1,94.89,95.11
2,,,ALTRIA GROUP INC,2.6,2.35,2.66,2.65,2.57,2.58,2.54,...,2.36,2.37,2.47,2.46,2.28,2.31,2.5,2.49,2.52,2.48
3,,,ARCHER-DANIELS-MIDLAND CO,2.65,2.53,2.58,2.8,2.48,2.54,2.73,...,2.39,2.38,2.18,2.43,2.43,2.37,2.61,2.57,2.62,2.46
4,,,AVON PRODUCTS INC,2.6,2.64,2.64,2.54,2.48,2.71,2.51,...,2.42,2.44,2.4,2.38,2.46,2.4,2.34,2.51,2.49,2.39
5,,,BROWN-FORMAN CORP-CLASS B,2.44,2.58,2.62,2.59,2.6,2.56,2.56,...,2.4,2.44,2.55,2.47,2.4,2.42,2.46,2.46,2.51,2.48
6,,,CAMPBELL SOUP CO,2.65,2.49,2.58,2.56,2.55,2.66,2.56,...,2.35,2.41,2.48,2.45,2.36,2.5,2.54,2.44,2.59,2.46
7,,,CLOROX COMPANY,2.5,2.57,2.66,2.62,2.49,2.54,2.57,...,2.37,2.37,2.37,2.48,2.34,2.4,2.48,2.46,2.42,2.43
8,,,COCA-COLA CO/THE,2.5,2.55,2.6,2.6,2.62,2.58,2.53,...,2.4,2.49,2.45,2.41,2.32,2.35,2.53,2.48,2.56,2.44
9,,,COCA-COLA EUROPACIFIC PARTNE,2.67,2.56,2.66,2.56,2.69,2.54,2.56,...,2.32,2.43,2.45,2.43,2.47,2.32,2.48,2.56,2.64,2.48


In [3]:
# drop unneeded columns and rows
index_cos.drop(columns=['Unnamed: 0', 'Unnamed: 1'], inplace = True)
index_cos = index_cos.iloc[:54,:]

# switch rows and columns to get data in time series format
index_cos = index_cos.T

# reset so that columns are named correctly
index_cos = index_cos.rename(columns = index_cos.loc['Unnamed: 2']).iloc[1:,:]

# drop weight columns - since index is equal weighted, we only need to know which stocks are in the index at each time step
index_cos.drop(columns=[np.nan], inplace = True)

# rename index to Date
index_cos.index.rename("Date", inplace = True)
index_cos

Unnamed: 0_level_0,ALTRIA GROUP INC,ARCHER-DANIELS-MIDLAND CO,AVON PRODUCTS INC,BROWN-FORMAN CORP-CLASS B,CAMPBELL SOUP CO,CLOROX COMPANY,COCA-COLA CO/THE,COCA-COLA EUROPACIFIC PARTNE,COLGATE-PALMOLIVE CO,CONAGRA BRANDS INC,...,WALMART INC,WHOLE FOODS MARKET INC,CVS HEALTH CORP,ALBERTO-CULVER CO,ANHEUSER-BUSCH COS INC,BEAM SUNTORY INC,KRAFT HEINZ FOODS CO,PEPSI BOTTLING GROUP INC,UST LLC,WM WRIGLEY JR CO
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
6/30/2006,2.6,2.65,2.6,2.44,2.65,2.5,2.5,2.67,2.48,2.51,...,2.5,2.61,2.55,2.62,2.52,,2.52,2.59,2.58,2.52
9/30/2006,2.35,2.53,2.64,2.58,2.49,2.57,2.55,2.56,2.55,2.67,...,2.61,2.69,2.35,2.58,2.56,,2.58,2.61,2.59,2.54
12/31/2006,2.66,2.58,2.64,2.62,2.58,2.66,2.6,2.66,2.63,2.69,...,2.62,2.52,2.67,,2.68,,2.55,2.56,2.72,2.62
3/31/2007,2.65,2.8,2.54,2.59,2.56,2.62,2.6,2.56,2.62,2.62,...,2.6,2.59,2.65,,2.6,,2.62,2.66,2.64,2.54
6/30/2007,2.57,2.48,2.48,2.6,2.55,2.49,2.62,2.69,2.52,2.69,...,2.52,2.5,2.51,,2.55,,2.59,2.55,2.62,2.53
9/30/2007,2.58,2.54,2.71,2.56,2.66,2.54,2.58,2.54,2.59,2.52,...,2.51,2.76,2.66,,2.51,,2.57,2.57,2.55,2.59
12/31/2007,2.54,2.73,2.51,2.56,2.56,2.57,2.53,2.56,2.58,2.56,...,2.56,2.49,2.61,,2.55,,2.58,2.61,2.61,2.57
3/31/2008,2.55,2.55,2.48,2.46,2.53,2.49,2.48,2.45,2.5,2.68,...,2.47,2.52,2.53,,2.57,,2.59,2.46,2.45,2.52
6/30/2008,2.47,2.57,2.5,2.48,2.49,2.43,2.41,2.35,2.47,2.17,...,2.49,2.24,2.38,,2.55,,2.42,2.33,2.55,2.49
9/30/2008,2.43,2.33,2.58,2.45,2.51,2.51,2.52,2.38,2.5,2.48,...,2.52,2.42,2.34,,2.43,,2.51,2.34,2.5,2.51


In [4]:
# fill all nans with -1 to make for easier parsing
# if weight is -1, then the stock is not in the index at this time step
index_cos = index_cos.fillna(-1)
index_cos

Unnamed: 0_level_0,ALTRIA GROUP INC,ARCHER-DANIELS-MIDLAND CO,AVON PRODUCTS INC,BROWN-FORMAN CORP-CLASS B,CAMPBELL SOUP CO,CLOROX COMPANY,COCA-COLA CO/THE,COCA-COLA EUROPACIFIC PARTNE,COLGATE-PALMOLIVE CO,CONAGRA BRANDS INC,...,WALMART INC,WHOLE FOODS MARKET INC,CVS HEALTH CORP,ALBERTO-CULVER CO,ANHEUSER-BUSCH COS INC,BEAM SUNTORY INC,KRAFT HEINZ FOODS CO,PEPSI BOTTLING GROUP INC,UST LLC,WM WRIGLEY JR CO
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
6/30/2006,2.6,2.65,2.6,2.44,2.65,2.5,2.5,2.67,2.48,2.51,...,2.5,2.61,2.55,2.62,2.52,-1.0,2.52,2.59,2.58,2.52
9/30/2006,2.35,2.53,2.64,2.58,2.49,2.57,2.55,2.56,2.55,2.67,...,2.61,2.69,2.35,2.58,2.56,-1.0,2.58,2.61,2.59,2.54
12/31/2006,2.66,2.58,2.64,2.62,2.58,2.66,2.6,2.66,2.63,2.69,...,2.62,2.52,2.67,-1.0,2.68,-1.0,2.55,2.56,2.72,2.62
3/31/2007,2.65,2.8,2.54,2.59,2.56,2.62,2.6,2.56,2.62,2.62,...,2.6,2.59,2.65,-1.0,2.6,-1.0,2.62,2.66,2.64,2.54
6/30/2007,2.57,2.48,2.48,2.6,2.55,2.49,2.62,2.69,2.52,2.69,...,2.52,2.5,2.51,-1.0,2.55,-1.0,2.59,2.55,2.62,2.53
9/30/2007,2.58,2.54,2.71,2.56,2.66,2.54,2.58,2.54,2.59,2.52,...,2.51,2.76,2.66,-1.0,2.51,-1.0,2.57,2.57,2.55,2.59
12/31/2007,2.54,2.73,2.51,2.56,2.56,2.57,2.53,2.56,2.58,2.56,...,2.56,2.49,2.61,-1.0,2.55,-1.0,2.58,2.61,2.61,2.57
3/31/2008,2.55,2.55,2.48,2.46,2.53,2.49,2.48,2.45,2.5,2.68,...,2.47,2.52,2.53,-1.0,2.57,-1.0,2.59,2.46,2.45,2.52
6/30/2008,2.47,2.57,2.5,2.48,2.49,2.43,2.41,2.35,2.47,2.17,...,2.49,2.24,2.38,-1.0,2.55,-1.0,2.42,2.33,2.55,2.49
9/30/2008,2.43,2.33,2.58,2.45,2.51,2.51,2.52,2.38,2.5,2.48,...,2.52,2.42,2.34,-1.0,2.43,-1.0,2.51,2.34,2.5,2.51


In [5]:
# get list of tickers to rename column names from company names to tickers
# tickers are needed to get price data from yahoo finance
tickers = [
    "MO",  # Altria Group Inc
    "ADM",  # Archer-Daniels-Midland Co
    "AVP",  # Avon Products Inc
    "BF-B",  # Brown-Forman Corp-Class B
    "CPB",  # Campbell Soup Co
    "CLX",  # Clorox Company
    "KO",  # Coca-Cola Co/The
    "CCEP",  # Coca-Cola Europacific Partners (formerly Coca-Cola European Partners)
    "CL",  # Colgate-Palmolive Co
    "CAG",  # Conagra Brands Inc
    "STZ",  # Constellation Brands Inc-A
    "COST",  # Costco Wholesale Corp
    "DFODQ",  # Dean Foods Co
    "EL",  # Estee Lauder Companies-Cl A
    "GIS",  # General Mills Inc
    "HSY",  # Hershey Co/The
    "HSH",  # Hillshire Brands Co/The
    "HRL",  # Hormel Foods Corp
    "SJM",  # JM Smucker Co/The
    "K",  # Kellogg Co (assuming Kellanova is a typo)
    "KDP",  # Keurig Dr Pepper Inc
    "GMCR",  # Keurig Green Mountain Inc
    "KMB",  # Kimberly-Clark Corp
    "KRFT",  # Kraft Foods Group Inc
    "KR",  # Kroger Co
    "LO",  # Lorillard Inc
    "MKC",  # McCormick & Co-Non VTG Shrs
    "MJN",  # Mead Johnson Nutrition Co
    "TAP",  # Molson Coors Beverage Co - B
    "MDLZ",  # Mondelez International Inc-A
    "MNST",  # Monster Beverage 1990 Corp
    "PEP",  # PepsiCo Inc
    "PM",  # Philip Morris International
    "PG",  # Procter & Gamble Co/The
    "RAI",  # Reynolds American Inc
    "SWY",  # Safeway Inc
    "SVU",  # Supervalu Inc
    "SYY",  # Sysco Corp
    "TSN",  # Tyson Foods Inc-Cl A
    "WBA",  # Walgreens Boots Alliance Inc
    "WMT",  # Walmart Inc
    "WFM",  # Whole Foods Market Inc
    "CVS",  # CVS Health Corp
    "ACV",  # Alberto-Culver Co
    "BUD",  # Anheuser-Busch Cos Inc
    "BEAM",  # Beam Suntory Inc
    "KHC",  # Kraft Heinz Foods Co
    "PBG",  # Pepsi Bottling Group Inc
    "UST",  # UST LLC
    "WWY"  # WM Wrigley Jr Co
]

# replace company names with tickers for column names
index_cos = index_cos.set_axis(tickers, axis = 1)
index_cos

Unnamed: 0_level_0,MO,ADM,AVP,BF-B,CPB,CLX,KO,CCEP,CL,CAG,...,WMT,WFM,CVS,ACV,BUD,BEAM,KHC,PBG,UST,WWY
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
6/30/2006,2.6,2.65,2.6,2.44,2.65,2.5,2.5,2.67,2.48,2.51,...,2.5,2.61,2.55,2.62,2.52,-1.0,2.52,2.59,2.58,2.52
9/30/2006,2.35,2.53,2.64,2.58,2.49,2.57,2.55,2.56,2.55,2.67,...,2.61,2.69,2.35,2.58,2.56,-1.0,2.58,2.61,2.59,2.54
12/31/2006,2.66,2.58,2.64,2.62,2.58,2.66,2.6,2.66,2.63,2.69,...,2.62,2.52,2.67,-1.0,2.68,-1.0,2.55,2.56,2.72,2.62
3/31/2007,2.65,2.8,2.54,2.59,2.56,2.62,2.6,2.56,2.62,2.62,...,2.6,2.59,2.65,-1.0,2.6,-1.0,2.62,2.66,2.64,2.54
6/30/2007,2.57,2.48,2.48,2.6,2.55,2.49,2.62,2.69,2.52,2.69,...,2.52,2.5,2.51,-1.0,2.55,-1.0,2.59,2.55,2.62,2.53
9/30/2007,2.58,2.54,2.71,2.56,2.66,2.54,2.58,2.54,2.59,2.52,...,2.51,2.76,2.66,-1.0,2.51,-1.0,2.57,2.57,2.55,2.59
12/31/2007,2.54,2.73,2.51,2.56,2.56,2.57,2.53,2.56,2.58,2.56,...,2.56,2.49,2.61,-1.0,2.55,-1.0,2.58,2.61,2.61,2.57
3/31/2008,2.55,2.55,2.48,2.46,2.53,2.49,2.48,2.45,2.5,2.68,...,2.47,2.52,2.53,-1.0,2.57,-1.0,2.59,2.46,2.45,2.52
6/30/2008,2.47,2.57,2.5,2.48,2.49,2.43,2.41,2.35,2.47,2.17,...,2.49,2.24,2.38,-1.0,2.55,-1.0,2.42,2.33,2.55,2.49
9/30/2008,2.43,2.33,2.58,2.45,2.51,2.51,2.52,2.38,2.5,2.48,...,2.52,2.42,2.34,-1.0,2.43,-1.0,2.51,2.34,2.5,2.51


In [6]:
# create dictionary to store constituent stocks at each time step
time_constituent_dict = {}

# for each time step
for row in index_cos.iterrows():

    # get time and constituent data
    time = row[0]
    data = row[1]

    # get companies
    idx = data.index

    # get weight of company at given time
    val = data.values

    # for each company, check if it has a positive weight at this time
    # if so, add it to the list of constituent stocks for the index at this time
    # negative weights indicate that the stock is not in the index at this time and will not be added to the dictionary
    cos = []
    i = 0
    for weight in val:
        if weight > 0:
            cos.append(idx[i])
        i+=1

    # add list of constituent stocks to dictionary
    time_constituent_dict[time] = cos


### Follow same process for second dataframe

In [7]:
# read bloomberg index data from 2014-2023
# drop unneeded columns and rows and transpose to get in time series format
index_cos1 = pd.read_excel('data/14_23_index.XLSX', skiprows=10)
index_cos1.drop(columns=['Unnamed: 0', 'Unnamed: 1'], inplace = True)
index_cos1 = index_cos1.iloc[:55,:]
index_cos1 = index_cos1.T

# reset so that columns are named correctly (companies as column names)
index_cos1 = index_cos1.rename(columns = index_cos1.loc['Unnamed: 2']).iloc[5:,:]

# drop weight columns
index_cos1.drop(columns=[np.nan], inplace = True)

# rename index
index_cos1.index.rename("Date", inplace = True)

# fill all nans with -1 to make for easier parsing
# if weight is -1, then the stock is not in the index at this time step
index_cos1 = index_cos1.fillna(-1)
index_cos1

Unnamed: 0_level_0,ALTRIA GROUP INC,ARCHER-DANIELS-MIDLAND CO,AVON PRODUCTS INC,BROWN-FORMAN CORP-CLASS B,BUNGE GLOBAL SA,CAMPBELL SOUP CO,CHURCH & DWIGHT CO INC,CLOROX COMPANY,COCA-COLA CO/THE,COCA-COLA EUROPACIFIC PARTNE,...,REYNOLDS AMERICAN INC,SAFEWAY INC,SYSCO CORP,TARGET CORP,TYSON FOODS INC-CL A,WALGREENS BOOTS ALLIANCE INC,WALMART INC,WHOLE FOODS MARKET INC,CVS HEALTH CORP,BEAM SUNTORY INC
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
3/31/2015,2.5,2.63,-1,2.62,-1.0,2.64,-1.0,2.61,2.61,2.71,...,2.53,-1,2.51,-1.0,2.63,2.6,2.58,2.47,2.57,-1
6/30/2015,2.75,2.51,-1,2.76,-1.0,2.76,-1.0,2.68,2.66,2.69,...,2.81,-1,2.65,-1.0,2.77,2.69,2.65,2.64,2.78,-1
9/30/2015,2.78,2.61,-1,2.71,-1.0,2.76,-1.0,2.83,2.83,2.65,...,2.82,-1,2.63,-1.0,2.68,2.47,2.7,2.64,2.56,-1
12/31/2015,2.65,2.76,-1,2.55,-1.0,2.55,2.6,2.59,2.61,2.59,...,2.66,-1,2.58,-1.0,2.61,2.68,2.66,2.53,2.7,-1
3/31/2016,2.7,2.65,-1,2.68,-1.0,2.66,2.7,2.66,2.74,2.69,...,2.63,-1,2.72,-1.0,2.63,2.76,2.72,2.51,2.76,-1
6/30/2016,2.8,2.68,-1,2.73,-1.0,2.85,2.77,2.82,2.65,-1.0,...,2.83,-1,2.79,-1.0,2.96,2.71,2.76,2.51,2.66,-1
9/30/2016,2.72,2.75,-1,2.85,-1.0,2.72,2.77,2.79,2.74,-1.0,...,2.74,-1,2.69,-1.0,2.83,2.66,2.81,2.71,2.66,-1
12/31/2016,2.76,2.65,-1,2.65,-1.0,2.77,2.7,2.82,2.67,-1.0,...,2.75,-1,2.72,-1.0,2.7,2.59,2.67,2.6,2.66,-1
3/31/2017,2.54,2.78,-1,2.71,-1.0,2.63,2.71,2.66,2.73,-1.0,...,2.83,-1,2.71,-1.0,2.64,2.64,2.8,2.77,2.66,-1
6/30/2017,2.82,2.81,-1,2.82,-1.0,2.63,2.79,2.77,2.83,-1.0,...,2.84,-1,2.6,-1.0,2.94,2.76,2.72,3.36,2.88,-1


In [8]:
# get list of tickers to rename column names from company names to tickers
# tickers are needed to get price data from yahoo finance
tickers = [
    "MO",  # Altria Group Inc
    "ADM",  # Archer-Daniels-Midland Co
    "AVP",  # Avon Products Inc
    "BF-B",  # Brown-Forman Corp-Class B
    "BG",  # Bunge Ltd (assuming Bunge Global SA refers to Bunge Ltd)
    "CPB",  # Campbell Soup Co
    "CHD",  # Church & Dwight Co Inc
    "CLX",  # Clorox Company
    "KO",  # Coca-Cola Co/The
    "CCEP",  # Coca-Cola Europacific Partners
    "CL",  # Colgate-Palmolive Co
    "CAG",  # Conagra Brands Inc
    "STZ",  # Constellation Brands Inc-A
    "COST",  # Costco Wholesale Corp
    "COTY",  # Coty Inc-Cl A
    "DG",  # Dollar General Corp
    "DLTR",  # Dollar Tree Inc
    "EL",  # Estee Lauder Companies-Cl A
    "GIS",  # General Mills Inc
    "HSY",  # Hershey Co/The
    "HRL",  # Hormel Foods Corp
    "SJM",  # JM Smucker Co/The
    "K",  # Kellogg Co (assuming Kellanova is a typo)
    "KENV",  # Kenvue Inc (Note: This ticker might not be accurate; please verify)
    "KDP",  # Keurig Dr Pepper Inc
    "GMCR",  # Keurig Green Mountain Inc
    "KMB",  # Kimberly-Clark Corp
    "KRFT",  # Kraft Foods Group Inc
    "KHC",  # Kraft Heinz Co/The
    "KR",  # Kroger Co
    "LW",  # Lamb Weston Holdings Inc
    "LO",  # Lorillard Inc
    "MKC",  # McCormick & Co-Non VTG Shrs
    "MJN",  # Mead Johnson Nutrition Co
    "TAP",  # Molson Coors Beverage Co - B
    "MDLZ",  # Mondelez International Inc-A
    "MNST",  # Monster Beverage 1990 Corp
    "MNST",  # Monster Beverage Corp (same as above)
    "PEP",  # PepsiCo Inc
    "PM",  # Philip Morris International
    "PG",  # Procter & Gamble Co/The
    "RAI",  # Reynolds American Inc
    "SWY",  # Safeway Inc
    "SYY",  # Sysco Corp
    "TGT",  # Target Corp
    "TSN",  # Tyson Foods Inc-Cl A
    "WBA",  # Walgreens Boots Alliance Inc
    "WMT",  # Walmart Inc
    "WFM",  # Whole Foods Market Inc
    "CVS",  # CVS Health Corp
    "BEAM",  # Beam Suntory Inc
]


# replace company names with tickers for column names
index_cos1 = index_cos1.set_axis(tickers, axis = 1)
index_cos1

Unnamed: 0_level_0,MO,ADM,AVP,BF-B,BG,CPB,CHD,CLX,KO,CCEP,...,RAI,SWY,SYY,TGT,TSN,WBA,WMT,WFM,CVS,BEAM
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
3/31/2015,2.5,2.63,-1,2.62,-1.0,2.64,-1.0,2.61,2.61,2.71,...,2.53,-1,2.51,-1.0,2.63,2.6,2.58,2.47,2.57,-1
6/30/2015,2.75,2.51,-1,2.76,-1.0,2.76,-1.0,2.68,2.66,2.69,...,2.81,-1,2.65,-1.0,2.77,2.69,2.65,2.64,2.78,-1
9/30/2015,2.78,2.61,-1,2.71,-1.0,2.76,-1.0,2.83,2.83,2.65,...,2.82,-1,2.63,-1.0,2.68,2.47,2.7,2.64,2.56,-1
12/31/2015,2.65,2.76,-1,2.55,-1.0,2.55,2.6,2.59,2.61,2.59,...,2.66,-1,2.58,-1.0,2.61,2.68,2.66,2.53,2.7,-1
3/31/2016,2.7,2.65,-1,2.68,-1.0,2.66,2.7,2.66,2.74,2.69,...,2.63,-1,2.72,-1.0,2.63,2.76,2.72,2.51,2.76,-1
6/30/2016,2.8,2.68,-1,2.73,-1.0,2.85,2.77,2.82,2.65,-1.0,...,2.83,-1,2.79,-1.0,2.96,2.71,2.76,2.51,2.66,-1
9/30/2016,2.72,2.75,-1,2.85,-1.0,2.72,2.77,2.79,2.74,-1.0,...,2.74,-1,2.69,-1.0,2.83,2.66,2.81,2.71,2.66,-1
12/31/2016,2.76,2.65,-1,2.65,-1.0,2.77,2.7,2.82,2.67,-1.0,...,2.75,-1,2.72,-1.0,2.7,2.59,2.67,2.6,2.66,-1
3/31/2017,2.54,2.78,-1,2.71,-1.0,2.63,2.71,2.66,2.73,-1.0,...,2.83,-1,2.71,-1.0,2.64,2.64,2.8,2.77,2.66,-1
6/30/2017,2.82,2.81,-1,2.82,-1.0,2.63,2.79,2.77,2.83,-1.0,...,2.84,-1,2.6,-1.0,2.94,2.76,2.72,3.36,2.88,-1


In [9]:
# create dictionary to store constituent stocks at each time step
time_constituent_dict1 = {}

# for each time step
for row in index_cos1.iterrows():

    # get time and constituent data
    time = row[0]
    data = row[1]

    # get companies
    idx = data.index

    # get weight of company at given time
    val = data.values

    # for each company, check if it has a positive weight at this time
    # if so, add it to the list of constituent stocks for the index at this time
    # negative weights indicate that the stock is not in the index at this time and will not be added to the dictionary
    cos = []
    i = 0
    for weight in val:
        if weight > 0:
            cos.append(idx[i])
        i+=1

    # add list of constituent stocks to dictionary
    time_constituent_dict1[time] = cos


In [10]:
# print out all keys to make sure all time steps (quarters) are accounted for
keys = time_constituent_dict1.keys()
for key, value in time_constituent_dict1.items():
    time_constituent_dict[key] = value
print(time_constituent_dict.keys())

dict_keys(['6/30/2006', '9/30/2006', '12/31/2006', '3/31/2007', '6/30/2007', '9/30/2007', '12/31/2007', '3/31/2008', '6/30/2008', '9/30/2008', '12/31/2008', '3/31/2009', '6/30/2009', '9/30/2009', '12/31/2009', '3/31/2010', '6/30/2010', '9/30/2010', '12/31/2010', '3/31/2011', '6/30/2011', '9/30/2011', '12/31/2011', '3/31/2012', '6/30/2012', '9/30/2012', '12/31/2012', '3/31/2013', '6/30/2013', '9/30/2013', '12/31/2013', '3/31/2014', '3/31/2015', '6/30/2015', '9/30/2015', '12/31/2015', '3/31/2016', '6/30/2016', '9/30/2016', '12/31/2016', '3/31/2017', '6/30/2017', '9/30/2017', '12/31/2017', '3/31/2018', '6/30/2018', '9/30/2018', '12/31/2018', '3/31/2019', '6/30/2019', '9/30/2019', '12/31/2019', '3/31/2020', '6/30/2020', '9/30/2020', '12/31/2020', '3/31/2021', '6/30/2021', '9/30/2021', '12/31/2021', '3/31/2022', '6/30/2022', '9/30/2022', '12/31/2022', '3/31/2023', '6/30/2023', '9/30/2023', '11/9/2023'])


# 2) Get Constituent Price Data

- Over each time period (quarter), we need price data for each stock which is included in the index. We can find this by iterating through our time_constituent dict.

In [11]:
# get list of keys (time steps) from time_constituent_dict
# used to get price data from yahoo finance
keys = list(time_constituent_dict.keys())

yf_keys = []

# for each time step, get the date in the correct format for yahoo finance
for key in keys:
    key_list = key.split('/')
    new_key = key_list[2] + '-' + key_list[0] + '-' + key_list[1]
    yf_keys.append(new_key)
yf_keys

['2006-6-30',
 '2006-9-30',
 '2006-12-31',
 '2007-3-31',
 '2007-6-30',
 '2007-9-30',
 '2007-12-31',
 '2008-3-31',
 '2008-6-30',
 '2008-9-30',
 '2008-12-31',
 '2009-3-31',
 '2009-6-30',
 '2009-9-30',
 '2009-12-31',
 '2010-3-31',
 '2010-6-30',
 '2010-9-30',
 '2010-12-31',
 '2011-3-31',
 '2011-6-30',
 '2011-9-30',
 '2011-12-31',
 '2012-3-31',
 '2012-6-30',
 '2012-9-30',
 '2012-12-31',
 '2013-3-31',
 '2013-6-30',
 '2013-9-30',
 '2013-12-31',
 '2014-3-31',
 '2015-3-31',
 '2015-6-30',
 '2015-9-30',
 '2015-12-31',
 '2016-3-31',
 '2016-6-30',
 '2016-9-30',
 '2016-12-31',
 '2017-3-31',
 '2017-6-30',
 '2017-9-30',
 '2017-12-31',
 '2018-3-31',
 '2018-6-30',
 '2018-9-30',
 '2018-12-31',
 '2019-3-31',
 '2019-6-30',
 '2019-9-30',
 '2019-12-31',
 '2020-3-31',
 '2020-6-30',
 '2020-9-30',
 '2020-12-31',
 '2021-3-31',
 '2021-6-30',
 '2021-9-30',
 '2021-12-31',
 '2022-3-31',
 '2022-6-30',
 '2022-9-30',
 '2022-12-31',
 '2023-3-31',
 '2023-6-30',
 '2023-9-30',
 '2023-11-9']

In [12]:
# find which tickers are in the index at each time step
consistent_cos = []
i = 0

# for ea
for cos in time_constituent_dict.values():

    # if first time step, add all companies to consistent_cos
    if i == 0:
        consistent_cos = cos

    # if not first time step, find intersection of companies in index at this time step and companies in index at previous time step
    # only keep companies which are in both time steps (i.e. companies which are in the index at each time step)
    else:
        consistent_cos = list(set(consistent_cos) & set(cos))

# do the same for the second time period (2014 - 2023) using the consistent_cos from the first time period (2006 - 2014)
for cos in time_constituent_dict1.values():
    consistent_cos = list(set(consistent_cos) & set(cos))

consistent_cos

['BF-B',
 'GIS',
 'PEP',
 'ADM',
 'CAG',
 'K',
 'KR',
 'SJM',
 'KO',
 'CLX',
 'SYY',
 'TAP',
 'MKC',
 'KMB',
 'CPB',
 'PM',
 'MNST',
 'HRL',
 'MO',
 'CL',
 'TSN',
 'HSY',
 'WMT',
 'PG',
 'MDLZ',
 'STZ',
 'COST',
 'WBA',
 'EL']

In [13]:
# get price data for each company in consistent_cos at each time step
constituent_prices_dict = {}

# for each time step
for date_i in range(0, len(keys)-1):

    # get price data for all consistent companies  at this time step
    df = yf.download(consistent_cos, start = yf_keys[date_i], end = yf_keys[date_i+1], interval = '1d')

    # take only the adjusted close column as the price
    df = df['Adj Close']

    # add price data to dictionary
    constituent_prices_dict[keys[date_i]] = df

    # print out time step to make sure code is running
    print(keys[date_i])

[*********************100%***********************]  29 of 29 completed

ERROR 
1 Failed download:
ERROR ['PM']: Exception("PM: Data doesn't exist for startDate = 1151640000, endDate = 1159588800")



6/30/2006
[*********************100%***********************]  29 of 29 completed

ERROR 
1 Failed download:
ERROR ['PM']: Exception("PM: Data doesn't exist for startDate = 1159588800, endDate = 1167541200")



9/30/2006
[*********************100%***********************]  29 of 29 completed

ERROR 
1 Failed download:
ERROR ['PM']: Exception("PM: Data doesn't exist for startDate = 1167541200, endDate = 1175313600")



12/31/2006
[*********************100%***********************]  29 of 29 completed

ERROR 
1 Failed download:
ERROR ['PM']: Exception("PM: Data doesn't exist for startDate = 1175313600, endDate = 1183176000")



3/31/2007
[*********************100%***********************]  29 of 29 completed

ERROR 
1 Failed download:
ERROR ['PM']: Exception("PM: Data doesn't exist for startDate = 1183176000, endDate = 1191124800")



6/30/2007
[*********************100%***********************]  29 of 29 completed

ERROR 
1 Failed download:
ERROR ['PM']: Exception("PM: Data doesn't exist for startDate = 1191124800, endDate = 1199077200")



9/30/2007
[*********************100%***********************]  29 of 29 completed
12/31/2007
[*********************100%***********************]  29 of 29 completed
3/31/2008
[*********************100%***********************]  29 of 29 completed
6/30/2008
[*********************100%***********************]  29 of 29 completed
9/30/2008
[*********************100%***********************]  29 of 29 completed
12/31/2008
[*********************100%***********************]  29 of 29 completed
3/31/2009
[*********************100%***********************]  29 of 29 completed
6/30/2009
[*********************100%***********************]  29 of 29 completed
9/30/2009
[*********************100%***********************]  29 of 29 completed
12/31/2009
[*********************100%***********************]  29 of 29 completed
3/31/2010
[*********************100%***********************]  29 of 29 completed
6/30/2010
[*********************100%***********************]  29 of 29 completed
9/30/2010
[*************

In [14]:
pd.concat([list(constituent_prices_dict.values())[0], list(constituent_prices_dict.values())[1]])

Unnamed: 0_level_0,ADM,BF-B,CAG,CL,CLX,COST,CPB,EL,GIS,HRL,...,PEP,PG,PM,SJM,STZ,SYY,TAP,TSN,WBA,WMT
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2006-06-30,27.333622,10.877379,9.864583,19.878170,36.931801,40.329330,22.135223,15.649321,14.837613,6.580417,...,36.589054,33.737263,,25.648918,22.299345,18.341204,22.911636,11.275488,28.350817,33.057484
2006-07-03,28.492384,10.951970,9.855659,19.871534,37.016605,40.646988,22.499065,15.637186,14.897922,6.619396,...,36.729218,34.162006,,25.706324,22.620462,18.425226,23.002775,11.267902,28.603733,32.645718
2006-07-05,27.949425,10.871286,9.873506,19.835039,36.744038,40.131676,22.487137,15.475309,14.920905,6.530807,...,36.729218,33.937496,,26.503895,22.326109,18.467474,23.063530,10.964390,28.521540,32.268276
2006-07-06,28.121590,10.888032,9.815505,20.253172,36.913628,39.425755,22.481152,15.738358,14.753185,6.559156,...,36.784065,34.253044,,26.171089,22.566940,18.497646,23.056772,11.275488,28.799730,32.062389
2006-07-07,28.518877,10.702300,9.753042,20.203396,36.871235,39.644581,22.163494,15.580526,15.027891,6.537896,...,36.845001,34.283371,,26.331760,22.361788,18.485575,23.279549,10.964390,28.989416,31.568270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2006-12-22,20.953732,10.039432,12.331244,21.660744,39.216499,37.034306,23.794031,16.853691,17.033995,6.776123,...,38.813057,39.123974,,27.877796,25.332058,22.233290,25.872015,12.707890,29.613972,31.486750
2006-12-26,21.259918,10.082397,12.372019,21.798243,39.314312,37.381950,23.763906,16.870073,17.042723,6.792197,...,38.653126,39.363144,,28.272715,25.474773,22.257551,26.042349,12.662123,29.430000,31.880884
2006-12-27,21.432978,10.151442,12.276890,21.811649,39.406025,37.885662,23.745815,16.890547,16.903053,6.767199,...,38.683880,39.504185,,28.324991,25.706688,22.415274,26.147942,12.669750,29.271391,31.915451
2006-12-28,21.479563,10.179063,12.227055,22.036339,39.503853,37.672836,23.739767,16.935610,16.862316,6.738628,...,38.905319,39.540966,,28.394693,25.804804,22.366753,26.147942,12.631608,29.226990,31.811726


In [15]:
# concatenate all price data (from both time periods) into one dataframe
final_constituent_prices = pd.DataFrame()
for df in constituent_prices_dict.values():
    final_constituent_prices = pd.concat([final_constituent_prices, df])
final_constituent_prices

Unnamed: 0_level_0,ADM,BF-B,CAG,CL,CLX,COST,CPB,EL,GIS,HRL,...,PEP,PG,PM,SJM,STZ,SYY,TAP,TSN,WBA,WMT
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2006-06-30,27.333622,10.877379,9.864583,19.878170,36.931801,40.329330,22.135223,15.649321,14.837613,6.580417,...,36.589054,33.737263,,25.648918,22.299345,18.341204,22.911636,11.275488,28.350817,33.057484
2006-07-03,28.492384,10.951970,9.855659,19.871534,37.016605,40.646988,22.499065,15.637186,14.897922,6.619396,...,36.729218,34.162006,,25.706324,22.620462,18.425226,23.002775,11.267902,28.603733,32.645718
2006-07-05,27.949425,10.871286,9.873506,19.835039,36.744038,40.131676,22.487137,15.475309,14.920905,6.530807,...,36.729218,33.937496,,26.503895,22.326109,18.467474,23.063530,10.964390,28.521540,32.268276
2006-07-06,28.121590,10.888032,9.815505,20.253172,36.913628,39.425755,22.481152,15.738358,14.753185,6.559156,...,36.784065,34.253044,,26.171089,22.566940,18.497646,23.056772,11.275488,28.799730,32.062389
2006-07-07,28.518877,10.702300,9.753042,20.203396,36.871235,39.644581,22.163494,15.580526,15.027891,6.537896,...,36.845001,34.283371,,26.331760,22.361788,18.485575,23.279549,10.964390,28.989416,31.568270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-02,72.419998,57.840000,27.670000,74.800003,123.010002,555.969971,41.279999,114.379997,66.239998,33.180000,...,166.830002,151.440002,90.940002,114.070000,238.000000,65.650002,57.830002,47.560001,21.500000,165.520004
2023-11-03,72.910004,59.549999,27.809999,74.820000,125.550003,560.900024,40.990002,110.959999,65.739998,32.869999,...,166.789993,150.070007,91.519997,113.470001,241.679993,67.019997,59.029999,47.060001,22.110001,164.660004
2023-11-06,72.559998,58.970001,27.430000,75.209999,129.000000,569.820007,40.700001,112.669998,65.220001,32.529999,...,166.699997,150.940002,90.970001,112.839996,241.559998,66.820000,59.400002,46.580002,21.770000,164.880005
2023-11-07,72.110001,59.090000,27.389999,75.230003,132.520004,571.270020,40.430000,115.529999,65.099998,32.590000,...,167.179993,150.589996,91.279999,110.139999,242.520004,67.559998,59.110001,46.720001,21.650000,165.649994


In [16]:
# drop PM column since no data was available
final_constituent_prices.drop(columns = ['PM'], inplace = True)
final_constituent_prices

Unnamed: 0_level_0,ADM,BF-B,CAG,CL,CLX,COST,CPB,EL,GIS,HRL,...,MO,PEP,PG,SJM,STZ,SYY,TAP,TSN,WBA,WMT
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2006-06-30,27.333622,10.877379,9.864583,19.878170,36.931801,40.329330,22.135223,15.649321,14.837613,6.580417,...,6.225357,36.589054,33.737263,25.648918,22.299345,18.341204,22.911636,11.275488,28.350817,33.057484
2006-07-03,28.492384,10.951970,9.855659,19.871534,37.016605,40.646988,22.499065,15.637186,14.897922,6.619396,...,6.279616,36.729218,34.162006,25.706324,22.620462,18.425226,23.002775,11.267902,28.603733,32.645718
2006-07-05,27.949425,10.871286,9.873506,19.835039,36.744038,40.131676,22.487137,15.475309,14.920905,6.530807,...,6.216880,36.729218,33.937496,26.503895,22.326109,18.467474,23.063530,10.964390,28.521540,32.268276
2006-07-06,28.121590,10.888032,9.815505,20.253172,36.913628,39.425755,22.481152,15.738358,14.753185,6.559156,...,6.592454,36.784065,34.253044,26.171089,22.566940,18.497646,23.056772,11.275488,28.799730,32.062389
2006-07-07,28.518877,10.702300,9.753042,20.203396,36.871235,39.644581,22.163494,15.580526,15.027891,6.537896,...,6.595843,36.845001,34.283371,26.331760,22.361788,18.485575,23.279549,10.964390,28.989416,31.568270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-02,72.419998,57.840000,27.670000,74.800003,123.010002,555.969971,41.279999,114.379997,66.239998,33.180000,...,40.669998,166.830002,151.440002,114.070000,238.000000,65.650002,57.830002,47.560001,21.500000,165.520004
2023-11-03,72.910004,59.549999,27.809999,74.820000,125.550003,560.900024,40.990002,110.959999,65.739998,32.869999,...,40.669998,166.789993,150.070007,113.470001,241.679993,67.019997,59.029999,47.060001,22.110001,164.660004
2023-11-06,72.559998,58.970001,27.430000,75.209999,129.000000,569.820007,40.700001,112.669998,65.220001,32.529999,...,40.540001,166.699997,150.940002,112.839996,241.559998,66.820000,59.400002,46.580002,21.770000,164.880005
2023-11-07,72.110001,59.090000,27.389999,75.230003,132.520004,571.270020,40.430000,115.529999,65.099998,32.590000,...,40.509998,167.179993,150.589996,110.139999,242.520004,67.559998,59.110001,46.720001,21.650000,165.649994


In [17]:
# drop all rows with nans since some rows are duplicated
final_constituent_prices.dropna(axis = 0, inplace = True)
final_constituent_prices

Unnamed: 0_level_0,ADM,BF-B,CAG,CL,CLX,COST,CPB,EL,GIS,HRL,...,MO,PEP,PG,SJM,STZ,SYY,TAP,TSN,WBA,WMT
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2006-06-30,27.333622,10.877379,9.864583,19.878170,36.931801,40.329330,22.135223,15.649321,14.837613,6.580417,...,6.225357,36.589054,33.737263,25.648918,22.299345,18.341204,22.911636,11.275488,28.350817,33.057484
2006-07-03,28.492384,10.951970,9.855659,19.871534,37.016605,40.646988,22.499065,15.637186,14.897922,6.619396,...,6.279616,36.729218,34.162006,25.706324,22.620462,18.425226,23.002775,11.267902,28.603733,32.645718
2006-07-05,27.949425,10.871286,9.873506,19.835039,36.744038,40.131676,22.487137,15.475309,14.920905,6.530807,...,6.216880,36.729218,33.937496,26.503895,22.326109,18.467474,23.063530,10.964390,28.521540,32.268276
2006-07-06,28.121590,10.888032,9.815505,20.253172,36.913628,39.425755,22.481152,15.738358,14.753185,6.559156,...,6.592454,36.784065,34.253044,26.171089,22.566940,18.497646,23.056772,11.275488,28.799730,32.062389
2006-07-07,28.518877,10.702300,9.753042,20.203396,36.871235,39.644581,22.163494,15.580526,15.027891,6.537896,...,6.595843,36.845001,34.283371,26.331760,22.361788,18.485575,23.279549,10.964390,28.989416,31.568270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-02,72.419998,57.840000,27.670000,74.800003,123.010002,555.969971,41.279999,114.379997,66.239998,33.180000,...,40.669998,166.830002,151.440002,114.070000,238.000000,65.650002,57.830002,47.560001,21.500000,165.520004
2023-11-03,72.910004,59.549999,27.809999,74.820000,125.550003,560.900024,40.990002,110.959999,65.739998,32.869999,...,40.669998,166.789993,150.070007,113.470001,241.679993,67.019997,59.029999,47.060001,22.110001,164.660004
2023-11-06,72.559998,58.970001,27.430000,75.209999,129.000000,569.820007,40.700001,112.669998,65.220001,32.529999,...,40.540001,166.699997,150.940002,112.839996,241.559998,66.820000,59.400002,46.580002,21.770000,164.880005
2023-11-07,72.110001,59.090000,27.389999,75.230003,132.520004,571.270020,40.430000,115.529999,65.099998,32.590000,...,40.509998,167.179993,150.589996,110.139999,242.520004,67.559998,59.110001,46.720001,21.650000,165.649994


In [41]:
# view final price dataframe
final_constituent_prices

Unnamed: 0_level_0,ADM,BF-B,CAG,CL,CLX,COST,CPB,EL,GIS,HRL,HSY,K,KMB,KO,KR,MDLZ,MKC,MNST,MO,PEP,PG,SJM,STZ,SYY,TAP,TSN,WBA,WMT
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1
2006-06-30,27.333622,10.877379,9.864583,19.878170,36.931801,40.329330,22.135223,15.649321,14.837613,6.580417,36.206779,25.332493,32.204948,12.601964,8.061806,13.184583,11.742983,3.966042,6.225357,36.589054,33.737263,25.648918,22.299345,18.341204,22.911636,11.275488,28.350817,33.057484
2006-07-03,28.492384,10.951970,9.855659,19.871534,37.016605,40.646988,22.499065,15.637186,14.897922,6.619396,36.311970,25.447557,32.246716,12.704489,8.061806,13.248588,11.833987,4.129375,6.279616,36.729218,34.162006,25.706324,22.620462,18.425226,23.002775,11.267902,28.603733,32.645718
2006-07-05,27.949425,10.871286,9.873506,19.835039,36.744038,40.131676,22.487137,15.475309,14.920905,6.530807,36.338291,25.316801,31.964846,12.601964,8.035987,13.286992,11.809355,4.278125,6.216880,36.729218,33.937496,26.503895,22.326109,18.467474,23.063530,10.964390,28.521540,32.268276
2006-07-06,28.121590,10.888032,9.815505,20.253172,36.913628,39.425755,22.481152,15.738358,14.753185,6.559156,36.739338,25.097109,31.943981,12.692770,8.061806,12.958447,12.045119,4.243125,6.592454,36.784065,34.253044,26.171089,22.566940,18.497646,23.056772,11.275488,28.799730,32.062389
2006-07-07,28.518877,10.702300,9.753042,20.203396,36.871235,39.644581,22.163494,15.580526,15.027891,6.537896,36.476353,24.861725,31.636021,12.648827,8.035987,12.885909,11.999370,4.245833,6.595843,36.845001,34.283371,26.331760,22.361788,18.485575,23.279549,10.964390,28.989416,31.568270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-11-02,72.419998,57.840000,27.670000,74.800003,123.010002,555.969971,41.279999,114.379997,66.239998,33.180000,189.550003,51.730000,121.050003,57.090000,45.400002,67.970001,64.820000,52.660000,40.669998,166.830002,151.440002,114.070000,238.000000,65.650002,57.830002,47.560001,21.500000,165.520004
2023-11-03,72.910004,59.549999,27.809999,74.820000,125.550003,560.900024,40.990002,110.959999,65.739998,32.869999,187.990005,52.060001,119.389999,56.740002,45.369999,68.820000,64.959999,55.560001,40.669998,166.789993,150.070007,113.470001,241.679993,67.019997,59.029999,47.060001,22.110001,164.660004
2023-11-06,72.559998,58.970001,27.430000,75.209999,129.000000,569.820007,40.700001,112.669998,65.220001,32.529999,187.660004,51.340000,120.790001,56.970001,45.160000,68.239998,64.709999,56.139999,40.540001,166.699997,150.940002,112.839996,241.559998,66.820000,59.400002,46.580002,21.770000,164.880005
2023-11-07,72.110001,59.090000,27.389999,75.230003,132.520004,571.270020,40.430000,115.529999,65.099998,32.590000,187.490005,50.900002,120.410004,57.180000,45.110001,68.489998,64.769997,55.980000,40.509998,167.179993,150.589996,110.139999,242.520004,67.559998,59.110001,46.720001,21.650000,165.649994


# 3) Get Index Data Time Series

In [49]:
# get index data from yahoo finance
index = yf.download('RSPS', start = '2006-06-30', end = '2023-11-06', interval = '1d')
index = index['Adj Close']
index = pd.DataFrame(index)
index

[*********************100%***********************]  1 of 1 completed


Unnamed: 0_level_0,Adj Close
Date,Unnamed: 1_level_1
2006-11-07,1.743284
2006-11-08,1.743284
2006-11-09,1.743284
2006-11-10,1.747295
2006-11-13,1.747295
...,...
2023-10-30,28.910000
2023-10-31,29.049999
2023-11-01,28.930000
2023-11-02,29.430000


# 4) Save Data

In [43]:
# save index constituent prices
final_constituent_prices.to_csv('data/constituent_prices.csv')

In [57]:
# save index prices
index.to_csv('data/index_prices.csv')