# Reconstruction of the NASDAQ index composition in the years covered by the analysis

To make the analysis as accurate as possible, we needed not only the price time series of the stocks that currently make up the NASDAQ-100 index, but also the history of changes in the index's composition over the past years.

In [None]:
import pandas as pd
import numpy as np
import os
import re
from datetime import datetime

From the file *nasdaq_constituent.json* we reconstructed every stock that joined or left the index during the period considered:

In [None]:
changes = pd.read_json('nasdaq_constituent.json')
changes.date

0     2024-03-18
1     2023-12-18
2     2023-12-18
3     2023-12-18
4     2023-12-18
         ...    
401   1995-03-06
402   1995-02-27
403   1995-02-13
404   1995-01-27
405   1995-01-27
Name: date, Length: 406, dtype: datetime64[ns]

The last change to the index in 2013 took place on 23 December, so we keep only the changes from December 2013 onwards:

In [None]:
changes = changes[changes.date >= '2013-12-01']
changes.shape

(121, 7)

In [None]:
dates = changes["date"].unique().tolist()
len(dates)

55

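As a starting point we take the current constituents from the Wikipedia page of the Nasdaq-100:

In [None]:
url = 'https://en.wikipedia.org/wiki/Nasdaq-100'
# Read all tables from the webpage; the constituents table is the one at index 4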
tables = pd.read_html(url)
tickers = tables[4]["Ticker"].to_list()
len(tickers)

101

The list contains 101 tickers rather than 100 because Alphabet (Google) is included with two share classes: GOOGL (Class A, with voting rights) and GOOG (Class C, without voting rights).

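Starting from the current composition and moving backwards through the change dates, we undid each change in turn (re-adding the stocks that had been removed and dropping those that had been added) and saved the resulting history of the index composition in a dataframe: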

In [None]:
# Empty DataFrame that will collect one row per change date in a "components" column
composition = pd.DataFrame()

In [None]:
count = 0  # counts dates where the added and removed lists differ in length

# Walk backwards in time from the current composition: to undo a change,
# put back the tickers that were removed and drop the ones that were added.
for date in dates:

    change = changes[changes.date == date]
    added = change["symbol"].tolist()
    removed = change["removedTicker"].tolist()

    # Tickers that were added but not also removed on this date
    unique_added = [a for a in added if a not in removed]

    # Tickers that were removed but not also added on this date
    unique_removed = [r for r in removed if r not in added]

    print("\n", date)
    print(len(tickers))

    # Restore the tickers that left the index on this date
    for r in unique_removed:
        if r != '' and r not in tickers:
            print("add: ", r)
            tickers.append(r)

    new_tickers = tickers.copy()
    print(len(new_tickers))

    # Drop the tickers that joined the index on this date
    for a in unique_added:
        if a != '' and a in new_tickers:
            print("remove ", a)
            new_tickers.remove(a)

    print(len(new_tickers))

    if len(added) != len(removed):
        count += 1

    # The row labelled `date` holds the composition just before that change
    temp_df = pd.DataFrame({"components": [new_tickers]}, index=[date])

    tickers = new_tickers.copy()

    composition = pd.concat([composition, temp_df], axis=0)


 2024-03-18 00:00:00
96
add:  SPLK
97
97

 2023-12-18 00:00:00
97
add:  SGEN
add:  ALGN
add:  ENPH
add:  JD
add:  LCID
add:  ZM
103
remove  SPLK
102

 2023-07-17 00:00:00
102
102
102

 2023-06-20 00:00:00
102
add:  RIVN
103
103

 2023-06-07 00:00:00
103
103
103

 2022-12-19 00:00:00
103
add:  VRSN
add:  SWKS
add:  SPLK
add:  MTCH
add:  DOCU
108
remove  RIVN
107

 2022-11-21 00:00:00
107
add:  OKTA
108
remove  ENPH
107

 2022-02-22 00:00:00
107
107
107

 2022-02-02 00:00:00
107
107
107

 2022-01-24 00:00:00
107
add:  PTON
108
108

 2021-12-20 00:00:00
108
add:  CDW
add:  TCOM
add:  INCY
add:  FOX
112
remove  LCID
111

 2021-08-26 00:00:00
111
111
111

 2021-07-21 00:00:00
111
111
111

 2020-12-21 00:00:00
111
add:  BMRN
add:  TTWO
add:  ULTA
114
remove  MTCH
remove  OKTA
remove  PTON
111

 2020-10-19 00:00:00
111
111
111

 2020-08-24 00:00:00
111
111
111

 2020-07-20 00:00:00
111
add:  CSGP
112
112

 2020-06-22 00:00:00
112
add:  UAL
113
remove  DOCU
112

 2020-04-30 00:00:00
112
add: 
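
The backward walk performed by the loop can also be isolated in a small helper, which makes it easier to check on toy data. The sketch below is an illustration, not the notebook's code: the function name `undo_changes` and the sample tickers are hypothetical, and only the column names `date`, `symbol` and `removedTicker` come from the file used above. Unlike the loop, it restores removed tickers in alphabetical order.

```python
import pandas as pd


def undo_changes(current, changes):
    """Return {change date: composition just before that change}.

    `current` is today's ticker list; `changes` needs the columns `date`,
    `symbol` (ticker added) and `removedTicker` (ticker removed).
    """
    tickers = list(current)
    history = {}
    # Most recent change first, so that each step undoes one date
    ordered = changes.sort_values("date", ascending=False)
    for date, group in ordered.groupby("date", sort=False):
        added = set(group["symbol"]) - {""}
        removed = set(group["removedTicker"]) - {""}
        only_added = added - removed
        only_removed = removed - added
        # Put back what left the index on this date
        tickers += [r for r in sorted(only_removed) if r not in tickers]
        # Drop what joined the index on this date
        tickers = [t for t in tickers if t not in only_added]
        history[date] = list(tickers)
    return history


# Toy data (hypothetical): DDD replaced EEE in 2022, BBB replaced CCC in 2021
toy_changes = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-10", "2021-06-01"]),
    "symbol": ["DDD", "BBB"],
    "removedTicker": ["EEE", "CCC"],
})
history = undo_changes(["AAA", "BBB", "DDD"], toy_changes)
for date, members in history.items():
    print(date.date(), members)
```

Keeping the logic in a pure function (no printing, no mutation of the input list) is a design choice of this sketch; the notebook's loop gives the same kind of result while logging every step.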

In [None]:
composition.iloc[::-1]

Unnamed: 0,components
2013-12-23,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2014-04-03,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2014-11-06,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2014-12-22,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2015-03-23,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2015-07-01,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2015-07-24,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2015-07-27,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2015-07-29,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
2015-07-31,"[ADBE, GOOGL, AMZN, AMGN, ADI, AAPL, AMAT, ARM..."
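
As the loop above is written, the row labelled with a change date stores the list obtained after undoing that change, that is, the composition in force just before the change. To find the members on an arbitrary day, one can therefore pick the first change date strictly after that day. The following is a minimal sketch on toy data; the helper name `members_on`, the `current` argument (the present-day ticker list) and the assumption that a change is effective on its own date are all introduced here for illustration.

```python
import pandas as pd


def members_on(day, composition, current):
    """Return the tickers in the index on `day`.

    `composition` is indexed by change date and each row holds the
    composition in force just before that change; `current` is the
    present-day list, used when `day` is on or after the latest change.
    """
    day = pd.Timestamp(day)
    ordered = composition.sort_index()
    later = ordered.index[ordered.index > day]
    if len(later) == 0:
        return list(current)
    return list(ordered.loc[later[0], "components"])


# Toy frame with the same layout as `composition` (hypothetical tickers)
toy = pd.DataFrame(
    {"components": [["AAA", "BBB", "EEE"], ["AAA", "EEE", "CCC"]]},
    index=pd.to_datetime(["2022-01-10", "2021-06-01"]),
)
today = ["AAA", "BBB", "DDD"]

print(members_on("2021-12-31", toy, today))
print(members_on("2021-01-01", toy, today))
print(members_on("2023-01-01", toy, today))
```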


In [None]:
os.makedirs('Data', exist_ok=True)  # make sure the output folder exists
composition.to_csv('Data/nas_comps.csv')
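
Note that `to_csv` writes each list in the `components` column as its string representation, so reading the file back yields strings rather than lists. One possible way to restore them is to parse the column with `ast.literal_eval`. The sketch below works on an in-memory toy frame with hypothetical tickers instead of `Data/nas_comps.csv`:

```python
import ast
import io

import pandas as pd

# Toy frame with the same layout as `composition` (hypothetical tickers)
toy = pd.DataFrame(
    {"components": [["AAA", "BBB"], ["AAA", "CCC"]]},
    index=pd.to_datetime(["2022-01-10", "2021-06-01"]),
)

# Round-trip through CSV text instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

restored = pd.read_csv(
    buffer,
    index_col=0,
    parse_dates=True,
    converters={"components": ast.literal_eval},  # "['AAA', 'BBB']" -> list
)
print(type(restored["components"].iloc[0]))
```

The same `converters` argument should work when reading the saved file by path, assuming the column is still named `components`.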