# Extracting S&P 500 Tickers from Wikipedia

This Python script extracts the ticker symbols of the S&P 500 companies from Wikipedia, processes them, and saves the result to a Parquet file. The commented code follows.

## Library Imports

In [1]:
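# You can install the required packages using the following commands
# %pip install pandas
# %pip install lxml     # HTML parser used by pd.read_html
# %pip install pyarrow  # Parquet engine used by DataFrame.to_parquet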

import pandas as pd
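
The import cell only loads pandas, but `pd.read_html` needs an HTML parser and `DataFrame.to_parquet` needs a Parquet engine at run time. The following is a minimal sketch for checking this up front; the helper name `missing_packages` is hypothetical, and the package list assumes `lxml` and `pyarrow` are the chosen parser and engine (pandas also accepts alternatives).

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the package names from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Assumed dependency set: pandas plus one HTML parser and one Parquet engine
required = ["pandas", "lxml", "pyarrow"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Running this before the extraction cell turns a late `ImportError` into an explicit message about what to install.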

In [4]:
# Wikipedia URL with the list of S&P 500
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

# Read tables from Wikipedia
tables = pd.read_html(url)

# Extract the first table containing the tickers
df = tables[0]

# Select only the 'Symbol' column and remove any duplicates
df_tickers = df[['Symbol']].drop_duplicates()

# Rename the 'Symbol' column to 'Ticker'
df_tickers = df_tickers.rename(columns={'Symbol': 'Ticker'})

# Sort the tickers alphabetically and reset the index to a fresh 0..n-1 range
df_tickers = df_tickers.sort_values(by='Ticker').reset_index(drop=True)

# Get the total number of tickers
total_tickers = df_tickers.shape[0]

# Save as a Parquet file (for efficient storage and retrieval)
df_tickers.to_parquet("sp500_tickers.parquet", index=False)

# Display results
print(f"Total number of tickers: {total_tickers}")

Total number of tickers: 503
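
The cell above depends on two things that can change: the site accepting the default request made by `pd.read_html(url)`, and the constituents table being the first table on the page. The sketch below is one possible, more defensive variant, not the original script's method. It downloads the page with an explicit `User-Agent` header and parses only tables whose text matches `Symbol`. The function names `fetch_html` and `tables_with_symbol` and the user-agent string are hypothetical placeholders.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd

URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"


def fetch_html(url, user_agent="sp500-tickers-example/0.1"):
    """Download a page as text, sending an explicit User-Agent header."""
    # The user-agent value is a placeholder; replace it with your own identifier
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def tables_with_symbol(html):
    """Parse only the HTML tables whose text matches 'Symbol'."""
    # StringIO is used because recent pandas versions deprecate raw HTML strings
    return pd.read_html(StringIO(html), match="Symbol")


# Usage (performs a network request, so it is left commented out):
# df = tables_with_symbol(fetch_html(URL))[0]
```

The resulting `df` can then go through the same `drop_duplicates`, `rename`, and `sort_values` steps as before.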


In [5]:
df_tickers.head()

|   | Ticker |
|---|--------|
| 0 | A      |
| 1 | AAPL   |
| 2 | ABBV   |
| 3 | ABNB   |
| 4 | ABT    |
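
Once `sp500_tickers.parquet` has been written, a later notebook or script can load the tickers without contacting Wikipedia again. The following is a minimal sketch of that step; the helper name `load_tickers` is hypothetical, and it assumes the file has the single `Ticker` column produced above and that a Parquet engine such as `pyarrow` is installed.

```python
import pandas as pd


def load_tickers(path="sp500_tickers.parquet"):
    """Read the saved Parquet file and return the tickers as a list of strings."""
    # pd.read_parquet needs a Parquet engine such as pyarrow or fastparquet
    df = pd.read_parquet(path, columns=["Ticker"])
    return df["Ticker"].tolist()


# Usage (requires the file written by the extraction cell):
# tickers = load_tickers()
# print(len(tickers), tickers[:5])
```

Returning a plain list keeps the helper easy to reuse in loops that request data for each ticker.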
