<h1>Yahoo Finance, Web Scraping, and Pandas</h1>

<h3>Step 1: Load Libraries and Get S&P 500 List from Wikipedia</h3>

Import the libraries that we'll need.

In [14]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import yfinance as yf
import sqlite3

Define the URL of the page that we want to scrape, fetch it, and parse the response into a BeautifulSoup object called "soup".

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
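The fetch above assumes the request succeeds. Here is a minimal sketch of a slightly more defensive version, assuming we want the notebook to fail loudly on an HTTP error. The fetch_html helper, the timeout value, and the User-Agent string are hypothetical placeholders, not part of the original notebook.

```python
import requests

# Hypothetical identifying header; replace with your own project name and contact.
HEADERS = {"User-Agent": "sp500-notebook/0.1 (contact: you@example.com)"}


def fetch_html(url, timeout=10.0):
    """Return the page body as text, raising on HTTP errors (4xx/5xx)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a bad status
    return response.text


# Usage (performs a network request, so it is left commented out):
# html = fetch_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
# soup = BeautifulSoup(html, 'html.parser')
```

Some sites reject requests that carry only the default client User-Agent, so sending a descriptive one may help if the plain request returns an error page instead of the table.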

In [3]:
# print(soup.prettify())

Let's scrape the information that we want. We want all the columns of the table on Wikipedia, which include:
 - Stock abbreviation (ticker symbol)
 - Name
 - Industry
 - Date the stock was added to the S&P 500 list

pandas has a super powerful shortcut for this: read_html imports an HTML table as a dataframe in one go.

In [4]:
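from io import StringIO  # recent pandas versions deprecate passing a raw HTML string to read_html
table = soup.find_all('table')
df = pd.read_html(StringIO(str(table)))[0]  # read_html returns one dataframe per table; keep the first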
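Taking index 0 relies on the table we want being the first one on the page. As a hedged alternative, read_html accepts a match argument that keeps only tables containing a given text. The sketch below runs on a small inline HTML sample (made up for illustration) instead of the live page.

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables; only the second one holds ticker symbols.
SAMPLE_HTML = """
<table><tr><th>Note</th></tr><tr><td>unrelated</td></tr></table>
<table>
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M</td></tr>
  <tr><td>BRK.B</td><td>Berkshire Hathaway</td></tr>
</table>
"""

# match keeps only the tables whose text matches the given string/regex
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Symbol")
constituents = tables[0]

print(constituents["Symbol"].tolist())
```

Selecting by content is a little more robust than selecting by position if the page layout changes, at the cost of depending on the column name staying the same.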

<h3>Step 2: Iterate through stock list to obtain price history information using Yahoo Finance API</h3>

Make a list of all the ticker symbols (stock abbreviations) that we want so we can loop over them and use the API to get the price history for each one.

Alternatively, we could request the ticker symbols in batches (for example, 10 at a time) so that we don't overload the API. We don't know what the maximum request size is.
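The batching idea can be sketched in plain Python. The batched helper and the placeholder symbols below are hypothetical; the batch size of 10 is the figure mentioned above, not a documented API limit.

```python
from typing import Iterator, List


def batched(symbols: List[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of `symbols`, each with at most `size` items."""
    for start in range(0, len(symbols), size):
        yield symbols[start:start + size]


# Placeholder symbols standing in for the real ticker list
tickers = [f"T{i}" for i in range(23)]
batches = list(batched(tickers))

print([len(batch) for batch in batches])  # → [10, 10, 3]
```

Each batch could then be passed to a single download call instead of one call per ticker, with a short pause between batches if the service starts rejecting requests.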

In [9]:
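# Documentation available here: https://pypi.org/project/yfinance/

# Wikipedia writes share classes with a dot (BRK.B); Yahoo Finance uses a hyphen (BRK-B)
tickerStrings = df["Symbol"].str.replace(".", "-", regex=False).tolist()
df_list = []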

# UNCOMMENT TO RUN ----- WARNING ----- TAKES A COUPLE OF MINUTES!
#for ticker in tickerStrings:
#    data = yf.download(ticker, group_by="ticker", interval="1mo", period="max")
#    data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
#    df_list.append(data)

# combine all dataframes into a single dataframe
#stock_histories = pd.concat(df_list)

#print(stock_histories)

[*********************100%***********************]  1 of 1 completed

<h3>Step 3: Check information quality and export</h3>

Let's confirm that our datatypes are correct.

In [13]:
stock_histories.dtypes

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume       float64
ticker        object
dtype: object

In [16]:
stock_histories.head(3)

Date,Open,High,Low,Close,Adj Close,Volume,ticker
1970-08-17,,,,,,,MMM
1970-11-16,,,,,,,MMM
1971-02-11,,,,,,,MMM
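The first rows shown above have empty price fields, so a quick missing-value check may be worthwhile before exporting. Here is a sketch on a small made-up frame. The values are illustrative, not real quotes, and the names sample, price_cols, and cleaned are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with the same kind of gaps as the real output
sample = pd.DataFrame(
    {
        "Open": [np.nan, 10.0, 11.0],
        "Close": [np.nan, 10.5, np.nan],
        "Volume": [np.nan, 1000.0, 1200.0],
        "ticker": ["AAA", "AAA", "AAA"],
    },
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
)
sample.index.name = "Date"

price_cols = ["Open", "Close", "Volume"]

# Count missing values per column
missing_per_column = sample[price_cols].isna().sum()

# Drop only the rows where every price column is missing
cleaned = sample.dropna(subset=price_cols, how="all")

print(missing_per_column)
print(cleaned)
```

Using how="all" keeps rows that are only partly missing, which seems the safer default here. Whether to drop or fill those partial rows depends on the later analysis.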


Finally, let's export the data we've scraped into a comma-separated values (CSV) file so we can import it into a SQL table, server, or visualizer.

In [15]:
stock_histories.to_csv('Stock_Histories.csv')
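As one possible way to do the SQL import mentioned above, here is a minimal sketch using the standard-library csv and sqlite3 modules. It reads a small made-up CSV from memory instead of Stock_Histories.csv. The table name, column names, and in-memory database are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Made-up stand-in for the contents of Stock_Histories.csv
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume,ticker
2020-01-01,10.0,11.0,9.5,10.5,10.4,1000.0,AAA
2020-02-01,10.5,12.0,10.0,11.5,11.4,1500.0,AAA
"""

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    """
    CREATE TABLE stock_histories (
        date TEXT, open REAL, high REAL, low REAL,
        close REAL, adj_close REAL, volume REAL, ticker TEXT
    )
    """
)

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
# Store empty CSV cells as NULL rather than empty strings
rows = [[cell if cell != "" else None for cell in row] for row in reader]
conn.executemany(
    "INSERT INTO stock_histories VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM stock_histories").fetchone()[0]
max_close = conn.execute(
    "SELECT MAX(close) FROM stock_histories WHERE ticker = ?", ("AAA",)
).fetchone()[0]
print(row_count, max_close)
```

pandas also provides DataFrame.to_sql, which can write a dataframe to a sqlite3 connection directly and would skip the intermediate CSV file. The explicit version above just makes the table schema visible.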