# Yahoo Finance Scraper: Extracting Financial Data from Yahoo Finance with BeautifulSoup

After practicing with BeautifulSoup by scraping Wikipedia, we will now pull data from Yahoo Finance to fill gaps in financial calculations that raw stock price data cannot cover, such as book values from financial statements.

## Note: You need to add a User-Agent header to your requests! Email your professor.
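A minimal sketch of what such a header could look like, assuming the JSON file loaded in Step 1 holds a single `User-Agent` key. The file contents and the user-agent string below are placeholders for illustration, not real values; substitute your own browser's string.

```python
import json

# Hypothetical contents of the user-agent JSON file (placeholder value).
sample_file_contents = (
    '{"User-Agent": "Mozilla/5.0 (placeholder; use your own browser string)"}'
)

# json.loads turns the JSON text into a plain Python dict.
headers = json.loads(sample_file_contents)

# requests expects headers as a dict mapping header names to strings.
assert isinstance(headers, dict) and "User-Agent" in headers
print(headers["User-Agent"])
```

The resulting dictionary is what gets passed as `headers=` in the requests further down.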

### Links:

Yahoo Finance: https://finance.yahoo.com/

Stock Quote: https://finance.yahoo.com/quote/{}?p={}

Statistics: https://finance.yahoo.com/quote/{}/key-statistics?p={}

Historical Data: Skipped, the yfinance package can handle this for us

Profile: https://finance.yahoo.com/quote/{}/profile?p={}

Financials: https://finance.yahoo.com/quote/{}/financials?p={}

## Step 1: Imports and Configs

For this experiment, we will use AMD as our stock.

In [24]:
import requests as rq
from bs4 import BeautifulSoup
import pandas as pd
import json

In [33]:
user_agent_path = '/Users/dB/.secret/.user-agent.json'

with open(user_agent_path) as f:
    ua = json.load(f)  # parse the JSON file into a Python dict

headers = ua  # passed to requests as the HTTP headers

symbol = 'AMD'
url_q = f'https://finance.yahoo.com/quote/{symbol}?p={symbol}' # q = Quote
url_s = f'https://finance.yahoo.com/quote/{symbol}/key-statistics?p={symbol}' # s = Statistics
url_p = f'https://finance.yahoo.com/quote/{symbol}/profile?p={symbol}' # p = Profile
url_f = f'https://finance.yahoo.com/quote/{symbol}/financials?p={symbol}' # f = Financials
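If more than one ticker is needed later, the four f-strings above can be wrapped in a small helper. This is only a sketch: `build_urls` and `BASE` are hypothetical names, and the URL patterns are simply the ones listed under Links.

```python
from urllib.parse import quote

BASE = "https://finance.yahoo.com/quote"

def build_urls(symbol: str) -> dict:
    """Return the four Yahoo Finance page URLs for one ticker symbol."""
    s = quote(symbol)  # percent-encode characters that are unsafe in a URL
    return {
        "quote": f"{BASE}/{s}?p={s}",
        "statistics": f"{BASE}/{s}/key-statistics?p={s}",
        "profile": f"{BASE}/{s}/profile?p={s}",
        "financials": f"{BASE}/{s}/financials?p={s}",
    }

urls = build_urls("AMD")
print(urls["statistics"])
# https://finance.yahoo.com/quote/AMD/key-statistics?p=AMD
```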

## Step 2a: Scraping Quotes

### Creating the BeautifulSoup Object

In [34]:
response = rq.get(url=url_q, headers=headers)
print(response)

<Response [200]>
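Printing the response only shows the status code; nothing stops the notebook from parsing an error page. A small sketch of a fetch helper that fails loudly instead: `fetch_html` is a hypothetical name, and its `get` argument is assumed to be `rq.get` (or any callable with the same signature), which keeps the helper usable without a network connection.

```python
def fetch_html(get, url, headers, timeout=10):
    """Fetch a page and return its HTML text, raising on HTTP errors.

    `get` is expected to behave like requests.get.
    """
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises an HTTPError for 4xx/5xx responses
    return response.text
```

With the names from Step 1, a call would look like `raw_html = fetch_html(rq.get, url_q, headers)`.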


In [27]:
raw_html = response.text
soup = BeautifulSoup(raw_html, 'html.parser')  # name the parser explicitly

### Scraping the Table

In [28]:
# Access the quote summary table
table = soup.find('table', class_='W(100%)')

# Each table row ('tr') holds a label cell and a value cell ('td')
rows = table.find_all('td')  # flat list of cells: label, value, label, value, ...
print(rows)

[<td class="C($primaryColor) W(51%)"><span>Previous Close</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="PREV_CLOSE-value">162.67</td>, <td class="C($primaryColor) W(51%)"><span>Open</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="OPEN-value">165.80</td>, <td class="C($primaryColor) W(51%)"><span>Bid</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="BID-value">175.88 x 800</td>, <td class="C($primaryColor) W(51%)"><span>Ask</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="ASK-value">175.99 x 800</td>, <td class="C($primaryColor) W(51%)"><span>Day's Range</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="DAYS_RANGE-value">162.20 - 174.25</td>, <td class="C($primaryColor) W(51%)"><span>52 Week Range</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="FIFTY_TWO_WK_RANGE-value">71.54 - 174.25</td>, <td class="C($primaryColor) W(51%)"><span>Volume</span></td>, <td class="Ta(end) Fw(600) Lh(14px)" data-test="TD_VOLUME-valu

In [29]:
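# Strip each cell's text and organize the cells into key-value pairs
count = 0
l = []  # holds the current [label, value] pair
d = {}
for r in rows:
    text = r.text.strip()  # cell text as a string
    if count < 2:
        l.append(text)
        count += 1
    else:
        d[l[0]] = l[1]  # store the completed pair as 'label': 'value'
        count = 1
        l = [text]  # start the next pair with the current label
# Note: the last pair is still in l when the loop ends and is never added to d
print(d)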


{'Previous Close': '162.67', 'Open': '165.80', 'Bid': '175.88 x 800', 'Ask': '175.99 x 800', "Day's Range": '162.20 - 174.25', '52 Week Range': '71.54 - 174.25', 'Volume': '137,240,249'}
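The counter-based loop above only stores a pair once the next label arrives, so the final label/value pair is never written to the dictionary. A minimal alternative sketch pairs the even- and odd-indexed cells with `zip`. `pair_cells` is a hypothetical helper that takes the already-stripped cell texts, and the sample values are copied from the output above.

```python
def pair_cells(texts):
    """Turn [label, value, label, value, ...] into {label: value}."""
    labels = texts[0::2]  # cells at even positions are labels
    values = texts[1::2]  # cells at odd positions are values
    return dict(zip(labels, values))

cells = ['Previous Close', '162.67', 'Open', '165.80', 'Volume', '137,240,249']
print(pair_cells(cells))
# {'Previous Close': '162.67', 'Open': '165.80', 'Volume': '137,240,249'}
```

In the notebook this would be called as `d = pair_cells([td.text.strip() for td in rows])`.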


### Creating the DataFrame

In [30]:
# Build a one-row DataFrame: the dictionary keys become the column names
df = pd.DataFrame([d])
df

|   | Previous Close | Open | Bid | Ask | Day's Range | 52 Week Range | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 162.67 | 165.8 | 175.88 x 800 | 175.99 x 800 | 162.20 - 174.25 | 71.54 - 174.25 | 137240249 |
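Every value scraped so far is a string, so it cannot be used in calculations directly. A hedged sketch of converting a few of these strings to numbers follows. `to_number` and `to_range` are hypothetical helpers, and the formats they handle (thousands separators, `low - high` ranges) are assumed from the sample output above rather than from any documented Yahoo Finance format.

```python
def to_number(text):
    """Convert a string such as '137,240,249' or '162.67' to a float."""
    return float(text.replace(',', ''))

def to_range(text):
    """Convert a string such as '162.20 - 174.25' to a (low, high) tuple."""
    low, high = text.split(' - ')
    return to_number(low), to_number(high)

print(to_number('137,240,249'))    # 137240249.0
print(to_range('71.54 - 174.25'))  # (71.54, 174.25)
```

Values such as `'175.88 x 800'` (price and size) would need their own parsing rule and are left out of this sketch.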


## Step 2b: Scraping Statistics

### Creating the BeautifulSoup Object

In [35]:
response = rq.get(url_s, headers=headers)
print(response)

<Response [200]>


In [36]:
raw_html = response.text
soup = BeautifulSoup(raw_html, 'html.parser')  # name the parser explicitly

### Scraping the Table

In [70]:
# Pull the header row, which holds the column names
cols = soup.find('tr', class_='Bdtw(0px) C($primaryColor)')

# Pull the column names from the header row
col_names = cols.find_all('th')

col_1 = col_names[0]            # first header cell
formatted_cols = col_names[1:]  # remaining cells are the date columns

for c in formatted_cols:
    print(c.text.strip())
# Next: remove the extra text around the current date ('As of Date: ...Current')



As of Date: 1/19/2024Current
9/30/2023
6/30/2023
3/31/2023
12/31/2022
