# Extracting and Visualizing Stock Data Notebook

This notebook demonstrates how to extract and visualize stock and revenue data for Tesla and GameStop. It uses **yfinance** to fetch historical stock data and web scraping with **BeautifulSoup** and **pandas** to extract quarterly revenue data from Macrotrends. Finally, the notebook uses **Plotly** to create interactive dashboards.

In [1]:
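# Install necessary libraries
# (html5lib is the parser used by BeautifulSoup(..., "html5lib") and by pd.read_html(..., flavor='bs4'))
!pip install yfinance
!pip install beautifulsoup4 html5lib
!pip install plotly
!pip install pandas
!pip install requests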




In [12]:
# Import required libraries
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Define the graphing function
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(
        rows=2, cols=1, shared_xaxes=True, 
        subplot_titles=("Historical Share Price", "Historical Revenue"),
        vertical_spacing=0.3
    )
    
    fig.add_trace(
        go.Scatter(
            # infer_datetime_format is deprecated in pandas 2.x; strict format inference is now the default
            x=pd.to_datetime(stock_data['Date']),
            y=stock_data['Close'].astype('float'), 
            name="Share Price"
        ), 
        row=1, col=1
    )
    
    fig.add_trace(
        go.Scatter(
            x=pd.to_datetime(revenue_data['Date']),
            y=revenue_data['Revenue'].astype('float'), 
            name="Revenue"
        ), 
        row=2, col=1
    )
    
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(
        showlegend=False,
        height=900,
        title=stock,
        xaxis_rangeslider_visible=True
    )
    fig.show()

## Question 1: Extract Tesla Stock Data Using yfinance

We create a ticker object for Tesla (symbol **TSLA**) and download its full price history with `period="max"`. We then reset the index so that the trading date becomes a regular `Date` column, which `make_graph` expects, and display the first five rows.
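
yfinance returns the history with the trading date as the DataFrame's index rather than as a column. As a network-free illustration of what `reset_index` does, here is a sketch on a tiny made-up frame; the prices and the `America/New_York` timezone are assumptions chosen only to resemble the Tesla output in this section.

```python
import pandas as pd

# Made-up stand-in for a yfinance history frame: closing prices indexed by a
# timezone-aware DatetimeIndex named "Date".
index = pd.DatetimeIndex(
    ["2010-06-29", "2010-06-30", "2010-07-01"], name="Date"
).tz_localize("America/New_York")
prices = pd.DataFrame({"Close": [1.59, 1.59, 1.46]}, index=index)

# reset_index() moves the index into an ordinary "Date" column and gives the
# frame a default 0..n-1 integer index.
flat = prices.reset_index()
print(flat.columns.tolist())  # ['Date', 'Close']
print(flat["Date"].dt.tz is not None)  # True: the timezone is preserved
```

Because `reset_index()` returns a new frame here, the original `prices` keeps its DatetimeIndex; the notebook uses `inplace=True` to modify `tesla_data` directly instead.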

In [3]:
# Create the ticker object for Tesla
tesla = yf.Ticker("TSLA")

# Extract the maximum historical stock data
tesla_data = tesla.history(period="max")

# Reset the index and display the first five rows
tesla_data.reset_index(inplace=True)
print(tesla_data.head())

                       Date      Open      High       Low     Close  \
0 2010-06-29 00:00:00-04:00  1.266667  1.666667  1.169333  1.592667   
1 2010-06-30 00:00:00-04:00  1.719333  2.028000  1.553333  1.588667   
2 2010-07-01 00:00:00-04:00  1.666667  1.728000  1.351333  1.464000   
3 2010-07-02 00:00:00-04:00  1.533333  1.540000  1.247333  1.280000   
4 2010-07-06 00:00:00-04:00  1.333333  1.333333  1.055333  1.074000   

      Volume  Dividends  Stock Splits  
0  281494500        0.0           0.0  
1  257806500        0.0           0.0  
2  123282000        0.0           0.0  
3   77097000        0.0           0.0  
4  103003500        0.0           0.0  


## Question 2: Extract Tesla Revenue Data Using Web Scraping

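We scrape Tesla's quarterly revenue data from Macrotrends: `requests` downloads the page, and pandas' `read_html` (using the BeautifulSoup-based `bs4` flavor) extracts the revenue table. We then rename the two columns to `Date` and `Revenue`, strip commas and dollar signs from the revenue figures, and drop rows with missing or empty values.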
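
The cleaning step has two easy-to-miss pitfalls: `"$"` is a special character in regular expressions (the default of `str.replace`'s `regex` parameter has changed across pandas versions), and a blank cell becomes an empty string that `dropna` does not remove. The sketch below shows one way to handle both, assuming made-up input values; `clean_revenue` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Made-up scraped values: dollar signs, a thousands separator, and a blank cell.
raw = pd.DataFrame({
    "Date": ["2010-09-30", "2010-06-30", "2010-03-31", "2009-12-31"],
    "Revenue": ["$31", "$28", "", "$1,234"],
})

def clean_revenue(df):
    """Return a copy of df with numeric Revenue and datetime Date columns.

    Rows whose Revenue cannot be parsed as a number (blank or missing cells)
    are dropped.
    """
    out = df.copy()
    # regex=False: "," and "$" are matched as literal characters.
    cleaned = (out["Revenue"]
               .str.replace(",", "", regex=False)
               .str.replace("$", "", regex=False))
    # errors="coerce" turns "" (and any other non-number) into NaN.
    out["Revenue"] = pd.to_numeric(cleaned, errors="coerce")
    out["Date"] = pd.to_datetime(out["Date"])
    return out.dropna(subset=["Revenue"])

clean = clean_revenue(raw)
print(clean["Revenue"].tolist())  # [31.0, 28.0, 1234.0]
```

Converting to numbers with `pd.to_numeric` also means `make_graph`'s later `astype('float')` call receives values that are already numeric.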

In [None]:

# URL for Tesla revenue data
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"

# Define custom headers to mimic a browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
}

# Download the webpage data with custom headers; fail loudly on HTTP errors
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
html_data = response.text

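# Parse the HTML data using BeautifulSoup (the soup object is not used below,
# because pd.read_html parses html_data itself via the bs4 flavor)
soup = BeautifulSoup(html_data, "html5lib")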

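# Extract the table from the fetched HTML content; wrapping the string in StringIO
# avoids the pandas FutureWarning about passing literal HTML to read_html
from io import StringIO
tesla_revenue = pd.read_html(StringIO(html_data), match="Tesla Quarterly Revenue", flavor='bs4')[0]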

# Strip any extra spaces from column names and print them for inspection
tesla_revenue.columns = [col.strip() for col in tesla_revenue.columns]
print("Columns before renaming:", tesla_revenue.columns)

# Update the rename mapping based on the actual column names
tesla_revenue = tesla_revenue.rename(columns={
    'Tesla Quarterly Revenue (Millions of US $)': 'Date',
    'Tesla Quarterly Revenue (Millions of US $).1': 'Revenue'
})

# Print columns after renaming to verify the change
print("Columns after renaming:", tesla_revenue.columns)

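# Clean the Revenue column by removing commas and dollar signs
# (regex=False treats "$" as a literal character, not a regex end-of-string anchor)
tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(",", "", regex=False).str.replace("$", "", regex=False)

# Remove rows with missing or empty Revenue values
tesla_revenue.dropna(inplace=True)
tesla_revenue = tesla_revenue[tesla_revenue["Revenue"] != ""]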

# Display the last five rows of the Tesla revenue data
print(tesla_revenue.tail())


Columns before renaming: Index(['Tesla Quarterly Revenue (Millions of US $)', 'Tesla Quarterly Revenue (Millions of US $).1'], dtype='object')
Columns after renaming: Index(['Date', 'Revenue'], dtype='object')
          Date Revenue
57  2010-09-30      31
58  2010-06-30      28
59  2010-03-31      21
61  2009-09-30      46
62  2009-06-30      27


## Question 3: Extract GameStop Stock Data Using yfinance

We create a ticker object for GameStop (symbol **GME**) and download its full price history with `period="max"`. We then reset the index so that the trading date becomes a regular `Date` column and display the first five rows.
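
The stock dates returned by yfinance are timezone-aware (note the `-05:00` offset in the GameStop output in this section), whereas the scraped revenue dates carry no timezone. If you want to filter or join the two, one option is to drop the timezone first. The sketch below assumes made-up prices and an arbitrary cutoff date.

```python
import pandas as pd

# Made-up rows shaped like gme_data after reset_index(): tz-aware dates.
stock = pd.DataFrame({
    "Date": pd.to_datetime(["2002-02-13", "2002-02-14", "2002-02-15"])
              .tz_localize("America/New_York"),
    "Close": [1.69, 1.68, 1.67],
})

# Comparing tz-aware values with a naive timestamp raises TypeError, so
# drop the timezone first (the local wall-clock dates are kept as they are).
stock["Date"] = stock["Date"].dt.tz_localize(None)

cutoff = pd.Timestamp("2002-02-14")  # naive, like the scraped revenue dates
print(stock[stock["Date"] <= cutoff]["Close"].tolist())  # [1.69, 1.68]
```

`tz_localize(None)` keeps local trading dates; `tz_convert(None)` would instead convert to UTC before dropping the timezone, which can shift the time of day.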

In [5]:
# Create the ticker object for GameStop
gamestop = yf.Ticker("GME")

# Extract the maximum historical stock data
gme_data = gamestop.history(period="max")

# Reset the index and display the first five rows
gme_data.reset_index(inplace=True)
print(gme_data.head())

                       Date      Open      High       Low     Close    Volume  \
0 2002-02-13 00:00:00-05:00  1.620128  1.693350  1.603296  1.691666  76216000   
1 2002-02-14 00:00:00-05:00  1.712707  1.716074  1.670626  1.683250  11021600   
2 2002-02-15 00:00:00-05:00  1.683250  1.687458  1.658001  1.674834   8389600   
3 2002-02-19 00:00:00-05:00  1.666418  1.666418  1.578047  1.607504   7410400   
4 2002-02-20 00:00:00-05:00  1.615920  1.662210  1.603296  1.662210   6892800   

   Dividends  Stock Splits  
0        0.0           0.0  
1        0.0           0.0  
2        0.0           0.0  
3        0.0           0.0  
4        0.0           0.0  


## Question 4: Extract GameStop Revenue Data Using Web Scraping

We scrape GameStop's quarterly revenue data from Macrotrends in the same way as for Tesla: we extract the revenue table from the downloaded HTML, rename its columns to `Date` and `Revenue`, strip commas and dollar signs from the revenue figures, and drop rows with missing or empty values.
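
`pd.read_html` returns every table whose text matches `match` (a regular expression), and the notebook keeps the first one with `[0]`. To show roughly what that lookup involves, here is a dependency-free sketch built on Python's standard `html.parser` instead of BeautifulSoup; this is a swapped-in technique, not what the notebook runs. `TableTextParser`, `find_table`, and the HTML fragment are hypothetical, the matching is a plain substring check, and nested tables and `rowspan`/`colspan` are not handled.

```python
from html.parser import HTMLParser

class TableTextParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table and row."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if self.tables and self.tables[-1]:
                self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

def find_table(html, match):
    """Return the rows of the first table with a cell containing `match`."""
    parser = TableTextParser()
    parser.feed(html)
    for table in parser.tables:
        if any(match in cell for row in table for cell in row):
            return table
    raise ValueError(f"no table contains {match!r}")

# Hypothetical page fragment with two tables; only the second one matches.
html = """
<table><tr><th>Annual Revenue</th></tr><tr><td>2010</td><td>$1,000</td></tr></table>
<table>
  <tr><th colspan="2">GameStop Quarterly Revenue (Millions of US $)</th></tr>
  <tr><td>2010-01-31</td><td>$3,524</td></tr>
  <tr><td>2009-10-31</td><td>$1,835</td></tr>
</table>
"""
rows = find_table(html, "GameStop Quarterly Revenue")
print(rows[1:])  # [['2010-01-31', '$3,524'], ['2009-10-31', '$1,835']]
```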

In [9]:
# URL for GameStop revenue data
url = "https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue"

# Define custom headers to mimic a browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
}

# Download the webpage data with custom headers; fail loudly on HTTP errors
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
html_data = response.text

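# Parse the HTML data using BeautifulSoup (the soup object is not used below,
# because pd.read_html parses html_data itself via the bs4 flavor)
soup = BeautifulSoup(html_data, "html5lib")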

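# Extract the table from the fetched HTML content (not directly from the URL);
# wrapping the string in StringIO avoids the pandas FutureWarning about literal HTML
from io import StringIO
gme_revenue = pd.read_html(StringIO(html_data), match="GameStop Quarterly Revenue", flavor='bs4')[0]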

# Optional: Inspect and clean column names
gme_revenue.columns = [col.strip() for col in gme_revenue.columns]
print("Columns before renaming:", gme_revenue.columns)

# Adjust the column names in the rename mapping to match exactly the fetched headers
gme_revenue = gme_revenue.rename(columns={
    'GameStop Quarterly Revenue (Millions of US $)': 'Date',
    'GameStop Quarterly Revenue (Millions of US $).1': 'Revenue'
})
print("Columns after renaming:", gme_revenue.columns)

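# Clean the Revenue column by removing commas and dollar signs
# (regex=False treats "$" as a literal character, not a regex end-of-string anchor)
gme_revenue["Revenue"] = gme_revenue["Revenue"].str.replace(",", "", regex=False).str.replace("$", "", regex=False)

# Remove rows with missing or empty Revenue values
gme_revenue.dropna(inplace=True)
gme_revenue = gme_revenue[gme_revenue["Revenue"] != ""]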

# Display the last five rows of the GameStop revenue data
print(gme_revenue.tail())

Columns before renaming: Index(['GameStop Quarterly Revenue (Millions of US $)', 'GameStop Quarterly Revenue (Millions of US $).1'], dtype='object')
Columns after renaming: Index(['Date', 'Revenue'], dtype='object')
          Date Revenue
59  2010-01-31    3524
60  2009-10-31    1835
61  2009-07-31    1739
62  2009-04-30    1981
63  2009-01-31    3492


## Question 5: Plot Tesla Stock and Revenue Dashboard

We now use the `make_graph` function to create an interactive dashboard that displays Tesla's historical share price and revenue.
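
`make_graph` reads `Date` and `Close` from the stock frame and `Date` and `Revenue` from the revenue frame, and converts the value columns with `astype('float')`, which raises a `ValueError` on a leftover empty string. A small pre-flight check can report such problems with a clearer message before plotting. `check_graph_inputs` is a hypothetical helper, not part of the notebook, and the frames below are made up.

```python
import pandas as pd

def check_graph_inputs(stock_data, revenue_data):
    """Raise ValueError if either frame lacks what make_graph reads.

    make_graph plots stock_data['Date'] vs stock_data['Close'] and
    revenue_data['Date'] vs revenue_data['Revenue'], casting the value
    columns with astype('float').
    """
    specs = (("stock_data", stock_data, "Close"),
             ("revenue_data", revenue_data, "Revenue"))
    for name, df, value_col in specs:
        missing = [c for c in ("Date", value_col) if c not in df.columns]
        if missing:
            raise ValueError(f"{name} is missing columns {missing}")
        # Anything that is not a number (or is already missing) becomes NaN here.
        numbers = pd.to_numeric(df[value_col], errors="coerce")
        bad = df.loc[numbers.isna(), value_col].tolist()
        if bad:
            raise ValueError(
                f"{name}[{value_col!r}] has non-numeric or missing values {bad}")

# Made-up frames: the revenue frame still contains a blank cell.
stock = pd.DataFrame({"Date": ["2010-06-29"], "Close": [1.59]})
revenue = pd.DataFrame({"Date": ["2010-09-30", "2010-06-30"],
                        "Revenue": ["31", ""]})

try:
    check_graph_inputs(stock, revenue)
    message = "inputs look plottable"
except ValueError as err:
    message = str(err)
print(message)  # revenue_data['Revenue'] has non-numeric or missing values ['']
```

The check is deliberately stricter than `astype('float')`, which would accept `NaN`; a missing value would otherwise show up only as a gap in the revenue line.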

In [10]:
# Plot the dashboard for Tesla
make_graph(tesla_data, tesla_revenue, 'Tesla Stock Data Graph')


## Question 6: Plot GameStop Stock and Revenue Dashboard

Similarly, we plot GameStop's stock and revenue data using the `make_graph` function.

In [11]:
# Plot the dashboard for GameStop
make_graph(gme_data, gme_revenue, 'GameStop Stock Data Graph')

