# 📊 Extracting and Visualizing Stock Data

This project focuses on retrieving, analyzing, and visualizing historical stock and revenue data for companies like **Tesla** and **GameStop** using Python tools.

### 📌 Objective
To practice data extraction from web sources and APIs, clean the data using `pandas`, and visualize it using `Plotly` to uncover patterns and trends.

---

🧠 **Note:** This project was inspired by a lab assignment from the *IBM Data Science Professional Certificate* on Coursera.  
All code, analysis, and commentary included here are my original work. Assignment-specific questions and instructions have been removed to respect Coursera's [Honor Code](https://www.coursera.org/about/honorcode).

---


In [1]:
%pip install yfinance==0.2.38
%pip install pandas==2.2.2
%pip install nbformat

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install --upgrade yfinance
%pip install bs4 html5lib
%pip install --upgrade nbformat
%pip install plotly

Collecting yfinance
  Downloading yfinance-0.2.63-py2.py3-none-any.whl.metadata (5.8 kB)
Collecting curl_cffi>=0.7 (from yfinance)
  Downloading curl_cffi-0.11.3-cp39-abi3-win_amd64.whl.metadata (15 kB)
Collecting protobuf>=3.19.0 (from yfinance)
  Downloading protobuf-6.31.1-cp310-abi3-win_amd64.whl.metadata (593 bytes)
Collecting websockets>=13.0 (from yfinance)
  Downloading websockets-15.0.1-cp312-cp312-win_amd64.whl.metadata (7.0 kB)
Downloading yfinance-0.2.63-py2.py3-none-any.whl (118 kB)

In [1]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

Python's `warnings` module can silence warnings. Its `filterwarnings` function filters or ignores specific warning messages or categories; here only `FutureWarning` is suppressed.


In [2]:
import warnings
# Ignore FutureWarning messages only
warnings.filterwarnings("ignore", category=FutureWarning)
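
To illustrate what a category filter does, here is a minimal sketch using only the standard library. The helper `noisy` is hypothetical and exists only to emit two warning categories; the recording block shows that the `FutureWarning` is dropped while a `UserWarning` still gets through.

```python
import warnings

def noisy():
    # Hypothetical helper that emits two different warning categories.
    warnings.warn("this API will change", FutureWarning)
    warnings.warn("something looks odd", UserWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                            # record every warning by default
    warnings.filterwarnings("ignore", category=FutureWarning)  # same filter as in the cell above
    noisy()

# Only the categories that were not filtered out remain.
remaining = [w.category.__name__ for w in caught]
print(remaining)
```

Filtering by category rather than ignoring everything keeps unrelated warnings visible.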

### Define Graphing Function


In this section, we define the function `make_graph`. **You don't need to know how the function works; only its inputs matter. It takes a dataframe with stock data (which must contain `Date` and `Close` columns), a dataframe with revenue data (which must contain `Date` and `Revenue` columns), and the name of the stock.**


In [3]:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(
        showlegend=False,
        height=900,
        title=stock,
        xaxis_rangeslider_visible=True,
    )
    fig.show()
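
The revenue cutoff inside `make_graph` compares the `Date` column with a date string. The sketch below shows that filter in isolation on a small frame with invented values, assuming the dates are ISO-formatted strings as in the scraped revenue tables. The names `revenue`, `cutoff`, and `revenue_before_cutoff` are illustrative only.

```python
import pandas as pd

# Invented revenue rows; dates are ISO-formatted strings.
revenue = pd.DataFrame({
    "Date": ["2021-06-30", "2021-03-31", "2020-12-31"],
    "Revenue": ["300", "200", "100"],
})

# ISO "YYYY-MM-DD" strings sort lexicographically in chronological order,
# so a plain string comparison acts as a date cutoff.
cutoff = "2021-04-30"
revenue_before_cutoff = revenue[revenue.Date <= cutoff]

# Types are converted only at plotting time, as make_graph does.
dates = pd.to_datetime(revenue_before_cutoff.Date)
values = revenue_before_cutoff.Revenue.astype("float")
print(revenue_before_cutoff)
```

This is why the `Revenue` column can stay as cleaned strings until the graph is drawn.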

Use the `Ticker` function with the stock's ticker symbol to create a ticker object. The stock is Tesla, and its ticker symbol is `TSLA`.


In [4]:
Tesla = yf.Ticker("TSLA")

Use the ticker object's `history` function to extract stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to `"max"` to retrieve the full available price history.


In [5]:
tesla_data = Tesla.history(period="max")

In [6]:
tesla_data.reset_index(inplace=True)
print(tesla_data.head())

                       Date      Open      High       Low     Close  \
0 2010-06-29 00:00:00-04:00  1.266667  1.666667  1.169333  1.592667   
1 2010-06-30 00:00:00-04:00  1.719333  2.028000  1.553333  1.588667   
2 2010-07-01 00:00:00-04:00  1.666667  1.728000  1.351333  1.464000   
3 2010-07-02 00:00:00-04:00  1.533333  1.540000  1.247333  1.280000   
4 2010-07-06 00:00:00-04:00  1.333333  1.333333  1.055333  1.074000   

      Volume  Dividends  Stock Splits  
0  281494500        0.0           0.0  
1  257806500        0.0           0.0  
2  123282000        0.0           0.0  
3   77097000        0.0           0.0  
4  103003500        0.0           0.0  
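
The `reset_index` call matters because `make_graph` expects `Date` as a regular column rather than as the index. A minimal offline sketch, using a hand-built frame with invented prices as a stand-in for the `history` result:

```python
import pandas as pd

# Hand-built stand-in for the frame returned by history(); values are invented.
index = pd.DatetimeIndex(["2010-06-29", "2010-06-30"], name="Date")
prices = pd.DataFrame({"Close": [1.5, 1.6]}, index=index)

# Before: "Date" is the index, so it is not listed among the columns.
print(list(prices.columns))

# After: the index becomes an ordinary "Date" column and rows are numbered from 0.
prices.reset_index(inplace=True)
print(list(prices.columns))
```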


Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm and save the text of the response in a variable named `html_data`.


In [7]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"
html_data = requests.get(url).text
#print (html_data)
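
The download above assumes the request succeeds. A slightly more defensive variant is sketched below. The helper name `fetch_html` and the 30-second timeout are assumptions made for illustration; they are not part of the original notebook.

```python
import requests

def fetch_html(url, timeout=30):
    """Hypothetical helper: return the page text, or raise if the request fails."""
    response = requests.get(url, timeout=timeout)  # timeout in seconds; 30 is an arbitrary choice
    response.raise_for_status()                    # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

With this helper, `html_data = fetch_html(url)` would stand in for the `requests.get(url).text` call, and a failed download would surface as an exception instead of an error page being parsed as if it were data.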

Parse the HTML using `BeautifulSoup` with either the `html5lib` or the `html.parser` parser. Because `html_data` already holds the response text (a string), pass it to `BeautifulSoup` directly; no `.content` attribute is needed.


In [8]:
soup = BeautifulSoup(html_data,'html5lib')
#print(soup.prettify())

Using `BeautifulSoup` or the `read_html` function, extract the table with `Tesla Revenue` and store it in a dataframe named `tesla_revenue`. The dataframe should have columns `Date` and `Revenue`.


In [9]:
# Find all tables on the page
tables = soup.find_all('table')
tesla_table = None

# Loop through the tables to find the quarterly revenue table
for table in tables:
    if "Tesla Quarterly Revenue" in table.text:
        tesla_table = table
        break

# Fail early with a clear message if the table is missing
if tesla_table is None:
    raise ValueError("Tesla Quarterly Revenue table not found")

# Initialize an empty dataframe
tesla_revenue = pd.DataFrame(columns=['Date','Revenue'])

# Loop through the table rows and extract the cell data
for row in tesla_table.find('tbody').find_all('tr'):
    columns = row.find_all('td')

    # Keep only rows that have exactly two cells
    if len(columns) == 2:
        # Extract the date and the raw revenue text (cleaned in a later step)
        date = columns[0].text.strip()
        revenue = columns[1].text.strip()
        # Append the new row to the dataframe inside the loop
        tesla_revenue = pd.concat([tesla_revenue, pd.DataFrame({"Date": [date], "Revenue": [revenue]})], ignore_index=True)
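
The same extraction can be tried offline on a small hand-written page. The sketch below is an alternative formulation, not the notebook's own code: it uses the built-in `html.parser` instead of `html5lib`, collects the rows in a list, and builds the dataframe once at the end. The HTML and all values in it are invented, and the names `sample_html`, `sample_soup`, `target`, and `sample_revenue` are illustrative.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Tiny hand-written page standing in for the downloaded HTML; values are invented.
sample_html = """
<table><thead><tr><th>Tesla Annual Revenue</th><th></th></tr></thead>
<tbody><tr><td>2021</td><td>$3</td></tr></tbody></table>
<table><thead><tr><th>Tesla Quarterly Revenue</th><th></th></tr></thead>
<tbody>
<tr><td>2021-03-31</td><td>$1,000</td></tr>
<tr><td>2020-12-31</td><td>$900</td></tr>
</tbody></table>
"""

# html.parser ships with Python; the sample includes <tbody> explicitly
# because html.parser does not insert it automatically.
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Pick the first table whose text mentions the heading we want.
target = next(t for t in sample_soup.find_all("table") if "Tesla Quarterly Revenue" in t.text)

# Collect one dict per two-cell row, then build the frame in a single step.
rows = []
for tr in target.find("tbody").find_all("tr"):
    cells = tr.find_all("td")
    if len(cells) == 2:
        rows.append({"Date": cells[0].text.strip(), "Revenue": cells[1].text.strip()})

sample_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
print(sample_revenue)
```

Building the frame from a list avoids copying the whole dataframe on every iteration, which is what repeated `pd.concat` calls do.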

Remove the commas and dollar signs from the `Revenue` column.


In [12]:
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(r',|\$',"", regex=True)
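
The pattern used above can be checked on a few invented values. In this sketch, `raw` and `cleaned` are illustrative names.

```python
import pandas as pd

raw = pd.Series(["$1,234", "$56", "789"])  # invented values

# In the pattern, "," matches a comma and "\$" matches a literal dollar sign
# (an unescaped "$" would mean end-of-string in a regular expression).
cleaned = raw.str.replace(r",|\$", "", regex=True)
print(cleaned.tolist())
```

The result is still a column of strings; converting to numbers is a separate step.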

Remove any null values or empty strings in the `Revenue` column.


In [13]:
tesla_revenue.dropna(inplace=True)
tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]
#print(tesla_revenue)
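
A short sketch of the two filters on invented data, followed by an optional numeric conversion that the notebook itself does not perform at this point. The names `frame` and `numeric` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    "Date": ["2010-03-31", "2009-12-31", "2009-09-30", "2009-06-30"],
    "Revenue": ["21", "", None, "27"],  # invented; one empty string, one missing value
})

frame = frame.dropna()                 # drops the row whose Revenue is missing
frame = frame[frame["Revenue"] != ""]  # drops the row whose Revenue is an empty string

# Optional extra step (an assumption, not in the notebook): strings to numbers.
numeric = pd.to_numeric(frame["Revenue"])
print(frame)
```

Neither filter renumbers the index, so gaps in the index labels are expected after rows have been dropped.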

Display the last five rows of the `tesla_revenue` dataframe using the `tail` function.


In [14]:
tesla_revenue_tail = tesla_revenue.tail()
print(tesla_revenue_tail)

          Date Revenue
48  2010-09-30      31
49  2010-06-30      28
50  2010-03-31      21
52  2009-09-30      46
53  2009-06-30      27


Use the `Ticker` function with the stock's ticker symbol to create a ticker object. The stock is GameStop, and its ticker symbol is `GME`.


In [15]:
GameStop = yf.Ticker("GME")

Use the ticker object's `history` function to extract stock information and save it in a dataframe named `gme_data`. Set the `period` parameter to `"max"` to retrieve the full available price history.


In [16]:
gme_data = GameStop.history(period="max")

In [17]:
gme_data.reset_index(inplace=True)
print(gme_data.head())

                       Date      Open      High       Low     Close    Volume  \
0 2002-02-13 00:00:00-05:00  1.620128  1.693350  1.603296  1.691666  76216000   
1 2002-02-14 00:00:00-05:00  1.712707  1.716074  1.670626  1.683250  11021600   
2 2002-02-15 00:00:00-05:00  1.683251  1.687459  1.658002  1.674834   8389600   
3 2002-02-19 00:00:00-05:00  1.666418  1.666418  1.578047  1.607504   7410400   
4 2002-02-20 00:00:00-05:00  1.615921  1.662210  1.603296  1.662210   6892800   

   Dividends  Stock Splits  
0        0.0           0.0  
1        0.0           0.0  
2        0.0           0.0  
3        0.0           0.0  
4        0.0           0.0  


Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html and save the text of the response in a variable named `html_data_2`.


In [18]:
url2 = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"
html_data_2 = requests.get(url2).text
#print(html_data_2)

Parse the HTML using `BeautifulSoup` with either the `html5lib` or the `html.parser` parser.


In [19]:
soup2 = BeautifulSoup(html_data_2, 'html5lib')
#print(soup2.prettify())

Using `BeautifulSoup` or the `read_html` function, extract the table with `GameStop Revenue` and store it in a dataframe named `gme_revenue`. The dataframe should have columns `Date` and `Revenue`. Make sure the commas and dollar signs are removed from the `Revenue` column.


In [20]:
# Read all tables from the webpage
tables = pd.read_html(url2)

# The quarterly revenue table is the second table on the page (index 1)
gme_revenue = tables[1]
#print(gme_revenue)

# Rename columns
gme_revenue.columns = ['Date', 'Revenue']

# Clean the revenue column by removing dollar signs and commas
# (raw string so that "\$" is not treated as an invalid escape sequence)
gme_revenue['Revenue'] = gme_revenue['Revenue'].replace({r'\$': '', ',': ''}, regex=True)

# Convert revenue to a numeric type
#gme_revenue['Revenue'] = pd.to_numeric(gme_revenue['Revenue'])
#print(gme_revenue)
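
As an offline sketch of the `read_html` route, the example below parses a hand-written page from a string buffer and selects the table by its heading text through the `match` parameter rather than by position. The HTML and its values are invented, and the names `sample_html`, `matched`, and `quarterly` are illustrative. `read_html` needs an HTML parser backend such as `lxml`, or `bs4` together with `html5lib`, to be installed.

```python
from io import StringIO

import pandas as pd

# Tiny hand-written page standing in for the downloaded HTML; values are invented.
sample_html = """
<table><thead><tr><th>GameStop Annual Revenue</th><th>Millions of US $</th></tr></thead>
<tbody><tr><td>2020</td><td>$5,000</td></tr></tbody></table>
<table><thead><tr><th>GameStop Quarterly Revenue</th><th>Millions of US $</th></tr></thead>
<tbody>
<tr><td>2020-04-30</td><td>$1,000</td></tr>
<tr><td>2020-01-31</td><td>$900</td></tr>
</tbody></table>
"""

# match keeps only the tables whose text contains the given string or regex.
matched = pd.read_html(StringIO(sample_html), match="GameStop Quarterly Revenue")
quarterly = matched[0]
quarterly.columns = ["Date", "Revenue"]

# Raw string for the pattern so that "\$" is a literal dollar sign.
quarterly["Revenue"] = quarterly["Revenue"].replace({r"\$": "", ",": ""}, regex=True)
print(quarterly)
```

Selecting by heading text is less brittle than a fixed index if the page layout changes, at the cost of depending on the exact heading wording.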




Display the last five rows of the `gme_revenue` dataframe using the `tail` function.


In [21]:
print(gme_revenue.tail())

          Date Revenue
57  2006-01-31    1667
58  2005-10-31     534
59  2005-07-31     416
60  2005-04-30     475
61  2005-01-31     709


Use the `make_graph` function to graph the Tesla stock data. Note that the graph only shows data up to June 2021.


In [22]:
make_graph(tesla_data, tesla_revenue, "Tesla Quarterly Revenue")

Use the `make_graph` function to graph the GameStop stock data. Note that the graph only shows data up to June 2021.


In [23]:
make_graph(gme_data,gme_revenue,"GameStop Revenue")