# Extracting and Visualizing Stock Data

### Description

Extracting the relevant data from a dataset and displaying it is a core part of data science, because it lets people make well-informed decisions based on that data. In this task, I will extract some stock data and then display it in a graph for visualization.

- I will extract data from a web page, a process called web scraping.
- After scraping, I will use BeautifulSoup to parse the data.
- Finally, I will use custom-built functions for plotting and visualization.

### Dataset

I have chosen the stock datasets of Tesla and GameStop, with dates between 2010 and 2024. The dataset is stored in cloud storage provided by `IBMDeveloperSkillsNetwork` (I will read it into my notebook from its URL and parse it).



In [1]:
# Importing all the libraries we will need

import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [2]:
def make_graph(stock_data, revenue_data, stock):
    """
    Custom plotting function for the stock data
    
    Args:
        stock_data:  Dataframe with Stock Data (dataframe must contain Date and Close columns)
        revenue_data: Dataframe with Revenue Data (dataframe must contain Date and Revenue columns)
        stock: Name of the stock
        
    """
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    # infer_datetime_format is deprecated in pandas 2.x, so it is omitted here
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
    height=900,
    title=stock,
    xaxis_rangeslider_visible=True)
    fig.show()
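
The cutoff filtering inside `make_graph` can be tried in isolation. The sketch below uses a small made-up frame (the values are illustrative, not real prices) and converts `Date` to datetime before comparing, so the filter does not depend on how the date strings happen to be formatted.

```python
import pandas as pd

# Toy data for illustration only; these are not real prices.
prices = pd.DataFrame({
    "Date": ["2021-06-10", "2021-06-14", "2021-06-15"],
    "Close": ["10.0", "11.5", "12.0"],
})

# Convert once, then compare against a Timestamp cutoff.
prices["Date"] = pd.to_datetime(prices["Date"])
cutoff = pd.Timestamp("2021-06-14")
filtered = prices[prices["Date"] <= cutoff]

# Close is stored as text in this toy frame, so cast it before plotting.
close_values = filtered["Close"].astype("float")
print(filtered)
```

The comparison is inclusive, so the row dated exactly on the cutoff is kept.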

I use the **Ticker** class from the **yfinance** library to get stock data. Creating a ticker object requires the ticker symbol of the stock I want to extract data for. The stock is **Tesla**, and its ticker symbol is **TSLA**.

In [3]:
tesla = yf.Ticker("TSLA")

Using the ticker object's `history` method, I will extract stock information and save it in a dataframe named **tesla_data**.

In [4]:
# Setting the period parameter to max so I get information for the maximum amount of time.

tesla_data = tesla.history(period = "max")
tesla_data.head()

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2010-06-29 00:00:00-04:00 | 1.266667 | 1.666667 | 1.169333 | 1.592667 | 281494500 | 0.0 | 0.0 |
| 2010-06-30 00:00:00-04:00 | 1.719333 | 2.028 | 1.553333 | 1.588667 | 257806500 | 0.0 | 0.0 |
| 2010-07-01 00:00:00-04:00 | 1.666667 | 1.728 | 1.351333 | 1.464 | 123282000 | 0.0 | 0.0 |
| 2010-07-02 00:00:00-04:00 | 1.533333 | 1.54 | 1.247333 | 1.28 | 77097000 | 0.0 | 0.0 |
| 2010-07-06 00:00:00-04:00 | 1.333333 | 1.333333 | 1.055333 | 1.074 | 103003500 | 0.0 | 0.0 |


In [5]:
# Resetting index so date does not act as the index

tesla_data.reset_index(inplace = True)
tesla_data.head()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-06-29 00:00:00-04:00 | 1.266667 | 1.666667 | 1.169333 | 1.592667 | 281494500 | 0.0 | 0.0 |
| 1 | 2010-06-30 00:00:00-04:00 | 1.719333 | 2.028 | 1.553333 | 1.588667 | 257806500 | 0.0 | 0.0 |
| 2 | 2010-07-01 00:00:00-04:00 | 1.666667 | 1.728 | 1.351333 | 1.464 | 123282000 | 0.0 | 0.0 |
| 3 | 2010-07-02 00:00:00-04:00 | 1.533333 | 1.54 | 1.247333 | 1.28 | 77097000 | 0.0 | 0.0 |
| 4 | 2010-07-06 00:00:00-04:00 | 1.333333 | 1.333333 | 1.055333 | 1.074 | 103003500 | 0.0 | 0.0 |
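
To see what resetting the index does without downloading anything, here is a small sketch on a hand-built frame. The values are illustrative, and a real yfinance frame has more columns. It uses the non-in-place form of `reset_index`, which returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

# Illustrative values only; the index is named "Date" as in the output above.
history = pd.DataFrame(
    {"Close": [1.59, 1.58]},
    index=pd.DatetimeIndex(["2010-06-29", "2010-06-30"], name="Date"),
)

# reset_index() moves the index into an ordinary column and adds a 0..n-1 index.
flat = history.reset_index()

print(list(history.columns))
print(list(flat.columns))
```

Having `Date` as a regular column is what lets `make_graph` access it as `stock_data.Date`.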


### Using Web Scraping to Extract Tesla Revenue Data

In [6]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"
html_data = requests.get(url).text
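
The request above assumes the download succeeds. A slightly more defensive version might look like the sketch below. `fetch_html` and its `get` parameter are hypothetical names introduced here so the function can be exercised without network access; `timeout` and `raise_for_status()` are standard parts of `requests`.

```python
import requests


def fetch_html(url, timeout=10.0, get=requests.get):
    """Download a page and return its HTML text (hypothetical helper).

    `get` defaults to requests.get; it is a parameter only so that a
    stand-in can be supplied when no network is available.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text
```

With this helper, the cell could read `html_data = fetch_html(url)`, and a failed download would raise an error instead of silently passing an error page to the parser.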

In [7]:
# Parse the HTML data using BeautifulSoup, naming the parser explicitly

soup = BeautifulSoup(html_data, "html.parser")

I use the BeautifulSoup library to extract the table with **Tesla Quarterly Revenue** and store it in a dataframe named **tesla_revenue**.

**Note:** The dataframe should have columns Date and Revenue for plotting purposes.

In [8]:
data = []
for table in soup.find_all("table"):
    # Pick the table whose header mentions "Tesla Quarterly Revenue"
    if any("tesla quarterly revenue" in th.text.lower() for th in table.find_all("th")):
        for row in table.find("tbody").find_all("tr"):
            # Each body row holds exactly two cells: the date and the revenue
            date_col, rev_col = row.find_all("td")
            data.append({
                "Date": date_col.text,
                "Revenue": rev_col.text  # "$" and "," are removed in a later cell
            })
tesla_revenue = pd.DataFrame(data)

In [9]:
tesla_revenue.head()

| | Date | Revenue |
|---|---|---|
| 0 | 2022-09-30 | $21,454 |
| 1 | 2022-06-30 | $16,934 |
| 2 | 2022-03-31 | $18,756 |
| 3 | 2021-12-31 | $17,719 |
| 4 | 2021-09-30 | $13,757 |


In [10]:
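# Removing the comma and dollar sign from the Revenue column.
# regex=False treats "$" as a literal character rather than a regex end-of-string anchor.

tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace("$", "", regex=False)
tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(",", "", regex=False)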


In [11]:
tesla_revenue.head()

| | Date | Revenue |
|---|---|---|
| 0 | 2022-09-30 | 21454 |
| 1 | 2022-06-30 | 16934 |
| 2 | 2022-03-31 | 18756 |
| 3 | 2021-12-31 | 17719 |
| 4 | 2021-09-30 | 13757 |
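
As an alternative to two separate replacements, both symbols can be stripped in a single pass with a regular-expression character class. This is a minimal sketch on a few sample strings in the scraped format, not on the real dataframe.

```python
import pandas as pd

# Sample strings in the same format as the scraped Revenue column.
revenue = pd.Series(["$21,454", "$16,934", ""])

# Inside [...] the "$" is a literal character, so this matches "$" or ",".
cleaned = revenue.str.replace(r"[$,]", "", regex=True)

print(cleaned.tolist())  # ['21454', '16934', '']
```

Passing `regex=True` explicitly keeps the behavior the same across pandas versions, since the default value of `regex` changed in pandas 2.0.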


In [12]:
# Removing any null or empty strings in the Revenue column

tesla_revenue.dropna(inplace=True)

tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]

In [13]:
# Checking our revenue table after cleaning

tesla_revenue.head()

| | Date | Revenue |
|---|---|---|
| 0 | 2022-09-30 | 21454 |
| 1 | 2022-06-30 | 16934 |
| 2 | 2022-03-31 | 18756 |
| 3 | 2021-12-31 | 17719 |
| 4 | 2021-09-30 | 13757 |
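
Another way to handle empty entries is to convert the column to numbers and let pandas mark anything unparseable as missing. The sketch below uses made-up values and is an alternative to the filtering above, not what this notebook runs.

```python
import pandas as pd

# Made-up rows; the last one mimics a quarter with no reported revenue.
revenue_df = pd.DataFrame({
    "Date": ["2020-03-31", "2019-12-31", "2019-09-30"],
    "Revenue": ["100", "250", ""],
})

# errors="coerce" turns strings that are not numbers (including "") into NaN.
revenue_df["Revenue"] = pd.to_numeric(revenue_df["Revenue"], errors="coerce")

# Drop only the rows where Revenue is missing.
revenue_df = revenue_df.dropna(subset=["Revenue"])

print(revenue_df)
```

A side effect of this approach is that `Revenue` ends up as a numeric column, so the later cast with `astype("float")` in `make_graph` becomes a no-op.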


We repeat the above procedure to extract the GameStop stock data.

In [14]:
gme = yf.Ticker("GME")

In [15]:
# Setting the period parameter to max so I get information for the maximum amount of time.

gme_data = gme.history(period = "max")

In [16]:
# Resetting index so date does not act as the index

gme_data.reset_index(inplace = True)
gme_data.head()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2002-02-13 00:00:00-05:00 | 1.620129 | 1.69335 | 1.603296 | 1.691667 | 76216000 | 0.0 | 0.0 |
| 1 | 2002-02-14 00:00:00-05:00 | 1.712707 | 1.716073 | 1.670626 | 1.68325 | 11021600 | 0.0 | 0.0 |
| 2 | 2002-02-15 00:00:00-05:00 | 1.683251 | 1.687459 | 1.658002 | 1.674834 | 8389600 | 0.0 | 0.0 |
| 3 | 2002-02-19 00:00:00-05:00 | 1.666418 | 1.666418 | 1.578047 | 1.607504 | 7410400 | 0.0 | 0.0 |
| 4 | 2002-02-20 00:00:00-05:00 | 1.61592 | 1.662209 | 1.603296 | 1.662209 | 6892800 | 0.0 | 0.0 |


In [17]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"
html_data = requests.get(url).text

In [18]:
# Parse the HTML data using BeautifulSoup, naming the parser explicitly

soup = BeautifulSoup(html_data, "html.parser")

I use BeautifulSoup to extract the table with **GameStop Revenue** and store it in a dataframe named **gme_revenue**.

**Note:** The dataframe should have columns **Date** and **Revenue** for plotting purposes. 



In [19]:
# The comma and dollar sign are removed from the `Revenue` column here,
# similar to what I did for the tesla_revenue dataframe.

data = []
for table in soup.find_all("table"):
    # Pick the table whose header mentions "GameStop Quarterly Revenue"
    if any("gamestop quarterly revenue" in th.text.lower() for th in table.find_all("th")):
        for row in table.find("tbody").find_all("tr"):
            date_col, rev_col = row.find_all("td")
            data.append({
                "Date": date_col.text,
                # Replace with an empty string so no stray space is left behind
                "Revenue": rev_col.text.replace("$", "").replace(",", "")
            })
gme_revenue = pd.DataFrame(data)

In [20]:
gme_revenue.tail()

| | Date | Revenue |
|---|---|---|
| 57 | 2006-01-31 | 1667 |
| 58 | 2005-10-31 | 534 |
| 59 | 2005-07-31 | 416 |
| 60 | 2005-04-30 | 475 |
| 61 | 2005-01-31 | 709 |
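
Since the Tesla and GameStop extraction loops differ only in the table title, the shared logic could be factored into one function. The sketch below is one possible shape for that. `extract_revenue_table` is a hypothetical helper, and the inline HTML is a made-up stand-in for the real pages, assumed to follow the same two-column layout.

```python
import pandas as pd
from bs4 import BeautifulSoup


def extract_revenue_table(html, title):
    """Return a Date/Revenue dataframe for the table whose header contains `title`.

    Hypothetical helper: assumes each data row has exactly two <td> cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table"):
        headers = [th.get_text().lower() for th in table.find_all("th")]
        if not any(title.lower() in header for header in headers):
            continue
        body = table.find("tbody") or table  # fall back if <tbody> is absent
        for row in body.find_all("tr"):
            cells = row.find_all("td")
            if len(cells) != 2:
                continue  # skip header rows and anything unexpected
            date_text = cells[0].get_text(strip=True)
            revenue_text = cells[1].get_text(strip=True)
            rows.append({
                "Date": date_text,
                "Revenue": revenue_text.replace("$", "").replace(",", ""),
            })
    return pd.DataFrame(rows, columns=["Date", "Revenue"])


# Made-up HTML for illustration; not taken from the real pages.
sample_html = """
<table>
  <thead><tr><th colspan="2">Example Quarterly Revenue</th></tr></thead>
  <tbody>
    <tr><td>2020-04-30</td><td>$1,021</td></tr>
    <tr><td>2020-01-31</td><td>$2,194</td></tr>
  </tbody>
</table>
"""

example_revenue = extract_revenue_table(sample_html, "Example Quarterly Revenue")
print(example_revenue)
```

With such a helper, each company would need only one call, with its page's HTML and table title.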


### Plotting Tesla Stock Graph


In [None]:
make_graph(tesla_data, tesla_revenue, 'Tesla (Revenue vs. Price Comparison)')

![Tesla historical share price and revenue plot](tesla_stock_data_plot.png)

### Plotting GameStop Stock Graph


In [None]:
make_graph(gme_data, gme_revenue, 'GameStop (Revenue vs. Price Comparison)')

![GameStop historical share price and revenue plot](gamestop_stock_data_plot.png)



Finally, we have seen how to use web scraping to load data available on web pages into our notebooks as dataframes, and how to visualize the trends through plotting.

This is one of the initial steps a Data Scientist has to perform when finding insights in a dataset.



#### END OF NOTEBOOK