# **Extracting and Visualizing Stock Data**

## **Description**

Hi, I'm Eduardo.

This project is for the **Python Project for Data Science IBM Certification**. It involves extracting stock and revenue data from online sources and visualizing it, a crucial part of data science for making informed decisions. Specifically, we extract historical share prices and annual revenue for _Tesla_ and _GameStop_ and plot them in one graph per company.

**About the Authors:**

- Joseph Santarcangelo: Holds a PhD in Electrical Engineering. His research focuses on using machine learning, signal processing, and computer vision to understand the impact of videos on human cognition. Joseph has been working for IBM since completing his PhD.
- Azim Hirjani

## **Table of Contents**
1. Define a Function that Makes a Graph

2. Question 1: Use yfinance to Extract Stock Data

3. Question 2: Use Webscraping to Extract Tesla Revenue Data

4. Question 3: Use yfinance to Extract Stock Data

5. Question 4: Use Webscraping to Extract GME Revenue Data

6. Question 5: Plot Tesla Stock Graph

7. Question 6: Plot GameStop Stock Graph

### Libraries

In [1]:
!pip install yfinance
# BeautifulSoup (the bs4 module) is published on PyPI as beautifulsoup4
!pip install beautifulsoup4
!pip install nbformat


In [2]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots



In Python, you can suppress warnings with the `warnings` module. Its `filterwarnings` function ignores specific warning messages or categories. The cell below silences `FutureWarning`, the category Python uses for deprecation notices aimed at end users.

In [3]:
import warnings
# Ignore FutureWarning messages; other warning categories are still shown
warnings.filterwarnings("ignore", category=FutureWarning)
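The filter above applies to the whole session. If you would rather silence `FutureWarning` only around one noisy call, the standard library's `warnings.catch_warnings()` context manager restores the previous filters when the block exits. A minimal sketch, where `noisy_call` is a hypothetical stand-in for a library call that emits a deprecation notice:

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for a library call that emits a FutureWarning
    warnings.warn("this behaviour will change in a future version", FutureWarning)
    return 42


# Suppress FutureWarning only inside this block; the previous filters return afterwards
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    result = noisy_call()

# Record warnings (forcing every warning to be shown) to confirm the call does emit one
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_call()

print(result)       # 42
print(len(caught))  # 1
```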

## **Define a Function that Makes a Graph**
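In this section, we define the function `make_graph`. It takes a dataframe with stock data (containing `Date` and `Close` columns), a dataframe with revenue data (containing `Date` and `Revenue` columns), and the name of the stock. It draws the share price and the revenue as two stacked subplots that share the date axis, showing share prices up to June 14, 2021 and revenue up to April 30, 2021.
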
In [4]:
def make_graph(stock_data, revenue_data, stock):
    # Two stacked subplots sharing the date axis: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # Limit share prices to June 14, 2021 and revenue to April 30, 2021
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
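Before plotting, `make_graph` limits share prices to June 14, 2021 and revenue to April 30, 2021. The sketch below applies the same two filters to small synthetic frames (hypothetical values shaped like the yfinance and scraped data), so you can see the effect without drawing anything. It assumes, as the notebook's output shows, that the stock `Date` column is time-zone aware. pandas interprets a date string compared with such a column in that column's time zone.

```python
import pandas as pd

# Synthetic stand-ins (hypothetical values) shaped like tesla_data and tesla_revenue
stock_data = pd.DataFrame({
    "Date": pd.date_range("2021-06-10", periods=8, freq="D", tz="America/New_York"),
    "Close": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0],
})
revenue_data = pd.DataFrame({
    "Date": pd.to_datetime(["2020", "2021", "2022"], format="%Y"),
    "Revenue": ["500", "600", "700"],
})

# The same cutoffs make_graph applies before plotting
stock_window = stock_data[stock_data.Date <= "2021-06-14"]
revenue_window = revenue_data[revenue_data.Date <= "2021-04-30"]

print(len(stock_window))                              # 5 (June 10 through June 14)
print(revenue_window.Revenue.astype(float).tolist())  # [500.0, 600.0]
```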

## **Question 1: Use yfinance to Extract Stock Data**
Use the `Ticker` class to create a ticker object from the ticker symbol of the stock you want data on. The stock is Tesla, and its ticker symbol is TSLA.

In [5]:
# Create a ticker object for Tesla
ticker = yf.Ticker("TSLA")

# Extract historical stock data for Tesla
tesla_data = ticker.history(period="max")

# Reset the index of the tesla_data DataFrame
tesla_data.reset_index(inplace=True)

# Display the first five rows of the tesla_data dataframe
print(tesla_data.head())

                       Date      Open      High       Low     Close  \
0 2010-06-29 00:00:00-04:00  1.266667  1.666667  1.169333  1.592667   
1 2010-06-30 00:00:00-04:00  1.719333  2.028000  1.553333  1.588667   
2 2010-07-01 00:00:00-04:00  1.666667  1.728000  1.351333  1.464000   
3 2010-07-02 00:00:00-04:00  1.533333  1.540000  1.247333  1.280000   
4 2010-07-06 00:00:00-04:00  1.333333  1.333333  1.055333  1.074000   

      Volume  Dividends  Stock Splits  
0  281494500        0.0           0.0  
1  257806500        0.0           0.0  
2  123282000        0.0           0.0  
3   77097000        0.0           0.0  
4  103003500        0.0           0.0  
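`history()` returns prices indexed by date, and `reset_index(inplace=True)` turns that index into an ordinary `Date` column, which `make_graph` later reads as `stock_data.Date`. A sketch with a synthetic frame (not real TSLA prices), assuming, as the output above suggests, a tz-aware `DatetimeIndex` named `Date`. It uses the non-inplace form, which returns the same result as a new frame:

```python
import pandas as pd

# Stand-in for ticker.history(): synthetic closes indexed by a tz-aware DatetimeIndex named "Date"
index = pd.date_range("2010-06-29", periods=3, freq="D", tz="America/New_York", name="Date")
history = pd.DataFrame({"Close": [1.0, 1.1, 1.2]}, index=index)

# reset_index() moves the index into a "Date" column and installs a 0..n-1 RangeIndex
flat = history.reset_index()

print(flat.columns.tolist())  # ['Date', 'Close']
print(flat.index.tolist())    # [0, 1, 2]
```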


## **Question 2: Use Webscraping to Extract Tesla Revenue Data**
Use the `requests` library to download the webpage [https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm). Save the text of the response as a variable named `html_data`.



In [6]:
# URL of the webpage to be scraped
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"

# Send a GET request to the URL
response = requests.get(url)

# Save the text of the response as html_data
html_data = response.text

Parse the HTML data using `BeautifulSoup`.

In [7]:
# Parse the HTML data
soup = BeautifulSoup(html_data, 'html.parser')

Using `BeautifulSoup` or the `read_html` function, extract the table with Tesla revenue and store it in a dataframe named `tesla_revenue`. The dataframe should have the columns `Date` and `Revenue`.

In [8]:
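from io import StringIO

# Find the first table on the page, which holds Tesla's annual revenue
table = soup.find('table')

# Convert the table HTML to a dataframe; wrap it in StringIO because
# passing literal HTML to read_html is deprecated since pandas 2.1
tesla_revenue = pd.read_html(StringIO(str(table)))[0]

# Rename columns
tesla_revenue.columns = ['Date', 'Revenue']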
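`soup.find('table')` returns the first `<table>` element, and `read_html` turns its rows and cells into a dataframe. To make that step concrete without BeautifulSoup or a `read_html` parser backend, here is a dependency-free sketch built on the standard library's `html.parser.HTMLParser` instead (a swapped-in technique, not what the notebook uses). The class name `FirstTableParser` and the sample HTML are hypothetical; the sample only mimics the shape of a revenue table.

```python
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collect the cell text of every row in the first <table> of a document."""

    def __init__(self):
        super().__init__()
        self.in_table = False  # True while inside the first table
        self.finished = False  # True once the first table has closed
        self.rows = []         # one list of cell strings per <tr>
        self._cell = None      # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if self.finished:
            return
        if tag == "table":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "table":
            self.in_table = False
            self.finished = True
        elif tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


sample = """
<h2>Annual Revenue</h2>
<table>
  <thead><tr><th>Year</th><th>Revenue</th></tr></thead>
  <tbody>
    <tr><td>2021</td><td>$1,234</td></tr>
    <tr><td>2020</td><td>$987</td></tr>
  </tbody>
</table>
<table><tr><td>ignored</td></tr></table>
"""

parser = FirstTableParser()
parser.feed(sample)
header, *body = parser.rows
print(header)  # ['Year', 'Revenue']
print(body)    # [['2021', '$1,234'], ['2020', '$987']]
```

Passing `body` to `pd.DataFrame(body, columns=["Date", "Revenue"])` would give a two-column frame like the one `read_html` produces, still with the `$` and `,` characters that the next step removes.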

Execute the following line to remove the comma and dollar sign from the `Revenue` column.

In [9]:
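# The pattern is a regular expression (comma OR dollar sign), so regex=True is required
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(r',|\$', "", regex=True)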
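The pattern `',|\$'` is a regular expression, and since pandas 2.0 `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed. A sketch on hypothetical revenue strings shows the difference:

```python
import pandas as pd

# Hypothetical raw strings in the format the scraped tables use
raw = pd.Series(["$1,234", "$56,789"])

# regex=False (the default since pandas 2.0) looks for the literal text ",|\$", which never occurs
literal = raw.str.replace(r",|\$", "", regex=False)

# regex=True matches either a comma or a dollar sign
cleaned = raw.str.replace(r",|\$", "", regex=True)

print(literal.tolist())  # ['$1,234', '$56,789']
print(cleaned.tolist())  # ['1234', '56789']
```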

Execute the following lines to remove any null values or empty strings from the `Revenue` column.

In [10]:
tesla_revenue.dropna(inplace=True)

tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]
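Both steps matter because `make_graph` later converts the column with `astype("float")`: missing rows are dropped first, and any remaining empty string would make `astype(float)` raise a `ValueError`. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical scraped rows: one missing value and one empty string
df = pd.DataFrame({
    "Date": ["2021", "2020", "2019", "2018"],
    "Revenue": ["1234", None, "", "987"],
})

# Step 1: drop rows with missing values
df = df.dropna()

# Converting now would still fail, because "" is not a valid number
try:
    df["Revenue"].astype(float)
    empty_string_fails = False
except ValueError:
    empty_string_fails = True

# Step 2: drop empty strings; .copy() avoids a SettingWithCopyWarning on the next assignment
df = df[df["Revenue"] != ""].copy()
df["Revenue"] = df["Revenue"].astype(float)
print(df["Revenue"].tolist())  # [1234.0, 987.0]
```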

Display the last five rows of the `tesla_revenue` dataframe using the `tail` function. Take a screenshot of the results.

In [11]:
tesla_revenue.tail()

|    | Date | Revenue |
|----|------|---------|
| 8  | 2013 | 2013    |
| 9  | 2012 | 413     |
| 10 | 2011 | 204     |
| 11 | 2010 | 117     |
| 12 | 2009 | 112     |


## **Question 3: Use yfinance to Extract Stock Data**
Use the `Ticker` class to create a ticker object from the ticker symbol of the stock you want data on. The stock is GameStop, and its ticker symbol is GME.

In [12]:
# Create a ticker object for GameStop (GME)
gme_ticker = yf.Ticker("GME")

# Extract stock information and save it in a dataframe named gme_data
gme_data = gme_ticker.history(period="max")

# Reset the index of the gme_data DataFrame
gme_data.reset_index(inplace=True)

# Display the first five rows of the gme_data dataframe
print(gme_data.head())

                       Date      Open      High       Low     Close    Volume  \
0 2002-02-13 00:00:00-05:00  1.620129  1.693350  1.603296  1.691667  76216000   
1 2002-02-14 00:00:00-05:00  1.712707  1.716074  1.670626  1.683250  11021600   
2 2002-02-15 00:00:00-05:00  1.683250  1.687458  1.658002  1.674834   8389600   
3 2002-02-19 00:00:00-05:00  1.666418  1.666418  1.578047  1.607504   7410400   
4 2002-02-20 00:00:00-05:00  1.615921  1.662210  1.603296  1.662210   6892800   

   Dividends  Stock Splits  
0        0.0           0.0  
1        0.0           0.0  
2        0.0           0.0  
3        0.0           0.0  
4        0.0           0.0  
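The GameStop dates end in `-05:00`, while Tesla's first rows end in `-04:00`. Assuming yfinance reports daily bars in the exchange's time zone (`America/New_York`), the difference is only daylight saving time: mid-February falls in Eastern Standard Time (UTC-5) and late June in Eastern Daylight Time (UTC-4). A quick check with pandas:

```python
import pandas as pd

# Same calendar dates as the first TSLA and GME rows, localized to New York time
tsla_first = pd.Timestamp("2010-06-29", tz="America/New_York")
gme_first = pd.Timestamp("2002-02-13", tz="America/New_York")

# utcoffset() returns a datetime.timedelta; convert it to hours for readability
tsla_offset_hours = tsla_first.utcoffset().total_seconds() / 3600
gme_offset_hours = gme_first.utcoffset().total_seconds() / 3600

print(tsla_offset_hours)  # -4.0 (EDT)
print(gme_offset_hours)   # -5.0 (EST)
```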


## **Question 4: Use Webscraping to Extract GME Revenue Data**
Use the `requests` library to download the webpage [https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html). Save the text of the response as a variable named `html_data`.


In [13]:
# URL of the webpage to be scraped
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"

# Send a GET request to the URL
response = requests.get(url)

# Save the text of the response as html_data
html_data = response.text

In [14]:
from io import StringIO

# Parse the HTML data
soup = BeautifulSoup(html_data, 'html.parser')

# Find the first table on the page, which holds GameStop's annual revenue
table = soup.find("table")

# Read the table into a dataframe using read_html (wrapped in StringIO,
# since passing literal HTML is deprecated since pandas 2.1)
gme_revenue = pd.read_html(StringIO(str(table)))[0]

# Rename the columns
gme_revenue.columns = ["Date", "Revenue"]

# Remove commas and dollar signs from the Revenue column (the pattern is a regex)
gme_revenue["Revenue"] = gme_revenue["Revenue"].str.replace(r",|\$", "", regex=True)

# Drop any rows with null or empty strings in the Revenue column
gme_revenue.dropna(inplace=True)
gme_revenue = gme_revenue[gme_revenue["Revenue"] != ""]

gme_revenue.tail()

|    | Date | Revenue |
|----|------|---------|
| 11 | 2009 | 8806    |
| 12 | 2008 | 7094    |
| 13 | 2007 | 5319    |
| 14 | 2006 | 3092    |
| 15 | 2005 | 1843    |
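The Tesla and GameStop revenue cells repeat the same cleaning steps. A hypothetical helper, `clean_revenue_table`, could bundle them so both tables are cleaned identically. This is a sketch tested on a synthetic frame (made-up headers and values), not on the downloaded pages:

```python
import pandas as pd


def clean_revenue_table(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the notebook's revenue-cleaning steps to a raw
    two-column table scraped with read_html."""
    cleaned = table.copy()
    cleaned.columns = ["Date", "Revenue"]
    # Strip commas and dollar signs; the pattern must be treated as a regex
    cleaned["Revenue"] = cleaned["Revenue"].str.replace(r",|\$", "", regex=True)
    # Drop missing values, then rows whose revenue is an empty string
    cleaned = cleaned.dropna()
    cleaned = cleaned[cleaned["Revenue"] != ""]
    return cleaned.reset_index(drop=True)


# Synthetic input shaped like a scraped revenue table (hypothetical headers and values)
raw = pd.DataFrame({
    "Year": ["2020", "2019", "2018", "2017"],
    "Revenue (USD millions)": ["$1,000", None, "", "$2,500"],
})
result = clean_revenue_table(raw)
print(result)
```

Unlike the notebook cells, the sketch resets the index; remove `.reset_index(drop=True)` to keep the original row labels seen in the `tail()` output.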


## **Question 5: Plot Tesla Stock Graph**
Use the `make_graph` function to graph the Tesla stock data, and give the graph a title. The call has the form `make_graph(tesla_data, tesla_revenue, 'Tesla')`. Note that the graph only shows data up to June 2021.


In [15]:
# Convert the Date column in tesla_revenue to datetime objects
tesla_revenue['Date'] = pd.to_datetime(tesla_revenue['Date'], format='%Y')

# Find the maximum date in tesla_data
max_date = tesla_data['Date'].max()

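# yfinance dates are tz-aware; drop the time zone (keeping the local wall-clock time)
# so max_date can be compared with the tz-naive revenue dates
max_date_naive = max_date.tz_localize(None)

# Keep only revenue rows dated on or before the last trading day in tesla_data
tesla_revenue_filtered = tesla_revenue[tesla_revenue['Date'] <= max_date_naive]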

# Plot the graph
make_graph(tesla_data, tesla_revenue_filtered, 'Tesla')
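Why drop the time zone? yfinance returns tz-aware dates, while `pd.to_datetime(..., format='%Y')` produces tz-naive revenue dates, and pandas refuses to order-compare the two. A small sketch with synthetic dates (not the notebook's data) shows the error and the `tz_localize(None)` fix:

```python
import pandas as pd

# Synthetic stand-ins: tz-naive yearly revenue dates and a tz-aware last trading day
revenue_dates = pd.Series(pd.to_datetime(["2019", "2020", "2021"], format="%Y"))
last_trade = pd.Timestamp("2020-06-30", tz="America/New_York")

# Ordering comparisons between tz-naive and tz-aware values raise TypeError
raised = False
try:
    revenue_dates <= last_trade
except TypeError as err:
    raised = True
    print("TypeError:", err)

# tz_localize(None) drops the time zone but keeps the New York wall-clock time
mask = revenue_dates <= last_trade.tz_localize(None)
print(mask.tolist())  # [True, True, False]
```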

## **Question 6: Plot GameStop Stock Graph**
Use the `make_graph` function to graph the GameStop stock data, and give the graph a title. The call has the form `make_graph(gme_data, gme_revenue, 'GameStop')`. Note that the graph only shows data up to June 2021.

In [16]:
# Convert the Date column in gme_revenue to datetime objects
gme_revenue['Date'] = pd.to_datetime(gme_revenue['Date'], format='%Y')

# Find the maximum date in gme_data
max_date_gme = gme_data['Date'].max()

# Ensure max_date_gme is tz-naive if it has timezone information
max_date_gme = max_date_gme.tz_localize(None) if max_date_gme.tzinfo is not None else max_date_gme

# Filter gme_revenue to include only dates up to the maximum date in gme_data
gme_revenue_filtered = gme_revenue[gme_revenue['Date'] <= max_date_gme]

# Plot the graph
make_graph(gme_data, gme_revenue_filtered, 'GameStop')
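The Tesla and GameStop cells repeat the same filter-to-last-trading-day logic, and the GameStop version also handles a tz-naive `Date` column. A hypothetical helper, `revenue_up_to_last_trade`, could capture that logic once. This sketch is tested on synthetic frames rather than the downloaded data:

```python
import pandas as pd


def revenue_up_to_last_trade(stock_data: pd.DataFrame, revenue_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep revenue rows dated on or before the last stock date,
    whether or not the stock Date column carries a time zone."""
    last = stock_data["Date"].max()
    if last.tzinfo is not None:
        last = last.tz_localize(None)  # drop the time zone, keep the wall-clock time
    return revenue_data[revenue_data["Date"] <= last]


revenue = pd.DataFrame({
    "Date": pd.to_datetime(["2019", "2020", "2021"], format="%Y"),
    "Revenue": [1.0, 2.0, 3.0],
})
aware = pd.DataFrame({"Date": pd.date_range("2020-03-02", periods=3, freq="D", tz="America/New_York")})
naive = pd.DataFrame({"Date": pd.date_range("2020-03-02", periods=3, freq="D")})

aware_result = revenue_up_to_last_trade(aware, revenue)["Revenue"].tolist()
naive_result = revenue_up_to_last_trade(naive, revenue)["Revenue"].tolist()
print(aware_result)  # [1.0, 2.0]
print(naive_result)  # [1.0, 2.0]
```

With such a helper, a plotting cell could read `make_graph(gme_data, revenue_up_to_last_trade(gme_data, gme_revenue), 'GameStop')`.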

## Conclusion


In this project, we extracted and visualized stock data for two widely followed companies, Tesla and GameStop. We used `yfinance` to download historical share prices, and `requests` together with `BeautifulSoup` and pandas' `read_html` to scrape annual revenue figures. The `plotly` library then produced interactive graphs that show each company's historical share price and revenue on a shared date axis.

This notebook shows a practical application of data science in finance: combining data from different sources into a single view from which trends can be read. Accessing and visualizing stock and revenue data programmatically is a valuable skill for data-driven decision-making.

The exercise also highlights how visual representation helps in understanding market trends and making informed investment decisions. The same workflow can be extended to other stocks, sectors, and financial indicators.

Intended for a GitHub portfolio, this project demonstrates an end-to-end approach to data extraction, cleaning, and visualization at the intersection of data science and finance.