
<h1>Extracting and Visualizing Stock Data</h1>
<h2>Description</h2>


<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li>Define a Function that Makes a Graph</li>
        <li>Question 1: Use yfinance to Extract Stock Data</li>
        <li>Question 2: Use Webscraping to Extract Tesla Revenue Data</li>
        <li>Question 3: Use yfinance to Extract Stock Data</li>
        <li>Question 4: Use Webscraping to Extract GME Revenue Data</li>
        <li>Question 5: Plot Tesla Stock Graph</li>
        <li>Question 6: Plot GameStop Stock Graph</li>
    </ul>
<p>
    Estimated Time Needed: <strong>30 min</strong></p>
</div>

<hr>


In [None]:
!pip install yfinance
!pip install bs4
!pip install lxml



In [None]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Define Graphing Function


In this section, we define the function `make_graph`. You don't need to know how the function works; only its inputs matter. It takes a dataframe with stock data (which must contain `Date` and `Close` columns), a dataframe with revenue data (which must contain `Date` and `Revenue` columns), and the name of the stock, which is used as the figure title.


In [None]:
def make_graph(stock_data, revenue_data, stock):
    # Two stacked panels sharing one x-axis: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # The deprecated infer_datetime_format argument is dropped; pandas 2.x infers the format by default
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data.Date), y=stock_data.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data.Date), y=revenue_data.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
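
Because `make_graph` relies on specific column names, a quick check before calling it can save a confusing traceback. The following is a minimal sketch; the helper name `missing_columns` is hypothetical and not part of the original notebook. It accepts any iterable of column names, such as a dataframe's `.columns`.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Column names as they appear in a price-history frame after reset_index()
stock_columns = ["Date", "Open", "High", "Low", "Close", "Volume", "Dividends", "Stock Splits"]
# A revenue table whose columns have not been renamed yet
raw_revenue_columns = [
    "Tesla Annual Revenue (Millions of US $)",
    "Tesla Annual Revenue (Millions of US $).1",
]

print(missing_columns(stock_columns, ["Date", "Close"]))          # []
print(missing_columns(raw_revenue_columns, ["Date", "Revenue"]))  # ['Date', 'Revenue']
```

An empty list means the frame is safe to pass to `make_graph`; anything else names the columns that still need to be created or renamed.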

## Question 1: Use yfinance to Extract Stock Data


Pass the ticker symbol of the stock we want to extract data on to the `Ticker` function to create a ticker object. The stock is Tesla, and its ticker symbol is `TSLA`.


In [None]:
tesla = yf.Ticker("TSLA")

Using the ticker object and its `history` function, extract the stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to `max` to retrieve the full available price history.


In [None]:
tesla_data = tesla.history(period="max")

**Reset the index** by calling `reset_index(inplace=True)` on the `tesla_data` dataframe, then display its first five rows using the `head` function. Take a screenshot of the results and code from the beginning of Question 1 to the results below.


In [None]:
tesla_data.reset_index(inplace=True)
tesla_data.head()

                        Date      Open      High       Low     Close     Volume  Dividends  Stock Splits
0  2010-06-29 00:00:00-04:00  1.266667  1.666667  1.169333  1.592667  281494500        0.0           0.0
1  2010-06-30 00:00:00-04:00  1.719333     2.028  1.553333  1.588667  257806500        0.0           0.0
2  2010-07-01 00:00:00-04:00  1.666667     1.728  1.351333     1.464  123282000        0.0           0.0
3  2010-07-02 00:00:00-04:00  1.533333      1.54  1.247333      1.28   77097000        0.0           0.0
4  2010-07-06 00:00:00-04:00  1.333333  1.333333  1.055333     1.074  103003500        0.0           0.0
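
To see what `reset_index` changes, here is a small self-contained sketch on made-up data (the prices are illustrative only, not real quotes). `history` returns a frame indexed by date; resetting the index turns that index into an ordinary `Date` column, which is the layout `make_graph` expects.

```python
import pandas as pd

# Illustrative frame shaped like a price history: the dates are the index
prices = pd.DataFrame(
    {"Close": [10.0, 10.5, 9.8]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
prices.index.name = "Date"

# Before: only 'Close' is a column, the dates live in the index
print(list(prices.columns))

# Move the index into a regular column and replace it with 0, 1, 2, ...
prices.reset_index(inplace=True)

# After: 'Date' is a regular column next to 'Close'
print(list(prices.columns))
print(prices.head())
```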


## Question 2: Use Webscraping to Extract Tesla Revenue Data


In [None]:
from io import StringIO

url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"

# Add headers to mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}

# Use requests to retrieve data from the URL with the headers
response = requests.get(url, headers=headers)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content with BeautifulSoup using the 'lxml' parser
    soup = BeautifulSoup(response.content, "lxml")

    # Find the first table with the class 'historical_data_table'
    table = soup.find('table', {'class': 'historical_data_table'})

    # Read the table into a pandas DataFrame; wrapping the HTML in StringIO
    # avoids the literal-string deprecation warning in recent pandas versions
    tesla_revenue = pd.read_html(StringIO(str(table)))[0]

    # Rename the columns for clarity
    tesla_revenue = tesla_revenue.rename(columns={
        'Tesla Annual Revenue (Millions of US $)': 'Date',
        'Tesla Annual Revenue (Millions of US $).1': 'Revenue'
    })

    # Clean up the 'Revenue' column by removing commas and dollar signs.
    # regex=False treats "$" as a literal character, not as a regex anchor.
    tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(",", "", regex=False).str.replace("$", "", regex=False)

    # Display the first few rows of the dataframe
    print(tesla_revenue.head())
else:
    print(f'Failed to retrieve data, status code: {response.status_code}')

   Date Revenue
0  2023   96773
1  2022   81462
2  2021   53823
3  2020   31536
4  2019   24578
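
The cleanup step deserves a closer look, because `$` is both a literal character in the scraped values and the end-of-string anchor in regular expressions. The sketch below uses only the standard-library `re` module on sample strings in the scraped format to show the difference between an unescaped `$`, a character class, and plain `str.replace`.

```python
import re

raw_values = ["$96,773", "$81,462", "$7,000"]  # sample strings in the scraped format

# An unescaped "$" is a regex anchor: it matches the empty end of the string,
# so nothing visible is removed.
anchor_result = [re.sub("$", "", value) for value in raw_values]

# A character class matches the literal "$" and "," characters.
cleaned = [re.sub(r"[$,]", "", value) for value in raw_values]

# Plain str.replace never interprets its argument as a regex.
literal = [value.replace("$", "").replace(",", "") for value in raw_values]

print(anchor_result)       # ['$96,773', '$81,462', '$7,000']
print(cleaned)             # ['96773', '81462', '7000']
print(literal == cleaned)  # True
```

The same distinction applies to the pandas string methods: whether `"$"` is treated as a pattern or as a literal depends on the `regex` argument, so stating it explicitly keeps the behavior the same across pandas versions.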


In [None]:
tesla_revenue

   Date Revenue
0  2023   96773
1  2022   81462
2  2021   53823
3  2020   31536
4  2019   24578
5  2018   21461
6  2017   11759
7  2016    7000
8  2015    4046
9  2014    3198


Display the last five rows of the `tesla_revenue` dataframe using the `tail` function. Take a screenshot of the results.


In [None]:
tesla_revenue.dropna(inplace=True)
tesla_revenue.tail()

    Date Revenue
10  2013    2013
11  2012     413
12  2011     204
13  2010     117
14  2009     112
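
`make_graph` converts the `Revenue` column with `astype("float")`, which raises an error if any cell is empty or non-numeric. A more forgiving approach, sketched here on made-up rows, is to coerce with `pd.to_numeric` and then drop whatever could not be parsed. This is an alternative to the plain `dropna` call above, not what the notebook itself does.

```python
import pandas as pd

# Made-up rows: one revenue cell is empty, as can happen in scraped tables
revenue = pd.DataFrame({"Date": [2011, 2010, 2009], "Revenue": ["204", "", "112"]})

# Unparseable cells become NaN instead of raising an error
revenue["Revenue"] = pd.to_numeric(revenue["Revenue"], errors="coerce")

# Drop the rows that now hold NaN and show the end of the frame
revenue = revenue.dropna(subset=["Revenue"])
print(revenue.tail())
```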


## Question 3: Use yfinance to Extract Stock Data


Pass the ticker symbol of the stock we want to extract data on to the `Ticker` function to create a ticker object. The stock is GameStop, and its ticker symbol is `GME`.


In [None]:
gamestop = yf.Ticker("GME")

Using the ticker object and its `history` function, extract the stock information and save it in a dataframe named `gme_data`. Set the `period` parameter to `max` to retrieve the full available price history.


In [None]:
gme_data=gamestop.history(period="max")

**Reset the index** by calling `reset_index(inplace=True)` on the `gme_data` dataframe, then display its first five rows using the `head` function. Take a screenshot of the results and code from the beginning of Question 3 to the results below.


In [None]:
gme_data.reset_index(inplace=True)
gme_data.head()

                        Date      Open      High       Low     Close    Volume  Dividends  Stock Splits
0  2002-02-13 00:00:00-05:00  1.620128   1.69335  1.603296  1.691666  76216000        0.0           0.0
1  2002-02-14 00:00:00-05:00  1.712707  1.716074  1.670626  1.683251  11021600        0.0           0.0
2  2002-02-15 00:00:00-05:00   1.68325  1.687458  1.658001  1.674834   8389600        0.0           0.0
3  2002-02-19 00:00:00-05:00  1.666418  1.666418  1.578047  1.607504   7410400        0.0           0.0
4  2002-02-20 00:00:00-05:00   1.61592   1.66221  1.603296   1.66221   6892800        0.0           0.0


## Question 4: Use Webscraping to Extract GME Revenue Data


In [None]:
from io import StringIO

url = "https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue"

# Add headers to mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'}

# Use requests to retrieve data from the URL with the headers defined above
response = requests.get(url, headers=headers)

# Check whether the request was successful
if response.status_code == 200:
    # Parse the HTML content with BeautifulSoup using the 'lxml' parser
    soup = BeautifulSoup(response.content, "lxml")

    # Find the first table with the class 'historical_data_table'
    table = soup.find('table', {'class': 'historical_data_table'})

    # Read the table into a pandas DataFrame; wrapping the HTML in StringIO
    # avoids the literal-string deprecation warning in recent pandas versions
    gme_revenue = pd.read_html(StringIO(str(table)))[0]

    # Rename the columns for clarity
    gme_revenue.columns = ['Date', 'Revenue']

    # Clean up the 'Revenue' column by removing commas and dollar signs
    # (raw string so that \$ is a valid regex escape for a literal dollar sign)
    gme_revenue['Revenue'] = gme_revenue['Revenue'].replace({r'\$': '', ',': ''}, regex=True)

    # Display the first few rows of the dataframe
    print(gme_revenue.head())
else:
    print(f'Failed to retrieve data, status code: {response.status_code}')

   Date Revenue
0  2024    5273
1  2023    5927
2  2022    6011
3  2021    5090
4  2020    6466


Display the last five rows of the `gme_revenue` dataframe using the `tail` function. Take a screenshot of the results.


In [None]:
gme_revenue.dropna(inplace=True)
gme_revenue.tail()

    Date Revenue
11  2013    8887
12  2012    9551
13  2011    9474
14  2010    9078
15  2009    8806


## Question 5: Plot Tesla Stock Graph


Use the `make_graph` function to graph the Tesla stock data, and provide a title for the graph. The call has the form `make_graph(tesla_data, tesla_revenue, 'Tesla')`.


In [None]:
make_graph(tesla_data, tesla_revenue, 'Tesla Stock Data Graph')
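
One caveat is worth checking before trusting the revenue panel: the scraped `Date` column holds bare years. If those are stored as integers, `pd.to_datetime` reads them as nanoseconds since 1970-01-01 rather than as calendar years. The sketch below, using the first three revenue rows shown earlier, illustrates the effect and one possible fix. Whether the fix is needed depends on the dtype of the `Date` column in your session.

```python
import pandas as pd

revenue = pd.DataFrame({"Date": [2023, 2022, 2021], "Revenue": ["96773", "81462", "53823"]})

# Integers are interpreted as nanosecond offsets from the Unix epoch
naive = pd.to_datetime(revenue["Date"])
print(naive.dt.year.tolist())            # [1970, 1970, 1970]

# Converting to text and giving an explicit format yields January 1 of each year
revenue["Date"] = pd.to_datetime(revenue["Date"].astype(str), format="%Y")
print(revenue["Date"].dt.year.tolist())  # [2023, 2022, 2021]
```

Applying the same conversion to a revenue dataframe before passing it to `make_graph` would place the revenue points on the same time axis as the share prices.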



## Question 6: Plot GameStop Stock Graph


Use the `make_graph` function to graph the GameStop stock data, and provide a title for the graph. The call has the form `make_graph(gme_data, gme_revenue, 'GameStop')`.


In [None]:
make_graph(gme_data, gme_revenue, 'GameStop Stock Data Graph')

