# NVIDIA Corporation Stock Data

A `stock` (also known as equity) is a security that represents ownership of a fraction of a corporation. It entitles the owner to a share of the corporation's assets and profits in proportion to the amount of stock they own. Units of stock are called `shares`.
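As a quick illustration of that proportionality, the following sketch computes an ownership fraction and the matching share of profits. The share counts and profit figure are made-up numbers, not NVIDIA data, and `ownership_fraction` is a hypothetical helper introduced only for this example.

```python
def ownership_fraction(shares_owned: int, shares_outstanding: int) -> float:
    """Return the fraction of a company represented by `shares_owned`."""
    if shares_outstanding <= 0:
        raise ValueError("shares_outstanding must be positive")
    return shares_owned / shares_outstanding


# Hypothetical figures chosen only for illustration.
shares_owned = 250
shares_outstanding = 1_000_000
annual_profit = 4_000_000.0  # in US dollars

fraction = ownership_fraction(shares_owned, shares_outstanding)
profit_claim = fraction * annual_profit

print(f"Ownership: {fraction:.4%}")
print(f"Proportional share of profit: ${profit_claim:,.2f}")
```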

A `stock ticker` is a report of the price of a given stock, updated continuously throughout the trading session by the various stock market exchanges.

This notebook uses the `yfinance` library to extract `NVIDIA` stock data from `Yahoo Finance`.

## 0. Business Understanding

Nvidia Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. 

### Set up

In [1]:
!pip install yfinance
!pip install bs4
!pip install nbformat



In [2]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### Basic Function

In [3]:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(
        showlegend=False,
        height=900,
        title=stock,
        xaxis_rangeslider_visible=True,
    )
    fig.show()
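
`make_graph` trims both inputs to a cutoff date before plotting. The sketch below shows the same kind of filter on a toy frame. The prices are invented, and the explicit timezone-aware `pd.Timestamp` cutoff is an assumption made here to match the timezone-aware `Date` column that `yfinance` returns, not something the function itself does.

```python
import pandas as pd

# Toy data with invented prices; dates are localized like yfinance output.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2021-06-11", "2021-06-14", "2021-06-15"]
    ).tz_localize("America/New_York"),
    "Close": [178.25, 180.10, 177.95],
})

# Build the cutoff in the same timezone so the comparison is unambiguous.
cutoff = pd.Timestamp("2021-06-14", tz="America/New_York")

# The boolean mask keeps rows dated on or before the cutoff.
trimmed = toy[toy["Date"] <= cutoff]
print(trimmed)
```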

In [4]:
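def search_table_index(tb_list, tb_name):
    """Return the index of the first table containing tb_name, or None."""
    for index, table in enumerate(tb_list):
        for row in table.find_all('tr'):
            for data_cell in row.find_all(['th', 'td']):
                if tb_name in data_cell.text:
                    return index
    # No table mentions tb_name
    return None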
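
A small usage sketch for `search_table_index`, run against a hand-written HTML snippet instead of the real page. The table contents are invented, and the function is restated compactly here only so the example runs on its own.

```python
from bs4 import BeautifulSoup


def search_table_index(tb_list, tb_name):
    """Return the position of the first table whose cells mention tb_name."""
    for index, table in enumerate(tb_list):
        for row in table.find_all("tr"):
            for data_cell in row.find_all(["th", "td"]):
                if tb_name in data_cell.text:
                    return index
    return None  # no table matched


# Invented HTML with two tables; only the second has the wanted title.
html = """
<table><tr><th>Annual Revenue</th></tr><tr><td>2024</td><td>$1,000</td></tr></table>
<table><tr><th>Quarterly Revenue</th></tr><tr><td>2024-07-31</td><td>$250</td></tr></table>
"""

tables = BeautifulSoup(html, "html.parser").find_all("table")
found = search_table_index(tables, "Quarterly Revenue")
missing = search_table_index(tables, "Net Income")
print(found, missing)
```

Because a failed search yields `None`, it may be worth checking the result before using it to index the list of tables.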

## 1. Loading Data
- Load NVIDIA stock data from Yahoo Finance using the `yfinance` library and its `Ticker` object.
- Scrape NVIDIA's quarterly revenue from a saved copy of the `macrotrends.net` page source.

In [12]:
# Use the 'Ticker' class to create a ticker object

NVDA = yf.Ticker('NVDA')

In [13]:
# Load data into DataFrame

# Get information for the maximum amount of time. 
NVDA_data = NVDA.history(period='max')

NVDA_data.reset_index(inplace=True)

NVDA_data.tail()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 6468 | 2024-10-07 00:00:00-04:00 | 124.989998 | 130.639999 | 124.949997 | 127.720001 | 346250200 | 0.0 | 0.0 |
| 6469 | 2024-10-08 00:00:00-04:00 | 130.259995 | 133.479996 | 129.419998 | 132.889999 | 285722500 | 0.0 | 0.0 |
| 6470 | 2024-10-09 00:00:00-04:00 | 134.110001 | 134.520004 | 131.380005 | 132.649994 | 246191600 | 0.0 | 0.0 |
| 6471 | 2024-10-10 00:00:00-04:00 | 131.910004 | 135.0 | 131.0 | 134.809998 | 242311300 | 0.0 | 0.0 |
| 6472 | 2024-10-11 00:00:00-04:00 | 134.009995 | 135.779999 | 133.660004 | 134.800003 | 169732000 | 0.0 | 0.0 |


In [14]:
# Page source of NVIDIA Revenue 2010-2024 on Macrotrends.net

with open('nvidia_pagesource.txt') as page_source:
    nvda_soup = BeautifulSoup(page_source.read(), 'html.parser')

In [15]:
# Extract Quarterly Revenue of NVIDIA

nvda_table = nvda_soup.find_all('table')
search_table_name = 'NVIDIA Quarterly Revenue'
index = search_table_index(nvda_table, search_table_name)

nvda_revenue_table = nvda_table[index]

# Collect one record per table row, then build the DataFrame once
# (avoids repeated pd.concat calls inside the loop).
records = []
for row in nvda_revenue_table.find('tbody').find_all('tr'):
    col = row.find_all('td')
    if len(col) < 2:
        continue  # skip rows that lack a date or revenue cell
    records.append({'Date': pd.to_datetime(col[0].text), 'Revenue': col[1].text})

nvda_revenue = pd.DataFrame(records, columns=['Date', 'Revenue'])

nvda_revenue.head()


| | Date | Revenue |
|---|---|---|
| 0 | 2024-07-31 | $30,040 |
| 1 | 2024-04-30 | $26,044 |
| 2 | 2024-01-31 | $22,103 |
| 3 | 2023-10-31 | $18,120 |
| 4 | 2023-07-31 | $13,507 |


## Data Wrangling
- Handle Missing Values
- Correct Data Format
- Data Standardization
- Data Normalization
- Binning and Bin Visualization
- Indicator Variables
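
The later steps in this list are not carried out in the notebook. As a rough sketch of what they could look like, the block below applies them to the five quarterly revenue values shown in the loading step. The column names, the choice of three equal-width bins, and the bin labels are assumptions for illustration.

```python
import pandas as pd

# Quarterly revenue in millions of US dollars, copied from the loaded table.
df = pd.DataFrame({"Revenue": [30040.0, 26044.0, 22103.0, 18120.0, 13507.0]})

# Standardization (z-score): zero mean, unit standard deviation.
df["Revenue_z"] = (df["Revenue"] - df["Revenue"].mean()) / df["Revenue"].std()

# Normalization (min-max): rescale to the range [0, 1].
rev_min, rev_max = df["Revenue"].min(), df["Revenue"].max()
df["Revenue_norm"] = (df["Revenue"] - rev_min) / (rev_max - rev_min)

# Binning: three equal-width bins with illustrative labels.
df["Revenue_bin"] = pd.cut(df["Revenue"], bins=3, labels=["Low", "Medium", "High"])

# Indicator variables: one 0/1 column per bin.
indicators = pd.get_dummies(df["Revenue_bin"], prefix="Revenue", dtype=int)

print(df)
print(indicators)
```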

### Correct Data Format

Remove the commas and dollar signs from the `Revenue` column.

In [17]:
nvda_revenue["Revenue"] = nvda_revenue["Revenue"].str.replace(r"[,$]", "", regex=True)
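
To see the effect of this cleanup in isolation, the sketch below applies the same kind of replacement to the first few revenue strings from the loaded table and then converts them to numbers. The character class `[,$]` and the use of `pd.to_numeric` are choices made for this example; the notebook itself leaves the column as strings until plotting.

```python
import pandas as pd

raw = pd.Series(["$30,040", "$26,044", "$22,103"], name="Revenue")

# Strip every comma and dollar sign; inside [...] the $ needs no escaping.
cleaned = raw.str.replace(r"[,$]", "", regex=True)

# Convert the cleaned strings to numeric values.
numeric = pd.to_numeric(cleaned)

print(cleaned.tolist())
print(numeric.tolist())
```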

### Handle missing values

Remove any null values or empty strings in the `Revenue` column.

In [19]:
nvda_revenue.dropna(inplace=True)

nvda_revenue = nvda_revenue[nvda_revenue['Revenue'] != ""]
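
`dropna` only removes true missing values, so an empty string survives it, which is why the second filter is needed. A toy example with made-up rows makes the difference visible. `pd.to_numeric(..., errors="coerce")` is shown as one possible alternative that turns unparseable strings into `NaN`, so a single `dropna` is enough; it is not what the notebook does.

```python
import pandas as pd

# Toy rows: one empty string and one real missing value.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-07-31", "2024-04-30", "2009-04-30", "2009-01-31"]),
    "Revenue": ["30040", "26044", "", None],
})

# Step 1: dropna removes the None row but keeps the empty string.
after_dropna = toy.dropna()

# Step 2: filter out the empty strings explicitly.
two_step = after_dropna[after_dropna["Revenue"] != ""]

# Alternative: coerce to numbers first, then drop the resulting NaN values.
coerced = toy.assign(
    Revenue=pd.to_numeric(toy["Revenue"], errors="coerce")
).dropna()

print(len(after_dropna), len(two_step), len(coerced))
```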

## Data Visualization

In [20]:
make_graph(NVDA_data, nvda_revenue, "NVIDIA Stock Graph")