<a href="https://colab.research.google.com/github/Aafreen2603/Analyzing-stock-performance/blob/main/Analyzing_Stock_Performance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analyzing Stock Performance

---
Data Extraction and Data Visualization

In this project, I will extract financial data, such as historical share prices and quarterly revenue reports, for popular stocks from various sources using Python libraries and web scraping. After collecting this data, I will visualize it in a dashboard to identify patterns or trends. The stocks I will analyze are **AMD, Netflix, GameStop, and Tesla.**

The **stock ticker** is a report of the price of a certain stock, **updated continuously throughout the trading session by the various stock market exchanges**.
In this project, I will use the **yfinance library** (an unofficial Python wrapper around Yahoo Finance data) to access the ticker and extract information about each stock.

### Part 1. Extracting Stock Data
We will extract stock data with the **yfinance** Python library, which returns the data as a pandas DataFrame.
Not all the stock data this project needs is available through a library, so we will also use **web scraping**: historical stock data will be extracted from a web page using **Beautiful Soup**.

In [2]:
!pip install yfinance==0.1.67

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting yfinance==0.1.67
  Downloading yfinance-0.1.67-py2.py3-none-any.whl (25 kB)
Collecting lxml>=4.5.1
  Downloading lxml-4.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.4 MB)
     |████████████████████████████████| 6.4 MB 7.4 MB/s 
Installing collected packages: lxml, yfinance
  Attempting uninstall: lxml
    Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successfully installed lxml-4.9.1 yfinance-0.1.67


In [3]:
import yfinance as yf
import pandas as pd

Using the **`Ticker` class in the yfinance library**, we can create an object, `amd`, that gives us access to methods for extracting data.
For this, we need to provide the ticker symbol of the company's stock, here **AMD (Advanced Micro Devices).**
The **ticker symbol is "AMD".**


In [4]:
amd = yf.Ticker("AMD")

In [None]:
amd_info = amd.info
amd_info # display the stock info, a Python dictionary

In [6]:
amd_info['country'] # we can access info using keys like 'country' or 'sector'

'United States'

In [7]:
amd_info['sector']

'Technology'
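The keys in `info` can vary between tickers and yfinance versions. When you collect the same fields for several stocks, reading them with `dict.get` avoids a `KeyError`. The sketch below works on plain dictionaries. The AMD values are the ones shown above, while the `XYZ` entry and the helper name `summarize_info` are hypothetical.

```python
import pandas as pd


def summarize_info(infos, keys=("country", "sector")):
    """Build one row per ticker from yfinance-style info dictionaries."""
    # dict.get returns None instead of raising KeyError when a field is missing
    rows = {ticker: {key: info.get(key) for key in keys} for ticker, info in infos.items()}
    return pd.DataFrame.from_dict(rows, orient="index")


# AMD values as printed above; "XYZ" is a hypothetical ticker lacking a "sector" key
sample = {
    "AMD": {"country": "United States", "sector": "Technology"},
    "XYZ": {"country": "United States"},
}
summary = summarize_info(sample)
print(summary)
```

Missing fields show up as empty cells rather than stopping the loop, so a mix of tickers can be summarized in one pass.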

A **share** is the smallest unit of a company's stock that can be bought, and the **share price** is what one share costs.
The `history()` method returns the share price of a stock over a period of time. The `period` argument takes values such as `max`, `1mo` (1 month) or `1d` (1 day), depending on how far back you want the price history to go.
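The sketch below trims an already-downloaded DataFrame to a trailing window, which gives an offline feel for what the `period` codes mean. It is only an approximation. The offsets are calendar offsets counted back from the last row, whereas yfinance counts back from the current date and returns trading days only. The helper `trailing_window` and the synthetic prices are hypothetical.

```python
import pandas as pd

# Calendar offsets for yfinance-style period codes; "ytd" and "max" are handled separately
PERIOD_OFFSETS = {
    "1d": pd.DateOffset(days=1),
    "5d": pd.DateOffset(days=5),
    "1mo": pd.DateOffset(months=1),
    "3mo": pd.DateOffset(months=3),
    "6mo": pd.DateOffset(months=6),
    "1y": pd.DateOffset(years=1),
    "2y": pd.DateOffset(years=2),
    "5y": pd.DateOffset(years=5),
    "10y": pd.DateOffset(years=10),
}


def trailing_window(prices, period):
    """Keep the rows of a DatetimeIndex-ed frame that fall within `period` of its last date."""
    last = prices.index.max()
    if period == "max":
        return prices
    if period == "ytd":
        # midnight on 1 January of the last row's year
        return prices[prices.index >= last.normalize().replace(month=1, day=1)]
    if period not in PERIOD_OFFSETS:
        raise ValueError(f"unsupported period: {period!r}")
    return prices[prices.index >= last - PERIOD_OFFSETS[period]]


# Synthetic daily prices for 2021 (hypothetical values)
prices = pd.DataFrame(
    {"Close": range(365)},
    index=pd.date_range("2021-01-01", "2021-12-31", freq="D", name="Date"),
)
print(trailing_window(prices, "1mo").head())
```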

In [8]:
amd_data = amd.history(period="max") # Pandas DataFrame format
amd_data.head() 

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 1980-03-17 | 0.0 | 3.302083 | 3.125 | 3.145833 | 219600 | 0 | 0.0 |
| 1980-03-18 | 0.0 | 3.125 | 2.9375 | 3.03125 | 727200 | 0 | 0.0 |
| 1980-03-19 | 0.0 | 3.083333 | 3.020833 | 3.041667 | 295200 | 0 | 0.0 |
| 1980-03-20 | 0.0 | 3.0625 | 3.010417 | 3.010417 | 159600 | 0 | 0.0 |
| 1980-03-21 | 0.0 | 3.020833 | 2.90625 | 2.916667 | 130800 | 0 | 0.0 |


In [9]:
amd_data.reset_index(inplace=True) # move the Date index into an ordinary column; inplace=True modifies amd_data itself instead of returning a copy
amd_data.head()

|   | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 1980-03-17 | 0.0 | 3.302083 | 3.125 | 3.145833 | 219600 | 0 | 0.0 |
| 1 | 1980-03-18 | 0.0 | 3.125 | 2.9375 | 3.03125 | 727200 | 0 | 0.0 |
| 2 | 1980-03-19 | 0.0 | 3.083333 | 3.020833 | 3.041667 | 295200 | 0 | 0.0 |
| 3 | 1980-03-20 | 0.0 | 3.0625 | 3.010417 | 3.010417 | 159600 | 0 | 0.0 |
| 4 | 1980-03-21 | 0.0 | 3.020833 | 2.90625 | 2.916667 | 130800 | 0 | 0.0 |
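You can check the effect of `reset_index` offline with the five AMD rows printed above. Only the price and volume columns are kept here. This sketch uses the non-inplace form, which returns a new DataFrame and leaves the original unchanged. The variable names are illustrative.

```python
import pandas as pd

# First five AMD rows as printed above, with Date as the index (as history() returns it)
amd_head = pd.DataFrame(
    {
        "Open": [0.0, 0.0, 0.0, 0.0, 0.0],
        "High": [3.302083, 3.125, 3.083333, 3.0625, 3.020833],
        "Low": [3.125, 2.9375, 3.020833, 3.010417, 2.90625],
        "Close": [3.145833, 3.03125, 3.041667, 3.010417, 2.916667],
        "Volume": [219600, 727200, 295200, 159600, 130800],
    },
    index=pd.DatetimeIndex(
        ["1980-03-17", "1980-03-18", "1980-03-19", "1980-03-20", "1980-03-21"],
        name="Date",
    ),
)

# reset_index() returns a new frame in which Date is an ordinary first column
flat = amd_head.reset_index()

# once Date is a column, it can be filtered like any other column
early = flat[flat["Date"] <= "1980-03-19"]
print(early)
```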


In [12]:
# an alternative to BeautifulSoup: pd.read_html parses every <table> on a web page into a list of DataFrames
url = "https://www.macrotrends.net/stocks/charts/AMD/amd/revenue"
read_html_data = pd.read_html(url)

amd_revenue = read_html_data[1]  # second table on the webpage (quarterly revenue)
amd_revenue.columns = ['Date', 'Revenue']  # rename the column headings
# remove commas and dollar signs; regex=True is needed because pandas 2.0 changed the default to regex=False
amd_revenue["Revenue"] = amd_revenue['Revenue'].str.replace(r'[,$]', '', regex=True)
amd_revenue.head()


|   | Date | Revenue |
|---|---|---|
| 0 | 2022-03-31 | 5887 |
| 1 | 2021-12-31 | 4826 |
| 2 | 2021-09-30 | 4313 |
| 3 | 2021-06-30 | 3850 |
| 4 | 2021-03-31 | 3445 |


In [13]:
# to clean data -> remove null or empty strings
amd_revenue.dropna(inplace=True)
amd_revenue = amd_revenue[amd_revenue['Revenue'] != ""]
amd_revenue.tail()

|   | Date | Revenue |
|---|---|---|
| 48 | 2010-03-31 | 1574 |
| 49 | 2009-12-31 | 1646 |
| 50 | 2009-09-30 | 1396 |
| 51 | 2009-06-30 | 1184 |
| 52 | 2009-03-31 | 1177 |
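You can try the cleaning steps above on a small hand-made column: strip `,` and `$`, then drop missing and empty values. The sketch assumes revenue strings formatted like `"$5,887"`. Apart from the two AMD figures shown above, the sample rows are hypothetical. The sketch also converts the cleaned strings to numbers with `pd.to_numeric`. The notebook leaves that step to `astype("float")` inside `make_graph`.

```python
import pandas as pd

# Hypothetical scraped column: "$"-formatted strings plus a missing and an empty cell
raw = pd.DataFrame({
    "Date": ["2022-03-31", "2021-12-31", "2021-09-30", "2009-06-30"],
    "Revenue": ["$5,887", "$4,826", None, ""],
})

# regex=True makes "[,$]" a character class, so every comma and dollar sign is removed
cleaned = raw.assign(Revenue=raw["Revenue"].str.replace(r"[,$]", "", regex=True))
# drop missing and empty-string revenues; .copy() gives an independent frame to modify
cleaned = cleaned[cleaned["Revenue"].notna() & (cleaned["Revenue"] != "")].copy()
# convert the remaining strings to numbers (millions of US dollars)
cleaned["Revenue"] = pd.to_numeric(cleaned["Revenue"])
print(cleaned)
```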


Extracting stock data for Netflix

In [14]:
netflix = yf.Ticker("NFLX")
netflix_data = netflix.history(period="max")
netflix_data.reset_index(inplace=True)
netflix_data.head()

|   | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2002-05-23 | 1.156429 | 1.242857 | 1.145714 | 1.196429 | 104790000 | 0 | 0.0 |
| 1 | 2002-05-24 | 1.214286 | 1.225 | 1.197143 | 1.21 | 11104800 | 0 | 0.0 |
| 2 | 2002-05-28 | 1.213571 | 1.232143 | 1.157143 | 1.157143 | 6609400 | 0 | 0.0 |
| 3 | 2002-05-29 | 1.164286 | 1.164286 | 1.085714 | 1.103571 | 6757800 | 0 | 0.0 |
| 4 | 2002-05-30 | 1.107857 | 1.107857 | 1.071429 | 1.071429 | 10154200 | 0 | 0.0 |


### Extracting Data using Webscraping
Since not all data is available through an API, I will use web-scraping tools to obtain the financial data.

First we need to download the webpage using the **Requests Library**. 
Then we parse the webpage HTML using the **BeautifulSoup Library** to extract data.
Finally we will combine all our data to build a DataFrame.
We will request a saved copy of a Yahoo Finance page with [Netflix stock data](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/netflix_data_webpage.html).


In [15]:
# mamba is not available on Colab ("mamba: command not found"), so install these packages with pip instead
!pip install beautifulsoup4==4.10.0 html5lib==1.1
!pip install lxml==4.6.4
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting lxml==4.6.4
  Downloading lxml-4.6.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.3 MB)
     |████████████████████████████████| 6.3 MB 4.0 MB/s 
Installing collected packages: lxml
  Attempting uninstall: lxml
    Found existing installation: lxml 4.9.1
    Uninstalling lxml-4.9.1:
      Successfully uninstalled lxml-4.9.1
Successfully installed lxml-4.6.4


In [16]:
import requests
from bs4 import BeautifulSoup

In [17]:
url="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/netflix_data_webpage.html"
data = requests.get(url).text

In [18]:
soup = BeautifulSoup(data,'html5lib') # using the BeautifulSoup constructor to create an object 'soup' which will parse through the webpage html

In [19]:
# an empty DataFrame with the columns we will fill from the HTML table
netflix_revenue = pd.DataFrame(columns=["Date","Open","High","Low","Close","Volume"])
netflix_revenue.head() # dataframe is empty as of now

|   | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|


In [20]:
# isolate the body of the table (tbody), which holds one <tr> per row of data
# then loop through each row to collect its column values
rows = []
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    if len(col) < 7:  # skip rows that do not hold a full set of price columns
        continue
    rows.append({
        "Date": col[0].text,
        "Open": col[1].text,
        "High": col[2].text,
        "Low": col[3].text,
        "Close": col[4].text,
        "Volume": col[6].text,
        "Adj Close": col[5].text,
    })

# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so build the table once from the list
netflix_revenue = pd.DataFrame(rows)

In [21]:
netflix_revenue.head()

|   | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | Jun 01, 2021 | 504.01 | 536.13 | 482.14 | 528.21 | 78560600 | 528.21 |
| 1 | May 01, 2021 | 512.65 | 518.95 | 478.54 | 502.81 | 66927600 | 502.81 |
| 2 | Apr 01, 2021 | 529.93 | 563.56 | 499.0 | 513.47 | 111573300 | 513.47 |
| 3 | Mar 01, 2021 | 545.57 | 556.99 | 492.85 | 521.66 | 90183900 | 521.66 |
| 4 | Feb 01, 2021 | 536.79 | 566.65 | 518.28 | 538.85 | 61902300 | 538.85 |
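The loop above stores every cell as text. The minimal sketch below does the same parsing but also converts the cells to numbers and real dates. It runs on a hypothetical inline HTML snippet: the first row reuses the Netflix values printed above, and the second is an invented footnote-style row. It uses Python's built-in `"html.parser"` instead of `html5lib`, so no extra parser package is needed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like the saved price table (7 cells per price row)
html = """
<table><tbody>
  <tr><td>Jun 01, 2021</td><td>504.01</td><td>536.13</td><td>482.14</td>
      <td>528.21</td><td>528.21</td><td>78560600</td></tr>
  <tr><td colspan="7">Footnote row</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
records = []
for tr in soup.find("tbody").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) != 7:  # skip rows without the full set of price columns
        continue
    date, open_, high, low, close, adj_close, volume = cells
    records.append({
        "Date": date,
        "Open": float(open_),
        "High": float(high),
        "Low": float(low),
        "Close": float(close),
        "Volume": int(volume.replace(",", "")),  # tolerate thousands separators
        "Adj Close": float(adj_close),
    })

prices = pd.DataFrame(records)
# parse "Jun 01, 2021"-style strings so the dates sort and compare chronologically
prices["Date"] = pd.to_datetime(prices["Date"], format="%b %d, %Y")
print(prices)
```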


Extracting [stock data for Tesla](https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork23455606-2022-01-01):

In [22]:
tesla = yf.Ticker("TSLA")

In [23]:
tesla_data = tesla.history(period="max")
tesla_data.reset_index(inplace=True)
tesla_data.head()

|   | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-06-29 | 3.8 | 5.0 | 3.508 | 4.778 | 93831500 | 0 | 0.0 |
| 1 | 2010-06-30 | 5.158 | 6.084 | 4.66 | 4.766 | 85935500 | 0 | 0.0 |
| 2 | 2010-07-01 | 5.0 | 5.184 | 4.054 | 4.392 | 41094000 | 0 | 0.0 |
| 3 | 2010-07-02 | 4.6 | 4.62 | 3.742 | 3.84 | 25699000 | 0 | 0.0 |
| 4 | 2010-07-06 | 4.0 | 4.0 | 3.166 | 3.222 | 34334500 | 0 | 0.0 |


In [24]:
# extracting using web scraping: pd.read_html returns every <table> on the page as a DataFrame
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork23455606-2022-01-01"

read_html_data = pd.read_html(url)
tesla_revenue = read_html_data[1]  # second table on the webpage (quarterly revenue)
tesla_revenue.columns = ['Date', 'Revenue']  # rename the column headings
# remove commas and dollar signs; regex=True is needed because pandas 2.0 changed the default to regex=False
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(r'[,$]', '', regex=True)
tesla_revenue.head()


|   | Date | Revenue |
|---|---|---|
| 0 | 2022-03-31 | 18756 |
| 1 | 2021-12-31 | 17719 |
| 2 | 2021-09-30 | 13757 |
| 3 | 2021-06-30 | 11958 |
| 4 | 2021-03-31 | 10389 |


In [25]:
# to clean data -> remove null or empty strings
tesla_revenue.dropna(inplace=True)
tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]
tesla_revenue.tail()

|   | Date | Revenue |
|---|---|---|
| 46 | 2010-09-30 | 31 |
| 47 | 2010-06-30 | 28 |
| 48 | 2010-03-31 | 21 |
| 50 | 2009-09-30 | 46 |
| 51 | 2009-06-30 | 27 |


Extracting [stock data for GameStop](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html)

In [26]:
gme = yf.Ticker("GME")
gme_data = gme.history(period="max")
gme_data.reset_index(inplace=True)
gme_data.head()

|   | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2002-02-13 | 6.480514 | 6.7734 | 6.413183 | 6.766666 | 19054000 | 0.0 | 0.0 |
| 1 | 2002-02-14 | 6.850829 | 6.864295 | 6.682504 | 6.733001 | 2755400 | 0.0 | 0.0 |
| 2 | 2002-02-15 | 6.733002 | 6.749834 | 6.632007 | 6.699337 | 2097400 | 0.0 | 0.0 |
| 3 | 2002-02-19 | 6.665671 | 6.665671 | 6.312188 | 6.430016 | 1852600 | 0.0 | 0.0 |
| 4 | 2002-02-20 | 6.463682 | 6.648839 | 6.413184 | 6.648839 | 1723200 | 0.0 | 0.0 |


In [27]:
# extracting using web scraping: pd.read_html returns every <table> on the page as a DataFrame
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"
read_html_data = pd.read_html(url)

gme_revenue = read_html_data[1]  # second table on the webpage (quarterly revenue)
gme_revenue.columns = ['Date', 'Revenue']  # rename the column headings
# remove commas and dollar signs; regex=True is needed because pandas 2.0 changed the default to regex=False
gme_revenue["Revenue"] = gme_revenue['Revenue'].str.replace(r'[,$]', '', regex=True)
gme_revenue.head()


|   | Date | Revenue |
|---|---|---|
| 0 | 2020-04-30 | 1021 |
| 1 | 2020-01-31 | 2194 |
| 2 | 2019-10-31 | 1439 |
| 3 | 2019-07-31 | 1286 |
| 4 | 2019-04-30 | 1548 |


In [28]:
# to clean data -> remove null or empty strings
gme_revenue.dropna(inplace=True)
gme_revenue = gme_revenue[gme_revenue['Revenue'] != ""]
gme_revenue.tail()

|   | Date | Revenue |
|---|---|---|
| 57 | 2006-01-31 | 1667 |
| 58 | 2005-10-31 | 534 |
| 59 | 2005-07-31 | 416 |
| 60 | 2005-04-30 | 475 |
| 61 | 2005-01-31 | 709 |


# Data Visualization
### Plotting Stock Graphs
Below we define a function, `make_graph`, that plots a stock's share price and revenue.
It takes a DataFrame of stock data (which must contain `Date` and `Close` columns), a DataFrame of revenue data (which must contain `Date` and `Revenue` columns), and the name of the stock, which is used as the graph title.
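`history()` returns `Date` as the index, so passing a price frame to `make_graph` before calling `reset_index` fails at `stock_data.Date`. A small guard, like the hypothetical `require_columns` below, makes that mistake explicit. It is a sketch and not part of the original notebook.

```python
import pandas as pd


def require_columns(df, required, label):
    """Raise a readable ValueError if `df` lacks any column listed in `required`."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"{label} is missing columns: {missing}")


# history()-style frame: Date is still the index here (values from the AMD rows above)
history_like = pd.DataFrame(
    {"Close": [3.145833, 3.03125]},
    index=pd.DatetimeIndex(["1980-03-17", "1980-03-18"], name="Date"),
)

# after reset_index() the Date index becomes a column, so this check passes
require_columns(history_like.reset_index(), ["Date", "Close"], "stock_data")
```

Calling `require_columns(history_like, ["Date", "Close"], "stock_data")` on the frame before `reset_index()` raises `ValueError: stock_data is missing columns: ['Date']`.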

In [29]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [30]:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    # keep share prices up to 14 June 2021 and revenue up to 30 April 2021
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    # infer_datetime_format is deprecated since pandas 2.0; to_datetime infers ISO dates on its own
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
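`make_graph` compares the `Date` column with date strings. The sketch below applies the same cutoff in a standalone helper. It parses the column with `pd.to_datetime` first, so it works both for datetime columns (yfinance) and for ISO date strings (the scraped revenue tables). The timezone branch is an assumption: newer yfinance releases may return timezone-aware dates, and those cannot be compared with a naive cutoff. The helper name `on_or_before` is hypothetical, and the sample rows reuse two Tesla revenue figures shown earlier.

```python
import pandas as pd


def on_or_before(df, cutoff):
    """Return the rows of `df` whose Date is on or before `cutoff` (YYYY-MM-DD)."""
    dates = pd.to_datetime(df["Date"])
    if dates.dt.tz is not None:
        # drop the timezone but keep the local wall-clock date, so a naive cutoff compares cleanly
        dates = dates.dt.tz_localize(None)
    return df[dates <= pd.Timestamp(cutoff)]


# Two Tesla revenue rows from the table above, with dates stored as strings
revenue = pd.DataFrame({"Date": ["2021-06-30", "2021-03-31"], "Revenue": ["11958", "10389"]})
print(on_or_before(revenue, "2021-04-30"))
```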

In [31]:
make_graph(tesla_data, tesla_revenue, 'Tesla') 

In [32]:
make_graph(gme_data, gme_revenue, 'GameStop')

In [33]:
make_graph(amd_data, amd_revenue, 'AMD')