<h1>Extracting Stock Data Using Web Scraping</h1>


Not all stock data is available via an API. In this assignment, you will use web scraping to obtain financial data. You will be quizzed on your results.\
Using Beautiful Soup, we will extract historical share data from a web page.


In [None]:
# !pip install pandas
# !pip install requests
# !pip install bs4
# !pip install html5lib
# !pip install plotly

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

## Using Webscraping to Extract Stock Data


Use the `requests` library to download the webpage https://finance.yahoo.com/quote/AMZN/history?period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true. Save the text of the response as a variable named `html_data`.
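The query string in this URL controls which rows the page returns. As a minimal sketch, assuming `period1` and `period2` are Unix timestamps in seconds (an assumption that matches the January 2016 to January 2021 range of the table), the parameters can be decoded with the standard library alone. The helper name `to_utc_date` is made up for this illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

url = "https://finance.yahoo.com/quote/AMZN/history?period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true"

# parse_qs maps each parameter name to a list of string values
params = parse_qs(urlparse(url).query)

def to_utc_date(seconds):
    """Convert Unix seconds (assumed format) to a UTC calendar date."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).date()

start = to_utc_date(params["period1"][0])
end = to_utc_date(params["period2"][0])
print(start, end, params["interval"][0])  # prints: 2016-01-01 2021-02-01 1mo
```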


In [28]:
url = "https://finance.yahoo.com/quote/AMZN/history?period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true"
html_data = requests.get(url).text

Parse the HTML data using `BeautifulSoup`.


In [29]:
soup = BeautifulSoup(html_data, "html5lib")
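The download step above calls `requests.get(url)` with no timeout and no status check, so a failed request would pass an error page on to the parser. The following is a sketch of a more defensive variant, not the lab's required solution. The helper name `fetch_html` and the `User-Agent` value are hypothetical, and whether this particular site needs a browser-like header is an assumption.

```python
import requests

# Hypothetical header value; some sites reject requests without a browser-like User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; stock-lab-example)"}

def fetch_html(url, session=None, timeout=10):
    """Return the response body as text, raising on HTTP errors."""
    session = session or requests.Session()
    response = session.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

Calling `fetch_html(url)` would then stand in for `requests.get(url).text`, and the returned text can be passed to `BeautifulSoup` exactly as before.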

<b>Question 1</b> What is the content of the `title` attribute?


In [30]:
soup.title

<title>Amazon.com, Inc. (AMZN) Stock Historical Prices &amp; Data - Yahoo Finance</title>
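`soup.title` returns the whole tag, markup included. As a small illustration on an inline HTML string (the variable names `sample_html` and `sample_soup` are made up here, and the built-in `html.parser` is used so the sketch needs no extra parser), the tag's `.string` gives just the text, with the `&amp;` entity decoded:

```python
from bs4 import BeautifulSoup

# Inline sample that mirrors the title shown above; not fetched from the live page
sample_html = (
    "<html><head><title>Amazon.com, Inc. (AMZN) Stock Historical Prices "
    "&amp; Data - Yahoo Finance</title></head><body></body></html>"
)
sample_soup = BeautifulSoup(sample_html, "html.parser")

title_tag = sample_soup.title          # the <title> Tag object
title_text = sample_soup.title.string  # only the text inside the tag

print(title_tag.name)  # prints: title
print(title_text)      # prints: Amazon.com, Inc. (AMZN) Stock Historical Prices & Data - Yahoo Finance
```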

In [31]:
# Find all HTML tables in the web page
tables = soup.find_all('table')  # in HTML, a table is represented by the <table> tag

In [32]:
len(tables)

1

In [33]:
print(tables[0].prettify())

<table class="W(100%) M(0)" data-reactid="33" data-test="historical-prices">
 <thead data-reactid="34">
  <tr class="C($tertiaryColor) Fz(xs) Ta(end)" data-reactid="35">
   <th class="Ta(start) W(100px) Fw(400) Py(6px)" data-reactid="36">
    <span data-reactid="37">
     Date
    </span>
   </th>
   <th class="Fw(400) Py(6px)" data-reactid="38">
    <span data-reactid="39">
     Open
    </span>
   </th>
   <th class="Fw(400) Py(6px)" data-reactid="40">
    <span data-reactid="41">
     High
    </span>
   </th>
   <th class="Fw(400) Py(6px)" data-reactid="42">
    <span data-reactid="43">
     Low
    </span>
   </th>
   <th class="Fw(400) Py(6px)" data-reactid="44">
    <span data-reactid="45">
     Close*
    </span>
   </th>
   <th class="Fw(400) Py(6px)" data-reactid="46">
    <span data-reactid="47">
     Adj Close**
    </span>
   </th>
   <th class="Fw(400) Py(6px)" data-reactid="48">
    <span data-reactid="49">
     Volume
    </span>
   </th>
  </tr>
 </thead>
 <tbody data-
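The printed markup shows that the table carries a `data-test="historical-prices"` attribute and that each header name sits inside a `<span>`. As a sketch under the assumption that this attribute is stable, the table can be selected by that attribute rather than by position, and the header names read from the `<th>` cells. The sample below is a shortened, hand-written copy of the structure, with a second table added only to show the selection at work.

```python
from bs4 import BeautifulSoup

# Hand-written sample modeled on the markup above; not the live page
sample_html = """
<table data-test="historical-prices">
  <thead>
    <tr>
      <th><span>Date</span></th>
      <th><span>Open</span></th>
      <th><span>Close*</span></th>
      <th><span>Adj Close**</span></th>
    </tr>
  </thead>
</table>
<table data-test="other"><thead><tr><th>Ignored</th></tr></thead></table>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Select the table by attribute instead of relying on tables[0]
price_table = sample_soup.find("table", attrs={"data-test": "historical-prices"})

# get_text(strip=True) drops surrounding whitespace; rstrip removes footnote asterisks
headers = [th.get_text(strip=True).rstrip("*") for th in price_table.find_all("th")]
print(headers)  # prints: ['Date', 'Open', 'Close', 'Adj Close']
```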

Using Beautiful Soup, extract the table of historical share prices and store it in a dataframe named `amazon_data`. The dataframe should have the columns Date, Open, High, Low, Close, Adj Close, and Volume. Fill in each variable with the correct data from the list `col`.

Hint: Print the `col` list to see what data to use.


In [34]:
rows = []

for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")  # the seven cells of one table row
    date = col[0].text
    Open = col[1].text
    high = col[2].text
    low = col[3].text
    close = col[4].text
    adj_close = col[5].text
    volume = col[6].text

    rows.append({"Date":date, "Open":Open, "High":high, "Low":low, "Close":close, "Adj Close":adj_close, "Volume":volume})

# DataFrame.append was removed in pandas 2.0, so build the dataframe from the collected rows
amazon_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"])
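The loop above indexes `col[0]` through `col[6]`, so it assumes every row has exactly seven cells. As a hedged sketch of a more tolerant variant, the example below skips any row whose cell count does not match the column list. The sample markup is hand-written: the two price rows echo values from the lab's table, while the middle row and its text are invented purely to trigger the guard.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]

# Hand-written sample; the second row is a made-up short row
sample_html = """
<table><tbody>
<tr><td>Jan 01, 2021</td><td>3,270.00</td><td>3,363.89</td><td>3,086.00</td><td>3,206.20</td><td>3,206.20</td><td>71,528,900</td></tr>
<tr><td>Dec 15, 2020</td><td colspan="6">Example note row</td></tr>
<tr><td>Dec 01, 2020</td><td>3,188.50</td><td>3,350.65</td><td>3,072.82</td><td>3,256.93</td><td>3,256.93</td><td>77,556,200</td></tr>
</tbody></table>
"""

rows = []
for tr in BeautifulSoup(sample_html, "html.parser").find("tbody").find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) != len(COLUMNS):
        continue  # skip rows that do not have one cell per column
    rows.append(dict(zip(COLUMNS, cells)))

sample_data = pd.DataFrame(rows, columns=COLUMNS)
print(sample_data.shape)  # prints: (2, 7)
```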

Print out the first five rows of the `amazon_data` dataframe you created.


In [35]:
amazon_data.head(5)

|     | Date         | Open    | High    | Low     | Close   | Adj Close | Volume    |
| --- | ------------ | ------- | ------- | ------- | ------- | --------- | --------- |
| 0   | Jan 01, 2021 | 3270.0  | 3363.89 | 3086.0  | 3206.2  | 3206.2    | 71528900  |
| 1   | Dec 01, 2020 | 3188.5  | 3350.65 | 3072.82 | 3256.93 | 3256.93   | 77556200  |
| 2   | Nov 01, 2020 | 3061.74 | 3366.8  | 2950.12 | 3168.04 | 3168.04   | 90810500  |
| 3   | Oct 01, 2020 | 3208.0  | 3496.24 | 3019.0  | 3036.15 | 3036.15   | 116226100 |
| 4   | Sep 01, 2020 | 3489.58 | 3552.25 | 2871.0  | 3148.73 | 3148.73   | 115899300 |


<b>Question 2</b> What are the names of the columns of the dataframe?


In [27]:
amazon_data.columns

Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')

<b>Question 3</b> What is the `Open` value for `Jun 01, 2019` in the dataframe?


In [46]:
amazon_data.query("Date == 'Jun 01, 2019'")["Open"]

19    1,760.01
Name: Open, dtype: object
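The result above has `dtype: object`, which means the scraped value `1,760.01` is a string rather than a number. As a short sketch on a two-row sample (the `sample` and `cleaned` names are made up here, and both values are taken from the outputs in this lab), the columns can be converted before any arithmetic or date filtering:

```python
import pandas as pd

# Two-row sample using values shown in this lab's outputs
sample = pd.DataFrame({
    "Date": ["Jun 01, 2019", "May 01, 2016"],
    "Open": ["1,760.01", "663.92"],
})

cleaned = sample.copy()
# Parse dates such as "Jun 01, 2019" into datetime64 values
cleaned["Date"] = pd.to_datetime(cleaned["Date"], format="%b %d, %Y")
# Remove thousands separators, then convert the strings to floats
cleaned["Open"] = pd.to_numeric(cleaned["Open"].str.replace(",", "", regex=False))

june_open = cleaned.loc[cleaned["Date"] == pd.Timestamp("2019-06-01"), "Open"].iloc[0]
print(june_open)  # prints: 1760.01
```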

In [49]:
# For quiz 2

amazon_data.tail()

|     | Date         | Open   | High   | Low    | Close  | Adj Close | Volume    |
| --- | ------------ | ------ | ------ | ------ | ------ | --------- | --------- |
| 56  | May 01, 2016 | 663.92 | 724.23 | 656.0  | 722.79 | 722.79    | 90614500  |
| 57  | Apr 01, 2016 | 590.49 | 669.98 | 585.25 | 659.59 | 659.59    | 78464200  |
| 58  | Mar 01, 2016 | 556.29 | 603.24 | 538.58 | 593.64 | 593.64    | 94009500  |
| 59  | Feb 01, 2016 | 578.15 | 581.8  | 474.0  | 552.52 | 552.52    | 124144800 |
| 60  | Jan 01, 2016 | 656.29 | 657.72 | 547.18 | 587.0  | 587.0     | 130200900 |


<h2>About the Authors:</h2> 

<a href="https://www.linkedin.com/in/joseph-s-50398b136/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork23455606-2021-01-01">Joseph Santarcangelo</a> has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

Azim Hirjani


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By    | Change Description        |
| ----------------- | ------- | ------------- | ------------------------- |
| 2020-11-10        | 1.1     | Malika Singla | Deleted the Optional part |
| 2020-08-27        | 1.0     | Malika Singla | Added lab to GitLab       |

<hr>

## <h3 align="center"> © IBM Corporation 2020. All rights reserved. </h3>

