<center>
    <img src="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/Logos/organization_logo/organization_logo.png" width="300" alt="cognitiveclass.ai logo"  />
</center>


<h1>Extracting Stock Data Using Web Scraping</h1>


Not all stock data is available via an API. In this assignment, you will use web scraping to obtain financial data. You will be quizzed on your results.\
Using Beautiful Soup, we will extract historical share data from a web page.


<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li>Downloading the Webpage Using Requests Library</li>
        <li>Parsing Webpage HTML Using BeautifulSoup</li>
        <li>Extracting Data and Building DataFrame</li>
    </ul>
<p>
    Estimated Time Needed: <strong>30 min</strong></p>
</div>

<hr>


In [1]:
#!pip install pandas
#!pip install requests
!pip install bs4
#!pip install plotly



In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

## Using Webscraping to Extract Stock Data Example


First, we use the `requests` library to download the webpage and extract the text. We will extract Sony Group Corporation (SONY) stock data from https://finance.yahoo.com/quote/SONY/history?p=SONY.


In [3]:
url = "https://finance.yahoo.com/quote/SONY/history?p=SONY"

data  = requests.get(url).text
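
A bare `requests.get(url)` gives no signal when the server refuses the request. The sketch below is one way to make the download step more defensive. The helper name `fetch_html`, the `User-Agent` string, and the 10-second timeout are illustrative assumptions, not part of the lab.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its text, raising on HTTP errors.

    `fetch_html` is a hypothetical helper; the header value and the
    timeout are assumptions chosen for illustration.
    """
    # Reuse a caller-supplied session if given, otherwise create one
    session = session or requests.Session()
    # Some sites reject requests that lack a browser-like User-Agent
    headers = {"User-Agent": "Mozilla/5.0 (compatible; stock-lab-example)"}
    response = session.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text
```

With this helper, the download step would read `data = fetch_html(url)`.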

Next, we parse the text as HTML using `BeautifulSoup`.


In [4]:
soup = BeautifulSoup(data, 'html5lib')
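
To see what the parsed object offers before working with the live page, here is a sketch on a small hand-written table. The HTML string is made up for illustration (its two rows reuse values from this lab's output), and it uses Python's built-in `html.parser` instead of `html5lib` so that it runs without extra packages.

```python
from bs4 import BeautifulSoup

# Hand-written sample HTML, an assumption standing in for the downloaded page
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Open</th></tr></thead>
  <tbody>
    <tr><td>Jun 25, 2021</td><td>97.75</td></tr>
    <tr><td>Jun 24, 2021</td><td>96.68</td></tr>
  </tbody>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag; find_all() returns every match
body_rows = sample_soup.find("tbody").find_all("tr")

# Collect the text of every cell, row by row
cells = [[td.text for td in row.find_all("td")] for row in body_rows]
print(cells)
```

The same `find("tbody").find_all("tr")` chain is what the loop in the next step relies on.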

Now we can turn the HTML table into a pandas dataframe.


In [7]:
rows = []

# First we isolate the body of the table, which contains all the information
# Then we loop through each row and find all the column values for each row
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = col[0].text
    open_price = col[1].text
    high = col[2].text
    low = col[3].text
    close = col[4].text
    adj_close = col[5].text
    volume = col[6].text

    # Collect the data of each row as a dictionary
    rows.append({"Date": date, "Open": open_price, "High": high, "Low": low, "Close": close, "Adj Close": adj_close, "Volume": volume})

# Finally we build the table once from the collected rows
# (DataFrame.append was removed in pandas 2.0)
sony_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"])

We can now print the first rows of the dataframe.


In [8]:
sony_data.head()

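|   | Date         | Open  | High  | Low   | Close | Adj Close | Volume  |
| - | ------------ | ----- | ----- | ----- | ----- | --------- | ------- |
| 0 | Jun 25, 2021 | 97.75 | 98.52 | 97.56 | 98.25 | 98.25     | 523500  |
| 1 | Jun 24, 2021 | 96.68 | 97.15 | 96.6  | 96.77 | 96.77     | 379000  |
| 2 | Jun 23, 2021 | 96.5  | 96.63 | 96.04 | 96.14 | 96.14     | 522000  |
| 3 | Jun 22, 2021 | 97.65 | 98.23 | 97.5  | 97.7  | 97.7      | 717300  |
| 4 | Jun 21, 2021 | 96.02 | 97.04 | 96.0  | 96.91 | 96.91     | 1062800 |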
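
Values scraped with `.text` are strings, so they need converting before any arithmetic or plotting. A minimal sketch, assuming a small hand-built dataframe whose rows are copied from this lab's output and written with comma thousands separators, as scraped text often is. The name `sample` is illustrative.

```python
import pandas as pd

# Stand-in for a freshly scraped table: every value is still a string
sample = pd.DataFrame({
    "Date": ["Jun 25, 2021", "Jun 24, 2021"],
    "Open": ["97.75", "96.68"],
    "Volume": ["523,500", "379,000"],
})

# Parse the date strings into pandas timestamps
sample["Date"] = pd.to_datetime(sample["Date"], format="%b %d, %Y")
# Convert price strings to floats
sample["Open"] = pd.to_numeric(sample["Open"])
# Remove thousands separators, then convert to integers
sample["Volume"] = sample["Volume"].str.replace(",", "", regex=False).astype(int)

print(sample.dtypes)
```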


We can also use the pandas `read_html` function, which returns a list of dataframes, one per table found on the page.


In [9]:
read_html_pandas_data = pd.read_html(url)

Because there is only one table on the page, we take the first table in the returned list.


In [10]:
sony_dataframe = read_html_pandas_data[0]

sony_dataframe.head()

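|   | Date         | Open  | High  | Low   | Close\* | Adj Close\*\* | Volume  |
| - | ------------ | ----- | ----- | ----- | ------- | ------------- | ------- |
| 0 | Jun 25, 2021 | 97.75 | 98.52 | 97.56 | 98.25   | 98.25         | 523500  |
| 1 | Jun 24, 2021 | 96.68 | 97.15 | 96.6  | 96.77   | 96.77         | 379000  |
| 2 | Jun 23, 2021 | 96.5  | 96.63 | 96.04 | 96.14   | 96.14         | 522000  |
| 3 | Jun 22, 2021 | 97.65 | 98.23 | 97.5  | 97.7    | 97.7          | 717300  |
| 4 | Jun 21, 2021 | 96.02 | 97.04 | 96.0  | 96.91   | 96.91         | 1062800 |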
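
`pd.read_html` also accepts HTML text, which is useful when the page has already been downloaded with `requests`. A sketch, assuming a hand-written table in place of the real page. Note that `read_html` needs an HTML parser such as `lxml` (or `bs4` together with `html5lib`) to be installed.

```python
from io import StringIO

import pandas as pd

# Hand-written sample HTML, an assumption standing in for the downloaded page
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Open</th><th>Volume</th></tr></thead>
  <tbody>
    <tr><td>Jun 25, 2021</td><td>97.75</td><td>523,500</td></tr>
    <tr><td>Jun 24, 2021</td><td>96.68</td><td>379,000</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one dataframe per <table> it finds
tables = pd.read_html(StringIO(sample_html))
sample_table = tables[0]
print(sample_table)
```

Wrapping the text in `StringIO` passes it as a file-like object, which avoids the deprecation warning that newer pandas versions emit for literal HTML strings.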


## Using Webscraping to Extract Stock Data Exercise


Use the `requests` library to download the webpage [https://finance.yahoo.com/quote/AMZN/history?period1=1451606400\&period2=1612137600\&interval=1mo\&filter=history\&frequency=1mo\&includeAdjustedClose=true](https://finance.yahoo.com/quote/AMZN/history?utm_medium=Exinfluencer\&utm_source=Exinfluencer\&utm_content=000026UJ\&utm_term=10006555\&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork23455606-2021-01-01\&period1=1451606400\&period2=1612137600\&interval=1mo\&filter=history\&frequency=1mo\&includeAdjustedClose=true). Save the text of the response as a variable named `html_data`.


In [18]:
amzurl = "https://finance.yahoo.com/quote/AMZN/history?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork23455606-2021-01-01&period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true"
# Save the response text under the name the exercise asks for
html_data = requests.get(amzurl).text

Parse the HTML data using `BeautifulSoup`.


In [19]:
amz_soup = BeautifulSoup(html_data, 'html5lib')

<b>Question 1</b> What is the content of the title attribute?


In [20]:
amz_soup.title

<title>Amazon.com, Inc. (AMZN) Stock Historical Prices &amp; Data - Yahoo Finance</title>
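
Accessing `.title` on a soup object returns the whole `<title>` tag. If only the text is needed, the tag's `.string` attribute gives it. A sketch on a hand-written page (an assumption standing in for the downloaded HTML), parsed with the built-in `html.parser`:

```python
from bs4 import BeautifulSoup

# Hand-written sample page, not the real Yahoo Finance HTML
sample_page = "<html><head><title>Example Stock Page</title></head><body></body></html>"
sample_soup = BeautifulSoup(sample_page, "html.parser")

title_tag = sample_soup.title          # the Tag object, including the markup
title_text = sample_soup.title.string  # only the text inside the tag

print(title_tag)
print(title_text)
```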

Using Beautiful Soup, extract the table with historical share prices and store it in a dataframe named `amazon_data`. The dataframe should have the columns Date, Open, High, Low, Close, Adj Close, and Volume. Fill in each variable with the correct data from the list `col`.


In [21]:
rows = []

# Loop over the rows of the Amazon table (amz_soup, not the Sony soup)
for row in amz_soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    # Use .text so each variable holds the cell's string, not the tag itself
    date = col[0].text
    open_price = col[1].text
    high = col[2].text
    low = col[3].text
    close = col[4].text
    adj_close = col[5].text
    volume = col[6].text

    rows.append({"Date": date, "Open": open_price, "High": high, "Low": low, "Close": close, "Adj Close": adj_close, "Volume": volume})

# Build the dataframe once from the collected rows (DataFrame.append was removed in pandas 2.0)
amazon_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"])

Print out the first five rows of the `amazon_data` dataframe you created.


In [23]:
amazon_data.head()


<b>Question 2</b> What are the names of the columns of the dataframe?


In [None]:
# Display the column names of the dataframe
amazon_data.columns

<b>Question 3</b> What is the `Open` of the last row of the amazon_data dataframe?


In [24]:
# Select the last row by position and read its Open value
amazon_data.iloc[-1]["Open"]
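
The last row of a dataframe can be selected by position with `iloc` or with `tail()`. A sketch on a small stand-in dataframe, with rows copied from the Sony output earlier in this lab; `sample` is an illustrative name.

```python
import pandas as pd

# Stand-in data, not the Amazon table from the exercise
sample = pd.DataFrame({
    "Date": ["Jun 25, 2021", "Jun 24, 2021", "Jun 23, 2021"],
    "Open": [97.75, 96.68, 96.5],
})

# iloc[-1] selects the last row by position, whatever the index labels are
last_open = sample.iloc[-1]["Open"]
# tail(1) returns the last row as a one-row dataframe
last_row = sample.tail(1)

print(last_open)
print(last_row)
```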

<h2>About the Authors:</h2> 

<a href="https://www.linkedin.com/in/joseph-s-50398b136/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork23455606-2021-01-01">Joseph Santarcangelo</a> has a PhD in Electrical Engineering; his research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

Azim Hirjani


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By    | Change Description        |
| ----------------- | ------- | ------------- | ------------------------- |
| 2021-06-09        | 1.2     | Lakshmi Holla | Added URL in question 3   |
| 2020-11-10        | 1.1     | Malika Singla | Deleted the Optional part |
| 2020-08-27        | 1.0     | Malika Singla | Added lab to GitLab       |

<hr>

## <h3 align="center"> © IBM Corporation 2020. All rights reserved. </h3>

