# Extracting Amazon Stock data

A share of a company's stock is a piece of the company. More precisely:

A stock (also known as equity) is a security that represents the ownership of a fraction of a corporation. This entitles the owner of the stock to a proportion of the corporation's assets and profits equal to how much stock they own. Units of stock are called "shares."

An investor can buy a stock and sell it later. If the stock price increases, the investor profits; if it decreases, the investor will incur a loss. Determining the stock price is complex; it depends on the number of outstanding shares, the size of the company's future profits, and much more. People trade stocks throughout the day. The stock ticker is a report of the price of a certain stock, updated continuously throughout the trading session by the various stock market exchanges.

You are a data scientist working for a hedge fund; it's your job to detect any suspicious stock activity. In this notebook you will extract stock data using Python libraries. We will use the requests library to download the web page, BeautifulSoup to parse its HTML, and pandas to hold the resulting table in a DataFrame. Here, I am choosing Amazon stock data for web scraping.

# INSTALLING LIBRARIES

In [2]:
#!pip install pandas==1.3.3
#!pip install requests==2.26.0
# mamba is not available in this environment. BeautifulSoup and html5lib were
# already installed, so their pip equivalents are left commented out as well.
#!pip install beautifulsoup4==4.10.0
#!pip install html5lib==1.1
!pip install lxml==4.6.4
#!pip install plotly==5.3.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting lxml==4.6.4
  Downloading lxml-4.6.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 59.8 MB/s eta 0:00:00
Installing collected packages: lxml
  Attempting uninstall: lxml
    Found existing installation: lxml 4.9.2
    Uninstalling lxml-4.9.2:
      Successfully uninstalled lxml-4.9.2
Successfully installed lxml-4.6.4


# IMPORTING LIBRARIES

In [3]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# DOWNLOADING WEBPAGE

In [4]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/amazon_data_webpage.html"

data  = requests.get(url).text
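
The call above assumes the download always succeeds. A slightly more defensive variant might look like the sketch below. The helper name `fetch_html` and its 10-second `timeout` default are assumptions made for illustration and are not part of the original notebook; the sketch relies only on `requests.get`, `Response.raise_for_status`, and `Response.text`.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text (hypothetical helper)."""
    # Without a timeout, requests may wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of returning an error page.
    response.raise_for_status()
    return response.text


# Usage in this notebook would be:
# data = fetch_html(url)
```

Failing early here keeps an HTTP error page from being handed to the parser, where it would surface later as a confusing "table not found" problem.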

# TEXT INTO HTML

In [5]:
soup = BeautifulSoup(data, 'html5lib')
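
To get a feel for what the `soup` object offers before working on the real page, the sketch below parses a tiny hand-written document. The HTML string is made up for illustration, and it uses Python's built-in `html.parser` rather than `html5lib` so it runs without extra packages. One difference worth knowing: `html5lib` inserts a `<tbody>` element when the markup omits it, while `html.parser` does not, so the sketch guards against `find` returning `None`.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page (illustrative only, not the real Amazon page).
html = """
<html><head><title>Demo prices</title></head>
<body>
<table>
  <tbody>
    <tr><td>Jan 01, 2021</td><td>3270.00</td></tr>
    <tr><td>Dec 01, 2020</td><td>3188.50</td></tr>
  </tbody>
</table>
</body></html>
"""

# "html.parser" ships with Python; the notebook itself uses "html5lib".
soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.text   # text inside the <title> tag
tbody = soup.find("tbody")     # first <tbody> element, or None if absent
rows = tbody.find_all("tr") if tbody is not None else []
first_row = [td.text for td in rows[0].find_all("td")]

print(page_title)   # Demo prices
print(len(rows))    # 2
print(first_row)    # ['Jan 01, 2021', '3270.00']
```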

# HTML TABLE INTO DATAFRAME

In [6]:
# First we isolate the body of the table, which contains all the information.
# Then we loop through each row and collect the column values for that row.
rows = []
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    rows.append({
        "Date": col[0].text,
        "Open": col[1].text,
        "High": col[2].text,
        "Low": col[3].text,
        "Close": col[4].text,
        "Adj Close": col[5].text,
        "Volume": col[6].text,
    })

# Finally we build the DataFrame in one step. DataFrame.append was removed in
# pandas 2.0, and appending row by row copies the whole frame on every iteration.
# "Adj Close" is listed last to match the column order of the output below.
amazon_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"])

# EXTRACTED DATAFRAME

In [7]:
amazon_data.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Adj Close
0,"Jan 01, 2021",3270.0,3363.89,3086.0,3206.2,71528900,3206.2
1,"Dec 01, 2020",3188.5,3350.65,3072.82,3256.93,77556200,3256.93
2,"Nov 01, 2020",3061.74,3366.8,2950.12,3168.04,90810500,3168.04
3,"Oct 01, 2020",3208.0,3496.24,3019.0,3036.15,116226100,3036.15
4,"Sep 01, 2020",3489.58,3552.25,2871.0,3148.73,115899300,3148.73


# HTML TABLE INTO DATAFRAME USING READ_HTML

In [8]:
read_html_pandas_data = pd.read_html(url)

In [9]:
amazon_dataframe = read_html_pandas_data[0]

amazon_dataframe.head()

Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,"Jan 01, 2021",3270.0,3363.89,3086.0,3206.2,3206.2,71528900
1,"Dec 01, 2020",3188.5,3350.65,3072.82,3256.93,3256.93,77556200
2,"Nov 01, 2020",3061.74,3366.8,2950.12,3168.04,3168.04,90810500
3,"Oct 01, 2020",3208.0,3496.24,3019.0,3036.15,3036.15,116226100
4,"Sep 01, 2020",3489.58,3552.25,2871.0,3148.73,3148.73,115899300
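
`pd.read_html` returns a list with one DataFrame per `<table>` it finds, which is why the cell above indexes `[0]`. The sketch below shows this on a made-up two-table document (the HTML and the `AMZN` row are illustrative assumptions). It wraps the markup in `StringIO` because newer pandas versions deprecate passing literal HTML strings, and it uses the `match` parameter to keep only tables containing a given text.

```python
from io import StringIO

import pandas as pd

# Two small hand-written tables (illustrative only).
html = """
<table>
  <thead><tr><th>Date</th><th>Close</th></tr></thead>
  <tbody>
    <tr><td>Jan 01, 2021</td><td>3206.20</td></tr>
    <tr><td>Dec 01, 2020</td><td>3256.93</td></tr>
  </tbody>
</table>
<table>
  <thead><tr><th>Ticker</th><th>Name</th></tr></thead>
  <tbody>
    <tr><td>AMZN</td><td>Amazon</td></tr>
  </tbody>
</table>
"""

# One DataFrame per <table>, in document order.
tables = pd.read_html(StringIO(html))
prices = tables[0]

# `match` keeps only the tables whose text matches the given string or regex.
only_ticker = pd.read_html(StringIO(html), match="Ticker")

print(len(tables))
print(prices)
print(len(only_ticker))
```

On a page with many tables, `match` is usually less brittle than a positional index, since the position changes whenever the page layout does.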


In [10]:
amazon_dataframe

Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,"Jan 01, 2021",3270.00,3363.89,3086.00,3206.20,3206.20,71528900
1,"Dec 01, 2020",3188.50,3350.65,3072.82,3256.93,3256.93,77556200
2,"Nov 01, 2020",3061.74,3366.80,2950.12,3168.04,3168.04,90810500
3,"Oct 01, 2020",3208.00,3496.24,3019.00,3036.15,3036.15,116226100
4,"Sep 01, 2020",3489.58,3552.25,2871.00,3148.73,3148.73,115899300
...,...,...,...,...,...,...,...
57,"Apr 01, 2016",590.49,669.98,585.25,659.59,659.59,78464200
58,"Mar 01, 2016",556.29,603.24,538.58,593.64,593.64,94009500
59,"Feb 01, 2016",578.15,581.80,474.00,552.52,552.52,124144800
60,"Jan 01, 2016",656.29,657.72,547.18,587.00,587.00,130200900


# CHECKING DATAFRAME

In [11]:
amazon_dataframe.head(1)

Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,"Jan 01, 2021",3270.0,3363.89,3086.0,3206.2,3206.2,71528900


In [12]:
amazon_dataframe.tail(1)

Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
61,*Close price adjusted for splits.**Adjusted cl...,*Close price adjusted for splits.**Adjusted cl...,*Close price adjusted for splits.**Adjusted cl...,*Close price adjusted for splits.**Adjusted cl...,*Close price adjusted for splits.**Adjusted cl...,*Close price adjusted for splits.**Adjusted cl...,*Close price adjusted for splits.**Adjusted cl...


In [13]:
amazon_dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Date         62 non-null     object
 1   Open         62 non-null     object
 2   High         62 non-null     object
 3   Low          62 non-null     object
 4   Close*       62 non-null     object
 5   Adj Close**  62 non-null     object
 6   Volume       62 non-null     object
dtypes: object(7)
memory usage: 3.5+ KB
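
The `tail(1)` and `info()` output above point to two issues: the last row holds the table's footnote rather than data, and because of it every column is stored as `object` (text). One possible cleanup is sketched below on a small stand-in frame. The frame `raw`, the shortened `FOOTNOTE` string, and the name `clean` are assumptions for illustration; the same steps should carry over to `amazon_dataframe` if its values look like the ones shown.

```python
import pandas as pd

# Shortened stand-in for the footnote text that read_html repeats across the last row.
FOOTNOTE = "*Close price adjusted for splits."

# Stand-in for the frame returned by read_html: every value is text.
raw = pd.DataFrame({
    "Date": ["Jan 01, 2021", "Dec 01, 2020", FOOTNOTE],
    "Open": ["3270.00", "3188.50", FOOTNOTE],
    "Close*": ["3206.20", "3256.93", FOOTNOTE],
    "Adj Close**": ["3206.20", "3256.93", FOOTNOTE],
    "Volume": ["71528900", "77556200", FOOTNOTE],
})

# Values that do not parse as dates (the footnote row) become NaT and are dropped.
dates = pd.to_datetime(raw["Date"], format="%b %d, %Y", errors="coerce")
clean = raw.loc[dates.notna()].copy()
clean["Date"] = dates[dates.notna()]

# Strip the footnote asterisks from the column names.
clean.columns = clean.columns.str.rstrip("*")

# Convert the remaining text columns to numbers.
for name in ["Open", "Close", "Adj Close", "Volume"]:
    clean[name] = pd.to_numeric(clean[name])

print(clean.dtypes)
```

Filtering on whether the date parses, rather than dropping the last row by position, means the step still does the right thing if the footnote row is absent.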


# SAVING DATAFRAME

In [14]:
amazon_dataframe.to_csv('amazon stock.csv',index=False)
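
The `index=False` argument matters when the file is read back. The sketch below writes a small made-up frame to a temporary directory both ways and compares the columns after reloading; the file name `demo stock.csv` and the frame `df` are illustrative assumptions.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Date": ["Jan 01, 2021", "Dec 01, 2020"],
    "Volume": [71528900, 77556200],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo stock.csv")

    # Written without the index: the file contains only the data columns.
    df.to_csv(path, index=False)
    without_index = pd.read_csv(path)

    # Written with the index (the default): it comes back as an extra column.
    df.to_csv(path)
    with_index = pd.read_csv(path)

print(list(without_index.columns))  # ['Date', 'Volume']
print(list(with_index.columns))     # ['Unnamed: 0', 'Date', 'Volume']
```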