<a href="https://www.kaggle.com/code/mubasherbajwa/web-scraping-stock-data-using-beautifulsoup?scriptVersionId=202334744" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 1.0 Introduction
Web scraping is the process of extracting data from websites and storing it in a structured form, such as a table. In today’s data-driven world, where the amount of data is growing rapidly, the web is one of the largest repositories of data. With web scraping, large amounts of data can be collected in a short time.

**Libraries used:**

**Requests:**
The requests library lets Python send HTTP requests to a given URL. Each call returns a `Response` object containing the status code, headers, and body of the server's reply.

**BeautifulSoup:**
BeautifulSoup is a Python library that parses HTML and XML documents and makes it easy to pull data out of them.

**Pandas:**
Pandas is an open-source library for data manipulation and analysis. It provides data structures such as `DataFrame` and `Series`, along with functions for working with them.

# 2.0 Setting Up
To get started, we first import the required libraries: requests, beautifulsoup4 (imported as `bs4`), and pandas.

In [1]:
from requests import get
from bs4 import BeautifulSoup as bs
import pandas as pd

# 3.0 Making an HTTP request to a server
When you visit a website in a browser, the browser sends an HTTP request to the web server, and the server returns the page in its response. A web scraper has to do the same thing before it can read the page. In Python, this is done with `requests.get` (imported here as `get`). The website used to obtain the Nasdaq stock data is ***investing.com***.

In [2]:
url="https://www.investing.com/indices/nasdaq-composite-historical-data"
response=get(url)
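
The request above can also be made a little more defensive. The following is a minimal sketch, not the notebook's original method: the `fetch_html` helper, the `User-Agent` value, and the 10-second timeout are assumptions chosen for illustration. Some sites may reject the default `python-requests` User-Agent, and `raise_for_status()` turns a 4xx/5xx reply into an exception instead of silently returning an error page.

```python
import requests

URL = "https://www.investing.com/indices/nasdaq-composite-historical-data"

# Hypothetical header value (an assumption): a browser-like User-Agent string.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper/0.1)"}


def fetch_html(url, timeout=10):
    """Return the page HTML, raising requests.HTTPError on a 4xx/5xx response."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


# Build the request without sending it, to inspect what would go over the wire.
prepared = requests.Request("GET", URL, headers=HEADERS).prepare()
```

Calling `fetch_html(URL)` performs the actual download; `prepared.headers` shows the headers that such a request would carry.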

# 4.0 Extracting the website’s code
Once we have the server's response, we parse its HTML with the BeautifulSoup library. The parsed HTML describes the structure of the page's content.

In [3]:
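# Name the parser explicitly; otherwise BeautifulSoup guesses one and emits a warning.
soup = bs(response.text, "html.parser")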
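
To see what the parsed object offers, here is a small sketch that runs without network access. The inline `html` string is a hypothetical stand-in for `response.text`, and `"html.parser"` is Python's built-in parser, chosen here as an assumption so that no extra dependency is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text, so the snippet runs offline.
html = "<html><body><h1>Nasdaq</h1><p class='note'>Historical data</p></body></html>"

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text(strip=True) returns its trimmed text.
heading = soup.find("h1").get_text(strip=True)
note = soup.find("p", class_="note").get_text(strip=True)
```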

# 5.0 Extract Data from the html content of the website
First, we look at the page's HTML. Open the URL in a browser, right-click on the page, and choose the Inspect option.

We then extract the first `<table>` element from the page, which gives the following output:

In [4]:
table = soup.find("table")
table

<table class="freeze-column-w-1 w-full overflow-x-auto text-xs leading-4"><thead class="relative after:absolute after:bottom-0 after:left-0 after:right-0 after:h-px after:bg-[#ECEDEF]"><tr class="h-[41px]"><th class="datatable-v2_cell__IwP1U sticky left-0 min-w-[106px] bg-white text-left font-semibold text-v2-black"><div class="datatable-v2_cell__wrapper__7O0wk !block md:!inline-flex"><button class="relative inline-flex items-center justify-center whitespace-nowrap rounded-sm p-1.5 text-xs font-bold leading-tight no-underline disabled:bg-inv-grey-50 disabled:text-inv-grey-400 text-inv-grey-700 datatable-v2_sort__oHEK5 !font-semibold !leading-4 text-[#181C21]" type="button"><span>Date</span><span class="flex flex-col"><svg aria-hidden="true" class="datatable-v2_sort-icon__lTndM datatable-v2_sort-icon-up__hKaZQ" fill="none" style="height:auto" viewbox="0 0 24 24" width="1em"><path d="m1 6 11 11L23 6H1Z" fill="currentColor"></path></svg><svg aria-hidden="true" class="datatable-v2_sort-ico
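
`soup.find("table")` returns `None` when the page contains no `<table>` element, for example if the request was blocked or the table is rendered by JavaScript. A minimal sketch of a guard follows; the `find_table` helper and the inline sample HTML are hypothetical and only illustrate the check.

```python
from bs4 import BeautifulSoup


def find_table(html):
    """Return the first <table> tag in html, or raise ValueError if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the page")
    return table


# Hypothetical inline HTML standing in for the downloaded page.
sample_html = """
<table>
  <thead><tr><th>Date</th><th>Price</th></tr></thead>
  <tbody><tr><td>Oct 18, 2024</td><td>18489.55</td></tr></tbody>
</table>
"""

sample_table = find_table(sample_html)
headers = [th.get_text(strip=True) for th in sample_table.find_all("th")]
```

Failing early with a clear message is easier to debug than the `AttributeError` that a later `.find("thead")` call on `None` would raise.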

Next, we extract the column headers and rows of the table and convert them into a DataFrame.

In [5]:
# Finds the <thead> element within the HTML table, which typically contains the table headers.
thead = table.find("thead")

# Finds all <th> elements within the <thead> element, representing the table columns.
cols = thead.find_all("th")

# Creates a list of column names by extracting the text content from each <th> element and removing leading/trailing whitespace.
columns = [a.text.strip() for a in cols]

# Finds the <tbody> element within the HTML table, which contains the table data.
tbody = table.find("tbody")

# Finds all <tr> elements within the <tbody> element, representing the table rows.
rows = tbody.find_all("tr")

data=[]

# Iterates over each row in the table.
for row in rows:
    
    # Finds all <td> elements within the current row, representing the table cells.
    row_ele = row.find_all("td")
    
    # Extracts the text content from each <td> element and creates a list of cell values.
    row_ele = [b.text.strip() for b in row_ele]
    
    # Creates a dictionary where the keys are the column names and the values are the corresponding cell values from the current row.
    record = dict(zip(columns, row_ele))
    
    # Adds a new key-value pair to the dictionary, setting the "Stock Name" column to "NASDAQ".
    record['Stock Name'] = 'NASDAQ'
    data.append(record)

# Builds a DataFrame from the list of row dictionaries and shows the first 10 rows.
Data = pd.DataFrame(data)
Data.head(10)

|   | Date | Price | Open | High | Low | Vol. | Change % | Stock Name |
|---|------|-------|------|------|-----|------|----------|------------|
| 0 | Oct 18, 2024 | 18489.55 | 18466.01 | 18524.33 | 18452.58 | 1.01B | +0.63% | NASDAQ |
| 1 | Oct 17, 2024 | 18373.61 | 18537.21 | 18541.46 | 18368.79 | 1.03B | +0.04% | NASDAQ |
| 2 | Oct 16, 2024 | 18367.08 | 18333.29 | 18383.11 | 18214.96 | 1.01B | +0.28% | NASDAQ |
| 3 | Oct 15, 2024 | 18315.59 | 18515.97 | 18564.25 | 18252.52 | 1.15B | -1.01% | NASDAQ |
| 4 | Oct 14, 2024 | 18502.69 | 18426.66 | 18547.91 | 18423.6 | 901.62M | +0.87% | NASDAQ |
| 5 | Oct 11, 2024 | 18342.94 | 18217.73 | 18375.53 | 18208.44 | 912.77M | +0.33% | NASDAQ |
| 6 | Oct 10, 2024 | 18282.05 | 18200.62 | 18333.39 | 18154.18 | 957.62M | -0.05% | NASDAQ |
| 7 | Oct 09, 2024 | 18291.62 | 18179.22 | 18302.05 | 18133.02 | 898.50M | +0.60% | NASDAQ |
| 8 | Oct 08, 2024 | 18182.92 | 18017.93 | 18203.04 | 17989.7 | 944.80M | +1.45% | NASDAQ |
| 9 | Oct 07, 2024 | 17923.9 | 18080.12 | 18096.33 | 17900.04 | 1.01B | -1.18% | NASDAQ |
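
Every scraped cell is a plain string, so a common follow-up is to convert the columns to dates and numbers before analysis. The sketch below is one possible approach under stated assumptions, not part of the original notebook: the helper names `parse_volume` and `clean` are hypothetical, the suffix table assumes volumes use `K`, `M`, or `B`, and the two sample rows are copied from the output above.

```python
import pandas as pd

# Two sample rows copied from the output above, kept as strings like scraped text.
sample = pd.DataFrame({
    "Date": ["Oct 18, 2024", "Oct 14, 2024"],
    "Price": ["18489.55", "18502.69"],
    "Vol.": ["1.01B", "901.62M"],
    "Change %": ["+0.63%", "+0.87%"],
})

# Assumed multipliers for the volume suffixes.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_volume(text):
    """Convert strings such as '901.62M' or '1.01B' to a float."""
    text = text.strip().replace(",", "")
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def clean(df):
    """Return a copy of df with typed Date, Price, Vol. and Change % columns."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], format="%b %d, %Y")
    # Drop thousands separators, if any, before converting to float.
    out["Price"] = out["Price"].map(lambda s: float(s.replace(",", "")))
    out["Vol."] = out["Vol."].map(parse_volume)
    # Strip the trailing percent sign; float() accepts the leading '+' or '-'.
    out["Change %"] = out["Change %"].map(lambda s: float(s.rstrip("%")))
    return out


cleaned = clean(sample)
```

The same `clean` function could be applied to the scraped `Data` frame, provided its column names match the ones used here.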


# 6.0 Conclusion
This concludes our walkthrough of web scraping with BeautifulSoup in Python. Web scraping is a powerful way to extract data from websites, especially for data scientists.