Subscribe for more [Be.Analyst](https://youtube.com/@Be.Analyst) 😀

# Web Scraping with Python

## 1. What is Web Scraping?

- Web Scraping is the act of extracting content and data from a website. 

- Websites are built using Hypertext Markup Language (HTML) codes which web scraping codes or web scrapers can download objects from. 

- Python is a powerful tool that allows you to use code to web scrape a website.

## 2. Install the Necessary Libraries

In [1]:
!pip install pandas
!pip install bs4
!pip install requests

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py): started
  Building wheel for bs4 (setup.py): finished with status 'done'
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1264 sha256=ea1604c0f848296d840aeca88d612dbaa80468a7bf89bd6f7200a0adbdfbabba
  Stored in directory: c:\users\durge\appdata\local\pip\cache\wheels\e4\62\1d\d4d1bc4f33350ff84227f89b258edb552d604138e3739f5c83
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


## 3. Import the Libraries

In [2]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

## 4. Understand the Website

The website I'll use in this tutorial is [trendyol](https://www.trendyol.com/). I want to exract laptop information such as price, brand, ratingCount in this website.

## 5. Understand the URL

Understanding how to interact with the URL is important to web scraping. The url I'll use is [here](https://www.trendyol.com/sr?q=notebook&qt=notebook&st=notebook&os=1&pi=1)

I'm going to extract information first 10 pages of the website I get the above URL.

## 6. Create Empty Arrays

In [3]:
price = []
brand = []
ratingCount = []
info = []

## 7. Web Scraping

I'll use a for loop, which creates an element pgn that goes through the numbers 1 through 10. The next portion is creating a link, this is broken out into 2 parts of the URL, the last section populate after page = as we identified when researching the URL. Thus, each page will be read using the request statement requests and stored in res. 

Then the Beautiful Soup package will give us a way to interact with the HTML from the URL and store this in soup. Next, is a series of for loops within our initial for loop: the first aspect of it locates the CSS Selector (note: SelectorGadget used), and inside the loop returns the result as text then appends to the array. The loop runs till it goes through the first 10 pages.

In [4]:
for pgn in range(1,10):  
    url = "https://www.trendyol.com/sr?q=notebook&qt=notebook&st=notebook&os=" + str(pgn)
    res = requests.get(url)
    soup = BeautifulSoup(res.text)    
    for brand_select in soup.select(".prdct-desc-cntnr-ttl"):
        brand.append(brand_select)
    for ratingCount_select in soup.select(".ratingCount"):
        ratingCount.append(ratingCount_select)
    for info_select in soup.select(".prdct-desc-cntnr-name"):
        info.append(info_select)   
    for price_select in soup.select(".prc-box-dscntd"):
        price.append(price_select)  

## 8. Creating the DataFrame

In [5]:
df=pd.DataFrame(columns=['Brand','Rating_Count', 'Info', 'Price'])
df['Brand']=pd.DataFrame(brand)
df['Rating_Count']=pd.DataFrame(ratingCount)
df['Info']=pd.DataFrame(info)
df['Price']=pd.DataFrame(price)
print(df.shape)
df.head(10)

(216, 4)


Unnamed: 0,Brand,Rating_Count,Info,Price
0,ASUS,(65),X515ea-bq2293w Intel Core I3-1115g4 4 Gb Ram 1...,6.959 TL
1,ASUS,(6),X515ja-ej3064w Fhd I3-1005g1 4g 256 Pcie Share...,"7.024,90 TL"
2,ACER,(6),Aspıre 3 A315-510 Intel I3 8gb Ram 256ssd 15.6...,7.745 TL
3,ACER,(3),Nitro 5 An515-45-r423 Notebook (nh.qbaey.005),17.499 TL
4,ACER,(45),Aspire3 A315-56-33zg Intel Core I3 1005g1 4gb ...,6.794 TL
5,ASUS,(1),"X515ea-bq868 I3-1115g4 8 Gb 256 Gb Ssd 15.6"" F...",8.499 TL
6,HP,(46),Pavilion 14-dv0012nt Core I5 1135g7 8gb 256gb ...,13.999 TL
7,LENOVO,(35),IdeaPad Gaming3 Ryzen7 5800H 16GB 512GB SSD RT...,25.999 TL
8,ASUS,(41),"X515ea-bq868 I3-1115g4 8 Gb 512 Gb Ssd 15.6"" F...",8.799 TL
9,Huawei,(16),Matebook D15 I3 10110u 8gb 256gb Ssd W10 Home ...,12.685 TL


Thank for reading 😀

Don't forget to follow us on [YouTube](http://youtube.com/@Be.Analyst) | [Medium](https://medium.com/@durgeshanalyst) | [Twitter](https://twitter.com/DurgeshBR?t=2LDCN4pHkZOYIo3rMXvKnw&s=09) | [GitHub](http://github.com/durgeshanalyst) | [Linkedin](https://www.linkedin.com/in/durgeshanalyst/) | [Kaggle](https://www.kaggle.com/durgeshanalyst) 😎