# Web Scraping with Python

## 1. What is Web Scraping?

- Web Scraping is the act of extracting content and data from a website. 

- Websites are built using Hypertext Markup Language (HTML) codes which web scraping codes or web scrapers can download objects from. 

- Python is a powerful tool that allows you to use code to web scrape a website.

## 2. Install the Necessary Libraries

In [None]:
!pip install pandas
!pip install bs4
!pip install requests

## 3. Import the Libraries

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

## 4. Understand the Website

The website I'll use in this tutorial is [trendyol](https://www.trendyol.com/). I want to exract laptop information such as price, brand, ratingCount in this website.

## 5. Understand the URL

Understanding how to interact with the URL is important to web scraping. The url I'll use is [here](https://www.trendyol.com/sr?q=notebook&qt=notebook&st=notebook&os=1&pi=1)

I'm going to extract information first 10 pages of the website I get the above URL.

## 6. Create Empty Arrays

In [2]:
price = []
brand = []
ratingCount = []
info = []

## 7. Web Scraping

I'll use a for loop, which creates an element pgn that goes through the numbers 1 through 10. The next portion is creating a link, this is broken out into 2 parts of the URL, the last section populate after page = as we identified when researching the URL. Thus, each page will be read using the request statement requests and stored in res. 

Then the Beautiful Soup package will give us a way to interact with the HTML from the URL and store this in soup. Next, is a series of for loops within our initial for loop: the first aspect of it locates the CSS Selector (note: SelectorGadget used), and inside the loop returns the result as text then appends to the array. The loop runs till it goes through the first 10 pages.

In [3]:
for pgn in range(1,10):  
    url = "https://www.trendyol.com/sr?q=notebook&qt=notebook&st=notebook&os=" + str(pgn)
    res = requests.get(url)
    soup = BeautifulSoup(res.text)    
    for brand_select in soup.select(".prdct-desc-cntnr-ttl"):
        brand.append(brand_select)
    for ratingCount_select in soup.select(".ratingCount"):
        ratingCount.append(ratingCount_select)
    for info_select in soup.select(".prdct-desc-cntnr-name"):
        info.append(info_select)   
    for price_select in soup.select(".prc-box-dscntd"):
        price.append(price_select)  

## 8. Creating the DataFrame

In [4]:
df=pd.DataFrame(columns=['Brand','Rating_Count', 'Info', 'Price'])
df['Brand']=pd.DataFrame(brand)
df['Rating_Count']=pd.DataFrame(ratingCount)
df['Info']=pd.DataFrame(info)
df['Price']=pd.DataFrame(price)
print(df.shape)
df.head(10)

(216, 4)


Unnamed: 0,Brand,Rating_Count,Info,Price
0,Dell,(13),"Latitude 3410 I5 10210u 14"" 8gb Win10 Pro Note...","19.330,07 TL"
1,ACER,(7),Aspire3 A315-56-327t Intel Core I3 1005g1 8gb ...,"5.779,55 TL"
2,LENOVO,(12),Ideapad Intel Core I5-1135g7 8gb/512gb Ssd 15....,"10.142,02 TL"
3,ASUS,(67),X515ma-ej490 Intel Celeron N4020 4gb 256gb Ssd...,"4.791,10 TL"
4,LENOVO,(38),V15 82c700ldtx Athlon Gold 3150u 8gb Ram 256gb...,"7.623,42 TL"
5,Huawei,(41),Matebook 14 AMD Ryzen 5 4600H 8GB Ram 512GB SS...,13.999 TL
6,LENOVO,(60),Yoga 9 82bg007ptx I7-1185g7 16gb Ram 1tb Ssd I...,31.999 TL
7,HP,(17),Pavilion 14-dv0012nt Core I5 1135g7 8gb 256gb ...,13.999 TL
8,ASUS,(5),Laptop X515ja-br1968t I3-1005g1 4gb Ram 256gb ...,6.798 TL
9,ACER,(23),Aspire A314-21 Amd A4-9120 4gb Ram 128gb Ssd 1...,3.900 TL
