# Web scrapping with selenium

We are trying to scrape data from of televisions from [daraz](https://www.daraz.com.bd/televisions/?spm=a2a0e.home.cate_7.1.735212f70hPqYs)

First we install and load the required libraries

In [None]:
%pip install selenium
%pip install webdriver_manager
%pip install pandas

In [1]:
from selenium import webdriver
from time import sleep
import pandas as pd

## Here we go
Here we are doing for one page to check it it works correctly.

In [3]:
from datetime import date
today = str(date.today())
print(today)

2023-12-22


In [2]:
driver = webdriver.Chrome()

In [4]:
url ="https://www.daraz.com.bd/televisions/?spm=a2a0e.home.cate_7.1.735212f70hPqYs"

In [5]:
driver.get(url)

Here we try to find information using `xpath`.

In [6]:
tvs = driver.find_elements('xpath','//div[@data-spm="sku"]/div') 
all_tvs = []
for tv_idx , tv in enumerate(tvs,1):
    title = driver.find_element('xpath',f'(//div[@class="title--wFj93"]/a)[{tv_idx}]')
    dis_price = driver.find_element('xpath',f'(//div[@class="price--NVB62"]/span)[{tv_idx}]')
    try:
        pre_price = driver.find_element('xpath', f'(//del[@class="currency--GVKjl"])[{tv_idx}]').text
    except:
        pre_price = " "    
    data = {
        'title' : title.text,
        "url" : title.get_attribute('href'),
        'DIS_price' : dis_price.text,
        'pre_price' : pre_price
        
    }
    all_tvs.append(data)


The code work nicely.

In [8]:
df = pd.DataFrame(all_tvs)
df.head()

Unnamed: 0,title,url,DIS_price,pre_price
0,SP 24 Inch ULTRA SLIM HD LED TV,https://www.daraz.com.bd/products/sp-24-inch-u...,"৳ 7,800","৳ 10,500"
1,Sony Plus 32'' 4K Supported Android Smart Tele...,https://www.daraz.com.bd/products/sony-plus-32...,"৳ 16,450","৳ 19,900"
2,SONY PLUS 43 android version 9.0 Ram 2GB/16GB ...,https://www.daraz.com.bd/products/sony-plus-43...,"৳ 19,989","৳ 33,000"
3,Exceptional Quality - 24 Inch Hd LED Televisio...,https://www.daraz.com.bd/products/exceptional-...,"৳ 8,200","৳ 8,900"
4,"Haier 32"" Android 11 HD Smart LED TV (H32K66GH...",https://www.daraz.com.bd/products/haier-32-and...,"৳ 28,900","৳ 18,990"


* DIS_price is price after discount and 
* pre_price is actual price.

## Pagination

Now we are trying to find common patterns for pagination. So, we just paste few pages link to observe the common pattern.

In [None]:
https://www.daraz.com.bd/televisions/?spm=a2a0e.searchlistcategory.pagination.1.752e3c08muG6MS
https://www.daraz.com.bd/televisions/?page=2&spm=a2a0e.searchlistcategory.pagination.1.752e3c08muG6MS
https://www.daraz.com.bd/televisions/?spm=a2a0e.searchlistcategory.pagination.3.68183c08F58cJk&page=3
https://www.daraz.com.bd/televisions/?spm=a2a0e.searchlistcategory.pagination.4.6ba73c08Y7tHn5&page=4
https://www.daraz.com.bd/televisions/?spm=a2a0e.searchlistcategory.pagination.5.54f43c08l5EoQq&page=5


Now we are trying to get  values from all the pages.

In [22]:
all_tvs = []
for page in range(1,46):
    url = f'https://www.daraz.com.bd/televisions/?spm=a2a0e.searchlistcategory.pagination.5.54f43c08l5EoQq&page={page}' 
    driver.get(url)
    sleep(5)
    tvs = driver.find_elements('xpath','//div[@data-spm="sku"]/div') 
    for tv_idx , tv in enumerate(tvs,1):
        title = driver.find_element('xpath',f'(//div[@class="title--wFj93"]/a)[{tv_idx}]')
        dis_price = driver.find_element('xpath',f'(//div[@class="price--NVB62"]/span)[{tv_idx}]')
        try:
            pre_price = driver.find_element('xpath', f'(//del[@class="currency--GVKjl"])[{tv_idx}]').text
        except:
            pre_price = " "    
        data = {
            'title' : title.text,
            "url" : title.get_attribute('href'),
            'DIS_price' : dis_price.text,
            'pre_price' : pre_price
            
        }
        all_tvs.append(data)

In [23]:
df = pd.DataFrame(all_tvs)
df.to_csv("darza_Televisions.csv", index = False)

In [25]:
df.shape

(1800, 4)

We can see that we have got 1800 values from 45 pages.