# Project 3: Web Scraping Laptops from Jarir Bookstore

In this project, I scraped laptop data from the Jarir Bookstore website in Saudi Arabia (https://www.jarir.com/sa-en/computers-peripherals/laptops.html). The extracted data was saved to a local file on my computer, then cleaned and published on Kaggle: https://www.kaggle.com/aljadelruayshid/jarir-laptop


## Problem Statement

The main purpose of this study is to identify the attributes that affect laptop prices, such as brand, graphics_card, display_size, operating_system, color, width, height, weight, depth, generation/release, capacity, operating_system_architecture, RAM, touch_display, and fingerprint, and to find the factors that influence consumer preferences and the buying decision process. It also aims to increase awareness of laptop brands in the market.

## Data Dictionary

|Feature|Type|Description|
|---|---|---|
|Item Title|Object|Laptop model name, e.g. MateBook D Volta, Inspiron 15 3567|
|Brand|Object|Company that makes the laptop|
|Price|Float|Price of the laptop|
|Graphics Card|Object|Graphics card model, e.g. Intel UHD Graphics 617|
|Display Size|Object|Size of the laptop screen|
|Operating System|Object|Operating system, e.g. Windows or macOS Mojave|
|Color|Object|Color of the laptop|
|Width|Object|Width of the laptop|
|Height|Object|Height of the laptop|
|Weight|Object|Weight of the laptop|
|Depth|Object|Depth of the laptop|
|Generation/Release|Object|Release year|
|Capacity|Object|Storage capacity, e.g. 256 GB SSD|
|Operating System Architecture|Object|Operating system architecture, e.g. 64 bit|
|RAM|Object|Installed RAM, e.g. 8 GB RAM|
|Touch Display|Object|Whether the laptop has a touch display|
|Fingerprint|Object|Whether the laptop has a fingerprint reader|


In [2]:
from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd


In [3]:
df = pd.DataFrame()

In [4]:
features = [
    'item_title',
    'brand',
    'price',
    'graphics_card',
    'display_size',
    'operating_system',
    'color',
    'width',
    'height',
    'weight',
    'depth',
    'generation/release',
    'capacity',
    'operating_system_architecture',
    'RAM',
    'touch_display',
    'fingerprint'
    
]
pages_count = 20  # maximum number of scroll-and-load iterations
driver = webdriver.Chrome(executable_path='chromedriver/chromedriver')  # start the browser
driver.get('https://www.jarir.com/sa-en/computers-peripherals/laptops.html')  # open the URL in the started browser
sleep(3)

In [5]:
# scroll down up to 20 times, clicking the "load more" button each time
for i in range(pages_count):
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1000)")
    sleep(1)
    try:
        button = driver.find_elements_by_class_name('amscroll-load-button')[0]
        button.click()
    except IndexError:  # the button no longer exists (all pages are loaded)
        break  # stop the loop
    sleep(2)
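
The `find_elements_by_class_name` helper and the `executable_path` argument used above were removed in later Selenium 4 releases. Below is a minimal sketch of the same scroll-and-click loop for newer versions. It assumes a `driver` object that exposes `execute_script` and `find_elements(by, value)`. The function name `load_all_products` and its parameters are hypothetical, and the locator string `"class name"` is the value of Selenium's `By.CLASS_NAME`, written out so the snippet needs no Selenium import of its own.

```python
from time import sleep


def load_all_products(driver, max_clicks=20, pause=1.0):
    """Scroll down and click the "load more" button until it disappears.

    Returns the number of times the button was clicked.
    """
    clicks = 0
    for _ in range(max_clicks):
        # Scroll near the bottom so the button is rendered and visible
        driver.execute_script(
            "window.scrollTo(0, document.body.scrollHeight - 1000)"
        )
        sleep(pause)
        # "class name" is the string value of selenium's By.CLASS_NAME
        buttons = driver.find_elements("class name", "amscroll-load-button")
        if not buttons:  # no button left, so every page has been loaded
            break
        buttons[0].click()
        clicks += 1
        sleep(pause)
    return clicks
```

Checking for an empty list instead of catching `IndexError` keeps the stop condition explicit, and the returned click count makes it easy to see whether the loop ended because the pages ran out or because `max_clicks` was reached.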

In [6]:
html = driver.page_source  # get the HTML source to pass to bs4

In [7]:
response = BeautifulSoup(html, 'html.parser')  # parse the HTML with bs4

In [8]:
items = response.find_all('li', attrs={'class': 'item'})  # find all <li> elements with class='item' (products)
len(items)

211

In [9]:
urls = [item.find_all('a')[0]['href'] for item in items]  # get the url of each product in items
urls[0:3]

['https://www.jarir.com/sa-en/computers-peripherals/laptops/huawei-matebook-d-laptops-512251.html',
 'https://www.jarir.com/sa-en/computers-peripherals/laptops/dell-inspiron-3567-laptops-513837.html',
 'https://www.jarir.com/sa-en/computers-peripherals/laptops/apple-macbook-air-laptops-516830.html']

In [10]:
def collect_product_data(url):
    print(url)  # log the product URL being scraped
    product_data = {}  # maps each feature name to its scraped value
    for f in features:  # start every feature as an empty string
        product_data[f] = ''
    driver.get(url)  # open the product page
    resp = BeautifulSoup(driver.page_source, 'html.parser')  # parse the product page with bs4
    # assign values to product keys
    product_data['brand'] = resp.find('div', attrs={'class': 'product-title__brand'}).text.strip('\n').strip()
    #product_data['title'] = resp.find('h1', attrs={'class': 'product-title__title'}).text.strip('\n').strip()
    try:
        product_data['price'] = round(float(resp.find('meta', attrs={'itemprop': 'price'})['content']), 2)
    # some products have no price <meta> tag (find() returns None), so we catch the resulting TypeError
    except TypeError:
        try:
            # get the price from the text of the HTML tag, not from its content attribute
            product_data['price'] = round(float(resp.find('div', attrs={'class': 'price'}).text.strip('\n').strip().replace('SR', '')), 2)
        # some products have no price at all, so we record it as '-'
        except AttributeError:
            product_data['price'] = '-'
    # the spec-table rows have no id attribute, so a specific field cannot be selected directly;
    # instead, read every row and keep those whose label is in the features list defined above
    rows_info = resp.find_all('tr', attrs={'class': 'table__row'})
    for i in rows_info:
        info_title = i.find('th').text.strip('\n').strip().replace(' ', '_')
        info_value = i.find('td').text.strip('\n').strip()
        if info_title in features:
            product_data[info_title] = info_value
    # return the dictionary holding all product info
    return product_data
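
The fallback branch above removes the `SR` label and passes the rest to `float`, which raises an uncaught `ValueError` if the text contains anything else, such as a thousands separator. The sketch below shows one way to make that step more tolerant. It assumes the price text may contain a currency label and comma separators (the page markup is not shown here, so this is an assumption), and the helper name `parse_price` is hypothetical.

```python
import re


def parse_price(text):
    """Return the first number in `text` as a float, or None if there is none."""
    if text is None:
        return None
    # Drop thousands separators, then look for an integer or decimal number
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return round(float(match.group()), 2) if match else None


print(parse_price("SR 3,799"))  # a label plus a thousands separator
print(parse_price("-"))         # no digits at all
```

With this helper a missing price comes back as `None` rather than the `'-'` placeholder, so the column needs no special case when it is later converted to float.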

In [11]:
def add_to_df(product):
    global df
    # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
    df = pd.concat([df, pd.DataFrame([product])], ignore_index=True)

In [12]:
for url in urls:
    add_to_df(collect_product_data(url))
    sleep(5)

https://www.jarir.com/sa-en/computers-peripherals/laptops/huawei-matebook-d-laptops-512251.html
https://www.jarir.com/sa-en/computers-peripherals/laptops/dell-inspiron-3567-laptops-513837.html
https://www.jarir.com/sa-en/computers-peripherals/laptops/apple-macbook-air-laptops-516830.html
https://www.jarir.com/sa-en/computers-peripherals/laptops/huawei-matebook-d-laptops-503385.html
https://www.jarir.com/sa-en/computers-peripherals/laptops/huawei-x-pro-laptops-503829.html
https://www.jarir.com/sa-en/computers-peripheral

http://www.jarir.com/sa-en/computers-peripherals/laptops/hp-notebook-15-laptops-534010.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/hp-notebook-14-laptops-534011.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/hp-notebook-14-laptops-534026.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-inspiron-14-3000-laptops-534434.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-inspiron-14-3000-laptops-534443.html
http://www.jarir.com/sa-en/computers-peripherals/lap

http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-xps-13-9380-laptops-523133.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/acer-aspire-5-a515-54g-laptops-529126.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-xps13-9360-laptops-511301.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-vostro-5471-laptops-511308.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-inspiron-13-5370-laptops-513844.html
http://www.jarir.com/sa-en/computers-per

http://www.jarir.com/sa-en/computers-peripherals/laptops/hp-stream-14-cb003nx-laptops-512459.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/apple-macbook-pro-laptops-470003.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/huawei-matebook-d-laptops-503787.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/hp-15-da0035nx-laptops-510092.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/apple-macbook-pro-laptops-484772.html
http://www.jarir.com/sa-en/computers-peripherals/l

http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-vostro-5481-laptops-517811.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/dell-xps-13-9360-laptops-517822.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/hp-15-cs2011nx-laptops-530835.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/asus-zenbook-ux430un-laptops-491771.html
http://www.jarir.com/sa-en/computers-peripherals/laptops/lenovo-330s-14ikb-laptops-512831.html
http://www.jarir.com/sa-en/computers-peripherals/lapto
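
The scraping loop in cell 12 stops at the first product page that raises an exception. For example, `collect_product_data` calls `.text` on the brand element, which fails if that element is missing. Below is a minimal sketch of a more forgiving loop. The names `scrape_all`, `fetch`, and `delay` are hypothetical, and `fetch` stands in for `collect_product_data`.

```python
from time import sleep


def scrape_all(urls, fetch, delay=5.0):
    """Call fetch(url) for each URL, collecting results and failures separately."""
    rows, failed = [], []
    for url in urls:
        try:
            rows.append(fetch(url))
        except Exception as exc:  # keep going, but record what went wrong
            failed.append((url, repr(exc)))
        sleep(delay)  # pause between requests to avoid hammering the site
    return rows, failed
```

A call such as `rows, failed = scrape_all(urls, collect_product_data)` returns a list of dictionaries that `pd.DataFrame(rows)` can turn into a table in one step, and `failed` lists the URLs worth retrying.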

In [13]:
df.to_csv('jarir-laptops.csv', encoding='utf-8-sig', index=False)
df.head()

Unnamed: 0,RAM,brand,capacity,color,depth,display_size,fingerprint,generation/release,graphics_card,height,item_title,operating_system,operating_system_architecture,price,touch_display,weight,width
0,8 GB RAM,Huawei,256 GB SSD,Silver,22.10 cm ( 8.70 in ),"14""",Yes,2018,NVIDIA GeForce MX150 (2 GB),1.49 cm ( .59 in ),MateBook D Volta,Windows 10,64 bit,3799,No,1.29 kg ( 2.84 lb ),32.30 cm ( 12.72 in )
1,4 GB RAM,Dell,1 TB HDD,Black,260.30 mm ( 10.25 in ),"15.6""",,2018,Intel HD Graphics 620,23.65 mm ( .93 in ),Inspiron 15 3567,Windows 10,64 bit,1849,No,2.30 kg ( 5.07 lb ),380.00 mm ( 14.96 in )
2,8 GB RAM,Apple,128 GB (PCIe SSD),Gold,21.24 cm ( 8.36 in ),"13.3""",Yes,2018,Intel UHD Graphics 617,1.56 cm ( .61 in ),MacBook Air (Retina),macOS Mojave,Not Specified,5199,No,1.25 kg ( 2.76 lb ),30.41 cm ( 11.97 in )
3,8 GB RAM,Huawei,128 GB SSD/1 TB HDD,Grey,23.90 cm ( 9.41 in ),"15.6""",,2018,NVIDIA GeForce MX150 (2 GB),1.69 cm ( .67 in ),MateBook D,Windows 10,64 bit,2799,No,1.90 kg ( 4.19 lb ),35.80 cm ( 14.09 in )
4,8 GB RAM,Huawei,256 GB NVMe M.2 SSD,Mystic Silver,21.70 cm ( 8.54 in ),"13.88""",Yes,2018,NVIDIA GeForce MX150 (2 GB),1.46 cm ( .57 in ),MateBook X Pro,Windows 10,64 bit,5999,Yes,1.29 kg ( 2.84 lb ),30.40 cm ( 11.97 in )


In [14]:
df['operating_system'].count()

211

In [15]:
driver.close()

In [128]:
df2 = df.copy()

In [129]:
df2.isnull().sum()

RAM                              0
brand                            0
capacity                         0
color                            0
depth                            0
display_size                     0
fingerprint                      0
generation/release               0
graphics_card                    0
height                           0
item_title                       0
operating_system                 0
operating_system_architecture    0
price                            0
touch_display                    0
weight                           0
width                            0
dtype: int64

In [130]:
df2.dtypes

RAM                              object
brand                            object
capacity                         object
color                            object
depth                            object
display_size                     object
fingerprint                      object
generation/release               object
graphics_card                    object
height                           object
item_title                       object
operating_system                 object
operating_system_architecture    object
price                            object
touch_display                    object
weight                           object
width                            object
dtype: object

In [131]:
for c in df2.columns:
    df2.loc[df2[c] == '', c] = 'NA'  # mark empty strings as missing
df2.loc[df2['price'] == '-', 'price'] = 0  # products without a price get 0
df2.loc[df2['height'] == 'NA', 'height'] = 0  # missing heights get 0

In [132]:
df2

Unnamed: 0,RAM,brand,capacity,color,depth,display_size,fingerprint,generation/release,graphics_card,height,item_title,operating_system,operating_system_architecture,price,touch_display,weight,width
0,8 GB RAM,Huawei,256 GB SSD,Silver,22.10 cm ( 8.70 in ),"14""",Yes,2018,NVIDIA GeForce MX150 (2 GB),1.49 cm ( .59 in ),MateBook D Volta,Windows 10,64 bit,3799,No,1.29 kg ( 2.84 lb ),32.30 cm ( 12.72 in )
1,4 GB RAM,Dell,1 TB HDD,Black,260.30 mm ( 10.25 in ),"15.6""",,2018,Intel HD Graphics 620,23.65 mm ( .93 in ),Inspiron 15 3567,Windows 10,64 bit,1849,No,2.30 kg ( 5.07 lb ),380.00 mm ( 14.96 in )
2,8 GB RAM,Apple,128 GB (PCIe SSD),Gold,21.24 cm ( 8.36 in ),"13.3""",Yes,2018,Intel UHD Graphics 617,1.56 cm ( .61 in ),MacBook Air (Retina),macOS Mojave,Not Specified,5199,No,1.25 kg ( 2.76 lb ),30.41 cm ( 11.97 in )
3,8 GB RAM,Huawei,128 GB SSD/1 TB HDD,Grey,23.90 cm ( 9.41 in ),"15.6""",,2018,NVIDIA GeForce MX150 (2 GB),1.69 cm ( .67 in ),MateBook D,Windows 10,64 bit,2799,No,1.90 kg ( 4.19 lb ),35.80 cm ( 14.09 in )
4,8 GB RAM,Huawei,256 GB NVMe M.2 SSD,Mystic Silver,21.70 cm ( 8.54 in ),"13.88""",Yes,2018,NVIDIA GeForce MX150 (2 GB),1.46 cm ( .57 in ),MateBook X Pro,Windows 10,64 bit,5999,Yes,1.29 kg ( 2.84 lb ),30.40 cm ( 11.97 in )
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
206,4 GB RAM,Asus,128 GB M.2 SSD,Silver,21.00 cm ( 8.27 in ),"14""",,2019,Intel UHD Graphics 620,1.77 cm ( .70 in ),VivoBook 14 X420UA,Windows 10 S,64 bit,1699,No,1.40 kg ( 3.09 lb ),32.20 cm ( 12.68 in )
207,8 GB RAM,Apple,128 GB SSD,Silver,21.24 cm ( 8.36 in ),"13.3""",,2019,Intel Iris Plus Graphics 645,1.49 cm ( .59 in ),MacBook Pro (Retina + Touch Bar),macOS Mojave,Not Specified,5649,No,1.37 kg ( 3.02 lb ),30.41 cm ( 11.97 in )
208,8 GB RAM,Apple,256 GB SSD,Silver,21.24 cm ( 8.36 in ),"13.3""",,2019,Intel Iris Plus Graphics 645,1.49 cm ( .59 in ),MacBook Pro (Retina + Touch Bar),macOS Mojave,Not Specified,6499,No,1.37 kg ( 3.02 lb ),30.41 cm ( 11.97 in )
209,16 GB RAM,Asus,512 GB NVMe M.2 SSD,Transparent Silver,21.30 cm ( 8.39 in ),"14""",,2019,NVIDIA GeForce MX250 (2 GB),1.80 cm ( .71 in ),VivoBook S14 S431,Windows 10,64 bit,3999,No,1.40 kg ( 3.09 lb ),32.30 cm ( 12.72 in )


In [133]:
df2['price'] = df2['price'].astype(float)
df2.dtypes

RAM                               object
brand                             object
capacity                          object
color                             object
depth                             object
display_size                      object
fingerprint                       object
generation/release                object
graphics_card                     object
height                            object
item_title                        object
operating_system                  object
operating_system_architecture     object
price                            float64
touch_display                     object
weight                            object
width                             object
dtype: object

In [135]:
df2.head(20)

Unnamed: 0,RAM,brand,capacity,color,depth,display_size,fingerprint,generation/release,graphics_card,height,item_title,operating_system,operating_system_architecture,price,touch_display,weight,width
0,8 GB RAM,Huawei,256 GB SSD,Silver,22.10 cm ( 8.70 in ),"14""",Yes,2018,NVIDIA GeForce MX150 (2 GB),1.49 cm ( .59 in ),MateBook D Volta,Windows 10,64 bit,3799.0,No,1.29 kg ( 2.84 lb ),32.30 cm ( 12.72 in )
1,4 GB RAM,Dell,1 TB HDD,Black,260.30 mm ( 10.25 in ),"15.6""",,2018,Intel HD Graphics 620,23.65 mm ( .93 in ),Inspiron 15 3567,Windows 10,64 bit,1849.0,No,2.30 kg ( 5.07 lb ),380.00 mm ( 14.96 in )
2,8 GB RAM,Apple,128 GB (PCIe SSD),Gold,21.24 cm ( 8.36 in ),"13.3""",Yes,2018,Intel UHD Graphics 617,1.56 cm ( .61 in ),MacBook Air (Retina),macOS Mojave,Not Specified,5199.0,No,1.25 kg ( 2.76 lb ),30.41 cm ( 11.97 in )
3,8 GB RAM,Huawei,128 GB SSD/1 TB HDD,Grey,23.90 cm ( 9.41 in ),"15.6""",,2018,NVIDIA GeForce MX150 (2 GB),1.69 cm ( .67 in ),MateBook D,Windows 10,64 bit,2799.0,No,1.90 kg ( 4.19 lb ),35.80 cm ( 14.09 in )
4,8 GB RAM,Huawei,256 GB NVMe M.2 SSD,Mystic Silver,21.70 cm ( 8.54 in ),"13.88""",Yes,2018,NVIDIA GeForce MX150 (2 GB),1.46 cm ( .57 in ),MateBook X Pro,Windows 10,64 bit,5999.0,Yes,1.29 kg ( 2.84 lb ),30.40 cm ( 11.97 in )
5,8 GB RAM,Apple,128 GB SSD,Space Gray,21.24 cm ( 8.36 in ),"13.3""",,2017,Intel Iris Plus Graphics 640,1.49 cm ( .59 in ),MacBook Pro (Retina),macOS Sierra,,10199.0,,1.37 kg ( 3.02 lb ),30.41 cm ( 11.97 in )
6,8 GB RAM,Apple,128 GB (PCIe Flash),Silver,22.70 cm ( 8.94 in ),"13.3""",,2017,Intel HD Graphics 6000,1.70 cm ( .67 in ),MacBook Air,macOS Sierra,,5199.0,,1.35 kg ( 2.98 lb ),32.50 cm ( 12.80 in )
7,8 GB RAM,Apple,128 GB (PCIe Flash),Silver,22.70 cm ( 8.94 in ),"13.3""",,2017,Intel HD Graphics 6000,1.70 cm ( .67 in ),MacBook Air,macOS Sierra,Not Specified,4799.0,No,1.35 kg ( 2.98 lb ),32.50 cm ( 12.80 in )
8,16 GB RAM,HP,1 TB PCIe NVMe M.2 SSD,Natural Silver,1.49 cm ( .59 in ),"13.3""",,2018,NVIDIA GeForce MX150 (2 GB),21.20 cm ( 8.35 in ),ENVY 13-ah0002nx,Windows 10,64 bit,5349.0,No,1.30 kg ( 2.87 lb ),30.70 cm ( 12.09 in )
9,8 GB RAM,Microsoft,256 GB SSD,Platinum,223.20 mm ( 8.79 in ),"13.5""",,2018,Intel UHD Graphics 620,14.47 mm ( .57 in ),Surface 2,Windows 10 S,64 bit,5799.0,Yes,1.25 kg ( 2.76 lb ),308.02 mm ( 12.13 in )


In [None]:
#def arch(x):
#    if x in ['Not Specified']:
#        return 'NA'
#    else:
#        return 64


#df2['weight'] = df2['weight'].apply(lambda x: str(x).split(' ')[0])
#df2['weight'] = df2['weight'].astype(float)
#df2['height'] = df2['height'].apply(lambda x: str(x).split(' ')[0])
#df2['height'] = df2['height'].astype(float)
#df2['depth'] = df2['depth'].apply(lambda x: str(x).split(' ')[0])
#df2['depth'] = df2['depth'].astype(float)
#df2['width'] = df2['width'].apply(lambda x: str(x).split(' ')[0])
#df2['width'] = df2['width'].astype(float)
#df2['display_size'] = df2['display_size'].apply(lambda x: str(x)[:-1])
#df2['display_size'] = df2['display_size'].astype(float)
#df2['operating_system_architecture'] = df2['operating_system_architecture'].apply(arch)
#df2['RAM'] = df2['RAM'].apply(lambda x: str(x).split(' ')[0])
#df2['RAM'] = df2['RAM'].astype(float)
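
The commented-out conversions above keep the first number of each string, but the scraped dimensions mix units (for example `22.10 cm ( 8.70 in )` and `260.30 mm ( 10.25 in )`), so those raw numbers would not be comparable. Below is a sketch of unit-aware parsing that uses only the standard library. The helper names `to_cm` and `leading_number` are hypothetical, and values that cannot be parsed (blank strings, `'NA'`, the `0` placeholder) come back as `None`.

```python
import re

# Factors that convert each metric length unit seen in the data to centimetres
_TO_CM = {"cm": 1.0, "mm": 0.1}


def to_cm(text):
    """Parse strings such as '260.30 mm ( 10.25 in )' into centimetres."""
    match = re.match(r"\s*(\d*\.?\d+)\s*(cm|mm)\b", str(text))
    if not match:
        return None  # blank, 'NA', or an unexpected format
    value, unit = match.groups()
    return round(float(value) * _TO_CM[unit], 3)


def leading_number(text):
    """Return the first number in strings such as '8 GB RAM' or '15.6"'."""
    match = re.search(r"\d*\.?\d+", str(text))
    return float(match.group()) if match else None
```

These could be applied per column, for instance `df2['width'].apply(to_cm)` for the four dimensions and `df2['RAM'].apply(leading_number)` for RAM, weight, and display size.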

In [136]:
df2.to_csv('final_data.csv', encoding='utf-8-sig', index=False)