# Pulsifi Task 1 Solutions

1. Choose an e-commerce website, such as zalora.com, lazada.com, or iherb.com, and scrape its product catalogue.
2. Store product details such as product name, product category, and product price in a database.
3. Implement a search function that, given a name, returns all relevant and similar products.
4. Find the distribution of prices across different categories.
5. (Optional) Repeat the process for another website and find the products that appear on both websites. (They might have slightly different names.)


In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

import pandas as pd
import dbm

In [2]:
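# Selenium looks up chromedriver on PATH by default; executable_path is deprecated in Selenium 4.
driver = webdriver.Chrome()
driver.get('https://www.lazada.com.my/#')
timeout = 30
try:
    WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.ID, "Level_1_Category_No1")))
except TimeoutException:
    # Close the browser and re-raise so later cells do not run against a dead session.
    driver.quit()
    raise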

In [3]:
# find_element with By.ID returns a single WebElement; .text gives its visible text.
category_element = driver.find_element(By.ID, 'Level_1_Category_No1').text
print("category element ", category_element)

category element  Electronic Devices


In [4]:
# Collect the <li> items inside the category menu <ul>
list_category_elements = driver.find_element(By.XPATH, '//*[@id="J_8636740892"]/div/ul')
links = list_category_elements.find_elements(By.CLASS_NAME, "lzd-site-menu-root-item")
print('length of links are: ', len(links))
for link in links:
    print("element in list ", link.text)

length of links are:  12
element in list  Electronic Devices
element in list  Electronic Accessories
element in list  TV & Home Appliances
element in list  Health & Beauty
element in list  Babies & Toys
element in list  Groceries & Pets
element in list  Home & Lifestyle
element in list  Women's Fashion
element in list  Men's Fashion
element in list  Fashion Accessories
element in list  Sports & Lifestyle
element in list  Automotive & Motorcycles


In [5]:
# Click a category menu entry while looking for the toys menu.
# A plain element.click() may raise an error when the element is not yet interactable in the DOM,
# so move to the element with ActionChains first and click it there.
element = driver.find_elements(By.CLASS_NAME, 'J_ChannelsLink')[3]
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()

In [6]:
# Once the page has loaded, extract the product elements and print their text
product_titles = driver.find_elements(By.CLASS_NAME, 'best-seller-container')
for title in product_titles:
    print(title.text)

Selected For You
GO
Cartoon Unicorn PVC Transparent Document Bag Water-Proof Zipper Pouch File Folder Stationery Organizer 21*17CM / 8.3*6.7Inch
RM
2.50
RM 4.00
-38%
Dansunreve 1 piece Pillowcase Cheap And Comfortable Sarung Bantal Solid Color Pillow cover Cover Plain & Hotel
RM
3.90
RM 7.90
-51%
[12.12] [Ready Stock] 45*45cm Leaves Pattern Throw Pillow Cover Soft Linen Cotton Pillowcase Square Cushion Cover
RM
5.90
RM 12.00
-51%
[12.12 Flash Sale] Branches DVB T2 Decoder DVB T2 Receiver MyTV Decoder DVBT2 TV Dekoder Youtube Receiver Satellite
RM
38.60
RM 87.40
-56%
[Christmas Gift For Kids] Threetrees 8.5 inch LCD drawing pad With Accessory Kit, Writing tablet,office writing board, office blackboard, digital drawing pad ,Good Gift For Kids,Use for Home,Office
RM
15.80
RM 28.00
-44%
[12.12 Crazy Brand Mega Offers]UGREEN 1 Meter Micro USB Cable USB 2.0 Sync & Charging Cable for for Samsung, Nokia ,Huawei,Oppo,Vivo,Xiaomi,HTC, Blackberry, Tablet PC, and Most Android Tablets, Android Phon

In [7]:
product_containers = driver.find_elements(By.CLASS_NAME, 'best-seller-item-content')
product_titles = list()
pack_sizes = list()
product_prices = list()
rating_counts = list()

In [8]:
for container in product_containers:
    # Each container holds one product card; read its title and price text.
    product_titles.append(container.find_element(By.CLASS_NAME, 'best-seller-item-title-text').text)
    product_prices.append(container.find_element(By.CLASS_NAME, 'best-seller-item-price').text)

data = {'product_title': product_titles, 'product_price': product_prices}
df_product = pd.DataFrame.from_dict(data)

print(df_product.head())



                                       product_title product_price
0  Cartoon Unicorn PVC Transparent Document Bag W...      RM\n2.50
1  Dansunreve 1 piece Pillowcase Cheap And Comfor...      RM\n3.90
2  [12.12] [Ready Stock] 45*45cm Leaves Pattern T...      RM\n5.90
3  [12.12 Flash Sale] Branches DVB T2 Decoder DVB...     RM\n38.60
4  [Christmas Gift For Kids] Threetrees 8.5 inch ...     RM\n15.80
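
The `product_price` column still holds raw strings such as `RM\n2.50`, which cannot be used for numeric analysis. A minimal sketch of one way to split them into a currency code and an amount is shown below. The helper name `parse_price` and the regular expression are assumptions for illustration and are not part of the original notebook.

```python
import re
from decimal import Decimal

# Currency code, optional whitespace (including the newline Selenium
# returns between the two price spans), then the numeric amount.
PRICE_RE = re.compile(r"^\s*([A-Z]{2,3})\s*([\d,]+(?:\.\d+)?)\s*$")


def parse_price(raw):
    """Return (currency, amount) for strings like 'RM\\n2.50', or None if unparseable."""
    match = PRICE_RE.match(raw)
    if match is None:
        return None
    currency, amount = match.groups()
    # Drop thousands separators before converting to Decimal.
    return currency, Decimal(amount.replace(",", ""))


print(parse_price("RM\n2.50"))
print(parse_price("RM\n38.60"))
```

With pandas, the same helper could be applied through `df_product['product_price'].map(parse_price)`. `Decimal` is used here to avoid floating-point rounding on money values; a plain `float` would also work for plotting.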


In [9]:
df_product.to_csv("product_my.csv")
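
The task asks for the product details to be stored in a database, while the cell above writes a CSV file. As a rough sketch, the same rows could go into SQLite using Python's built-in `sqlite3` module. The table name `products`, its columns, and the helper `save_products` are assumptions chosen for illustration. The category is left as `None` because the notebook does not scrape it per product.

```python
import sqlite3


def save_products(conn, rows):
    """Create the products table if needed and insert (title, category, price) rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               category TEXT,
               price REAL
           )"""
    )
    conn.executemany(
        "INSERT INTO products (title, category, price) VALUES (?, ?, ?)", rows
    )
    conn.commit()


# In-memory database for the sketch; pass a file path instead to persist the data.
conn = sqlite3.connect(":memory:")
save_products(conn, [
    ("Cartoon Unicorn PVC Transparent Document Bag", None, 2.50),
    ("Dansunreve 1 piece Pillowcase", None, 3.90),
])
print(conn.execute("SELECT COUNT(*) FROM products").fetchone()[0])
```

The parameterised `?` placeholders keep product titles containing quotes or other special characters from breaking the SQL statement.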

In [10]:
import re

def search(keywords, df):
    # Build one regex alternation from the keywords, escaping regex metacharacters.
    pattern = '|'.join(re.escape(k) for k in keywords)
    return df[df['product_title'].str.contains(pattern, na=False)]
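
The `search` function above only finds titles that contain a keyword exactly, so a misspelled or slightly different name returns nothing. One possible way to also return similar products is to score each word of a title against the query with `difflib.SequenceMatcher` from the standard library. This is a sketch, not the notebook's method: the function name `fuzzy_search` and the `threshold` of 0.6 are assumptions and would need tuning on real data.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, titles, threshold=0.6):
    """Return (score, title) pairs whose best word-level match meets the threshold."""
    query = query.lower()
    results = []
    for title in titles:
        words = title.lower().split()
        # Score the query against every word and keep the best similarity ratio.
        best = max(
            (SequenceMatcher(None, query, word).ratio() for word in words),
            default=0.0,
        )
        if best >= threshold:
            results.append((round(best, 2), title))
    # Highest-scoring titles first.
    return sorted(results, reverse=True)


titles = [
    "Luxury Shockproof Scrub PC Phone Case",
    "Air Shockproof Phone Silicone Case",
    "Dansunreve 1 piece Pillowcase",
]
for score, title in fuzzy_search("fone", titles):
    print(score, title)
```

Here the misspelled query `fone` still matches the two titles containing `Phone`, while the pillowcase title falls below the threshold.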

In [11]:
df2 = search(["Phone"], df_product)
df2.head()

Unnamed: 0,product_title,product_price
5,[12.12 Crazy Brand Mega Offers]UGREEN 1 Meter ...,RM\n6.49
13,Sanptoch Shockproof Armor Case For iPhone 11 P...,RM\n13.99
14,Luxury Shockproof Scrub PC Phone Case For iPho...,RM\n13.99
15,For iPhone 11 Pro X Xs Max XR 8 7 6 6s Plus Ph...,RM\n23.90
16,Air Shockproof Phone Silicone Case for iPhone ...,RM\n1.71
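
For the price-distribution step in the task list, each product needs a category label alongside a numeric price. The sketch below summarises prices per category using only the standard library. The prices are taken from the outputs above, but the category assigned to each row is invented for illustration, since the notebook does not record a category per product. The helper name `price_summary` is also an assumption.

```python
from collections import defaultdict
from statistics import median


def price_summary(rows):
    """Group (category, price) rows and return count, min, median, max per category."""
    by_category = defaultdict(list)
    for category, price in rows:
        by_category[category].append(price)
    return {
        category: {
            "count": len(prices),
            "min": min(prices),
            "median": median(prices),
            "max": max(prices),
        }
        for category, prices in by_category.items()
    }


# Prices come from the scraped output; the category labels are hypothetical.
rows = [
    ("Home & Lifestyle", 3.90),
    ("Home & Lifestyle", 5.90),
    ("Electronic Accessories", 6.49),
    ("Electronic Accessories", 13.99),
    ("Electronic Accessories", 23.90),
]
for category, stats in price_summary(rows).items():
    print(category, stats)
```

With the data in a DataFrame, `df.groupby('category')['price'].describe()` would give a comparable summary, assuming `category` and numeric `price` columns exist.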
