# Pulsifi Coding Test Question 1 Solutions

1. Choose an e-commerce website, such as zalora.com, lazada.com or iherb.com, and scrape its product catalogue.
2. Store product details such as the product name, category and price in a database.
3. Implement a search function that, given a name, returns all relevant and similar products.
4. Find out how prices are distributed across the different categories.
5. (Optional) Repeat the process for another website, and find out which products appear on both websites. (They may have slightly different names.)
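The notebook below stops after saving the scraped products to a CSV file, so steps 3 and 5 are not implemented in it. Both steps come down to scoring how similar two product names are. A minimal sketch follows, assuming the names have already been scraped into plain Python lists. It uses the standard library's `difflib.SequenceMatcher`. The function names `similarity`, `search_products` and `match_across_sites` and the thresholds 0.5 and 0.8 are illustrative assumptions, not part of the original solution.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_products(query: str, names: list[str], threshold: float = 0.5) -> list[str]:
    """Return names containing the query or similar to it, best match first."""
    query_lower = query.lower()
    scored = []
    for name in names:
        # A substring hit scores 1.0; otherwise fall back to the fuzzy ratio.
        if query_lower in name.lower():
            score = 1.0
        else:
            score = similarity(query, name)
        if score >= threshold:
            scored.append((score, name))
    # Python's sort is stable, so equal scores keep their catalogue order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


def match_across_sites(site_a: list[str], site_b: list[str],
                       threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Pair each site A product with its most similar site B product."""
    pairs = []
    if not site_b:
        return pairs
    for name_a in site_a:
        best = max(site_b, key=lambda name_b: similarity(name_a, name_b))
        score = similarity(name_a, best)
        if score >= threshold:  # hypothetical cut-off for "same product"
            pairs.append((name_a, best, score))
    return pairs
```

For example, searching the scraped titles for `"cheddar"` returns both Tradition cheddars, and a differently cased name on a second site such as `"Nescafe Gold Pure Soluble Coffee 200g"` would be paired with `"NESCAFE GOLD Pure Soluble Coffee 200g (NEW)"`. `SequenceMatcher` compares every pair of names, so for large catalogues a cheaper candidate filter (for example by category) would probably be needed first.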


This solution is taken from https://towardsdatascience.com/in-10-minutes-web-scraping-with-beautiful-soup-and-selenium-for-data-professionals-8de169d36319


In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

import pandas as pd
import dbm

In [2]:
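from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object
# (the old executable_path argument has been removed).
driver = webdriver.Chrome(service=Service('chromedriver'))
driver.get('https://www.lazada.sg/#')
timeout = 30
try:
    # Wait until the first top-level category in the side menu is visible.
    WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.ID, "Level_1_Category_No1"))
    )
except TimeoutException:
    driver.quit()
    raise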

In [3]:
# find_element returns the first matching WebElement; .text is its visible text.
category_element = driver.find_element(By.ID, 'Level_1_Category_No1').text
print("category element ", category_element)

category element  Electronic Devices


In [4]:
# Collect the top-level <li> items of the category menu <ul>.
list_category_elements = driver.find_element(By.XPATH, '//*[@id="J_icms-5000498-1511516689962"]/div/ul')
links = list_category_elements.find_elements(By.CLASS_NAME, "lzd-site-menu-root-item")
print('length of links are: ', len(links))
for link in links:
    print("element in list ", link.text)

length of links are:  12
element in list  Electronic Devices
element in list  Electronic Accessories
element in list  TV & Home Appliances
element in list  Health & Beauty
element in list  Babies & Toys
element in list  Groceries & Pets
element in list  Home & Lifestyle
element in list  Women's Fashion
element in list  Men's Fashion
element in list  Watches & Bags
element in list  Sports & Lifestyle
element in list  Automotive & Motorcycle


In [5]:
# Open the channel page at index 2 of the "J_ChannelsLink" elements.
# A plain element.click() can fail (e.g. ElementNotInteractableException) when the
# element is not actionable in the DOM, so move the mouse to it first with ActionChains.
element = driver.find_elements(By.CLASS_NAME, 'J_ChannelsLink')[2]
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()

In [6]:
# On the channel page, print the text of every element with class "title".
# This also picks up section headings and empty elements, not only product names.
product_titles = driver.find_elements(By.CLASS_NAME, 'title')
for title in product_titles:
    print(title.text)

Best Deals In Town
Lemnos Cream Cheese With Cracked Pepper
Tradition Scottish Matured White Cheddar
Tradition UK Extra Matured Cheddar
Bannister Downs All Lite Fresh Milk - Australia - Chilled
Bannister Downs Farm Fresh Milk - Australia - Chilled
CocoMax 100% Coconut Water - Case
12 Best Top Picks
Dove Beauty Nourishing Moisture Body Wash
Knorr All In One Seasoning
Enfamil A+ Stage 2 Infant Formula Baby Milk Powder (6M+)
Naturel Canola Oil
Dettol 4-in-1 Disinfectant Multi Surface Cleaner Lavender 2.5L
NESCAFE GOLD Pure Soluble Coffee 200g (NEW)
This Week's Fresh Selection
Korean Jeju Mandarin Oranges
Red Seedless Grapes (Air-Flown)
Green Seedless Grapes (Air-Flown)
Prime Asia 2+1 Hass Avocados
RedMart Korean Maehyang Strawberries
GIVVO Broccolini
All Things Christmas
Kaviari Lumpfish Roe Black
Woodmansterne Pack Of 5 Cards - Stacks Of Presents
Sainsbury's Taste The Difference Iced Fruit Cake Bar - Christmas Special
Artwrap Christmas Boxed Cards
Casillero del Diablo C

In [7]:
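# Each product card has the class "product_container"; scoping lookups to the card
# keeps the title, pack size, price and rating of one product together.
product_containers = driver.find_elements(By.CLASS_NAME, 'product_container')

product_titles = []
pack_sizes = []
product_prices = []
rating_counts = []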

In [8]:
for container in product_containers:
    product_titles.append(container.find_element(By.CLASS_NAME, 'title').text)
    pack_sizes.append(container.find_element(By.CLASS_NAME, 'pack_size').text)
    product_prices.append(container.find_element(By.CLASS_NAME, 'product_price').text)
    rating_counts.append(container.find_element(By.CLASS_NAME, 'ratings_count').text)

data = {
    'product_title': product_titles,
    'pack_size': pack_sizes,
    'product_price': product_prices,
    'rating_count': rating_counts,
}
df_product = pd.DataFrame.from_dict(data)

print(df_product.head())



                                       product_title pack_size product_price  \
0            Lemnos Cream Cheese With Cracked Pepper     125 g         $4.12   
1           Tradition Scottish Matured White Cheddar     250 g         $9.31   
2                 Tradition UK Extra Matured Cheddar     250 g         $9.85   
3  Bannister Downs All Lite Fresh Milk - Australi...       1 L         $5.94   
4  Bannister Downs Farm Fresh Milk - Australia - ...       1 L         $5.94   

  rating_count  
0          (0)  
1          (1)  
2          (0)  
3          (0)  
4          (0)  
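The `product_price` and `rating_count` columns above are strings such as `$4.12` and `(0)`, so they have to be converted to numbers before step 4 can be answered. The sketch below assumes each product has already been tagged with a category, which this notebook does not scrape. The helper names and the `Cheese`/`Milk` labels used to try it out are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean, median
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Convert a scraped price such as '$4.12' or '$1,299.00' to a float."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def parse_count(text: str) -> int:
    """Convert a rating count such as '(12)' to an int; no digits means 0."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else 0


def price_summary(rows: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Group (category, price_text) rows and summarise prices per category."""
    prices_by_category: dict[str, list[float]] = defaultdict(list)
    for category, price_text in rows:
        price = parse_price(price_text)
        if price is not None:  # skip rows whose price could not be parsed
            prices_by_category[category].append(price)
    return {
        category: {
            "count": len(prices),
            "min": min(prices),
            "median": median(prices),
            "mean": round(mean(prices), 2),
            "max": max(prices),
        }
        for category, prices in prices_by_category.items()
    }
```

With pandas, a similar per-category summary could come from `df.groupby('category')['price'].describe()`, once numeric `price` and `category` columns exist in the DataFrame.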


In [9]:
df_product.to_csv("product_sg.csv")
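Step 2 asks for a database rather than a CSV file. A minimal sketch with Python's built-in `sqlite3` module follows. The `products` table, its columns and the helper names `save_products` and `find_by_name` are assumptions for illustration. It also assumes prices have already been converted from strings like `$4.12` to floats.

```python
import sqlite3


def save_products(conn: sqlite3.Connection,
                  rows: list[tuple[str, str, float]]) -> None:
    """Create a hypothetical `products` table and insert (name, category, price) rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               id       INTEGER PRIMARY KEY,
               name     TEXT NOT NULL,
               category TEXT,
               price    REAL
           )"""
    )
    # Parameterised placeholders avoid quoting problems with names like "Sainsbury's".
    conn.executemany(
        "INSERT INTO products (name, category, price) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def find_by_name(conn: sqlite3.Connection, term: str) -> list[tuple[str, float]]:
    """Substring lookup with LIKE (case-insensitive for ASCII in SQLite), cheapest first."""
    cursor = conn.execute(
        "SELECT name, price FROM products WHERE name LIKE ? ORDER BY price",
        (f"%{term}%",),
    )
    return cursor.fetchall()
```

If the DataFrame is kept as is, pandas can write it to the same connection with `df_product.to_sql('products', conn, index=False)`. A `LIKE` query only finds exact substrings, so matching similar but differently spelled names still needs a separate similarity step.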