## Web Scraping a Bicycle E-commerce Site

**Álvaro Rivera Arcelus**<br>
*Economics + Data Analytics*

### About the company

**Chain Reaction**<br>
*   Chain Reaction is a British online bicycle store. It sells all types of bicycles and cycling apparel from more than 100 high- and medium-quality brands.
*   ***Type of site***: dynamic site
*   ***URL***: https://www.chainreactioncycles.com/

### Scraping methodology

Because the site is dynamic (i.e., much of its content is rendered by JavaScript), a parser such as *Beautiful Soup* is not enough on its own: it only reads the HTML returned by the server and does not execute JavaScript.<br>
If we try it anyway, we tend to obtain less information than what we see on screen as users.<br>
This is commonly caused by techniques such as JavaScript lazy loading, which developers implement to improve loading times.<br><br>
**Solution:** use the *Selenium* package, which drives a real browser like a "bot", so the page is rendered before we read it.
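
Since lazily loaded items usually appear only once they scroll into view, one common approach is to keep scrolling until the page height stops growing before reading any elements. The sketch below is not part of the original notebook: `scroll_to_bottom`, `pause`, and `max_rounds` are hypothetical names, and whether this particular site needs scrolling at all is an assumption. It relies only on the WebDriver method `execute_script`.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing (hypothetical helper).

    `driver` is any Selenium WebDriver; only `execute_script` is used.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazy-loaded content time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the bottom
        last_height = new_height
    return rounds
```

If used, it would be called right after `driver.get(url)` and before the `find_elements` calls.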

We will be gathering the following information about their bikes:<br>
*   Regular price 
*   Sale price 
*   Product description

**Importing libraries**

In [81]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
import numpy as np

**Scraping site**

In [82]:
#creating main df where we'll concatenate the final data
main_df = pd.DataFrame(columns=['name','reg_price','price'])

#function that adds all the scraped items into a single array 
def createArray(elements):
    array =[]
    for element in elements: 
        array.append(element.text)
    return array

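#using the Firefox driver to crawl the site
#Selenium 4 takes the geckodriver path through a Service object (executable_path is deprecated)
from selenium.webdriver.firefox.service import Service
driver = webdriver.Firefox(service=Service(r'C:\Program Files (x86)\geckodriver.exe'))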

def addNewBike(url):
    driver.get(url)

    price_elem = driver.find_elements(By.CLASS_NAME, "fromamt")
    reg_price_elem = driver.find_elements(By.CLASS_NAME, "rrpamount")
    name_elem = driver.find_elements(By.CLASS_NAME, "description")

    reg_price = createArray(reg_price_elem)
    price = createArray(price_elem)
    name = createArray(name_elem)

    #pages are sorted by discount, so bikes without a discount (no RRP element) come last;
    #reg_price can therefore be shorter than price and must be padded to the same length
    if len(price)>len(reg_price):
        n = len(price)-len(reg_price)
        #filling the missing regular prices with the string 'none'
        reg_price = np.concatenate((reg_price,np.repeat('none',n)), axis=None)

    df = pd.DataFrame()
    df['reg_price'] = reg_price
    df['price'] = price
    df['name'] = name

    #we must explicitly say that the variable is global 
    global main_df
    #concatenating new data to dataframe 
    main_df = pd.concat((main_df, df), axis=0)

    return main_df

#links we want to crawl on 
links = ['https://www.chainreactioncycles.com/road-bikes?f=2231&sort=discount', 
        'https://www.chainreactioncycles.com/mountain-bikes?f=2232&sort=discount',
        'https://www.chainreactioncycles.com/bmx-bikes?f=2240&sort=discount',
        'https://www.chainreactioncycles.com/folding-bikes?f=2249&sort=discount',
        'https://www.chainreactioncycles.com/hybrid-city-bikes?f=2234&sort=discount']

#scraping and concatenating the data from all the required links
for link in links:
    main_df = addNewBike(link)


driver.quit()
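
The padding step inside `addNewBike` only covers the case where `reg_price` is shorter than `price`. A more general variant, sketched here under the same assumption that missing values always fall at the end of the page, pads every column to the longest one. `pad_columns` is a hypothetical helper and the sample data is made up.

```python
def pad_columns(columns, fill="none"):
    """Pad every list in `columns` at the end so all share the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


scraped = {
    "name": ["Bike A", "Bike B", "Bike C"],  # made-up sample data
    "reg_price": ["RRP £100.00"],            # only the first bike is discounted
    "price": ["£80.00", "£90.00", "£95.00"],
}
padded = pad_columns(scraped)
print(padded["reg_price"])  # ['RRP £100.00', 'none', 'none']
```

If a missing value could occur in the middle of a page, padding at the end would misalign rows; in that case reading the fields product by product would be the safer design.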



**Visualizing results (first 50 rows)**

In [84]:
print('------------'*3)
print('Number of products: '+str(len(main_df)))
print('------------'*3)

------------------------------------
Number of products: 196
------------------------------------


In [83]:
main_df.reset_index().head(50)

| | index | name | reg_price | price |
|---|---|---|---|---|
| 0 | 0 | Fuji Sportif 2.3 Road Bike 2022 | RRP £849.99 | £424.99 |
| 1 | 1 | Fuji Sportif 2.1 Road Bike 2022 | RRP £1049.99 | £524.99 |
| 2 | 2 | Fuji Gran Fondo 1.1 Road Bike 2021 | RRP £3399.99 | £1699.99 |
| 3 | 3 | Fuji Gran Fondo 1.3 Road Bike 2021 | RRP £2599.99 | £1299.99 |
| 4 | 4 | Rondo RUUT AL 1 Gravel Bike 2022 | RRP £2199.99 | £1199.99 - £1429.99 |
| 5 | 5 | Rondo RUUT AL 2 Gravel Bike 2022 | RRP £1799.99 | £999.99 |
| 6 | 6 | Kona Rove AL 650 SE Gravel Bike 2022 | RRP £999.99 | £649.99 |
| 7 | 7 | Vitus ZX-1 EVO CR eTap AXS Road Bike (Rival) | RRP £3899.99 | £2729.99 |
| 8 | 8 | Octane One Kode ADV Commuter Road Bike 2022 | RRP £1139.99 | £799.99 |
| 9 | 9 | Cube Axial WS Race Road Bike 2022 | RRP £1748.99 | £1299.99 |
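
The scraped prices are strings such as `RRP £849.99` or a range like `£1199.99 - £1429.99`, so they need to be converted to numbers before any analysis. The following is a minimal sketch of one way to do that; `parse_prices` and `discount_pct` are hypothetical helpers, and taking the lowest value of a price range is an assumption rather than something the original notebook does.

```python
import re

# Matches a pound amount such as "£849.99" or "£1,049.99"
PRICE_RE = re.compile(r"£\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_prices(text):
    """Return every pound amount found in `text` as a float."""
    return [float(m.replace(",", "")) for m in PRICE_RE.findall(text)]


def discount_pct(reg_price, price):
    """Discount of the lowest sale price against the RRP, in percent.

    Returns None when either string holds no amount (e.g. 'none').
    """
    reg = parse_prices(reg_price)
    sale = parse_prices(price)
    if not reg or not sale:
        return None
    return round(100 * (1 - min(sale) / reg[0]), 1)


print(parse_prices("£1199.99 - £1429.99"))       # [1199.99, 1429.99]
print(discount_pct("RRP £849.99", "£424.99"))    # 50.0
```

Both functions could be applied row by row to `main_df`, for example with `DataFrame.apply`, to add numeric price and discount columns.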
