# Web scraping with Selenium

As part of a personal study project, I wanted to retrieve data from the product pages in the Running > Clothing sub-category of the GO Sport website.

Start page for scraping: https://www.go-sport.com/running/vetements/


The steps of my study:
1. Test the different Selenium features and locate the elements to be scraped in the HTML code of the target page.

2. Configure the scraping with Selenium (including the different elements to retrieve).

3. Create a function to automate the scraping of the following elements: breadcrumb + page title + images + prices + product characteristics + reviews (if any). Each element will be stored temporarily in a variable.

4. Organize the data in columns / rows (one row per product, one column per variable), then export in csv format.



In [2]:
#!pip install selenium



In [3]:
import os
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from datetime import datetime
from urllib import parse
from os import path
import json, pickle

In [81]:
# Storing the path to the webdriver in a variable to reuse it later
DRIVER_PATH = "/chromedriver.exe"

In [82]:
#defining the start page
BASE_URL = "https://www.go-sport.com/running/vetements/"

In [83]:
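# Initialize the webdriver, the tool that drives the browser for scraping.
# Selenium 4 deprecates passing the driver path directly, so it goes through a Service object.
from selenium.webdriver.chrome.service import Service as ChromeService
driver = webdriver.Chrome(service=ChromeService(os.getcwd() + DRIVER_PATH))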


## Configuring the navigation steps with Selenium

Selenium offers several locator strategies for finding elements on a web page. Here are some examples:

- find_element(By.ID, "id")
- find_element(By.NAME, "name")
- find_element(By.XPATH, "xpath")
- find_element(By.LINK_TEXT, "link text")
- find_element(By.PARTIAL_LINK_TEXT, "partial link text")
- find_element(By.TAG_NAME, "tag name")
- find_element(By.CLASS_NAME, "class name")
- find_element(By.CSS_SELECTOR, "css selector")



Check the documentation for more information: https://selenium-python.readthedocs.io/locating-elements.html#locating-elements

In [84]:
# Let's now request a specific URL
driver.get(BASE_URL)

In [10]:
driver.current_url, driver.title  # check the URL and title of the current page

('https://www.go-sport.com/running/vetements/',
 'Vetements Running - Au meilleur prix - GO Sport')

In [85]:
# Find the buttons of the cookie pop-up: there are two
button_agree = driver.find_elements(By.CLASS_NAME, "onetrust-close-btn-handler")
button_agree
#https://selenium-python.readthedocs.io/locating-elements.html

[<selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="e99224ce-969b-4672-923e-85691898512b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="fbf49fa2-2289-4a1b-91e1-8278afdf3fab")>]

In [86]:
# The first thing we encounter on this website is the cookie pop-up.
# To be able to see the page, we first need to interact with it.
button_agree[0].click()  # click on one of the two buttons ("Disagree" ;-p)

Since we start from a category page, there are many products on it.
### Reminder

My goal is to scrape the product pages linked from this page.
So I need to configure the webdriver to open each product page, retrieve the target elements and store them in variables, then go back and move on to the next product.
To start, we will visit the first product page and test the scraping. If the results are conclusive, we can write a function that repeats these actions as many times as needed.



In [87]:
products = driver.find_elements(By.CLASS_NAME, "pdp-link")
products

[<selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="f961a016-4380-480c-a0b9-c64e8a5aa6cf")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="3384f6cd-b925-4dd9-8332-6b02f426bfaf")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="60fc3614-5704-4cc4-bd2e-0932264d1c4b")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="0a5c7678-1078-4ed7-8960-1c959700a418")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="28d05b83-f42e-4697-8bed-552155e81fef")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="4d5306bf-ce78-4ddc-80e1-9d12074bd9c2")>,
 <selenium.webdriver.remote.webelement.WebElement (session="bccc1306526a9e8069ee1e17d7dcc050", element="585615dd-4623-42f6-a575-37

In [14]:
len(products)  # there are 24 products on the category page. Let's click on the first one.

24

In [62]:
products[0].click() #click on first product

### This will be the test for the first product

In [74]:
image = driver.find_elements(By.TAG_NAME, 'img')
image[7]

<selenium.webdriver.remote.webelement.WebElement (session="684099286749213f07b63a8c0b4ffc23", element="00b5455c-a588-4244-9c4c-3950ed4278cb")>

In [78]:
with open(driver.title+'.png', 'wb') as file:
    image = driver.find_elements(By.TAG_NAME, 'img')
    file.write(image[7].screenshot_as_png)
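
The cell above uses the page title directly as a file name. A title can contain characters that are not allowed in file names on some systems (for example `/` or `:`), so it may be safer to clean it first. The sketch below is only an illustration: `safe_filename` is a hypothetical helper that does not exist in the notebook, and the set of replaced characters is an assumption.

```python
import re

def safe_filename(title, extension=".png"):
    """Build a file name from a page title (hypothetical helper).

    Characters that are commonly rejected in file names are replaced
    with underscores, and runs of whitespace become a single underscore.
    """
    stem = re.sub(r'[\\/:*?"<>|]+', "_", title)
    stem = re.sub(r"\s+", "_", stem).strip("._")
    return stem + extension

# Title taken from the product page scraped in this notebook.
print(safe_filename("LARY 300 SWEAT DOU M - Au meilleur prix - GO Sport"))
```

With such a helper, the `open(...)` call could take `safe_filename(driver.title)` instead of `driver.title+'.png'`.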

In [54]:
url =driver.current_url
url

'https://www.go-sport.com/running/vetements/'

In [19]:
arianne = driver.find_elements(By.CLASS_NAME, "c-breadcrumb")
arianne[0].text

'Vestes\nRunning\nLARY 300 SWEAT DOU M'

In [16]:
titles = driver.title
titles 

'LARY 300 SWEAT DOU M - Au meilleur prix - GO Sport'

In [17]:
#images = driver.find_element(By.TAG_NAME, 'img')
#images
#https://medium.com/geekculture/scraping-images-using-selenium-f35fab26b122
#https://www.youtube.com/watch?v=E9oZyg5Nifk&ab_channel=codepiep      coool

In [18]:
prices = driver.find_elements(By.CLASS_NAME, "value")[0]
prices.text

'34,99 €'
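
The price comes back as a localized string (comma as decimal separator, euro sign at the end). If the value is needed as a number later on, a small conversion step can help. This is a minimal sketch under the assumption that every price follows the same pattern as the one above; `parse_price` is a hypothetical name, not part of the notebook.

```python
from decimal import Decimal

def parse_price(text):
    """Convert a price string such as '34,99 €' to a Decimal (hypothetical helper).

    Assumes a comma as decimal separator and an optional euro sign;
    regular and non-breaking spaces are dropped.
    """
    cleaned = text.replace("€", "").replace("\u00a0", "").replace(" ", "")
    return Decimal(cleaned.replace(",", "."))

print(parse_price("34,99 €"))
```

`Decimal` is used here rather than `float` to avoid rounding surprises on monetary values.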

In [19]:
sizes = driver.find_elements(By.CLASS_NAME, "c-variation__attr-wrapper")[0]
sizes.text

'S\nM\nL\nXL\nXXL'
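
The sizes (like the breadcrumb earlier) arrive as a single string in which the values are separated by newlines. Splitting that string gives a list that is easier to work with. A minimal sketch, where `split_lines` is a hypothetical helper name:

```python
def split_lines(text):
    """Split a multi-line element text into a list of non-empty, stripped values."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Strings copied from the outputs above.
sizes_list = split_lines("S\nM\nL\nXL\nXXL")
breadcrumb_list = split_lines("Vestes\nRunning\nLARY 300 SWEAT DOU M")
print(sizes_list)
print(breadcrumb_list)
```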

In [None]:
#model_id = driver.find_element_by_class_name("c-table-value").text
#brand
#color
#genre
#composition

In [39]:
#test product
product_desc = driver.find_elements(By.CLASS_NAME, "c-product-detail__description-short")[0]
product_desc.text

'Informations produit\n Veste de running homme - Athli-Tech\nCourez par tous les temps grace à cette veste qui vous protègera du froid et du vent.\nCapuche zippée et cordon de serrage au niveau du cou.\nInserts aux manches pour mettre vos pousses et vous protéger encore plus. '

In [27]:
model_id = driver.find_elements(By.CLASS_NAME, "c-table-value")[1]
model_id.text

'000000000001474102'

In [29]:
brand = driver.find_elements(By.CLASS_NAME, "c-table-value")[2]
brand.text

'ATHLI-TECH'

In [30]:
color = driver.find_elements(By.CLASS_NAME, "c-table-value")[3]
color.text

'MARRON'

In [31]:
genre = driver.find_elements(By.CLASS_NAME, "c-table-value")[4]
genre.text

'homme'

In [33]:
composition = driver.find_elements(By.CLASS_NAME, "c-table-value")[5]
composition.text

'100% polyester'

In [35]:
category = driver.find_elements(By.CLASS_NAME, "c-table-value")[6]
category.text

'Running'

In [83]:
reviews = driver.find_elements(By.CLASS_NAME, "c-toggler__content")[1]
reviews.text

"Aucun avis n'a encore été laissé sur ce produit."

In [122]:
# Now back to the previous page
driver.back()

In [115]:
driver.close()

## Configuring the scraping function

In [90]:
import random, time

urls = []
ariannes = []
titles = []
prices = []
sizes = []
descriptions= []
product_ids = []
brands = []
colors = []
genres = []
compositions = []
categories = []
reviews = []


for i in range(len(products)):
    
    products = driver.find_elements(By.CLASS_NAME, "pdp-link")
    products[i].click()
        
    print(f"Checking product n°{i} at {driver.current_url}")
    
    time.sleep(1)

    try:
        url = driver.current_url
        urls.append(url)
    except IndexError:
        pass    
    
    try:
        arianne = driver.find_elements(By.CLASS_NAME, "c-breadcrumb__text")
        ariannes.append(arianne[0].text)
    except IndexError:
        pass
    
                
    try:
        title = driver.title
        titles.append(title)
    except IndexError:
        pass
    try:
        price = driver.find_elements(By.CLASS_NAME, "value")[0]
        prices.append(price.text)
    except IndexError:
        pass
    
    try:
        size = driver.find_elements(By.CLASS_NAME, "c-variation__attr-wrapper")[0]
        sizes.append(size.text)   
    except IndexError:
        pass
    
    try:
        description = driver.find_elements(By.CLASS_NAME, "c-product-detail__description-short")[0]
        descriptions.append(description.text)   
    except IndexError:
        pass    
    
    try: 
        product_id = driver.find_elements(By.CLASS_NAME, "c-table-value")[1]
        product_ids.append(product_id.text)
    except IndexError:
        pass
    
    try:
        brand = driver.find_elements(By.CLASS_NAME, "c-table-value")[2]
        brands.append(brand.text)
    except IndexError:
        pass
    
    try:
        color = driver.find_elements(By.CLASS_NAME, "c-table-value")[3]
        colors.append(color.text)
    except IndexError:
        pass
    
    try:
        genre = driver.find_elements(By.CLASS_NAME, "c-table-value")[4]
        genres.append(genre.text)
    except IndexError:
        pass
        
    try:    
        composition = driver.find_elements(By.CLASS_NAME, "c-table-value")[5]
        compositions.append(composition.text)    
    except IndexError:
        pass
    
    try:
        
        category = driver.find_elements(By.CLASS_NAME, "c-table-value")[6]
        categories.append(category.text)
    except IndexError:
        pass
    
    try:
        review = driver.find_elements(By.CLASS_NAME, "c-toggler__content")[1]    
        reviews.append(review.text)   
    except IndexError:
        pass        
        
    sleep_time = random.randint(1, 5)
    print(f"Waiting {sleep_time} second." if sleep_time == 1 else f"Waiting {sleep_time} seconds.")
    time.sleep(sleep_time)
    driver.back()

Checking product n°0 at https://www.go-sport.com/homme/vetements/vestes-et-parkas/vestes/running/veste-running-homme-athli-tech-lary-300-sweat-dou-m-gs5410601.html
Waiting 4 seconds.
Checking product n°1 at https://www.go-sport.com/running/athletisme/vetements/legging-running-femme-nike-nike-icon-clash-fast-gs6315808.html
Waiting 2 seconds.
Checking product n°2 at https://www.go-sport.com/femme/vetements/tee-shirts/tee-shirts-manches-longues/running/tee-shirt-ml-running-femme-athli-tech-mary-200-gs10475779.html
Waiting 5 seconds.
Checking product n°3 at https://www.go-sport.com/running/athletisme/vetements/collant-multisport-femme-athli-tech-pia-300-gs10475789.html
Waiting 4 seconds.
Checking product n°4 at https://www.go-sport.com/homme/vetements/tee-shirts/marques/nike-/top-running-homme-nike-miler-ss-gs10519802.html
Waiting 3 seconds.
Checking product n°5 at https://www.go-sport.com/femme/vetements/brassieres/maintien-fort/brassiere-multisport-femme-nike-indy--v-neck-gs6709260.html


ElementNotInteractableException: Message: element not interactable
  (Session info: chrome=104.0.5112.102)


In [None]:
driver.close() #reminder : don't forget to close the driver when you're done !
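
In the loop above, a lookup that fails is skipped with `pass`, so nothing is appended for that product and the lists can drift out of alignment (the fifth price may no longer belong to the fifth URL). One possible way around this is to build one record per product and store `None` for anything that is missing. The sketch below is an assumption about how this could be organized, not the code that was actually run: `FIELDS`, `text_at` and `scrape_product` are hypothetical names, while the class names and indexes are the ones used in the cells above. To keep the sketch free of imports, the locator strategy is written as the string `"class name"`, which is the value of Selenium's `By.CLASS_NAME`.

```python
# Locator strategy string; selenium's By.CLASS_NAME has this same value.
CLASS_NAME = "class name"

# (column name, CSS class, index) triples taken from the cells above.
FIELDS = [
    ("breadcrumb", "c-breadcrumb__text", 0),
    ("price", "value", 0),
    ("sizes", "c-variation__attr-wrapper", 0),
    ("description", "c-product-detail__description-short", 0),
    ("product_id", "c-table-value", 1),
    ("brand", "c-table-value", 2),
    ("color", "c-table-value", 3),
    ("genre", "c-table-value", 4),
    ("composition", "c-table-value", 5),
    ("category", "c-table-value", 6),
    ("review", "c-toggler__content", 1),
]

def text_at(elements, index):
    """Return elements[index].text, or None when the element is missing."""
    try:
        return elements[index].text
    except IndexError:
        return None

def scrape_product(driver):
    """Collect one record (dict) for the product page currently open in `driver`."""
    record = {"url": driver.current_url, "title": driver.title}
    for column, class_name, index in FIELDS:
        elements = driver.find_elements(CLASS_NAME, class_name)
        record[column] = text_at(elements, index)
    return record
```

Inside the loop, a single `records.append(scrape_product(driver))` would then replace the series of `try`/`except` blocks, and every record keeps the same keys whether or not an element was found.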

### Test on Pages 2 to 6

Here I experiment with Selenium and try to improve my function to limit the errors.
But it works :) !
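
The category pages used in this section differ only by the `page` query parameter (`?page=2` to `?page=6`). Rather than writing each URL by hand, they could be generated. A minimal sketch using only the standard library; `page_url` is a hypothetical helper, and treating page 1 as the bare start URL is an assumption.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

BASE_URL = "https://www.go-sport.com/running/vetements/"

def page_url(base_url, page):
    """Return the category URL for a page number (page 1 is assumed to be the base URL)."""
    if page <= 1:
        return base_url
    scheme, netloc, path, _query, fragment = urlsplit(base_url)
    return urlunsplit((scheme, netloc, path, urlencode({"page": page}), fragment))

# URLs for pages 2 to 6, as visited one by one in the cells of this section.
page_urls = [page_url(BASE_URL, n) for n in range(2, 7)]
for u in page_urls:
    print(u)
```

Each of these URLs could then be passed to `driver.get(...)` in a loop, reusing a single driver instead of creating a new one per page.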

In [94]:
#page 2 
DRIVER_PATH = "/chromedriver.exe"
BASE_URL2 = "https://www.go-sport.com/running/vetements/?page=2"
driver = webdriver.Chrome(os.getcwd()+DRIVER_PATH)



In [95]:
driver.get(BASE_URL2)
driver.current_url,driver.title

('https://www.go-sport.com/running/vetements/?page=2',
 'Vetements Running - Au meilleur prix - GO Sport')

In [None]:
driver.close()

In [103]:
#page 3
DRIVER_PATH = "/chromedriver.exe"
BASE_URL3 = "https://www.go-sport.com/running/vetements/?page=3"
driver = webdriver.Chrome(os.getcwd()+DRIVER_PATH)



In [104]:
driver.get(BASE_URL3)
driver.current_url,driver.title

('https://www.go-sport.com/running/vetements/?page=3',
 'Vetements Running - Au meilleur prix - GO Sport')

In [138]:
driver.close()

In [116]:
DRIVER_PATH = "/chromedriver.exe"
BASE_URL4 = "https://www.go-sport.com/running/vetements/?page=4"
driver = webdriver.Chrome(os.getcwd()+DRIVER_PATH)



In [117]:
driver.get(BASE_URL4)
driver.current_url,driver.title

('https://www.go-sport.com/running/vetements/?page=4',
 'Vetements Running - Au meilleur prix - GO Sport')

In [None]:
driver.close()

In [126]:
DRIVER_PATH = "/chromedriver.exe"
BASE_URL5 = "https://www.go-sport.com/running/vetements/?page=5"
driver = webdriver.Chrome(os.getcwd()+DRIVER_PATH)



In [127]:
driver.get(BASE_URL5)
driver.current_url,driver.title

('https://www.go-sport.com/running/vetements/?page=5',
 'Vetements Running - Au meilleur prix - GO Sport')

In [None]:
DRIVER_PATH = "/chromedriver.exe"
BASE_URL6 = "https://www.go-sport.com/running/vetements/?page=6"
driver = webdriver.Chrome(os.getcwd()+DRIVER_PATH)

In [None]:
driver.get(BASE_URL6)
driver.current_url,driver.title

In [128]:
button_agree = driver.find_elements(By.CLASS_NAME, "onetrust-close-btn-handler")
button_agree

[<selenium.webdriver.remote.webelement.WebElement (session="dfa1fe3d02beb0dc4a6ce8638ecab2b5", element="8768a27e-54fc-438d-b43b-79587f045950")>,
 <selenium.webdriver.remote.webelement.WebElement (session="dfa1fe3d02beb0dc4a6ce8638ecab2b5", element="13dcf8d6-9017-43ca-8a89-968268b334ec")>]

In [129]:
button_agree[0].click()

In [130]:
for i in range(len(products)):
    
    products = driver.find_elements(By.CLASS_NAME, "pdp-link")
    products[i].click()
        
    print(f"Checking product n°{i} at {driver.current_url}")
    
    time.sleep(1)

    try:
        url = driver.current_url
        urls.append(url)
    except IndexError:
        pass    
    
    try:
        arianne = driver.find_elements(By.CLASS_NAME, "c-breadcrumb__text")
        ariannes.append(arianne[0].text)
    except IndexError:
        pass
    try:
        title = driver.title
        titles.append(title)
    except IndexError:
        pass
    try:
        price = driver.find_elements(By.CLASS_NAME, "value")[0]
        prices.append(price.text)
    except IndexError:
        pass
    
    try:
        size = driver.find_elements(By.CLASS_NAME, "c-variation__attr-wrapper")[0]
        sizes.append(size.text)   
    except IndexError:
        pass
    
    try:
        description = driver.find_elements(By.CLASS_NAME, "c-product-detail__description-short")[0]
        descriptions.append(description.text)   
    except IndexError:
        pass    
    
    try: 
        product_id = driver.find_elements(By.CLASS_NAME, "c-table-value")[1]
        product_ids.append(product_id.text)
    except IndexError:
        pass
    
    try:
        brand = driver.find_elements(By.CLASS_NAME, "c-table-value")[2]
        brands.append(brand.text)
    except IndexError:
        pass
    
    try:
        color = driver.find_elements(By.CLASS_NAME, "c-table-value")[3]
        colors.append(color.text)
    except IndexError:
        pass
    
    try:
        genre = driver.find_elements(By.CLASS_NAME, "c-table-value")[4]
        genres.append(genre.text)
    except IndexError:
        pass
        
    try:    
        composition = driver.find_elements(By.CLASS_NAME, "c-table-value")[5]
        compositions.append(composition.text)    
    except IndexError:
        pass
    
    try:
        
        category = driver.find_elements(By.CLASS_NAME, "c-table-value")[6]
        categories.append(category.text)
    except IndexError:
        pass
    
    try:
        review = driver.find_elements(By.CLASS_NAME, "c-toggler__content")[1]    
        reviews.append(review.text)   
    except IndexError:
        pass        
        
    sleep_time = random.randint(1, 5)
    print(f"Waiting {sleep_time} second." if sleep_time == 1 else f"Waiting {sleep_time} seconds.")
    time.sleep(sleep_time)
    driver.back()

Checking product n°0 at https://www.go-sport.com/running/vetements/homme/running-homme-columbia-columbia-fast-trek-fleece-vest-m12486621.html
Waiting 2 seconds.
Checking product n°1 at https://www.go-sport.com/running/vetements/femme/coupe-vents-et-vestes/running-femme-bodycross-coupe-vent-velia-gris-m6032836.html
Waiting 1 second.
Checking product n°2 at https://www.go-sport.com/running-homme-gore-wear-gore-wear-r5-short-m1837854.html
Waiting 4 seconds.
Checking product n°3 at https://www.go-sport.com/running/vetements/femme/course-a-pied-femme-newline-debardeur-femme-newline-base-cool-m10487227.html
Waiting 5 seconds.
Checking product n°4 at https://www.go-sport.com/running/vetements/femme/course-a-pied-femme-newline-debardeur-femme-newline-base-cool-m10488419.html
Waiting 1 second.
Checking product n°5 at https://www.go-sport.com/running/vetements/femme/course-a-pied-femme-newline-debardeur-femme-newline-base-cool-m10488392.html
Waiting 4 seconds.
Checking product n°6 at https://www

In [139]:
# And now we have all the information we wanted to get!
len(prices),len(brands),len(titles)

(88, 88, 88)
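
Comparing three lengths by eye works here, but with thirteen lists it can be easier to check them all at once before exporting. The sketch below is a hypothetical check (`column_lengths` and `misaligned` are not part of the notebook), shown on a toy dictionary built from values seen earlier.

```python
def column_lengths(columns):
    """Return {name: length} for a dict mapping column names to lists."""
    return {name: len(values) for name, values in columns.items()}

def misaligned(columns):
    """Return the names of the columns that are shorter than the longest one."""
    lengths = column_lengths(columns)
    longest = max(lengths.values(), default=0)
    return [name for name, n in lengths.items() if n != longest]

# Toy example: one brand is missing, so "brands" is reported.
example = {"prices": ["34,99 €", "9,99 €"], "brands": ["ATHLI-TECH"]}
print(misaligned(example))
```

With the real data, the dictionary would hold the lists filled by the loop, for example `{"prices": prices, "brands": brands, "titles": titles}`.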

In [100]:
button_next = driver.find_elements(By.CLASS_NAME, "show-more")
button_next


[<selenium.webdriver.remote.webelement.WebElement (session="a0a0d335c3d52a5f017ded19f4a309b4", element="83c8a8dc-105f-4dbf-8151-49b9c65b2554")>]

In [101]:
button_next[0].click()

In [123]:
for i in range(len(products)):
    
    products = driver.find_elements(By.CLASS_NAME, "pdp-link")
    products[i].click()
        
    print(f"Checking product n°{i} at {driver.current_url}")
    
    time.sleep(1)
    try:
        arianne = driver.find_elements(By.CLASS_NAME, "c-breadcrumb__text")
        ariannes.append(arianne[0].text)
    except IndexError:
        pass
    try:
        title = driver.title
        titles.append(title)
    except IndexError:
        pass
    try:
        price = driver.find_elements(By.CLASS_NAME, "value")[0]
        prices.append(price.text)
    except IndexError:
        pass
    
    try:
        size = driver.find_elements(By.CLASS_NAME, "c-variation__attr-wrapper")[0]
        sizes.append(size.text)   
    except IndexError:
        pass
    
    try:
        description = driver.find_elements(By.CLASS_NAME, "c-product-detail__description-short")[0]
        descriptions.append(description.text)   
    except IndexError:
        pass    
    
    try: 
        product_id = driver.find_elements(By.CLASS_NAME, "c-table-value")[1]
        product_ids.append(product_id.text)
    except IndexError:
        pass
    
    try:
        brand = driver.find_elements(By.CLASS_NAME, "c-table-value")[2]
        brands.append(brand.text)
    except IndexError:
        pass
    
    try:
        color = driver.find_elements(By.CLASS_NAME, "c-table-value")[3]
        colors.append(color.text)
    except IndexError:
        pass
    
    try:
        genre = driver.find_elements(By.CLASS_NAME, "c-table-value")[4]
        genres.append(genre.text)
    except IndexError:
        pass
        
    try:    
        composition = driver.find_elements(By.CLASS_NAME, "c-table-value")[5]
        compositions.append(composition.text)    
    except IndexError:
        pass
    
    try:
        
        category = driver.find_elements(By.CLASS_NAME, "c-table-value")[6]
        categories.append(category.text)
    except IndexError:
        pass
    
    try:
        review = driver.find_elements(By.CLASS_NAME, "c-toggler__content")[1]    
        reviews.append(review.text)   
    except IndexError:
        pass        
        
    sleep_time = random.randint(1, 5)
    print(f"Waiting {sleep_time} second." if sleep_time == 1 else f"Waiting {sleep_time} seconds.")
    time.sleep(sleep_time)
    driver.back()

Checking product n°0 at https://www.go-sport.com/homme/vetements/tee-shirts/manches-longues/running/tee-shirt-mc-trail-homme-nike-db-gs10519797.html
Waiting 5 seconds.
Checking product n°1 at https://www.go-sport.com/running/vetements/femme/coupe-vents-et-vestes/running-femme-bodycross-coupe-vent-velia-noir-m6064915.html
Waiting 5 seconds.
Checking product n°2 at https://www.go-sport.com/football-homme-adidas-maillot-adidas-team-base-m9014309.html
Waiting 1 second.
Checking product n°3 at https://www.go-sport.com/running/vetements/femme/course-a-pied-femme-joma-maillot-femme-joma-elite-viii-m10389435.html
Waiting 1 second.
Checking product n°4 at https://www.go-sport.com/running/vetements/femme/course-a-pied-femme-joma-debardeur-femme-joma-elite-vii-m8805022.html
Waiting 2 seconds.
Checking product n°5 at https://www.go-sport.com/running/vetements/femme/course-a-pied-femme-macron-debardeur-femme-macron-running-selina-m8810345.html
Waiting 4 seconds.
Checking product n°6 at https://www.

Waiting 3 seconds.
Checking product n°51 at https://www.go-sport.com/running/vetements/femme/coupe-vents-et-vestes/veste-coupe-vent-running-femme-athli-tech-gaelle-300-cpv-gs2586878.html
Waiting 2 seconds.
Checking product n°52 at https://www.go-sport.com/running/vetements/femme/coupe-vents-et-vestes/rukka-wm-jacket-munk-gs5859193.html
Waiting 4 seconds.
Checking product n°53 at https://www.go-sport.com/homme/vetements/tee-shirts/manches-longues/running/salomon-agile-training-tee-m-gs6654737.html
Waiting 3 seconds.
Checking product n°54 at https://www.go-sport.com/running/athletisme/vetements/tee-shirt-mc-running-femme-athli-tech-pia-200-gs10475775.html
Waiting 4 seconds.
Checking product n°55 at https://www.go-sport.com/running/vetements/homme/tee-shirts/running-homme-joma-joma-elite-vi-m2231436.html
Waiting 4 seconds.
Checking product n°56 at https://www.go-sport.com/fitness-femme-le-coq-sportif-short-femme-le-coq-sportif-training-perf-running-n1-m12359725.html
Waiting 2 seconds.
Che

In [133]:
len(prices),len(brands),len(titles)

(88, 88, 88)

In [127]:
prices[100]

'9,99 €'

Let's export the variables.
I decided to export them as a CSV file.
This file will then be used to build a dataset for a recommendation system.
(See the final project, a Streamlit app: https://dorisbaillard-streamlit-app01-recom-2nfdeq.streamlit.app/)

In [151]:
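import csv
from itertools import zip_longest

header = ["url", "breadcrumb", "title", "price", "sizes", "description", "product_id",
          "brand", "color", "genre", "composition", "category", "review"]
columns = (urls, ariannes, titles, prices, sizes, descriptions, product_ids,
           brands, colors, genres, compositions, categories, reviews)

# One row per product, one column per variable; zip_longest pads shorter lists with "".
with open("go_sport4.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(zip_longest(*columns, fillvalue=""))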