# Web Scraping Project

## Introduction

This project scrapes product data from three retail websites and consolidates it for analysis. The goal is to collect comparable product information (names, prices, URLs, brands) from each site. We will be scraping the following websites:

1. **[Miss Etam](https://www.missetam.nl/nl/collectie/jurken/)**: Miss Etam is a European retail store based in the Netherlands. We will extract information about their dress collection, including product names, prices, URLs, and brands.

2. **[Gap](https://www.gap.com/browse/category.do?cid=5664&nav=meganav%3AWomen%3ACategories%3AJeans#pageId=0&department=136)**: Gap is a major American retailer. We will collect data on women's jeans, including product names, prices, styles, URLs, and brands.

3. **[Your Look for Less](https://www.your-look-for-less.nl/goedkope-blouses)**: Your Look for Less is a European retail store with a Dutch-language storefront. We will gather details on their blouse collection, such as product names, prices, URLs, and brands.

The collected data will be consolidated into a pandas DataFrame for further analysis. Each section of this notebook details the steps for scraping one site, followed by a section that merges the results into a single dataset.




In [1]:
# interacting with web pages and extracting data
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import time

# creating and manipulating dataframes
import pandas as pd
import numpy as np

# translating scraped text to English
from googletrans import Translator


## Web Scraping from the Three Sites

In this section, we scrape data from all three websites. The steps for each site are:
- Loading the page in headless Chrome with Selenium
- Parsing the rendered HTML with BeautifulSoup
- Extracting the desired data
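
The parsing and extraction steps above can be tried without a browser. The following is a minimal sketch, not the notebook's implementation: it swaps BeautifulSoup for the standard-library `html.parser` so that it runs with no extra installs, and the markup in `SAMPLE_HTML` is a hypothetical stand-in for a listing page. The names `ProductItemParser` and `extract_products` are illustrative only.

```python
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on a listing page that exposes data-* attributes.
SAMPLE_HTML = """
<div class="product-list">
  <div class="productItem" data-name="Jurk print paars" data-price="59,99"></div>
  <div class="productItem" data-name="Jurk lang print groen" data-price="64,99"></div>
  <div class="banner">Sale</div>
</div>
"""


class ProductItemParser(HTMLParser):
    """Collect name and price from every element whose class list contains 'productItem'."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "productItem" in classes:
            self.items.append({
                "product_name": attributes["data-name"].strip(),
                # Dutch prices use a decimal comma, so swap it before converting.
                "price": float(attributes["data-price"].replace(",", ".")),
            })


def extract_products(html):
    """Parse an HTML string and return one dict per product element."""
    parser = ProductItemParser()
    parser.feed(html)
    return parser.items


products = extract_products(SAMPLE_HTML)
print(products)
```

In the scraping cells, the HTML string would come from `driver.page_source` rather than a constant, and BeautifulSoup's `find_all` would replace the hand-written parser class.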


## 1st website

In [28]:
def setup_driver():
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in headless mode (no GUI)
    # Path to a chromedriver binary that matches your installed Chrome version; adjust for your machine.
    service = Service('C:\\Users\\chrome web driver\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=chrome_options)
    return driver

def scrape_listing_page(url, driver, base_url):
    driver.get(url)
    time.sleep(2)  # Wait for the page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    product_list = soup.find('div', class_='product-list')
    products = []
    translator = Translator()

    for index, product in enumerate(product_list.find_all('div', class_='productItem', limit=10)):
        product_name = product['data-name'].strip()
        product_name = translator.translate(product_name, dest='en').text

        price = float(product['data-price'].replace(',', '.'))

        discounted_price_element = product.find('span', class_='special-price')
        discounted_price = float(discounted_price_element.text.replace('€', '').replace(',', '.')) if discounted_price_element else None

        product_relative_url = product.find('a', class_='product')['href']
        product_url = base_url + product_relative_url

        products.append({
            'url': product_url,
            'page_type': 'product',
            'product_name': product_name,
            'brand': None,
            'price': price,
            'discounted_price': discounted_price,
            'position': index + 1,
            'number_of_photos': None,
            'number_of_colors': None,
            'product_description': None
        })

    # Collect unique brands
    unique_brands = list(set(product['brand'] for product in products if product['brand']))

    # Add an entry for the listing page itself
    listing_page_entry = {
        'url': url,
        'page_type': 'list',
        'product_name': [product['product_name'] for product in products],
        'brand': unique_brands,
        'price': [product['price'] for product in products],
        'discounted_price': [product['discounted_price'] for product in products if product['discounted_price'] is not None],
        'position': None,
        'number_of_photos': None,
        'number_of_colors': None,
        'product_description': None
    }

    return products, listing_page_entry

def scrape_product_details(product, driver):
    driver.get(product['url'])
    time.sleep(2)  # Wait for the page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    translator = Translator()

    # Extract brand
    brand_element = soup.find('span', class_='title--addition')
    brand = brand_element.text.strip() if brand_element else None

    # Number of photos
    photos_div = soup.find('div', class_='product-detail-image')
    number_of_photos = len(photos_div.find_all('figure')) if photos_div else 0

    # Number of colors
    color_elements = soup.find_all('a', class_='variantIcon groupItem')
    number_of_colors = len(color_elements)

    # Product description
    product_description_element = soup.find('div', id='variant-default')
    product_description = product_description_element.text.strip() if product_description_element else None

    # Translate product description to English
    if product_description:
        detected_lang = translator.detect(product_description).lang
        if detected_lang != 'en':
            product_description = translator.translate(product_description, dest='en').text

    product.update({
        'brand': brand,
        'number_of_photos': number_of_photos,
        'number_of_colors': number_of_colors,
        'product_description': product_description
    })

def scrape_multiple_pages(list_urls, base_urls):
    driver = setup_driver()
    all_products = []

    for url, base_url in zip(list_urls, base_urls):
        products, listing_page_entry = scrape_listing_page(url, driver, base_url)

        for product in products:
            scrape_product_details(product, driver)

        # Collect unique brands after updating products with details
        unique_brands = list(set(product['brand'] for product in products if product['brand']))
        listing_page_entry['brand'] = unique_brands
        listing_page_entry['product_name'] = [product['product_name'] for product in products]
        listing_page_entry['discounted_price'] = [product['discounted_price'] for product in products if product['discounted_price'] is not None]

        all_products.extend(products)
        all_products.append(listing_page_entry)

    driver.quit()
    return all_products

# List of URLs to scrape
list_urls = [
    'https://www.missetam.nl/nl/collectie/jurken/'
]

base_urls = [
    'https://www.missetam.nl'
]

# Scrape listing pages
all_products = scrape_multiple_pages(list_urls, base_urls)

# Build a DataFrame; the CSV is written after merging (see Data Merging)
df = pd.DataFrame(all_products)


print("Scraping completed")

Scraping completed


In [32]:
df

|  | url | page_type | product_name | brand | price | discounted_price | position | number_of_photos | number_of_colors | product_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://www.missetam.nl/nl/3848152/jurk-print-... | product | Dress Print Purple | Jana | 59.99 |  | 1.0 | 2.0 | 0.0 | Add a touch of summer elegance to your wardrob... |
| 1 | https://www.missetam.nl/nl/3848154/jurk-ruffle... | product | Dress Ruffles Print Blue | Regina | 49.99 |  | 2.0 | 8.0 | 0.0 | We are ready for good weather!Dress 'Regina' i... |
| 2 | https://www.missetam.nl/nl/3848153/jurk-lang-p... | product | Dress long print green | Izzy | 64.99 |  | 3.0 | 9.0 | 0.0 | The perfect summer dress!Dress 'Izzy' is a rea... |
| 3 | https://www.missetam.nl/nl/3848151/jurk-print-... | product | Dress Print Purple | Jana | 59.99 |  | 4.0 | 2.0 | 0.0 | Add a touch of summer elegance to your wardrob... |
| 4 | https://www.missetam.nl/nl/3848145/jurk-lang-p... | product | Dress long print green | Izzy | 59.99 |  | 5.0 | 8.0 | 2.0 | The perfect summer dress!Dress 'Izzy' is a rea... |
| 5 | https://www.missetam.nl/nl/3848144/jurk-lang-p... | product | Dress Long Print Black | Izzy | 59.99 |  | 6.0 | 9.0 | 2.0 | The perfect summer dress!Dress 'Izzy' is a rea... |
| 6 | https://www.missetam.nl/nl/3845922/jurk-print-... | product | Dress Print Purple | Reva | 44.99 |  | 7.0 | 3.0 | 0.0 | The perfect dress for those summery days!Dress... |
| 7 | https://www.missetam.nl/nl/3845918/jurk-print-... | product | Dress Print Blue | Poppy | 59.99 |  | 8.0 | 7.0 | 2.0 | Go for this power print!Dress 'Poppy' is chara... |
| 8 | https://www.missetam.nl/nl/3845917/jurk-print-... | product | Dress Print Black | Poppy | 59.99 |  | 9.0 | 9.0 | 2.0 | Go for this power print!Dress 'Poppy' is chara... |
| 9 | https://www.missetam.nl/nl/3845916/jurk-print-... | product | Dress Print White | Poppy | 59.99 |  | 10.0 | 9.0 | 2.0 | Go for this power print!Dress 'Poppy' is chara... |
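
The listing-page row is assembled twice in the scraper (once in `scrape_listing_page`, then patched in `scrape_multiple_pages`), and its `discounted_price` list drops full-price items, so it can end up shorter than the `price` list. As a hedged sketch of one alternative, the hypothetical helper `build_listing_entry` below builds the row in one place and keeps `None` placeholders so the lists stay aligned. Keeping the placeholders is a deliberate change from the notebook's behaviour, and the sample rows are made up.

```python
def build_listing_entry(url, products):
    """Summarise a listing page as one row, using the same keys as the product rows."""
    # sorted() makes the brand order deterministic; a bare set() has no stable order.
    brands = sorted({p["brand"] for p in products if p["brand"]})
    return {
        "url": url,
        "page_type": "list",
        "product_name": [p["product_name"] for p in products],
        "brand": brands,
        "price": [p["price"] for p in products],
        # None is kept for full-price items so index i refers to the same product in every list.
        "discounted_price": [p["discounted_price"] for p in products],
        "position": None,
        "number_of_photos": None,
        "number_of_colors": None,
        "product_description": None,
    }


# Hypothetical product rows with the keys the helper reads.
sample_products = [
    {"product_name": "Dress A", "brand": "Jana", "price": 59.99, "discounted_price": None},
    {"product_name": "Dress B", "brand": "Izzy", "price": 64.99, "discounted_price": 49.99},
    {"product_name": "Dress C", "brand": "Jana", "price": 44.99, "discounted_price": None},
]
entry = build_listing_entry("https://example.com/dresses", sample_products)
print(entry["brand"], entry["discounted_price"])
```

Calling the helper once, after `scrape_product_details` has filled in the brands, would make the second pass over the listing entry unnecessary.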


## 2nd website

In [35]:
def setup_driver():
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in headless mode (no GUI)
    # Path to a chromedriver binary that matches your installed Chrome version; adjust for your machine.
    service = Service('C:\\Users\\chrome web driver\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=chrome_options)
    return driver

def scrape_listing_page(url, driver, base_url):
    driver.get(url)
    time.sleep(2)  # Wait for the page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    product_list = soup.find_all('div', {'data-testid': 'grid-root'})
    products = []
    translator = Translator()

    for index, product in enumerate(product_list[:10]):
        try:
            product_anchor = product.find('a', class_='category-page-0')
            if not product_anchor:
                print(f"Product anchor not found for product at index {index}")
                continue

            product_url = product_anchor['href']
            product_url = product_url if product_url.startswith('http') else base_url + product_url

            product_name = product_anchor.find('img')['alt']
            product_name = translator.translate(product_name, dest='en').text

            # Extract price
            price_span = product.find('span', class_='product-price__strike')
            if price_span:
                price = float(price_span.text.strip('$').replace(',', ''))
            else:
                price_div = product.find('div', class_='category-page-r5pe2u')
                price = float(price_div.find('span').text.strip('$').replace(',', '')) if price_div else None

            # Extract discounted price
            discounted_price_div = product.find('div', class_='product-price__highlight')
            discounted_price = float(discounted_price_div.text.strip('$').replace(',', '')) if discounted_price_div else None

            products.append({
                'url': product_url,
                'page_type': 'product',
                'product_name': product_name,
                'brand': 'Gap',
                'price': price,
                'discounted_price': discounted_price,
                'position': index + 1,
                'number_of_photos': None,
                'number_of_colors': None,
                'product_description': None
            })
        except Exception as e:
            print(f"Error processing product at index {index}: {e}")

    unique_brands = list(set(product['brand'] for product in products if product['brand']))

    listing_page_entry = {
        'url': url,
        'page_type': 'list',
        'product_name': [product['product_name'] for product in products],
        'brand': unique_brands,
        'price': [product['price'] for product in products],
        'discounted_price': [product['discounted_price'] for product in products if product['discounted_price'] is not None],
        'position': None,
        'number_of_photos': None,
        'number_of_colors': None,
        'product_description': None
    }

    return products, listing_page_entry

def scrape_product_details(product, driver):
    try:
        driver.get(product['url'])
        time.sleep(2)  # Wait for the page to load
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        translator = Translator()

        # Number of photos
        photos_divs = soup.find_all('div', class_='brick__product-image-wrapper pdp-mfe-1teox8g')
        number_of_photos = len(photos_divs)

        # Number of colors
        color_divs = soup.find_all('div', class_='pdp-mfe-b3pn3b')
        number_of_colors = len(color_divs)

        # Product description
        product_description_div = soup.find('div', class_='pdp-mfe-1e07b82')
        product_description = ""
        if product_description_div:
            for element in product_description_div.find_all(recursive=False):
                product_description += element.text.strip() + "\n"

        if product_description:
            detected_lang = translator.detect(product_description).lang
            if detected_lang != 'en':
                product_description = translator.translate(product_description, dest='en').text

        product.update({
            'number_of_photos': number_of_photos,
            'number_of_colors': number_of_colors,
            'product_description': product_description.strip()
        })
    except Exception as e:
        print(f"Error processing product details for {product['url']}: {e}")

def scrape_multiple_pages(list_urls, base_urls):
    driver = setup_driver()
    all_products = []

    for url, base_url in zip(list_urls, base_urls):
        products, listing_page_entry = scrape_listing_page(url, driver, base_url)

        for product in products:
            scrape_product_details(product, driver)

        unique_brands = list(set(product['brand'] for product in products if product['brand']))
        listing_page_entry['brand'] = unique_brands
        listing_page_entry['product_name'] = [product['product_name'] for product in products]
        listing_page_entry['discounted_price'] = [product['discounted_price'] for product in products if product['discounted_price'] is not None]

        all_products.extend(products)
        all_products.append(listing_page_entry)

    driver.quit()
    return all_products

# List of URLs to scrape
list_urls = [
    'https://www.gap.com/browse/category.do?cid=5664&nav=meganav%3AWomen%3ACategories%3AJeans#pageId=0&department=136'
]

base_urls = [
    'https://www.gap.com'
]

# Scrape listing pages
all_products = scrape_multiple_pages(list_urls, base_urls)

# Build a DataFrame; the CSV is written after merging (see Data Merging)
df2 = pd.DataFrame(all_products)

print("Scraping completed")

Product anchor not found for product at index 1
Product anchor not found for product at index 8
Product anchor not found for product at index 9
Scraping completed


In [36]:
df2

|  | url | page_type | product_name | brand | price | discounted_price | position | number_of_photos | number_of_colors | product_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://www.gap.com/browse/product.do?pid=4849... | product | Mid Rise UltraSoft Baggy Jeans | Gap | 79.95 | 47.0 | 1.0 | 0.0 | 1.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Fitt... |
| 1 | https://www.gap.com/browse/product.do?pid=4849... | product | Mid Rise UltraSoft Baggy Jeans | Gap | 79.95 |  | 3.0 | 0.0 | 1.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Fitt... |
| 2 | https://www.gap.com/browse/product.do?pid=4561... | product | Mid Rise UltraSoft Baggy Jeans | Gap | 89.95 |  | 4.0 | 0.0 | 1.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Fitt... |
| 3 | https://www.gap.com/browse/product.do?pid=4035... | product | Mid Rise UltraSoft Baggy Jeans | Gap | 79.95 |  | 5.0 | 0.0 | 2.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Fitt... |
| 4 | https://www.gap.com/browse/product.do?pid=4350... | product | Mid Rise UltraSoft Baggy Jeans | Gap | 89.95 |  | 6.0 | 0.0 | 2.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Fitt... |
| 5 | https://www.gap.com/browse/product.do?pid=5045... | product | Mid Rise Cargo Baggy Jeans | Gap | 89.95 |  | 7.0 | 0.0 | 1.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Fitt... |
| 6 | https://www.gap.com/browse/product.do?pid=8534... | product | Mid Rise Wide Baggy Cargo Jeans | Gap | 79.95 | 47.0 | 8.0 | 0.0 | 1.0 | Our Mid Rise Jean has a 10" (25 cm) rise.Loos... |
| 7 | https://www.gap.com/browse/category.do?cid=566... | list | [Mid Rise UltraSoft Baggy Jeans, Mid Rise Ultr... | [Gap] | [79.95, 79.95, 89.95, 79.95, 89.95, 89.95, 79.95] | [47.0, 47.0] |  |  |  |  |
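
The scrapers build absolute product URLs by string concatenation (`base_url + product_url`), which yields a double slash if the base ends with `/` and the link starts with one. A small sketch of a more forgiving alternative uses `urllib.parse.urljoin` from the standard library. The wrapper name `absolute_url` and the `pid=123` links are hypothetical.

```python
from urllib.parse import urljoin


def absolute_url(base_url, href):
    """Resolve href against base_url; hrefs that are already absolute are returned unchanged."""
    return urljoin(base_url, href)


# Root-relative link, with and without a trailing slash on the base.
print(absolute_url("https://www.gap.com", "/browse/product.do?pid=123"))
print(absolute_url("https://www.gap.com/", "/browse/product.do?pid=123"))
# Link that is already absolute.
print(absolute_url("https://www.gap.com", "https://www.gap.com/browse/product.do?pid=123"))
```

All three calls print the same URL, so the `startswith('http')` check in `scrape_listing_page` would no longer be needed.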


## 3rd website

In [37]:
def setup_driver():
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run in headless mode (no GUI)
    # Path to a chromedriver binary that matches your installed Chrome version; adjust for your machine.
    service = Service('C:\\Users\\chrome web driver\\chromedriver-win64\\chromedriver-win64\\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=chrome_options)
    return driver

def scrape_listing_page(url, driver, base_url):
    driver.get(url)
    time.sleep(2)  # Wait for the page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    product_list = soup.find_all('a', {'data-testid': 'product'})
    products = []
    translator = Translator()

    for index, product in enumerate(product_list[:10]):
        try:
            product_url = product['href']
            product_url = product_url if product_url.startswith('http') else base_url + product_url

            product_name_element = product.find('strong', class_='sc-fed992a6-0 eJZrEW sc-b18a510e-5 cbyClg')
            product_name = product_name_element.text.strip() if product_name_element else "Unknown"
            product_name = translator.translate(product_name, dest='en').text

            # Extract price (BeautifulSoup decodes &nbsp; to '\xa0', so that character is removed, not the entity)
            price_span = product.find('span', class_='sc-49115527-0 bVjZVj')
            price = float(price_span.text.replace('€', '').replace('\xa0', '').replace(',', '.').replace('-', '').strip()) if price_span else None

            # Extract discounted price
            discounted_price_span = product.find('span', class_='sc-49115527-0 bkKyDO')
            discounted_price = float(discounted_price_span.text.replace('€', '').replace('\xa0', '').replace(',', '.').replace('-', '').strip()) if discounted_price_span else None

            products.append({
                'url': product_url,
                'page_type': 'product',
                'product_name': product_name,
                'brand': 'Unknown',
                'price': price,
                'discounted_price': discounted_price,
                'position': index + 1,
                'number_of_photos': None,
                'number_of_colors': None,
                'product_description': None
            })
        except Exception as e:
            print(f"Error processing product at index {index}: {e}")

    unique_brands = list(set(product['brand'] for product in products if product['brand']))

    listing_page_entry = {
        'url': url,
        'page_type': 'list',
        'product_name': [product['product_name'] for product in products],
        'brand': unique_brands,
        'price': [product['price'] for product in products],
        'discounted_price': [product['discounted_price'] for product in products if product['discounted_price'] is not None],
        'position': None,
        'number_of_photos': None,
        'number_of_colors': None,
        'product_description': None
    }

    return products, listing_page_entry

def scrape_product_details(product, driver):
    try:
        driver.get(product['url'])
        time.sleep(2)  # Wait for the page to load
        soup = BeautifulSoup(driver.page_source, 'html.parser')

        translator = Translator()

        # Number of photos
        photos_divs = soup.find_all('div', class_='sc-f6fccf7e-0 dXZWLQ')
        number_of_photos = len(photos_divs)

        # Number of colors
        color_divs = soup.find_all('div', {'data-trackid': 'color-tile'})
        number_of_colors = len(color_divs)

        # Product description
        product_description_p = soup.find('p', class_='sc-fed992a6-0 erEHBL')
        product_description = product_description_p.text.strip() if product_description_p else ""

        if product_description:
            detected_lang = translator.detect(product_description).lang
            if detected_lang != 'en':
                product_description = translator.translate(product_description, dest='en').text

        product.update({
            'number_of_photos': number_of_photos,
            'number_of_colors': number_of_colors,
            'product_description': product_description.strip()
        })
    except Exception as e:
        print(f"Error processing product details for {product['url']}: {e}")

def scrape_multiple_pages(list_urls, base_urls):
    driver = setup_driver()
    all_products = []

    for url, base_url in zip(list_urls, base_urls):
        products, listing_page_entry = scrape_listing_page(url, driver, base_url)

        for product in products:
            scrape_product_details(product, driver)

        unique_brands = list(set(product['brand'] for product in products if product['brand']))
        listing_page_entry['brand'] = unique_brands
        listing_page_entry['product_name'] = [product['product_name'] for product in products]
        listing_page_entry['discounted_price'] = [product['discounted_price'] for product in products if product['discounted_price'] is not None]

        all_products.extend(products)
        all_products.append(listing_page_entry)

    driver.quit()
    return all_products

# List of URLs to scrape
list_urls = [
    'https://www.your-look-for-less.nl/goedkope-blouses'
]

base_urls = [
    'https://www.your-look-for-less.nl'
]

# Scrape listing pages
all_products = scrape_multiple_pages(list_urls, base_urls)

# Build a DataFrame; the CSV is written after merging (see Data Merging)
df3 = pd.DataFrame(all_products)


print("Scraping completed")

Scraping completed


In [38]:
df3

|  | url | page_type | product_name | brand | price | discounted_price | position | number_of_photos | number_of_colors | product_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://www.your-look-for-less.nl/p/134327 | product | Blouse with satin print | Unknown | 25.0 | 19.0 | 1.0 | 4.0 | 1.0 | Attention fashionistas: with this blouse with ... |
| 1 | https://www.your-look-for-less.nl/p/159561 | product | Tunic with 3/4 sleeves | Unknown | 19.0 | 10.0 | 2.0 | 4.0 | 1.0 | Attracting, enjoying it and looking good: this... |
| 2 | https://www.your-look-for-less.nl/p/134145 | product | Long blouse in a-line | Unknown | 25.0 | 15.0 | 3.0 | 4.0 | 1.0 | A striking pattern mix in harmoniously colored... |
| 3 | https://www.your-look-for-less.nl/p/170077 | product | Comfortable blouse | Unknown | 19.0 | 10.0 | 4.0 | 3.0 | 3.0 | This always looks good!Comfortable blouse with... |
| 4 | https://www.your-look-for-less.nl/p/173930 | product | Comfortable blouse | Unknown | 23.0 | 15.0 | 5.0 | 4.0 | 1.0 | This comfortable blouse radiates lightness!The... |
| 5 | https://www.your-look-for-less.nl/p/173628 | product | Comfortable blouse | Unknown | 20.0 | 13.0 | 6.0 | 3.0 | 3.0 | With this comfortable blouse, the decorative s... |
| 6 | https://www.your-look-for-less.nl/p/176862 | product | Jeans blouse with fringes on the zoom | Unknown | 25.0 | 15.0 | 7.0 | 4.0 | 1.0 | This wonderfully soft jeans blouse is just the... |
| 7 | https://www.your-look-for-less.nl/p/127901 | product | Longline blouse | Unknown | 25.0 | 15.0 | 8.0 | 4.0 | 1.0 | Light, airy and wonderfully flattering: long b... |
| 8 | https://www.your-look-for-less.nl/p/159398 | product | Comfortable blouse | Unknown | 23.0 | 11.0 | 9.0 | 4.0 | 1.0 | The print: so beautiful.The cut: very fashiona... |
| 9 | https://www.your-look-for-less.nl/p/168806 | product | Blouse met ruches | Unknown | 23.0 | 20.0 | 10.0 | 4.0 | 1.0 | Love at first sight: romantic blouse with flor... |
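
Each of the three scrapers cleans price strings inline with its own chain of `replace` calls (decimal commas, `€` or `$` symbols, a trailing `,-`). One way to share that logic is a single helper, sketched below under stated assumptions: the name `parse_price` is hypothetical, and the separator that appears last is treated as the decimal separator, so a value such as `1,299` with no decimals would be misread as 1.299.

```python
import re


def parse_price(text):
    """Convert a scraped price string to a float, or return None if no number is found."""
    if text is None:
        return None
    # Keep only digits and separators; this drops currency symbols, spaces and the dash in ',-'.
    cleaned = re.sub(r"[^\d.,]", "", text).strip(".,")
    if not cleaned:
        return None
    if cleaned.rfind(",") > cleaned.rfind("."):
        # Comma is the decimal separator (e.g. '1.299,00'): drop dots, then swap the comma.
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal separator (e.g. '1,299.00'): drop thousands commas.
        cleaned = cleaned.replace(",", "")
    return float(cleaned)


for raw in ["€ 59,99", "$79.95", "€\xa025,-", "1,299.00", "n/a"]:
    print(repr(raw), "->", parse_price(raw))
```

With such a helper, the `if ... else None` guards in the scrapers could reduce to `parse_price(element.text) if element else None`.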


## Data Merging

After scraping the three websites (Miss Etam, Gap, and Your Look for Less), we consolidate the results into a single pandas DataFrame so that the dataset can be handled as one unit. The merging process involves:

1. **Collecting the DataFrames**: `df`, `df2`, and `df3`, one per scraping section above.
2. **Combining the DataFrames**: Stacking them row-wise with `pd.concat`. All three share the same columns, so no join keys are needed.

The merged DataFrame is then saved as a CSV file containing the product details collected from all three retailers, ready for further analysis.
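
Once the rows are stacked, the retailer can only be recovered from the `url` column. As a hedged sketch of one option, the snippet below tags each frame with a `source` column before concatenating. That column is an assumption added for illustration and is not part of the notebook's schema, and the three tiny frames are made-up stand-ins for `df`, `df2`, and `df3`.

```python
import io

import pandas as pd

# Tiny stand-ins for df, df2 and df3; the values are made up for illustration.
frames = {
    "missetam": pd.DataFrame([{"product_name": "Dress A", "price": 59.99}]),
    "gap": pd.DataFrame([{"product_name": "Jeans B", "price": 79.95}]),
    "your-look-for-less": pd.DataFrame([{"product_name": "Blouse C", "price": 25.0}]),
}

# assign() returns a copy, so the per-site frames are left untouched.
tagged = [frame.assign(source=name) for name, frame in frames.items()]
merged = pd.concat(tagged, ignore_index=True)

# Write to an in-memory buffer here; pass a file path instead to save to disk.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
print(buffer.getvalue())
```

A `source` column also makes per-retailer summaries a single `groupby("source")` call on the merged frame.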


In [39]:
# stack the three DataFrames row-wise with a fresh index
merged_df = pd.concat([df, df2, df3], ignore_index=True)
merged_df


|  | url | page_type | product_name | brand | price | discounted_price | position | number_of_photos | number_of_colors | product_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | https://www.missetam.nl/nl/3848152/jurk-print-... | product | Dress Print Purple | Jana | 59.99 |  | 1.0 | 2.0 | 0.0 | Add a touch of summer elegance to your wardrob... |
| 1 | https://www.missetam.nl/nl/3848154/jurk-ruffle... | product | Dress Ruffles Print Blue | Regina | 49.99 |  | 2.0 | 8.0 | 0.0 | We are ready for good weather!Dress 'Regina' i... |
| 2 | https://www.missetam.nl/nl/3848153/jurk-lang-p... | product | Dress long print green | Izzy | 64.99 |  | 3.0 | 9.0 | 0.0 | The perfect summer dress!Dress 'Izzy' is a rea... |
| 3 | https://www.missetam.nl/nl/3848151/jurk-print-... | product | Dress Print Purple | Jana | 59.99 |  | 4.0 | 2.0 | 0.0 | Add a touch of summer elegance to your wardrob... |
| 4 | https://www.missetam.nl/nl/3848145/jurk-lang-p... | product | Dress long print green | Izzy | 59.99 |  | 5.0 | 8.0 | 2.0 | The perfect summer dress!Dress 'Izzy' is a rea... |
| 5 | https://www.missetam.nl/nl/3848144/jurk-lang-p... | product | Dress Long Print Black | Izzy | 59.99 |  | 6.0 | 9.0 | 2.0 | The perfect summer dress!Dress 'Izzy' is a rea... |
| 6 | https://www.missetam.nl/nl/3845922/jurk-print-... | product | Dress Print Purple | Reva | 44.99 |  | 7.0 | 3.0 | 0.0 | The perfect dress for those summery days!Dress... |
| 7 | https://www.missetam.nl/nl/3845918/jurk-print-... | product | Dress Print Blue | Poppy | 59.99 |  | 8.0 | 7.0 | 2.0 | Go for this power print!Dress 'Poppy' is chara... |
| 8 | https://www.missetam.nl/nl/3845917/jurk-print-... | product | Dress Print Black | Poppy | 59.99 |  | 9.0 | 9.0 | 2.0 | Go for this power print!Dress 'Poppy' is chara... |
| 9 | https://www.missetam.nl/nl/3845916/jurk-print-... | product | Dress Print White | Poppy | 59.99 |  | 10.0 | 9.0 | 2.0 | Go for this power print!Dress 'Poppy' is chara... |


In [29]:
merged_df.to_csv('webscraped_mkt_data.csv', index=False)

## Conclusion

In this project, we successfully scraped product data from three different retail websites: Miss Etam, Gap, and Your Look for Less. The collected data was then merged into a single pandas DataFrame, providing a unified and comprehensive dataset of product details across these retailers. This consolidated DataFrame was saved as a CSV file, making it readily available for further analysis, reporting, or integration into other systems.

Key outcomes:
- Successfully extracted product information from a major retailer and two European stores based in non-English speaking countries.
- Merged the collected data into a single, cohesive DataFrame.
- Saved the merged DataFrame as a CSV file for easy access and analysis.

This project shows how web scraping can gather product data from several differently structured sites and consolidate it into one dataset with a shared set of columns.
