# Assignment: Scraping eBay Data Using Selenium

This assignment will guide you through the steps required to scrape product data from [eBay](https://www.ebay.com/) using Selenium. Your goal is to collect data about products based on a specific search query and store the data in a CSV file for analysis.

## Instructions

Below is a step-by-step outline of the scraping process. Follow these steps and implement the required code to complete the assignment. Comment your code wherever necessary to explain your thought process.

### **Step 1: Set Up Selenium**
1. Import the necessary modules from Selenium (e.g., `webdriver`, `By`, `Keys`, etc.).
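2. Set up the Chrome WebDriver to control the browser. Ensure you have downloaded the ChromeDriver executable that matches your installed Chrome version and placed it where your code looks for it (the cells below expect `chromedriver.exe` in the working directory).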
3. Navigate to the eBay homepage using the WebDriver.

In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
import pandas as pd
import time

# Set up the ChromeDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)

# Navigate to the eBay website
page_url = "https://www.ebay.com/"
driver.get(page_url)

### **Step 2: Perform a Search**
1. Identify the search bar element on the eBay homepage using an appropriate locator (e.g., `id`, `name`, `XPath`).
2. Send a specific search query (e.g., "laptops") to the search bar and simulate pressing the Enter key.
3. Wait for the search results page to load.

In [10]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
import time

# Set up the ChromeDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)

# Navigate to eBay website
page_url = "https://www.ebay.com/"
driver.get(page_url)

try:
    # Ensure the driver is still active
    if driver.current_window_handle:
        # Locate the search bar using the correct XPath
        search_bar = driver.find_element(By.XPATH, "//*[@id='gh-ac']")
        search_bar.send_keys("laptop")
        # Press Enter to initiate the search
        search_bar.send_keys(Keys.ENTER)
        # Wait for the page to load
        time.sleep(3)
except Exception as e:
    print(f"An error occurred: {e}")

### **Step 3: Extract Product Data**
1. Use `find_elements` to locate product titles, prices, and other relevant data on the search results page. For example:
   - Product title: Locate elements displaying the product names.
   - Price: Locate elements showing product prices.
   - (Optional) Link: Extract the URL for each product.
2. Loop through the extracted elements and store the data in a structured format (e.g., a Python list of dictionaries).
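
Scraped prices arrive as text, and some listings show a range such as `$161.49 to $316.77`. As a minimal sketch of how such text could be turned into numbers for later analysis, the helper below pulls out the dollar amounts. The name `parse_price_range` is hypothetical, and the sketch assumes US-dollar prices only.

```python
import re

# Matches a dollar amount such as "$1,299.00"; other currencies are not handled.
_AMOUNT = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def parse_price_range(text):
    """Return (low, high) as floats, or None when no dollar amount is found."""
    amounts = [float(m.replace(",", "")) for m in _AMOUNT.findall(text)]
    if not amounts:
        return None
    return min(amounts), max(amounts)


print(parse_price_range("$161.49 to $316.77"))  # (161.49, 316.77)
print(parse_price_range(""))                    # None
```

A single price yields the same value for both ends of the tuple, so downstream code can treat every listing uniformly.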

In [12]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Set up the ChromeDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)

# Navigate to eBay website
page_url = "https://www.ebay.com/"
driver.get(page_url)

try:
    # Ensure the driver is still active
    if driver.current_window_handle:
        # Locate the search bar and search for "laptop"
        search_bar = WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.XPATH, "//*[@id='gh-ac']"))
        )
        search_bar.send_keys("laptop")
        search_bar.send_keys(Keys.ENTER)

        # Wait for the search results to load
        time.sleep(3)

        # Locate product titles, prices, and links
        products = driver.find_elements(By.XPATH, "//li[contains(@class, 's-item')]")
        
        product_data = []

        for product in products:
            # Example title markup on the results page:
            # <div class="s-item__title"><span role="heading" aria-level="3"><!--F#f_0-->FAST CHEAP TOP BRAND INTEL CORE i5 8TH GEN 16GB RAM 480GB SSD WINDOWS 11 LAPTOP<!--F/--></span></div>
            try:
                title_element = product.find_element(By.XPATH, ".//div[contains(@class, 's-item__title')]")
                price_element = product.find_element(By.XPATH, ".//span[contains(@class, 's-item__price')]")
                link_element = product.find_element(By.XPATH, ".//a[contains(@class, 's-item__link')]")
            except Exception as e:
                # Skip an item that lacks a title, price, or link instead of aborting the whole run
                print(f"Error extracting product data: {e}")
                continue

            product_data.append({
                "title": title_element.text,
                "price": price_element.text,
                "link": link_element.get_attribute("href")
            })

        # Print extracted product data
        for item in product_data:
            print(item)

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Close the driver
    driver.quit()

{'title': '', 'price': '', 'link': 'https://ebay.com/itm/123456?itmmeta=012DEW30YG0MEEKND7NH&hash=item123546:g:acwAA9KNiJowH:sc:ShippingMethodStandard!95008!US!-1&itmprp=enc%3AbgepL1tlUHjMGCVfSTGJh%2BzsVKeJ3CQk7NizDI4BZeppuFnmyS6Ijyp8lh%2FnEw%2BWqO7uTV1Q6izE1R0T54aV8j71F4xlWfVcGft4%2FiOQhtqVXA1rW6M1atPARQRmhqUxtEPJKhKtSFgI%2Bvwlzb0GwVCtkp%3ABlBMUObkmabpYw'}
{'title': '', 'price': '', 'link': 'https://ebay.com/itm/123456?itmmeta=012DEW30YG0MEEKND7NH&hash=item123546:g:acwAA9KNiJowH:sc:ShippingMethodStandard!95008!US!-1&itmprp=enc%3AbgepL1tlUHjMGCVfSTGJh%2BzsVKeJ3CQk7NizDI4BZeppuFnmyS6Ijyp8lh%2FnEw%2BWqO7uTV1Q6izE1R0T54aV8j71F4xlWfVcGft4%2FiOQhtqVXA1rW6M1atPARQRmhqUxtEPJKhKtSFgI%2Bvwlzb0GwVCtkp%3ABlBMUObkmabpYw'}
{'title': 'FAST CHEAP TOP BRAND INTEL CORE i5 8TH GEN 16GB RAM 480GB SSD WINDOWS 11 LAPTOP', 'price': '$161.49 to $316.77', 'link': 'https://www.ebay.com/itm/256280150930?_skw=laptop&itmmeta=01JGW5FJ64B6RP2GAMGGXD9K6Y&hash=item3bab7cc392:g:nYIAAOSwKmNmufRh&itmprp=enc%3AAQAJAAAA4H
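
The first two rows of the output above have an empty title and price. They appear to be placeholder list items that match the `s-item` class without being real listings, though that is an assumption based only on this output. A minimal sketch of filtering such rows out before analysis follows; the helper name `drop_placeholders` is hypothetical and the sample links are shortened for readability.

```python
def drop_placeholders(records):
    """Keep only records whose title and price are both non-empty."""
    return [
        r for r in records
        if r.get("title", "").strip() and r.get("price", "").strip()
    ]


# Shortened sample modeled on the output above (query strings removed).
sample = [
    {"title": "", "price": "", "link": "https://ebay.com/itm/123456"},
    {
        "title": "FAST CHEAP TOP BRAND INTEL CORE i5 8TH GEN 16GB RAM 480GB SSD WINDOWS 11 LAPTOP",
        "price": "$161.49 to $316.77",
        "link": "https://www.ebay.com/itm/256280150930",
    },
]

cleaned = drop_placeholders(sample)
print(len(cleaned))  # 1
```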

### **Step 4: Handle Pagination**
1. Check for the presence of a "Next" button to navigate to the next page of results.
2. Implement a loop to scrape multiple pages of search results. Break the loop when no more pages are available or after a set number of pages (e.g., 5 pages).
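
Clicking "Next" is one way to paginate. The Next link inspected in this notebook points at a URL carrying `_nkw` (the query) and `_pgn` (the page number), so another option is to build each page URL directly and load it with `driver.get`. The sketch below only constructs the URLs. It assumes that page 1 corresponds to `_pgn=1` and that the other parameters seen in that link (`_from`, `_sacat`) can be omitted; neither has been verified here. The name `search_page_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"


def search_page_url(query, page):
    """Build the results URL for a query and a 1-based page number."""
    params = {"_nkw": query, "_pgn": page}
    return f"{BASE_URL}?{urlencode(params)}"


# URLs for the first five result pages of the "laptop" search.
urls = [search_page_url("laptop", page) for page in range(1, 6)]
print(urls[1])  # https://www.ebay.com/sch/i.html?_nkw=laptop&_pgn=2
```

Building URLs avoids depending on the Next button's class names, at the cost of not knowing when the last page has been reached, so a page limit or an empty-results check is still needed.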

In [14]:
# Markup of the "Next" link on the results page (tracking attributes omitted):
# <a href="https://www.ebay.com/sch/i.html?_from=R40&_nkw=laptop&_sacat=0&_pgn=2"
#    type="next" class="pagination__next icon-link" aria-label="Go to next search page"
#    style="min-width:40px;"><svg class="icon icon--16"
#    focusable="false" aria-hidden="true"><use href="#icon-arrow-right-16"></use></svg></a>

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Set up the ChromeDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)

# Navigate to eBay website
page_url = "https://www.ebay.com/"
driver.get(page_url)

# Initialize an empty list to store product data
product_data = []

try:
    # Ensure the driver is still active
    if driver.current_window_handle:
        # Locate the search bar and search for "laptop"
        search_bar = WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.XPATH, "//*[@id='gh-ac']"))
        )
        search_bar.send_keys("laptop")
        search_bar.send_keys(Keys.ENTER)

        # Set up a loop for pagination
        for page in range(5):  # Limiting to 5 pages
            time.sleep(3)  # Wait for the page to load
            
            # Locate product items on the current page
            products = driver.find_elements(By.XPATH, "//li[contains(@class, 's-item')]")
            
            for product in products:
                try:
                    title_element = product.find_element(By.XPATH, ".//div[contains(@class, 's-item__title')]")
                    price_element = product.find_element(By.XPATH, ".//span[contains(@class, 's-item__price')]")
                    link_element = product.find_element(By.XPATH, ".//a[contains(@class, 's-item__link')]")

                    product_data.append({
                        "title": title_element.text,
                        "price": price_element.text,
                        "link": link_element.get_attribute("href")
                    })
                except Exception as e:
                    print(f"Error extracting product data: {e}")

            # Check for the presence of the "Next" button
            try:
                next_button = driver.find_element(By.XPATH, "//a[contains(@class, 'pagination__next icon-link')]")
                if "disabled" in next_button.get_attribute("class"):
                    print("No more pages to navigate.")
                    break  # Exit the loop if the "Next" button is disabled
                next_button.click()  # Click the "Next" button to go to the next page
            except Exception as e:
                print("Next button not found or no more pages.")
                break  # Exit the loop if the "Next" button is not found

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Print the extracted product data
    for item in product_data:
        print(item)

    # Close the driver
    driver.quit()
   

{'title': '', 'price': '', 'link': 'https://ebay.com/itm/123456?itmmeta=012DEW30YG0MEEKND7NH&hash=item123546:g:acwAA9KNiJowH:sc:ShippingMethodStandard!95008!US!-1&itmprp=enc%3AbgepL1tlUHjMGCVfSTGJh%2BzsVKeJ3CQk7NizDI4BZeppuFnmyS6Ijyp8lh%2FnEw%2BWqO7uTV1Q6izE1R0T54aV8j71F4xlWfVcGft4%2FiOQhtqVXA1rW6M1atPARQRmhqUxtEPJKhKtSFgI%2Bvwlzb0GwVCtkp%3ABlBMUObkmabpYw'}
{'title': '', 'price': '', 'link': 'https://ebay.com/itm/123456?itmmeta=012DEW30YG0MEEKND7NH&hash=item123546:g:acwAA9KNiJowH:sc:ShippingMethodStandard!95008!US!-1&itmprp=enc%3AbgepL1tlUHjMGCVfSTGJh%2BzsVKeJ3CQk7NizDI4BZeppuFnmyS6Ijyp8lh%2FnEw%2BWqO7uTV1Q6izE1R0T54aV8j71F4xlWfVcGft4%2FiOQhtqVXA1rW6M1atPARQRmhqUxtEPJKhKtSFgI%2Bvwlzb0GwVCtkp%3ABlBMUObkmabpYw'}
{'title': 'FAST CHEAP TOP BRAND INTEL CORE i5 8TH GEN 16GB RAM 480GB SSD WINDOWS 11 LAPTOP', 'price': '$161.49 to $316.77', 'link': 'https://www.ebay.com/itm/256280150930?_skw=laptop&itmmeta=01JGW5VA64N29KWS73C8Q2HCV4&hash=item3bab7cc392:g:nYIAAOSwKmNmufRh&itmprp=enc%3AAQAJAAAA4H

### **Step 5: Save Data to CSV**
1. Use the `pandas` library to convert the scraped data into a DataFrame.
2. Save the DataFrame to a CSV file with appropriate column headers.
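
The same listing can show up more than once across runs or pages: in the outputs above, item `256280150930` appears with different `itmmeta` values in its link. A minimal sketch of removing such duplicates before building the DataFrame is shown below. It assumes the `Link` key used in this step's cell and that the digits after `/itm/` identify a listing; the name `dedupe_by_item_id` is hypothetical and the sample links are shortened.

```python
import re

# Captures the numeric item ID that follows "/itm/" in a listing URL.
_ITEM_ID = re.compile(r"/itm/(\d+)")


def dedupe_by_item_id(records, link_key="Link"):
    """Keep the first record for each item ID, preserving order."""
    seen = set()
    unique = []
    for record in records:
        link = record.get(link_key) or ""
        match = _ITEM_ID.search(link)
        # Fall back to the full link when no item ID can be found.
        key = match.group(1) if match else link
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


rows = [
    {"Title": "Laptop A", "Link": "https://www.ebay.com/itm/256280150930?itmmeta=first"},
    {"Title": "Laptop A", "Link": "https://www.ebay.com/itm/256280150930?itmmeta=second"},
    {"Title": "Laptop B", "Link": "https://www.ebay.com/itm/123456?itmmeta=third"},
]
unique_rows = dedupe_by_item_id(rows)
print(len(unique_rows))  # 2
```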

In [16]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time

# Set up the ChromeDriver service
service = Service(executable_path="chromedriver.exe")
driver = webdriver.Chrome(service=service)

# Navigate to eBay website
page_url = "https://www.ebay.com/"
driver.get(page_url)

# Initialize an empty list to store product data
product_data = []

try:
    # Ensure the driver is still active
    if driver.current_window_handle:
        # Locate the search bar and search for "laptop"
        search_bar = WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.XPATH, "//*[@id='gh-ac']"))
        )
        search_bar.send_keys("laptop")
        search_bar.send_keys(Keys.ENTER)

        # Set up a loop for pagination
        for page in range(5):  # Limiting to 5 pages
            time.sleep(3)  # Wait for the page to load
            
            # Locate product items on the current page
            products = driver.find_elements(By.XPATH, "//li[contains(@class, 's-item')]")
            
            for product in products:
                try:
                    title_element = product.find_element(By.XPATH, ".//div[contains(@class, 's-item__title')]")
                    price_element = product.find_element(By.XPATH, ".//span[contains(@class, 's-item__price')]")
                    link_element = product.find_element(By.XPATH, ".//a[contains(@class, 's-item__link')]")

                    product_data.append({
                        "Title": title_element.text,
                        "Price": price_element.text,
                        "Link": link_element.get_attribute("href")
                    })
                except Exception as e:
                    print(f"Error extracting product data: {e}")

            # Check for the presence of the "Next" button
            try:
                next_button = driver.find_element(By.XPATH, "//a[contains(@class, 'pagination__next icon-link')]")
                if "disabled" in next_button.get_attribute("class"):
                    print("No more pages to navigate.")
                    break  # Exit the loop if the "Next" button is disabled
                next_button.click()  # Click the "Next" button to go to the next page
            except Exception as e:
                print("Next button not found or no more pages.")
                break  # Exit the loop if the "Next" button is not found

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Convert the scraped data into a DataFrame
    df = pd.DataFrame(product_data)

    # Save the DataFrame to the CSV file named in the deliverables
    df.to_csv('ebay_products.csv', index=False)

    print("Data saved to ebay_products.csv")

    # Close the driver
    driver.quit()

Data saved to ebay_products.csv


### **Step 6: Close the Browser**
1. Once the scraping is complete, ensure the WebDriver is closed to release system resources.

In [None]:
# Close the driver
driver.quit()

### **Deliverables**
- Push the Python script you wrote by following the steps above to your GitHub repository and submit it.
- Ensure that your script:
  - Extracts data for at least 50 products.
  - Includes product titles, prices, and links (if applicable).
  - Saves the data to a CSV file named `ebay_products.csv`.
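
Before submitting, it may help to check the CSV against the requirements above. The sketch below is one possible self-check using only the standard library. It assumes the `Title`, `Price`, and `Link` column names from the Step 5 cell; the name `check_rows` is hypothetical.

```python
import csv

REQUIRED_COLUMNS = {"Title", "Price", "Link"}


def check_rows(lines, min_rows=50):
    """Return a list of problems found in CSV text lines (empty if none)."""
    reader = csv.DictReader(lines)
    columns = set(reader.fieldnames or [])
    row_count = sum(1 for _ in reader)

    problems = []
    missing = REQUIRED_COLUMNS - columns
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    if row_count < min_rows:
        problems.append(f"only {row_count} rows, expected at least {min_rows}")
    return problems


# In-memory demonstration with a single data row.
demo = [
    "Title,Price,Link",
    "Example laptop,$161.49,https://www.ebay.com/itm/256280150930",
]
print(check_rows(demo))  # ['only 1 rows, expected at least 50']
```

To check the real file, pass an open file object instead of the demo list, for example `with open("ebay_products.csv", newline="", encoding="utf-8") as f: print(check_rows(f))`.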

### **Bonus Challenge**
1. Add functionality to scrape product ratings and the number of reviews (if available).
2. Include error handling to skip elements that might be missing data or inaccessible.

**Good luck!** 🚀