[Reference](https://medium.com/@datajournal/scraping-amazon-best-sellers-2781b1399bc9)

# Step 1: Install Python and the Required Libraries

In [1]:
!pip install selenium webdriver-manager pandas


# Step 2: Inspecting Amazon’s Best Sellers Page
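Open the Best Sellers page in your browser's developer tools and locate the three fields to extract. The class names below are the ones the scraping code in Step 4 relies on; Amazon's markup may change, so verify them against the live page.
- Product title: the text of the element with class p13n-sc-truncate, inside each product container (class zg-item).
- Product price: the text of the element with class p13n-sc-price.
- Product URL: the href attribute of the product's link (class a-link-normal), which points to the detailed product page.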
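
To make the three fields concrete, here is a minimal sketch that pulls them out of a small, hand-written HTML fragment using Python's built-in html.parser. The fragment and the TileParser class are hypothetical illustrations. The class names are the ones this tutorial's scraping code looks for, and the live page may use different ones.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup for one best-seller tile; the real page is
# far more nested and its class names may differ or change.
SAMPLE_HTML = """
<div class="zg-item">
  <a class="a-link-normal" href="/dp/EXAMPLE1">
    <div class="p13n-sc-truncate">Example Kettle</div>
  </a>
  <span class="p13n-sc-price">$24.99</span>
</div>
"""


class TileParser(HTMLParser):
    """Collects the title, price, and URL from one product tile."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": None, "url": None, "price": None}
        self._current = None  # field whose text we expect next

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # The first product link carries the URL in its href attribute
        if "a-link-normal" in classes and self.fields["url"] is None:
            self.fields["url"] = attrs.get("href")
        if "p13n-sc-truncate" in classes:
            self._current = "title"
        elif "p13n-sc-price" in classes:
            self._current = "price"

    def handle_data(self, data):
        text = data.strip()
        if self._current and text:
            self.fields[self._current] = text
            self._current = None


parser = TileParser()
parser.feed(SAMPLE_HTML)
print(parser.fields)
```

The title and price come from element text, while the URL comes from an attribute. The Selenium code later in the tutorial makes the same distinction with .text and get_attribute("href").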

# Step 3: Setting Up Selenium for Web Scraping

In [2]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import time

In [3]:
def init_chrome_driver():
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Run Chrome without opening a window
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    return driver

# Step 4: Writing the Scraping Logic

In [4]:
def get_products_from_page(url, driver):
    driver.get(url)
    time.sleep(3) # Wait for the page to load
    # Find all products on the page
    product_elements = driver.find_elements(By.CLASS_NAME, "zg-item")
    # List to store product data
    products = []
    # Loop through the products and extract data
    for product in product_elements:
        try:
            title = product.find_element(By.CLASS_NAME, "p13n-sc-truncate").text
            # Use a separate name so the page URL parameter is not overwritten
            product_url = product.find_element(By.CLASS_NAME, "a-link-normal").get_attribute("href")
            price = product.find_element(By.CLASS_NAME, "p13n-sc-price").text
            products.append({"title": title, "url": product_url, "price": price})
        except Exception as e:
            print(f"Error extracting product data: {e}")
            continue
    return products

# Step 5: Exporting Data to CSV

In [5]:
def save_to_csv(products, filename):
    df = pd.DataFrame(products)
    df.to_csv(filename, index=False)
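
The price is scraped as display text, so the CSV column will hold strings rather than numbers. If you want to sort or filter by price, one option is to add a numeric column before exporting. The sketch below assumes US-style prices such as "$1,299.50"; parse_price and the sample rows are hypothetical and not part of the original tutorial.

```python
import re


def parse_price(price_text):
    """Return the first numeric amount in a price string as a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price_text or "")
    if match is None:
        return None
    # Drop thousands separators before converting
    return float(match.group().replace(",", ""))


# Hypothetical rows in the same shape as the scraper's output
sample_products = [
    {"title": "Example Kettle", "url": "/dp/EXAMPLE1", "price": "$24.99"},
    {"title": "Example Mixer", "url": "/dp/EXAMPLE2", "price": "$1,299.50"},
    {"title": "Example Pan", "url": "/dp/EXAMPLE3", "price": ""},
]
for product in sample_products:
    product["price_value"] = parse_price(product["price"])
```

Rows with a missing or unparseable price get None, which pandas writes as an empty cell.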

# Step 6: Putting It All Together

In [7]:
def main():
    url = "https://www.amazon.com/Best-Sellers-Kitchen-Dining/zgbs/kitchen/"
    driver = init_chrome_driver()
    try:
        products = get_products_from_page(url, driver)
        save_to_csv(products, "amazon_best_sellers.csv")
    finally:
        driver.quit()


if __name__ == "__main__":
    main()

# Step 7: Running the Script
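Save the code from Steps 3 to 6 in a file named main.py, then run it from a terminal. The results are written to amazon_best_sellers.csv in the current directory.
```
python main.py
```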

# Step 8: Handling Scraping Challenges

1. Rate-limiting: Amazon can block your IP address if it detects too many requests in a short period. To avoid this, implement a delay between requests using time.sleep().
2. CAPTCHA: Amazon uses CAPTCHAs to prevent bots from scraping their site. Selenium cannot solve CAPTCHAs, so you may need a service like 2Captcha to bypass them.
3. IP blocking: To prevent your IP from being blocked, consider using a proxy service like ScraperAPI or rotating IP addresses.
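
The delay mentioned in the first point can be sketched as follows: a randomized pause between requests, plus a retry wrapper that waits longer after each failure. Both helpers are hypothetical and only illustrate the idea; the default timings are assumptions, not values recommended by Amazon or by the original tutorial.

```python
import random
import time


def polite_sleep(base_seconds=3.0, jitter_seconds=2.0, sleep=time.sleep):
    """Pause for base_seconds plus a random extra of up to jitter_seconds."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)
    return delay


def fetch_with_retries(fetch, url, max_attempts=3, base_seconds=3.0, sleep=time.sleep):
    """Call fetch(url), doubling the wait after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:
            if attempt == max_attempts:
                raise  # Give up and surface the last error
            wait = base_seconds * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {wait:.0f}s")
            sleep(wait)
```

With the functions from the earlier steps, a call might look like fetch_with_retries(lambda u: get_products_from_page(u, driver), url), with polite_sleep() between pages. The sleep parameter exists so the timing can be replaced in tests.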

# Step 9: Scraping More Categories

In [8]:
url = "https://www.amazon.com/Best-Sellers-Books/zgbs/books/"
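
Rather than editing the URL by hand for each run, you could loop over several categories and tag each row with its category. This is a minimal sketch: scrape_categories and CATEGORY_URLS are hypothetical names, and scrape_page stands for any function that takes a URL and returns a list of product dictionaries.

```python
def scrape_categories(category_urls, scrape_page):
    """Run scrape_page(url) for each category and tag rows with the category name."""
    all_products = []
    for category, url in category_urls.items():
        for product in scrape_page(url):
            all_products.append({"category": category, **product})
    return all_products


# The two category URLs used in this tutorial
CATEGORY_URLS = {
    "kitchen": "https://www.amazon.com/Best-Sellers-Kitchen-Dining/zgbs/kitchen/",
    "books": "https://www.amazon.com/Best-Sellers-Books/zgbs/books/",
}
```

With the Selenium functions defined earlier, the call might be scrape_categories(CATEGORY_URLS, lambda url: get_products_from_page(url, driver)), and the combined list can be passed to save_to_csv as before.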

# Step 10: Using an Amazon Scraping API (Alternative Method)

In [10]:
import requests
import pandas as pd


def scrape_amazon_api():
    payload = {
        "source": "amazon_bestsellers",
        "domain": "com",
        "query": "284507",
        "render": "html",
        "start_page": 1,
        "parse": True,
    }
    # Replace USERNAME and PASSWORD with your API credentials
    response = requests.post(
        "https://realtime.oxylabs.io/v1/queries",
        json=payload,
        auth=("USERNAME", "PASSWORD"),
    )
    response.raise_for_status()  # Stop early on an HTTP error (e.g. bad credentials)
    data = response.json()
    # Process the data and save it to CSV
    products = data["results"][0]["content"]["results"]
    df = pd.DataFrame(products)
    df.to_csv("amazon_products_api.csv", index=False)


scrape_amazon_api()
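
The lookup data["results"][0]["content"]["results"] raises an error if any level is missing, for example when a request returns no parsed content. A more forgiving version might look like the sketch below. The helper extract_api_products is hypothetical, and the only thing assumed about the response shape is the key path used in the code above.

```python
def extract_api_products(data):
    """Return the product list from a parsed API response, or [] if it is missing."""
    results = data.get("results") or []
    if not results:
        return []
    content = results[0].get("content") or {}
    # If parsing failed, content may not be a dictionary (assumption)
    if not isinstance(content, dict):
        return []
    return content.get("results") or []
```

An empty list still produces a valid, empty DataFrame, so the CSV export step does not need a special case.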