 <style>
    body {
      font-family: Arial, sans-serif;
      margin: 20px;
    }
    h1 {
      color: #333;
    }
    p {
      color: #666;
    }
    table {
      border-collapse: collapse;
      width: 100%;
    }
    th, td {
      border: 1px solid #ddd;
      padding: 8px;
      text-align: left;
    }
    th {
      background-color: #f2f2f2;
    }
    tr:nth-child(even) {
      background-color: #f2f2f2;
    }
  </style>
</head>
<body>

<h1>JavaScript-Enabled Web Scraping with Selenium</h1>
<p>Pages that build their content with JavaScript after the initial load cannot be scraped reliably with an HTML parser such as BeautifulSoup alone, because the HTML returned by the server does not yet contain the data. Selenium, a browser automation tool, fills this gap by driving a real browser that executes the page's scripts.</p>
<p>Selenium automates interactions with a web browser, such as clicking buttons, filling in forms, and scrolling through pages. Once the browser has rendered the page, the resulting HTML can be handed to a parser, which makes it possible to scrape pages that rely heavily on JavaScript to generate content.</p>

<h2>Example: Scraping AJIO Kidswear Page</h2>
<p>This example scrapes product data from an AJIO listing page for kidswear. The page loads more products with JavaScript as the user scrolls down, so the script scrolls to the bottom repeatedly until the page height stops growing.</p>

</body>

In [32]:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time

# Start Microsoft Edge. No driver path is passed here, so msedgedriver must be
# on PATH or be located automatically by a recent Selenium release.
driver = webdriver.Edge()

# URL of the infinitely scrolling website
url = "https://www.ajio.com/s/kidswear-5320-56041"

# Navigate to the URL
driver.get(url)

# Initialize last_height
last_height = 0

# Lists to store scraped data
brands = []
product_names = []
ratings = []
num_ratings = []
prices = []
original_prices = []
discounts = []
offer_prices = []

# Seconds to wait after each scroll so newly requested products can load;
# adjust this value based on the website's loading speed
SCROLL_PAUSE_TIME = 20

while True:
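    # Get the HTML as currently rendered. This is the whole page, so products
    # parsed on earlier passes are collected again on every pass; the repeated
    # rows are removed from the DataFrame afterwards with drop_duplicates().
    html = driver.page_source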

    # Parse the HTML using BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # Extracting data
    items = soup.find_all(class_='item')

    for item in items:
        brand_elem = item.find(class_='brand')
        product_name_elem = item.find(class_='nameCls')
        rating_elem = item.find(class_='_1N0OO')
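        parent_div = item.find(class_='_2QgMK _13cxi _1BsqP _2564l')  # Parent div with specified classes
        rating_paragraphs = parent_div.find_all('p') if parent_div else []
        # The second <p> holds the number of ratings (e.g. "| 508"); use "N/A" when it is missing
        num_ratings_elem = rating_paragraphs[1].text.strip() if len(rating_paragraphs) > 1 else "N/A"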
        price_elem = item.find(class_='price')
        original_price_elem = item.find(class_='orginal-price')
        discount_elem = item.find(class_='discount')
        offer_price_elem = item.find(class_='offer-pricess')

        # Append data to lists
        brands.append(brand_elem.text if brand_elem else "N/A")
        product_names.append(product_name_elem.text if product_name_elem else "N/A")
        ratings.append(rating_elem.text if rating_elem else "N/A")
        num_ratings.append(num_ratings_elem)
        prices.append(price_elem.text if price_elem else "N/A")
        original_prices.append(original_price_elem.text if original_price_elem else "N/A")
        discounts.append(discount_elem.text if discount_elem else "N/A")
        offer_prices.append(offer_price_elem.text if offer_price_elem else "N/A")

    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # If no more content is loaded, exit the loop
        break
    last_height = new_height

# Close the webdriver
driver.quit()

# Create DataFrame
df = pd.DataFrame({
    'Brand': brands,
    'Product Name': product_names,
    'Rating': ratings,
    'Number of Ratings': num_ratings,
    'Price': prices,
    'Original Price': original_prices,
    'Discount': discounts,
    'Offer Price': offer_prices
})

# Display the DataFrame
print(df)


           Brand                                  Product Name Rating  \
0         ALANTA         Typographic Print Pants & T-Shirt Set    3.7   
1           Lofn  Pack Of 3 Graphic Print T-Shirt & Shorts Set    2.8   
2     TIGERTRAIL             Lightly Washed Straight Fit Jeans    3.8   
3       Jashvila                   Boys Novelty Print Suit Set    1.6   
4        Hellcat          Pack of 5 Printed Round-Neck T-Shirt    3.8   
...          ...                                           ...    ...   
9085   RIO GIRLS                    Minnie Mouse Print T-Shirt    4.4   
9086   RIO GIRLS                   Mickey Mouse Print Leggings    4.2   
9087    Gap Kids                Brand Embroidered Polo T-Shirt      5   
9088   KG FRENDZ               Knit Shorts with Insert Pockets    4.1   
9089  NAUTI NATI                     Printed Shirt with Shorts    4.2   

     Number of Ratings Price Original Price    Discount Offer Price  
0              |   508  ₹578         ₹1,699   (66% of

In [34]:
df

Unnamed: 0,Brand,Product Name,Rating,Number of Ratings,Price,Original Price,Discount,Offer Price
0,ALANTA,Typographic Print Pants & T-Shirt Set,3.7,| 508,₹578,"₹1,699",(66% off),₹510
1,Lofn,Pack Of 3 Graphic Print T-Shirt & Shorts Set,2.8,,₹481,"₹1,299",(63% off),₹416
2,TIGERTRAIL,Lightly Washed Straight Fit Jeans,3.8,| 75,₹195,₹649,(70% off),
3,Jashvila,Boys Novelty Print Suit Set,1.6,,₹461,"₹1,299",(65% off),₹390
4,Hellcat,Pack of 5 Printed Round-Neck T-Shirt,3.8,| 503,₹695,"₹6,495",(89% off),
...,...,...,...,...,...,...,...,...
9085,RIO GIRLS,Minnie Mouse Print T-Shirt,4.4,| 16,₹203,₹399,(49% off),₹200
9086,RIO GIRLS,Mickey Mouse Print Leggings,4.2,| 952,₹120,₹399,(70% off),
9087,Gap Kids,Brand Embroidered Polo T-Shirt,5,| 2,₹650,"₹1,299",(50% off),₹624
9088,KG FRENDZ,Knit Shorts with Insert Pockets,4.1,| 791,₹168,₹349,(52% off),₹140


In [35]:
df.duplicated().sum()

7845
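
A count this high is what one would expect from the scraping loop: each pass parses the full page source, so every product already on the page is appended again after each scroll. One way to avoid collecting repeats in the first place is to remember which rows have been seen. The sketch below illustrates the idea on plain dictionaries and is not the notebook's code; the helper name `add_new_items` and the use of the full row as the identity key are assumptions made for this example.

```python
def add_new_items(parsed_items, seen, collected):
    """Append only records not seen before; return how many were added.

    parsed_items: list of dicts parsed from the current page source
    seen:         set of row keys collected on earlier passes
    collected:    list that accumulates the unique records
    """
    added = 0
    for record in parsed_items:
        # Use the full row as the key, mirroring DataFrame.drop_duplicates(),
        # which compares all columns by default.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            collected.append(record)
            added += 1
    return added


# Two simulated passes: the second pass re-parses the first product
# and also finds one product that was loaded by scrolling.
seen, collected = set(), []
first_pass = [{"Brand": "ALANTA", "Price": "₹578"}]
second_pass = [
    {"Brand": "ALANTA", "Price": "₹578"},
    {"Brand": "Lofn", "Price": "₹481"},
]
added_first = add_new_items(first_pass, seen, collected)
added_second = add_new_items(second_pass, seen, collected)
print(added_first, added_second, len(collected))
```

As with `drop_duplicates()`, two genuinely different products whose scraped fields are all identical would be merged into one row.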

In [38]:
df = df.drop_duplicates()

In [39]:
df

Unnamed: 0,Brand,Product Name,Rating,Number of Ratings,Price,Original Price,Discount,Offer Price
0,ALANTA,Typographic Print Pants & T-Shirt Set,3.7,| 508,₹578,"₹1,699",(66% off),₹510
1,Lofn,Pack Of 3 Graphic Print T-Shirt & Shorts Set,2.8,,₹481,"₹1,299",(63% off),₹416
2,TIGERTRAIL,Lightly Washed Straight Fit Jeans,3.8,| 75,₹195,₹649,(70% off),
3,Jashvila,Boys Novelty Print Suit Set,1.6,,₹461,"₹1,299",(65% off),₹390
4,Hellcat,Pack of 5 Printed Round-Neck T-Shirt,3.8,| 503,₹695,"₹6,495",(89% off),
...,...,...,...,...,...,...,...,...
9085,RIO GIRLS,Minnie Mouse Print T-Shirt,4.4,| 16,₹203,₹399,(49% off),₹200
9086,RIO GIRLS,Mickey Mouse Print Leggings,4.2,| 952,₹120,₹399,(70% off),
9087,Gap Kids,Brand Embroidered Polo T-Shirt,5,| 2,₹650,"₹1,299",(50% off),₹624
9088,KG FRENDZ,Knit Shorts with Insert Pockets,4.1,| 791,₹168,₹349,(52% off),₹140
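
The scraped columns are still strings such as `₹1,699`, `(66% off)`, and `| 508`, so they cannot be sorted or aggregated numerically yet. The following is a minimal sketch of how they might be converted, using only the standard library. The function names are hypothetical, and the parsing assumes whole-rupee prices in the formats shown in the output above; missing or `N/A` values are returned as `None`.

```python
import re


def parse_rupees(text):
    """Convert a price such as '₹1,699' to an int; return None if it has no digits.

    Assumes whole-rupee amounts, as in the scraped output.
    """
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def parse_discount(text):
    """Extract the percentage from a string such as '(66% off)'."""
    match = re.search(r"(\d+)\s*%", text or "")
    return int(match.group(1)) if match else None


def parse_rating_count(text):
    """Extract the count from a string such as '| 508'."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group(0).replace(",", "")) if match else None


# First row of the scraped output, cleaned field by field.
row = {
    "Number of Ratings": "| 508",
    "Price": "₹578",
    "Original Price": "₹1,699",
    "Discount": "(66% off)",
    "Offer Price": "₹510",
}
cleaned = {
    "Number of Ratings": parse_rating_count(row["Number of Ratings"]),
    "Price": parse_rupees(row["Price"]),
    "Original Price": parse_rupees(row["Original Price"]),
    "Discount": parse_discount(row["Discount"]),
    "Offer Price": parse_rupees(row["Offer Price"]),
}
print(cleaned)
```

On the DataFrame itself, the same functions could be applied per column, for example with `df['Price'].map(parse_rupees)`.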
