In [1]:
!pip install playwright nest_asyncio
!playwright install chromium

Collecting playwright
  Downloading playwright-1.57.0-py3-none-manylinux1_x86_64.whl.metadata (3.5 kB)
Collecting pyee<14,>=13 (from playwright)
  Downloading pyee-13.0.0-py3-none-any.whl.metadata (2.9 kB)
Downloading playwright-1.57.0-py3-none-manylinux1_x86_64.whl (46.0 MB)
Downloading pyee-13.0.0-py3-none-any.whl (15 kB)
Installing collected packages: pyee, playwright
Successfully installed playwright-1.57.0 pyee-13.0.0
Downloading Chromium 143.0.7499.4 (playwright build v1200) from https://cdn.playwright.dev/dbazure/download/playwright/builds/chromium/1200/chromium-linux.zip

In [5]:
!apt-get install -y libatk1.0-0 libatk-bridge2.0-0 libatspi2.0-0 libxcomposite1


import asyncio, json, csv
from pathlib import Path
import nest_asyncio
nest_asyncio.apply()
from playwright.async_api import async_playwright



#  3. BASE URL
BASE_URL = "https://webscraper.io/test-sites/e-commerce/static/computers/laptops"

# 4. MAIN SCRAPING FUNCTION
async def scrape_ajax_site():

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)   # Launch browser
        ctx = await browser.new_context()                  # New browser session
        page = await ctx.new_page()                        # New tab

        rows = []                                         # Stores ALL products
        page_no = 1                                       #  Start from page 1

        # 5. PAGE NUMBER LOOP (runs until a page has no products)
        while True:
            url = f"{BASE_URL}?page={page_no}"             #  Build page URL
            print(f"Scraping Page {page_no} → {url}")
            await page.goto(url, timeout=60000)            #  Open that page
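            try:
                # Wait for product cards; a timeout here means the page has none
                await page.wait_for_selector(".thumbnail", timeout=10000)
            except Exception:                             # Avoid a bare except, which would also catch KeyboardInterrupt
                print(" No more pages left. Stopping.")
                break                                     #  Stop when no products found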
            cards = await page.query_selector_all(".thumbnail")

            if not cards:                                 # Safety stop
                print(" Last page reached.")
                break
            #  Extract products from the CURRENT page
            for card in cards:

                title_el = await card.query_selector(".title")
                title = (await title_el.text_content()).strip() if title_el else None

                # Named product_url so the page URL built above is not shadowed
                product_url = await title_el.get_attribute("href") if title_el else None

                price_el = await card.query_selector(".price")
                price = (await price_el.text_content()).strip() if price_el else None

                stars = await card.query_selector_all(".ratings .glyphicon-star")
                rating = len(stars)                        # An empty list gives 0

                img_el = await card.query_selector("img")
                img_src = await img_el.get_attribute("src") if img_el else None

                rows.append({
                    "title": title,
                    "price": price,
                    "rating_stars": rating,
                    "product_url": product_url,
                    "image_url": img_src,
                    "page_no": page_no
                })

            page_no += 1                                   # Go to next page number

        await browser.close()
        return rows

#  6. RUN SCRAPER

data = asyncio.get_event_loop().run_until_complete(scrape_ajax_site())
print(f" Collected {len(data)} total products")


# 7. SAVE OUTPUT FILES

Path("ioutput").mkdir(exist_ok=True)                       #  Create output folder

csv_path = Path("ioutput/products_all_ajax.csv")
json_path = Path("ioutput/products_all_ajax.json")

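# Save CSV (explicit field names, so an empty result does not raise IndexError on data[0])
fieldnames = ["title", "price", "rating_stars", "product_url", "image_url", "page_no"]
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)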

# Save JSON
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

print(f"Saved CSV → {csv_path}")
print(f"Saved JSON → {json_path}")


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  at-spi2-core gsettings-desktop-schemas libatk1.0-data libxtst6
  session-migration
The following NEW packages will be installed:
  at-spi2-core gsettings-desktop-schemas libatk-bridge2.0-0 libatk1.0-0
  libatk1.0-data libatspi2.0-0 libxcomposite1 libxtst6 session-migration
0 upgraded, 9 newly installed, 0 to remove and 1 not upgraded.
Need to get 318 kB of archives.
After this operation, 1,497 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libatspi2.0-0 amd64 2.44.0-3 [80.9 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 libxtst6 amd64 2:1.2.3-1build4 [13.4 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/main amd64 session-migration amd64 0.3.6 [9,774 B]
Get:4 http://archive.ubuntu.com/ubuntu jammy/main amd64 gsettings-desktop-schemas all 42.0-1ubuntu1 [31.1 kB]
Get:5 http:

**Observation**

1. The Playwright library was successfully configured in Google Colab by installing the required system dependencies (libatk, libxcomposite, etc.), which allowed Chromium to run in headless mode without graphical errors.

2. The scraper uses Playwright's async API and reuses a single browser page for every paginated URL, so Chromium is launched only once. Pages are still fetched one after another, not concurrently.

3. Pagination was handled with a `while` loop that requests `?page=N` for increasing `N` and stops when no product elements (`.thumbnail`) are found, so every available page is covered without hard-coding a page count.

4. For each page, the scraper extracted the following product details:

*   Product title
*   Price
*   Rating (number of stars)
*   Product URL
*   Image URL
*   Page number

5. Rows from every page were appended to a single list, so data from earlier pages was not overwritten, and the `page_no` field records which page each row came from.

6. The scraper successfully stored the extracted data into two structured formats:

*   CSV file for spreadsheet-based analysis
*   JSON file for programmatic and API-based usage


7. Exception handling around `wait_for_selector` (10-second timeout) prevented a runtime crash and let the scraper terminate cleanly once a page with no products was requested.

8. The overall run showed that Playwright can scrape a paginated website reliably in a controlled testing environment.
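
The stop condition described in observations 3 and 7 can be separated from the browser and exercised on its own. The following is a minimal sketch, not the notebook's scraper: `collect_pages`, `fake_fetch`, and the in-memory `FAKE_SITE` are hypothetical names standing in for the real page requests.

```python
from typing import Callable, Dict, List

def collect_pages(fetch_page: Callable[[int], List[Dict]],
                  max_pages: int = 100) -> List[Dict]:
    """Request pages 1, 2, 3, ... until one comes back empty or max_pages is hit."""
    rows: List[Dict] = []
    for page_no in range(1, max_pages + 1):
        items = fetch_page(page_no)
        if not items:              # empty page: same signal as a missing .thumbnail
            break
        for item in items:
            rows.append({**item, "page_no": page_no})
    return rows

# Hypothetical stand-in for the site: two pages with products, then nothing.
FAKE_SITE = {
    1: [{"title": "Laptop A"}, {"title": "Laptop B"}],
    2: [{"title": "Laptop C"}],
}

def fake_fetch(page_no: int) -> List[Dict]:
    return FAKE_SITE.get(page_no, [])

rows = collect_pages(fake_fetch)
print(len(rows), [r["page_no"] for r in rows])
```

The `max_pages` cap is an added safeguard (an assumption, not part of the original loop) for sites that never return an empty page.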

**Conclusion**

The web scraper was implemented using Playwright with asynchronous Python to extract product data from a paginated e-commerce test site. It navigated through every available page, located the product elements, and captured the title, price, rating, product URL, and image URL of each product.

Exporting the scraped data to both CSV and JSON keeps it compatible with data analysis tools, databases, and machine learning pipelines. The target here is the static variant of the test site, but the same browser-driven approach applies to JavaScript-rendered websites where traditional HTTP-based scraping tools may fail.
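
Before the exported rows are used for analysis, the price strings usually need to be converted to numbers. The sketch below is one way to do that, assuming prices look like `$295.99`; `parse_price` and the sample rows are hypothetical and not part of the scraper above.

```python
import re
from typing import Optional

def parse_price(text: Optional[str]) -> Optional[float]:
    """Convert a scraped price string such as "$295.99" to a float.

    Returns None when the input is empty or contains no number.
    """
    if not text:
        return None
    # Drop thousands separators, then take the first decimal number found.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None

# Made-up sample rows in the same shape as the scraper's output.
sample_rows = [
    {"title": "Example laptop", "price": "$295.99"},
    {"title": "No price listed", "price": None},
]
for row in sample_rows:
    row["price_value"] = parse_price(row["price"])
print(sample_rows)
```

Keeping the original `price` string alongside the parsed `price_value` makes it easy to spot rows where parsing failed.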

Overall, the project shows that asynchronous browser automation works well for this kind of data extraction task, and it provides a foundation for extending the scraper to additional product attributes, categories, or automated data processing workflows.
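
One possible processing step for such a workflow is to resolve product links to absolute URLs and drop repeated products. This is a sketch under stated assumptions: it assumes the scraped `href` values are relative to `https://webscraper.io`, and `normalise_rows` and the sample paths are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin

# Assumption: product hrefs are relative to this origin.
SITE_ROOT = "https://webscraper.io"

def normalise_rows(rows: List[Dict]) -> List[Dict]:
    """Resolve relative product URLs and drop rows whose URL was already seen."""
    seen = set()
    cleaned: List[Dict] = []
    for row in rows:
        href: Optional[str] = row.get("product_url")
        full_url = urljoin(SITE_ROOT, href) if href else None
        if full_url is not None and full_url in seen:
            continue                       # duplicate product, skip it
        if full_url is not None:
            seen.add(full_url)
        cleaned.append({**row, "product_url": full_url})
    return cleaned

# Made-up rows; the paths are placeholders, not real product pages.
sample = [
    {"title": "Laptop A", "product_url": "/example/product/1"},
    {"title": "Laptop A", "product_url": "/example/product/1"},
    {"title": "Laptop B", "product_url": None},
]
cleaned = normalise_rows(sample)
print(cleaned)
```

Rows without a URL are kept as they are, since there is no key to compare them on.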
