## Scraping data from multiple pages

Scraping data from multiple pages involves iterating through each page's URL and extracting the desired information.

### 🧠 Basic Steps
1. Inspect the pagination pattern on the website.
2. Use a loop to go through each page.
3. Parse the HTML using BeautifulSoup.
4. Extract the data from each page.
5. Store or process the data.
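
The first two steps amount to producing one URL per page number. The sketch below is one possible way to do that, assuming the site paginates through a query parameter. The helper name `page_urls` and the parameter name `page` are illustrative assumptions, not something a particular site guarantees.

```python
from urllib.parse import urlencode


def page_urls(base, first, last, param="page"):
    """Yield one URL per page number, from first to last inclusive.

    Assumes `base` has no query string of its own (hypothetical helper).
    """
    for number in range(first, last + 1):
        # urlencode writes the full "key=value" pair, so the "=" cannot be dropped
        yield f"{base}?{urlencode({param: number})}"


urls = list(page_urls("https://example.com/products", 1, 3))
for url in urls:
    print(url)
```

Building the query string with `urlencode` rather than by hand helps avoid malformed URLs such as a missing `=` after the parameter name.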

In [1]:
import requests
from bs4 import BeautifulSoup

base_url = 'https://example.com/products?page='
all_data = []

# Define how many pages to scrape
for page in range(1, 6):  # Scrape pages 1 to 5
    url = base_url + str(page)
    print(f"Scraping: {url}")
    
    response = requests.get(url, timeout=10)  # do not wait indefinitely for a response
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Example: Extract product names from <h2 class="product-title">
    products = soup.find_all('h2', class_='product-title')
    
    for product in products:
        title = product.text.strip()
        all_data.append(title)

print("Scraped Products:")
for item in all_data:
    print(item)


Scraping: https://example.com/products?page=1
Scraping: https://example.com/products?page=2
Scraping: https://example.com/products?page=3
Scraping: https://example.com/products?page=4
Scraping: https://example.com/products?page=5
Scraped Products:

No titles are listed because `example.com` is a reserved placeholder domain with no product pages; replace `base_url` and the `h2` / `product-title` selector with those of a real site.


## Example: Stop When No More Items Are Found

When the number of pages is not known in advance, keep requesting pages until one comes back with no matching items.

In [2]:
import requests
from bs4 import BeautifulSoup

base_url = 'https://example.com/products?page='
all_data = []
page = 1

while True:
    url = base_url + str(page)
    print(f"Scraping: {url}")
    
    response = requests.get(url, timeout=10)  # do not wait indefinitely for a response
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all product entries
    products = soup.find_all('h2', class_='product-title')
    
    if not products:
        print("No more products found. Stopping.")
        break  # Exit the loop if no products found
    
    for product in products:
        title = product.text.strip()
        all_data.append(title)
    
    page += 1  # Move to next page

print(f"\nTotal products scraped: {len(all_data)}")


Scraping: https://example.com/products?page=1
No more products found. Stopping.

Total products scraped: 0


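### Other stopping conditions

Inside the `while` loop, other checks can end the scrape.

# Stop when the page has no "next" link
next_button = soup.find('a', class_='next')
if not next_button:
    break


# Stop when this page's items match the previous page's items
# (some sites serve the last page again for out-of-range page numbers);
# page_data and previous_data hold the items from this page and the one before
if page_data == previous_data:
    break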
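
The empty-page and repeated-page checks can be tried without any network access. The sketch below is a minimal illustration under stated assumptions: `scrape_all`, `fake_fetch`, and `fake_pages` are hypothetical names, and `fetch_items` stands in for "request the page and parse it". A `max_pages` cap is added as a safeguard so the loop always ends.

```python
def scrape_all(fetch_items, max_pages=100):
    """Collect items page by page until a stopping condition is met.

    fetch_items(page) is a stand-in for requesting and parsing one page;
    it returns the list of items found on that page.
    """
    collected = []
    previous = None
    for page in range(1, max_pages + 1):  # hard cap guards against endless loops
        items = fetch_items(page)
        if not items:  # condition 1: the page is empty
            break
        if items == previous:  # condition 2: the site repeated the last page
            break
        collected.extend(items)
        previous = items
    return collected


# Fake site: pages 1 and 2 have data, every later page repeats page 2
fake_pages = {1: ["A", "B"], 2: ["C"]}


def fake_fetch(page):
    return fake_pages.get(page, fake_pages[2])


result = scrape_all(fake_fetch)
print(result)
```

Passing the fetch step in as a function keeps the loop logic separate from the HTTP and parsing code, so each stopping rule can be checked in isolation.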



In [5]:
import requests
from bs4 import BeautifulSoup

# Base URL for a Flipkart search results page (query: "mobile")
base_url = "https://www.flipkart.com/search?q=mobile&page="
page = 1
all_products = []

while True:
    # Construct the URL for the current page (e.g., ...&page=1, ...&page=2, etc.)
    url = base_url + str(page)
    print(f"Scraping: {url}")

    response = requests.get(url, timeout=10)  # do not wait indefinitely for a response
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the container for each product on the page
    # (change "div" and "_5M58Mb" to the real tag and class used on the website)
    products = soup.find_all("div", class_="_5M58Mb")

    # If no products are found, stop the loop
    if not products:
        print("No more products found. Stopping.")
        break

    for product in products:
        # Extract the product name ("product_name" is a placeholder class; change tag and class as needed)
        name_tag = product.find("a", class_="product_name")
        product_name = name_tag.text.strip() if name_tag else "No Name Found"

        # Extract a second field (the "nutrition" span is a placeholder;
        # change the tag, class, and key to a field the site actually exposes)
        nutrition_tag = product.find("span", class_="nutrition")
        nutrition_info = nutrition_tag.text.strip() if nutrition_tag else "No nutrition info"

        # Save the product's data
        all_products.append({
            "name": product_name,
            "nutrition": nutrition_info
        })

    # Move to the next page
    page += 1

# Print all the products collected
print("Scraped Products:")
for p in all_products:
    print(p)


Scraping: https://www.flipkart.com/search?q=mobile&page=1
No more products found. Stopping.
Scraped Products:
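
The last of the basic steps, storing the data, is not shown in the cells above. The sketch below is one possible approach using the standard `csv` module. The sample `rows` are made-up stand-ins with the same keys as the dictionaries in `all_products`, and `to_csv` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the same shape as the dictionaries stored in all_products
rows = [
    {"name": "Sample item A", "nutrition": "No nutrition info"},
    {"name": "Sample item B", "nutrition": "No nutrition info"},
]


def to_csv(records, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, ["name", "nutrition"])
print(csv_text)
```

To write to disk instead, the same `DictWriter` calls work on a file opened with `open("products.csv", "w", newline="", encoding="utf-8")`; the file name here is only an example.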
