Write a Python program to scrape all available books from Books to Scrape (https://books.toscrape.com/), a live site built for practicing web scraping (safe, legal, no anti-bot measures). For each book, extract the following details:

1. Title
2. Price
3. Availability (In stock / Out of stock)
4. Star Rating (One, Two, Three, Four, Five)

Store the scraped results into a Pandas DataFrame and export them to a CSV file named books.csv.

In [2]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [3]:
BASE_URL = "http://books.toscrape.com/catalogue/"
TOTAL_PAGES = 50

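def scrape_page(page_no):
    """Return a list of book dicts scraped from one catalogue page."""
    books_list = []
    url = f"{BASE_URL}page-{page_no}.html"
    res = requests.get(url, timeout=10)
    # Decode as UTF-8 explicitly; otherwise "£" comes out as "Â£".
    res.encoding = "utf-8"
    soup = BeautifulSoup(res.text, "html.parser")

    # Each book on the page is wrapped in <article class="product_pod">.
    items = soup.find_all("article", class_="product_pod")
    if not items:
        return []

    for item in items:
        # The full title is in the link's "title" attribute; the link text is truncated.
        title = item.h3.a["title"] if item.h3 and item.h3.a else None

        p = item.find("p", class_="price_color")
        price = p.get_text(strip=True) if p else None

        # Match on the single "availability" class rather than the exact
        # "instock availability" string, which breaks if the class order changes.
        stock = item.find("p", class_="availability")
        avail = stock.get_text(strip=True) if stock else None

        # The rating is the second class name, e.g. class="star-rating Three".
        rating_tag = item.find("p", class_="star-rating")
        rating = rating_tag["class"][1] if rating_tag and len(rating_tag["class"]) > 1 else None

        # Keep missing values as None so pandas treats them as missing
        # instead of storing the literal string "NaN".
        books_list.append({
            "Title": title,
            "Price": price,
            "Availability": avail,
            "Star Rating": rating
        })
    return books_list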


In [6]:
all_data = []
for page in range(1, TOTAL_PAGES + 1):
    all_data.extend(scrape_page(page))

df = pd.DataFrame(all_data)

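# Export under the file name the task asks for; index=False avoids an
# extra "Unnamed: 0" column when the CSV is read back.
if not df.empty:
    df.to_csv("books.csv", index=False, encoding="utf-8")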

df

Unnamed: 0,Title,Price,Availability,Star Rating
0,A Light in the Attic,£51.77,In stock,Three
1,Tipping the Velvet,£53.74,In stock,One
2,Soumission,£50.10,In stock,One
3,Sharp Objects,£47.82,In stock,Four
4,Sapiens: A Brief History of Humankind,£54.23,In stock,Five
...,...,...,...,...
995,Alice in Wonderland (Alice's Adventures in Won...,£55.53,In stock,One
996,"Ajin: Demi-Human, Volume 1 (Ajin: Demi-Human #1)",£57.06,In stock,Four
997,A Spy's Devotion (The Regency Spies of London #1),£16.97,In stock,Five
998,1st to Die (Women's Murder Club #1),£53.98,In stock,One
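
The Price and Star Rating values are scraped as text (for example `£51.77` and `Three`), which makes sorting or averaging awkward. Below is a minimal sketch of converting them to numbers before building the DataFrame. The names `RATING_WORDS`, `clean_price`, and `clean_row` are hypothetical helpers introduced here for illustration, not part of the original solution, and the sample rows are copied from the output above.

```python
import re

# Hypothetical helpers for illustration; not part of the original solution.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def clean_price(text):
    """Return the numeric part of a price string, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", str(text))
    return float(match.group()) if match else None


def clean_row(row):
    """Return a copy of one scraped row with a numeric price and rating."""
    cleaned = dict(row)
    cleaned["Price"] = clean_price(row.get("Price"))
    cleaned["Star Rating"] = RATING_WORDS.get(row.get("Star Rating"))
    return cleaned


# Sample rows shaped like the dictionaries built in scrape_page.
# The second price carries a mis-decoded currency symbol on purpose:
# the regex ignores everything that is not part of the number.
sample_rows = [
    {"Title": "A Light in the Attic", "Price": "£51.77",
     "Availability": "In stock", "Star Rating": "Three"},
    {"Title": "Tipping the Velvet", "Price": "Â£53.74",
     "Availability": "In stock", "Star Rating": "One"},
]
cleaned_rows = [clean_row(row) for row in sample_rows]
print(cleaned_rows[0]["Price"], cleaned_rows[0]["Star Rating"])
```

The cleaned rows could then be passed to `pd.DataFrame` in place of `all_data`; whether to keep the original text columns alongside the numeric ones is a design choice left open here.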


Write a Python program to scrape the IMDB Top 250 Movies list (https://www.imdb.com/chart/top/). For each movie, extract the following details:

1. Rank (1–250)
2. Movie Title
3. Year of Release
4. IMDB Rating

Store the results in a Pandas DataFrame and export it to a CSV file named imdb_top250.csv.

In [10]:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36")

driver = webdriver.Chrome(options=options)
movies_data = []

# try/finally guarantees the browser is closed even if a selector fails.
try:
    driver.get("https://www.imdb.com/chart/top/")

    # Wait up to 20 seconds for the chart list to appear in the DOM.
    wait = WebDriverWait(driver, 20)
    list_container = wait.until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "ul.ipc-metadata-list"))
    )
    movies = list_container.find_elements(By.TAG_NAME, "li")

    for movie_item in movies:
        # The heading text has the form "<rank>. <title>".
        title_text = movie_item.find_element(By.CSS_SELECTOR, "h3.ipc-title__text").text
        rank_str, title = title_text.split(". ", 1)

        # The first metadata item is the release year.
        metadata_items = movie_item.find_elements(By.CSS_SELECTOR, "span.cli-title-metadata-item")
        year_str = metadata_items[0].text

        # The rating element also contains the vote count on a second line.
        rating_str = movie_item.find_element(By.CSS_SELECTOR, "span.ipc-rating-star").text.split("\n")[0]

        movies_data.append({
            "Rank": int(rank_str),
            "Movie Title": title,
            "Year of Release": int(year_str),
            "IMDB Rating": float(rating_str)
        })
finally:
    driver.quit()

df = pd.DataFrame(movies_data)
df = df.sort_values(by="Rank").reset_index(drop=True)
df.to_csv("imdb_top250.csv", index=False, encoding='utf-8')
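
The loop above relies on every heading having the form `<rank>. <title>`; a heading without that prefix makes `split(". ", 1)` raise a `ValueError` and aborts the run. Below is a sketch of a stricter parser that returns `None` for non-matching text so the caller can skip the item. `split_rank_title` is a hypothetical helper, and the heading format is inferred from the code above rather than verified against the current IMDb markup.

```python
import re

# One or more digits, a period, whitespace, then the title.
_RANK_TITLE = re.compile(r"^\s*(\d+)\.\s+(.+?)\s*$")


def split_rank_title(text):
    """Return (rank, title) for text like '1. The Godfather', else None."""
    match = _RANK_TITLE.match(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Titles that themselves contain a period or start with digits.
print(split_rank_title("250. Dr. Strangelove"))
print(split_rank_title("2001: A Space Odyssey"))
```

Inside the scraping loop, a `None` result could be handled with `continue`, so that one malformed list item does not discard the rows already collected.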

Write a Python program to scrape the weather information for top world cities from the given website (https://www.timeanddate.com/weather/). For each city, extract the following details:

1. City Name
2. Temperature
3. Weather Condition (e.g., Clear, Cloudy, Rainy, etc.)

Store the results in a Pandas DataFrame and export it to a CSV file named weather.csv.

In [11]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

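def parse_temp_to_float(temp_str):
    """Convert a string such as '23 °C' to a float, dropping the unit."""
    # Remove the unit and any non-breaking spaces
    # ("\u00a0" and "\xa0" are the same character, so one replace is enough).
    t = temp_str.replace("°C", "").replace("°F", "").replace("\xa0", "")
    return float(t.strip())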

url = "https://www.timeanddate.com/weather/"
res = requests.get(url, timeout=10)
res.raise_for_status()  # Stop early on an HTTP error instead of parsing an error page.
soup = BeautifulSoup(res.text, "html.parser")

cells = soup.find_all("td")
data = []
city_name, condition = None, ""

for idx, cell in enumerate(cells):
    if cell.find("a"):
        city_name = cell.get_text(strip=True)
        condition = ""

        for step in range(1, 3):
            if idx + step < len(cells):
                img_tag = cells[idx + step].find("img")
                if img_tag and img_tag.get("alt"):
                    condition = img_tag["alt"]
                    break

    elif "rbi" in cell.get("class", []) and city_name:
        temp_val = cell.get_text(strip=True)
        try:
            temp_float = parse_temp_to_float(temp_val)
        except ValueError:
            # Leave the temperature empty when the cell is not numeric.
            temp_float = None

        # Column names follow the field names listed in the task.
        data.append({
            "City Name": city_name,
            "Temperature": temp_float,
            "Weather Condition": condition
        })
        city_name, condition = None, ""

df_weather = pd.DataFrame(data)
df_weather.to_csv("weather.csv", index=False)
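
`parse_temp_to_float` discards the unit, so the CSV no longer records whether a value was in Celsius or Fahrenheit. Below is a sketch of a variant that keeps the unit. `parse_temperature` is a hypothetical helper, and the input formats it accepts (an optional sign, a number, optional whitespace, then `°C` or `°F`) are an assumption based on the strings handled above, not a verified description of the site's markup.

```python
import re

# Optional sign, a number, optional whitespace, then °C or °F.
# In Python 3 string patterns, \s also matches the non-breaking space.
_TEMP_PATTERN = re.compile(r"([+-]?\d+(?:\.\d+)?)\s*°\s*([CF])")


def parse_temperature(text):
    """Return (value, unit) for strings like '23 °C', or None if unparsable."""
    # Normalise the Unicode minus sign to an ASCII hyphen before matching.
    normalized = text.replace("\u2212", "-")
    match = _TEMP_PATTERN.search(normalized)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


print(parse_temperature("23\xa0°C"))
print(parse_temperature("N/A"))
```

The unit could be stored in its own column next to the temperature, which keeps the numeric column usable for comparisons only when all rows share a unit.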
