### Task Description:
    Perform web scraping on a provided HTML page that contains different types of elements.
    The goal is to extract specific data from the page and save it in structured formats
    such as CSV or JSON.

In [1]:
import requests
from bs4 import BeautifulSoup
import csv
import json

In [3]:
url = "https://www.baraasallout.com/test.html"
print('URL of the webpage to scrape is  : ' , url)

URL of the webpage to scrape is  :  https://www.baraasallout.com/test.html


In [5]:
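# Fetch the webpage; time out on a stalled connection and raise on HTTP error statuses
response = requests.get(url, timeout=10)
response.raise_for_status()
response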

<Response [200]>

In [6]:
soup = BeautifulSoup(response.content, 'html.parser')
soup

<!DOCTYPE html>

<html>
<head>
<title>Web Scraping Task with Form</title>
<style>
        body {
            font-family: Arial, sans-serif;
            margin: 20px;
            background-color: #f5f5f5;
        }

        h1 {
            color: darkred;
            text-align: center;
        }

        h2 {
            color: darkblue;
            font-style: italic;
        }

        p {
            color: #555;
            font-size: 14px;
        }

        img {
            width: 250px;
            height: auto;
            border-radius: 10px;
        }

        table {
            width: 100%;
            border-collapse: collapse;
            margin: 20px 0;
        }

        table, th, td {
            border: 1px solid #ccc;
        }

        th {
            background-color: #333;
            color: white;
            padding: 10px;
        }

        td {
            text-align: center;
            padding: 10px;
        }

        .btn {
            background-col

     1. Extract Text Data:
     ● Extract all headings (<h1>, <h2>).
     ● Extract all text content inside <p> and <li> tags.
     ● Save this data into an Extract_Text_Data.csv file.
https://www.pythontutorial.net/python-basics/python-write-csv-file

In [8]:
# 1. Extract Text Data
headings = [tag.text for tag in soup.find_all(['h1', 'h2'])]
headings

['Web Scraping Practice',
 'Available Products',
 'Product Table',
 'Watch This Video',
 'Contact Us',
 'Product Information',
 'Featured Products']

In [9]:
paragraphs = [tag.text for tag in soup.find_all('p')]
paragraphs

['Welcome to the web scraping task! Use your skills to extract the required data from this page.',
 'Sharp Objects',
 '£47.82',
 '✔ In stock',
 'In a Dark, Dark Wood',
 '£19.63',
 '✔ In stock',
 'The Past Never Ends',
 '£56.50',
 '✔ In stock',
 'A Murder in Time',
 '£16.64',
 ' Out stock',
 'Wireless Headphones',
 '$49.99',
 'Available colors: Black, White, Blue',
 'Smart Speaker',
 '$89.99',
 'Available colors: Grey, Black',
 'Smart Watch',
 '$149.99',
 'Available colors: Black, Silver, Gold',
 '© 2024 Web Scraping Practice. All Rights Reserved.']

In [10]:
list_items = [tag.text for tag in soup.find_all('li')]
list_items

['Laptop', 'Smartphone', 'Tablet', 'Smartwatch']

In [11]:
with open('Extract_Text_Data.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Type', 'Content'])
    for heading in headings:
        writer.writerow(['Heading', heading])
    for paragraph in paragraphs:
        writer.writerow(['Paragraph', paragraph])
    for item in list_items:
        writer.writerow(['List Item', item])
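
One way to confirm the file has the intended layout is to read it back with `csv.DictReader` and count the rows per type. The sketch below is self-contained: it writes three sample rows (taken from the outputs above) to an in-memory buffer rather than to disk, and `count_by_type` is a hypothetical helper name, not part of the original notebook.

```python
import csv
import io
from collections import Counter


def count_by_type(csv_text):
    """Count rows per value of the 'Type' column in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row['Type'] for row in reader)


# Build a small sample in memory with the same columns as Extract_Text_Data.csv.
buffer = io.StringIO(newline='')
sample_writer = csv.writer(buffer)
sample_writer.writerow(['Type', 'Content'])
sample_writer.writerow(['Heading', 'Web Scraping Practice'])
sample_writer.writerow(['Paragraph', 'Available colors: Black, White, Blue'])
sample_writer.writerow(['List Item', 'Laptop'])

sample_text = buffer.getvalue()
counts = count_by_type(sample_text)
print(dict(counts))
```

Note that the paragraph containing commas is quoted automatically by `csv.writer`, so it still reads back as a single `Content` value.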

    2. Extract Table Data:
     ● Extract data from the table, including:
     ● Product Name.
     ● Price.
     ● Stock Status.
     ● Save this data into an Extract_Table_Data.csv file.
https://www.pythontutorial.net/python-basics/python-write-csv-file/

In [12]:
# 2. Extract Table Data
table = soup.find('table')
table

<table>
<tr>
<th>Product</th>
<th>Price</th>
<th>In Stock</th>
</tr>
<tr>
<td>Laptop</td>
<td>$1000</td>
<td>Yes</td>
</tr>
<tr>
<td>Smartphone</td>
<td>$800</td>
<td>No</td>
</tr>
<tr>
<td>Tablet</td>
<td>$500</td>
<td>Yes</td>
</tr>
</table>

In [13]:
rows = table.find_all('tr')
rows

[<tr>
 <th>Product</th>
 <th>Price</th>
 <th>In Stock</th>
 </tr>,
 <tr>
 <td>Laptop</td>
 <td>$1000</td>
 <td>Yes</td>
 </tr>,
 <tr>
 <td>Smartphone</td>
 <td>$800</td>
 <td>No</td>
 </tr>,
 <tr>
 <td>Tablet</td>
 <td>$500</td>
 <td>Yes</td>
 </tr>]

In [15]:
with open('Extract_Table_Data.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Product Name', 'Price', 'Stock Status'])
    for row in rows[1:]:  # Skip header row
        cols = row.find_all('td')
        writer.writerow([col.text.strip() for col in cols])
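
As a possible variation, the column names can be read from the `<th>` cells instead of being hard-coded, which gives one dictionary per product row. This is a minimal sketch that parses the table markup shown in the output above from an inline string; the names `sample_table`, `header` and `records` are illustrative only.

```python
from bs4 import BeautifulSoup

# Table markup copied from the notebook output above.
html = """
<table>
<tr><th>Product</th><th>Price</th><th>In Stock</th></tr>
<tr><td>Laptop</td><td>$1000</td><td>Yes</td></tr>
<tr><td>Smartphone</td><td>$800</td><td>No</td></tr>
<tr><td>Tablet</td><td>$500</td><td>Yes</td></tr>
</table>
"""

sample_table = BeautifulSoup(html, 'html.parser').find('table')

# Read column names from the header cells rather than hard-coding them.
header = [th.get_text(strip=True) for th in sample_table.find_all('th')]

records = []
for row in sample_table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    if cells:  # the header row has no <td> cells, so it is skipped here
        records.append(dict(zip(header, cells)))

print(records)
```

A list of dictionaries like this could be passed to `csv.DictWriter` or to `json.dump` without further reshaping.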

     3. Extract Product Information (Cards Section):
     ● Extract data from the book cards at the bottom of the page, including:
     ● Book Title.
     ● Price.
     ● Stock Availability.
     ● Button text (e.g., "Add to basket").
     ● Save the data into a Product_Information.json file.
https://www.geeksforgeeks.org/how-to-convert-python-dictionary-to-json/

In [21]:
# 3. Extract Product Information
cards = soup.find_all('div', class_='card')

In [36]:
product_data = []
for card in cards:
    title = card.find('h3').text
    price = card.find('span', class_='price').text
    availability = card.find('span', class_='availability').text
    button_text = card.find('button').text

    product_data.append({
        'Book Title': title,
        'Price': price,
        'Stock Availability': availability,
        'Button Text': button_text})
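
The loop above raises `AttributeError` if any card lacks one of the expected tags, because `find` returns `None` in that case. A more defensive version might look like the sketch below. The card markup here is hypothetical (the real structure is not shown in the notebook, and the earlier `<p>` output suggests titles and prices may sit in `<p>` tags instead), and `text_or_none` is an illustrative helper name.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, **kwargs):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, **kwargs)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical markup: tag names mirror what the cell above looks for.
# The availability span is left out on purpose to show the None case.
html = """
<div class="card">
  <h3>Sharp Objects</h3>
  <span class="price">£47.82</span>
  <button>Add to basket</button>
</div>
"""

sample_card = BeautifulSoup(html, 'html.parser').find('div', class_='card')
record = {
    'Book Title': text_or_none(sample_card, 'h3'),
    'Price': text_or_none(sample_card, 'span', class_='price'),
    'Stock Availability': text_or_none(sample_card, 'span', class_='availability'),
    'Button Text': text_or_none(sample_card, 'button'),
}
print(record)
```

Missing fields come out as `None`, which `json.dump` writes as `null`, so an incomplete card no longer stops the whole run.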

In [23]:
with open('Product_Information.json', 'w', encoding='utf-8') as file:
    json.dump(product_data, file, ensure_ascii=False, indent=4)
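
The `ensure_ascii=False` argument matters here because the prices contain the `£` sign. The short sketch below compares the two settings on a sample record built from values shown earlier; both forms are valid JSON and decode to the same object, the difference is only in how readable the file is.

```python
import json

sample_record = {'Book Title': 'Sharp Objects', 'Price': '£47.82'}

escaped = json.dumps(sample_record)                       # default: ensure_ascii=True
readable = json.dumps(sample_record, ensure_ascii=False)  # keeps the £ character as-is

print(escaped)
print(readable)

# Both strings decode back to the same Python dictionary.
same_after_decoding = json.loads(escaped) == json.loads(readable)
print(same_after_decoding)
```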

     4. Extract Form Details:
     ● Extract all input fields from the form, including:
     ● Field name (e.g., username, password).
     ● Input type (e.g., text, password, checkbox, etc.).
     ● Default values, if any.
     ● Save the data into a JSON file.
https://www.geeksforgeeks.org/how-to-convert-python-dictionary-to-json

In [24]:
# 4. Extract Form Details
form = soup.find('form')
form

<form>
<label for="username">Username:</label>
<input id="username" name="username" placeholder="Enter your username" type="text"/>
<label for="password">Password:</label>
<input id="password" name="password" placeholder="Enter your password" type="password"/>
<label for="options">Choose an option:</label>
<select id="options" name="options">
<option value="option1">Option 1</option>
<option value="option2">Option 2</option>
<option value="option3">Option 3</option>
</select>
<label>
<input name="terms" type="checkbox"/> I agree to the terms and conditions
            </label>
<input type="submit" value="Submit"/>
</form>

In [25]:
inputs = form.find_all('input')
inputs

[<input id="username" name="username" placeholder="Enter your username" type="text"/>,
 <input id="password" name="password" placeholder="Enter your password" type="password"/>,
 <input name="terms" type="checkbox"/>,
 <input type="submit" value="Submit"/>]

In [26]:
form_data = []
for input_tag in inputs:
    field_name = input_tag.get('name')
    input_type = input_tag.get('type')
    default_value = input_tag.get('value', '')

    form_data.append({
        'Field Name': field_name,
        'Input Type': input_type,
        'Default Value': default_value
    })

In [27]:
with open('Form_Details.json', 'w', encoding='utf-8') as file:
    json.dump(form_data, file, ensure_ascii=False, indent=4)
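
The cells above collect only `<input>` tags, so the `<select name="options">` dropdown visible in the form output is not included. If the dropdown should count as a form field, a sketch along these lines could cover it. It parses the form markup shown above from an inline string; treating the first `<option>` as the default when none is marked `selected` is an assumption, as is the extra `Options` key.

```python
from bs4 import BeautifulSoup

# Form markup copied from the notebook output above.
html = """
<form>
<label for="username">Username:</label>
<input id="username" name="username" placeholder="Enter your username" type="text"/>
<label for="password">Password:</label>
<input id="password" name="password" placeholder="Enter your password" type="password"/>
<label for="options">Choose an option:</label>
<select id="options" name="options">
<option value="option1">Option 1</option>
<option value="option2">Option 2</option>
<option value="option3">Option 3</option>
</select>
<label>
<input name="terms" type="checkbox"/> I agree to the terms and conditions
</label>
<input type="submit" value="Submit"/>
</form>
"""

sample_form = BeautifulSoup(html, 'html.parser').find('form')

fields = []
for tag in sample_form.find_all(['input', 'select']):  # document order is preserved
    if tag.name == 'select':
        options = [opt.get('value') for opt in tag.find_all('option')]
        # Assumption: with no option marked "selected", treat the first as default.
        chosen = tag.find('option', selected=True)
        default = chosen.get('value') if chosen else (options[0] if options else '')
        fields.append({'Field Name': tag.get('name'), 'Input Type': 'select',
                       'Default Value': default, 'Options': options})
    else:
        fields.append({'Field Name': tag.get('name'), 'Input Type': tag.get('type'),
                       'Default Value': tag.get('value', '')})

print(fields)
```

The submit button has no `name` attribute, so its `Field Name` is `None` here, just as in the original loop.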

     5. Extract Links and Multimedia:
     ● Extract the hyperlink (<a> tag) and its href value.
     ● Extract the video link from the <iframe> tag.
     ● Save the data into a JSON file.
https://www.geeksforgeeks.org/how-to-convert-python-dictionary-to-json/

In [28]:
# 5. Extract Links and Multimedia
links = [{'text': a.text, 'href': a.get('href')} for a in soup.find_all('a')]
links

[]

In [29]:
iframes = [{'video_src': iframe.get('src')} for iframe in soup.find_all('iframe')]
iframes

[{'video_src': 'https://www.youtube.com/watch?v=ujf9RNuBdCU'}]

In [30]:
with open('Links_and_Multimedia.json', 'w', encoding='utf-8') as file:
    json.dump({'Links': links, 'Videos': iframes}, file, ensure_ascii=False, indent=4)

     6. Scraping Challenge:
     Students must write a script to extract data from the Featured Products section with the following
     requirements:
     ● Product Name: Located within <span class="name">.
     ● Hidden Price: Located within <span class="price">, which has style="display: none;".
     ● Available Colors: Located within <span class="colors">.
     ● Product ID: The value stored in the data-id attribute.
     ● Example Output:
     [
     {'id': '101', 'name': 'Wireless Headphones', 'price': '$49.99', 'colors': 'Black, White, Blue'},
     {'id': '102', 'name': 'Smart Speaker', 'price': '$89.99', 'colors': 'Grey, Black'},
     {'id': '103', 'name': 'Smart Watch', 'price': '$149.99', 'colors': 'Black, Silver, Gold'}
     ]

In [31]:
# 6. Scraping Challenge
featured_products = []

In [32]:
products = soup.find_all('div', class_='featured-product')
products

[]

In [33]:
for product in products:
    product_id = product.get('data-id')
    name = product.find('span', class_='name').text
    price = product.find('span', class_='price').text
    colors = product.find('span', class_='colors').text

    featured_products.append({
        'id': product_id,
        'name': name,
        'price': price.strip(),
        'colors': colors.strip()
    })

In [34]:
with open('Featured_Products.json', 'w', encoding='utf-8') as file:
    json.dump(featured_products, file, ensure_ascii=False, indent=4)
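
The `products` list above came back empty, so `Featured_Products.json` ends up containing `[]`; the most likely reason is that the wrapper elements do not use the class `featured-product`. Since the task states that each product carries a `data-id` attribute, one alternative is to select on that attribute instead of on a class. The sketch below uses hypothetical markup: only the span classes, the inline style and the `data-id` attribute come from the task text, while the wrapper tag, its class and the surrounding `<p>` tags are guesses.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one featured product (see the assumptions above).
html = """
<div class="product-card" data-id="101">
  <p><span class="name">Wireless Headphones</span></p>
  <p><span class="price" style="display: none;">$49.99</span></p>
  <p>Available colors: <span class="colors">Black, White, Blue</span></p>
</div>
"""

sample_soup = BeautifulSoup(html, 'html.parser')

found = []
# Match any tag that carries a data-id attribute, whatever its class is.
for product in sample_soup.find_all(attrs={'data-id': True}):
    found.append({
        'id': product['data-id'],
        'name': product.find('span', class_='name').get_text(strip=True),
        # CSS such as display: none does not affect parsing, so the text is readable.
        'price': product.find('span', class_='price').get_text(strip=True),
        'colors': product.find('span', class_='colors').get_text(strip=True),
    })

print(found)
```

Before relying on this, it is worth printing the relevant part of `soup` to check the actual tag and class names used on the page.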