# Lab 5: Web Scraping & API Interaction

## Objective
To extract data from web pages using web scraping techniques and to interact with external web services using REST and SOAP APIs. This lab demonstrates HTML parsing, RESTful communication using GET and POST methods, and SOAP-based service interaction.

## Theory
Web scraping is the programmatic fetching and parsing of HTML content from websites. In Python, `requests` is widely used to fetch pages and `BeautifulSoup` to parse them. REST APIs expose resources over HTTP methods such as GET and POST and usually exchange data in JSON format. SOAP APIs exchange XML messages whose operations and data types are described by a WSDL document; they are common in enterprise systems.

## 1. Import Required Libraries

In [None]:
import requests
from bs4 import BeautifulSoup

## 2. Define Target URL and Headers (Quotes Website)

In [None]:
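TARGET_URL = "https://quotes.toscrape.com"
# Browser-like User-Agent string, sent instead of the default python-requests one.
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    )
}
# Fetch the page; stop early on a timeout or a 4xx/5xx response.
response = requests.get(TARGET_URL, headers=headers, timeout=10)
response.raise_for_status()
html_content = response.text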
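
A network error or timeout in the fetch above would stop the notebook with a traceback. One way to handle that is to wrap the request in a small helper, sketched below. The name `fetch_html` and the 10-second default timeout are illustrative assumptions, not part of the original lab.

```python
import requests


def fetch_html(url, headers=None, timeout=10):
    """Return the page body as text, or None if the request fails.

    Hypothetical helper for illustration; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx responses.
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers malformed URLs, connection errors, timeouts, and HTTP errors.
        print(f"Request failed: {exc}")
        return None
    return response.text
```

With this helper, `html_content = fetch_html(TARGET_URL, headers=headers)` would yield `None` on failure, which later cells can check before parsing.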

## 3. Initialize BeautifulSoup and Extract Page Title

In [None]:
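# `soap` holds the BeautifulSoup parse tree (it is unrelated to the SOAP API in Section 9).
soap = BeautifulSoup(html_content, 'lxml')
# Guard against pages that have no <title> element.
page_title = soap.title.get_text(strip=True) if soap.title else "N/A"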
print("Page Title:", page_title)

## 4. Quote Scraping Using Class Names

In [None]:
quotes_data = []
quote_divs = soap.find_all('div', class_='quote')

for quote_div in quote_divs:
    text_element = quote_div.find('span', class_='text')
    quote_text = text_element.text if text_element else "N/A"

    author_element = quote_div.find('small', class_='author')
    author_name = author_element.text if author_element else "N/A"

    tag_list = []
    tags_div = quote_div.find('div', class_='tags')
    if tags_div:
        for tag_item in tags_div.find_all('a', class_='tag'):
            tag_list.append(tag_item.text)

    quotes_data.append({
        'quote': quote_text,
        'author': author_name,
        'tags': tag_list
    })
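
The loop above can be checked without a network connection by running the same logic on a small, made-up HTML fragment. The sketch below assumes markup shaped like the quotes page; `SAMPLE_HTML` and the helper `parse_quotes` are hypothetical names introduced only for this illustration, and it uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the structure the loop above expects.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">Sample quote text.</span>
  <span>by <small class="author">Sample Author</small></span>
  <div class="tags">
    <a class="tag" href="/tag/one/">one</a>
    <a class="tag" href="/tag/two/">two</a>
  </div>
</div>
<div class="quote">
  <span class="text">Quote with no author or tags.</span>
</div>
"""


def parse_quotes(html):
    """Hypothetical helper: return a list of quote dicts from an HTML string."""
    sample_soup = BeautifulSoup(html, "html.parser")
    results = []
    for quote_div in sample_soup.find_all("div", class_="quote"):
        text_el = quote_div.find("span", class_="text")
        author_el = quote_div.find("small", class_="author")
        tags_div = quote_div.find("div", class_="tags")
        # Fall back to "N/A" or an empty list when an element is missing.
        tags = []
        if tags_div:
            tags = [a.get_text(strip=True) for a in tags_div.find_all("a", class_="tag")]
        results.append({
            "quote": text_el.get_text(strip=True) if text_el else "N/A",
            "author": author_el.get_text(strip=True) if author_el else "N/A",
            "tags": tags,
        })
    return results


parsed = parse_quotes(SAMPLE_HTML)
for row in parsed:
    print(row)
```

The second sample block has no author or tags, so it exercises the fallback branches that the live page rarely triggers.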

## 5. Display Extracted Quotes

In [None]:
print("\n--- Extracted Quotes (First 3) ---")
for item in quotes_data[:3]:
    print(f"Author: {item['author']}\nQuote: {item['quote']}\nTags: {', '.join(item['tags'])}\n")

## 6. Quote Scraping Using CSS Selectors

In [None]:
quote_texts = soap.select('div.quote span.text')
print(f"Found {len(quote_texts)} quotes using CSS selector.")
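
CSS selectors can also pull the author and tags, not only the quote text. The following is a minimal sketch on made-up markup (the `SAMPLE_HTML` string and variable names are assumptions for illustration); it uses `select` for lists of matches and `select_one` for a single match.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page may differ in detail.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First sample quote.</span>
  <small class="author">Author A</small>
  <div class="tags"><a class="tag">x</a><a class="tag">y</a></div>
</div>
<div class="quote">
  <span class="text">Second sample quote.</span>
  <small class="author">Author B</small>
  <div class="tags"></div>
</div>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

selected = []
for quote in sample_soup.select("div.quote"):
    # select_one returns the first match or None; select returns a list.
    text_el = quote.select_one("span.text")
    author_el = quote.select_one("small.author")
    selected.append({
        "quote": text_el.get_text(strip=True) if text_el else "N/A",
        "author": author_el.get_text(strip=True) if author_el else "N/A",
        "tags": [a.get_text(strip=True) for a in quote.select("div.tags a.tag")],
    })

for row in selected:
    print(row)
```

Because `select` returns an empty list when nothing matches, the tags need no explicit existence check, unlike the `find`-based version.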

## 7. REST API (GET Request)

In [None]:
api_url = "https://jsonplaceholder.typicode.com/todos"
response = requests.get(api_url, timeout=10)
if response.status_code == 200:
    data = response.json()  # JSON array decoded into a list of dicts
    for item in data[:5]:
        print(item)
else:
    print("API request failed with status code", response.status_code)
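
GET requests often carry query parameters to filter the result. The sketch below builds such a request without sending it, so the final URL can be inspected offline. It assumes the service filters to-dos by a `userId` query parameter; `API_URL` and the helper `fetch_todos` are hypothetical names for illustration.

```python
import requests

API_URL = "https://jsonplaceholder.typicode.com/todos"

# Build the request without sending it, to see how params end up in the URL.
prepared_get = requests.Request("GET", API_URL, params={"userId": 1}).prepare()
print(prepared_get.url)


def fetch_todos(user_id, timeout=10):
    """Hypothetical helper: return the to-do list for one user (network call)."""
    response = requests.get(API_URL, params={"userId": user_id}, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of checking manually
    return response.json()
```

Passing `params=` lets `requests` handle URL encoding, which is less error-prone than concatenating the query string by hand.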

## 8. REST API (POST Request)

In [None]:
post_data = {
    "userId": 1,
    "title": "yoo",
    "completed": False
}
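# api_url comes from the GET cell above; json= serializes the dict into the request body.
post_response = requests.post(api_url, json=post_data, timeout=10)
print("POST Status Code:", post_response.status_code)  # 201 Created is expected on success
print("POST Response:", post_response.text)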
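
To see what the `json=` argument actually puts on the wire, the request can be prepared without being sent. This is a small sketch for inspection only; `API_URL` and `payload` are stand-in names that repeat the values used above.

```python
import json

import requests

API_URL = "https://jsonplaceholder.typicode.com/todos"
payload = {"userId": 1, "title": "yoo", "completed": False}

# Prepare (but do not send) the POST request.
prepared_post = requests.Request("POST", API_URL, json=payload).prepare()

# requests sets the Content-Type header and encodes the body as JSON bytes.
print(prepared_post.headers["Content-Type"])
print(prepared_post.body)

# Decoding the body gives back the original dictionary.
decoded = json.loads(prepared_post.body)
```

Note that Python's `False` is serialized as JSON `false`, which is why a dict should be passed through `json=` rather than formatted into a string manually.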

## 9. SOAP API Interaction

In [None]:
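from zeep import Client

# zeep reads the WSDL and exposes the service's operations as Python methods.
wsdl_url = "http://www.dneonline.com/calculator.asmx?WSDL"
client = Client(wsdl=wsdl_url)
# Call the Add operation; zeep builds the XML request and parses the XML response.
result = client.service.Add(intA=10, intB=20)
print("SOAP Add Operation Result:", result)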
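
For comparison with the JSON payloads used in the REST sections, the sketch below builds by hand an XML envelope of the kind a SOAP client sends for the `Add` call. It uses only the standard library and is an approximation: the target namespace `http://tempuri.org/` is an assumption about this service, `build_add_envelope` is a hypothetical helper, and zeep generates the real message from the WSDL, so the exact output may differ.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CALC_NS = "http://tempuri.org/"  # assumed target namespace of the calculator service


def build_add_envelope(a, b):
    """Hypothetical helper: build a SOAP 1.1 envelope for Add(intA, intB)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    add = ET.SubElement(body, f"{{{CALC_NS}}}Add")
    ET.SubElement(add, f"{{{CALC_NS}}}intA").text = str(a)
    ET.SubElement(add, f"{{{CALC_NS}}}intB").text = str(b)
    return ET.tostring(envelope, encoding="unicode")


envelope_xml = build_add_envelope(10, 20)
print(envelope_xml)

# Parse the envelope back and read the two operands in document order.
root = ET.fromstring(envelope_xml)
operand_tags = (f"{{{CALC_NS}}}intA", f"{{{CALC_NS}}}intB")
operands = [int(el.text) for el in root.iter() if el.tag in operand_tags]
print(operands)
```

The nesting of Envelope, Body, and operation element is what makes SOAP messages more verbose than the equivalent JSON body.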

## Discussion
This experiment successfully demonstrated web scraping and API interaction using Python. Quotes were extracted using both HTML class-based searching and CSS selectors. REST API communication was performed using GET and POST requests with JSON data. SOAP API interaction was implemented using a WSDL-based calculator service. Errors related to missing HTML content, incorrect selectors, and POST data transmission were identified and resolved.

## Conclusion
The lab achieved its objectives by demonstrating practical techniques for web scraping, RESTful communication, and SOAP-based service interaction. These skills are essential for modern web data extraction and system integration.