<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 1 Exercises: Web Scraping and Summarization

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

</div>

---

Complete the exercises below using the helper functions and setup provided. Each question has one or more code cells with `___` placeholders; replace every placeholder with your own code or text.

## Setup

Run the cells below to install packages, import libraries, and define helper functions.

In [None]:
# Install required packages
!pip install -q openai requests beautifulsoup4

In [None]:
# Import required libraries
import os
from openai import OpenAI
from bs4 import BeautifulSoup
import requests
from IPython.display import Markdown, display

In [None]:
# Configure the Gemini API key (Gemini exposes an OpenAI-compatible endpoint)
from getpass import getpass

# Prompt for the key so it is never stored in the notebook
GEMINI_API_KEY = getpass("Enter your Gemini API Key: ")

GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

# Point the OpenAI client at Gemini's endpoint instead of OpenAI's
client = OpenAI(
    base_url=GEMINI_BASE_URL,
    api_key=GEMINI_API_KEY
)

MODEL = "gemini-2.0-flash-lite"
print(f"Gemini configured with model: {MODEL}")
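Typing the key on every run gets tedious. If you save it in Colab's Secrets panel, you can load it automatically instead. A minimal sketch, assuming the secret is named `GEMINI_API_KEY`; `load_gemini_key()` is a hypothetical helper that tries Colab Secrets first, then an environment variable, and only then falls back to `getpass`.

```python
import os
from getpass import getpass


def load_gemini_key(secret_name="GEMINI_API_KEY"):
    """Hypothetical helper: return an API key from Colab Secrets, an env var, or a prompt."""
    # 1. Colab Secrets panel (the google.colab module only exists inside Colab)
    try:
        from google.colab import userdata
        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not running in Colab, secret missing, or notebook access not granted
        pass

    # 2. Environment variable with the same name (assumed naming convention)
    key = os.environ.get(secret_name)
    if key:
        return key

    # 3. Fall back to an interactive prompt that hides what you type
    return getpass(f"Enter your {secret_name}: ")
```

Usage: `GEMINI_API_KEY = load_gemini_key()`, then create the `OpenAI` client with `GEMINI_BASE_URL` as in the setup cell. Colab may ask you to grant the notebook access to the secret the first time.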

In [None]:
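# Helper functions for web scraping
from urllib.parse import urljoin

# A browser-like User-Agent; some sites reject requests sent with the library's default one
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url, max_chars=2000):
    """
    Fetch and return the title and text content of a website,
    truncated to the first max_chars characters.
    Removes scripts, styles, images, and input elements.
    """
    response = requests.get(url, headers=HEADERS, timeout=15)
    response.raise_for_status()  # Stop on 4xx/5xx instead of summarizing an error page
    soup = BeautifulSoup(response.content, "html.parser")

    # soup.title.string is None if <title> is empty or contains nested tags
    title = soup.title.string if soup.title and soup.title.string else "No title found"

    if soup.body:
        # Remove elements whose content is not readable page text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""

    return (title + "\n\n" + text)[:max_chars]


def fetch_website_links(url):
    """
    Return all links found on a webpage as absolute URLs,
    so relative hrefs such as "/about" can be passed straight to requests.
    """
    response = requests.get(url, headers=HEADERS, timeout=15)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    return [urljoin(url, link) for link in links if link]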
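To see what `fetch_website_contents()` actually sends to the model without making a network request, you can run the same cleaning steps on a small hand-written HTML string. A minimal sketch; `SAMPLE_HTML` and `extract_text()` are illustrative names, not part of the provided helpers, and the page content is made up.

```python
from bs4 import BeautifulSoup

# A tiny made-up page standing in for a real website
SAMPLE_HTML = """
<html>
  <head><title>Sample College</title><style>body { color: red; }</style></head>
  <body>
    <script>console.log("tracking");</script>
    <h1>Welcome</h1>
    <p>We offer B.Tech programs.</p>
    <img src="campus.jpg" alt="Campus">
    <input type="text" placeholder="Search">
  </body>
</html>
"""


def extract_text(html, max_chars=2000):
    """Same cleaning steps as fetch_website_contents(), minus the network call."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else "No title found"
    if soup.body:
        # Drop script, style, image, and input tags before extracting text
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    return (title + "\n\n" + text)[:max_chars]


print(extract_text(SAMPLE_HTML))
```

The `<script>` text never reaches the output, and anything beyond `max_chars` is cut off, so on a long page the model may only see the first part of the site.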

---

## Q1: Summarize Your College Website

Using `fetch_website_contents()` and the Gemini API, scrape and summarize the CV Raman Global University website (https://cvraman.ac.in).

Write a **system prompt** that produces a summary **targeted at prospective students** — it should highlight programs offered, campus highlights, and why a student might want to apply.

**Steps:**
1. Fetch the website contents using `fetch_website_contents()`
2. Write a system prompt tailored for prospective students
3. Build the messages list and call the API
4. Display the summary
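A system prompt usually works better when it states the audience, the points to cover, and the output format explicitly. A minimal sketch of assembling one from those three parts; `build_system_prompt()` is a hypothetical helper, and the audience in the example (parents of applicants) is deliberately different from the one this question asks for.

```python
def build_system_prompt(audience, focus_points, max_words=200):
    """Hypothetical helper: assemble a system prompt from audience, focus, and format."""
    # One bullet per topic the summary should cover
    bullets = "\n".join(f"- {point}" for point in focus_points)
    return (
        f"You summarize university websites for {audience}.\n"
        f"Cover the following, skipping anything the page does not mention:\n{bullets}\n"
        f"Write at most {max_words} words in Markdown. "
        "Do not invent facts that are not in the provided text."
    )


prompt = build_system_prompt(
    audience="parents of applicants",
    focus_points=["fees and scholarships", "hostel and safety", "placement record"],
)
print(prompt)
```

The last instruction matters here: `fetch_website_contents()` keeps only the first 2000 characters by default, so the model may not see every program or facility, and telling it not to invent facts may reduce the chance it fills those gaps on its own.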

In [None]:
# Step 1: Fetch the website contents
url = "https://cvraman.ac.in"
content = fetch_website_contents(___)

# Step 2: Write a system prompt for prospective students
system_prompt = """
___
"""

# Step 3: Create the messages list and call the API
messages = [
    {"role": "system", "content": ___},
    {"role": "user", "content": "Here are the contents of a website. Summarize it.\n\n" + ___}
]

response = client.chat.completions.create(model=MODEL, messages=___)

# Step 4: Display the result
display(Markdown(response.choices[0].message.content))

---

## Q2: Link Explorer

Using `fetch_website_links()`, extract all links from https://cvraman.ac.in. Then pick **any two** of those links, fetch their contents with `fetch_website_contents()`, and summarize each one.

**Steps:**
1. Get all links from the website
2. Print the list of links (or a subset) so you can choose two
3. Fetch the contents of your two chosen links
4. Summarize each one using the API and print both summaries
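The raw link list usually mixes real pages with `mailto:` addresses, in-page anchors, external sites, and duplicates. A minimal sketch of a hypothetical `filter_page_links()` helper that keeps only unique http(s) pages on the same site, using `urljoin` and `urlparse` from Python's standard library. It treats `www.` and bare hostnames as the same site, which is an assumption that may not hold for every domain.

```python
from urllib.parse import urljoin, urlparse


def filter_page_links(base_url, links, limit=20):
    """Hypothetical helper: keep unique http(s) links that stay on base_url's site."""
    # Treat "www.example.com" and "example.com" as the same site (an assumption)
    base_host = urlparse(base_url).netloc.removeprefix("www.")
    seen = set()
    kept = []
    for href in links:
        absolute = urljoin(base_url, href)    # "/about" -> "https://host/about"
        absolute = absolute.split("#", 1)[0]  # drop in-page fragments like "#team"
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue                          # skips mailto:, tel:, javascript:
        if parts.netloc.removeprefix("www.") != base_host:
            continue                          # skips external sites
        if absolute in seen:
            continue                          # skips duplicates
        seen.add(absolute)
        kept.append(absolute)
    return kept[:limit]
```

Usage, after the first cell below has run: `for i, link in enumerate(filter_page_links("https://cvraman.ac.in", links)): print(f"{i}: {link}")`. This gives a shorter, cleaner list to choose your two links from.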

In [None]:
# Steps 1-2: Get all links from the website and print a subset
links = fetch_website_links(___)

# Print the first 20 links so you can browse them
for i, link in enumerate(links[:20]):
    print(f"{i}: {link}")

In [None]:
# Steps 3-4: Pick two links from the list above, then fetch and summarize each

# Replace these with two full URLs (starting with http:// or https://) from the output above
link_1 = "___"
link_2 = "___"

for link in [link_1, link_2]:
    print(f"\n--- Summarizing: {link} ---")
    # Fetch contents
    content = fetch_website_contents(___)
    # Create messages and call API
    messages = [
        {"role": "system", "content": "You are a helpful assistant that summarizes website content."},
        {"role": "user", "content": "Summarize this website content:\n\n" + ___}
    ]
    response = client.chat.completions.create(model=___, messages=___)
    print(response.choices[0].message.content)

---

## Q3: Tone Control with System Prompts

Write **three different system prompts** and call the API three times with the **same user message** (e.g., "What is machine learning?"). Your system prompts should produce:

1. A response suitable for a **10-year-old child**
2. A **technical response** for an engineering student
3. A response in **exactly 2 sentences**

Display all three outputs.

In [None]:
# The same user message for all three calls
user_message = "What is machine learning?"

# Three different system prompts
system_prompts = [
    "___",  # 1. For a 10-year-old child
    "___",  # 2. Technical response for engineering student
    "___",  # 3. Respond in exactly 2 sentences
]

labels = ["For a 10-year-old", "For an engineering student", "In 2 sentences"]

for label, system_prompt in zip(labels, system_prompts):
    messages = [
        {"role": "system", "content": ___},
        {"role": "user", "content": ___}
    ]
    response = client.chat.completions.create(model=___, messages=___)
    print(f"\n--- {label} ---")
    print(response.choices[0].message.content)
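A system prompt is an instruction, not a guarantee, so it is worth checking whether the third response really has two sentences. A naive sketch: a hypothetical `count_sentences()` that splits after `.`, `!`, or `?` when followed by whitespace. It will overcount text containing abbreviations such as "e.g.", so treat the result as a rough check.

```python
import re


def count_sentences(text):
    """Naive heuristic: split after ., ! or ? when followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces (e.g. when text is empty)
    return len([piece for piece in pieces if piece])


print(count_sentences("Machine learning lets computers learn from data. It powers many apps!"))  # prints 2
```

After the loop finishes, `response` holds the last call, which used the 2-sentence prompt, so `count_sentences(response.choices[0].message.content)` should return 2 if the model followed the instruction.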

**Written Response:** In 2–3 sentences, explain how the system prompt changed the model's behavior.

*Your answer here:*



---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---