<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 1 Exercises: Web Scraping and Summarization

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

</div>

---

Complete the exercises below using the helper functions and setup provided. Each question has one or more empty code cells for your solution.

## Setup

Run the cells below to install packages, import libraries, and define helper functions.

In [None]:
# Install required packages
!pip install -q openai requests beautifulsoup4

In [None]:
# Import required libraries
import os
from openai import OpenAI
from bs4 import BeautifulSoup
import requests
from IPython.display import Markdown, display

In [None]:
# Configure the Gemini API key
from getpass import getpass

GEMINI_API_KEY = getpass("Enter your Gemini API Key: ")

# Gemini's OpenAI-compatible endpoint
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

# Pass base_url so requests go to Gemini instead of the default OpenAI endpoint
client = OpenAI(
    api_key=GEMINI_API_KEY,
    base_url=GEMINI_BASE_URL,
)

# Assumed model name; substitute any Gemini model your key can access
MODEL = "gemini-2.5-flash"
print(f"Gemini configured with model: {MODEL}")

Enter your Gemini API Key: ··········
Gemini configured with model: gemini-2.5-flash


In [None]:
# Helper functions for web scraping

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


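def fetch_website_contents(url, max_chars=2000):
    """
    Fetch a web page and return its title plus visible body text,
    truncated to max_chars characters.
    Removes scripts, styles, images, and input elements.
    """
    # timeout keeps a slow or unresponsive server from hanging the notebook
    response = requests.get(url, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")

    # soup.title.string is None when <title> is empty or contains nested tags
    title = soup.title.string if soup.title and soup.title.string else "No title found"

    if soup.body:
        # Calling a tag is shorthand for find_all(); drop non-text elements
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""

    return (title + "\n\n" + text)[:max_chars]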


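def fetch_website_links(url):
    """
    Return the href of every <a> tag on a webpage.
    Relative paths and "#" anchors are returned unchanged.
    """
    # timeout keeps a slow or unresponsive server from hanging the notebook
    response = requests.get(url, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    # Drop <a> tags that have no href or an empty one
    return [link for link in links if link]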
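
The list returned by `fetch_website_links()` can contain relative paths, bare `#` anchors, and the same page written with and without a trailing slash. The sketch below shows one way to tidy such a list before using it. `normalize_links` is a hypothetical helper, not part of the provided setup, and it uses only the standard library.

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url, skip anchors and non-HTTP links, de-duplicate."""
    seen = set()
    cleaned = []
    for href in hrefs:
        href = href.strip()
        # Skip empty values and in-page anchors such as "#" or "#top"
        if not href or href.startswith("#"):
            continue
        # Resolve relative paths against the page URL and drop any fragment
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # Skip mailto:, tel:, javascript: and similar schemes
        if not absolute.startswith(("http://", "https://")):
            continue
        # Treat "https://site" and "https://site/" as the same page
        key = absolute.rstrip("/")
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(absolute)
    return cleaned
```

In the notebook this could be called as `normalize_links(url, fetch_website_links(url))`.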

---

## Q1: Summarize Your College Website

Using `fetch_website_contents()` and the Gemini API, scrape and summarize the CV Raman Global University website (https://cvraman.ac.in).

Write a **system prompt** that produces a summary **targeted at prospective students** — it should highlight programs offered, campus highlights, and why a student might want to apply.

**Steps:**
1. Fetch the website contents using `fetch_website_contents()`
2. Write a system prompt tailored for prospective students
3. Call the API and display the summary
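
Steps 2 and 3 both depend on how the prompt is assembled, so it can help to keep that part in a small function that is easy to inspect without calling the API. This is a minimal sketch: `build_summary_messages` is a hypothetical name, and the 2000-character cap simply mirrors the default of `fetch_website_contents()`.

```python
def build_summary_messages(system_prompt, page_text, max_chars=2000):
    """Return a chat `messages` list asking for a summary of page_text."""
    # A blank line separates the instruction from the scraped text
    user_prompt = (
        "Here are the contents of a website. Summarize it.\n\n"
        + page_text[:max_chars]
    )
    return [
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user", "content": user_prompt},
    ]


# Example with placeholder text instead of a real page
example = build_summary_messages(
    "You summarize university websites for prospective students.",
    "Example University\n\nPrograms: Engineering, Pharmacy",
)
for message in example:
    print(message["role"], "->", len(message["content"]), "characters")
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create(...)`.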

In [None]:
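# Step 1: Fetch the website contents
# Note: this solution uses the cgu-odisha.ac.in domain rather than
# the https://cvraman.ac.in address given in the question text
url = "https://cgu-odisha.ac.in/"
content = fetch_website_contents(url)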

# Step 2: Write a system prompt for prospective students
system_prompt = """
You are a helpful AI assistant tasked with summarizing website content for prospective students.
Highlight the academic programs offered, campus facilities, student life, and key reasons why a student would want to apply to this university.
Focus on positive aspects and opportunities available.
"""

# Step 3: Create the messages list and call the API
messages = [
    {"role": "system", "content": system_prompt},
    # The blank line keeps the instruction from running into the page text
    {"role": "user", "content": "Here are the contents of a website. Summarize it.\n\n" + content}
]

response = client.chat.completions.create(model=MODEL, messages=messages)

# Step 4: Display the result
display(Markdown(response.choices[0].message.content))

**Overview of CGU-Odisha**

CGU-Odisha is a leading private university located in Bhubaneswar, renowned for its commitment to innovation and quality education. It offers a diverse range of academic programs, advanced campus facilities, vibrant student life, and numerous opportunities for personal and professional growth.

**Academic Programs Offered:**
The university features a broad spectrum of academic disciplines, emphasizing engineering, technology, pharmacy, and research. Upcoming conferences, such as the IEEE International Conference on Intelligent Computing and various national conferences, highlight the institution's focus on cutting-edge subjects and collaborative opportunities within these fields. 

**Campus Facilities:**
CGU boasts modern infrastructure and a purpose-built campus that supports a conducive learning environment. Key facilities are designed to enhance student experiences, including well-equipped laboratories, libraries, and spaces for innovation and research.

**Student Life:**
Life at CGU is dynamic, with numerous clubs and committees fostering a sense of community. The university promotes inclusiveness through initiatives like the Equal Opportunity Centre and SC/ST Cell, ensuring that all students have access to support and resources. Various extracurricular activities complement academic life, creating a holistic educational experience.

**Reasons to Apply:**
1. **Innovative Learning Environment:** With a focus on practical skills and innovation, students can look forward to engaging in real-world applications.
2. **Diverse Opportunities:** Access to national and international conferences and webinars enriches academic and professional knowledge, providing networking opportunities.
3. **Supportive Campus Community:** A well-established support system, including grievance redressal committees and clubs, ensures a safe and inclusive campus culture.
4. **Career Readiness:** The university’s strong emphasis on training and skill development, such as through Bajaj Engineering Skills Training, equips students for career success.

Prospective students are encouraged to join CGU-Odisha to take advantage of these opportunities and become part of a vibrant academic community dedicated to fostering the leaders of tomorrow.

---

## Q2: Link Explorer

Using `fetch_website_links()`, extract all links from https://cvraman.ac.in. Then pick **any two** of those links, fetch their contents with `fetch_website_contents()`, and summarize each one.

**Steps:**
1. Get all links from the website
2. Print the list of links (or a subset) so you can choose two
3. Fetch the contents of your two chosen links
4. Summarize each one using the API and print both summaries
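
A link list from a university homepage usually mixes ordinary pages with PDF downloads, external sites, and placeholder anchors. Because `fetch_website_contents()` parses HTML, ordinary pages on the same site tend to be the better choices for step 3. The sketch below filters a list down to such candidates. `candidate_pages` and `SKIP_EXTENSIONS` are hypothetical names, and the extension list is an assumption that can be adjusted.

```python
from urllib.parse import urlparse

# File types that are downloads rather than HTML pages (assumed list)
SKIP_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png", ".zip", ".docx")


def candidate_pages(links, site_domain):
    """Keep links on site_domain (or its subdomains) that look like HTML pages."""
    pages = []
    for link in links:
        parsed = urlparse(link)
        # Skip "#", relative paths, mailto: and other non-HTTP links
        if parsed.scheme not in ("http", "https"):
            continue
        host = parsed.netloc.lower()
        # Skip links that point to a different site
        if host != site_domain and not host.endswith("." + site_domain):
            continue
        # Skip downloads such as PDFs and images
        if parsed.path.lower().endswith(SKIP_EXTENSIONS):
            continue
        pages.append(link)
    return pages
```

In the notebook this could be called as `candidate_pages(links, "cgu-odisha.ac.in")` before printing the list.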

In [None]:
# Step 1: Get all links from the website (same domain as used in Q1)
links = fetch_website_links("https://cgu-odisha.ac.in/")

# Step 2: Print the first 20 links so two can be chosen
for i, link in enumerate(links[:20]):
    print(f"{i}: {link}")

0: https://cgu-odisha.ac.in/wp-content/uploads/2025/12/Notice-for-Registration-for-6th-Convocation-of-CGU-Odisha.pdf
1: https://cgu-odisha.ac.in/wp-content/uploads/2025/12/Topper-list-of-students-for-Gold-Medal-in-6th-Convocation.pdf
2: https://ic-icns.in/
3: http://www.actsoft.org/esda2025/
4: https://ic2ns2.in/
5: https://cgu-odisha.ac.in/wp-content/uploads/2025/07/Conference-Pharmacy-Brochure.pdf
6: https://cgu-odisha.ac.in/bajaj-engineering-skills-training/
7: https://cgu-odisha.ac.in/CPOG-2025/
8: https://cgu-odisha.ac.in/6th_convocation/
9: https://cgu-odisha.ac.in/wp-content/uploads/2025/06/CGU-Public-self-disclosure-final_.pdf
10: https://cgu-odisha.ac.in/online-fee-payment/
11: https://cgu-odisha.ac.in/CVRCEVIRTUALTOURWEB/CVRCEVIRTUALTOURWEB/
12: https://admission.cgu-odisha.ac.in/
13: https://cgu-odisha.ac.in
14: https://cgu-odisha.ac.in/
15: #
16: https://cgu-odisha.ac.in/inventing-the-future-with-learning/
17: https://cgu-odisha.ac.in/our-vision-mission/
18: https://cgu-odi

In [None]:
# Steps 3-4: Fetch the two chosen links and summarize each

# Two links picked from the output above
link_1 = "https://cgu-odisha.ac.in/bajaj-engineering-skills-training/"
link_2 = "https://cgu-odisha.ac.in/inventing-the-future-with-learning/"

for link in [link_1, link_2]:
    print(f"\n--- Summarizing: {link} ---")
    # Fetch contents
    content = fetch_website_contents(link)
    # Create messages and call API
    messages = [
        {"role": "system", "content": "You are a helpful assistant that summarizes website content."},
        {"role": "user", "content": "Summarize this website content:\n\n" + content}
    ]
    response = client.chat.completions.create(model=MODEL, messages=messages)
    print(response.choices[0].message.content)


--- Summarizing: https://cgu-odisha.ac.in/bajaj-engineering-skills-training/ ---
The Bajaj Engineering Skills Training section of the C V Raman Global University website highlights various announcements and events related to the university, including a notice for registration for the 6th Convocation of CGU-Odisha, upcoming international conferences, and webinars organized by different departments. The site also provides information about admissions for the academic year 2026-27, online fee payment options, and details about the CGU campus and its facilities. Additionally, it includes resources for students such as various committees, clubs, scholarship options, and guidance for internal complaints and grievances. The university emphasizes its vision and mission focused on innovative learning and future-oriented education.

--- Summarizing: https://cgu-odisha.ac.in/inventing-the-future-with-learning/ ---
The website for C V Raman Global University (CGU) highlights its commitment to inn

---

## Q3: Tone Control with System Prompts

Write **three different system prompts** and call the API three times with the **same user message** (e.g., "What is machine learning?"). Your system prompts should produce:

1. A response suitable for a **10-year-old child**
2. A **technical response** for an engineering student
3. A response in **exactly 2 sentences**

Display all three outputs.
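
A system prompt can ask for exactly two sentences, but the model may not always comply, so it can be worth checking the reply. The sketch below is a rough sentence counter, assuming sentences end with `.`, `!`, or `?` followed by whitespace. It will miscount text containing abbreviations such as "e.g." and `count_sentences` is a hypothetical helper, not part of the provided setup.

```python
import re


def count_sentences(text):
    """Roughly count sentences by splitting after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # An empty string splits into [""], so filter out empty pieces
    return len([part for part in parts if part])


reply = "Machine learning lets computers learn patterns from data. It improves as it sees more examples."
print(count_sentences(reply) == 2)
```

In the notebook, `response.choices[0].message.content` from the third call could be passed to `count_sentences` in place of `reply`.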

In [None]:
# The same user message for all three calls
user_message = "What is machine learning?"

# Three different system prompts
system_prompts = [
    "You are a helpful assistant explaining complex topics to a 10-year-old child. Use simple words and analogies they can understand.",  # 1. For a 10-year-old child
    "You are a highly knowledgeable and technical AI assistant. Provide a detailed and in-depth explanation of machine learning, including key concepts, algorithms, and applications, suitable for an engineering student.",  # 2. Technical response for engineering student
    "You are a concise AI assistant. Respond to the user's question in exactly two sentences.",  # 3. Respond in exactly 2 sentences
]

labels = ["For a 10-year-old", "For an engineering student", "In 2 sentences"]

for label, system_prompt in zip(labels, system_prompts):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ]
    response = client.chat.completions.create(model=MODEL, messages=messages)
    print(f"\n--- {label} ---")
    print(response.choices[0].message.content)


--- For a 10-year-old ---
Okay! Imagine you have a really smart robot friend. At first, this robot doesn’t know how to do anything, like sorting your toys. If you show it how to put the cars in one box and the dolls in another, that’s like teaching it a lesson.

Machine learning is kind of like that! It’s a way of teaching computers to learn from information, just like you learn from practice. Instead of telling the computer exactly what to do, we give it lots of examples. Over time, the computer figures things out by itself! 

So if we show it a lot of pictures of cats and dogs, it learns to tell the difference between them. It’s like how you get better at a game the more you play it! 

In short, machine learning helps computers learn from experience, so they can make good guesses or decisions without needing a person to tell them every little thing.

--- For an engineering student ---
Machine learning (ML) is a subset of artificial intelligence (AI) focused on the development of alg

**Written Response:** In 2–3 sentences, explain how the system prompt changed the model's behavior.

*Your answer here:*



---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---