<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 1 Exercises: JSON Prompting, Chaining, and Streaming

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

</div>

---

Complete the exercises below using the helper functions and setup provided. Each question has one or more code cells with blanks (`___`) for you to fill in.

## Setup

Run the cells below to install packages, import libraries, and define helper functions.

In [None]:
# Install required packages
!pip install -q openai requests beautifulsoup4

In [None]:
# Import required libraries
import os
import json
from openai import OpenAI
from bs4 import BeautifulSoup
import requests
from IPython.display import Markdown, display, update_display

In [None]:
# Configure Gemini API Key (entered at runtime so it is never saved in the notebook)
from getpass import getpass

GEMINI_API_KEY = getpass("Enter your Gemini API Key: ")

GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

client = OpenAI(
    base_url=GEMINI_BASE_URL,
    api_key=GEMINI_API_KEY
)

MODEL = "gemini-2.0-flash"
print(f"Gemini configured with model: {MODEL}")

In [None]:
# Helper functions for web scraping

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}


def fetch_website_contents(url, max_chars=2000):
    """
    Fetch and return the title and text content of a website.
    Removes scripts, styles, and other non-text elements.
    """
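    response = requests.get(url, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")

    # soup.title.string can be None (e.g. an empty <title>), so guard both
    title = soup.title.string if soup.title and soup.title.string else "No title found"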

    if soup.body:
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""

    return (title + "\n\n" + text)[:max_chars]


def fetch_website_links(url):
    """
    Return the href of every link on a webpage (hrefs may be relative paths).
    """
    response = requests.get(url, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    return [link for link in links if link]

---

## Q1: JSON Structured Output

Ask the model to analyze the topic **"Artificial Intelligence"** and return a JSON response with the following keys:
- `topic` — the topic name
- `summary` — a 2-3 sentence summary
- `key_concepts` — a list of 3-5 key concepts

**Steps:**
1. Write a system prompt that tells the model to respond in JSON with the keys above
2. Call the API with `response_format` set to enable JSON mode
3. Parse the JSON response and pretty-print it
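JSON mode is meant to make the reply parse as JSON, but it does not by itself force the model to use the keys you asked for. After parsing, it can help to check the result. A minimal sketch, assuming the parsed response is a Python `dict`. `validate_topic_analysis` is a hypothetical helper, not part of the course setup.

```python
def validate_topic_analysis(result):
    """Return a list of problems in a parsed Q1 response (empty list means OK).

    Hypothetical helper: checks only the three keys requested in Q1.
    """
    if not isinstance(result, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    # Every requested key should be present
    for key in ("topic", "summary", "key_concepts"):
        if key not in result:
            problems.append(f"missing key: {key}")

    # key_concepts should be a list of 3-5 strings
    concepts = result.get("key_concepts")
    if concepts is not None:
        if not isinstance(concepts, list):
            problems.append("key_concepts is not a list")
        elif not 3 <= len(concepts) <= 5:
            problems.append(f"expected 3-5 key concepts, got {len(concepts)}")
        elif not all(isinstance(c, str) for c in concepts):
            problems.append("key_concepts contains non-string items")

    return problems


# Usage with a hand-written example (no API call needed)
example = {
    "topic": "Artificial Intelligence",
    "summary": "AI studies systems that perform tasks requiring intelligence.",
    "key_concepts": ["Machine learning", "Neural networks", "Search"],
}
print(validate_topic_analysis(example))  # prints []
```

The sentence count of `summary` is not checked here, because splitting sentences reliably is harder than checking types.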

In [None]:
# Step 1: Write a system prompt that specifies the JSON keys
system_prompt = """
___
"""

# Step 2: Call the API with JSON mode enabled
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Analyze the topic: Artificial Intelligence"}
    ],
    response_format={"type": "___"}  # What value enables JSON mode?
)

# Step 3: Parse the JSON response
result = json.___(response.choices[0].message.content)  # Which json function parses a string?
print(json.dumps(result, indent=2))

---

## Q2: One-Shot JSON Prompting for Link Classification

Using the **one-shot prompting** pattern from the lecture, create a system prompt that includes an example JSON structure. Then use it to classify links from `https://anthropic.com` as relevant for a company pamphlet.

**Steps:**
1. Write a system prompt with an example JSON showing the expected output format
2. Fetch links from the website and build the user prompt
3. Call the API with JSON mode and display the selected links
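`fetch_website_links()` returns raw `href` values. Many of these may be relative paths (`/company`), fragments (`#top`), or `mailto:` links, while the prompt asks the model for full https URLs. One option is to clean the list with the standard library's `urljoin` before sending it. The sketch below assumes you want to do that. `normalize_links` is a hypothetical helper, not part of the provided setup.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, hrefs, limit=50):
    """Turn raw hrefs into unique absolute http(s) URLs, preserving order.

    Hypothetical pre-processing step: resolves relative links against
    base_url and drops mailto:, tel:, javascript: and "#fragment" links.
    """
    seen = set()
    result = []
    for href in hrefs:
        href = href.strip()
        if not href or href.startswith("#"):
            continue
        absolute = urljoin(base_url, href)
        # Keep only web links; this drops mailto:, tel:, javascript:, etc.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Ignore the fragment so "/about" and "/about#team" count as one page
        absolute = urldefrag(absolute).url
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result[:limit]


print(normalize_links(
    "https://example.com",
    ["/company", "/company#team", "mailto:x@y.z", "#top", "https://example.org/a"],
))
```

With this helper, you could pass `normalize_links(url, links)` instead of `links[:50]` when building the prompt. The 50-link cap would then count unique pages rather than duplicates.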

In [None]:
# Step 1: System prompt with one-shot JSON example
LINK_SYSTEM_PROMPT = """
You are provided with a list of links found on a webpage.
Decide which links would be most relevant for a company pamphlet,
such as links to an About page, a Company page, or Careers/Jobs pages.
Respond in JSON as in this example:

___
"""
# Hint: The example JSON should have a "links" key containing a list of objects,
# each with "type" and "url" keys

In [None]:
# Step 2: Build the user prompt with links from the website
url = "https://anthropic.com"
links = fetch_website_links(___)

user_prompt = f"""
Here is the list of links on the website {url} -
Please decide which are relevant for a company pamphlet.
Respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, or email links.

Links:
"""
user_prompt += "\n".join(links[:50])

In [None]:
# Step 3: Call the API with JSON mode
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": ___},
        {"role": "user", "content": ___}
    ],
    response_format={"type": "json_object"}
)

selected_links = json.loads(response.choices[0].message.content)
print(json.dumps(selected_links, indent=2))

# Print just the URLs
print("\nSelected URLs:")
for link in selected_links.get("links", []):
    print(f"  - [{link['type']}] {link['url']}")

---

## Q3: Chaining LLM Calls

Build a **2-step pipeline** that chains LLM calls:
1. **Step 1:** Use the link selection from Q2 to get relevant links
2. **Step 2:** Fetch content from those links and generate a company pamphlet

This demonstrates the **chaining pattern** — the output of one LLM call drives the input to the next.

**Steps:**
1. Complete `fetch_page_and_relevant_links()` to aggregate content from the main page and relevant links
2. Write the pamphlet system prompt
3. Complete `create_pamphlet()` to generate the final pamphlet
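Without the web scraping, the chaining pattern amounts to function composition: each step consumes the previous step's output. Below is a minimal sketch of that idea. `run_chain`, `fake_select_links`, and `fake_write_pamphlet` are hypothetical stand-ins for the real LLM calls, so the sketch runs offline.

```python
def run_chain(steps, initial_input):
    """Run each step on the previous step's output and return the final result.

    Each step is any one-argument callable. In this notebook a step would
    wrap an LLM call; plain functions stand in here so no API key is needed.
    """
    data = initial_input
    for step in steps:
        data = step(data)
    return data


# Stand-ins for the two LLM calls in Q3 (hypothetical, no API involved)
def fake_select_links(url):
    return {"links": [{"type": "about page", "url": url + "/company"}]}


def fake_write_pamphlet(selection):
    urls = [link["url"] for link in selection["links"]]
    return "# Pamphlet\nSources: " + ", ".join(urls)


print(run_chain([fake_select_links, fake_write_pamphlet], "https://example.com"))
```

In the real pipeline, ordinary code that fetches page content sits between the two LLM calls. That glue code is the job of `fetch_page_and_relevant_links()`.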

In [None]:
def select_relevant_links(url):
    """Use LLM to select relevant links from a website (from Q2)."""
    links = fetch_website_links(url)
    user_prompt = f"""
Here is the list of links on the website {url} -
Please decide which are relevant for a company pamphlet.
Respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, or email links.

Links:
"""
    user_prompt += "\n".join(links[:50])

    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": LINK_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)

In [None]:
# Step 1: Complete the function that chains Step 1 → Step 2
def fetch_page_and_relevant_links(url):
    """Fetch main page and content from relevant links."""
    # Get the main page content
    contents = fetch_website_contents(url)

    # Use LLM to pick relevant links (this is the "chain")
    relevant_links = ___(url)  # Which function from Q2 selects links?

    # Combine everything into one string
    result = f"## Landing Page:\n\n{contents}\n\n## Relevant Links:\n"

    for link in relevant_links.get('___', [])[:3]:  # What key holds the links list?
        result += f"\n\n### {link['type']}\n"
        try:
            result += fetch_website_contents(link["___"])  # What key holds the URL?
        except Exception:  # skip pages that fail to load or lack a URL
            result += "(Could not fetch content)"

    return result

In [None]:
# Step 2: Write the pamphlet system prompt
PAMPHLET_SYSTEM_PROMPT = """
___
"""
# Hint: Tell the model to analyze company website content and create a professional
# pamphlet in markdown. Include details about culture, products, and careers.

In [None]:
# Step 3: Complete the pamphlet generation function
def create_pamphlet(company_name, url):
    """Generate a company pamphlet using chained LLM calls."""
    # Build the user prompt with aggregated content
    user_prompt = f"""
You are looking at a company called: {company_name}
Here are the contents of its landing page and other relevant pages.
Use this information to build a short pamphlet in markdown.

"""
    user_prompt += ___(url)  # Which function aggregates page + relevant link content?
    user_prompt = user_prompt[:5000]  # Truncate to keep the prompt short (limits tokens, cost, and latency)

    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": ___},
            {"role": "user", "content": ___}
        ]
    )
    return response.choices[0].message.content

In [None]:
# Test it!
pamphlet = create_pamphlet("Anthropic", "https://anthropic.com")
display(Markdown(pamphlet))

---

## Q4: Streaming Responses

Convert the `create_pamphlet()` function into a **streaming** version that displays the response in real time as it is generated.

**Steps:**
1. Enable streaming in the API call
2. Loop through chunks and extract the content from each chunk
3. Use `update_display()` to show the response as it builds up
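The core of streaming is to accumulate partial text and re-render the whole buffer each time, because `update_display()` replaces the displayed output rather than appending to it. A minimal offline sketch follows. It assumes the stream has already been reduced to plain text fragments. The `accumulate` generator is hypothetical and deliberately avoids the chunk objects you work with in the exercise.

```python
def accumulate(fragments):
    """Yield the running text after each streamed fragment.

    Fragments are plain strings (or None) standing in for the per-chunk text
    of a real stream; None is treated as empty because some chunks carry
    no text.
    """
    text = ""
    for fragment in fragments:
        text += fragment or ""
        yield text


# Simulated stream: each yielded value is what you would re-render on screen
for partial in accumulate(["# Exam", "ple Co", None, "\nWe build tools."]):
    print(repr(partial))
```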

In [None]:
def stream_pamphlet(company_name, url):
    """Generate a pamphlet with streaming output."""
    user_prompt = f"""
You are looking at a company called: {company_name}
Here are the contents of its landing page and other relevant pages.
Use this information to build a short pamphlet in markdown.

"""
    user_prompt += fetch_page_and_relevant_links(url)
    user_prompt = user_prompt[:5000]

    stream = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": PAMPHLET_SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt}
        ],
        ___=True  # What parameter enables streaming?
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)

    for chunk in stream:
        content = chunk.choices[0].___.content or ''  # How do you access content in a stream chunk?
        response += content
        ___(Markdown(response), display_id=display_handle.display_id)  # What function updates the display?

In [None]:
# Test streaming
stream_pamphlet("Anthropic", "https://anthropic.com")

**Written Response:** In 2–3 sentences, explain why chaining multiple LLM calls is considered an early form of Agentic AI.

*Your answer here:*


---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---