<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 1: JSON Prompting, Chaining, and Streaming

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

</div>

---

### What You'll Learn

In this notebook, you will:

1. **Use JSON structured outputs** to get predictable responses from LLMs
2. **Chain multiple LLM calls** to build complex workflows
3. **Implement streaming responses** for better user experience
4. **Build a company brochure generator** as a practical project

**Duration:** ~2 hours

---

## 1. Environment Setup

In [None]:
# Install required packages
!pip install -q openai requests beautifulsoup4

In [None]:
import os
import json
from getpass import getpass
from openai import OpenAI
from bs4 import BeautifulSoup
import requests
from IPython.display import Markdown, display, update_display

In [None]:
# Configure OpenAI
api_key = getpass("Enter your OpenAI API Key: ")
os.environ['OPENAI_API_KEY'] = api_key
client = OpenAI(api_key=api_key)
MODEL = "gpt-4o-mini"

---

## 2. Web Scraping Utilities

First, let's set up our web scraping functions (same as notebook 01).

In [None]:
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

def fetch_website_contents(url, max_chars=2000):
    """Fetch and return the text content of a website."""
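    response = requests.get(url, headers=HEADERS, timeout=10)  # Avoid hanging on slow sites
    soup = BeautifulSoup(response.content, "html.parser")
    
    # soup.title.string is None when <title> is empty or contains nested tags
    title = soup.title.string if soup.title and soup.title.string else "No title found"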
    
    if soup.body:
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        text = soup.body.get_text(separator="\n", strip=True)
    else:
        text = ""
    
    return (title + "\n\n" + text)[:max_chars]


def fetch_website_links(url):
    """Return all links found on a webpage."""
    response = requests.get(url, headers=HEADERS, timeout=10)  # Avoid hanging on slow sites
    soup = BeautifulSoup(response.content, "html.parser")
    links = [link.get("href") for link in soup.find_all("a")]
    return [link for link in links if link]
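
`fetch_website_links` returns each `href` exactly as it appears in the page, so relative paths such as `/about` and non-web schemes such as `mailto:` reach the model unchanged. The sketch below shows one way to normalize them first, using only the standard library. `normalize_links` is a hypothetical helper for illustration and is not used by the rest of this notebook.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep unique http(s) URLs in order."""
    seen = set()
    absolute = []
    for href in hrefs:
        # Turn relative paths into full URLs and drop any "#fragment" part
        full = urldefrag(urljoin(base_url, href.strip())).url
        if urlparse(full).scheme not in ("http", "https"):
            continue  # Skips mailto:, tel:, javascript: and similar
        if full not in seen:
            seen.add(full)
            absolute.append(full)
    return absolute


sample = ["/about", "careers", "mailto:hi@example.com", "https://example.com/about"]
print(normalize_links("https://example.com/", sample))
# ['https://example.com/about', 'https://example.com/careers']
```

Passing absolute URLs to the model makes the "full https URL" instruction in the prompt easier for it to satisfy.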

---

## 3. JSON Structured Outputs

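When you need predictable, parseable responses from an LLM, use **JSON mode**. It guarantees that the reply is syntactically valid JSON, but not that it follows a particular schema, and the API requires the word "JSON" to appear somewhere in the messages.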

This is essential for:
- Building pipelines where output feeds into code
- Extracting structured data from text
- Creating reliable automation

### One-Shot Prompting

We include a single worked example in the prompt (hence *one-shot*) to show the model the expected format:

In [None]:
# System prompt with JSON example (one-shot prompting)
LINK_SYSTEM_PROMPT = """
You are provided with a list of links found on a webpage.
Decide which links would be most relevant for a company brochure,
such as About page, Company page, or Careers/Jobs pages.
Respond in JSON as in this example:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [None]:
def create_links_prompt(url):
    """Create the user prompt for link selection."""
    links = fetch_website_links(url)
    
    prompt = f"""
Here is the list of links on the website {url} -
Please decide which are relevant for a company brochure.
Respond with the full https URL in JSON format.
Do not include Terms of Service, Privacy, or email links.

Links:
"""
    prompt += "\n".join(links[:50])  # Limit to first 50 links
    return prompt

In [None]:
def select_relevant_links(url):
    """Use LLM to select relevant links from a website."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": LINK_SYSTEM_PROMPT},
            {"role": "user", "content": create_links_prompt(url)}
        ],
        response_format={"type": "json_object"}  # Force JSON output
    )
    
    result = response.choices[0].message.content
    return json.loads(result)

In [None]:
# Test link selection
links = select_relevant_links("https://anthropic.com")
print(json.dumps(links, indent=2))
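
JSON mode makes the reply parseable, but the keys and value types are still up to the model. The sketch below shows a defensive parser that keeps only entries shaped like the one-shot example. `parse_links_response` is a hypothetical helper, and the fallback of returning an empty list is an assumption about what the pipeline should do on bad input.

```python
import json


def parse_links_response(raw):
    """Parse the model's reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}  # Reply was not valid JSON (e.g. cut off early)

    if not isinstance(data, dict):
        return {"links": []}

    links = data.get("links", [])
    if not isinstance(links, list):
        links = []

    # Keep entries that have string "type" and "url" fields and a web URL
    valid = [
        {"type": item["type"], "url": item["url"]}
        for item in links
        if isinstance(item, dict)
        and isinstance(item.get("type"), str)
        and isinstance(item.get("url"), str)
        and item["url"].startswith(("http://", "https://"))
    ]
    return {"links": valid}


raw_reply = """
{"links": [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "broken entry"},
    {"type": "email", "url": "mailto:hi@example.com"}
]}
"""
print(parse_links_response(raw_reply))
# {'links': [{'type': 'about page', 'url': 'https://example.com/about'}]}
```

A function like this could replace the bare `json.loads(result)` call if later steps should never see malformed entries.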

---

## 4. Chaining LLM Calls

**Chaining** means using the output of one LLM call as input to another.

It is a simple building block of **Agentic AI** patterns: the model's first answer determines what the program fetches and asks next.

### Our Pipeline:
1. **Step 1**: Extract relevant links from website (using JSON output)
2. **Step 2**: Fetch content from those links
3. **Step 3**: Generate brochure from aggregated content
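
The three steps can be tried offline before any real API call is wired in. In the sketch below, the `fake_*` functions and `run_pipeline` are hypothetical stand-ins for the link selector, the scraper, and the brochure writer. Only the control flow matches the pipeline described above.

```python
def run_pipeline(url, select_links, fetch_page, write_brochure, max_links=3):
    """Chain the three steps: select links, fetch pages, generate output."""
    selected = select_links(url)                           # Step 1: structured output
    pages = [fetch_page(url)]                              # Landing page first
    for link in selected.get("links", [])[:max_links]:     # Step 2: fetch each link
        pages.append(fetch_page(link["url"]))
    return write_brochure("\n\n".join(pages))              # Step 3: generate


# Stand-ins so the chain runs without network access or an API key
def fake_select_links(url):
    return {"links": [{"type": "about page", "url": url + "/about"}]}


def fake_fetch_page(url):
    return f"[content of {url}]"


def fake_write_brochure(text):
    return "# Brochure\n\n" + text


print(run_pipeline("https://example.com", fake_select_links,
                   fake_fetch_page, fake_write_brochure))
```

Passing the steps in as functions keeps each one testable on its own, which helps when one link in the chain misbehaves.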

In [None]:
def fetch_page_and_relevant_links(url):
    """Fetch main page and content from relevant links."""
    # Get main page content
    contents = fetch_website_contents(url)
    
    # Get relevant links using LLM
    relevant_links = select_relevant_links(url)
    
    # Aggregate content
    result = f"## Landing Page:\n\n{contents}\n\n## Relevant Links:\n"
    
    for link in relevant_links.get('links', [])[:3]:  # Limit to 3 links
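        result += f"\n\n### {link.get('type', 'page')}\n"
        try:
            result += fetch_website_contents(link["url"])
        except Exception:
            # Missing "url" key, bad URL, or network failure: skip this page.
            # Unlike a bare `except:`, this still lets Ctrl+C interrupt the cell.
            result += "(Could not fetch content)"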
    
    return result

In [None]:
# Test content aggregation
content = fetch_page_and_relevant_links("https://anthropic.com")
print(content[:1000] + "...")

---

## 5. Building the Brochure Generator

In [None]:
BROCHURE_SYSTEM_PROMPT = """
You are an assistant that analyzes company website content
and creates a professional brochure for prospective customers, investors, and recruits.
Respond in markdown without code blocks.
Include details of company culture, products/services, and careers if available.
"""

In [None]:
def create_brochure_prompt(company_name, url):
    """Create the prompt for brochure generation."""
    prompt = f"""
You are looking at a company called: {company_name}
Here are the contents of its landing page and other relevant pages.
Use this information to build a short brochure in markdown.

"""
    prompt += fetch_page_and_relevant_links(url)
    return prompt[:5000]  # Cap at 5,000 characters to keep the prompt small

In [None]:
def create_brochure(company_name, url):
    """Generate a company brochure."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": BROCHURE_SYSTEM_PROMPT},
            {"role": "user", "content": create_brochure_prompt(company_name, url)}
        ]
    )
    return response.choices[0].message.content

In [None]:
# Generate a brochure
brochure = create_brochure("Anthropic", "https://anthropic.com")
display(Markdown(brochure))

---

## 6. Streaming Responses

**Streaming** shows the response as it's generated, providing a better user experience.

Instead of waiting for the complete response, you see text appear in real time. The total generation time is about the same; what improves is how soon the first words show up.
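
The accumulate-and-display loop at the heart of streaming can be tried offline. In the sketch below, `fake_stream` is a hypothetical stand-in for the chunks an API would send, and `consume_stream` writes to the console rather than to a notebook display handle.

```python
import sys
import time


def fake_stream(text, chunk_size=8, delay=0.0):
    """Yield text in small pieces, imitating chunks arriving over time."""
    for start in range(0, len(text), chunk_size):
        time.sleep(delay)
        yield text[start:start + chunk_size]


def consume_stream(chunks, out=sys.stdout):
    """Show each piece as it arrives and return the full response."""
    response = ""
    for piece in chunks:
        response += piece   # Keep the running total for later use
        out.write(piece)    # Show only the new piece
        out.flush()
    out.write("\n")
    return response


full = consume_stream(fake_stream("Streaming shows text as it arrives."))
```

Setting `delay` to something like `0.05` makes the effect visible when run in a terminal.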

In [None]:
def stream_brochure(company_name, url):
    """Generate a brochure with streaming output."""
    stream = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": BROCHURE_SYSTEM_PROMPT},
            {"role": "user", "content": create_brochure_prompt(company_name, url)}
        ],
        stream=True  # Enable streaming
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    
    for chunk in stream:
        if not chunk.choices:
            continue  # Some chunks (e.g. usage-only ones) carry no choices
        content = chunk.choices[0].delta.content or ''
        response += content
        update_display(Markdown(response), display_id=display_handle.display_id)

In [None]:
# Test streaming
stream_brochure("Anthropic", "https://anthropic.com")

---

## 7. Changing Tone with System Prompts

You can easily change the output style by modifying the system prompt.

In [None]:
# Humorous tone example
HUMOROUS_SYSTEM_PROMPT = """
You are an assistant that creates witty, entertaining brochures about companies.
Use humor and clever observations while still being informative.
Respond in markdown without code blocks.
"""

def stream_humorous_brochure(company_name, url):
    """Generate a humorous brochure."""
    stream = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": HUMOROUS_SYSTEM_PROMPT},
            {"role": "user", "content": create_brochure_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    
    for chunk in stream:
        if not chunk.choices:
            continue  # Some chunks (e.g. usage-only ones) carry no choices
        content = chunk.choices[0].delta.content or ''
        response += content
        update_display(Markdown(response), display_id=display_handle.display_id)

In [None]:
# Try the humorous version
# stream_humorous_brochure("Anthropic", "https://anthropic.com")
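
The two streaming functions differ only in their system prompt, so the tone can be treated as a parameter. The sketch below uses hypothetical names (`TONE_INSTRUCTIONS`, `build_system_prompt`, `build_messages`) and builds only the `messages` list, so it runs without an API key.

```python
# Hypothetical tone presets; the wording is adapted from the prompts above
TONE_INSTRUCTIONS = {
    "professional": (
        "You are an assistant that analyzes company website content "
        "and creates a professional brochure."
    ),
    "humorous": (
        "You are an assistant that creates witty, entertaining brochures "
        "about companies while still being informative."
    ),
}


def build_system_prompt(tone="professional"):
    """Return the system prompt for a known tone, or raise ValueError."""
    if tone not in TONE_INSTRUCTIONS:
        raise ValueError(f"Unknown tone: {tone!r}")
    return TONE_INSTRUCTIONS[tone] + "\nRespond in markdown without code blocks."


def build_messages(tone, user_prompt):
    """Assemble the messages list for a chat completion request."""
    return [
        {"role": "system", "content": build_system_prompt(tone)},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("humorous", "Write a brochure for Example Corp.")
print(messages[0]["content"])
```

A single streaming function could then accept a `tone` argument and pass `build_messages(tone, ...)` as `messages`, instead of duplicating the loop for each style.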

---

## 8. Exercise: Build a Product Description Generator

Apply what you've learned to create a different application.

In [None]:
# Exercise: Create a product description generator
# that takes a product URL and generates marketing copy

def generate_product_description(product_url):
    """
    Generate marketing copy for a product page.
    
    Steps:
    1. Fetch the product page content
    2. Use LLM to generate compelling description
    3. Return with streaming
    """
    # Your implementation here
    pass

---

## Key Takeaways

1. **JSON mode** (`response_format={"type": "json_object"}`) returns syntactically valid JSON; it does not enforce a schema, so check the keys you depend on

2. **One-shot prompting** - provide an example in the prompt for better formatting

3. **Chaining LLM calls** creates powerful pipelines - output of one feeds into another

4. **Streaming** (`stream=True`) provides better UX with real-time output

5. **Tone control** - system prompts easily change the style of output

### Pipeline Pattern

```
Input → LLM Call 1 (Extract/Analyze) → LLM Call 2 (Generate) → Output
```

This is an early form of **Agentic AI** - we'll explore this more in later units!

---

## Additional Resources

- [OpenAI JSON Mode](https://platform.openai.com/docs/guides/structured-outputs)
- [OpenAI Streaming](https://platform.openai.com/docs/api-reference/streaming)

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---