### <span style="color:purple; font-weight:bold;">Candidate Name: Anirban Bose</span>
<h3><strong style="color:purple;">Assignment: Multi Language Brochure Generation</strong></h3>

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [4]:
company_name = "CNN"
website_url = "https://edition.cnn.com/"
website_details = Website(website_url)
website_details.links

['https://edition.cnn.com',
 'https://edition.cnn.com/us',
 'https://edition.cnn.com/world',
 'https://edition.cnn.com/politics',
 'https://edition.cnn.com/business',
 'https://edition.cnn.com/health',
 'https://edition.cnn.com/entertainment',
 'https://edition.cnn.com/style',
 'https://edition.cnn.com/travel',
 'https://edition.cnn.com/sports',
 'https://edition.cnn.com/science',
 'https://edition.cnn.com/climate',
 'https://edition.cnn.com/weather',
 'https://edition.cnn.com/world/europe/ukraine',
 'https://edition.cnn.com/world/middleeast/israel',
 'https://edition.cnn.com/games',
 'https://edition.cnn.com/us',
 'https://edition.cnn.com/world',
 'https://edition.cnn.com/politics',
 'https://edition.cnn.com/business',
 'https://edition.cnn.com/health',
 'https://edition.cnn.com/entertainment',
 'https://edition.cnn.com/style',
 'https://edition.cnn.com/travel',
 'https://edition.cnn.com/sports',
 'https://edition.cnn.com/science',
 'https://edition.cnn.com/climate',
 'https://edition

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
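
As a complement to asking the model to expand relative links, the link list can also be tidied in plain Python before it goes into the prompt. The sketch below is not part of the original notebook: `normalize_links` is a hypothetical helper that resolves relative links with the standard library's `urljoin`, drops non-web schemes such as `mailto:`, and removes duplicates while preserving order.

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative links against base_url, drop non-web links, and de-duplicate."""
    seen = set()
    cleaned = []
    for link in links:
        absolute = urljoin(base_url, link.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned

# Hypothetical input, using the "/about" example from the text above
example = normalize_links(
    "https://company.com/",
    ["/about", "https://company.com/about", "mailto:info@company.com", "careers"],
)
print(example)
```

A shorter, duplicate-free list also means fewer tokens in the request; the model still decides which of the remaining links are relevant.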

In [5]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [7]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [8]:
print(get_links_user_prompt(website_details))

Here is the list of links on the website of https://edition.cnn.com/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edition.cnn.com
https://edition.cnn.com/us
https://edition.cnn.com/world
https://edition.cnn.com/politics
https://edition.cnn.com/business
https://edition.cnn.com/health
https://edition.cnn.com/entertainment
https://edition.cnn.com/style
https://edition.cnn.com/travel
https://edition.cnn.com/sports
https://edition.cnn.com/science
https://edition.cnn.com/climate
https://edition.cnn.com/weather
https://edition.cnn.com/world/europe/ukraine
https://edition.cnn.com/world/middleeast/israel
https://edition.cnn.com/games
https://edition.cnn.com/us
https://edition.cnn.com/world
https://edition.cnn.com/politics
https://edition.cnn.com/business
https://edition.cnn.com/health
https://edition.cnn.c

In [9]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [10]:
get_links(website_url)

{'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'},
  {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'},
  {'type': 'company page', 'url': 'https://edition.cnn.com'}]}
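
The model's JSON is not guaranteed to have the expected shape on every run, so it may be worth validating it before fetching any of the URLs. This is a minimal sketch under that assumption; `clean_link_selection` is a hypothetical helper, not something the notebook defines, and it simply discards entries that lack an http(s) `url`.

```python
from urllib.parse import urlparse

def clean_link_selection(result):
    """Keep only well-formed {'type', 'url'} entries that carry an http(s) URL."""
    if not isinstance(result, dict):
        return []
    cleaned = []
    for entry in result.get("links", []):
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        link_type = entry.get("type", "page")
        if isinstance(url, str) and urlparse(url).scheme in ("http", "https"):
            cleaned.append({"type": link_type, "url": url})
    return cleaned

# Made-up response with one good entry and two unusable ones
sample = {
    "links": [
        {"type": "about page", "url": "https://edition.cnn.com/about"},
        {"type": "email", "url": "mailto:someone@example.com"},
        {"type": "broken entry"},
    ]
}
checked = clean_link_selection(sample)
print(checked)
```

The cleaned list could then be iterated in place of `links["links"]` when the pages are fetched.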

## Second step: make the brochure!

Assemble all the details into another prompt to gpt-4o-mini.

In [11]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [12]:
print(get_all_details(website_url))

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'company page', 'url': 'https://edition.cnn.com'}]}
Landing page:
Webpage Title:
Breaking News, Latest News and Videos | CNN
Webpage Contents:
CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!
Your effort and contribution in providing this feedback is much
                                        appreciated.
Close
Ad Feedback
Close icon
US
World
Politics
Business
Health
Entertainment
Style
Travel
Sports
Science
Climate
Weath

In [13]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [14]:
def get_brochure_user_prompt(company_name, url, brochure_tone="professional"):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += f"Make sure you maintain a {brochure_tone} tone in the brochure.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
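
The `[:5_000]` slice keeps whatever comes first, so a long landing page can push the about and careers pages out of the prompt entirely. One possible alternative, sketched here as an assumption and not as the notebook's method, is to give each page an equal share of the budget. `build_budgeted_prompt` and its parameters are hypothetical names.

```python
def build_budgeted_prompt(header, sections, max_chars=5_000):
    """Split the character budget left after the header evenly across sections."""
    if not sections:
        return header[:max_chars]
    remaining = max(max_chars - len(header), 0)
    per_section = remaining // len(sections)
    # Each section is cut to its own share, so later pages are never dropped entirely
    return header + "".join(section[:per_section] for section in sections)

# Tiny made-up example: a 2-character header and two 100-character "pages"
prompt = build_budgeted_prompt("H\n", ["a" * 100, "b" * 100], max_chars=22)
print(prompt)
```

To use something like this, `get_all_details` would need to return the pages as a list of strings in place of one concatenated string.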

In [15]:
get_brochure_user_prompt(company_name, website_url, brochure_tone="jovial")

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'homepage', 'url': 'https://edition.cnn.com'}, {'type': 'profiles page', 'url': 'https://edition.cnn.com/profiles'}, {'type': 'leadership page', 'url': 'https://edition.cnn.com/profiles/cnn-leadership'}]}


"You are looking at a company called: CNN\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nMake sure you maintain a jovial tone in the brochure.\nLanding page:\nWebpage Title:\nBreaking News, Latest News and Videos | CNN\nWebpage Contents:\nCNN values your feedback\n1. How relevant is this ad to you?\n2. Did you encounter any technical issues?\nVideo player was slow to load content\nVideo content never loaded\nAd froze or did not finish loading\nVideo content did not start after ad\nAudio on ad was too loud\nOther issues\nAd never loaded\nAd prevented/slowed the page from loading\nContent moved around while ad loaded\nAd was repetitive to ads I've seen previously\nOther issues\nCancel\nSubmit\nThank You!\nYour effort and contribution in providing this feedback is much\n                                        appreciated.\nClose\nAd Feedback\nClose icon\nUS\nWorld\nPolitics\nBusiness\nHealth\

In [16]:
brochure_tone="jovial"

In [17]:
def create_brochure(company_name, url, brochure_tone):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url, brochure_tone)}
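        ],
    )
    result = response.choices[0].message.content
    # Remove the code fence the model sometimes wraps around its markdown.
    # The fence must be removed before any backticks are stripped, otherwise a stray "markdown" line is left behind.
    return result.replace('```markdown\n', '').replace('```', '').strip()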


In [18]:
english_brochure = create_brochure(company_name, website_url, brochure_tone)
display(Markdown(english_brochure))

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'company page', 'url': 'https://edition.cnn.com'}]}


# 🌟 Welcome to CNN! 🌍

At CNN, we’re not just about headlines; we’re about heart, hustle, and helping you stay informed about the world around you! We believe knowledge is power, and our mission is to deliver news and stories that matter—swiftly, accurately, and vividly!

---

## 🎤 Who Are We? 

CNN, a pioneer in news broadcasting, is dedicated to bringing you **Breaking News, Latest Trends, and Noteworthy Videos** from every corner of the globe. With a team of skilled reporters, analysts, and storytellers, we dive deep into topics ranging from **Politics** and **Business** to **Science**, **Health**, and **Entertainment**!

---

## 🎉 Our Company Culture

### **Feedback Drives Us!** 
At CNN, your voice matters! We encourage feedback to improve our services and user experience—because we believe in continuously evolving and ensuring that our audience gets what they want, when they want it. Don’t hesitate to let us know how we can better serve you! 🗣️

### **Collaboration & Diversity**
We celebrate a culture that thrives on diversity and collaboration, creating an inclusive environment where every opinion is valued. Join us as we explore the fascinating tales that shape our world!

---

## 👥 Our Audience: 
CNN’s audience spans across the globe, from curious consumers to informed investors and everyone in between! Whether you’re seeking **in-depth analysis** of world events or **entertainment updates**, we’ve got you covered! 🗞️

---

## 🚀 Careers at CNN 

### **Join the CNN Family!**
We are always on the lookout for passionate and innovative individuals who are ready to make a difference. If you’re eager to tell stories that resonate and to create impactful content, then your dream job awaits you here at CNN! 

- **Dynamic Work Environment**: Work in an atmosphere that encourages creativity and connectivity. 
- **Growth Opportunities**: Ample chances for personal and professional growth! 
- **Be Part of Something Bigger**: Contribute to impactful journalism that shapes perspectives and sparks conversations.

---

## 🌈 Connect with Us!

We provide news in various languages and formats across multiple platforms, ensuring that everyone has access to the stories that drive our world forward. Sign up for our newsletters, follow us on social media, or dive into our live broadcasts—stay updated no matter where you are!

---

Join us in making sense of the world, one story at a time! 📺✨

[Work for CNN](#) | [Connect with Us](#) | [Feedback](#)

Together, let's explore the powerful tales unfolding around us. Welcome to CNN!


## Finally - a minor improvement

With a small adjustment, we can make the results stream back from OpenAI, with the familiar typewriter animation.
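
When streaming, the code fence that the model sometimes wraps around its markdown can arrive split across chunks, so any cleanup has to run on the accumulated text and not on each chunk separately. The sketch below simulates this with a hard-coded list of chunks in place of a real API stream; `strip_markdown_fence` is a hypothetical helper.

```python
FENCE = "`" * 3  # three backticks, built this way so the example is easy to embed

def strip_markdown_fence(text):
    """Remove a leading fence (with or without the 'markdown' tag) and a trailing fence."""
    cleaned = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if cleaned.startswith(opener):
            cleaned = cleaned[len(opener):]
            break
    if cleaned.endswith(FENCE):
        cleaned = cleaned[: -len(FENCE)]
    return cleaned.strip()

# Simulated stream: the opening fence and its tag are split across two chunks
chunks = [FENCE + "mark", "down\n# Welcome", " to our markdown guide\n", FENCE]
response = ""
for chunk in chunks:
    response += chunk                       # keep the raw text intact
    shown = strip_markdown_fence(response)  # clean a copy for display only
print(shown)
```

Because only the opening and closing fences are removed, the word "markdown" inside the brochure text survives. A partially received tag can still flash on screen for a moment before the next chunk completes it.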

In [19]:
def stream_brochure(company_name, url, brochure_tone):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url, brochure_tone)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        # Clean a copy for display; removing the bare word "markdown" from the text itself would also delete it from the brochure's prose
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)

In [20]:
stream_brochure(company_name, website_url, brochure_tone)

Found links: {'links': [{'type': 'about page', 'url': 'https://edition.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'company overview', 'url': 'https://edition.cnn.com/profiles/cnn-leadership'}]}


# Welcome to CNN!

## 🔥 Breaking News, Insightful Analysis & Engaging Stories!

At CNN, we bring you the latest and most relevant news from around the world straight to your screens! Whether it's politics that keep you on your toes, business insights for future investors, or health tips for a better life, we've got it all covered!

### 🌍 Your Source for Everything!

**Categories We Cover:**
- **US & World News**: Stay informed about domestic and global affairs.
- **Politics**: From congressional debates to presidential elections, we keep it relevant!
- **Business & Technology**: Track market trends and innovations shaping our economy's future.
- **Health & Science**: Learn about wellness tips, breakthroughs, and everything life-science related.
- **Entertainment & Sports**: Catch the latest scoop on your favorite celebrities and thrilling sports events!

### 🌟 Company Culture: Where Innovation Meets Engagement

At CNN, we pride ourselves on fostering an inclusive and collaborative work environment. Our culture encourages creativity and innovation, empowering every team member to express their ideas. We believe that great stories come from diverse perspectives, and we are committed to building a team that reflects this diversity. 

### 💼 Careers at CNN: Join the News Revolution!

Are you passionate about storytelling? Do you want to make a difference in how the world consumes news? We’d love to hear from you! Join a dedicated team of journalists, producers, and digital innovators who share a common goal: to inform, engage, and inspire our audience. Explore exciting career opportunities where your voice matters, and your work can shape the future of news! 

### 🎉 Customers & Community

Our global audience counts on CNN for reliable information and compelling narratives. Whether it's through breaking news alerts or engaging video content, we’re deeply committed to keeping our users informed and entertained. By tuning in, our viewers become part of the CNN community!

### Let’s Connect!

For more updates or if you have feedback, feel free to reach out! Your input helps us improve and provide the content you want to see. With CNN, the stories that matter to you are just a click away!

_Thank you for being part of our journey as we continue to inform, inspire, and engage the world._

In [21]:
translation_language = "Hindi"
# Prompt template; the target language is filled in when the function is called
translate_system_prompt = "You are a professional {language} translator. Translate the brochure provided by the user into {language}.\n"

In [22]:
def translate_to_language(base_brochure, translation_language):
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # Use the function argument, not the global, so any target language works
            {"role": "system", "content": translate_system_prompt.format(language=translation_language)},
            {"role": "user", "content": base_brochure}
        ]
    )
    result = response.choices[0].message.content
    print("\nEnglish Brochure\n")
    display(Markdown(base_brochure))
    print(f"\n{translation_language} Brochure\n")
    display(Markdown(result))

In [23]:
translate_to_language(english_brochure, translation_language)


English Brochure



# 🌟 Welcome to CNN! 🌍

At CNN, we’re not just about headlines; we’re about heart, hustle, and helping you stay informed about the world around you! We believe knowledge is power, and our mission is to deliver news and stories that matter—swiftly, accurately, and vividly!

---

## 🎤 Who Are We? 

CNN, a pioneer in news broadcasting, is dedicated to bringing you **Breaking News, Latest Trends, and Noteworthy Videos** from every corner of the globe. With a team of skilled reporters, analysts, and storytellers, we dive deep into topics ranging from **Politics** and **Business** to **Science**, **Health**, and **Entertainment**!

---

## 🎉 Our Company Culture

### **Feedback Drives Us!** 
At CNN, your voice matters! We encourage feedback to improve our services and user experience—because we believe in continuously evolving and ensuring that our audience gets what they want, when they want it. Don’t hesitate to let us know how we can better serve you! 🗣️

### **Collaboration & Diversity**
We celebrate a culture that thrives on diversity and collaboration, creating an inclusive environment where every opinion is valued. Join us as we explore the fascinating tales that shape our world!

---

## 👥 Our Audience: 
CNN’s audience spans across the globe, from curious consumers to informed investors and everyone in between! Whether you’re seeking **in-depth analysis** of world events or **entertainment updates**, we’ve got you covered! 🗞️

---

## 🚀 Careers at CNN 

### **Join the CNN Family!**
We are always on the lookout for passionate and innovative individuals who are ready to make a difference. If you’re eager to tell stories that resonate and to create impactful content, then your dream job awaits you here at CNN! 

- **Dynamic Work Environment**: Work in an atmosphere that encourages creativity and connectivity. 
- **Growth Opportunities**: Ample chances for personal and professional growth! 
- **Be Part of Something Bigger**: Contribute to impactful journalism that shapes perspectives and sparks conversations.

---

## 🌈 Connect with Us!

We provide news in various languages and formats across multiple platforms, ensuring that everyone has access to the stories that drive our world forward. Sign up for our newsletters, follow us on social media, or dive into our live broadcasts—stay updated no matter where you are!

---

Join us in making sense of the world, one story at a time! 📺✨

[Work for CNN](#) | [Connect with Us](#) | [Feedback](#)

Together, let's explore the powerful tales unfolding around us. Welcome to CNN!



Hindi Brochure



# 🌟 सीएनएन में आपका स्वागत है! 🌍

सीएनएन में, हम सिर्फ हेडलाइन्स के बारे में नहीं हैं; हम दिल, मेहनत, और आपको आपके चारों ओर की दुनिया के बारे में सूचित रहने में मदद करने के बारे में हैं! हमे ज्ञान की शक्ति मानते हैं, और हमारा मिशन समाचार और कहानियों को जो मायने रखते हैं—तेजी से, सटीकता से, और जीवंतता से!

---

## 🎤 हम कौन हैं?

सीएनएन, समाचार प्रसारण में एक अग्रणी, हर कोने से **ब्रेकिंग न्यूज, नवीनतम ट्रेंड्स, और महत्वपूर्ण वीडियो** आप तक पहुंचाने के लिए समर्पित है। हमारे पास कुशल पत्रकार, विश्लेषक, और कहानीकारों की टीम है, जो **राजनीति** और **व्यापार** से लेकर **विज्ञान**, **स्वास्थ्य**, और **मनोरंजन** जैसे विषयों में गहराई से जा रही है!

---

## 🎉 हमारा कंपनीय संस्कृति

### **प्रतिक्रिया हमें प्रेरित करती है!**
सीएनएन में, आपकी आवाज मायना रखती है! हम सेवाओं और उपयोगकर्ता अनुभव में सुधार करने के लिए प्रतिक्रिया को प्रोत्साहित करते हैं—क्योंकि हम सतत विकास और सुनिश्चित करने में विश्वास रखते हैं कि हमारे दर्शक को वो मिले जो चाहते हैं, जब चाहें। हमें बताने में हिचकिचाना मत! 🗣️

### **सहयोग और विविधता**
हम विविधता और सहयोग पर समृद्धि करने वाली संस्कृति की प्रशंसा करते हैं, जहां हर राय को मूल्य दिया जाता है। हमें साथ जुड़ें जैसे हम उन रोचक कथाओं का अन्वेषण करें जो हमारी दुनिया को आकार देती हैं!

---

## 👥 हमारा दर्शक:
सीएनएन का दर्शनकर्ता पूरी दुनिया में फैला हुआ है, जिनमें जिज्ञासु उपभोक्ता से जानकार निवेशक तक और सभी कुछ आता है! चाहे आप दुनिया के घटनाओं का **गहरा विश्लेषण** ढूँढ रहे हों या **मनोरंजन अपडेट** को, हमने आपको पोर्ट किया है! 🗞️

---

## 🚀 सीएनएन में करियर

### **सीएनएन परिवार में शामिल हों!**
हम हमेशा प्रेरित और नवाचारी व्यक्तियों की खोज कर रहें हैं जो तय करने के लिये अब कुछ करने को तैयार हैं। अगर आप कहानियों को महसूस करने और प्रभावी सामग्री बनाने में उत्साही हैं, तो तो आपका सपना नौकरी आपका सीएनएन में इंतजार कर रहा है! 

- **गतिशील कार्य वातावरण**: एक वातावरण में काम करें जो रचनात्मकता और जुड़ाव को प्रोत्साहित करता है। 
- **वृद्धि के अवसर**: निजी और पेशेवर विकास के लिए अधिक अवसर! 
- **कुछ बड़े का हिस्सा बनें**: प्रभावकारी पत्रकारिता में योगदान दें जो दृश्यांतरों को आकार देता है और चर्चाओं को स्फूर्ति देता है।

---

## 🌈 हमसे जुड़ें!

हम विभिन्न भाषाओं और प्रारूपों में समाचार प्रदान करते हैं और कई प्लेटफ़ॉर्मों पर, सुनिश्चित करते हैं कि हर कोई उन कहानियों को प्राप्त करता है जो हमारी दुनिया को आगे बढ़ाने में मदद करती हैं। हमारे न्यूजलेटर्स के लिए साइन अप करें, हमें सोशल मीडिया पर फॉलो करें, या हमारी लाइव प्रसारण में डूबें—हर जगह अपडेट रहें, जहां भी रहें!

---

हमारे साथ जुड़कर दुनिया को समझने में एक कहानी बार-ए-बार मदद करें! 📺✨

[सीएनएन में काम करें](#) | [हमसे जुड़ें](#) | [प्रतिक्रिया](#)

साथ में चलो, हमारी दुनिया में क्रियाशील कथाओं का अन्वेषण करने। सीएनएन में आपका स्वागत है!