# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [4]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

import httpx

os.environ['PYTHONHTTPSVERIFY'] = '0'
#os.environ['REQUESTS_CA_BUNDLE'] = ''


In [16]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'

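# verify=False turns off TLS certificate verification; only use it if your network setup requires it
http_client = httpx.Client(verify=False)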

openai = OpenAI(http_client=http_client)

API key looks good so far


In [6]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'ht

## First step: Have GPT-4o-mini figure out which links are relevant

### Use a call to gpt-4o-mini to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one shot prompting" in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
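
The relative-link part of this task can also be handled deterministically before the model ever sees the list. The sketch below is an illustration, not part of the course code: `normalize_links` is a hypothetical helper that resolves relative links with the standard library's `urljoin`, drops non-web links such as `mailto:`, and removes duplicates. Deciding which links are *relevant* is still left to the LLM.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, links):
    """Resolve relative links against base_url, drop non-web links, and de-duplicate.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    resolved = []
    for link in links:
        absolute = urljoin(base_url, link)  # "/about" becomes "<base>/about"
        # Keep only web links; this skips mailto:, tel:, javascript: and similar
        if urlparse(absolute).scheme in ("http", "https"):
            resolved.append(absolute)
    # dict.fromkeys keeps the first occurrence of each URL and preserves order
    return list(dict.fromkeys(resolved))


example = normalize_links(
    "https://company.com/",
    ["/about", "careers", "mailto:hi@company.com", "/about", "https://other.com/x"],
)
print(example)
```

Pre-cleaning the list this way would also shrink the prompt, since scraped pages often repeat the same link several times.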

In [8]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
    ]
}



In [10]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [11]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/
https://edwarddonner.com/2024/12/21/

In [15]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
      ],
        response_format={"type": "json_object"}
        
        
    )
    result = response.choices[0].message.content
    return json.loads(result)
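
JSON mode asks the model for syntactically valid JSON, but it does not guarantee the shape we asked for in the prompt. A defensive caller might check the reply before using it. The following is a minimal sketch under that assumption; `parse_link_response` is a hypothetical helper, not something the notebook defines.

```python
import json


def parse_link_response(raw):
    """Return only well-formed {"type", "url"} entries from the model's JSON reply.

    Hypothetical helper for illustration; malformed input yields an empty list.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the reply was not valid JSON at all
    if not isinstance(data, dict):
        return []
    links = data.get("links", [])
    if not isinstance(links, list):
        return []
    valid = []
    for item in links:
        if not isinstance(item, dict):
            continue
        link_type, url = item.get("type"), item.get("url")
        # Keep entries where both fields are strings and the URL is absolute
        if (
            isinstance(link_type, str)
            and isinstance(url, str)
            and url.startswith(("http://", "https://"))
        ):
            valid.append({"type": link_type, "url": url})
    return valid


reply = '{"links": [{"type": "about page", "url": "https://example.com/about"}, {"type": "bad", "url": "/relative"}]}'
print(parse_link_response(reply))
```

Entries with a relative or missing URL are dropped here; another reasonable choice would be to resolve them against the site's base URL instead.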

In [13]:
# Anthropic has made their site harder to scrape, so I'm using CNN here (Hugging Face is left commented out as an alternative)
# "https://huggingface.co"
#huggingFace = Website("https://huggingface.co")
#print(huggingFace)
CNN = Website("https://www.cnn.com")
CNN.links

['https://www.cnn.com',
 'https://www.cnn.com/us',
 'https://www.cnn.com/world',
 'https://www.cnn.com/politics',
 'https://www.cnn.com/business',
 'https://www.cnn.com/health',
 'https://www.cnn.com/entertainment',
 'https://www.cnn.com/style',
 'https://www.cnn.com/travel',
 'https://www.cnn.com/sports',
 'https://www.cnn.com/science',
 'https://www.cnn.com/climate',
 'https://www.cnn.com/weather',
 'https://www.cnn.com/world/europe/ukraine',
 'https://www.cnn.com/world/middleeast/israel',
 'https://www.cnn.com/cnn-underscored',
 'https://www.cnn.com/games',
 'https://www.cnn.com/us',
 'https://www.cnn.com/world',
 'https://www.cnn.com/politics',
 'https://www.cnn.com/business',
 'https://www.cnn.com/health',
 'https://www.cnn.com/entertainment',
 'https://www.cnn.com/style',
 'https://www.cnn.com/travel',
 'https://www.cnn.com/sports',
 'https://www.cnn.com/science',
 'https://www.cnn.com/climate',
 'https://www.cnn.com/weather',
 'https://www.cnn.com/world/europe/ukraine',
 'https:

In [18]:
print(get_links("https://www.cnn.com"))

{'links': [{'type': 'about page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'profiles page', 'url': 'https://www.cnn.com/profiles'}, {'type': 'leadership profiles page', 'url': 'https://www.cnn.com/profiles/cnn-leadership'}, {'type': 'newsletters page', 'url': 'https://www.cnn.com/newsletters'}]}


## Second step: make the brochure!

Assemble all the details into another prompt to GPT-4o-mini.

In [19]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
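
One fragile spot in a loop like this is that a single unreachable link stops the whole brochure. The sketch below shows one way to skip failing pages instead. It assumes a hypothetical `fetch` callable that takes a URL and returns page text (in this notebook that could be `lambda url: Website(url).get_contents()`), and it uses a fake fetcher so it runs without network access.

```python
def collect_pages(links, fetch):
    """Build the details text, skipping any link whose fetch raises an exception.

    `links` is a list of {"type", "url"} dicts; `fetch` is a hypothetical
    callable taking a URL and returning that page's text.
    """
    parts = []
    skipped = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as error:  # broad on purpose: any failure skips the page
            skipped.append((link["url"], str(error)))
            continue
        parts.append(f"\n\n{link['type']}\n{contents}")
    return "".join(parts), skipped


def fake_fetch(url):
    """Stand-in for a real page fetch, used only for this demonstration."""
    if "broken" in url:
        raise ConnectionError("simulated network failure")
    return f"Contents of {url}"


details, skipped = collect_pages(
    [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "https://example.com/broken"},
    ],
    fake_fetch,
)
print(details)
print("Skipped:", skipped)
```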

In [21]:
print(get_all_details("https://cnn.com"))

Found links: {'links': [{'type': 'home page', 'url': 'https://www.cnn.com'}, {'type': 'about page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'profiles page', 'url': 'https://www.cnn.com/profiles'}, {'type': 'company culture page', 'url': 'https://www.cnn.com/newsletters'}]}
Landing page:
Webpage Title:
Breaking News, Latest News and Videos | CNN
Webpage Contents:
CNN values your feedback
1. How relevant is this ad to you?
2. Did you encounter any technical issues?
Video player was slow to load content
Video content never loaded
Ad froze or did not finish loading
Video content did not start after ad
Audio on ad was too loud
Other issues
Ad never loaded
Ad prevented/slowed the page from loading
Content moved around while ad loaded
Ad was repetitive to ads I've seen previously
Other issues
Cancel
Submit
Thank You!
Your effort and contribution in providing this feedback is much
                                        

In [22]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [23]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
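
Cutting the combined text at 5,000 characters means a long landing page can use up the whole budget before the About or Careers pages are reached. One alternative, sketched here as an assumption rather than the course's approach, is to give each page an equal share of the budget. `truncate_evenly` is a hypothetical helper.

```python
def truncate_evenly(sections, budget=5_000):
    """Give each section an equal share of the character budget, then join them.

    Hypothetical helper for illustration; `sections` is a list of strings,
    one per scraped page.
    """
    if not sections:
        return ""
    share = budget // len(sections)  # integer share per section
    return "".join(section[:share] for section in sections)


pages = ["L" * 10_000, "A" * 300, "C" * 10_000]  # landing, about, careers
combined = truncate_evenly(pages, budget=900)
print(len(combined), combined.count("L"), combined.count("A"), combined.count("C"))
```

A refinement would be to hand unused characters from short pages to longer ones, at the cost of a slightly more involved loop.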

In [24]:
get_brochure_user_prompt("CNN", "https://cnn.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'company page', 'url': 'https://www.cnn.com/profiles'}, {'type': 'leadership page', 'url': 'https://www.cnn.com/profiles/cnn-leadership'}]}


"You are looking at a company called: CNN\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nBreaking News, Latest News and Videos | CNN\nWebpage Contents:\nCNN values your feedback\n1. How relevant is this ad to you?\n2. Did you encounter any technical issues?\nVideo player was slow to load content\nVideo content never loaded\nAd froze or did not finish loading\nVideo content did not start after ad\nAudio on ad was too loud\nOther issues\nAd never loaded\nAd prevented/slowed the page from loading\nContent moved around while ad loaded\nAd was repetitive to ads I've seen previously\nOther issues\nCancel\nSubmit\nThank You!\nYour effort and contribution in providing this feedback is much\n                                        appreciated.\nClose\nAd Feedback\nClose icon\nUS\nWorld\nPolitics\nBusiness\nHealth\nEntertainment\nStyle\nTravel\nSports\nScience\nClimate

In [26]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [27]:
create_brochure("CNN", "https://cnn.com")

Found links: {'links': [{'type': 'company page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'profiles page', 'url': 'https://www.cnn.com/profiles'}, {'type': 'newsletters page', 'url': 'https://www.cnn.com/newsletters'}]}


# CNN Company Brochure

**Welcome to CNN**  
Your trusted source for breaking news, insightful analysis, and engaging video content across a wide range of topics.

---

## Our Mission
CNN is committed to delivering the latest news and videos that inform and empower audiences around the world. Our mission is to provide accurate, comprehensive journalism that matters, with a focus on the significant events that shape our lives.

---

## Company Culture
At CNN, we foster a dynamic and diverse work environment where creativity and collaboration are at the forefront. Our team believes in transparency, open communication, and the importance of feedback to enhance both our content and the overall viewer experience. We value each employee's contribution to our continuous journey in redefining news and storytelling.

### Core Values:
- **Integrity**: We prioritize factual reporting and ethical journalism.
- **Innovation**: Embracing new technologies to present news in engaging formats.
- **Inclusivity**: Creating a workplace where every voice is heard and valued.

---

## Our Audience
CNN serves a diverse global audience, including individuals and entities interested in the following topics:
- **World Events**: From political changes to international conflicts.
- **Business & Economy**: Insights into markets, innovations, and financial trends.
- **Health & Lifestyle**: Updates on well-being, fitness, and personal growth.
- **Entertainment**: Coverage of the latest in movies, television, and celebrity news.
- **Sports**: Live updates from major sports events and in-depth analysis.

---

## Explore Opportunities with Us
CNN is always on the lookout for passionate individuals who want to be part of a transformative media organization. We offer career paths in various areas including journalism, production, technology, and business operations. 

### Career Benefits:
- **Professional Development**: Opportunities for training and growth in your field.
- **Work-Life Balance**: Flexible work schedules to help you thrive both personally and professionally.
- **Collaborative Environment**: Work alongside experts and innovators in a supportive team.

---

## Join Us
Interested in being part of a leading news organization? Check out current job openings at [Work for CNN](#) and take the next step in your career.

---

For more updates, follow us on our social media platforms and subscribe to our newsletters to stay informed and engaged with the latest from CNN.

**Thank You for Choosing CNN**  
Together, let’s navigate the complexities of the world with clarity and insight.

## Finally - a minor improvement

With a small adjustment, we can stream the results back from OpenAI,
with the familiar typewriter animation.

In [34]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
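        # Strip any code fence the model wraps around its reply (note: this also removes the word "markdown" anywhere in the text)
        response = response.replace("```","").replace("markdown", "")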
        update_display(Markdown(response), display_id=display_handle.display_id)
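
The accumulate-and-redisplay pattern can be tried without calling the API. The sketch below simulates a stream with a plain list of text chunks (including a `None`, as `delta.content` can be) and strips only a wrapping code fence, so the word "markdown" inside the brochure text is left alone. `strip_code_fence` and `accumulate` are hypothetical helpers, not part of the notebook.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def strip_code_fence(text):
    """Remove a leading fence line (bare or tagged 'markdown') and a trailing fence line."""
    lines = text.strip().splitlines()
    if lines and lines[0].strip() in (FENCE, FENCE + "markdown"):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def accumulate(chunks):
    """Yield the cleaned running text after each chunk, as a display loop would."""
    response = ""
    for chunk in chunks:
        response += chunk or ""  # a chunk may be None
        yield strip_code_fence(response)


# Simulated stream: the fence tag arrives split across two chunks
fake_stream = [FENCE + "mark", "down\n# Acme\n", None, "We love markdown.\n", FENCE]
for partial in accumulate(fake_stream):
    pass  # in a notebook, each partial would be passed to update_display
print(partial)
```

Because the opening fence can arrive split across chunks, a partial fence may be visible for one update before it is recognised and removed.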

In [35]:
stream_brochure("CNN", "https://cnn.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://www.cnn.com/about'}, {'type': 'careers page', 'url': 'https://careers.wbd.com/cnnjobs'}, {'type': 'profiles page', 'url': 'https://www.cnn.com/profiles'}, {'type': 'leadership profiles page', 'url': 'https://www.cnn.com/profiles/cnn-leadership'}]}


# CNN Company Brochure

## About CNN
CNN, founded in 1980, is a global leader in news and information, delivering breaking news, engaging articles, and compelling videos across a variety of platforms. Our mission is to inform and empower people worldwide with the latest, most relevant news coverage tailored to their interests. 

## Services Offered
- **Comprehensive News Coverage**: From US and world news to politics, business, health, and entertainment, CNN covers topics that matter to our audience.
- **Innovative Broadcasting**: Available 24/7 through live TV, podcasts, and video content, we ensure our viewers are always in the loop.
- **Analysis and Investigative Reporting**: Our well-respected journalists provide deep insights and thorough investigations on pressing issues, making complex matters understandable.

## Customers
CNN serves a diverse array of customers including:
- **General Public**: Anyone seeking trustworthy news from local to global perspectives.
- **Business Professionals**: Those who require up-to-date market insights and business news.
- **Politicians and Policy Makers**: Individuals looking for detailed analysis of political events and government affairs.
- **Entertainment Enthusiasts**: Viewers interested in the latest in movies, television, and celebrity news.

## Company Culture
At CNN, we foster an inclusive workplace that values diversity in thought, experience, and background. Our culture emphasizes collaboration, innovation, and integrity. Employees are encouraged to share their ideas, take ownership of their projects, and contribute to a dynamic and fast-paced environment.

### Employee Testimonials
> "Working at CNN has been a rewarding experience. The supportive environment allows me to grow and excel in my career." - A CNN Journalist

> "I love that we are constantly evolving and adapting to new technologies and storytelling methods." - A CNN Producer

## Career Opportunities
CNN offers a variety of career paths for individuals who are passionate about storytelling and journalism. Opportunities are available in various departments, including:
- **News and Reporting**: Journalists, editors, and producers.
- **Technology**: Data analysts and software developers.
- **Marketing and Sales**: Marketing strategists and sales representatives.
- **Digital Media**: Content creators and social media specialists.

### Benefits of Working at CNN:
- Competitive salaries and benefits
- Opportunities for professional development
- A healthy work-life balance
- A chance to contribute to impactful news coverage that resonates around the globe

## Join Us
If you are looking for a challenging and fulfilling career in media, CNN is the place for you. Be a part of a team that shapes global conversations every day.

### Learn More
For more information on our offerings, to review current job openings, or to learn about CNN’s mission and values, visit our [official website](https://www.cnn.com).

---

Elevate your career and stories with CNN—where every second counts in shaping world narratives!

In [36]:
# Try changing the system prompt to the humorous version when you make a brochure for another company, such as Hugging Face:

stream_brochure("Edward Donner", "https://edwarddonner.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://edwarddonner.com/about-me-and-about-nebula/'}]}


# Edward Donner Brochure

## Welcome to Edward Donner

At Edward Donner, we are at the forefront of innovation in the field of Artificial Intelligence, specifically focusing on enhancing recruitment dynamics through advanced Large Language Models (LLMs). Our mission is clear: to empower individuals to discover their potential and pursue their purpose in life, a concept rooted in **Ikigai**.

---

## Our Vision

Founded by Ed, a passionate coder and AI pioneer, Edward Donner is more than just a tech company. We strive to change the narrative around job fulfillment by addressing the staggering statistic that 77% of individuals feel disengaged at work. Our goal is to create a world where people find roles that are not only lucrative but also fulfilling.

---

## Our Approach

### Proprietary Technology
- **Generative AI**: We leverage cutting-edge AI technologies to revolutionize the way recruiters source, engage, and manage talent.
- **Patented Matching Model**: Our unique matching algorithm finds the best-fit candidates for roles with unparalleled accuracy and speed—no keywords required.

### Customer-Centric Focus
- We collaborate closely with our clients, ensuring that we fully understand their needs and the dynamics of their industries. Our award-winning platform is not just about filling positions; it's about building impactful connections between talent and opportunity.

---

## Company Culture

### A Collaborative Environment
At Edward Donner, we foster a culture of collaboration and curiosity. Our team is composed of individuals who are not only experts in their fields but also enthusiastic about technology, problem-solving, and creativity. We believe in maintaining an open dialogue and encourage our team members to share ideas and innovate.

### Growth-Oriented Mindset
We are dedicated to personal and professional development, encouraging continuous learning through workshops, hands-on projects, and networking opportunities. Each team member is empowered to take charge of their career paths, contributing to both their success and that of our company.

---

## Join Us!

At Edward Donner, we are always on the lookout for talented individuals who share our vision of using AI to improve lives. If you are a software engineer, data scientist, or have experience in AI technologies, we want to hear from you!

- **Work with Us**: Explore various career opportunities that align with your skills and ambitions.
- **Connect with Ed**: Ed values networking and mentorship; reach out for a virtual coffee chat or in-person meet-up if you’re in NYC!

---

## Let's Connect!

We invite you to join us in our journey to reshape the recruitment industry for the better.

- **Website**: [www.edwarddonner.com](http://www.edwarddonner.com)
- **Email**: ed [at] edwarddonner [dot] com
- **Follow us on Social Media**: 
  - [LinkedIn](https://linkedin.com)
  - [Twitter](https://twitter.com)
  - [Facebook](https://facebook.com)

Together, we can raise the level of human prosperity and make meaningful connections in the world of work.

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.

This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.

Generating content in this way is one of the most common use cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you move to Week 2 (which is tons of fun)</h2>
            <span style="color:#900;">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">A reminder on 3 useful resources</h2>
            <span style="color:#f71;">1. The resources for the course are available <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">here.</a><br/>
            2. I'm on LinkedIn <a href="https://www.linkedin.com/in/eddonner/">here</a> and I love connecting with people taking the course!<br/>
            3. I'm trying out X/Twitter and I'm at <a href="https://x.com/edwarddonner">@edwarddonner</a> and hoping people will teach me how it's done..  
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../thankyou.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#090;">Finally! I have a special request for you</h2>
            <span style="color:#090;">
                My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.
            </span>
        </td>
    </tr>
</table>