## Task Overview

We will take this project from Day 1 to the next level by building a complete business solution.

### Business Challenge

Create a product that generates a professional company brochure designed for prospective clients, investors, and potential recruits.

The product will take as input:

- A company name
- The company’s primary website

Using this information, the system will produce a polished brochure suitable for real-world business use.

Refer to the end of this notebook for examples of practical business applications.

In [24]:
# imports
import os
import json
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from web_scraper import fetch_website_links, fetch_website_contents
from openai import OpenAI

In [25]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

API key looks good so far


In [26]:
MODEL = 'gpt-5-nano'
openai = OpenAI()

In [27]:
links = fetch_website_links("https://www.techtarget.com/")
links

['https://www.techtarget.com/',
 'https://www.techtarget.com/register',
 'https://www.techtarget.com/login',
 'https://www.techtarget.com/register',
 '#',
 'https://www.informatechtarget.com',
 'https://www.techtarget.com/searchenterpriseai/',
 'https://www.techtarget.com/searchsecurity/',
 'https://www.techtarget.com/searchnetworking/',
 'https://www.techtarget.com/searchcustomerexperience/',
 'https://www.techtarget.com/searchbusinessanalytics/',
 '#',
 'https://www.techtarget.com/searchcloudcomputing/',
 'https://www.techtarget.com/searchhrsoftware/',
 'https://www.techtarget.com/searchitoperations/',
 'https://x.com/TechTargetNews',
 'https://www.linkedin.com/showcase/techtarget-news/',
 'https://www.youtube.com/c/EyeonTech',
 'https://www.tiktok.com/@eyeontech',
 'https://www.informatechtarget.com/',
 'https://www.techtarget.com/searchsecurity/resources/Threats-and-vulnerabilities',
 'https://www.techtarget.com/searchsecurity/news/366638312/News-brief-Patch-critical-and-high-sever
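
The raw list above contains placeholder anchors (`'#'`) and repeated entries such as the `/register` URL. A minimal sketch of a pre-filter that could run before the links are sent to the model is shown here. The helper name `clean_links` is hypothetical and is not part of `web_scraper`; the sample list is copied from the first few entries of the output above.

```python
def clean_links(links):
    """Drop placeholder anchors and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for link in links:
        link = link.strip()
        # Skip empty strings, in-page anchors such as '#', and non-page schemes
        if not link or link.startswith(("#", "mailto:", "javascript:", "tel:")):
            continue
        if link in seen:
            continue
        seen.add(link)
        cleaned.append(link)
    return cleaned


# First entries of the scraped list shown above
sample = [
    "https://www.techtarget.com/",
    "https://www.techtarget.com/register",
    "https://www.techtarget.com/login",
    "https://www.techtarget.com/register",
    "#",
    "https://www.informatechtarget.com",
]
print(clean_links(sample))
```

Filtering this way shortens the prompt without asking the model to make any judgment; the semantic selection still happens in the LLM call.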

## Identify Relevant Links Using GPT-5-nano

### Task Description

Use a call to GPT-5-nano to scan and interpret the links found on a company’s webpage and return the results in structured JSON format.

The model should:

- Determine which links are relevant for building a company brochure
- Convert relative links such as /about into fully qualified URLs like https://company.com/about
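
For reference, the relative-to-absolute conversion in the second point can also be done deterministically with the standard library, which is useful as a safety net if the model returns a relative path. The sketch below uses `urllib.parse.urljoin`; the wrapper name `resolve_link` and the `https://other.example/jobs` URL are illustrative assumptions.

```python
from urllib.parse import urljoin


def resolve_link(base_url, href):
    """Return an absolute URL for href, resolved against base_url."""
    return urljoin(base_url, href)


base = "https://company.com/"
for href in ["/about", "careers", "https://other.example/jobs"]:
    # Absolute URLs pass through unchanged; relative ones are joined to the base
    print(href, "->", resolve_link(base, href))
```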

We will use a one-shot prompting approach, providing a single example in the prompt that demonstrates the expected JSON response format.

This is a strong use case for an LLM because it requires contextual and semantic judgment. Implementing the same logic with traditional parsing and rule-based code would be complex and brittle.

Note

There is a more advanced approach called Structured Outputs, where the model is constrained to follow a predefined response schema. This technique will be covered later during the autonomous agent project.
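
Without an enforced schema, the reply is just JSON text that is expected to follow the example in the prompt. A hand-written check like the sketch below can guard against malformed replies. This is not the Structured Outputs feature, only plain validation with the `json` module, and `parse_links_response` is a hypothetical helper name.

```python
import json


def parse_links_response(raw):
    """Parse the model's JSON text and keep only well-formed link entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return []
    valid = []
    for entry in entries:
        # Each entry needs a string "type" and a non-empty string "url"
        if (
            isinstance(entry, dict)
            and isinstance(entry.get("type"), str)
            and isinstance(entry.get("url"), str)
            and entry["url"].strip()
        ):
            valid.append({"type": entry["type"], "url": entry["url"].strip()})
    return valid


reply = '{"links": [{"type": "about page", "url": "https://full.url/goes/here/about"}, {"type": "careers page"}]}'
print(parse_links_response(reply))  # the entry without a "url" is dropped
```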

In [28]:
link_system_prompt = """
You are provided with a list of links found on a webpage.
You are able to decide which of the links would be most relevant to include in a brochure about the company,
such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [29]:
def get_links_user_prompt(url, links):
    return f"""
Website: {url}

Here are the links found on the site:
{links}

Select the most relevant links for a company brochure.
Return JSON in this format:
{{
  "links": [
    {{"type": "about", "url": "https://..."}},
    {{"type": "products", "url": "https://..."}},
    {{"type": "careers", "url": "https://..."}}
  ]
}}
"""

In [31]:
print(get_links_user_prompt("https://www.techtarget.com/", links))


Website: https://www.techtarget.com/

Here are the links found on the site:
['https://www.techtarget.com/', 'https://www.techtarget.com/register', 'https://www.techtarget.com/login', 'https://www.techtarget.com/register', '#', 'https://www.informatechtarget.com', 'https://www.techtarget.com/searchenterpriseai/', 'https://www.techtarget.com/searchsecurity/', 'https://www.techtarget.com/searchnetworking/', 'https://www.techtarget.com/searchcustomerexperience/', 'https://www.techtarget.com/searchbusinessanalytics/', '#', 'https://www.techtarget.com/searchcloudcomputing/', 'https://www.techtarget.com/searchhrsoftware/', 'https://www.techtarget.com/searchitoperations/', 'https://x.com/TechTargetNews', 'https://www.linkedin.com/showcase/techtarget-news/', 'https://www.youtube.com/c/EyeonTech', 'https://www.tiktok.com/@eyeontech', 'https://www.informatechtarget.com/', 'https://www.techtarget.com/searchsecurity/resources/Threats-and-vulnerabilities', 'https://www.techtarget.com/searchsecurity

In [32]:
def select_relevant_links(url):
    print(f"Selecting relevant links for {url} by calling {MODEL}")
    site_links = fetch_website_links(url)

    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(url, site_links)}
        ],
        response_format={"type": "json_object"}
    )

    result = response.choices[0].message.content
    links = json.loads(result)

    print(f"Found {len(links['links'])} relevant links")
    return links

In [33]:
select_relevant_links("https://www.techtarget.com/")

Selecting relevant links for https://www.techtarget.com/ by calling gpt-5-nano
Found 1 relevant links


{'links': [{'type': 'about page', 'url': 'https://www.techtarget.com/'}]}

## Making the Brochure

Now that we have the links in JSON format, we will build the brochure using an LLM, gpt-5-nano.

In [41]:
from urllib.parse import urljoin

def fetch_page_and_all_relevant_links(url):
    contents = fetch_website_contents(url)
    relevant_links = select_relevant_links(url)  # dict with "links"

    result = f"## Landing Page:\n\n{contents}\n\n## Relevant Links:\n"

    for link in relevant_links.get("links", []):
        link_type = link.get("type", "unknown")
        href = link.get("url", "")

        if not href:
            continue

        full_url = urljoin(url, href)

        # Include the actual link in the brochure output
        result += f"\n\n### {link_type}\n"
        result += f"Source: {full_url}\n\n"

        # Fetch and include the content
        result += fetch_website_contents(full_url)

    return result
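
One thing to watch for: the model may return the landing page itself as a relevant link (for example as the "about" page), in which case the function above fetches and appends the same content twice. A minimal sketch of a de-duplication step is shown below. The helper name `unique_link_targets` is hypothetical, and the `/news` entry in the sample is an invented relative link added only to show normalization.

```python
from urllib.parse import urljoin


def unique_link_targets(base_url, links):
    """Resolve each link against base_url and drop targets already covered."""
    seen = {base_url.rstrip("/")}  # the landing page has already been fetched
    targets = []
    for link in links:
        href = link.get("url", "")
        if not href:
            continue
        full_url = urljoin(base_url, href)
        key = full_url.rstrip("/")  # treat /news and /news/ as the same page
        if key in seen:
            continue
        seen.add(key)
        targets.append((link.get("type", "unknown"), full_url))
    return targets


sample_links = [
    {"type": "about", "url": "https://www.techtarget.com/"},
    {"type": "press", "url": "https://www.techtarget.com/news/"},
    {"type": "press", "url": "/news"},  # hypothetical relative duplicate
]
print(unique_link_targets("https://www.techtarget.com/", sample_links))
```

Only the returned `(type, url)` pairs would then be passed to `fetch_website_contents`, which saves both requests and prompt characters.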

In [42]:
print(fetch_page_and_all_relevant_links("https://www.techtarget.com/"))

Selecting relevant links for https://www.techtarget.com/ by calling gpt-5-nano
Found 6 relevant links
## Landing Page:

TechTarget - Global Network of Information Technology Websites and Contributors

TechTarget
Search the TechTarget Network
Sign-up now. Start my free, unlimited access.
Login
Register
Explore the Network
Informa TechTarget Products & Services
Enterprise AI
Security
Networking
Customer Experience
Business Analytics
More Topics
Cloud Computing
HRSoftware
IT Operations
Follow:
Looking for information about Informa TechTarget products and services? The commercial homepage has moved.
Visit Informa TechTarget
News
30 Jan 2026 /
Threats & Vulnerabilities
News brief: Patch critical and high-severity vulnerabilities now
Check out the latest security news from the Informa TechTarget team.
29 Jan 2026 /
Cloud Providers
Microsoft Maia 200 AI chip could boost cloud GPU supply
Industry watchers predict ancillary effects for enterprise cloud buyers from Microsoft's AI accelerator lau

In [43]:
brochure_system_prompt = """
You are an assistant that analyzes content from a company's website and creates a concise, engaging brochure for potential customers, investors, and recruits.
Respond in markdown.
Highlight the company's offerings, mission and culture, customer base, and career opportunities when such information is available.
"""

In [44]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"""
You are looking at a company called: {company_name}
Here are the contents of its landing page and other relevant pages;
use this information to build a short brochure of the company in markdown without code blocks.\n\n
"""
    user_prompt += fetch_page_and_all_relevant_links(url)
    user_prompt = user_prompt[:5_000]  # keep only the first 5,000 characters of the whole prompt
    return user_prompt
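
Because the slice keeps only the start of the prompt, a long landing page can use up most of the 5,000 characters and push the linked pages out entirely. One possible alternative, sketched here under the assumption that the page texts are available as separate strings, is to give each section an equal share of the budget. The helper `truncate_sections` and the placeholder texts are hypothetical.

```python
def truncate_sections(sections, total_budget=5_000):
    """Give each section an equal share of the character budget."""
    if not sections:
        return ""
    per_section = total_budget // len(sections)
    return "".join(text[:per_section] for text in sections)


# Placeholder page texts, 4,000 characters each
landing, about, press = "L" * 4_000, "A" * 4_000, "P" * 4_000

flat = (landing + about + press)[:5_000]
budgeted = truncate_sections([landing, about, press])

# Characters that survive from each page under the two strategies
print(flat.count("L"), flat.count("A"), flat.count("P"))
print(budgeted.count("L"), budgeted.count("A"), budgeted.count("P"))
```

With the flat slice the third page contributes nothing, while the equal split keeps part of every page at the cost of less landing-page text.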

In [45]:
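# Note: this builds the user prompt (instructions plus scraped page text), not the finished brochure
brochure = get_brochure_user_prompt("TechTarget", "https://www.techtarget.com/")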

Selecting relevant links for https://www.techtarget.com/ by calling gpt-5-nano
Found 7 relevant links


In [46]:
display(Markdown(brochure))


You are looking at a company called: TechTarget
Here are the contents of its landing page and other relevant pages;
use this information to build a short brochure of the company in markdown without code blocks.


## Landing Page:

TechTarget - Global Network of Information Technology Websites and Contributors

TechTarget
Search the TechTarget Network
Sign-up now. Start my free, unlimited access.
Login
Register
Explore the Network
Informa TechTarget Products & Services
Enterprise AI
Security
Networking
Customer Experience
Business Analytics
More Topics
Cloud Computing
HRSoftware
IT Operations
Follow:
Looking for information about Informa TechTarget products and services? The commercial homepage has moved.
Visit Informa TechTarget
News
30 Jan 2026 /
Threats & Vulnerabilities
News brief: Patch critical and high-severity vulnerabilities now
Check out the latest security news from the Informa TechTarget team.
29 Jan 2026 /
Cloud Providers
Microsoft Maia 200 AI chip could boost cloud GPU supply
Industry watchers predict ancillary effects for enterprise cloud buyers from Microsoft's AI accelerator launch this week, from GPU availability to Nvidia disruption.
28 Jan 2026 /
Data Management Strategies
Alteryx launches in-warehouse data prep tool for BigQuery
Live Query for BigQuery eliminates the need to move data between systems, saving customers from spending on data egress and reducing the risk of security leaks.
28 Jan 2026 /
Automation & Orchestration
Dynatrace AI agents draw on new observability integrations
Dynatrace rolls out AI agents grounded in a newly consolidated observability data back end and user interfaces, as enterprise AI ROI now hinges on context engineering.
28 Jan 2026 /
BI Technology
Domo adds App Catalyst to platform to aid AI development
By combining natural language code generation with enterprise-grade security and governance, the vendor aims to help customers more successfully build cutting-edge applications.
27 Jan 2026 /
Data Integration
Teradata's AgentStack aims to simplify building, managing AI
Featuring capabilities for developing, deploying and governing agents, the vendor's new suite addresses many of the problems enterprises face when trying to launch agentic system

## Relevant Links:


### about
Source: https://www.techtarget.com/

TechTarget - Global Network of Information Technology Websites and Contributors

TechTarget
Search the TechTarget Network
Sign-up now. Start my free, unlimited access.
Login
Register
Explore the Network
Informa TechTarget Products & Services
Enterprise AI
Security
Networking
Customer Experience
Business Analytics
More Topics
Cloud Computing
HRSoftware
IT Operations
Follow:
Looking for information about Informa TechTarget products and services? The commercial homepage has moved.
Visit Informa TechTarget
News
30 Jan 2026 /
Threats & Vulnerabilities
News brief: Patch critical and high-severity vulnerabilities now
Check out the latest security news from the Informa TechTarget team.
29 Jan 2026 /
Cloud Providers
Microsoft Maia 200 AI chip could boost cloud GPU supply
Industry watchers predict ancillary effects for enterprise cloud buyers from Microsoft's AI accelerator launch this week, from GPU availability to Nvidia disruption.
28 Jan 2026 /
Data Management Strategies
Alteryx launches in-warehouse data prep tool for BigQuery
Live Query for BigQuery eliminates the need to move data between systems, saving customers from spending on data egress and reducing the risk of security leaks.
28 Jan 2026 /
Automation & Orchestration
Dynatrace AI agents draw on new observability integrations
Dynatrace rolls out AI agents grounded in a newly consolidated observability data back end and user interfaces, as enterprise AI ROI now hinges on context engineering.
28 Jan 2026 /
BI Technology
Domo adds App Catalyst to platform to aid AI development
By combining natural language code generation with enterprise-grade security and governance, the vendor aims to help customers more successfully build cutting-edge applications.
27 Jan 2026 /
Data Integration
Teradata's AgentStack aims to simplify building, managing AI
Featuring capabilities for developing, deploying and governing agents, the vendor's new suite addresses many of the problems enterprises face when trying to launch agentic system

### press
Source: https://www.techtarget.com/news/

TechTarget Enterprise Technology News

TechTarget
Search the TechTarget Network
Sign-up now. Start my free, unlimited access.
Login
Register
Explore the Network
Informa TechTarget Products & Services
Enterprise AI
Security
Networking
Customer Experience
Business Analytics
More Topics
Cloud Computing
HRSoftware
IT Operations
Follow:
TechTarget News
News from TechTarget's global network of independent journalists. Stay current with the latest news stories. Browse thousands of articles covering hundreds of focused tech and business topics available on TechTarget's platform.
Latest News
30 Jan 2026
News brief: Patch critical and high-severi