# Web page Summarizer

In [1]:
import os
from dotenv import load_dotenv
from scraper import fetch_website_contents
from extract_pdf import extract_pdf_text_any
from IPython.display import Markdown, display
from openai import OpenAI


### Connect to LLMs

In [2]:

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    print('No API key found - please set the OPENAI_API_KEY environment variable')
elif api_key.strip() != api_key:
    # Check whitespace before the prefix, so a key with a leading space is
    # reported as a whitespace problem rather than as the wrong kind of key
    print('API key has whitespace - please check the OPENAI_API_KEY environment variable')
elif not api_key.startswith('sk-proj'):
    print('Unexpected API key prefix - please use a project API key (starts with sk-proj)')
else:
    print('API key is valid')

API key is valid
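Wrapping the same checks in a function that returns the message (instead of printing it) makes them reusable and easy to test. This is a minimal sketch; `check_api_key` is a hypothetical name, and the `sk-proj` prefix is the notebook's own assumption about project keys (other OpenAI key formats exist).

```python
def check_api_key(api_key):
    """Return a human-readable status message for an OpenAI API key string."""
    if not api_key:
        return 'No API key found - please set the OPENAI_API_KEY environment variable'
    # Whitespace first: a key with a leading space would otherwise fail the
    # prefix test and be misreported as the wrong kind of key.
    if api_key.strip() != api_key:
        return 'API key has whitespace - please check the OPENAI_API_KEY environment variable'
    # Assumption carried over from the notebook: project keys start with 'sk-proj'
    if not api_key.startswith('sk-proj'):
        return 'Unexpected API key prefix - please use a project API key (starts with sk-proj)'
    return 'API key is valid'


print(check_api_key(' sk-proj-abc'))  # reports the whitespace problem
```

In the notebook you would call it as `print(check_api_key(os.getenv('OPENAI_API_KEY')))`.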


### Call the OpenAI LLM

In [3]:
message = 'Hello, GPT! This is my first time using you.'

messages = [{'role': 'user', 'content': message}]

messages

[{'role': 'user', 'content': 'Hello, GPT! This is my first time using you.'}]

In [4]:
openai = OpenAI()

response = openai.chat.completions.create(model='gpt-5-nano', messages=messages)

response.choices[0].message.content

'Hi there! Welcome — I’m glad you’re here. I’m ChatGPT, and I can help with lots of things like explaining ideas, writing and editing, brainstorming, coding help, planning, and more.\n\nA quick idea of what I can do:\n- Explain concepts in simple terms\n- Draft or edit emails, essays, resumes, or social posts\n- Summarize articles or long texts\n- Help you plan projects, trips, or study schedules\n- Debug or write code (and explain it)\n- Brainstorm ideas or organize tasks\n- Translate or practice a new language\n\nHow to get started:\n- Just ask a question or describe a task. The more specific you are, the better I can help.\n- Tell me your target format if you have one (bullet list, steps, a short answer, code, etc.).\n- If you share context or constraints (tone, length, audience, deadline), I’ll tailor the response.\n\nA few starter prompts you can copy-paste:\n- “Explain quantum computing in simple terms.”\n- “Draft a friendly email to Jane asking for feedback on my project.”\n- “C
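The Chat Completions API does not keep conversation state between calls, so continuing a chat means resending the whole history: the earlier user message, the assistant's reply, and the new user message. A minimal sketch, where `continue_conversation` is a hypothetical helper and the reply text is a placeholder:

```python
def continue_conversation(messages, assistant_reply, next_user_message):
    """Return a new list extended with one assistant turn and one user turn.

    The input list is not modified, so earlier histories stay reusable.
    """
    return messages + [
        {'role': 'assistant', 'content': assistant_reply},
        {'role': 'user', 'content': next_user_message},
    ]


history = [{'role': 'user', 'content': 'Hello, GPT! This is my first time using you.'}]
history = continue_conversation(history, 'Hi there! Welcome.', 'Can you summarize web pages?')
print(len(history))  # 3
```

In the notebook, `assistant_reply` would be `response.choices[0].message.content`, and the extended `history` would be passed as `messages=` to the next `openai.chat.completions.create(...)` call.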

## Web Summarizer

In [5]:
dt_careers = fetch_website_contents('https://careers.dynatrace.com/jobs/')
print(dt_careers)

Jobs | Dynatrace Careers

All Dynatrace jobs
Innovate, push boundaries, and make big things happen alongside a highly skilled and motivated team.
Spotlight jobs
Senior AI Software Developer & Mentor
Linz, Hagenberg, Austria
View opportunity
Senior Fullstack Engineer (Java & React)
Linz, Austria
View opportunity
Senior Product Security Engineer
Linz, Austria
View opportunity
Principal Software Engineer
Linz, Austria
View opportunity
Senior Software Engineer (Kubernetes)
Linz, Austria
View opportunity
Search jobs
Saved jobs
List view
Grid view
Team
Select...
Country
Select...
City
Select...
Work Model
Select...
Clear all filters
Saved jobs
List view
Grid view
Dynatrace takes the security and privacy of prospective employees seriously.
Please be cautious of phishing, scams, and other fraudulent job-related communications that claim to be from our company or affiliated with us. Don’t respond, click on any links, or download any attachments from an email that might be fraudulent.
Be aware t
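The `scraper` module is not shown in this notebook, so the following is only a sketch of what a text extractor like `fetch_website_contents` might do once the HTML has been downloaded: skip script, style, and navigation blocks, keep the visible text, and truncate. The output above stops mid-word ("Be aware t"), which suggests the real helper truncates its result; the 2,000-character limit, `html_to_text`, and `SKIP_TAGS` are assumptions. It uses only the standard-library `html.parser`.

```python
from html.parser import HTMLParser

MAX_CHARS = 2_000  # assumed limit; the real scraper's limit is not shown

# Tags whose text is rarely useful in a summary (assumption: nav counts as noise)
SKIP_TAGS = {'script', 'style', 'noscript', 'nav'}


class _TextExtractor(HTMLParser):
    """Collect text runs, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html, max_chars=MAX_CHARS):
    """Return the visible text, one text run per line, cut to max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return '\n'.join(parser.parts)[:max_chars]


page = ('<html><head><title>Jobs</title><style>p{}</style></head><body>'
        '<nav>Home | About</nav><h1>All jobs</h1><p>Senior AI Developer</p>'
        '<script>x=1</script></body></html>')
print(html_to_text(page))
```

Fetching the HTML itself (for example with an HTTP client) is left out here.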

### Prompt types
- A System Prompt tells the LLM what task it is performing and what tone it should use
- A User Prompt is the conversation starter: the message the user expects the LLM to reply to
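Because the system prompt sets the task and tone, keeping the user prompt fixed while swapping only the system prompt is a quick way to compare styles. A minimal sketch; `PERSONAS` and `messages_with_persona` are hypothetical names, and the persona texts are illustrative:

```python
PERSONAS = {
    'snarky': 'You are a snarky assistant. Reply with short, humorous summaries.',
    'neutral': 'You are a precise, neutral assistant. Reply with factual summaries.',
}


def messages_with_persona(persona, user_text):
    """Build a two-message list: the chosen system prompt, then the user prompt."""
    if persona not in PERSONAS:
        raise ValueError(f'unknown persona {persona!r}; choose from {sorted(PERSONAS)}')
    return [
        {'role': 'system', 'content': PERSONAS[persona]},
        {'role': 'user', 'content': user_text},
    ]


a = messages_with_persona('snarky', 'Summarize this page.')
b = messages_with_persona('neutral', 'Summarize this page.')
print(a[1] == b[1], a[0] == b[0])  # True False: same user prompt, different system prompt
```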

In [6]:
system_prompt = """
You are a snarky assistant that analyzes the contents of a website and provides a short, snarky, humorous summary,
ignoring text that might be navigation related.
Respond in markdown. Do not mark the markdown in a code block - respond just with the markdown.
"""


In [7]:
user_prompt = """
Here are the contents of the website.
Provide a short summary of this website, focusing on AI-related roles.
If it includes news or announcements, then summarize these too.
"""




### Messages

OpenAI's Chat Completions API expects messages as a list of dictionaries, each with a `role` and a `content`. Many other LLM providers' APIs share this structure:
```python
[
    {'role': 'system', 'content': 'system message goes here'},
    {'role': 'user', 'content': 'user message goes here'}
]
```
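A small structural check can catch mistakes such as a misspelled role or non-string content before spending an API call. This is a sketch: `validate_messages` is a hypothetical helper, and it deliberately accepts only the three roles used in this notebook, although the API defines others (for example `tool`).

```python
ALLOWED_ROLES = {'system', 'user', 'assistant'}  # roles used in this notebook


def validate_messages(messages):
    """Raise ValueError unless `messages` is a non-empty list of role/content dicts."""
    if not isinstance(messages, list) or not messages:
        raise ValueError('messages must be a non-empty list')
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            raise ValueError(f'message {i} is not a dict')
        if message.get('role') not in ALLOWED_ROLES:
            raise ValueError(f"message {i} has unexpected role {message.get('role')!r}")
        if not isinstance(message.get('content'), str):
            raise ValueError(f'message {i} content must be a string')
    return True


validate_messages([
    {'role': 'system', 'content': 'system message goes here'},
    {'role': 'user', 'content': 'user message goes here'},
])
```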

In [8]:
# simple message example
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant'},
    {'role': 'user', 'content': 'What is 2 + 2?'}
]

response = openai.chat.completions.create(model='gpt-4.1-nano', messages=messages)
response.choices[0].message.content

'2 + 2 equals 4.'

In [9]:
# Build the message list in the structure the API expects:
# the system prompt, then the user prompt followed by the page text
def messages_for(website):
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_prompt + website}
    ]
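Scraped pages can be long. One option, not used in this notebook, is to cap the page text before appending it to the user prompt, so requests stay within the model's context window and costs stay predictable. A minimal sketch: `messages_for_truncated` and the 5,000-character default are hypothetical, and characters are only a rough proxy for tokens. The prompts are passed in as parameters so the block is self-contained.

```python
def messages_for_truncated(website, system_prompt, user_prompt, max_chars=5_000):
    """Like messages_for, but keeps only the first max_chars characters of the page."""
    if len(website) > max_chars:
        website = website[:max_chars] + '\n[...truncated]'
    return [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_prompt + website},
    ]


msgs = messages_for_truncated('x' * 10, 'S', 'U: ', max_chars=4)
print(msgs[1]['content'])
```

In the notebook this would be called as `messages_for_truncated(dt_careers, system_prompt, user_prompt)`.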

In [10]:
messages_for(dt_careers)

[{'role': 'system',
  'content': '\nYou are a snarky assistant that analyzes the contents of a a website, and provides a short, snarky, humorous summary,\nignoring text that might be navigation related.\nRespond in markdown. Do not mark the markdown in a code block - respond just with the markdown.\n'},
 {'role': 'user',
  'content': '\nHere are the contents of the website.\nProvide a short summary of this website. Focusing on AI releated roles.\nIf it includes news or announcements, then summarize these too.\nJobs | Dynatrace Careers\n\nAll Dynatrace jobs\nInnovate, push boundaries, and make big things happen alongside a\u202fhighly skilled\u202fand motivated team.\nSpotlight jobs\nSenior AI Software Developer & Mentor\nLinz, Hagenberg, Austria\nView opportunity\nSenior Fullstack Engineer (Java & React)\nLinz, Austria\nView opportunity\nSenior Product Security Engineer\nLinz, Austria\nView opportunity\nPrincipal Software Engineer\nLinz, Austria\nView opportunity\nSenior Software Engin

### The actual summarizer function

In [11]:
# Fetch the page and ask the model for a summary via the OpenAI API

def summarize(url):
    website = fetch_website_contents(url)
    response = openai.chat.completions.create(model='gpt-4.1-mini', messages=messages_for(website))
    return response.choices[0].message.content
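`summarize` reaches for the global `openai` client and `fetch_website_contents`, which makes it hard to run without network access or an API key. A sketch of the same flow with those dependencies passed in as parameters; `summarize_with` and the fake objects are hypothetical, and the fakes only mimic the `client.chat.completions.create(...).choices[0].message.content` shape the notebook already uses.

```python
from types import SimpleNamespace


def summarize_with(url, client, fetch, build_messages, model='gpt-4.1-mini'):
    """Fetch url, build the messages, and return the model's reply text.

    client: anything exposing client.chat.completions.create(model=..., messages=...)
    fetch: callable url -> page text (e.g. fetch_website_contents)
    build_messages: callable page text -> message list (e.g. messages_for)
    """
    website = fetch(url)
    response = client.chat.completions.create(model=model, messages=build_messages(website))
    return response.choices[0].message.content


# Offline stand-ins (hypothetical): no network or API key needed
def fake_create(model, messages):
    reply = f'{model} saw {len(messages)} messages'
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=reply))])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create)))

print(summarize_with(
    'https://example.com',
    client=fake_client,
    fetch=lambda url: f'contents of {url}',
    build_messages=lambda text: [{'role': 'user', 'content': text}],
))  # gpt-4.1-mini saw 1 messages
```

With the real objects the call would be `summarize_with(url, client=openai, fetch=fetch_website_contents, build_messages=messages_for)`.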

In [12]:
summarize('https://careers.dynatrace.com/jobs/')

'# Dynatrace Careers: AI Job Edition\n\nLooking to merge your AI wizardry with job hunting? Dynatrace has a spotlight on a **Senior AI Software Developer & Mentor** role in Austria, clearly seeking someone who can code and also school others in AI sorcery. Other gigs are mostly your usual suspects—fullstack, security, and Kubernetes engineers—all seasoned senior roles.\n\nIn case you’re wondering, they don’t fall for shady recruitment scams. They’re basically the "official mafia" of job offers: only calls on Zoom or phone, proper @dynatrace.com emails, and zero requests for your life\'s savings or passport info. So, if you stumble across a “Dynatrace job” promising you’ll make AI fly while needing a bribe—run.\n\nNo juicy news or announcements here, just a straightforward, no-nonsense job board with a sprinkle of AI brilliance and a hard line on recruitment scams.'

In [13]:
# Render the Markdown summary in the notebook

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
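The cells below call `display_summary` once per site. When summarizing several sites in one go, any single site may fail (timeouts, blocked scraping, API errors), so this sketch records the error and moves on instead of stopping. `summarize_many` and the `flaky` stand-in are hypothetical.

```python
def summarize_many(urls, summarize_fn):
    """Return {url: summary}, storing an error note instead of stopping on failure."""
    results = {}
    for url in urls:
        try:
            results[url] = summarize_fn(url)
        except Exception as exc:  # broad on purpose: keep going past any single failure
            results[url] = f'*Could not summarize {url}: {exc}*'
    return results


def flaky(url):
    """Stand-in for summarize that fails for some URLs."""
    if 'bad' in url:
        raise TimeoutError('timed out')
    return f'Summary of {url}'


print(summarize_many(['https://good.example', 'https://bad.example'], flaky))
```

In the notebook, `summarize` would replace `flaky`, and each value could be shown with `display(Markdown(summary))`.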

In [14]:
display_summary('https://careers.dynatrace.com/jobs/')

# Dynatrace Careers: AI Edition (aka Where the Robots Get Their Code)

Looking to get your hands dirty in AI at Dynatrace? You're in luck! They've got a **Senior AI Software Developer & Mentor** spot open in Austria. So if you fancy yourself a brainy coder who can both create AI magic and beam up some wisdom to juniors, this is your stage.

Other tech roles abound, but if you’re after AI, keep your eyes on Linz and Hagenberg, Austria for that AI guru gig. 

On the announcements front, it’s mostly a PSA about job scams. Dynatrace wants to make crystal clear they only do legit hiring via their official sites and emails, so don’t fall for any shady "We want your money and soul" job scams posing as them. Interview calls over Zoom, no WhatsApp stalking, no money transfers. Got it? Good.

So, if you want to innovate with AI and avoid scams, Dynatrace is waving for you from Austria. Just bring your coding chops and a healthy skepticism for weird job offers.

In [15]:
display_summary('https://cnn.com')

Ah, CNN’s website: the digital newsroom buffet where you can feast on world drama, politics, and the occasional tech blip—if you manage to survive the ad onslaught. But when it comes to AI roles or news? Nada. The site is more about global crises, elections, and celebrity gossip than artificial intellects taking over the planet. If you want AI job listings or updates, you might want to look elsewhere because here it’s all “news, news, and more news” with a sprinkle of “we value your feedback” on ad annoyances.

In summary? CNN’s website is your go-to for breaking news, not AI careers or announcements—unless the AI uprising is the next headline, and then they’ll have you covered. Until then, just enjoy the non-AI related chaos.

In [16]:
display_summary('https://anthropic.com')

# Anthropic: AI with a Side of Safety and Existential Caution

So, Anthropic is basically that public benefit corporation trying to make AI that’s smart *and* safe because apparently, AI apocalypse is so 2020. Their flagship stuff—Claude and its spinoffs Sonnet and Haiku (version 4.5 if you’re counting)—are designed to be top-notch at agents, coding, and general computer wizardry. 

They’re super serious about “securing benefits and mitigating risks,” which is just a fancy way of saying "Don’t let our AI go all Skynet." They throw around words like "Responsible Scaling Policy" and have a "Trust Center" because, hey, transparency in AI is the new black.

If you’re hunting AI gigs, Anthropic’s careers page (buried somewhere in that menu) is where the action is. They’re all about research, engineering, and developer platforms for those who want to build on top of their AI magic.

**News flash:** They recently dropped Claude Sonnet 4.5 and Claude Haiku 4.5—because what’s better than poetry-themed AI names? They’re bragging about managing context better and generally being the "best model in the world" for smart tasks.

In summary: Anthropic is your go-to if you want AI that’s powerful enough to impress but tame enough not to destroy humanity. And yes, they have apps and platforms if you want to mess around with their brainchildren.

In [17]:
display_summary('https://openai.com')

Ah yes, OpenAI's treasure trove of AI wizardry. The site is basically a playground for AI nerds and code sorcerers with a heavy focus on the latest and greatest AI research projects like GPT-4.5, GPT-5, and their mischievous sibling models (o3, o4-mini, and the cryptically named o1 and 4o). They've got a Research Residency for those who want to dive deep into AI black magic and safety procedures to keep the robots from taking over your toaster.

If you're hunting for AI gigs, skim the Careers section—they're always on the lookout for brave souls to join their legion of AI developers, researchers, and safety experts shaping the future one neural net at a time.

News? There's a livestream about their Foundation's commitments and structure, because you can’t have AI domination without a solid corporate manifesto. 

In short: OpenAI’s website is like an all-you-can-eat buffet of AI innovation with a side of ethical mumbo-jumbo and a hint of cutting-edge tech to make your job hunt worthwhile.

## PDF document summarizer

In [18]:
system_prompt = """
You are a precise, trustworthy summarizer. Be concise, factual, and neutral.
Prefer extractive phrasing (use the document’s own wording) over invention.
Preserve key numbers, names, and definitions. If unsure, say so.
When the input includes page refs, keep them in the output.

"""

user_prompt = """
Summarize the following PDF chunk into 5–8 concise bullet points.
– Preserve key facts, numbers, and definitions.
– Include page refs if present in the chunk header.
– Do not repeat points.
– No opinions or speculation.

"""

In [19]:
# message for pdf summarisations

def message_for_pdf(text):
    system_prompt = "You are a precise, factual summarizer. Be concise and neutral."
    user_prompt = "Summarize the following PDF content in 6–10 concise bullet points:\n\n"
    messages = [
        {"role":"system","content":system_prompt},
        {"role":"user","content":user_prompt + text},
    ]
    return  messages


In [20]:
def llm_call(messages, model="gpt-4.1-mini", temperature=0):
    """
    messages: list of {"role": "...", "content": "..."} dicts
    returns: string (assistant message content)
    """
    resp = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
    )
    return resp.choices[0].message.content.strip()
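Because `summarize_pdf_text` receives the LLM function as a parameter, a stub with the same signature as `llm_call` can exercise the PDF pipeline without an API key, for example `summarize_pdf_text(path, fake_llm_call)`. A minimal sketch; `fake_llm_call` is hypothetical and just describes the request it received:

```python
def fake_llm_call(messages, model='stub', temperature=0):
    """Offline stand-in with the same signature as llm_call.

    Returns a deterministic description of the request instead of calling the API.
    """
    user_text = messages[-1]['content']
    return f'[{model}, T={temperature}] {len(messages)} messages, {len(user_text)} chars of user text'


print(fake_llm_call([
    {'role': 'system', 'content': 's'},
    {'role': 'user', 'content': 'hello'},
]))  # [stub, T=0] 2 messages, 5 chars of user text
```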

In [21]:
def summarize_pdf_text(path, llm_call):
    content = extract_pdf_text_any(path)
    messages = message_for_pdf(content)

    return llm_call(messages)
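`summarize_pdf_text` sends the whole extracted text in one request, and a long PDF can exceed the model's context window. A common workaround, which this notebook does not implement, is map-reduce summarization: split the text into chunks, summarize each chunk, then merge the partial summaries. A sketch under those assumptions; `chunk_text`, `summarize_long_text`, and the 8,000-character chunk size are hypothetical, and character counts only approximate token limits.

```python
def chunk_text(text, max_chars=8_000, overlap=200):
    """Split text into pieces of at most max_chars, each overlapping the previous by `overlap`."""
    if max_chars <= overlap:
        raise ValueError('max_chars must be larger than overlap')
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks


def summarize_long_text(text, llm_call, max_chars=8_000, overlap=200):
    """Map: summarize each chunk. Reduce: merge the partial summaries into one."""
    system = {'role': 'system', 'content': 'You are a precise, factual summarizer. Be concise and neutral.'}
    partials = []
    for i, chunk in enumerate(chunk_text(text, max_chars, overlap), start=1):
        user = f'Summarize chunk {i} of this PDF in concise bullet points:\n\n{chunk}'
        partials.append(llm_call([system, {'role': 'user', 'content': user}]))
    if not partials:
        return ''
    if len(partials) == 1:
        return partials[0]
    merged = '\n\n'.join(partials)
    user = f'Merge these partial summaries into 6-10 bullet points without repeating points:\n\n{merged}'
    return llm_call([system, {'role': 'user', 'content': user}])


print(chunk_text('abcdefghij', max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

In the notebook this could be used as `summarize_long_text(extract_pdf_text_any(path), llm_call)`.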


In [22]:
def display_pdf_summary(summary):
    display(Markdown(summary))

In [24]:
path = 'content/Statistics NG.pdf'
summary = summarize_pdf_text(path, llm_call)
display_pdf_summary(summary)


Could get FontBBox from font descriptor because None cannot be parsed as 4 floats




- Statistics Nigeria is a sovereign, independent data intelligence company focused on transforming Nigeria and Africa’s understanding through verified, ethical, and AI-enhanced statistics across sectors like population, economy, environment, governance, and education.

- Its mission is to empower Nigeria’s sustainable development by turning verified data into actionable insights, aiming to make Nigeria the most data-driven and truth-empowered nation in Africa.

- Core values include integrity, objectivity, innovation (AI, blockchain, cloud), collaboration, inclusion, and impact, ensuring data is transparent, non-politicized, accessible, and drives measurable progress.

- Key strategic domains cover Population & Society, Economy, Environment & Energy, Public Sector & Governance, Fact-Checking & Truth Intelligence, Education & Data Literacy, and Open Data & Innovation, each producing specialized reports, dashboards, and APIs.

- The organization develops AI-powered fact-checking tools, misinformation detection models, and offers data literacy training, partnering with universities and media to build a data-literate society and protect Nigeria’s information ecosystem.

- Statistics Nigeria is creating a National Data Governance and Privacy Framework emphasizing African data ethics, sovereign data principles, privacy by design, and independent oversight to ensure ethical, secure, and transparent data management.

- Target users include government, businesses, media, researchers, educators, and citizens, providing them with reliable data for policymaking, economic planning, journalism, academic research, and informed civic participation.

- Its impact goals are to build public trust through transparency, enable evidence-based leadership, promote critical data literacy among youth, establish an African-led data governance model, and position Nigeria as a continental hub for data innovation.
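The "Could get FontBBox from font descriptor ..." lines printed above are warnings, not errors; the text was still extracted and summarized. They look like log messages from pdfminer.six. The notebook does not show which library `extract_pdf_text_any` uses, so the following is a sketch assuming it relies on pdfminer.six (directly or through a wrapper such as pdfplumber): raising the level of the `pdfminer` logger hides those warnings.

```python
import logging

# Assumption: the FontBBox warnings come from pdfminer.six loggers ("pdfminer.*").
# Child loggers without their own level inherit from "pdfminer", so their
# WARNING messages are hidden while ERROR messages still get through.
logging.getLogger('pdfminer').setLevel(logging.ERROR)
```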