## Summarize a webpage

In [2]:
import requests
from bs4 import BeautifulSoup

In [4]:
# Define/Choose the LLM models to use
models = ['gemma:2b', 'gemma2:latest', 'phi4:latest', 'llama3.2:latest', 'deepseek-r1:8b', 'qwen2.5-coder:7b']
model_name = models[-1]  # 'qwen2.5-coder:7b'

In [6]:
import ollama

In [8]:
# Test if ollama is working

result = ollama.generate(
    model=model_name, 
    prompt="Explain the concept of a black hole in simple terms, as if you were talking to a child.", 
)

# Inspect and parse the result['response']
response_str = result['response']
print(response_str)

Imagine you have a super heavy ball made of rubber. If you squeeze it very hard, it gets smaller and smaller until it becomes so small that nothing can get out of it not even light! That's kind of like what a black hole is.

A black hole is like a cosmic vacuum cleaner in space. It has such a strong pull because it has so much mass packed into a tiny space. This strong pull is called gravity. Even though black holes are invisible, we know they're there because their gravity affects the things around them.

Think of it like this: if you threw a ball at a black hole, it would never come back out again! It's like the ball has been swallowed by a giant cosmic magnet that can't let go.

Scientists use special tools and observations to find black holes. They're very exciting because they help us understand more about how the universe works!


### Displaying Markdown in Jupyter Notebook

In [None]:
from IPython.display import Markdown, display

def output(response):
    """
    Render the response as Markdown in the notebook.
    """
    return display(Markdown(response))

In [12]:
output(response_str)

Imagine you have a super heavy ball made of rubber. If you squeeze it very hard, it gets smaller and smaller until it becomes so small that nothing can get out of it not even light! That's kind of like what a black hole is.

A black hole is like a cosmic vacuum cleaner in space. It has such a strong pull because it has so much mass packed into a tiny space. This strong pull is called gravity. Even though black holes are invisible, we know they're there because their gravity affects the things around them.

Think of it like this: if you threw a ball at a black hole, it would never come back out again! It's like the ball has been swallowed by a giant cosmic magnet that can't let go.

Scientists use special tools and observations to find black holes. They're very exciting because they help us understand more about how the universe works!

In [13]:
from urllib.parse import urljoin

In [14]:
def scrape_website(url):
    """
    Scrape a website to extract its title, body text, image links, and URLs.
    
    Args:
        url (str): The URL of the website to scrape
        
    Returns:
        tuple: (page title, body text, list of image dicts, list of link dicts),
            or (None, None, None, None) if the request fails
    """
    try:
        # Send HTTP request and get response
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        # A timeout keeps the cell from hanging on an unresponsive server
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()  # Raise exception for bad status codes
        
        # Create BeautifulSoup object
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Extract page title
        page_title = soup.title.string if soup.title else "No title found"

        # Extract body content
        body_content = ""
        if soup.body:
            # Remove script and style elements
            for element in soup.body(['script', 'style']):
                element.decompose()
            body_content = soup.body.get_text(separator='\n', strip=True)
        
        # Extract all image links
        images = []
        for img in soup.find_all('img'):
            img_url = img.get('src')
            if img_url:
                # Convert relative URLs to absolute URLs
                absolute_img_url = urljoin(url, img_url)
                images.append({
                    'url': absolute_img_url,
                    'alt': img.get('alt', 'No alt text')
                })
        
        # Extract all URLs
        urls = []
        for link in soup.find_all('a'):
            href = link.get('href')
            if href:
                # Convert relative URLs to absolute URLs
                absolute_url = urljoin(url, href)
                urls.append({
                    'url': absolute_url,
                    'text': link.text.strip()
                })
        
        return page_title, body_content, images, urls
        
    except requests.RequestException as e:
        print(f"Error fetching the website: {e}")
        return None, None, None, None
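
The scraper relies on `urljoin` to turn relative `src` and `href` values into absolute URLs. A minimal sketch of how that resolution behaves is shown here; the `example.com` address is a placeholder chosen for illustration, not a page used in this notebook.

```python
from urllib.parse import urljoin

# Base URL of the page being scraped (the same one used later in this notebook).
base = "https://www.github.com"

# Root-relative path: resolved against the host of the base URL.
print(urljoin(base, "/login"))

# Fragment-only link: appended to the base URL.
print(urljoin(base, "#start-of-content"))

# Already-absolute URL: returned unchanged, even when the host differs.
print(urljoin(base, "https://github.com/features/copilot"))

# Path-relative link: resolved against the directory of the base page.
# example.com is a placeholder, not a page from this notebook.
print(urljoin("https://example.com/docs/page.html", "img/logo.png"))
```

This is why the extracted link list can mix `www.github.com` and `github.com` hosts: links that were already absolute in the HTML keep their original host.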

#### Provide URL and extract contents

In [15]:
title, body, images, urls = scrape_website("https://www.github.com")

In [16]:
title

'GitHub · Build and ship software on a single, collaborative platform · GitHub'

In [17]:
output(body)

Skip to content
GitHub Copilot is now available for free.
Learn more
Navigation Menu
Toggle navigation
Sign in
Product
GitHub Copilot
Write better code with AI
Security
Find and fix vulnerabilities
Actions
Automate any workflow
Codespaces
Instant dev environments
Issues
Plan and track work
Code Review
Manage code changes
Discussions
Collaborate outside of code
Code Search
Find more, search less
Explore
All features
Documentation
GitHub Skills
Blog
Solutions
By company size
Enterprises
Small and medium teams
Startups
Nonprofits
By use case
DevSecOps
DevOps
CI/CD
View all use cases
By industry
Healthcare
Financial services
Manufacturing
Government
View all industries
View all solutions
Resources
Topics
AI
DevOps
Security
Software Development
View all
Explore
Learning Pathways
White papers, Ebooks, Webinars
Customer Stories
Partners
Executive Insights
Open Source
GitHub Sponsors
Fund open source developers
The ReadME Project
GitHub community articles
Repositories
Topics
Trending
Collections
Enterprise
Enterprise platform
AI-powered developer platform
Available add-ons
Advanced Security
Enterprise-grade security features
GitHub Copilot
Enterprise-grade AI features
Premium Support
Enterprise-grade 24/7 support
Pricing
Search or jump to...
Search code, repositories, users, issues, pull requests...
Search
Clear
Search syntax tips
Provide feedback
We read every piece of feedback, and take your input very seriously.
Include my email address so I can be contacted
Cancel
Submit feedback
Saved searches
Use saved searches to filter your results more quickly
Name
Query
To see all available qualifiers, see our
documentation
.
Cancel
Create saved search
Sign in
Sign up
Reseting focus
You signed in with another tab or window.
Reload
to refresh your session.
You signed out in another tab or window.
Reload
to refresh your session.
You switched accounts on another tab or window.
Reload
to refresh your session.
Dismiss alert
Build and ship software on a single, collaborative platform
Join the world’s most widely adopted AI-powered developer platform.
Enter your email
Sign up for GitHub
Try GitHub Copilot
GitHub features
A demonstration animation of a code editor using GitHub Copilot Chat, where the user requests GitHub Copilot to refactor duplicated logic and extract it into a reusable function for a given code snippet.
Code
Plan
Collaborate
Automate
Secure
Code
Build code quickly and more securely with GitHub Copilot embedded throughout your workflows.
GitHub is used by
Pause
Accelerate performance
With GitHub Copilot embedded throughout the platform, you can simplify your toolchain, automate tasks, and improve the developer experience.
A Copilot chat window with extensions enabled. The user inputs the @ symbol to reveal a list of five Copilot Extensions. @Sentry is selected from the list, which shifts the window to a chat directly with that extension. There are three sample prompts at the bottom of the chat window, allowing the user to Get incident information, Edit status on incident, or List the latest issues. The last one is activated to send the prompt: @Sentry List the latest issues. The extension then lists several new issues and their metadata.
Work 55% faster.
Jump to footnote
1
Increase productivity with AI-powered coding assistance, including code completion, chat, and more.
Explore GitHub Copilot
Duolingo boosts developer speed by 25% with GitHub Copilot
Read customer story
2024 Gartner® Magic Quadrant™ for AI Code Assistants
Read report
Automate any workflow
Optimize your process with simple and secured CI/CD.
A list of workflows displays a heading ‘45,167 workflow runs’ at the top. Below are five rows of completed workflows accompanied by their completion time and their duration formatted in minutes and seconds.
Discover GitHub Actions
Get up and running in seconds
Start building instantly with a comprehensive dev environment in the cloud.
A GitHub Codespaces setup for the landing page of a game called OctoInvaders. On the left is a code editor with some HTML and Javascript files open. On the right is a live render of the page. In front of this split editor window is a screenshot of two active GitHub Codespaces environments with their branch names and a button to ‘Create codespace on main.’
Check out GitHub Codespaces
Build on the go
Manage projects and chat with GitHub Copilot from anywhere.
Two smartphone screens side by side. The left screen shows a Notification inbox, listing issues and pull requests from different repositories like TensorFlow and GitHub’s OctoArcade octoinvaders. The right screen shows a new conversation in GitHub Copilot chat.
Download GitHub Mobile
Integrate the tools you love
Sync with 17,000+ integrations and a growing library of Copilot Extensions.
A grid of fifty app tiles displays logos for integrations and extensions for companies like Stripe, Slack, and Docker. The tiles extend beyond the bounds of the image to indicate a wide array of apps.
Visit GitHub Marketplace
Built-in application security
where found means fixed
Use AI to find and fix vulnerabilities—freeing your teams to ship more secure software faster.
Apply fixes in seconds.
Spend less time fixing vulnerabilities and more time building features with Copilot Autofix.
Explore GitHub Advanced Security
Solve security debt.
Leverage AI-assisted security campaigns to reduce application vulnerabilities and zero-day attacks.
Discover security campaigns
Dependencies you can depend on.
Update vulnerable dependencies with supported fixes for breaking changes.
Learn about Dependabot
Your secrets, your business: protected.
Detect, prevent, and remediate leaked secrets across your organization.
Read about secret scanning
7x faster
vulnerability fixes with GitHub
Jump to footnote
2
90% coverage
of alert types in all supported languages with Copilot Autofix
Work together, achieve more
Collaborate with your teams, use management tools that sync with your projects, and code from anywhere—all on a single, integrated platform.
Your workflows, your way.
Plan effectively with an adaptable spreadsheet that syncs with your work.
Jump into GitHub Projects
“
It helps us onboard new software engineers and get them productive right away. We have all our source code, issues, and pull requests in one place... GitHub is a complete platform that frees us from menial tasks and enables us to do our best work.
Fabian Faulhaber
Application manager at Mercedes-Benz
Keep track of your tasks
Create issues and manage projects with tools that adapt to your code.
Display of task tracking within an issue, showing the status of related sub-issues and their connection to the main issue.
Explore GitHub Issues
Share ideas and ask questions
Create space for open-ended conversations alongside your project.
A GitHub Discussions thread where a GitHub user suggests a power-up idea involving Hubot revealing a path and protecting Mona. The post has received 5 upvotes and several reactions. Below, three other users add to the discussion, suggesting Hubot could provide different power-ups depending on levels and appreciating the collaboration idea.
Discover GitHub Discussions
Review code changes together
Create review processes that improve code quality and fit neatly into your workflow.
Two code review approvals by helios-ackmore and amanda-knox, which are followed by three successful checks for ‘Build,’ ‘Test,’ and ‘Publish.’
Learn about code review
Fund open source projects
Become an open source partner and support the tools and libraries that power your work.
A GitHub Sponsors popup displays ‘$15,000 a month’ with a progress bar showing 87% towards a $15,000 goal.
Dive into GitHub Sponsors
From startups to enterprises,
GitHub scales
with teams of any size in any industry.
By industry
By size
By use case
By industry
Technology
Figma streamlines development and strengthens security
Read customer story
Automotive
Mercedes-Benz standardizes source code and automates onboarding
Read customer story
Financial services
Mercado Libre cuts coding time by 50%
Read customer story
Explore customer stories
View all solutions
Millions of developers and businesses call GitHub home
Whether you’re scaling your development process or just learning how to code, GitHub is where you belong. Join the world’s most widely adopted AI-powered developer platform to build the technologies that redefine what’s possible.
Enter your email
Sign up for GitHub
Try GitHub Copilot
Footnotes
Survey: The AI wave continues to grow on software development teams, 2024.
This 7X times factor is based on data from the industry’s longest running analysis of fix rates Veracode State of Software Security 2023, which cites the average time to fix 50% of flaws as 198 days vs. GitHub’s fix rates of 72% of flaws with in 28 days which is at a minimum of 7X faster when compared.
Site-wide Links
Subscribe to our developer newsletter
Get tips, technical guides, and best practices. Twice a month. Right in your inbox.
Subscribe
Product
Features
Enterprise
Copilot
Security
Pricing
Team
Resources
Roadmap
Compare GitHub
Platform
Developer API
Partners
Education
GitHub CLI
GitHub Desktop
GitHub Mobile
Support
Docs
Community Forum
Professional Services
Premium Support
Skills
Status
Contact GitHub
Company
About
Customer stories
Blog
The ReadME Project
Careers
Newsroom
Inclusion
Social Impact
Shop
©
2025
GitHub, Inc.
Terms
Privacy
(
Updated
02/2024
)
Sitemap
What is Git?
Manage cookies
Do not share my personal information
GitHub on LinkedIn
Instagram
GitHub on Instagram
GitHub on YouTube
GitHub on X
TikTok
GitHub on TikTok
Twitch
GitHub on Twitch
GitHub’s organization on GitHub
You can’t perform that action at this time.
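
The body text above contains many repeated navigation lines ("Sign in", "Reload", and so on), and all of it is passed to the model. One possible way to shrink it first is sketched here. The helper name `clean_body` and the `max_chars=4000` cap are assumptions made for illustration; deduplicating lines can also drop text that legitimately repeats, so treat this as a rough filter.

```python
def clean_body(text, max_chars=4000):
    """Drop blank and repeated lines, then cap the result at max_chars characters."""
    seen = set()
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in seen:
            continue  # skip empty lines and lines already kept
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)[:max_chars]


# Small sample that imitates the repeated lines in the scraped output.
sample = "Sign in\nProduct\nSign in\n\nReload\nto refresh your session.\nReload"
print(clean_body(sample))
```

The cap is a plain character cut, so it may end mid-word; a model with a larger context window may not need it at all.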

In [18]:
images

[{'url': 'https://github.githubassets.com/assets/shopify-3fed3011ac81.svg',
  'alt': 'Shopify'},
 {'url': 'https://github.githubassets.com/assets/ey-7434b194d68c.svg',
  'alt': 'EY'},
 {'url': 'https://github.githubassets.com/assets/figma-04e0038c0fef.svg',
  'alt': 'Figma'},
 {'url': 'https://github.githubassets.com/assets/duolingo-ea3a0aa56fa5.svg',
  'alt': 'Duolingo'},
 {'url': 'https://github.githubassets.com/assets/newyorktimes-9d76d7f338f5.svg',
  'alt': 'New York Times'},
 {'url': 'https://github.githubassets.com/assets/mercado-libre-762f679589f2.svg',
  'alt': 'Mercado Libre'},
 {'url': 'https://github.githubassets.com/assets/american-airlines-a2b13c8fbe08.svg',
  'alt': 'American Airlines'},
 {'url': 'https://github.githubassets.com/assets/ford-2fb5195a4c7c.svg',
  'alt': 'Ford'},
 {'url': 'https://github.githubassets.com/assets/mercedes-b9190458c80e.svg',
  'alt': 'Mercedes Benz'},
 {'url': 'https://github.githubassets.com/assets/societe-generale-7a3414490494.svg',
  'alt': 

In [19]:
urls

[{'url': 'https://www.github.com#start-of-content', 'text': 'Skip to content'},
 {'url': 'https://github.com/features/copilot/?utm_source=github&utm_medium=banner&utm_campaign=copilotfree-bannerheader',
  'text': 'Learn more'},
 {'url': 'https://www.github.com/', 'text': ''},
 {'url': 'https://www.github.com/login', 'text': 'Sign in'},
 {'url': 'https://github.com/features/copilot',
  'text': 'GitHub Copilot\n        Write better code with AI'},
 {'url': 'https://github.com/features/security',
  'text': 'Security\n        Find and fix vulnerabilities'},
 {'url': 'https://github.com/features/actions',
  'text': 'Actions\n        Automate any workflow'},
 {'url': 'https://github.com/features/codespaces',
  'text': 'Codespaces\n        Instant dev environments'},
 {'url': 'https://github.com/features/issues',
  'text': 'Issues\n        Plan and track work'},
 {'url': 'https://github.com/features/code-review',
  'text': 'Code Review\n        Manage code changes'},
 {'url': 'https://github.

## Let's summarize the URL content

#### Writing a Prompt

In [20]:
# System Prompt
system_prompt = """You are an AI assistant that analyzes the contents of a website."""

In [26]:
def prompt(system_prompt, body):
    # The model receives the scraped page text, not the URL itself
    user_prompt = f"""You are given the text content of a web page. Analyze it and provide a summary.
    {body}"""

    prompts = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    return prompts
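
As a variation on the prompt builder, the page text can be set apart from the instruction with explicit markers, which may help smaller models tell the two apart. This is only a sketch: the function name `build_messages`, the marker strings, and the extra `title` argument are assumptions and not part of the notebook's code.

```python
def build_messages(system_prompt, title, body):
    """Return chat messages with the page text delimited from the instruction."""
    user_prompt = (
        "Summarize the web page below in a few sentences.\n"
        f"Title: {title}\n"
        "--- PAGE CONTENT START ---\n"
        f"{body}\n"
        "--- PAGE CONTENT END ---"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Placeholder title and body, used only to show the resulting structure.
messages = build_messages(
    "You are an AI assistant that analyzes the contents of a website.",
    "Example page",
    "Hello world",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The returned list has the same role/content shape as the messages built in the cell above, so it could be passed to the same chat call.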

#### Proceed with Ollama Chat

In [35]:
def summarize(url, model_name):
    title, body, images, urls = scrape_website(url)
    print(f"Title: {title}")
    response = ollama.chat(model=model_name, messages=prompt(system_prompt, body))
    return response['message']['content'], images, urls
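
When `scrape_website` fails it returns `None` for every field, and `summarize` would then send the literal text "None" to the model. A guarded variant might look like the sketch below. To keep it runnable without network access or a local model, the scraper and the chat call are passed in as arguments; `summarize_safe`, `failing_scrape`, `fake_scrape`, and `fake_chat` are hypothetical names introduced for this example.

```python
def summarize_safe(url, scrape, chat):
    """Summarize a page, returning None as the summary if scraping failed.

    scrape: callable(url) -> (title, body, images, urls), like scrape_website
    chat:   callable(body) -> str, a stand-in for the model call
    """
    title, body, images, urls = scrape(url)
    if not body:
        # Nothing to summarize: skip the model call entirely.
        return None, images or [], urls or []
    return chat(body), images, urls


# Hypothetical stand-ins so the sketch runs offline.
def failing_scrape(url):
    return None, None, None, None


def fake_scrape(url):
    return "Example", "Some page text", [], [{"url": url, "text": "home"}]


def fake_chat(body):
    return f"Summary of {len(body)} characters"


print(summarize_safe("https://example.com", failing_scrape, fake_chat))
print(summarize_safe("https://example.com", fake_scrape, fake_chat))
```

In the notebook itself, `scrape` would be `scrape_website` and `chat` a small wrapper around the existing `ollama.chat` call.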

In [None]:
# Example 1

summary, images, links = summarize("https://www.nrb.org.np/bank-list/", models[-1])
print(f"Summary: {summary} \nImages: {images} \nURLs: {links}")

Title: BFI List - the official site of the Central Bank of Nepal
Summary: ### Central Office Details

**Address:** Baluwatar, Kathmandu, Nepal  
**P.O. Box:** 73  
**Phone Numbers:**  
- +977-1-5719641  
- +977-1-5719642  
- +977-1-5719643  
- +977-1-5719653  
- +977-1-5719659  

**Fax:** +977-1-5719601

### Contact and Feedback Information
For any queries or feedback, please visit the contact page on our website.

### Quick Links
The Nepal Rastra Bank (NRB) offers several services and information resources through its official website. Key sections include:
1. **Financial Inclusion Portal**
2. **Banks & Financial Institutions List**
3. **Financial Intelligence Unit (FIU - Nepal)**
4. **goAML Nepal Registration & Reporting Portal**
5. **Financial Literacy**
6. **Bank Notes Security Features**
7. **Right To Information**
8. **International Conference**
9. **Financial Consumer Protection**
10. **International Relation and Technical Cooperation**
11. **SAARC FINANCE**
12. **Multinational 

In [None]:
# Example 2

summary, images, links = summarize("https://connectips.com/knowmore/cipsBlog.php", models[-1])
print(f"Summary: {summary} \n Found: {len(images)} images & {len(links)} urls")

Title: connect IPS
Summary: This website appears to be the official website for NepalPay, which seems to be a digital payment platform in Nepal. The page includes several features and offers:
1. Mobile-based fund transfer via QR code
2. Cashback on various services (bus tickets, flight tickets, insurance premiums)
3. Scholarships on college fees
4. Free mobile number-based fund transfer
5. Various promotions and contests with prizes like smartphones, earbuds, and air fryers
The website offers a wide range of bill payment services and seems to integrate with multiple financial institutions in Nepal. It also includes features for online license renewals and government payments.
The design is in Nepali, which may indicate that the platform has a significant user base in Nepal. The page provides options for registration and login, as well as FAQs and terms and conditions.
Overall, this appears to be a comprehensive digital payment platform designed to simplify financial transactions in Nep

In [34]:
# Example 3

summary, images, links = summarize("https://huggingface.co/docs", models[-1])
print(f"Summary: {summary} \n Found: {len(images)} images & {len(links)} urls")

Title: Hugging Face - Documentation
Summary: The provided URL appears to be the main page of Hugging Face, a company focused on natural language processing and machine learning. The website offers various tools and resources for developers and researchers in these fields:
1. **Transformers**: A library with state-of-the-art models for PyTorch, TensorFlow, and JAX.
2. **Diffusers**: Tools for image and audio generation using diffusion models.
3. **Datasets**: Access to datasets for computer vision, audio, and natural language processing tasks.
4. **Gradio**: A way to build machine learning demos and web applications quickly.
5. **Hub**: A platform for hosting Git-based models, datasets, and spaces.
6. **Inference API (Serverless)**: Allows experimenting with over 200k models without server infrastructure.
7. **Inference Endpoints (Dedicated)**: For deploying models to production on dedicated infrastructure.
8. **PEFT**: Methods for parameter-efficient fine-tuning of large models.
9. **A

In [36]:
# Example 4: using DeepSeek

summary, images, links = summarize("https://huggingface.co/docs", models[-2])
print(f"Summary: {summary} \n Found: {len(images)} images & {len(links)} urls")

Title: Hugging Face - Documentation
Summary: <think>
Okay, I need to figure out what Hugging Face is based on the content given. Let me start by reading through the provided information carefully.

First, there's a list of sections like Models, Datasets, Spaces, etc., which suggests it's a platform that offers various AI tools and resources. The "Models" section includes Transformers, Diffusers, Inference APIs, and others. These are all related to machine learning models, specifically those using transformers for tasks like NLP and computer vision.

Then there are Datasets, which makes sense because they provide data for training these models. They mention specific datasets for vision, audio, and NLP. The Spaces section might be about collaborative workspaces or projects, allowing users to share their work.

I notice sections like Gradio and Huggingface.js, which seem to be tools for building web applications using Python. This indicates that Hugging Face offers both backend services (
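
The DeepSeek output above begins with a `<think>` block holding the model's reasoning ahead of the actual summary. If only the summary is wanted, that block could be removed with a regular expression, as in this sketch. It assumes the tags are spelled exactly `<think>` and `</think>` and that the closing tag is present; a response cut off before `</think>` is left unchanged. The helper name `strip_thinking` and the sample string are made up for illustration.

```python
import re

# Matches a <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thinking(text):
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


# Invented sample response in the same shape as the output above.
raw = "<think>\nOkay, I need to figure out what this page is.\n</think>\n\nHugging Face hosts ML docs."
print(strip_thinking(raw))
```

Applied to the notebook's flow, this would be called on `summary` before printing or rendering it.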