<a href="https://colab.research.google.com/github/MohHasan1/ScrapeSummarize/blob/main/ScrapeSummarize.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# imports

import os
import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [3]:
OPENAPI_KEY = "your-openai-api-key"  # paste your OpenAI API key here

In [4]:
# Basic format checks only; they do not confirm that the key actually works.
# Whitespace is checked before the prefix so a padded key gets the right message.
if not OPENAPI_KEY:
    print("No API key found. Please add the OPENAPI_KEY above.")
elif OPENAPI_KEY.strip() != OPENAPI_KEY:
    print("API key has leading/trailing spaces. Remove them.")
elif not OPENAPI_KEY.startswith("sk-proj-"):
    print("Invalid API key format. Ensure it starts with 'sk-proj-'.")
else:
    print("API key is valid!")


API key is valid!
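
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the key could be read from an environment variable instead. The helper name `load_api_key` is hypothetical, and `OPENAI_API_KEY` is assumed here as the variable name because it is the one the OpenAI Python client conventionally looks for.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in the given environment variable, or None if unset or blank."""
    value = os.environ.get(env_var, "").strip()
    return value or None

# Hypothetical usage: report whether a key is available without printing it.
key = load_api_key()
print("Key found." if key else "No key set in the environment.")
```

The returned value could then be assigned to `OPENAPI_KEY` so the rest of the notebook runs unchanged.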


In [6]:
# Connect to OpenAI

openai = OpenAI(api_key=OPENAPI_KEY)

In [7]:
# Test the connection.

message = "Hello, GPT! How are you?"
response = openai.chat.completions.create(model="gpt-4o-mini", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?


# **Types of prompts**

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

# Messages

The OpenAI API expects to receive messages in a particular structure.
Many other LLM APIs share this structure:



```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```
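
The structure above can be assembled with a small helper. This is only an illustrative sketch: the function name `build_messages` and the two example prompts are assumptions, not part of the notebook's later code.

```python
def build_messages(system_text, user_text):
    """Return a two-message conversation: one system message, then one user message."""
    if not system_text or not user_text:
        raise ValueError("Both prompts must be non-empty strings.")
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]

# Hypothetical prompts, used only to show the resulting shape.
messages = build_messages(
    "You are a concise assistant. Respond in markdown.",
    "Summarize the plot of Hamlet in two sentences.",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

The resulting list is what gets passed as the `messages` argument of `openai.chat.completions.create`.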

# **Business Problem**

Summarize any webpage

In [19]:
# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        self.title = None
        self.text = None

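        # Get the website; a timeout avoids hanging forever, and
        # raise_for_status() stops us from summarizing an HTTP error page
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        # Parse the website
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title and soup.title.string else "No title found"
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body>; fall back to empty text
            self.text = ""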
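
To see what the clean-up step in `Website` is doing, here is a rough sketch of the same idea without a network call. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup, so it is not the notebook's method. The names `TextExtractor`, `visible_text` and `sample_html` are made up for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def visible_text(html):
    """Return the readable text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

# Hypothetical page fragment: the style and script contents should disappear.
sample_html = "<body><h1>Hello</h1><style>p {color: red}</style><script>var x = 1;</script><p>World</p></body>"
print(visible_text(sample_html))  # prints Hello and World on separate lines
```

BeautifulSoup's `decompose()` plus `get_text(separator="\n", strip=True)` gives a comparable result with far less code and more tolerant parsing, which is why the class above relies on it.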

In [None]:
# Test the website class.

hasanOS = Website("https://hasan-swe.dev/portfolio/user-profile")
print(hasanOS.title)
print(hasanOS.text)

# **Summarizer**

In [22]:
# Define our system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [21]:
# A function that builds the user prompt asking for a summary of a website:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
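    # Adjacent string literals avoid the stray indentation that backslash
    # continuations inside a string would embed in the prompt.
    user_prompt += (
        "\nThe contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )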

    user_prompt += website.text

    return user_prompt

In [None]:
print(user_prompt_for(hasanOS))

In [27]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [None]:
print(messages_for(hasanOS))

In [29]:
# And now: call the OpenAI API.

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content
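
The access path `response.choices[0].message.content` can be tried out without calling the API. The sketch below builds a stand-in object with the same attribute layout. `extract_reply` and `fake_response` are hypothetical names, and the fake mirrors only the fields used here, not the full response type.

```python
from types import SimpleNamespace

def extract_reply(response):
    """Return the text of the first choice in a chat completion response."""
    return response.choices[0].message.content

# Hypothetical stand-in: a list of choices, each holding a message with content.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="# Demo summary")
        )
    ]
)
print(extract_reply(fake_response))  # prints: # Demo summary
```

A stand-in like this makes it possible to test the code around `summarize` without spending API credits.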

In [31]:
summarize("https://hasan-swe.dev/portfolio/user-profile")

"# HasanOS Summary\n\n**About Me**  \nHasan is a Software Engineering student at Seneca Polytechnic, who discovered a passion for programming through pseudocode during an IGCSE Computer Science course. He enjoys the logical process of programming and is motivated by the positive impact his work has on others. \n\n**Career Goals**  \nHasan is focused on web development, with experience in front-end technologies like React and Next.js, as well as back-end frameworks such as Node.js, Django, and Spring. He has practical experience through a summer internship in full-stack development, and is interested in expanding his knowledge in AI and machine learning.\n\n**Hobbies & Skills**  \nHasan enjoys gaming, particularly aiming for platinum trophies, and has a passion for sketching which allows him to express his creativity. He also has a fondness for cats, enjoying their company despite not owning one.\n\nThis website showcases Hasan's skills, interests, and professional journey as he navigat

In [33]:
# A function to display markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [34]:
display_summary("https://hasan-swe.dev/portfolio/user-profile")

# HasanOS Summary

**Introduction**  
The website showcases Hasan, a Software Engineering student at Seneca Polytechnic, who expresses a strong passion for programming and web development. Hasan's interest in technology began during his IGCSE Computer Science course when he was introduced to programming concepts.

**About Me**  
Hasan is motivated by the process of creating software, particularly enjoying feedback from others on his work. He takes pride in the positive impacts his projects have on people and organizations.

**Career Goals**  
He has experience in both front-end and back-end web development, using technologies like:
- **Front-End:** React and Next.js
- **Back-End:** Node.js, Django, Spring
- **Architecture:** Serverless and RESTful APIs
- **Design Pattern:** MVC (Model-View-Controller)

His recent internship in full-stack development has further enhanced his skills. Hasan is also developing an interest in AI and machine learning, looking forward to expounding these areas in his future studies.

**Hobbies & Skills**  
Hasan enjoys gaming, particularly aiming for platinum trophies on his PS4 and PS5. He is also passionate about sketching, using it as a creative outlet. Additionally, he has a fondness for cats, as expressed in a fun fact on the site.

**Random Cat Pictures**  
The site includes a section for random cat pictures, providing a light-hearted touch.

© 2025 Hasan. All rights reserved.

In [35]:
display_summary("https://hasan-swe.dev/portfolio/project-explorer")

# HasanOS

HasanOS is a personal portfolio website showcasing various programming projects. It serves as a platform to highlight both completed and ongoing work, emphasizing web-based and full-stack programming. The site also reflects an interest in integrating AI with web development.

## Projects Overview

### Hosted Projects
1. **HasanOS**: A personal portfolio and desktop environment (Ongoing).
2. **Algo-Maze**: An interactive maze generator and solver utilizing various algorithms (Completed).
3. **IgnoreGit**: An AI tool for generating `.gitignore` files (Completed).
4. **Byte-Link**: A fullstack application for sharing tech-related moments among developers (Completed).

### Not Hosted Projects
- **File-Share**: A privacy-focused file storage and sharing app.
- **YuTify**: A tool for transferring playlists between YouTube and Spotify.
- **MERN-MART**: A full-stack MERN application featuring user authentication and CRUD operations.
- **The-Ping-Pong-Game**: An arcade-style game developed with C++ using the Raylib library.

### AI Projects
- **Finger-Frenzy**: A game utilizing hand gestures built with OpenCV and Google MediaPipe.
- **Gesture Pong**: An interactive version of Pong using hand gestures, OpenCV, MediaPipe, and Pygame.

## Current Focus
The creator is delving into large language models (LLMs) and machine learning, looking to innovate by merging these technologies with web development.

Overall, HasanOS is a dynamic portfolio that reflects the creator's skills, projects, and ambition in both traditional programming and AI.

In [36]:
display_summary("https://hasan-swe.dev/portfolio/skill-set")

# HasanOS Summary

HasanOS showcases a diverse set of technical skills in web development, artificial intelligence (AI), and machine learning (ML). The website outlines the programming languages, web frameworks, web tools, AI tools, and other technologies that Hasan is proficient in.

## Technical Skills

- **Languages**: C, C++, C#, JavaScript, Python, Java, TypeScript, Go, PHP
- **Web Frameworks**: React, Angular, Next.js, Express.js, Django, Flask, Spring
- **Web Tools**:
  - UI Frameworks: Ant Design, ShadCN UI, Bootstrap, DaisyUI, Tailwind CSS
  - Animation and State Management: Framer Motion, Redux, Zustand
  - Testing and Data Fetching: Jest, Postman, Selenium, Axios, TanStack
  - Cloud Services and Hosting: Firebase, Appwrite, Netlify, Vercel
  - Databases: MySQL, PostgreSQL, MongoDB, Mongoose
- **AI Tools**: Langchain, OpenAI, Scikit-learn, Pandas, NumPy, TensorFlow, OpenCV, MediaPipe, Hugging Face, Pinecone

## Other Tools

- Version Control: Git
- Containerization: Docker
- CI/CD: GitHub Actions
- IDEs: VS Code, Eclipse
- Operating Systems: Linux (Ubuntu)
- Cloud Services: AWS, Azure

The site emphasizes the importance of evolving skills akin to coding, highlighting a commitment to continuous learning and problem-solving in technology. 

No news or announcements were found on the website.