# GitHub Portfolio Summarizer

This notebook builds a tool that summarizes <b>GitHub</b> profiles so recruiters can quickly assess a candidate’s technical expertise, interests, and contributions.

## How to Run This GitHub Profile Summarizer

### 1. Install Required Libraries -- <i>openai beautifulsoup4 requests python-dotenv</i>
### 2. Set Your OpenAI API Key in a <i>.env</i> file
### 3. Run All Cells (Top to Bottom)
### 4. Run <i>stream_portfolio(your name, your github link)</i> with your real name and GitHub profile URL
### 5. You’ll see a live-streamed Markdown portfolio of the GitHub user

In [1]:
# Importing all the required dependencies

import os
import requests
import json
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# Initializing and declaring constants

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key found!")
else:
    print("There might be a problem with your API key! Please recheck it and try again.")

MODEL = "gpt-4o-mini"
openai = OpenAI(api_key = api_key)

API key found!


### A class to represent a GitHub profile

In [3]:
# Some websites need you to use proper headers when fetching them

headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

In [4]:
class GithubProfile:
    """
    This class scrapes a GitHub profile page, extracts the visible text and links, removes clutter, and makes the content easy to pass into something like the OpenAI API for summarization.
    """
    
    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers = headers, timeout = 10) # Timeout (seconds) so a stalled request cannot hang the notebook
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"

        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator = "\n", strip = True)
        else:
            self.text = ""
        links = [link.get("href") for link in soup.find_all("a")]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"        

### Use a call to gpt-4o-mini to read the links on the GitHub profile and respond in structured JSON
We will use one-shot prompting: the prompt includes a single example of how the model should respond.

In [5]:
link_system_prompt = "You are provided with a list of links found on a Github page. \
You are able to decide which of the links would be most relevant to include in a portfolio about the github user, \
such as links to an About page, or a Repositories, or Projects.\n"

link_system_prompt += "You should respond in JSON as in this example:"

link_system_prompt += """
{
    "links": [
        {"type": "overview page", "url": "https://another.full.url"},
        {"type": "repositories page", "url": "https://another.full.url?tab=repositories"}
    ]
}
"""

In [6]:
print(link_system_prompt)

You are provided with a list of links found on a Github page. You are able to decide which of the links would be most relevant to include in a portfolio about the github user, such as links to an About page, or a Repositories, or Projects.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "overview page", "url": "https://another.full.url"},
        {"type": "repositories page", "url": "https://another.full.url?tab=repositories"}
    ]
}
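
One-shot prompting puts a single worked example of the expected reply inside the system prompt. The following is a minimal sketch, assuming you would rather generate that example from a Python dict so it always parses as JSON. `EXAMPLE_REPLY` and `build_link_system_prompt` are hypothetical names that are not part of this notebook, and the instruction text is a shortened paraphrase of the prompt above.

```python
import json

# Hypothetical example reply; the URLs are placeholders, as in the prompt above.
EXAMPLE_REPLY = {
    "links": [
        {"type": "overview page", "url": "https://another.full.url"},
        {"type": "repositories page", "url": "https://another.full.url?tab=repositories"},
    ]
}


def build_link_system_prompt(example=EXAMPLE_REPLY):
    """Return a system prompt that ends with one valid JSON example (one-shot)."""
    instructions = (
        "You are provided with a list of links found on a GitHub page. "
        "Decide which links are most relevant to a portfolio about the user.\n"
        "You should respond in JSON as in this example:\n"
    )
    # json.dumps guarantees that the embedded example itself parses as JSON.
    return instructions + json.dumps(example, indent=4)


prompt = build_link_system_prompt()
```

Building the example with `json.dumps` removes the risk of a hand-typed example that is not valid JSON, which the model might otherwise imitate.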



In [7]:
def get_links_user_prompt(profile):
    user_prompt = f"Here is the list of links on the website of {profile.url} - "
    user_prompt += "please decide which of these are relevant web links for a portfolio about the user, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, Login, Instagram, Stars, Following, Followers, LinkedIn, Packages, Blog or Github trending related pages.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(profile.links)
    
    return user_prompt

In [8]:
def get_links(url):
    profile = GithubProfile(url)
    response = openai.chat.completions.create(
        model = MODEL,
        messages = [
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(profile)}
        ], 
        response_format = {"type": "json_object"}
    )

    result = response.choices[0].message.content
    return json.loads(result)
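
Even with a JSON response format, the reply may not have the exact shape shown in the one-shot example. The sketch below, under the assumption that a missing or malformed entry should simply be skipped, checks the parsed reply before it is used. `parse_link_reply` is a hypothetical helper, not part of this notebook.

```python
import json


def parse_link_reply(raw_reply):
    """Parse the model's JSON reply and keep only well-formed link entries."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return {"links": []}  # Treat unparseable replies as "no links found"

    if not isinstance(data, dict):
        return {"links": []}

    links = data.get("links", [])
    if not isinstance(links, list):
        return {"links": []}

    # Keep entries that carry both a string "type" and a string "url".
    valid = [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("type"), str)
        and isinstance(link.get("url"), str)
    ]
    return {"links": valid}
```

A caller could pass `response.choices[0].message.content` to this helper in place of the bare `json.loads` call, so that later code can rely on every entry having `type` and `url`.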

### Making the portfolio

In [9]:
def get_all_details(url):
    """
    Fetches and compiles the textual content of a GitHub user's profile page and key sub-pages.

    This function:
    1. Extracts the main content from the landing GitHub profile page.
    2. Uses the `get_links` function to identify relevant sub-pages (e.g., Repositories, About).
    3. Fetches and appends the content from those sub-pages.
    4. Returns a structured string containing labeled sections of all gathered content.
    """
    
    result = "Landing page:\n"
    result += GithubProfile(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += GithubProfile(link["url"]).get_contents()
    return result
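
The scraped link list can contain relative links, while `GithubProfile(link["url"])` needs an absolute URL to fetch. A minimal sketch of normalizing links with the standard library before they are fetched follows. `resolve_links` is a hypothetical helper, and the `mailto:` address is a placeholder.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(base_url, links):
    """Turn relative hrefs into absolute http(s) URLs, dropping duplicates and non-web links."""
    resolved = []
    seen = set()
    for href in links:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # Skip mailto:, javascript:, and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved


example = resolve_links(
    "https://github.com/BenGJ10",
    [
        "/BenGJ10/LockX",
        "?tab=repositories",
        "mailto:someone@example.com",
        "https://github.com/BenGJ10/LockX",
    ],
)
```

Resolving links up front also shortens the list sent to the model, since duplicates and non-web links are removed before the prompt is built.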

In [10]:
system_prompt = """
You are an assistant that analyzes the contents of multiple relevant pages from a personal GitHub profile. 
Your task is to create a short, professional markdown-based portfolio summary of the user, focusing especially on:

- The user's profile overview (if available)
- Notable repositories and projects
- Summaries of key repositories based on README files
- Any other relevant information for recruiters or potential investors

Respond in markdown format, using headings, bullet points, and clean structure.

Be concise but informative. Assume the reader is a technical recruiter or investor looking to quickly understand the user's strengths and work.
"""

In [11]:
def get_portfolio_user_prompt(profile_name, url):
    """ Builds a user-specific prompt for OpenAI to generate a GitHub portfolio. """

    user_prompt = f"You are looking at a GitHub user called **{profile_name}**.\n\n"
    user_prompt += "Here are the contents of their landing page and other relevant pages; use this information to build a short portfolio of the user in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    
    return user_prompt
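
A fixed character slice can cut the prompt in the middle of a word or a line. The sketch below shows one possible alternative, assuming it is acceptable to drop the last partial line: it trims back to the final complete line within the limit. `truncate_at_line` is a hypothetical helper, not part of this notebook.

```python
def truncate_at_line(text, limit=5_000):
    """Cut text to at most `limit` characters, preferring to end on a complete line."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut when the kept part has no usable line break.
    return cut if last_newline <= 0 else cut[:last_newline]


sample = truncate_at_line("abc\ndef\nghi", limit=9)
```

In this sample the hard cut would keep `"abc\ndef\ng"`, and trimming to the last line break leaves `"abc\ndef"`.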

In [16]:
def stream_portfolio(profile_name, url):
    """
    Creates and streams a markdown-formatted portfolio summary for a GitHub user using the OpenAI API
    and displays it live in a Jupyter notebook.
    """
    stream = openai.chat.completions.create(
        model = MODEL,
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_portfolio_user_prompt(profile_name, url)}
        ], 
        stream = True
    )

    full_response = ""
    display_handle = display(Markdown(""), display_id = True)

    for chunk in stream:
        content = chunk.choices[0].delta.content or ''
        full_response += content
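        # Strip code fences (and the "markdown" tag on an opening fence) without removing the word elsewhere
        clean_response = full_response.replace("```markdown", "").replace("```", "")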
        update_display(Markdown(clean_response), display_id = display_handle.display_id)
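
The accumulate-and-clean loop can be tried without an API key. The following is a minimal sketch, assuming the model wraps its whole reply in one outer code fence: it strips only that outer fence, so ordinary words in the body are left alone. `strip_outer_fence`, `accumulate`, and `fake_stream` are hypothetical names, and the fake stream stands in for the text deltas of a real response.

```python
FENCE = "`" * 3  # Three backticks, built indirectly so this example is easy to embed


def strip_outer_fence(text):
    """Remove a wrapping fence (with or without a markdown tag) but keep the body text."""
    cleaned = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if cleaned.startswith(opener):
            cleaned = cleaned[len(opener):]
            break
    if cleaned.endswith(FENCE):
        cleaned = cleaned[:-len(FENCE)]
    return cleaned.strip()


def accumulate(chunks):
    """Yield the cleaned text after each streamed piece, as a live display would show it."""
    full_response = ""
    for piece in chunks:
        full_response += piece or ""  # Streamed deltas can be None
        yield strip_outer_fence(full_response)


# Hypothetical stream of text deltas standing in for the API response.
fake_stream = [FENCE + "mark", "down\n# Ben\n", "Writes markdown docs.", None, "\n" + FENCE]
snapshots = list(accumulate(fake_stream))
```

Each snapshot is what the notebook would render at that point in the stream. The final one keeps the word "markdown" inside the sentence while the surrounding fence is gone.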

In [17]:
stream_portfolio("Ben Gregory John", "https://github.com/BenGJ10")

Found links: {'links': [{'type': 'overview page', 'url': 'https://github.com/BenGJ10'}, {'type': 'repositories page', 'url': 'https://github.com/BenGJ10?tab=repositories'}, {'type': 'projects page', 'url': 'https://github.com/BenGJ10?tab=projects'}, {'type': 'specific repository', 'url': 'https://github.com/BenGJ10/Network-Security-System'}, {'type': 'specific repository', 'url': 'https://github.com/BenGJ10/Rainfall-Prediction-Model-Deployment'}, {'type': 'specific repository', 'url': 'https://github.com/BenGJ10/Student-Performance-End-to-End-ML-Project'}, {'type': 'specific repository', 'url': 'https://github.com/BenGJ10/LockX'}, {'type': 'specific repository', 'url': 'https://github.com/BenGJ10/Machine-Learning-Bootcamp'}, {'type': 'specific repository', 'url': 'https://github.com/BenGJ10/Machine-Learning-Projects'}]}


# Portfolio Summary of Ben Gregory John (BenGJ10)

## Profile Overview
- **Name:** Ben Gregory John
- **Username:** BenGJ10
- **Education:** Pursuing a Bachelor's degree in Computer Science Engineering
- **Location:** Mavelikara, Kerala, India
- **Interests:** Aspiring Data Scientist with a strong interest in MLOps
- **LinkedIn:** [in/bengj10](https://www.linkedin.com/in/bengj10)
- **Instagram:** [bengj10](https://www.instagram.com/bengj10)
- **Followers:** 5 | **Following:** 10
- **Stars on Repositories:** 20

## Notable Repositories and Projects
Here are some highlighted repositories showcasing Ben's skills in machine learning and software development:

1. **[Network-Security-System](https://github.com/BenGJ10/Network-Security-System)**
   - **Description:** An end-to-end MLOps project focused on building a network security system. Features include ETL pipelines.
   - **Language:** Python

2. **[Rainfall-Prediction-Model-Deployment](https://github.com/BenGJ10/Rainfall-Prediction-Model-Deployment)**
   - **Description:** Utilizes a Random Forest model to predict rainfall based on various atmospheric parameters. Deployment forthcoming.
   - **Language:** Jupyter Notebook

3. **[Student-Performance-End-to-End-ML-Project](https://github.com/BenGJ10/Student-Performance-End-to-End-ML-Project)**
   - **Description:** A basic end-to-end ML project featuring deployment with AWS ECR and Docker.
   - **Language:** Jupyter Notebook

4. **[LockX](https://github.com/BenGJ10/LockX)**
   - **Description:** A project focused on automated deadlock detection tools in operating systems.
   - **Language:** JavaScript

5. **[Machine-Learning-Bootcamp](https://github.com/BenGJ10/Machine-Learning-Bootcamp)**
   - **Description:** Repository containing files and notes from the Complete ML-NLP-MLOps Bootcamp by Krish Naik.
   - **Language:** Jupyter Notebook

6. **[Machine-Learning-Projects](https://github.com/BenGJ10/Machine-Learning-Projects)**
   - **Description:** A collection repository for various machine learning projects that Ben has worked on.
   - **Language:** Jupyter Notebook

## Technical Skills
- Proficient in **Python** and **JavaScript**
- Experienced with **Jupyter Notebook**, **AWS**, and **Docker**
- Familiar with MLOps practices and machine learning models, especially Random Forest

## Summary
Ben Gregory John is a motivated aspiring data scientist and developer with foundational experience in machine learning and software development. He demonstrates initiative through the creation of diverse projects, particularly in MLOps and network security. His academic background in computer science positions him to make impactful contributions in tech-driven fields, appealing to tech recruiters and investors alike.