# Day 2 - Exercise 1 

*Local Web Scraping and Summarization Exercise*

### Objective

This Python notebook does the same GitHub README scraping, but sends the text to a local Ollama LLM and varies the prompt persona, from a happy developer to a snarky superhero.


In [None]:
from openai import OpenAI

# Use a local Ollama model through its OpenAI-compatible endpoint.
# The api_key below is a placeholder: the client requires a value, but no OpenAI key is needed.
OLLAMA_BASE_URL = "http://localhost:11434/v1"
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')
clientModel = "gemma3:4b"
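
Before sending prompts, it can help to confirm that the local server is running and that the model has been pulled. The sketch below is one possible check, assuming Ollama's native `/api/tags` endpoint on the default host; `list_local_models` and `model_names` are hypothetical helper names, not part of the notebook.

```python
import json
import urllib.request


def model_names(payload: dict) -> list[str]:
    """Pull the model names out of an /api/tags style payload."""
    return [entry.get("name", "") for entry in payload.get("models", [])]


def list_local_models(host: str = "http://localhost:11434", timeout: float = 3.0) -> list[str]:
    """Return the names of locally available models, or [] if the server is unreachable.

    Assumption: `host` is the Ollama root URL (without the /v1 suffix).
    """
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return []
    return model_names(payload)
```

If `"gemma3:4b" in list_local_models()` is false, the model probably needs to be pulled first (`ollama pull gemma3:4b`), or the server is not running.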



### GitHub README Fetching and Display Utilities

This section defines helper functions to extract repository information, fetch README files from GitHub, and display summaries in Jupyter notebooks.

In [None]:
import requests
from urllib.parse import urlparse
from IPython.display import Markdown, display

# Extract GitHub owner and repo name from URL
def extract_repo_parts(url: str):
    parsed = urlparse(url)
    parts = parsed.path.strip('/').split('/')
    if len(parts) < 2:
        raise ValueError(f'Invalid GitHub repo URL: {url}')
    return parts[0], parts[1]

# Fetch README.md from main or master branch of GitHub repo
def fetch_readme(owner: str, repo: str):
    branches = ['main', 'master']
    for branch in branches:
        raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        # A timeout keeps the notebook from hanging on a stalled connection
        r = requests.get(raw_url, timeout=10)
        if r.status_code == 200:
            return r.text
    raise FileNotFoundError(f'README.md not found on main or master branch for {owner}/{repo}')

# Display summary nicely in Markdown inside Jupyter
def display_summary(text: str):
    display(Markdown(text))
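
URLs copied from a browser or a clone command often carry a `.git` suffix or extra path segments. The following is a minimal sketch of how the parsing step could handle those cases; `normalize_repo_parts` and `raw_readme_urls` are hypothetical names introduced here for illustration, and the notebook's own `extract_repo_parts` is unchanged.

```python
from urllib.parse import urlparse


def normalize_repo_parts(url: str) -> tuple[str, str]:
    """Hypothetical variant of extract_repo_parts that also drops a '.git' suffix."""
    parts = urlparse(url.strip()).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Invalid GitHub repo URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def raw_readme_urls(owner: str, repo: str, branches=("main", "master")) -> list[str]:
    """Build the raw README URLs that would be tried, in order."""
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        for branch in branches
    ]


owner, repo = normalize_repo_parts("https://github.com/psf/requests.git")
candidates = raw_readme_urls(owner, repo)
```

Extra path segments such as `/tree/main` are ignored, because only the first two segments are used.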



### Define the prompts for the LLM

- **System Prompt:** Sets the role and behavior of the model, here a snarky, egotistical software engineer with a superhero attitude.
- **User Prompt:** Contains the README content and requests a summary.

Separating the system and user prompts lets us change the tone and style without changing the content being summarized.

In [None]:
# Define system prompt (role, tone, behavior)
system_prompt = """
You are a snarky, egotistical, my-way-or-the-highway software engineer who analyzes the contents of GitHub repositories
and provides a superhero-style, slightly arrogant summary.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

# Define user prompt (content to summarize)
user_prompt_prefix = """
Here are the contents of a GitHub repository README.
Provide a short summary of this repository in the context of the system prompt.
"""
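
To vary the tone between runs, the system prompt can be chosen from a small table while the user prompt stays the same. This is a sketch only: the `PERSONAS` table, the wording of the "happy" persona, and `build_messages` are assumptions made for illustration, not the notebook's own definitions.

```python
# Hypothetical persona table; the "happy" wording is invented for this example.
PERSONAS = {
    "happy": "You are a cheerful, encouraging software engineer who analyzes GitHub repositories.",
    "snarky": "You are a snarky, egotistical software engineer who analyzes GitHub repositories.",
}

MARKDOWN_RULE = " Respond in markdown. Do not wrap the markdown in a code block."


def build_messages(persona: str, readme_text: str, user_prefix: str) -> list[dict]:
    """Assemble the chat messages: persona-specific system prompt plus shared user prompt."""
    return [
        {"role": "system", "content": PERSONAS[persona] + MARKDOWN_RULE},
        {"role": "user", "content": f"{user_prefix}\n\n{readme_text}"},
    ]


messages = build_messages("happy", "# Demo\nA tiny project.", "Summarize this README.")
```

Only the first message changes when the persona changes, which makes it easy to compare how the same README is summarized under different tones.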


### This function sends the README content to the local Ollama model, through its OpenAI-compatible API, along with the prompts:

1. Combines the user prompt with README text.
2. Calls `ollama.chat.completions.create` with system and user prompts.
3. Displays the summary in Markdown under the repository name.


In [None]:
def summarize_readme(readme_text: str, repo_name: str):
    user_prompt = user_prompt_prefix + "\n\n" + readme_text

    response = ollama.chat.completions.create(
        model=clientModel,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # extract the assistant response content
    summary = response.choices[0].message.content

    display_summary(f"### {repo_name}\n{summary}")
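
The request and response shapes can be exercised without a running model by passing in a stand-in client. The sketch below assumes a hypothetical `FakeChatClient` that mimics only the `chat.completions.create` call used above, and a hypothetical `summarize_with` that returns the summary instead of displaying it.

```python
from types import SimpleNamespace


class FakeChatClient:
    """Hypothetical stand-in that mimics the `chat.completions.create` call shape."""

    def __init__(self, reply: str):
        self._reply = reply
        self.calls = []  # records every request for later inspection
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages):
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def summarize_with(client, model, system_prompt, user_prompt_prefix, readme_text):
    """Same steps as summarize_readme, but returns the text instead of displaying it."""
    user_prompt = user_prompt_prefix + "\n\n" + readme_text
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content


fake = FakeChatClient("**Stub summary**")
result = summarize_with(fake, "gemma3:4b", "Be brief.", "Summarize:", "# Demo")
```

Passing the real `ollama` client in place of `fake` would send the same messages to the local model.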

### Load a list of GitHub repository URLs from `repos.txt`.

For each URL:
* Extract owner/repo.
* Fetch README.
* Generate and display summary.

In [None]:
# Read repo URLs from repos.txt
with open("repos.txt", "r") as f:
    # Skip blank lines and comment lines (including comments with leading whitespace)
    repo_urls = [line.strip() for line in f if line.strip() and not line.strip().startswith("#")]

# Fetch and summarize each README
for repo_url in repo_urls:
    try:
        owner, repo = extract_repo_parts(repo_url)
        readme_text = fetch_readme(owner, repo)
        repo_name = f"{owner}/{repo}"
        summarize_readme(readme_text, repo_name)
    except Exception as e:
        display_summary(f"**Error processing {repo_url}: {e}**")
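
The loader implies a simple format for `repos.txt`: one repository URL per line, with blank lines and `#` comments ignored. The following sketch shows that parsing rule on an in-memory sample; `parse_repo_lines` is a hypothetical helper, and the two URLs are just example entries.

```python
def parse_repo_lines(lines):
    """Return the repo URLs from an iterable of lines, skipping blanks and '#' comments."""
    urls = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        urls.append(entry)
    return urls


# Example file contents (illustrative only)
sample = """\
# Repos to summarize
https://github.com/psf/requests

  # an indented comment is skipped too
https://github.com/pallets/flask
"""

repo_urls_demo = parse_repo_lines(sample.splitlines())
```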