# Day 1 - Exercise 1 

*Web Scraping and Summarization Exercise*

### Objective

This notebook scrapes the GitHub repositories listed in `repos.txt`, fetches each repository's `README.md`, and summarizes it with an LLM. System and user prompts guide the summaries. It is a different take on screen-scraping websites: the general idea is to have an LLM summarize a code repository, currently limited to the repositories defined in `repos.txt`. The approach could be extended into a search tool that finds Git repositories on a topic (e.g. LLMs) and returns summarized information from the README, CONTRIBUTORS, and other metadata. It's only day 1, though, so I won't get too carried away.


In [1]:
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI()

### GitHub README Fetching and Display Utilities

This section defines helper functions to extract repository information, fetch README files from GitHub, and display summaries in Jupyter notebooks.

In [2]:
import requests
from urllib.parse import urlparse
from IPython.display import Markdown, display

# Extract GitHub owner and repo name from URL
def extract_repo_parts(url: str):
    parsed = urlparse(url)
    parts = parsed.path.strip('/').split('/')
    if len(parts) < 2:
        raise ValueError(f'Invalid GitHub repo URL: {url}')
    return parts[0], parts[1]

# Fetch README.md from main or master branch of GitHub repo
def fetch_readme(owner: str, repo: str):
    branches = ['main', 'master']
    for branch in branches:
        raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        # A timeout keeps the loop from hanging on an unresponsive request
        r = requests.get(raw_url, timeout=10)
        if r.status_code == 200:
            return r.text
    raise FileNotFoundError(f'README.md not found on main or master branch for {owner}/{repo}')

# Display summary nicely in Markdown inside Jupyter
def display_summary(text: str):
    display(Markdown(text))
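
As a quick offline check of the URL handling, here is a minimal sketch. The names `parse_repo_url` and `candidate_readme_urls` are hypothetical and not part of the notebook. The sketch assumes entries in `repos.txt` may carry a `.git` suffix or extra path segments such as `/tree/main`, which the helper above would pass through unchanged.

```python
from urllib.parse import urlparse


def parse_repo_url(url: str) -> tuple[str, str]:
    """Return (owner, repo), tolerating a '.git' suffix or extra path segments."""
    # Drop empty segments so trailing or doubled slashes do not matter
    parts = [p for p in urlparse(url.strip()).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Invalid GitHub repo URL: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def candidate_readme_urls(owner: str, repo: str, branches=("main", "master")) -> list[str]:
    """Build the raw.githubusercontent.com URLs to try, in order."""
    return [
        f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"
        for branch in branches
    ]


owner, repo = parse_repo_url("https://github.com/openai/openai-cookbook.git")
urls = candidate_readme_urls(owner, repo)
print(owner, repo)
print(urls[0])
```

Keeping URL construction separate from the network call makes the parsing logic easy to test without touching GitHub.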



### Define the prompts for the LLM

- **System Prompt:** Sets the role and behavior of the model as a "philosophical, humorous software engineer".
- **User Prompt:** Contains the README content and requests a summary.

Using system + user prompts allows us to control tone and style.
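
To make the split concrete, the sketch below shows how the two prompts could be assembled into role-tagged messages before being sent to the model. The helper `build_input` and its `max_chars` cap are assumptions added for illustration (the notebook sends the full README), and the short prompt strings are placeholders rather than the prompts used in this exercise.

```python
def build_input(system_prompt: str, user_prompt_prefix: str, readme_text: str,
                max_chars: int = 20_000) -> list[dict]:
    """Assemble role-tagged messages; max_chars is a hypothetical cap on README length."""
    # Truncate very long READMEs so the request stays a manageable size
    readme_text = readme_text[:max_chars]
    return [
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user", "content": user_prompt_prefix.strip() + "\n\n" + readme_text},
    ]


messages = build_input(
    "You are a humorous software engineer. Respond in markdown.",
    "Summarize this README.",
    "# Demo\nA tiny example project.",
)
for m in messages:
    print(m["role"], "->", len(m["content"]), "chars")
```

The system message carries the persona and output format, while the user message carries the request plus the content to summarize, so the tone can be changed without touching the README handling.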

In [3]:
# Define system prompt (role, tone, behavior)
system_prompt = """
You are a philosophical new age software engineer that analyzes the contents of GitHub repositories,
and provides a critical thinking, open-minded and humorous summary.
Respond in markdown. Do not wrap the markdown in a code block - respond just with the markdown.
"""

# Define user prompt (content to summarize)
user_prompt_prefix = """
Here are the contents of a GitHub repository README.
Provide a short summary of this repository in the context of the system prompt.
"""


### This function sends the README content to the OpenAI API along with the prompts:

1. Combines the user prompt with README text.
2. Calls `client.responses.create` with system and user prompts.
3. Displays the summary in Markdown under the repository name.


In [4]:
# Summarize a README using the system and user prompts with OpenAI
def summarize_readme(readme_text: str, repo_name: str):
    
    # Combine user prompt prefix with the README content
    user_prompt = user_prompt_prefix + "\n\n" + readme_text

    # Call OpenAI Responses API
    response = client.responses.create(
        model="gpt-4.1-mini",
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )

    # Display summary in Markdown
    summary = response.output_text
    display_summary(f"### {repo_name}\n{summary}")
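
One possible variation, sketched under assumptions: return the summary text instead of displaying it, and pass the client in as a parameter so the function can be dry-run without spending API credits. `generate_summary` and `StubResponses` are hypothetical names. The stub only mimics the two things this notebook relies on, `responses.create(...)` and `output_text`.

```python
from types import SimpleNamespace


def generate_summary(client, system_prompt: str, user_prompt: str,
                     model: str = "gpt-4.1-mini") -> str:
    """Call the Responses API and return the text instead of displaying it."""
    response = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.output_text


class StubResponses:
    """Hypothetical offline stand-in for client.responses; records calls, returns canned text."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return SimpleNamespace(output_text="## Stub summary\nNothing was sent to the API.")


# Dry run: no network access and no API key needed
stub_client = SimpleNamespace(responses=StubResponses())
summary = generate_summary(stub_client, "Be brief.", "Summarize: # Demo README")
print(summary)
```

With a real `OpenAI()` client passed in place of the stub, the same function would make the live call, and the returned string could then be handed to `display_summary`.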

### Load a list of GitHub repository URLs from `repos.txt`.

For each URL:
* Extract owner/repo.
* Fetch README.
* Generate and display summary.
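
The contents of `repos.txt` are not shown in this notebook, so the sample below is an assumption about its format: one URL per line, with blank lines and `#` comments ignored. The helper name `parse_repo_lines` is hypothetical. This is a small sketch of that filtering step.

```python
def parse_repo_lines(lines) -> list[str]:
    """Keep non-empty lines that are not comments, stripping surrounding whitespace."""
    urls = []
    for line in lines:
        entry = line.strip()
        # Strip first so indented comments are skipped too
        if entry and not entry.startswith("#"):
            urls.append(entry)
    return urls


# Sample contents; the real repos.txt may differ
sample = [
    "# course material\n",
    "https://github.com/ed-donner/llm_engineering\n",
    "\n",
    "   # indented comment\n",
    "https://github.com/openai/openai-cookbook\n",
]
print(parse_repo_lines(sample))
```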

In [5]:
# Read repo URLs from repos.txt
with open("repos.txt", "r") as f:
    # Strip before the comment check so indented "#" lines are skipped too
    repo_urls = [line.strip() for line in f if line.strip() and not line.strip().startswith("#")]

# Fetch and summarize each README
for repo_url in repo_urls:
    try:
        owner, repo = extract_repo_parts(repo_url)
        readme_text = fetch_readme(owner, repo)
        repo_name = f"{owner}/{repo}"
        summarize_readme(readme_text, repo_name)
    except Exception as e:
        display_summary(f"**Error processing {repo_url}: {e}**")


### ed-donner/llm_engineering
# Repository Summary: LLM Engineering - The Joyful Path to Mastering AI

Welcome, fellow digital wanderer, to a thoughtfully curated 8-week voyage into the profound (and sometimes mystifying) world of Large Language Models (LLMs) and AI engineering! This repository serves as the heartbeat of a Udemy course designed by Edward Donner, blending practical project-based learning with the kind of humor, care, and clarity that makes complex tech feel like a friendly campfire chat.

---

## What’s inside this cosmic toolkit?

- **A Progressive Curriculum**: Each "week" folder is an educational stepping stone, guiding you from the basics of installation (hello Ollama and Llama 3.2!) to a mind-expanding autonomous Agentic AI project by Week 8. The learning path respects the sacred truth: *you learn best by doing*.
  
- **Dual Versions for the Pilgrims**: A graceful embrace of change: you can choose between the original Anaconda-powered setup or the shiny new uv-based environment with updated models and techniques. The author understands transformation isn’t always smooth; flexibility is offered like a yogi lends a hand during a tricky pose.
  
- **API Economy & Cloud Ascension**: Insightful tips to keep your API spending down (because enlightenment should not come with a hidden subscription fee), alongside Google Colab notebooks for GPU-powered ascents in the sky of computational prowess.

- **Community and Contribution**: The repository invites you not just to consume knowledge, but to become part of a community. Submit your code like gifts, earn GitHub recognition, and connect with the author on LinkedIn or Twitter, because learning is at its deepest when shared.

- **Mindful Warnings and Tips**: Gentle nudges not to use Llama 3.3 on home machines and practical advice on environment setup are sprinkled throughout, showcasing wisdom that only an experienced guide can offer.

---

## The Cosmic Takeaway

This repo is less just code, more an invitation to embark on a transformative, hands-on exploration of AI’s frontier. It’s for the curious engineer who sees behind the lines of code to the larger picture: a harmonious dance between human creativity and artificial intelligence. Here, mastering LLMs isn't about rote memorization but about *joy, challenge, and discovery.*

So, if you have a passion for AI, a curious mind, and a sense of humor when things don’t go exactly as planned (looking at you, API quirks and model sizes), this course repository might just be your North Star.

And remember: sometimes, the journey teaches us more than the destination, and with Edward Donner as your guide, laughter and learning go hand-in-hand on this 8-week odyssey into LLM Engineering.

---

*Keep exploring. Keep questioning. And most importantly, keep having fun.* 🌌✨

### openai/openai-cookbook
# OpenAI Cookbook: A Philosophical Guide to AI Alchemy 🧙‍♂️✨

Behold, a modern grimoire for the digital age! This repository serves as a cosmic cookbook, blending the arcane magic of OpenAI’s API with the approachable language of Python and beyond. It’s less about secret spells and more about liberating you from the mythic labyrinth of API calls, so you can focus on creating wonders.

In this recipe book, each example is like a little philosophical parable: simple yet profound, inviting you to experiment with OpenAI’s offerings as if stirring potions in a cauldron of creativity. All you need is the sacred key (your API key) to unlock this library of knowledge. And remember, whether you’re a digital sorcerer wielding Python or a multi-lingual magus, the essence of these recipes transcends syntax.

Ultimately, this repository reminds us that technology is not just code; it’s a playground for curious minds seeking to decode the mysteries of intelligence, language, and connection. So, dive in, explore, and may your code be ever poetic and transformative. 🌌💻🧩