# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API, and also Ollama, build a tool that takes a technical question
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [19]:
# imports

import os
import requests
from dotenv import load_dotenv
from IPython.display import Markdown, display
import openai
from openai import OpenAI  # client class used further down to talk to Ollama

In [20]:
# constants
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

# Check for stray whitespace first: a leading space would otherwise be reported as a wrong prefix
if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")



API key found and looks good so far!


In [21]:
# set up environment
MODEL_GPT = 'gpt-4o-mini'

system_prompt = "You are an assistant that answers technical or coding related questions."

# here is the question; type over this to ask something new
question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]


In [25]:
# Get gpt-4o-mini to answer

response = openai.chat.completions.create(model=MODEL_GPT, messages=messages)

def display_markdown(response):
    content = response.choices[0].message.content  # Extract the assistant's reply text
    display(Markdown(content)) 

display_markdown(response)

Let's break down the provided code snippet and understand its purpose and functionality. 

### Code Breakdown

```python
yield from {book.get("author") for book in books if book.get("author")}
```

#### Components Explained:

1. **Set Comprehension**: 
   - `{book.get("author") for book in books if book.get("author")}` 
   - This part of the code creates a **set** of authors by iterating through a collection called `books`. 
   - The `book.get("author")` expression tries to retrieve the value associated with the key `"author"` from each `book` dictionary.
   - The `if book.get("author")` condition ensures that only books with a non-`None` or non-falsy author value are included in the set. If `book.get("author")` returns `None` or an empty string (or any other false-like value), that book will be excluded from the set.

2. **Set Generation**: 
   - The use of curly braces `{}` indicates that a set is being created. A set is a collection of unique elements. Hence, the result will contain only unique author names (no duplicates).

3. **`yield from` Statement**: 
   - `yield from` is a construct used in Python to yield all values from an iterable. This enables the function to return each element produced by the set comprehension one at a time to the caller of the generator function.
   - It essentially turns this expression into a generator that produces authors one by one.

### Purpose 

The entire line of code is used within a generator function to yield the unique authors of books present in the `books` collection that have a valid author name. Here's why you might use this construct:

- **Memory efficiency**: Since it uses `yield from`, the authors will be generated one by one, rather than all at once, which can be memory-efficient, especially with large datasets.
- **Unique values**: By using a set comprehension, it automatically filters out any duplicate author names, resulting in a set of unique authors without manual checking for duplicates.
- **Filtering**: It effectively filters out any books that do not have an author, ensuring that only valid authors are yielded.

### Example

Assuming `books` is defined as follows:

```python
books = [
    {"title": "Book 1", "author": "Author A"},
    {"title": "Book 2", "author": "Author B"},
    {"title": "Book 3", "author": None},
    {"title": "Book 4", "author": "Author A"},  # Duplicate author
    {"title": "Book 5", "author": ""},
]
```

The resulting unique authors yielded would be:

- "Author A"
- "Author B"

So, the code effectively filters out books without authors and ensures no duplicates when delivering the author names.
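
To try the explanation above, here is a minimal sketch that wraps the line in a generator function. The function name `unique_authors` is an assumption made for illustration; the sample data is the list from the answer. Note that the set comprehension is built in full before the first value is yielded, so the memory saving from `yield from` is limited here, and that a set has no guaranteed order, which is why the result is sorted before printing.

```python
def unique_authors(books):
    """Yield each distinct, non-empty author once (hypothetical wrapper)."""
    # The set is fully built first; yield from then hands out its items one by one.
    yield from {book.get("author") for book in books if book.get("author")}


books = [
    {"title": "Book 1", "author": "Author A"},
    {"title": "Book 2", "author": "Author B"},
    {"title": "Book 3", "author": None},
    {"title": "Book 4", "author": "Author A"},  # Duplicate author
    {"title": "Book 5", "author": ""},
]

# Sets are unordered, so sort for a stable display.
authors = sorted(unique_authors(books))
print(authors)  # ['Author A', 'Author B']
```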

In [None]:
# Get gpt-4o-mini to answer, with streaming
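
One possible way to fill in this cell is sketched below. It assumes the client is an OpenAI-compatible object exposing `chat.completions.create(..., stream=True)`; the helper name `stream_reply` is hypothetical and not part of any library. The helper yields the text accumulated so far after each chunk, so the caller decides how to render it.

```python
def stream_reply(client, model, messages):
    """Yield the reply text accumulated so far, once per streamed chunk.

    `client` is assumed to be an OpenAI-compatible client object.
    """
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    reply = ""
    for chunk in stream:
        # Some chunks may carry no choices at all; skip them.
        if not chunk.choices:
            continue
        # delta.content can be None (for example on the last chunk), so fall back to "".
        reply += chunk.choices[0].delta.content or ""
        yield reply
```

In the notebook, one option is to create a display handle with `display(Markdown(""), display_id=True)` and call `handle.update(Markdown(partial))` for each `partial` produced by `stream_reply(OpenAI(), MODEL_GPT, messages)`.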

In [27]:
# Get Llama 3.2 to answer
OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL_LLAMA = 'llama3.2'

ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

In [28]:
response = ollama_via_openai.chat.completions.create(model=MODEL_LLAMA, messages=messages)

def display_markdown(response):
    content = response.choices[0].message.content  # Extract the assistant's reply text
    display(Markdown(content)) 

display_markdown(response)

**Code Explanation**

This is a generator expression in Python, written using the **Yield From syntax**.

Here's a breakdown of what each part does:

* `yield from`: This keyword is used to invoke another iterator and create an iterable sequence.
* `{...}`: This defines a dictionary comprehension. However, this specific code doesn't use the resulting dictionary values but instead yields values directly.
* `book.get("author")`: For each book in `books`, it tries to retrieve the value of the "author" key from the book dictionary and yields that value.
* The expression is wrapped within `.get()` calls for each book, which means only books with the "author" key will be included in the iteration.

**Why**

This code is using a few concepts:

1. **Generator Expression**: Instead of storing all the yielded values in memory at once (like a list comprehension), this expression creates an iterator that produces values on-the-fly. This can be more memory-efficient, especially when dealing with large datasets.
2. **Yield From syntax**: By using `yield from`, the generator becomes a delegator. It doesn't need to store all the yielded values, it just delegates them to another iterable (the one defined within the `.get()` calls).
3. **Dictionary Comprehension**: Although not necessary in this case, dictionary comprehensions can be useful when you need to perform some computation on each book's dictionary and still want the resulting dictionary.

**Example Context**

Here's a potential example where this code might appear:

```python
books = [
    {"title": "Book 1", "author": "Author 1"},
    {"title": "Book 2", "author": "Author 2"},
    # ...
]

authors = {}  # Dictionary to store authors

for book in books if book.get("author"):
    author = book.get("author")
    authors[author] = book.get("title") or None  # Optional title fetching

# Here, authors is a dictionary with books by their authors

# The yield from expression can be chained like this:
filtered_books_by_authors = {}
yield from {book for book in books if book.get("author")}
filtered_books_by_authors = filtered_books_by_authors
```

**Notes**

The `.get()` method is used with no value provided (i.e., `None`) to avoid raising a KeyError when the key doesn't exist.
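
For reference when reading the answer above: in Python, braces around a single expression build a set, while a dictionary comprehension needs `key: value` pairs. A small check, using made-up sample data (the `books` list here is an assumption for illustration):

```python
books = [
    {"title": "Book 1", "author": "Author 1"},
    {"title": "Book 2"},  # no "author" key, so .get() returns None
]

# Single expression inside braces: a set comprehension.
authors = {book.get("author") for book in books if book.get("author")}
print(type(authors).__name__)  # set

# key: value inside braces: a dictionary comprehension.
author_by_title = {book["title"]: book.get("author") for book in books}
print(type(author_by_title).__name__)  # dict
```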

In [32]:
question = '''
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
'''

system_prompt = '''You are an assistant that answers technical or coding related questions. Note that your primary task is not to evaluate the code for correctness, 
but instead to explain the code and what it is doing and how. If there are errors to point out within that explanation, that is fine to explain them.'''

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

response = ollama_via_openai.chat.completions.create(model=MODEL_LLAMA, messages=messages)

def display_markdown(response):
    content = response.choices[0].message.content  # Extract the assistant's reply text
    display(Markdown(content)) 

display_markdown(response)

Let's break down the code:

**Variables and Functions**

The code starts by defining a dictionary `headers` that contains a `User-Agent` header for making HTTP requests. This is typically used to make it look like the request is coming from a browser, rather than a bot.

Next, we define a class called `Website`. The comment above it briefly explains what this class represents - a scraped website with links.

**Initializing a Website**

The `__init__` method initializes an instance of the `Website` class. It takes two parameters: `self` (a reference to the current object being created) and `url` (the URL of the webpage to scrape).

 Inside the `__init__` method:

*   We set the `url` attribute of the class.
*   We use the `requests.get()` function to make an HTTP GET request to the specified `url`. This will send a request to the server and retrieve its content. We pass in the `headers` dictionary we defined earlier. The response is stored in the `response` variable, but we don't actually need this variable for anything.
*   We extract the HTML content of the webpage from the `response.content` attribute (this can be a byte array or string depending on how the content was encoded) using BeautifulSoup's HTML parser (`html.parser`). The resulting `soup` object is an abstract syntax tree representing the structure of the HTML code.

**Text Extraction and Link Collection**

In the `__init__` method, we also:

*   Set the `self.title` attribute by getting the text from the webpage's title tag. If no title exists, we use a default message.
*   We remove all irrelevant tags from the webpage structure (script, style, img, and input tags).
*   We extract the text content of the webpage using BeautifulSoup's `get_text()` function, passing in specific arguments to control how it formats the output.

Once that is done, we collect all link URLs (`href` attribute) from the webpage structure. The resulting list includes those links as well as any other potential link structures (in this case, empty).

We exclude empty strings by implementing a conditional statement in both lines when setting `self.text` and returning this condition on line 57.

**Attributes Definition**

Finally, we set two attributes of the class: `links` storing all possible `link.get('href')`, without any `if link`.
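
The link-collection step in the `Website` class (gather every `href`, then drop missing or empty values) can be tried in isolation. The sketch below deliberately swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages; the `LinkCollector` class and the sample HTML are assumptions made for illustration, not part of the original code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag (None when the attribute is absent)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.hrefs.append(dict(attrs).get("href"))


html = (
    '<body><a href="/about">About</a><a>No href</a>'
    '<a href="">Empty</a><a href="https://example.com">External</a></body>'
)

collector = LinkCollector()
collector.feed(html)

# Same filter as in the class: keep only truthy values, dropping None and "".
links = [link for link in collector.hrefs if link]
print(links)  # ['/about', 'https://example.com']
```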