# End of week 1 exercise

To demonstrate your familiarity with OpenAI API, and also Ollama, build a tool that takes a technical question,  
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [1]:
# imports
import os
import json
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from scraper import fetch_website_links, fetch_website_contents
from openai import OpenAI

In [9]:
# constants

MODEL_GPT = 'gpt-5-nano'
MODEL_LLAMA = 'llama3.2:latest'

In [2]:
# set up environment
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

API key looks good so far


In [4]:
openai = OpenAI()

In [3]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [6]:
system_prompt = f"""
    You are an experienced programming tutor especially for Python.
    You have to answer student's question so that even beginner easily understand.
"""

### GPT Nano to answer

In [17]:
# Get gpt-5o-nano to answer, with streaming
def stream_answer(client,model, system_prompt, question):

    stream = client.chat.completions.create(
        model = model,
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content" : question}
        ],
        stream = True
    )
    # 2. Initialize an empty string to store the full response
    collected_content = ""
    
    # 3. Create a display handle for dynamic updating
    display_handle = display(Markdown(""), display_id=True)
    
    # 4. Iterate through the stream
    for chunk in stream:
        # Check if content exists in the delta (it might be None for the final chunk)
        content = chunk.choices[0].delta.content
        if content:
            collected_content += content
            # Update the existing display output
            display_handle.update(Markdown(collected_content))

In [18]:
stream_answer(openai,MODEL_GPT, system_prompt, question)

Short answer:
- It creates a set of all non-empty authors from the list (or iterable) of books, then yields each author from that set to the caller. So you get each unique author, once, in no particular order.

What each part does:
- books: an iterable of dictionaries, where each dict represents a book.
- book.get("author"): gets the value of the "author" key from a book, or None if the key isn’t present.
- for book in books if book.get("author"): builds the set only from books that have a truthy author (non-empty string, not None, not empty).
- { ... }: a set comprehension. It collects unique author values (duplicates are removed by the set).
- yield from ...: a generator statement that yields each item from the given iterable (the set) to the caller of the surrounding generator function.

Why you might use this:
- To produce a list of unique authors from a collection of books without duplicates.
- The surrounding function is a generator, so authors are produced lazily as you iterate.

Important notes:
- Because a set is unordered, the authors will be yielded in no guaranteed order. If you need a specific order (e.g., the order of first appearance), you’ll want a different approach.
- If you want to preserve the original order and still deduplicate, you could do something like:
  - yield from (author for author in (b.get("author") for b in books) if author) but this doesn’t deduplicate.
  - Or preserve first occurrence: use a seen set and yield only when you see a new author.
    Example:
    def get_authors(books):
        seen = set()
        for b in books:
            a = b.get("author")
            if a and a not in seen:
                seen.add(a)
                yield a

A quick example:

books = [
  {"title": "A", "author": "Alice"},
  {"title": "B", "author": "Bob"},
  {"title": "C", "author": "Alice"},
  {"title": "D"},  # no author
  {"title": "E", "author": ""},  # empty author
]

def authors(books):
    yield from {book.get("author") for book in books if book.get("author")}

list(authors(books))
# Might produce: ['Alice', 'Bob'] (order not guaranteed)

If you want a deterministic order with deduplication, use the ordered approach explained above.

In [19]:
llama_client = OpenAI(
    base_url = "http://localhost:11434/v1",  # Ollama 기본 주소
    api_key = "ollama"  # Ollama는 키가 필요 없지만 형식상 넣어줌
)

In [21]:
# Get Llama 3.2 to answer
stream_answer(llama_client,MODEL_LLAMA, system_prompt, question)

This line of code is using a feature called **generator expressions** and the `yield from` keyword, which can be a bit tricky for beginners. I'll break it down step by step.

**Generator Expression**
------------------------

A generator expression is similar to a list comprehension or a set comprehension, but instead of computing a new value for each item in the input iterable, it produces a **generator**. A generator is a special type of object that remembers its state between successive calls and can be used as if it were an iterator.

In this case, the generator expression is:
```python
{book.get("author") for book in books if book.get("author")}
```
Here's how it works:

* `for book in books`: This iterates over the elements of the `books` list/set/other iterable.
* `if book.get("author")`: This filters out any books that don't have an "author" key. If a book is filtered out, its value (i.e., `None`) will not be included in the resulting set.

**yield from**
-----------------

The `yield from` keyword is a way to yield values from another iterator. When used with a generator expression, it allows us to "nest" one generator expression inside another.

In this code:
```python
yield from {book.get("author") for book in books if book.get("author")}
```
We're creating an iterator that yields authors from the `books` list, but instead of calling `list()` or `set()` on the original dictionary values, we're using `yield from`. This is equivalent to writing:
```python
for author in {book.get("author") for book in books if book.get("author")}
    yield author
```
However, `yield from` is a more concise and efficient way to achieve the same result.

**Why?**
---------

So why would we use `yield from` here instead of just using a regular generator expression or iterating over the values directly?

The reason is that if we used a regular generator expression, it would produce an iterator that yields individual authors one by one. If we want to collect all these authors into a single set, we'd need to wrap it in another loop.

But with `yield from`, we're creating an iterator that directly produces the desired output: a set of authors. This makes the code more readable and efficient!

Does this explanation help? Do you have any specific questions or scenarios where you'd like me to elaborate on this topic?