# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and Ollama, build a tool that takes a technical question and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [1]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI
from google import genai
from google.genai import types
import ollama

In [2]:
# constants
# Note: MODEL_GPT holds a Gemini model, called through the OpenAI-compatible endpoint

MODEL_GPT = 'gemini-2.5-flash'
MODEL_LLAMA = 'llama3.2'
# MODEL_LLAMA = 'gemma3n:latest'

In [3]:
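# set up environment
load_dotenv(override=True)  # read GEMINI_API_KEY from a local .env file, if one exists
GEMINI_API_KEY = os.getenv('GEMINI_API_KEY')
# client = genai.Client(api_key=GEMINI_API_KEY)
# Use the OpenAI client against Gemini's OpenAI-compatible endpoint
openai = OpenAI(
    api_key=GEMINI_API_KEY,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)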

In [4]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [5]:
# prompts

system_prompt = "You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs"
user_prompt = "Please give a detailed explanation to the following question with complete code example: " + question

In [6]:
# messages

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

print(messages)

[{'role': 'system', 'content': 'You are a helpful technical tutor who answers questions about python code, software engineering, data science and LLMs'}, {'role': 'user', 'content': 'Please give a detailed explanation to the following question with complete code example: \nPlease explain what this code does and why:\nyield from {book.get("author") for book in books if book.get("author")}\n'}]


In [7]:
# Get Gemini (MODEL_GPT, via the OpenAI-compatible endpoint) to answer, with streaming

stream = openai.chat.completions.create(model=MODEL_GPT, messages=messages, stream=True)

response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    # Strip fence markers and the word "markdown" so a reply wrapped in a markdown fence still renders;
    # note that this also removes the fences around any code blocks in the reply
    response = response.replace("```","").replace("markdown", "")
    update_display(Markdown(response), display_id=display_handle.display_id)

This line of Python code, `yield from {book.get("author") for book in books if book.get("author")}`, is a concise and powerful way to extract and yield unique author names from a collection of book data within a generator function.

Let's break down each part to understand what it does and why it's used:

## 1. The Core Components

This single line combines two important Python features:

1.  **Set Comprehension**: `{book.get("author") for book in books if book.get("author")}`
2.  **`yield from` statement**: `yield from <iterable>`

## 2. Detailed Explanation

### Part 1: The Set Comprehension (`{book.get("author") for book in books if book.get("author")}`)

*   **`books`**: This is expected to be an iterable (most likely a list) containing dictionaries, where each dictionary represents a book.
    *   **Example `books` data:**
        python
        books = [
            {"title": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams"},
            {"title": "Pride and Prejudice", "author": "Jane Austen"},
            {"title": "The Restaurant at the End of the Universe", "author": "Douglas Adams"},
            {"title": "Dune", "author": "Frank Herbert"},
            {"title": "Foundation", "author": "Isaac Asimov"},
            {"title": "Sense and Sensibility", "author": "Jane Austen"},
            {"title": "Missing Author Book", "genre": "Mystery"}, # No author key
            {"title": "Empty Author Book", "author": ""} # Empty author string
        ]
        

*   **`for book in books`**: This is the standard loop part of the comprehension. It iterates through each `book` dictionary in the `books` collection.

*   **`book.get("author")`**: This is a safe way to access the value associated with the "author" key in each `book` dictionary.
    *   Using `.get("author")` instead of `book["author"]` is crucial because it prevents a `KeyError` if a book dictionary does not have an "author" key. If the key is missing, `get()` returns `None` by default.

*   **`if book.get("author")`**: This is a filter condition. It ensures that only books where the `author` value is *truthy* are considered.
    *   `None` is falsy.
    *   An empty string (`""`) is falsy.
    *   Any non-empty string (like "Douglas Adams") is truthy.
    *   This effectively filters out books that either don't have an "author" key or have an empty string as their author.

*   **`{ ... }` (Curly Braces)**: The curly braces indicate that this is a **set comprehension**.
    *   A `set` is an unordered collection of unique elements.
    *   This means that if "Douglas Adams" appears as an author for multiple books, he will only be present *once* in the resulting set.

**What this Set Comprehension does:** It processes the entire `books` list, extracts all valid author names, and then compiles them into a *new set object* containing only unique author names, excluding any `None` or empty string values. This set is created in memory *before* `yield from` starts operating on it.

**Example result of the set comprehension (for the `books` data above):**
`{'Douglas Adams', 'Jane Austen', 'Frank Herbert', 'Isaac Asimov'}`
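
To make the filtering and deduplication steps concrete, here is a minimal sketch that writes the same logic as an explicit loop and checks that both forms agree. The `sample_books` data and the variable names are made up for illustration.

```python
# Hypothetical sample data, chosen to exercise each filter case
sample_books = [
    {"title": "Dune", "author": "Frank Herbert"},
    {"title": "Foundation", "author": "Isaac Asimov"},
    {"title": "I, Robot", "author": "Isaac Asimov"},  # duplicate author
    {"title": "No author key"},                        # .get() returns None
    {"title": "Empty author", "author": ""},           # falsy, filtered out
]

# The set comprehension from the question
authors = {book.get("author") for book in sample_books if book.get("author")}

# The same logic written as an explicit loop
authors_loop = set()
for book in sample_books:
    author = book.get("author")
    if author:                    # skips None and ""
        authors_loop.add(author)  # adding a duplicate has no effect

print(authors == authors_loop)
print(sorted(authors))  # sorted only to get a stable display order
```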

### Part 2: The `yield from` Statement

*   **`yield from <iterable>`**: This is a generator delegation expression introduced in Python 3.3.
*   **Purpose**: It allows a generator function to delegate part of its operation to another iterable (or subgenerator). It effectively "unpacks" the iterable it's yielding from and yields each item from it, one by one, to the caller of the main generator.

**How it works with the set comprehension:**

1.  The set comprehension `{book.get("author") for book in books if book.get("author")}` is evaluated *first*, resulting in a complete `set` object (e.g., `{'Douglas Adams', 'Jane Austen', ...}`).
2.  `yield from` then takes this `set` and iterates over its elements.
3.  For each element in the set, `yield from` yields that element back to the caller of the generator function.
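
For a plain iterable such as a set or list, `yield from` behaves like a loop that yields each item in turn. A small sketch of that equivalence, using two hypothetical helper generators:

```python
def with_yield_from(items):
    # Delegate iteration to the iterable
    yield from items

def with_loop(items):
    # The manual equivalent for a plain iterable
    for item in items:
        yield item

letters = ["a", "b", "c"]
print(list(with_yield_from(letters)))
print(list(with_loop(letters)))
```

A list is used here instead of a set so that the order of the results is predictable.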

## 3. What the Entire Line Does & Why

**What it does:**

This single line, when placed inside a generator function, will:
1.  Iterate through a collection of `books`.
2.  For each book, safely attempt to retrieve its "author."
3.  Filter out any books that don't have an author or have an empty author string.
4.  Collect all remaining author names into a `set`, automatically ensuring that only *unique* author names are stored.
5.  Finally, `yield from` will take this generated set of unique authors and yield each author name, one by one, as the output of the generator function. The order of yielding might not be preserved as sets are unordered.

**Why use this specific construct? (Benefits):**

1.  **Uniqueness Guarantee:** The primary reason for using a `set` comprehension here is to automatically handle duplicate authors. If the `books` list contains multiple books by the same author, that author's name will only appear once in the output.
2.  **Conciseness and Readability:** It's a very compact and Pythonic way to express "get unique, filtered items from this collection and yield them."
3.  **Filtering of Invalid Data:** The `if book.get("author")` part elegantly handles missing or empty author data, preventing `None` or `""` from being yielded.
4.  **Generator Efficiency (for yielding):** `yield from` is cleaner than manually looping and yielding each item (`for author in authors_set: yield author`), and in CPython it is typically slightly faster because the delegation is handled by the interpreter rather than by a Python-level loop.
5.  **Memory Management (compared to a list of *all* authors):** While the set itself is built in memory, it only stores *unique* authors. If there are many duplicate authors, the set will be smaller than a list containing all authors. However, it's important to note that the *entire set* is constructed before yielding begins. If the number of *unique* authors is extremely large, this could still consume significant memory.
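
If yielding authors as they are encountered matters more than brevity, one alternative is to track seen authors inside the generator. The sketch below is not the code from the question; `iter_unique_authors` and `sample_books` are hypothetical names. It still stores every unique author in `seen`, but it yields each one immediately and preserves first-seen order.

```python
def iter_unique_authors(books):
    """Yield each truthy author the first time it is encountered."""
    seen = set()
    for book in books:
        author = book.get("author")
        if author and author not in seen:
            seen.add(author)
            yield author  # yielded before the remaining books are read

sample_books = [
    {"title": "Dune", "author": "Frank Herbert"},
    {"title": "Foundation", "author": "Isaac Asimov"},
    {"title": "Dune Messiah", "author": "Frank Herbert"},  # duplicate
    {"title": "Untitled"},                                  # no author key
    {"title": "The Martian", "author": "Andy Weir"},
]

unique_in_order = list(iter_unique_authors(sample_books))
print(unique_in_order)
```

Because this version never builds the full set up front, it also works when `books` is itself a lazy iterable.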

## 4. Complete Code Example

python
def get_unique_authors(books):
    """
    A generator function that yields unique author names from a list of book dictionaries.
    Filters out books with missing or empty author information.
    """
    # 1. The set comprehension evaluates first, creating a set of unique, valid author names.
    #    e.g., {'Douglas Adams', 'Jane Austen', 'Frank Herbert', 'Isaac Asimov'}
    # 2. 'yield from' then iterates over this set and yields each author one by one.
    yield from {book.get("author") for book in books if book.get("author")}

# --- Sample Book Data ---
books_data = [
    {"title": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams"},
    {"title": "Pride and Prejudice", "author": "Jane Austen"},
    {"title": "The Restaurant at the End of the Universe", "author": "Douglas Adams"},
    {"title": "Dune", "author": "Frank Herbert"},
    {"title": "Foundation", "author": "Isaac Asimov"},
    {"title": "Sense and Sensibility", "author": "Jane Austen"},
    {"title": "Another Hitchhiker Book", "author": "Douglas Adams"}, # Duplicate author
    {"title": "Book without Author Key", "genre": "Fantasy"},      # No 'author' key
    {"title": "Book with Empty Author", "author": ""},             # Empty author string
    {"title": "Book with None Author", "author": None},            # None author value
    {"title": "The Lord of the Rings", "author": "J.R.R. Tolkien"},
]

print("--- Yielding unique authors ---")
author_generator = get_unique_authors(books_data)

# Iterate over the generator to get each unique author
for author in author_generator:
    print(author)

print("\n--- Converting to a list to see all results (order may vary) ---")
# You can also convert the generator output directly to a list or set
all_unique_authors = list(get_unique_authors(books_data))
print(all_unique_authors)

# Verify uniqueness by converting to a set (should be the same as the list output here)
final_set_of_authors = set(get_unique_authors(books_data))
print(f"\nFinal set of unique authors: {final_set_of_authors}")
print(f"Number of unique authors: {len(final_set_of_authors)}")


**Possible Output (order of yielded authors may vary due to set's unordered nature):**


--- Yielding unique authors ---
Douglas Adams
Jane Austen
Frank Herbert
Isaac Asimov
J.R.R. Tolkien

--- Converting to a list to see all results (order may vary) ---
['Douglas Adams', 'Jane Austen', 'Frank Herbert', 'Isaac Asimov', 'J.R.R. Tolkien']

Final set of unique authors: {'Douglas Adams', 'Jane Austen', 'Frank Herbert', 'Isaac Asimov', 'J.R.R. Tolkien'}
Number of unique authors: 5


As you can see, "Douglas Adams" and "Jane Austen" appear multiple times in the `books_data`, but they are only yielded once by the `get_unique_authors` generator because of the set comprehension. The books with missing or empty author data are also correctly ignored.

In [8]:
# Get Llama 3.2 to answer

response = ollama.chat(model=MODEL_LLAMA, messages=messages)
reply = response['message']['content']
display(Markdown(reply))

**Code Explanation**

The given code is written in Python and utilizes the `yield from` statement, which is a part of the yield expression syntax. This syntax allows us to simplify complex iteration logic by delegating the iteration to another iterable.

Here's a breakdown of what this specific line does:

* `{book.get("author") for book in books if book.get("author")}`: This is a set comprehension that:
	1. Iterates over each `book` object in the `books` collection (assuming it's a list or other iterable).
	2. For each `book`, it attempts to get the value associated with the key `"author"`.
	3. The `if book.get("author")` condition ensures that only books with a truthy `"author"` value are kept, which skips missing keys, `None`, and empty strings.

The entire expression is wrapped in curly braces, indicating that it's a set comprehension, not a generator expression or a list comprehension. Because the result is a set, each author appears only once.

**What does `yield from` do?**

When you see `yield from`, the syntax indicates that we're delegating the iteration to another iterable. In this case, the set comprehension is evaluated first, and `yield from` then yields the elements of the resulting set one at a time.

Think of it as a sequence of steps:

1. The comprehension (`for book in books`) iterates over the `books` collection.
2. For each book, `book.get("author")` looks up the author, and truthy values are added to the set.
3. Once the set is complete, `yield from` passes its elements to the caller one by one.

**Why is this useful?**

By using `yield from`, we avoid writing an explicit `for` loop that yields each author. Note that the set is built in full before the first author is yielded, so authors are not produced as the books are encountered.

This approach has several benefits:

* Uniqueness: The set stores each author once, so duplicates across books are dropped.
* Conciseness: Filtering, deduplication, and yielding are expressed in a single line.

Here's the complete code with some additional context:
```python
# Sample data (books collection)
books = [
    {"id": 1, "title": "Book 1", "author": "Author A"},
    {"id": 2, "title": "Book 2", "author": None},
    {"id": 3, "title": "Book 3", "author": "Author C"}
]

# Use the set comprehension with yield from
def get_authors(books):
    yield from {book.get("author") for book in books if book.get("author")}

def main():
    authors = list(get_authors(books))
    print(authors)  # e.g. ['Author A', 'Author C'] (set order is not guaranteed)

if __name__ == "__main__":
    main()
```
In this example, we define a `get_authors` generator function that uses the set comprehension with `yield from`. We then call this function in `main` and store the results in the `authors` list.
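
Since the choice of brackets changes the result, a short comparison may help. This sketch assumes a hypothetical `sample_books` list and contrasts the set comprehension (curly braces) with a generator expression (parentheses) over the same data:

```python
# Hypothetical data with one duplicate author and one None
sample_books = [
    {"title": "Book 1", "author": "Author A"},
    {"title": "Book 2", "author": "Author A"},
    {"title": "Book 3", "author": None},
    {"title": "Book 4", "author": "Author C"},
]

# Curly braces: a set, built eagerly, duplicates removed
as_set = {b.get("author") for b in sample_books if b.get("author")}

# Parentheses: a generator, evaluated lazily, duplicates kept
as_gen = (b.get("author") for b in sample_books if b.get("author"))

print(type(as_set).__name__)
print(type(as_gen).__name__)

set_items = sorted(as_set)  # sorted only for a stable display order
gen_items = list(as_gen)    # consumes the generator
print(set_items)
print(gen_items)
```

Only the set form deduplicates; the generator form keeps the repeated author and yields values in input order.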

In [9]:
books = [
    {"title": "The Lord of the Rings", "author": "J.R.R. Tolkien"},
    {"title": "Pride and Prejudice", "author": "Jane Austen"},
    {"title": "1984", "author": "George Orwell"},
    {"title": "To Kill a Mockingbird"},  # Missing "author" key
    {"title": "The Great Gatsby", "author": None},  # Author is None
    {"title": "Another Book", "author": "Someone"}
]

def get_unique_authors(books):
    # Build the set of truthy authors, then yield its elements one by one
    yield from {book.get("author") for book in books if book.get("author")}

authors = get_unique_authors(books)  # a generator, not a list

for author in authors:
    print(author)

Jane Austen
George Orwell
J.R.R. Tolkien
Someone


In [10]:
# The input data
books = [
    {"title": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams"},
    {"title": "The Restaurant at the End of the Universe", "author": "Douglas Adams"},
    {"title": "Dune", "author": "Frank Herbert"},
    {"title": "Foundation", "author": "Isaac Asimov"},
    {"title": "I, Robot", "author": "Isaac Asimov"},
    {"title": "Neuromancer", "author": "William Gibson"},
    {"title": None}, # A book with no data
    {"title": "The Martian", "author": "Andy Weir"}
]

# The generator function
def get_unique_authors_generator(books_list):
    """
    A generator that yields unique author names from a list of book dictionaries.
    It uses a set comprehension to efficiently find unique names and
    'yield from' to elegantly delegate to the resulting set.
    """
    yield from {book.get("author") for book in books_list if book.get("author")}

# Instantiate the generator
unique_authors = get_unique_authors_generator(books)
print(unique_authors)
# Iterate through the generator and print the results
print("Unique authors yielded by the generator:")
for author in unique_authors:
    print(author)

# You can also convert the generator to a list to see all the results at once
# Note: The order is not guaranteed because sets are unordered
list_of_authors = list(get_unique_authors_generator(books))
print("\nUnique authors as a list (order is not guaranteed):")
print(list_of_authors)

<generator object get_unique_authors_generator at 0x000001A4386BD540>
Unique authors yielded by the generator:
Isaac Asimov
Douglas Adams
Andy Weir
William Gibson
Frank Herbert

Unique authors as a list (order is not guaranteed):
['Isaac Asimov', 'Douglas Adams', 'Andy Weir', 'William Gibson', 'Frank Herbert']
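
Because set iteration order is not guaranteed, a caller who needs reproducible output has to impose an order. A minimal sketch of two options, assuming a small hypothetical `sample_books` list: `sorted()` for alphabetical order, or `dict.fromkeys()` for first-seen order (dictionaries preserve insertion order in Python 3.7+).

```python
def get_unique_authors_generator(books_list):
    yield from {book.get("author") for book in books_list if book.get("author")}

# Hypothetical sample data
sample_books = [
    {"title": "Dune", "author": "Frank Herbert"},
    {"title": "Foundation", "author": "Isaac Asimov"},
    {"title": "I, Robot", "author": "Isaac Asimov"},
    {"title": "The Martian", "author": "Andy Weir"},
    {"title": None},
]

# Option 1: sort the generator's output alphabetically
alphabetical = sorted(get_unique_authors_generator(sample_books))
print(alphabetical)

# Option 2: deduplicate while keeping first-seen order
first_seen = list(dict.fromkeys(
    book.get("author") for book in sample_books if book.get("author")
))
print(first_seen)
```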
