# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API, and also Ollama, build a tool that takes a technical question and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [8]:
# imports

import os
import requests
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display

In [17]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2:1b'

In [18]:
# set up environment

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the API key
if not api_key:
    print("No API key was found - please check your .env file")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them")
else:
    print("API key found and looks good so far!")

# Create OpenAI client
openai = OpenAI()

# Create Ollama client (uses OpenAI-compatible endpoint)
OLLAMA_BASE_URL = "http://localhost:11434/v1"
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

API key found and looks good so far!


In [19]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [20]:
# Get gpt-4o-mini to answer, with streaming

# Create a system prompt that sets the context for answering technical questions
system_prompt = """
You are a helpful technical assistant with expertise in software engineering, 
system design, and architecture. You provide clear, detailed explanations of 
technical concepts and code. Structure your explanations to include:
1. What the code does
2. Why it's written this way
3. Common use cases or benefits
4. Any important considerations or alternatives
"""

# Create messages for the API call
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# Call OpenAI with streaming enabled
stream = openai.chat.completions.create(
    model=MODEL_GPT,
    messages=messages,
    stream=True
)

# Display the streaming response with typewriter effect
response = ""
display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    update_display(Markdown(response), display_id=display_handle.display_id)

Sure! Let's break down the code snippet you provided:

### 1. What the code does
The code snippet uses a generator expression along with the `yield from` statement to yield authors from a collection of book dictionaries (`books`).

- The part inside the curly braces `{...}` is a set comprehension that iterates over each `book` in the `books` iterable.
- For each `book`, it attempts to get the value associated with the "author" key using `book.get("author")`.
- The `if book.get("author")` condition filters the books so that only those with a non-empty or non-None author are considered.
- The resulting set generated by `{book.get("author") for book in books if book.get("author")}` contains unique author names.
- The `yield from` statement then returns each author in the set one at a time as a generator.

### 2. Why it's written this way
- **Efficiency**: Using a set comprehension `{...}` automatically ensures the uniqueness of author names, as sets do not allow duplicate values.
- **Conciseness**: This code is compact, and makes use of Python's powerful generator expressions/ comprehensions to transform and filter data in one line.
- **Lazy Evaluation**: By using `yield from`, the authors are produced one at a time. This is particularly beneficial for processing large datasets without loading everything into memory at once.

### 3. Common use cases or benefits
- This type of code is particularly useful in data processing tasks where you have a collection of records (like books) and you want to extract unique values from specific fields (like authors).
- It can be used in situations like generating reports, summaries, or user lists based on available data, often in APIs or data pipelines.
- Additionally, because the authors are yielded one at a time, it allows for easy integration into asynchronous programming paradigms, where results can be processed as they're available.

### 4. Any important considerations or alternatives
- **Handling Missing Data**: The use of `get()` is beneficial because it avoids `KeyError` exceptions if the "author" key doesn't exist. However, if the data structure can be guaranteed to have certain keys, a direct indexing might be used for performance reasons.
- **Performance**: For extremely large datasets, the memory consumption of the set could become a concern, depending on how many authors there are. Alternatives might include maintaining a list and then filtering duplicates, but this would require additional logic.
- **Multi-threading**: If this code is used in a multi-threaded context, and `books` is being modified by other threads, it could lead to unexpected behavior. Consider using thread-safe collections or locking mechanisms.
- **Python Version**: This syntax works in Python 3. If you are using a version below that, you would need to adjust your approach since `yield from` is not present in Python 2. 

In summary, this code provides an efficient, clear way to extract unique authors from a list of book dictionaries, leveraging Python's powerful syntax for set comprehensions and generators.
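One nuance worth checking against the answer above: the set comprehension is built in full before `yield from` produces its first value, so all of `books` is consumed up front and only the output is handed out lazily. The sketch below demonstrates this with a hypothetical instrumented input, and shows one common streaming alternative that de-duplicates as it goes (and, unlike the set, preserves first-seen order). All function names here are illustrative.

```python
def unique_authors(books):
    # Original form: the set is fully materialised before the first yield.
    yield from {book.get("author") for book in books if book.get("author")}


def unique_authors_lazy(books):
    # Streaming alternative: remember what has been seen, yield new authors immediately.
    seen = set()
    for book in books:
        author = book.get("author")
        if author and author not in seen:
            seen.add(author)
            yield author


def instrumented(records, log):
    # Hypothetical helper: yields records one by one, logging each one it hands out.
    for record in records:
        log.append(record["author"])
        yield record


records = [{"author": "A"}, {"author": "B"}, {"author": "A"}, {"author": "C"}]

eager_log = []
next(unique_authors(instrumented(records, eager_log)))
print(len(eager_log))  # 4: every record was read to build the set

lazy_log = []
next(unique_authors_lazy(instrumented(records, lazy_log)))
print(len(lazy_log))   # 1: only the first record was needed

print(list(unique_authors_lazy(records)))  # ['A', 'B', 'C']
```

The lazy version still keeps every distinct author in `seen`, so its memory use grows with the number of unique authors just like the set; what it avoids is the full pass over the input before the first result.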

In [21]:
requests.get("http://localhost:11434").content

b'Ollama is running'
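The check above raises an exception if nothing is listening on port 11434. A slightly more defensive version might wrap the request and return a boolean. This sketch uses the standard library's `urllib` instead of `requests` so it has no third-party dependency; the function name and the 2-second timeout are arbitrary choices.

```python
from urllib.request import urlopen


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers at base_url, False otherwise."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            # The root endpoint replies with the plain text "Ollama is running".
            return b"Ollama is running" in resp.read()
    except OSError:
        # URLError, connection refused and timeouts are all OSError subclasses.
        return False


if not ollama_is_running():
    print("Ollama doesn't seem to be running - try `ollama serve` in a terminal")
```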

In [22]:
!ollama pull llama3.2:1b

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠧ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling 74701a8c35f6: 100% ▕██████████████████▏ 1.3 GB                         [K
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB                         [K
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB                         [K
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB                         [K
pulling 4f659a1e86d7: 100% ▕██████████████████▏  485 B                         [K
verifying sha256 digest [K
writing manifest ⠋ [K[?25h[?2026l[?2026h[?25l[A[A[A[A[A[A[A[1Gpulling man
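To confirm from Python that the pull succeeded, one option is Ollama's `GET /api/tags` endpoint, which lists the locally available models as JSON. The sketch below assumes the response has the shape `{"models": [{"name": "llama3.2:1b", ...}, ...]}` and that an untagged name implicitly means `:latest`; the `sample` payload is hand-written for illustration, not captured output, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def normalize(model: str) -> str:
    # Assumption: Ollama treats an untagged name such as "llama3.2" as "llama3.2:latest".
    return model if ":" in model else f"{model}:latest"


def model_is_pulled(tags: dict, model: str) -> bool:
    """Check a parsed /api/tags response for the given model name."""
    wanted = normalize(model)
    return any(normalize(m.get("name", "")) == wanted for m in tags.get("models", []))


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    # Requires a running Ollama server.
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample payload (illustrative only)
sample = {"models": [{"name": "llama3.2:1b"}, {"name": "mistral:latest"}]}
print(model_is_pulled(sample, "llama3.2:1b"))  # True
print(model_is_pulled(sample, "llama3.2"))     # False: that would be llama3.2:latest
print(model_is_pulled(sample, "mistral"))      # True
```

Against a live server, the check would be `model_is_pulled(fetch_tags(), MODEL_LLAMA)`.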

In [None]:
# Get Llama 3.2 to answer

# Note: make sure Ollama is running and llama3.2:1b has been pulled
# Run these commands in a terminal if needed:
# ollama serve
# ollama pull llama3.2:1b

# Create messages for the API call (same format as OpenAI)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# Call Ollama (using OpenAI-compatible endpoint)
response = ollama.chat.completions.create(
    model=MODEL_LLAMA,
    messages=messages
)

# Display the response in markdown format
display(Markdown(response.choices[0].message.content))

**Code Explanation**

The given code snippet is written in Python and appears to be part of a larger script. Let's break down what it does:

```python
yield from {book.get("author") for book in books if book.get("author")]
```

Here's a step-by-step explanation:

1. `{}`: This is an empty dictionary comprehension, also known as a mapping expression or selector function.
2. `for book in books`: This iterable (like a list, tuple, or set) iterates over the `books` collection.
3. `if book.get("author")`: Within each iteration of the outer loop, this condition checks whether the current book has an "author" key. If it does, the code inside this block (the `yield from`) will be executed.
4. `yield from ...`: This is where things get interesting. The syntax `yield from` is a built-in Python feature called **generator expression**. It's equivalent to a Python 3.x `for** dict` loop.

In summary, the code snippet generates a sequence of values for the "author" key in each book dictionary, and yields them one by one as they are generated.

**Why it's written this way**

The generator expression with `yield from` is often preferred over traditional loops for several reasons:

* **Conciseness**: When using `yield from`, you only need to specify the values that should be yielded. Traditional loops typically require you to iterate until a specific condition, then discard the iteration state and continue iterating further in the original loop.
* **Efficiency**: Since generator expressions are just computations being run in memory at a time, they can avoid loading entire datasets into memory or processing them one-by-one.
* **Easy iteration**: `yield from` makes it easy to iterate over collections that contain dictionaries, lists of objects with an "author" key, or other iterable data structures.

**Common use cases or benefits**

This technique is particularly useful in situations where you need to perform complex computations on a large dataset without loading the entire dataset into memory. Some examples include:

* Filtering and transforming large datasets using dictionary comprehensions.
* Processing nested dictionaries with complex access patterns (e.g., "author" inside another book dictionary).
* Implementing infinite enumerators or iterators on large databases.

**Important considerations or alternatives**

While `yield from` is a powerful tool, it has its limitations as well:

* **Performance overhead**: The extra Python overhead can be noticeable for very large datasets.
* **Memory usage**: Generator expressions can still allocate memory to store the iteration values; this might be significant if the output cannot fit into a larger buffer or cache.

When using `yield from`, always keep these factors in mind.

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠧ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠇ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠏ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠦ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠧ [K[?25h[?2026l[?2026h[?25l[1Gpulling ma