# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. You will be able to use this tool yourself during the course!

In [15]:
# imports
import os
import requests
import json
from dotenv import load_dotenv
from openai import OpenAI
from IPython.display import Markdown, display, update_display


In [24]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [4]:
# set up environment

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
openai = OpenAI()

API key looks good so far


In [11]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
yield from {book.get("author") for book in books if book.get("author")}
"""

In [18]:
# system prompt: tells the model to act as a code explainer and to reply in JSON

system_prompt = """
                    You are a helpful assistant and act as a code explainer. 
                    You will be given one or more code snippets along with questions about them. 
                    For each code snippet, you will:
                    1. Explain what the code does in detail.
                    2. Describe why it is written that way.
                    3. Highlight potential limitations or issues in the code.
                    4. Suggest improvements or alternatives if applicable.
                    5. Answer the specific question provided about the code.

                    You will respond in code blocks, formatted as JSON.

                    If multiple examples are provided, repeat the structure for each example.
                    Here is an example of input and the corresponding output, for your reference:
                    examples = [
                                {
                                    "code": "numbers = [1, 2, 3, 4, 5]\nsquared = [x**2 for x in numbers]",
                                    "question": "Why is a list comprehension used here?"
                                },
                                {
                                    "code": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
                                    "question": "What are the limitations of this implementation?"
                                }
                            ]
                    [
                    {
                        "example": {
                            "code": "numbers = [1, 2, 3, 4, 5]\nsquared = [x**2 for x in numbers]",
                            "explanation": "This code creates a list of numbers and uses a list comprehension to generate a new list containing the squares of each number.",
                            "question": "Why is a list comprehension used here?",
                            "answer": "List comprehensions are concise and Pythonic, making the code easier to read and write compared to a traditional for loop.",
                            "improvements": "If memory usage is a concern, consider using a generator expression instead of a list comprehension."
                        }
                    },
                    {
                        "example": {
                            "code": "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
                            "explanation": "This function calculates the factorial of a number using recursion. It multiplies the number by the factorial of the number minus one until it reaches the base case of 0.",
                            "question": "What are the limitations of this implementation?",
                            "answer": "The recursive implementation can lead to a RecursionError for large inputs due to Python's recursion depth limit.",
                            "improvements": "Use an iterative approach to avoid recursion depth issues and improve performance."
                        }
                    }
                ]
            """



In [27]:
def get_answer(question, system_prompt, model=MODEL_GPT):
    # Ollama serves an OpenAI-compatible API locally, so the same client class works for both models
    if model == MODEL_LLAMA:
        client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
    else:
        client = openai

    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question}
        ],
        stream=True,
        temperature=0
    )

    # accumulate the streamed chunks and re-render the output as each one arrives
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ""
        # remove every code-fence marker and every occurrence of the word "markdown"
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
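
The `replace` calls in `get_answer` remove the fence marker and the word "markdown" wherever they occur, including inside the answer text itself. Below is a minimal sketch of a narrower alternative, assuming only a fence that wraps the complete reply should be dropped. `strip_outer_fence` is a hypothetical helper and is not part of this notebook.

```python
import re

# Hypothetical helper (not in the original notebook): remove only a fence
# that wraps the whole reply, leaving the text inside it untouched.
_OUTER_FENCE = re.compile(r"\A\s*```[\w-]*\n(.*?)\n?```\s*\Z", re.DOTALL)

def strip_outer_fence(text: str) -> str:
    match = _OUTER_FENCE.match(text)
    return match.group(1) if match else text

wrapped = "```markdown\nUse markdown for notes.\n```"
plain = "Use markdown for notes."

print(strip_outer_fence(wrapped))   # fence removed, sentence intact
print(strip_outer_fence(plain))     # unchanged: nothing to strip
# The blanket replace also deletes the word inside the sentence:
print(repr(wrapped.replace("```", "").replace("markdown", "")))
```

The pattern only matches once the closing fence has arrived, so during streaming this sketch would leave the partial text as it is until the reply is complete.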

In [20]:
get_answer(question, system_prompt, model=MODEL_GPT)

json
{
    "example": {
        "code": "yield from {book.get(\"author\") for book in books if book.get(\"author\")}",
        "explanation": "This code uses a generator to yield authors from a collection of books. It creates a set comprehension that extracts the 'author' from each book in the 'books' iterable, but only if the 'author' key exists and is not None. The 'yield from' statement allows the generator to yield each author one at a time, effectively flattening the output from the set comprehension.",
        "question": "What is the purpose of using 'yield from' in this context?",
        "answer": "'yield from' is used to yield all values from the set comprehension directly, allowing the generator to produce each author without needing to iterate through the set explicitly. This makes the code cleaner and more efficient.",
        "improvements": "If the intention is to yield unique authors, using a set is appropriate. However, if the order of authors matters, consider using a list instead of a set. Additionally, ensure that 'books' is defined and is an iterable of dictionaries to avoid runtime errors."
    }
}
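
To check the explanation above, the snippet can be run inside a small generator. This is a minimal sketch, assuming `books` is a list of dictionaries. The function name `unique_authors` and the sample data are hypothetical.

```python
def unique_authors(books):
    # The set comprehension is built in full first; it removes duplicates and
    # skips books whose "author" is missing or falsy (None, "").
    yield from {book.get("author") for book in books if book.get("author")}

books = [
    {"title": "Emma", "author": "Jane Austen"},
    {"title": "Persuasion", "author": "Jane Austen"},
    {"title": "Beowulf"},                      # no "author" key
    {"title": "Untitled", "author": ""},       # empty author is skipped too
    {"title": "Ulysses", "author": "James Joyce"},
]

authors = sorted(unique_authors(books))  # sorted because set order is arbitrary
print(authors)  # ['James Joyce', 'Jane Austen']
```

Because the whole set is built before the first value is yielded, the generator is not lazy with respect to `books`. `yield from` only hands out the already collected authors one at a time.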


In [28]:
# Get Llama 3.2 to answer


get_answer(question, system_prompt, model=MODEL_LLAMA)

json
{
    "explanation": "This code generates a generator expression that yields the authors of books from the `books` collection.",
    "why": "The `yield from` statement is used to delegate to a sub-generator or iterator, allowing the outer generator to produce values from the inner one. In this case, it's used to iterate over the `books` collection and yield the authors of each book. The `get("author")` method is called on each book dictionary to retrieve the author's value.",
    "limitations": "This code assumes that all books have an 'author' key in their dictionaries. If a book is missing this key, it will raise a KeyError. Additionally, if the list of books is very large, this generator expression may consume a lot of memory due to the use of a dictionary lookup on each iteration.",
    "improvements": "Consider using a more robust way to handle missing keys, such as using the `get()` method with a default value or checking for the key's existence before trying to access it. Alternatively, consider using a database query or an external data source if the list of books is very large."
}


As for your question about why this code uses a generator expression instead of a traditional loop:

json
{
    "answer": "The `yield from` statement and generator expressions are used here because they provide a concise and memory-efficient way to iterate over the `books` collection. This is particularly useful when working with large datasets, as it avoids loading the entire dataset into memory at once."
}


Note that this code can be rewritten using a traditional loop for better readability and maintainability:

python
def get_authors(books):
    authors = []
    for book in books:
        author = book.get("author")
        if author is not None:
            authors.append(author)
    yield from authors


However, the original code using `yield from` is more concise and Pythonic.
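
The loop version above is not an exact equivalent of the original one-liner, and `dict.get` returns `None` for a missing key rather than raising `KeyError`. A small comparison makes both points concrete. The sample data and the name `get_unique_authors` (a wrapper around the original snippet) are hypothetical.

```python
def get_authors(books):
    # loop version from the answer above
    authors = []
    for book in books:
        author = book.get("author")   # returns None if the key is missing
        if author is not None:
            authors.append(author)
    yield from authors

def get_unique_authors(books):
    # original snippet, wrapped in a function so it can be called
    yield from {book.get("author") for book in books if book.get("author")}

books = [{"author": "Ann"}, {"author": "Ann"}, {"title": "No author"}, {"author": ""}]

loop_result = list(get_authors(books))
set_result = list(get_unique_authors(books))
print(loop_result)  # ['Ann', 'Ann', '']
print(set_result)   # ['Ann']
```

The loop keeps duplicates, preserves input order, and lets an empty string through, while the set comprehension deduplicates and drops every falsy author.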