# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and with Ollama, build a tool that takes a technical question
and responds with an explanation. This is a tool that you will be able to use yourself during the course!

In [19]:
# imports

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI
import ollama

In [4]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'

In [5]:
# set up environment

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
openai = OpenAI()

API key looks good so far
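
The check above prints the same warning whatever went wrong. As a minimal sketch, the checks can be split so that each failure gets its own message. The helper name `check_api_key` and the message wording are assumptions for illustration; only the `sk-proj-` prefix and the length check come from the cell above, and the key strings below are dummy values.

```python
def check_api_key(api_key):
    """Return (ok, message) for a candidate key. Hypothetical helper, not part of the notebook."""
    if not api_key:
        return False, "No API key was found; check that .env defines OPENAI_API_KEY."
    if api_key != api_key.strip():
        return False, "The API key has leading or trailing whitespace; remove it."
    if not api_key.startswith("sk-proj-"):
        return False, "The API key does not start with 'sk-proj-'; check which key you copied."
    if len(api_key) <= 10:
        return False, "The API key looks too short; check that it was copied in full."
    return True, "API key looks good so far"


# Dummy values only; none of these is a real key
candidates = [None, " sk-proj-example-not-real", "wrong-prefix-key", "sk-proj-example-not-real"]
for candidate in candidates:
    ok, message = check_api_key(candidate)
    print(ok, message)
```

Like the original check, this only inspects the shape of the key; it does not confirm that the key is valid for the API.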


In [6]:
# here is the question; type over this to ask something new

question = """
Please explain what this code does and why:
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
"""

In [20]:
# Get gpt-4o-mini to answer, with streaming

system_prompt = "You are an assistant that is able to take a technical question and respond with an explanation. \
Respond in markdown. Your tone should be caring and patient, as if you are a tutor. \
If there is more than one line of code in the code snippet, explain the code line by line."

def get_tech_question_prompt(question):
    user_prompt = "You are trying to answer a technical question or explain some lines of code. "
    user_prompt += "Answer the question in markdown. Here is the question: \n"
    user_prompt += question
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt

def stream_answer_question_gpt(question):
    stream = openai.chat.completions.create(
        model = MODEL_GPT,
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_tech_question_prompt(question)}
        ],
        stream=True
    )
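    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        # delta.content can be None, so fall back to an empty string
        response += chunk.choices[0].delta.content or ''
        # Remove the word "markdown" wherever it appears in the accumulated text
        response = response.replace("markdown", "")
        # Re-render the whole accumulated answer in the same output area
        update_display(Markdown(response), display_id=display_handle.display_id)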

stream_answer_question_gpt(question)

Sure! Let's break down the provided code line by line, so you can understand what it does and why each part is important.

### Code Explanation

```python
if soup.body:
```
- **What it does:** This line checks if the `soup` object (which is likely a Beautiful Soup object from the `bs4` library used to parse HTML) has a `body` element.
- **Why:** The `body` element contains the content of the HTML document. If there is no body (for example, if the document is empty or malformed), the code should handle it gracefully.

```python
    for irrelevant in soup.body(["script", "style", "img", "input"]):
```
- **What it does:** This line iterates over all elements within the `soup.body` that match the tags specified in the list: `script`, `style`, `img`, and `input`.
- **Why:** These elements are typically not useful for text extraction, as `script` and `style` contain code and styles, `img` contains images, and `input` elements may not provide relevant textual information. By filtering them out, the goal is to clean the extracted text of unnecessary content.

```python
        irrelevant.decompose()
```
- **What it does:** For each `irrelevant` element found in the previous loop, this line calls the `decompose()` method on it.
- **Why:** The `decompose()` method removes the element from the Beautiful Soup parse tree, which means it will no longer be considered in subsequent operations. This helps ensure that only relevant text is extracted later.

```python
    self.text = soup.body.get_text(separator="\n", strip=True)
```
- **What it does:** This line retrieves the text content from `soup.body` using the `get_text()` method.
- **Why:** The `separator="\n"` argument indicates that line breaks should be inserted between different fragments of text to improve readability, and `strip=True` removes any leading or trailing whitespace. The resulting text is then assigned to an instance variable `self.text`, which presumably will be used later in the code or returned as part of a function.

```python
else:
    self.text = ""
```
- **What it does:** If there is no `soup.body`, this line simply sets `self.text` to an empty string.
- **Why:** This is a safeguard to ensure that `self.text` always has a value. If there’s no content to extract, it's better to have an empty string than not defining `self.text` at all.

### Summary
In summary, this code snippet is designed to extract and clean textual content from the body of an HTML document while removing irrelevant elements like scripts, images, and styles to ensure the output is useful. If the body doesn't exist, it safely sets the text to an empty string. This process is helpful in text processing applications where clean text data is needed from HTML sources.
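
To check the explanation above against real behaviour, here is a small self-contained sketch that runs the same logic on a sample page. The function name `extract_body_text` and the sample HTML are assumptions made for illustration; the original snippet stores the result in `self.text` inside a class instead of returning it.

```python
from bs4 import BeautifulSoup

# Made-up sample page for illustration
SAMPLE_HTML = """
<html>
  <head><title>Demo</title></head>
  <body>
    <h1>Hello</h1>
    <script>console.log("ignored");</script>
    <style>h1 { color: red; }</style>
    <p>Some <b>useful</b> text.</p>
    <img src="cat.png" alt="a cat">
    <input type="text" value="typed">
  </body>
</html>
"""


def extract_body_text(html: str) -> str:
    """Return the visible text of the <body>, or "" if there is no body."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.body:
        # Calling a tag like a function is shorthand for find_all()
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()  # remove the element from the parse tree
        return soup.body.get_text(separator="\n", strip=True)
    return ""


text = extract_body_text(SAMPLE_HTML)
print(text)
# Hello
# Some
# useful
# text.
```

Note that `strip=True` strips each text fragment individually and drops empty ones, and the separator is placed between every fragment, so inline tags such as `<b>` also produce line breaks.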

In [None]:
# Get Llama 3.2 to answer

ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

def stream_answer_question_llama(question):
    # Ollama serves an OpenAI-compatible endpoint, so the same client call works
    stream = ollama_via_openai.chat.completions.create(
        model=MODEL_LLAMA,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_tech_question_prompt(question)}
        ],
        stream=True
    )
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)

stream_answer_question_llama(question)
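
Both streaming functions in this notebook build the answer with the same loop. The sketch below reproduces that loop offline so that it can be run without an API key or a local Ollama server. The `fake_chunk` and `accumulate` helpers are hypothetical stand-ins that only mimic the `chunk.choices[0].delta.content` shape used above; they are not part of the `openai` library.

```python
from types import SimpleNamespace


def fake_chunk(content):
    """Build an object shaped like a streamed chunk (hypothetical stand-in)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def accumulate(stream):
    """Concatenate streamed fragments, recording the text after each chunk."""
    response = ""
    snapshots = []
    for chunk in stream:
        # content can be None (assumed here for the last chunk), hence `or ""`
        response += chunk.choices[0].delta.content or ""
        snapshots.append(response)
    return response, snapshots


stream = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
final, snapshots = accumulate(stream)
print(final)      # Hello
print(snapshots)  # ['Hel', 'Hello', 'Hello']
```

Each snapshot corresponds to one `update_display` call in the notebook: the whole accumulated answer is re-rendered every time, which is why the displayed markdown appears to grow in place.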