# End of week 1 exercise

To demonstrate your familiarity with the OpenAI API and Ollama, build a tool that takes a technical question
and responds with an explanation. You will be able to use this tool yourself during the course!

In [None]:
# imports
from openai import OpenAI
from IPython.display import Markdown, display, update_display

In [9]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'
OLLAMA_BASE_URL = 'http://localhost:11434/v1'

In [10]:
# set up environment
open_ai = OpenAI()

ollama_ai = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")



In [16]:
# streaming helpers; the question is typed in via input() in the cells below

# question = "How does the display function of IPython.display work?"

def stream_gpt_answer(question):
  stream = open_ai.chat.completions.create(
    model=MODEL_GPT,
    messages= [
      {"role": "system", "content": "You are an expert LLM engineer. You are given a technical question and you need to explain it in a way that is easy to comprehend. You will use relevant analogies and examples to help explain the concept. You will also use markdown formatting to make the explanation more readable. You will keep your response concise and succinct."},
      {"role": "user", "content": question}
      ],
    stream=True
    )

  response = ""
  display_handle = display(Markdown(""), display_id=True)
  for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    update_display(Markdown(response), display_id=display_handle.display_id)



def stream_llama_answer(question):
  stream = ollama_ai.chat.completions.create(
    model=MODEL_LLAMA,
    messages= [
      {"role": "system", "content": "You are an expert LLM engineer. You are given a technical question and you need to explain it in a way that is easy to comprehend. You will use relevant analogies and examples to help explain the concept."},
      {"role": "user", "content": question}
      ],
    stream=True,
  )

  response = ""
  display_handle = display(Markdown(""), display_id=True)
  for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    update_display(Markdown(response), display_id=display_handle.display_id)
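
Both functions above repeat the same accumulate-and-redraw loop. As a minimal sketch of how that loop could be factored out, the snippet below separates the accumulation from the display call. `accumulate_stream` and `fake_chunk` are hypothetical names introduced here for illustration; the stand-in chunks only mimic the `choices[0].delta.content` shape that the loop reads, so the example runs without an API key or a notebook display.

```python
from types import SimpleNamespace


def accumulate_stream(stream):
    """Yield the response text accumulated so far after each chunk."""
    response = ""
    for chunk in stream:
        # A chunk may carry no content (None), hence the `or ''` fallback.
        response += chunk.choices[0].delta.content or ''
        yield response


def fake_chunk(text):
    """Hypothetical stand-in mimicking the shape of one streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: three content chunks followed by an empty chunk.
chunks = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk("world"), fake_chunk(None)]
snapshots = list(accumulate_stream(chunks))
print(snapshots[-1])
```

In the notebook, each yielded snapshot could be passed to `update_display(Markdown(snapshot), display_id=display_handle.display_id)`, so either client could reuse the same loop.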


In [17]:
# Get gpt-4o-mini to answer, with streaming
stream_gpt_answer(question=input("Enter a question: "))

Docker containers are a way to package and run applications in a consistent and isolated environment. Think of them like flexible "shipping containers" for software. Let’s break down how Docker containers work using simple analogies and examples:

### What is a Docker Container?

- **Analogy**: Imagine you are shipping different kinds of fruits (applications) in standardized boxes (containers) that keep them fresh and prevent them from spoiling during transport.
- **Description**: A Docker container packages everything an application needs to run: the code, libraries, dependencies, and system tools. This allows the application to run uniformly across different environments, such as developer machines or production servers.

### How Containers Work

1. **Docker Engine**:
   - Think of the **Docker Engine** as a shipping yard. It's the machinery that manages and facilitates the building, running, and storing of your containers.

2. **Images and Containers**:
   - **Docker Images**: These are the blueprints or templates for your containers, similar to the detailed package information for a fruit box.
   - **Docker Containers**: These are the actual instances created from the images, just like filling those boxes with the actual fruits. You can have multiple containers based on the same image.

3. **Isolation**:
   - Containers run in isolation from each other, akin to each fruit box being placed in a way that prevents them from interacting or influencing one another. This isolation ensures that if one application fails, it doesn’t affect others.

4. **Layered File System**:
   - Docker employs a layered file system, where each layer represents a change (like adding new fruits to a box). This means that instead of copying everything for a new container, it only records the differences, making it efficient in terms of storage and speed.

5. **Portability**:
   - Once you create your container, you can take it anywhere (cloud, different servers) with the guarantee that it will run the same way. It's like knowing your packaged fruits will taste the same, no matter where you open the box.

6. **Networking**:
   - Docker containers can communicate with each other across different networks, much like how different trucks can share routes or communicate while delivering different boxes.
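
The layered file system in item 4 can be illustrated with a toy model. This is only a sketch of the idea (shared read-only layers plus a thin writable layer per container), not how Docker's storage drivers are implemented; the paths, contents, and the `new_container` helper are all made up for illustration.

```python
from collections import ChainMap

# Read-only image layers, oldest first. Each dict maps a path to file contents.
base_layer = {"/bin/sh": "shell binary", "/etc/os-release": "base OS"}
app_layer = {"/app/main.py": "print('v1')"}


def new_container(*image_layers):
    """Return a container view: an empty writable layer over shared image layers."""
    # ChainMap searches its first mapping first, so the newest layer goes first.
    return ChainMap({}, *reversed(image_layers))


c1 = new_container(base_layer, app_layer)
c2 = new_container(base_layer, app_layer)

# Writes land only in the container's own top layer; image layers stay untouched.
c1["/app/main.py"] = "print('patched')"

print(c1["/app/main.py"])   # the first container sees its own change
print(c2["/app/main.py"])   # the second container still sees the image's file
```

Both containers share the same two image dictionaries, and only the difference made by the first container is stored separately.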

### Example in Practice

Imagine you're developing a web application. You create a Docker container with:

- Your application code
- A database engine (like MySQL)
- Libraries your app needs

Now you can run this container on your laptop, and it will work exactly the same way when you deploy it on a cloud provider, without worrying about differences in operating systems or library versions.

### Summary

In summary, Docker containers allow you to encapsulate applications and their dependencies in a lightweight manner, providing consistency and isolation regardless of where they are deployed. This greatly simplifies the deployment process and enhances efficiency, especially when scaling applications across various platforms.

By using the Docker "shipping container" analogy, you can see how containers help in organizing, deploying, and managing applications effectively!

In [18]:
# Get Llama 3.2 to answer
stream_llama_answer(question=input("Enter a question: "))

Backpropagation! A fundamental algorithm for training artificial neural networks. I'd be happy to explain how it works using the chain rule.

**What is Backpropagation?**

Backpropagation (BP) is an algorithm used to train artificial neural networks. Its goal is to minimize the difference between the network's predicted output and the true output, often referred to as the error or loss. The process involves two main steps: forward propagation and backward propagation.

**Forward Propagation**

During forward propagation, we pass input data through the network layer by layer, computing the activation functions at each node. Think of it like water flowing through a hose. Each segment (like a node) is connected to its neighbors, allowing information to be passed from one node to another.

1. Input layer passes the input values.
2. Hidden layers compute activations based on inputs and weights.
3. Output layer produces an estimate of the output.
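
The three steps above can be sketched for a tiny network. The layer sizes, weights, and the choice of a sigmoid activation are assumptions made for illustration, and `dense` is a hypothetical helper, not a library function.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One layer: weighted sum of the inputs plus a bias, then sigmoid, per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Illustrative values only: 2 inputs -> 2 hidden units -> 1 output unit.
x = [0.5, -1.0]                                   # step 1: input values
hidden_w, hidden_b = [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1]
output_w, output_b = [[0.7, -0.5]], [0.05]

hidden = dense(x, hidden_w, hidden_b)             # step 2: hidden activations
output = dense(hidden, output_w, output_b)        # step 3: the network's estimate
print(hidden, output)
```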

**Backward Propagation using the Chain Rule**

Here's where things get interesting! The chain rule in calculus allows us to differentiate composite functions, which is crucial for backpropagation. Imagine we have a long row of interconnected dominoes representing our neural network nodes. When one domino falls (i.e., an error occurs), it triggers a chain reaction, causing the subsequent dominoes (units) to react.
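
As a concrete instance, assume a single sigmoid unit with a squared-error loss (an illustrative setup, not the only possible one): \(z = wx + b\), \(a = \sigma(z)\), and \(L = (a - y)^2\). The chain rule then factors the gradient into three local derivatives, one per link in the chain:

\[
\frac{dL}{dw}
  = \frac{dL}{da} \cdot \frac{da}{dz} \cdot \frac{dz}{dw}
  = 2(a - y) \cdot a(1 - a) \cdot x
\]

Each factor depends only on quantities already computed during the forward pass.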

We compute the derivative of the loss function with respect to each weight and bias, then update the parameters, in two steps:

**Step 1: Compute the Gradient**

\[
\frac{dL}{dw}
\]

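Using the chain rule, we calculate this by multiplying the local derivatives along the path from the loss back to the weight. For a single sigmoid unit with a squared-error loss:
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative values (assumed): one input, one target, one weight, one bias
x, y, w, b = 1.5, 1.0, 0.8, 0.1

z = w * x + b            # linear pre-activation
a = sigmoid(z)           # activation; the loss is L = (a - y) ** 2
dL_da = 2 * (a - y)      # derivative of the loss w.r.t. the activation
da_dz = a * (1 - a)      # derivative of the sigmoid
dW = dL_da * da_dz * x   # chain rule: dL/dw
db = dL_da * da_dz       # chain rule: dL/db
```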
**Step 2: Update the Parameters (gradient descent)**

We then use these partial derivatives to update each parameter by subtracting its gradient multiplied by a step size, alpha:
```python
alpha = 0.1           # learning rate (illustrative value)

w = w - alpha * dW    # weight update
b = b - alpha * db    # bias update
```
Here, alpha is the learning rate.
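
Repeating the two steps drives the loss down. The loop below is a minimal sketch under stated assumptions: a single sigmoid unit, a squared-error loss, one training example, and made-up starting values. `loss_and_grads` is a hypothetical helper introduced for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grads(w, b, x, y):
    """Squared-error loss of one sigmoid unit and its gradients via the chain rule."""
    a = sigmoid(w * x + b)
    dL_dz = 2 * (a - y) * a * (1 - a)        # dL/da * da/dz
    return (a - y) ** 2, dL_dz * x, dL_dz    # loss, dL/dw, dL/db


x, y = 1.5, 1.0     # one illustrative training example
w, b = 0.8, 0.1     # illustrative starting parameters
alpha = 0.5         # learning rate (assumed value)

losses = []
for _ in range(100):
    loss, dW, db = loss_and_grads(w, b, x, y)
    losses.append(loss)
    w -= alpha * dW   # step against the gradient
    b -= alpha * db

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The update subtracts the gradient for both the weight and the bias; adding it instead would increase the loss.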

**Key Idea:**

The key insight behind backpropagation is its use of the chain rule. It calculates the partial derivative of the loss function with respect to each weight by multiplying local derivatives layer by layer, and gradient descent then uses those derivatives to update the parameters.

Backpropagation thus works backward, tracing how the error propagates from the output layer toward the input layer.

Does this explanation help clarify the relationship between backpropagation and the chain rule? 

Please let me know if you have any further questions.