# End of week 1 exercise
Computer Engineering Tutor: the user asks a question and the model answers it, assuming the user has some engineering background.

## Sample computer-science questions (copy into the prompt)

Try one of these example questions to get started. They are tailored to computer science and software engineering topics:

- Explain Big O notation and give examples for common algorithms (e.g., sorting, searching).
- How does a hash table handle collisions and what are common collision-resolution strategies?
- What's the difference between processes and threads, and when should you use each?
- How does TCP differ from UDP, and in which scenarios would you choose one over the other?
- Explain ACID vs BASE in databases and when each consistency model is appropriate.
- What is a race condition and how can you prevent it in concurrent programs?
- How does an LRU cache work and where would you use it?
- Describe the compilation pipeline from source code to machine code (lexing, parsing, optimization, codegen).

## Setup & requirements

This notebook demonstrates using both the OpenAI API and a local Ollama server to answer technical questions. Before running the notebook, make sure you complete the following setup steps.

1) Python environment and dependencies

 - Create and activate a virtual environment (recommended).
 - Install the project dependencies:
```bash
pip install -r requirements.txt
```
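To confirm the install worked before opening the notebook, you can run a quick check like the one below. This is a sketch that assumes the notebook only needs the top-level modules its cells import (`dotenv`, `IPython`, `openai`, `requests`). The `python-dotenv` package is imported as `dotenv`. The helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Top-level modules imported by the notebook cells (python-dotenv installs as `dotenv`).
REQUIRED = ["dotenv", "IPython", "openai", "requests"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All dependencies found" if not missing else f"Missing: {missing}")
```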

2) Create a `.env` file

 - In the repository root, create a file named `.env` and add your OpenAI API key:
```text
OPENAI_API_KEY=sk-<your-openai-key>
```
 - Optionally, you can add an Ollama API key variable if you use one (not required for the default local Ollama setup):
```text
OLLAMA_API_KEY=ollama
```
 - This notebook reads `OPENAI_API_KEY` from the environment. If you choose to use `OLLAMA_API_KEY`, adjust the client instantiation accordingly.
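One way to adjust the client instantiation for `OLLAMA_API_KEY` is to read the variable and fall back to the placeholder `ollama` when it is unset. This is a minimal sketch. The helper `resolve_ollama_key` is hypothetical, and the sketch assumes `load_dotenv()` has already copied the `.env` values into the environment.

```python
import os
from typing import Mapping, Optional


def resolve_ollama_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OLLAMA_API_KEY if it is set and non-empty, else the placeholder "ollama".

    A default local Ollama server does not check the key, so a placeholder works.
    """
    env = os.environ if env is None else env
    return env.get("OLLAMA_API_KEY") or "ollama"


# In the notebook (after load_dotenv):
# ollama = OpenAI(base_url="http://localhost:11434/v1", api_key=resolve_ollama_key())
```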

3) Install and run Ollama (local LLM server)

 - Install Ollama following their official instructions for your OS (macOS, Linux, Windows).
 - Start the Ollama server so it is reachable at `localhost:11434`, which is Ollama's default port.
   - Example (Ollama CLI): run `ollama serve` so the API is available at `http://localhost:11434`. The desktop apps usually start the server for you.
   - Download the model the notebook uses with `ollama pull llama3.2:1b`.
 - If you change the port or host, update the `base_url` used for the Ollama client in the notebook.

4) Quick verification (once Ollama is running)

 - Check the daemon is reachable:
```bash
curl -I http://localhost:11434/
```
 - List available models (use the exact model id reported here in the notebook):
```bash
curl http://localhost:11434/v1/models | jq .
```
 - Example model id you may see: `llama3.2:1b`. Use that exact id when calling the model.
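You can run the same model listing from Python with only the standard library, which is convenient inside the notebook. This sketch assumes the server returns the OpenAI-style `{"object": "list", "data": [...]}` payload shown in the diagnostic output further down. `list_model_ids` and `parse_model_ids` are hypothetical helper names.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract the model ids from an OpenAI-style /v1/models response body."""
    return [model["id"] for model in payload.get("data", []) if "id" in model]


def list_model_ids(base_url: str = "http://localhost:11434/v1", timeout: float = 5.0) -> list[str]:
    """GET {base_url}/models and return the ids the server reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print(list_model_ids())
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print("Could not reach the server:", exc)
```

With the server running and the model pulled, `list_model_ids()` should include an id such as `llama3.2:1b`. Pass that exact string as the `model` argument.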

5) Notebook client configuration notes

 - The notebook uses the OpenAI Python client pointed at Ollama's OpenAI-compatible API. For Ollama, the notebook creates the client like this:
```python
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
```
 - Important: the OpenAI client appends endpoint paths such as `/chat/completions` to `base_url`, so `base_url` must include the `/v1` suffix for requests to reach `http://localhost:11434/v1/chat/completions`. If you get a 404 error, check whether the client is calling a path without `/v1` and adjust `base_url` accordingly.
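If the host and port come from configuration, a small normalization step avoids the missing-`/v1` 404. The helper `with_v1_suffix` below is a hypothetical sketch and is not part of the OpenAI client.

```python
def with_v1_suffix(base_url: str) -> str:
    """Return base_url ending in exactly one '/v1' and no trailing slash."""
    url = base_url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"


if __name__ == "__main__":
    for raw in ["http://localhost:11434", "http://localhost:11434/", "http://localhost:11434/v1/"]:
        # Each line prints http://localhost:11434/v1/chat/completions
        print(with_v1_suffix(raw) + "/chat/completions")
```

Usage: `ollama = OpenAI(base_url=with_v1_suffix("http://localhost:11434"), api_key="ollama")`.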

6) Running the notebook

 - After completing the steps above, run the cells from top to bottom.
 - If you get errors, first try non-streaming calls to check basic connectivity:
```python
ask_llm(ollama, "Say hi", MODEL_LLAMA, stream=False)
```
 - If non-streaming works but streaming fails, the streaming loop may need to be adjusted to match the chunk format returned by your OpenAI client version; the notebook contains defensive streaming handling to help with this.
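If your client version yields chunks in an unexpected shape, one option is to read the text delta defensively. The sketch below assumes chunks are either attribute-style objects (as `openai` 1.x and later return) or plain dicts. `delta_text` is a hypothetical helper, and `SimpleNamespace` only stands in for real chunk objects.

```python
from types import SimpleNamespace
from typing import Any


def _field(obj: Any, name: str) -> Any:
    """Read `name` as a dict key or an attribute, returning None if absent."""
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


def delta_text(chunk: Any) -> str:
    """Return the text carried by one streaming chunk, or '' if it has none."""
    choices = _field(chunk, "choices")
    if not choices:  # e.g. a usage-only chunk with an empty choices list
        return ""
    delta = _field(choices[0], "delta")
    if delta is None:
        return ""
    return _field(delta, "content") or ""


if __name__ == "__main__":
    chunks = [
        SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))]),
        {"choices": [{"delta": {"content": "lo"}}]},
        SimpleNamespace(choices=[]),
        SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))]),
    ]
    print("".join(delta_text(c) for c in chunks))  # prints Hello
```

In `ask_llm`'s streaming loop, `streaming += delta_text(chunk)` would replace the direct `chunk.choices[0].delta.content` access.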

## Where to find diagnostics

 - A diagnostic code cell follows the environment setup; run it to print the OpenAI package version, the Ollama `/v1/models` response, and a quick non-streaming test reply.

## Troubleshooting tips

 - `Connection refused` or no response: the Ollama daemon is not running, or it is listening on a different port.
 - `404 page not found`: incorrect `base_url` or missing `/v1` in the request path.
 - `model not found`: use the exact model id reported by `/v1/models` (e.g., `llama3.2:1b`).
 - `401/403`: authentication mismatch; either remove the auth header for a local server that doesn't require it or set the correct token.
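The symptoms above can be told apart programmatically. This is a minimal sketch using only the standard library. `probe` and `hint_for_status` are hypothetical helpers, and the hint strings paraphrase the list above. Ollama can also answer 404 when a chat request names an unknown model, so the 404 hint mentions both causes.

```python
import urllib.error
import urllib.request

AUTH_HINT = "authentication mismatch: drop the auth header or set the correct token"
HINTS = {
    401: AUTH_HINT,
    403: AUTH_HINT,
    404: "wrong base_url / missing /v1, or (for chat requests) a model id not listed by /v1/models",
}


def hint_for_status(status: int) -> str:
    """Map an HTTP status code to a short troubleshooting hint."""
    if 200 <= status < 300:
        return "ok"
    return HINTS.get(status, f"unexpected HTTP status {status}")


def probe(url: str = "http://localhost:11434/v1/models", timeout: float = 5.0) -> str:
    """Request url and return a hint describing what went wrong, if anything."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return hint_for_status(resp.status)
    except urllib.error.HTTPError as exc:  # must come before URLError (its parent class)
        return hint_for_status(exc.code)
    except urllib.error.URLError as exc:
        return f"cannot connect ({exc.reason}): is the Ollama daemon running on that host/port?"
    except TimeoutError:
        return "no response before the timeout: daemon busy, or wrong host/port"


if __name__ == "__main__":
    print(probe())
```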

In [1]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
# constants

MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2:1b'

In [3]:
# set up environment
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Keys may start with 'sk-proj-' or just 'sk-', so only check the common prefix
if api_key and api_key.startswith('sk-') and len(api_key) > 10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

openai = OpenAI()  # reads OPENAI_API_KEY from the environment
ollama = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # local Ollama server; the key is a placeholder

# set up system prompt
system_prompt = "You are an engineering teacher. Answer the user briefly in 1-3 paragraphs. Do not go into too much detail. Assume the user has some prior knowledge of engineering concepts."

API key looks good so far


In [4]:
# Diagnostic: environment, OpenAI package and Ollama checks. This is just to make sure all is setup properly.
import requests
import json
import sys
import traceback

import openai as openai_pkg  # alias: a bare `import openai` would shadow the `openai` client created earlier

print("openai version:", getattr(openai_pkg, '__version__', None))
print("python:", sys.version.splitlines()[0])
print('\nChecking Ollama /v1/models endpoint...')
try:
    r = requests.get('http://localhost:11434/v1/models', headers={'Authorization':'Bearer ollama'}, timeout=5)
    print('models status:', r.status_code)
    try:
        print(json.dumps(r.json(), indent=2))
    except Exception:
        print(r.text)
except Exception as e:
    print('Could not reach Ollama models endpoint:', e)

print('\nTrying a non-streaming chat completion via the OpenAI client (ollama)...')
try:
    resp = ollama.chat.completions.create(
        model=MODEL_LLAMA,
        messages=[{"role":"system","content":"You are a test."},{"role":"user","content":"Say hi"}],
        stream=False,
    )
    try:
        content = resp.choices[0].message.content
    except Exception:
        try:
            content = resp.get('choices',[{}])[0].get('message',{}).get('content')
        except Exception:
            content = str(resp)
    print('Assistant reply:', content)
except Exception:
    traceback.print_exc()

openai version: 2.6.1
python: 3.12.11 (main, Aug 28 2025, 17:00:08) [Clang 20.1.4 ]

Checking Ollama /v1/models endpoint...
models status: 200
{
  "object": "list",
  "data": [
    {
      "id": "llama3.2:1b",
      "object": "model",
      "created": 1763146276,
      "owned_by": "library"
    }
  ]
}

Trying a non-streaming chat completion via the OpenAI client (ollama)...
Assistant reply: Hi! How can I assist you today?


In [None]:
def ask_llm(model_client, user_prompt, model, stream=False):
    """Send user_prompt to model via model_client and render the reply as Markdown."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

    try:
        response = model_client.chat.completions.create(
            model=model,
            messages=messages,
            stream=stream,
        )
    except Exception as e:
        import traceback
        print("Request failed:", e)
        traceback.print_exc()
        return

    display(Markdown("Model: " + model))
    display(Markdown("UserPrompt:" + user_prompt))

    if stream:
        streaming = ""
        display_handle = display(Markdown(""), display_id=True)
        for chunk in response:
            # Some servers send chunks with no choices (e.g. a final usage chunk); skip them
            if not chunk.choices:
                continue
            streaming += chunk.choices[0].delta.content or ""
            # Strip code-fence markers the model may wrap its answer in, without
            # deleting the word "markdown" from the answer text itself
            cleaned = streaming.replace("```markdown", "").replace("```", "")
            update_display(Markdown(cleaned), display_id=display_handle.display_id)
    else:
        display(Markdown(response.choices[0].message.content))

In [6]:
# Ask the user for a question
question = input("Please enter your engineering question: ")

In [13]:
# Get gpt-4o-mini to answer, with streaming
ask_llm(openai, question, MODEL_GPT, stream=True)

Model: gpt-4o-mini UserPrompt:Describe the compilation pipeline from source code to machine code (lexing, parsing, optimization, codegen).

The compilation pipeline is a structured process that transforms high-level source code into machine code that can be executed by a computer. It typically consists of several key stages: lexing, parsing, optimization, and code generation.

1. **Lexing**: This is the initial stage where the compiler reads the raw source code and breaks it down into tokens. Tokens are the basic building blocks like keywords, operators, and identifiers. The lexer eliminates white spaces and comments, producing a stream of tokens for further processing.

2. **Parsing**: In this stage, the parser takes the token stream produced by the lexer and constructs a syntax tree (or abstract syntax tree based on language grammar). It verifies that the sequence of tokens adheres to the rules of the language and captures the hierarchical structure of the program. Any syntax errors are identified during this phase.

3. **Optimization and Code Generation**: After parsing, the compiler may optimize the syntax tree or intermediate representation to improve performance, reduce resource consumption, or enhance execution speed. Finally, the code generation phase translates this optimized representation into machine code, producing an object file or executable that the processor can execute.

Each stage is crucial for ensuring the efficiency and correctness of the final machine code.

In [12]:
# Get Llama 3.2 to answer
ask_llm(ollama, question, MODEL_LLAMA, stream=True)

Model: llama3.2:1b UserPrompt:Describe the compilation pipeline from source code to machine code (lexing, parsing, optimization, codegen).

Here's a brief overview of the compilation pipeline:

**Lexing**: The first step is **Lexical Analysis**, where the programmer writes the source code and the lexical analysis tools break it into individual tokens. These tokens may include keywords, identifiers, operators, symbols, etc.

* Lexicography: This is the domain of languages and grammar rules.
* Syntax analyzer: This tool analyzes the tokens to identify grammatical structures and errors.

**Parsing**: After lexical analysis, the programmer uses **Syntax Analysis** tools to construct a parse tree, which shows the hierarchical structure of the program (e.g., how the operands are assigned values).

* Parsing context: This is the data passed between parsing steps.
* Error reporting: If there are syntax errors, they're reported and the parser will report the error location.

**Optimization**: The compiler optimizes intermediate language code to improve execution performance. This involves:

* **Code generation:** Moving functions into separate files, using registers for immediate values, etc.
* **Dead code elimination:** Removing unnecessary code blocks.
* **Constant folding:** Evaluating large expressions at compile time.

**Code generation**: The final step is **Machine Code Generation**, where the compiler converts the optimized Intermediate Language (IL) code into assembly listing and then into machine code instructions that the computer's processor can execute directly.

* Instruction selection: Choosing between different instruction sets.
* Register allocation: Determining which registers are available to store operands.
* Macro expansion: Expanding macro definitions into regular functions.
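Both answers describe lexing as splitting source text into tokens while discarding whitespace and comments. The following is a minimal illustration of that stage: a regex-based tokenizer for a made-up expression language. The token names and the `tokenize` function are hypothetical and not taken from any real compiler.

```python
import re

# Token categories for a tiny hypothetical expression language, tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("COMMENT", r"#[^\n]*"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[tuple[str, str]]:
    """Turn source text into (kind, text) tokens, dropping whitespace and comments."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r} at offset {match.start()}")
        tokens.append((kind, match.group()))
    return tokens


if __name__ == "__main__":
    print(tokenize("x = 3 + 42  # add"))
```

The resulting token list is what a parser would consume next to build the syntax tree.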