## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Hello, Ollama!</p>

This notebook is meant to be a tiny lab for experimenting with local LLMs via Ollama.

What we do here:

- Pull a small local model with Ollama
- Send a simple one-shot prompt from Python
- Try **streaming** responses
- Build a minimal **chat loop** that keeps history
- Add **JSON logging** so each chat session is saved for later analysis

### 1. Load a local model with Ollama

In [1]:
import ollama

First, we pick a model name and ask Ollama to pull it locally. If you already have it, this is basically a no-op.

You can swap `phi3` for any other model you've installed.

In [2]:
MODEL_NAME = "phi3" # will try this one first then maybe "qwen:3b"
ollama.pull(MODEL_NAME)

ProgressResponse(status='success', completed=None, total=None, digest=None)

### 2. Simple helper

This helper:

- Sends a **single** user message.
- Uses a fixed `temperature` and `num_predict` (max new tokens).
- Returns the full response as a string (non-streaming: we wait until it’s done).

In [3]:
def generate_response(prompt: str) -> str:
    response = ollama.chat(
        model = MODEL_NAME,
        messages=[{"role": "user", "content": prompt}],
        options={
            "temperature": 0.7,
            "num_predict": 512,
        }
    )
    return response['message']['content']

In [4]:
print(generate_response("Hello Ollama, how are you today?"))

I'm an AI and I don't have feelings, but thank you for asking! How can I assist you on this fine day? If there is anything specific that concerns or interests you, please let me know. Whether it's information, problem-solving, creativity sparking questions – just shoot away!


### 3. Let's try streaming!

Now we switch to **streaming mode**:

- Instead of waiting for the full answer, we iterate over chunks.
- Each chunk contains part of the text, so we can print it as it arrives.
- This is how modern chat UIs feel fast and “alive”.

In [5]:
def generate_streaming(prompt: str):
    stream = ollama.chat(
        model = MODEL_NAME,
        messages = [{
            "role": "user",
            "content": prompt
        }],
        stream=True,
        options={
            "temperature": 0.7,
            "num_predict": 256
        }
    )
    
    full = []
    for chunk in stream:
        content = chunk['message']['content']
        print(content, end="", flush=True)
        full.append(content)
    print()
    return "".join(full)

If you run the cell below, you’ll see the answer appear **token by token**,
instead of all at once at the end.

This is important for UX: even if the model takes a while to finish,
the user sees progress immediately.
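To make the "progress immediately" point measurable, here is a minimal sketch that times the first chunk against the full answer. It does not talk to Ollama: `fake_stream` is a hypothetical stand-in that yields dicts shaped like the chunks used above (`chunk["message"]["content"]`), and `consume_with_timing` is an illustrative helper, not part of the `ollama` library.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream(text: str, delay: float = 0.01) -> Iterator[dict]:
    """Hypothetical stand-in for a streaming chat call.

    Yields one chunk per word, shaped like chunk["message"]["content"].
    """
    for word in text.split(" "):
        time.sleep(delay)  # simulate the time the model needs per chunk
        yield {"message": {"content": word + " "}}


def consume_with_timing(stream: Iterable[dict]) -> Tuple[str, float, float]:
    """Return (full_text, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in stream:
        if first is None:
            # Record how long the user waited before seeing anything.
            first = time.perf_counter() - start
        parts.append(chunk["message"]["content"])
    total = time.perf_counter() - start
    return "".join(parts), (first if first is not None else total), total


text, first_s, total_s = consume_with_timing(
    fake_stream("the sky is blue because of scattering")
)
print(f"first chunk after {first_s:.3f}s, full answer after {total_s:.3f}s")
```

With a real model, the same helper should work on the iterator returned by a streaming call, assuming its chunks keep the `["message"]["content"]` shape shown in this notebook.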

In [6]:
generate_streaming('Why is the sky blue?')

The sky appears blue to us due to a phenomenon called Rayleigh scattering. When sunlight enters Earth' end, it encounters molecules and small particles in our atmosphere which cause the light to scatter. Blue light scatters more because its shorter wavelength is better suited for interaction with these tiny atmospheric components than longer-wavelength red or yellow light. So when we look up at the sky away from the sun, it’s dominated by this scattered blue light that reaches our eyes, making the sky appear predominantly blue during daylight hours.


"The sky appears blue to us due to a phenomenon called Rayleigh scattering. When sunlight enters Earth' end, it encounters molecules and small particles in our atmosphere which cause the light to scatter. Blue light scatters more because its shorter wavelength is better suited for interaction with these tiny atmospheric components than longer-wavelength red or yellow light. So when we look up at the sky away from the sun, it’s dominated by this scattered blue light that reaches our eyes, making the sky appear predominantly blue during daylight hours."

### 4. Let's try a simple chat loop!

Next, we build a tiny CLI chat loop:

- Keeps a `history` list of `{role, content}` messages.
- Sends the **full history** on each request, so the model has context.
- Streams tokens as they come.
- Type `exit`, `\q` or `\quit` to end the session.

This is the first baby step toward a *real* local agent.

In [None]:
import ollama

MODEL_NAME = "phi3"  # You can change to another model if you have it pulled locally


def chat_loop(model=MODEL_NAME, temperature=1.1, num_predict=256):
    history = []
    print(f"Starting chat with {model}. Type 'exit' to quit.")

    while True:
        try:
            user_input = input("> ")
        except (KeyboardInterrupt, EOFError):
            print("\nending chat...")
            break

        if user_input.strip().lower() in {"exit", "\\q", "\\quit"}:
            print("ending chat...")
            break

        history.append({"role": "user", "content": user_input})

        stream = ollama.chat(
            model=model,
            messages=history,
            options={
                "temperature": temperature,  # try more randomness
                "num_predict": num_predict,
            },
            stream=True,
        )
        full_chunks = []
        print(f"{model}: ", end="")
        for chunk in stream:
            content = chunk["message"]["content"]
            print(content, end="", flush=True)
            full_chunks.append(content)
        print()

        assistant_message = "".join(full_chunks)
        history.append({"role": "assistant", "content": assistant_message})


if __name__ == "__main__":
    chat_loop()


Starting chat with phi3. Type 'exit' to quit.


You can also try this loop from a standalone script
(e.g. `chat_loop.py`) so it feels more like a tiny local “terminal chat app”.
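One way to see what "sending the full history" means without starting a model is to pull a single turn out of the loop and inject the chat function. This is only a sketch: `run_turn` and `echo_chat` are hypothetical names, and `echo_chat` is a fake model that reports how many messages it received.

```python
from typing import Callable, Dict, Iterable, List

Message = Dict[str, str]


def run_turn(
    history: List[Message],
    user_input: str,
    chat_fn: Callable[[List[Message]], Iterable[dict]],
) -> str:
    """Append the user message, collect the streamed reply, append it too."""
    history.append({"role": "user", "content": user_input})
    parts = []
    for chunk in chat_fn(history):
        parts.append(chunk["message"]["content"])
    reply = "".join(parts)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_chat(messages: List[Message]) -> Iterable[dict]:
    """Hypothetical fake model: says how many messages it was given."""
    yield {"message": {"content": f"I can see {len(messages)} "}}
    yield {"message": {"content": "message(s)."}}


history: List[Message] = []
print(run_turn(history, "hi", echo_chat))
print(run_turn(history, "still there?", echo_chat))
```

The second reply sees three messages (user, assistant, user), which is the context growth the loop relies on. To use a real model, `chat_fn` could be something like `lambda msgs: ollama.chat(model=MODEL_NAME, messages=msgs, stream=True)`.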

### 5. Logging chat history to JSON

Right now, `history` only lives in RAM. Once the process exits, the whole
conversation is gone.

To take a small step toward an **agent brain**, we’ll:

- Log each chat session to a **single JSON file**.
- Use a simple schema with:
  - `model`, `temperature`, `num_predict`
  - `started_at`, `ended_at`
  - `messages`: list of `{role, content, timestamp}`

This is enough to:

- Re-read conversations later.
- Use logs as evaluation data.
- Feed logs into future RAG / analytics tools.
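As a quick illustration of the schema, the sketch below builds one session record in memory and prints it as JSON. The helper names `new_session_log` and `log_message` are hypothetical and exist only for this example, and the two messages are made-up sample data.

```python
import json
from datetime import datetime


def new_session_log(model: str, temperature: float, num_predict: int) -> dict:
    """Create an empty session record following the schema above."""
    return {
        "model": model,
        "temperature": temperature,
        "num_predict": num_predict,
        "started_at": datetime.now().isoformat(),
        "messages": [],
    }


def log_message(log: dict, role: str, content: str) -> None:
    """Append one {role, content, timestamp} entry to the session."""
    log["messages"].append(
        {
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
        }
    )


example = new_session_log("phi3", temperature=1.1, num_predict=256)
log_message(example, "user", "Why is the sky blue?")
log_message(example, "assistant", "Because of Rayleigh scattering.")
example["ended_at"] = datetime.now().isoformat()  # set once the session ends

print(json.dumps(example, ensure_ascii=False, indent=4))
```

Keeping timestamps as ISO 8601 strings means the record can be passed to `json.dump` directly and sorted or parsed later.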

In [None]:
from datetime import datetime
import json
import os
import ollama

MODEL_NAME = "gemma3:1b"

def chat_loop(model: str = MODEL_NAME, temperature: float = 1.1, num_predict: int = 512):
    history = []
    print(f"Starting chat with {model}. Type '\\exit', '\\q' or '\\quit' to quit.")
    
    session_start = datetime.now().isoformat()  # call the method so we store a string
    log = {
        "model": model,
        "temperature": temperature,
        "num_predict": num_predict,
        "started_at": session_start,
        "messages": [],
    }
    
    try:
        while True:
            try:
                user_input = input("> ")
            except (KeyboardInterrupt, EOFError):
                print("\ninterrupted, ending chat...")
                break
            
            if user_input.strip().lower() in {"\\exit", "\\q", "\\quit"}:
                print("\nquit command received, ending chat...")
                break
            
            now = datetime.now().isoformat()
            
            user_msg = {"role": "user", "content": user_input}
            history.append(user_msg)
            log["messages"].append(
                {
                    "role": "user",
                    "content": user_input,
                    "timestamp": now,
                }
            )
            
            stream = ollama.chat(
                model=model,
                messages=history,
                options={
                    "temperature": temperature,  # try more randomness
                    "num_predict": num_predict,
                },
                stream=True,
            )
            full_chunks = []
            print(f"{model}: ", end="")
            for chunk in stream:
                content = chunk["message"]["content"]
                print(content, end="", flush=True)
                full_chunks.append(content)
            print()
            
            assistant_message = "".join(full_chunks)
            assistant_msg = {"role": "assistant", "content": assistant_message}
            history.append(assistant_msg)
            
            now = datetime.now().isoformat()
            
            log["messages"].append(
                {
                    "role": "assistant",
                    "content": assistant_message,
                    "timestamp": now,
                }
            )
            
    finally:
        log["ended_at"] = datetime.now().isoformat()
        os.makedirs("chat_logs", exist_ok=True)
        
        safe_start = session_start.replace(":", "-")
        log_path = os.path.join("chat_logs", f"chat_{safe_start}.json")
        
        with open(log_path, "w", encoding="utf-8") as f:
            json.dump(log, f, ensure_ascii=False, indent=4)
            
        print(f"chat log saved to {log_path}")

### 6. Using the logging loop

To try this logging variant:

1. Run `chat_loop()` (the logging version defined in section 5) in this notebook **or** from a script.
2. Ask a few questions.
3. Exit with `\exit`, `\q`, `\quit` or Ctrl+C.
4. Check the `chat_logs/` directory – you should see a file like:

   ```text
   chat_logs/chat_2025-11-30T16-25-12.345678.json
   ```
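Once a few sessions are saved, re-reading them is a short loop over the directory. The sketch below assumes the `chat_logs/` layout and schema from section 5. `load_chat_logs` and `summarize` are hypothetical helpers, and the demo writes a made-up sample file into a temporary directory so it runs without any real logs.

```python
import json
import os
import tempfile
from typing import List


def load_chat_logs(log_dir: str = "chat_logs") -> List[dict]:
    """Load every chat_*.json file in log_dir, sorted by file name."""
    sessions: List[dict] = []
    if not os.path.isdir(log_dir):
        return sessions
    for name in sorted(os.listdir(log_dir)):
        if name.startswith("chat_") and name.endswith(".json"):
            with open(os.path.join(log_dir, name), encoding="utf-8") as f:
                sessions.append(json.load(f))
    return sessions


def summarize(session: dict) -> str:
    """One-line summary: model, number of user messages, start time."""
    n_user = sum(1 for m in session["messages"] if m["role"] == "user")
    return f"{session['model']}: {n_user} user message(s), started {session['started_at']}"


# Demo with made-up sample data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "model": "phi3",
        "temperature": 1.1,
        "num_predict": 512,
        "started_at": "2025-11-30T16:25:12.345678",
        "ended_at": "2025-11-30T16:27:00.000000",
        "messages": [
            {"role": "user", "content": "hi", "timestamp": "2025-11-30T16:25:20.000000"},
            {"role": "assistant", "content": "hello!", "timestamp": "2025-11-30T16:25:21.000000"},
        ],
    }
    path = os.path.join(tmp, "chat_2025-11-30T16-25-12.345678.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False, indent=4)

    loaded = load_chat_logs(tmp)
    summaries = [summarize(s) for s in loaded]

print(summaries)
```

Because the file names start with the ISO timestamp, sorting by name also sorts sessions chronologically.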