## Exercise goal – The OpenAI Python library 🤖📦

In the following examples, we use the OpenAI Python library to interact
with different Large Language Models (LLMs).

In this notebook we will:
- interact with **Gemini** through its API (cloud-based inference),
- interact with **Ollama** using local inference.

Both approaches follow the same conceptual workflow:
prompt → inference → completion,
but differ in **where inference happens** and **how models are accessed**.
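Both providers speak the same OpenAI-compatible chat-completions protocol, so the three-step workflow can be written once and reused. The following is a minimal sketch. `ask` is a hypothetical helper and not part of the `openai` library. The `client` argument is assumed to be any OpenAI-compatible client, such as the `gemini` and `ollama` clients created later in this notebook.

```python
from typing import Any


def ask(client: Any, model: str, prompt: str) -> str:
    """Send a single user prompt and return the text of the first choice.

    Hypothetical helper: `client` is any OpenAI-compatible client,
    e.g. OpenAI(base_url=..., api_key=...).
    """
    # 1. Prompt: a one-message conversation from the user
    messages = [{"role": "user", "content": prompt}]
    # 2. Inference: the model runs wherever the client's base URL points
    response = client.chat.completions.create(model=model, messages=messages)
    # 3. Completion: the generated text of the first returned choice
    return response.choices[0].message.content
```

Once the clients exist, `ask(gemini, "gemini-2.5-pro", "Tell me a funny story")` and `ask(ollama, "llama3.2", "Tell me a fairy story!")` should behave like the two cells in this notebook. Only the client and the model name change between the calls.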


In [None]:
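from openai import OpenAI        # OpenAI-compatible client, used for both providers
from dotenv import load_dotenv   # loads variables from the .env file
import os                        # reads environment variables
import requests                  # plain HTTP, used to check the local Ollama server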

## Using Google Gemini via API 🔑

With Gemini, inference runs remotely on Google’s infrastructure,
so every request must be authenticated with an **API key**.

#### Step 1 – Create a Gemini API key

Generate the API key here: https://aistudio.google.com/api-keys

#### Step 2 – Store the API key safely

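Add the key to the `.env` file in the notebook's folder: `GOOGLE_API_KEY=AIza...`

Keep this file out of version control (for example, by listing it in `.gitignore`).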

---
## 🌍 Conceptual note: Gemini inference

When using Gemini:
- the notebook runs locally,
- the model runs remotely,
- prompts and responses are exchanged via API calls.

This is an example of **cloud-based inference**:
no local GPUs are required, but internet access and credentials are mandatory.
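Each API call is essentially an HTTPS `POST` to the provider's chat-completions endpoint, with the API key sent as a bearer token. The following is a simplified sketch that only *builds* such a request with `requests` and does not send it, to show what leaves the notebook. It assumes the OpenAI-compatible path `chat/completions` under Gemini's base URL. The key is a placeholder, and the real `openai` client adds further headers of its own.

```python
import json

import requests

GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
api_key = "AIza-placeholder"  # placeholder, not a real key

# Build (but do not send) the request an OpenAI-compatible client would make
request = requests.Request(
    "POST",
    GEMINI_BASE_URL + "chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "gemini-2.5-pro",
        "messages": [{"role": "user", "content": "Tell me a funny story"}],
    },
).prepare()

print(request.method, request.url)
print(json.loads(request.body))
```

With a real key, sending this request (for example, with `requests.Session().send(request)`) should trigger the same remote inference as the `gemini` client below. Only the prompt and the response cross the network.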


In [None]:
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

load_dotenv(override=True)

google_api_key = os.getenv("GOOGLE_API_KEY")

if not google_api_key:
    print("No API key was found - please be sure to add your key to the .env file, and save the file! Or you can skip the next 2 cells if you don't want to use Gemini")
elif not google_api_key.startswith("AIz"):
    print("An API key was found, but it doesn't start with AIz - please check that you copied the right key")
else:
    print("API key found and looks good so far!")

In [None]:
gemini = OpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)

In [None]:
messages = [{"role": "user", "content": "Tell me a funny story"}]
response = gemini.chat.completions.create(model="gemini-2.5-pro", messages=messages)
print(response.choices[0].message.content)

## Using Ollama locally 🦙

In contrast to Gemini, Ollama allows us to run models **entirely on our local machine**.

Before running any code that uses Ollama, make sure the local server is active.


In [None]:
# Check that the local Ollama server is reachable on its default port (11434)
requests.get("http://localhost:11434", timeout=5).content

If the output is not `b'Ollama is running'`, or the call raises a `ConnectionError`, then:

### Step 1 – Start the Ollama server

Open a terminal and run: `ollama serve`
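If nothing is listening on port 11434, the check cell above raises an exception instead of printing a message. The following is a slightly more forgiving sketch that returns a boolean, so it can be re-run after starting the server. `ollama_is_running` is a hypothetical name. The sketch assumes the server's root endpoint replies with the plain text `Ollama is running`, which matches the output expected above.

```python
import requests


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url, False otherwise."""
    try:
        response = requests.get(base_url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Connection refused, timeout, malformed URL, ...
        return False
    # The root endpoint replies with the plain-text body "Ollama is running"
    return response.ok and "Ollama is running" in response.text


print(ollama_is_running())
```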

### Step 2 – Ensure a model is installed

Run `ollama list` to see which models are installed.

If `llama3.2` is not listed, download it with: `ollama pull llama3.2`
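`ollama list` also has an HTTP counterpart: Ollama's native `GET /api/tags` endpoint returns the installed models as JSON, roughly `{"models": [{"name": "llama3.2:latest", ...}]}`. The following sketch checks for a model by name. `has_model` and `installed_models` are hypothetical helpers. A name without an explicit tag is compared as `<name>:latest`, which is the tag Ollama pulls by default.

```python
import requests


def has_model(tags: dict, name: str) -> bool:
    """Return True if `name` appears in an /api/tags response.

    A name without an explicit tag is compared as '<name>:latest'.
    """
    wanted = name if ":" in name else f"{name}:latest"
    return any(m.get("name") == wanted for m in tags.get("models", []))


def installed_models(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the installed-model list from a running Ollama server."""
    response = requests.get(f"{base_url}/api/tags", timeout=5)
    response.raise_for_status()
    return response.json()


# Usage (requires the server to be running):
# print(has_model(installed_models(), "llama3.2"))
```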

In [None]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"
# Ollama ignores the API key, but the OpenAI client requires a value
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")

response = ollama.chat.completions.create(model="llama3.2", messages=[{"role": "user", "content": "Tell me a fairy story!"}])
print(response.choices[0].message.content)

## 🧠 Important to notice

This exercise shows that the way we interact with an LLM is largely independent
of the underlying model family or deployment strategy.
Although the code is almost identical, the execution model is very different:

- **Gemini** → remote inference, API-based, managed infrastructure
- **Ollama** → local inference, no API keys, full control over data

This distinction is fundamental when reasoning about:
- privacy,
- latency,
- costs,
- deployment constraints.