# Ollama + OpenAI + Python

## 1. Specify the model name

If you pulled a model other than "phi3:mini" with Ollama, change the value in the cell below.
The notebook uses this variable throughout.

In [17]:
MODEL_NAME = "phi3:mini"
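If you switch models often, one alternative to editing the cell above is to read the name from an environment variable and fall back to the default. The variable name `OLLAMA_MODEL` is a hypothetical choice for this sketch, not something Ollama itself reads, and whichever model you name must already be pulled (for example with `ollama pull phi3:mini`).

```python
import os

# Default used when the (hypothetical) OLLAMA_MODEL environment variable is not set.
DEFAULT_MODEL = "phi3:mini"
MODEL_NAME = os.environ.get("OLLAMA_MODEL", DEFAULT_MODEL)
print(f"Using model: {MODEL_NAME}")
```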

## 2. Set up the OpenAI client

Typically the OpenAI client is used with OpenAI.com or Azure OpenAI to interact with large language models.
However, it can also be used with Ollama, since Ollama provides an OpenAI-compatible endpoint at "http://localhost:11434/v1".
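Under the hood, the SDK sends ordinary HTTP requests to that endpoint. As a rough sketch of what a chat completion request looks like on the wire, the code below only builds a request with the standard library's `urllib` and does not send it; the prompt text is an illustrative placeholder.

```python
import json
import urllib.request

# Base URL of Ollama's OpenAI-compatible API (the same value passed to the SDK below).
BASE_URL = "http://localhost:11434/v1"

payload = {
    "model": "phi3:mini",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

# Build (but do not send) a POST to the chat completions route.
request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Ollama does not check the key; the OpenAI SDK sends one as a bearer token anyway.
        "Authorization": "Bearer nokeyneeded",
    },
    method="POST",
)
print(request.get_method(), request.full_url)

# Sending it requires a running Ollama server (`ollama serve`):
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```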

In [2]:
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",  # Ollama ignores the key, but the SDK requires a value
)

## 3. Generate a chat completion

Now we can use the OpenAI SDK to generate a response for a conversation. This request asks the model to write a one-sentence welcome message for a course, in the voice of a pirate:

In [13]:
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant. You will talk like a pirate. Answer with a single sentence."},
        {"role": "user", "content": "Create a welcome message for participants of the Summer School - Mastering Dev Containers course"},
    ],
)

print("Response:")
print(f'\x1b[32m{response.choices[0].message.content}')

Response:
"Ahoy mateys! Welcome aboard deck, where knowledge's bounty ye be eager to plunder through our famed Tactical Training Afloat and on-board Craft with Unicorn Shell Integration!" (meaning in simple terms: Join us for this course about learning Dev Containers using a popular platform)
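The `'\x1b[32m'` at the start of each `print` is an ANSI escape code that switches terminal text to green, and nothing switches it back, so any later output stays green too. A small helper, sketched here under the hypothetical name `format_reply`, appends the reset code `'\x1b[0m'` and loops over `response.choices`, which is a list even when it holds a single entry. The `SimpleNamespace` object is a stand-in that only mimics the attribute path the notebook reads from a real response.

```python
from types import SimpleNamespace

GREEN = "\x1b[32m"
RESET = "\x1b[0m"  # restores the terminal's default color


def format_reply(response) -> str:
    """Return each choice's message content in green, one choice per line."""
    lines = []
    for index, choice in enumerate(response.choices):
        lines.append(f"{GREEN}[{index}] {choice.message.content}{RESET}")
    return "\n".join(lines)


# Stand-in with the same attribute path as a chat completion response.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(message=SimpleNamespace(content="Ahoy, mateys!")),
        SimpleNamespace(message=SimpleNamespace(content="Welcome aboard!")),
    ]
)

print("Response:")
print(format_reply(fake_response))
```

With the real client, the call would be `print(format_reply(response))`.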


## 4. Prompt engineering

The first message sent to the language model is called the "system message" or "system prompt", and it sets the overall instructions for the model.
You can provide your own custom system prompt to guide a language model to generate output in a different way.
Modify the `SYSTEM_MESSAGE` below so the model answers like your favorite famous movie or TV character, or get inspiration for other system prompts from [Awesome ChatGPT Prompts](https://github.com/f/awesome-chatgpt-prompts?tab=readme-ov-file#prompts).

Once you've customized the system message, provide the first user question in the `USER_MESSAGE`.

In [15]:
SYSTEM_MESSAGE = """
I want you to act like Elmo from Sesame Street.
I want you to respond and answer like Elmo using the tone, manner and vocabulary that Elmo would use.
Do not write any explanations. Only answer like Elmo.
You must know all of the knowledge of Elmo, and nothing more.
"""

USER_MESSAGE = """
Hi Elmo, how are you doing today?
"""

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE},
    ],
)

print("Response:")
print(f'\x1b[32m{response.choices[0].message.content}')

Response:
🎷 Oh work! Work is great – working every bit in Sesame Street Land helps us keep our world bright and happy too! You do okay at "work" right there with me. What about YOU? Have a good one, I mean a MERGING time to play with my alphabet big friends today!! 
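`USER_MESSAGE` is only the first turn. The chat completions endpoint does not remember earlier requests, so continuing the conversation means resending the whole history, with each assistant reply appended before the next user message. A minimal sketch, using hypothetical helper names and a placeholder reply in place of a real model call:

```python
def start_conversation(system_message: str) -> list[dict]:
    """Create a message history that begins with the system prompt."""
    return [{"role": "system", "content": system_message.strip()}]


def add_turn(history: list[dict], user_message: str, assistant_reply: str) -> None:
    """Append one user message and the assistant's reply to the history."""
    history.append({"role": "user", "content": user_message.strip()})
    history.append({"role": "assistant", "content": assistant_reply.strip()})


history = start_conversation("I want you to act like Elmo from Sesame Street.")
# Placeholder reply; in the notebook it would be response.choices[0].message.content.
add_turn(history, "Hi Elmo, how are you doing today?", "Elmo is doing great!")
history.append({"role": "user", "content": "What is your favorite color?"})

# `history` could now be passed as messages=history to client.chat.completions.create.
for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```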




## 5. Few-shot examples

Another way to guide a language model is to provide "few-shot" examples: a sequence of example questions and answers that demonstrate how it should respond.

The example below tries to get a language model to act like a teaching assistant (TA) by providing a few student questions paired with the hints a TA might give, and then prompts the model with a new question that a student might ask.

Try it first, and then modify the `SYSTEM_MESSAGE`, `EXAMPLES`, and `USER_MESSAGE` for a new scenario.

In [8]:
SYSTEM_MESSAGE = """
You are a helpful assistant that helps students with their homework.
Instead of providing the full answer, you respond with a hint or a clue.
"""

EXAMPLES = [
    (
        "What is the capital of France?",
        "Can you remember the name of the city that is known for the Eiffel Tower?"
    ),
    (
        "What is the square root of 144?",
        "What number multiplied by itself equals 144?"
    ),
    (
        "What is the atomic number of oxygen?",
        "How many protons does an oxygen atom have?"
    ),
]

USER_MESSAGE = "What is the largest planet in our solar system?"


response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": EXAMPLES[0][0]},
        {"role": "assistant", "content": EXAMPLES[0][1]},
        {"role": "user", "content": EXAMPLES[1][0]},
        {"role": "assistant", "content": EXAMPLES[1][1]},
        {"role": "user", "content": EXAMPLES[2][0]},
        {"role": "assistant", "content": EXAMPLES[2][1]},
        {"role": "user", "content": USER_MESSAGE},
    ],
)


print("Response:")
print(f'\x1b[32m{response.choices[0].message.content}')

Response:
Beyond Jupiter, it'dedits a mass more than two times that and famously hasturne-edaround larger storms. What’sit name begins with 'J', can also be found at an observatory or space center worldwide aiming its giant eye to the sky?
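The cell above indexes `EXAMPLES[0]` through `EXAMPLES[2]` by hand, so adding or removing an example also means editing the `messages` list. A sketch of a helper (the name `build_few_shot_messages` is hypothetical) that turns any number of question/answer pairs into alternating user/assistant messages:

```python
def build_few_shot_messages(system_message, examples, user_message):
    """Return the system prompt, one user/assistant pair per example, then the real question."""
    messages = [{"role": "system", "content": system_message}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_message})
    return messages


examples = [
    ("What is the capital of France?",
     "Can you remember the name of the city that is known for the Eiffel Tower?"),
    ("What is the square root of 144?",
     "What number multiplied by itself equals 144?"),
]
messages = build_few_shot_messages(
    "You give hints instead of answers.",
    examples,
    "What is the largest planet in our solar system?",
)
print(len(messages))  # 1 system + 2 * 2 example messages + 1 question = 6
```

In the notebook, the request would then use `messages=build_few_shot_messages(SYSTEM_MESSAGE, EXAMPLES, USER_MESSAGE)`.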


## 6. Retrieval Augmented Generation

RAG (Retrieval Augmented Generation) is a technique to get a language model to answer questions accurately for a particular domain, by first retrieving relevant information from a knowledge source and then generating a response based on that information.

We have provided a local CSV file, `hybrid.csv`, with data about hybrid cars. The code below reads the CSV file, searches it for rows that share words with the user question, and then asks the model to generate a response based on the matching rows. This takes longer than the previous examples because the prompt sends more data to the model. If the answer is still not grounded in the data, try refining the system prompt (prompt engineering) or switching to another model. In general, RAG works better with larger models or with fine-tuned versions of small language models (SLMs).
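The search in the cell below is deliberately naive: it keeps any row whose first or sixth column (`row[0]` or `row[5]`) shares a word with the question. The same retrieve-then-format steps can be pulled into small functions that are easier to test. This sketch is a variation, not the notebook's exact logic: it drops a few common words (an assumed, incomplete stopword list) so that words like "is" or "a" cannot trigger a match, and it takes the searchable column indices as a parameter. The header and rows are made up for illustration and are not taken from `hybrid.csv`.

```python
STOPWORDS = {"a", "an", "the", "is", "how", "what", "of", "in"}  # assumed, not exhaustive


def normalize(text: str) -> list[str]:
    """Lowercase, turn non-alphanumeric characters into spaces, split, drop stopwords."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return [word for word in cleaned.split() if word not in STOPWORDS]


def search_rows(rows, question, columns):
    """Return rows where any searchable column shares a word with the question."""
    words = set(normalize(question))
    return [
        row for row in rows
        if any(words & set(normalize(row[col])) for col in columns)
    ]


def to_markdown_table(header, rows):
    """Format a header plus rows as a markdown table string."""
    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    lines += [" | ".join(row) for row in rows]
    return "\n".join(lines)


# Made-up rows for illustration only; the notebook reads its rows from hybrid.csv.
header = ["vehicle", "year", "msrp", "acceleration", "mpg", "class"]
rows = [
    ["Prius (example)", "2001", "20000", "10.0", "45", "Compact"],
    ["Other Hybrid (example)", "2005", "30000", "9.0", "35", "Midsize"],
]

matches = search_rows(rows, "how fast is a prius?", columns=[0, 5])
print(to_markdown_table(header, matches))
```

Filtering stopwords and making the columns configurable keeps the prompt smaller, which matters for small local models with limited context.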

In [16]:
import csv

SYSTEM_MESSAGE = """
You are a helpful assistant that answers questions about cars based on a hybrid car data set.
You must use the data set to answer the questions; do not provide any information that is not in the provided sources.
"""

USER_MESSAGE = "how fast is a prius?"

# Read every row of the CSV into a list (rows[0] is the header row)
with open("hybrid.csv", "r", newline="") as file:
    reader = csv.reader(file)
    rows = list(reader)

# Normalize the question: lowercase it, drop "?" and turn parentheses into spaces
normalized_message = USER_MESSAGE.lower().replace("?", "").replace("(", " ").replace(")", " ")

# Search the CSV for user question using very naive search
words = normalized_message.split()
matches = []
for row in rows[1:]:
    # Keep the row if any question word appears in column 0 or column 5 of the CSV
    if any(word in row[0].lower().split() for word in words) or any(
        word in row[5].lower().split() for word in words
    ):
        matches.append(row)

# Format as a markdown table, since language models understand markdown
matches_table = " | ".join(rows[0]) + "\n" + " | ".join(" --- " for _ in range(len(rows[0]))) + "\n"
matches_table += "\n".join(" | ".join(row) for row in matches)

# Now we can use the matches to generate a response
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE + "\nSources: " + matches_table},
    ],
)

print("Response:")
print(f'\x1b[32m{response.choices[0].message.content}')

Response:
Based on the hybrid cars data, here are details of Prius versions and their acceleration abilities:
- The first generation (Gen) Toyota Prius released in 1997 had an acceleration capability estimated at around a 0 to 60 mph time equivalent of roughly 12.45 seconds under midsize constraints using the Msrp as reference for its performance capabilities, although official spec wasn't provided (source: vehicle data on Wikipedia).
- The second generation released in late 2001 offered a bit improved acceleration ability at about 8 to power an additional horsepower within similar MPG and Midsize categories.
- Releasing the third Gen of Prius model known as PLUS from May '04 with midtop spec, notably accelerates up for almost nearly around 9 seconds (actual numbers were rounded here). 
- Introducing Toyota new technology into their brand in November 2011 — a hybrid alpha version named the GR Yaris PHYtec that was launched as Prius 'alpha' saw about one of half second less boost o