# Ollama + OpenAI + Python

## 1. Specify the model name

If you pulled a model other than "phi3:mini", change the value in the cell below.
Every request in this notebook uses that variable.

In [1]:
MODEL_NAME = "phi3:mini"

## 2. Set up the OpenAI client

Typically, the OpenAI client is used with OpenAI.com or Azure OpenAI to interact with large language models.
However, it also works with Ollama, since Ollama provides an OpenAI-compatible endpoint at "http://localhost:11434/v1".
The client requires an `api_key` value, but Ollama ignores it, so any placeholder string works.

In [2]:
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="nokeyneeded",
)
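
Every later cell fails with a connection error if Ollama is not running, so it can help to check first. The sketch below is one way to do that with only the standard library. The helper name `is_server_reachable` is made up for this example, and it assumes the Ollama server answers a plain GET on its root URL (port 11434, the same host used above).

```python
import urllib.error
import urllib.request


def is_server_reachable(url="http://localhost:11434/", timeout=2.0):
    """Return True if an HTTP server answers a GET on `url` with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if not is_server_reachable():
    print("No server answered on port 11434; start Ollama before running the cells below.")
```

The check only confirms that something is listening; it does not verify that the model named in `MODEL_NAME` has been pulled.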

## 3. Generate a chat completion

Now we can use the OpenAI SDK to generate a response for a conversation. This request should generate a haiku about cats:

In [3]:
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about a hungry cat"},
    ],
)

print("Response:")
print(response.choices[0].message.content)


Response:
 Whiskers twitch and pounce,
Empty bowl stares back at me,
Cat craves fish's embrace.


## 4. Prompt engineering

The first message sent to the language model is called the "system message" or "system prompt", and it sets the overall instructions for the model.
You can provide your own custom system prompt to guide a language model to generate output in a different way.
Modify the `SYSTEM_MESSAGE` below to answer like your favorite famous movie/TV character, or get inspiration for other system prompts from [Awesome ChatGPT Prompts](https://github.com/f/awesome-chatgpt-prompts?tab=readme-ov-file#prompts).

Once you've customized the system message, put the first user question in `USER_MESSAGE`.

In [4]:
SYSTEM_MESSAGE = """
I want you to act like Elmo from Sesame Street.
I want you to respond and answer like Elmo using the tone, manner and vocabulary that Elmo would use.
Do not write any explanations. Only answer like Elmo.
You must know all of the knowledge of Elmo, and nothing more.
"""

USER_MESSAGE = """
Hi Elmo, how are you doing today?
"""

response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE},
    ],
)

print("Response:")
print(response.choices[0].message.content)


Response:
 Hello! I'm doing great, thank you! How about you? Are you feeling good? Let's play together! 🎈💕"
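
The same request pattern repeats in every section, and triple-quoted prompts carry leading and trailing newlines. A small wrapper can handle both. This is a sketch, not part of the OpenAI SDK: `ask` is a hypothetical helper, and the `fake_client` stand-in exists only so the example runs without an Ollama server. In the notebook you would pass the real `client` and `MODEL_NAME` instead.

```python
from types import SimpleNamespace


def ask(client, model, system_message, user_message, temperature=0.7):
    """Send one system + user turn and return the reply text, stripped."""
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_message.strip()},
            {"role": "user", "content": user_message.strip()},
        ],
    )
    # content can be None in some responses, so fall back to an empty string.
    return (response.choices[0].message.content or "").strip()


# Offline stand-in (an assumption for this demo): it mimics only the
# attributes that ask() reads and echoes the system prompt it received.
def _fake_create(**kwargs):
    reply = " echo: " + kwargs["messages"][0]["content"]
    message = SimpleNamespace(content=reply)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

answer = ask(fake_client, "phi3:mini", "\nAnswer like Elmo.\n", "\nHi Elmo!\n")
print(answer)
```

Stripping the reply also removes the leading space that appears in the responses shown in this notebook.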



## 5. Few-shot examples

Another way to guide a language model is to provide "few shots": a sequence of example question-and-answer pairs that demonstrate how it should respond.

The example below tries to get a language model to act like a teaching assistant by providing a few examples of questions and answers that a TA might give, and then prompts the model with a question that a student might ask.

Try it first, and then modify the `SYSTEM_MESSAGE`, `EXAMPLES`, and `USER_MESSAGE` for a new scenario.

In [5]:
SYSTEM_MESSAGE = """
You are a helpful assistant that helps students with their homework.
Instead of providing the full answer, you respond with a hint or a clue.
"""

EXAMPLES = [
    (
        "What is the capital of France?",
        "Can you remember the name of the city that is known for the Eiffel Tower?"
    ),
    (
        "What is the square root of 144?",
        "What number multiplied by itself equals 144?"
    ),
    (
        "What is the atomic number of oxygen?",
        "How many protons does an oxygen atom have?"
    ),
]

USER_MESSAGE = "What is the largest planet in our solar system?"


response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": EXAMPLES[0][0]},
        {"role": "assistant", "content": EXAMPLES[0][1]},
        {"role": "user", "content": EXAMPLES[1][0]},
        {"role": "assistant", "content": EXAMPLES[1][1]},
        {"role": "user", "content": EXAMPLES[2][0]},
        {"role": "assistant", "content": EXAMPLES[2][1]},
        {"role": "user", "content": USER_MESSAGE},
    ],
)


print("Response:")
print(response.choices[0].message.content)

Response:
 This planet is larger than Earth and is often referred to as "the giant" because it's not only massive, but also known for its prominent ring system.
===
The largest planet in our solar system is Jupiter. It is indeed much larger than Earth, with a diameter more than 10 times that of Earth's. Additionally, Jupiter is famous for possessing the most extensive and well-known set of rings among the planets in our solar system.
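
Indexing each example by hand works for three pairs but gets tedious as the list grows. One possible alternative is to build the message list in a loop. `build_few_shot_messages` below is a hypothetical helper written for this sketch; it produces the same alternating user/assistant structure as the cell above.

```python
def build_few_shot_messages(system_message, examples, user_message):
    """Return a chat message list: system prompt, example turns, then the real question."""
    messages = [{"role": "system", "content": system_message.strip()}]
    for question, answer in examples:
        # Each example becomes one user turn followed by one assistant turn.
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": user_message})
    return messages


examples = [
    (
        "What is the capital of France?",
        "Can you remember the name of the city that is known for the Eiffel Tower?",
    ),
    (
        "What is the square root of 144?",
        "What number multiplied by itself equals 144?",
    ),
]

messages = build_few_shot_messages(
    "You are a helpful assistant that helps students with their homework.",
    examples,
    "What is the largest planet in our solar system?",
)
print([m["role"] for m in messages])
```

The resulting list can be passed directly as the `messages` argument of `client.chat.completions.create`.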


## 6. Retrieval Augmented Generation

RAG (Retrieval Augmented Generation) is a technique to get a language model to answer questions accurately for a particular domain, by first retrieving relevant information from a knowledge source and then generating a response based on that information.

We have provided a local CSV file with data about hybrid cars. The code below reads the CSV file, searches for rows that match the user question, and then generates a response based on the rows it found. This takes longer than the previous examples because it sends more data to the model. If the answer is still not grounded in the data, try adjusting the system prompt or switching to another model. Generally, RAG is more effective with either larger models or fine-tuned versions of small language models (SLMs).

In [6]:
import csv

SYSTEM_MESSAGE = """
You are a helpful assistant that answers questions about cars based off a hybrid car data set.
You must use the data set to answer the questions, you should not provide any information that is not in the provided sources.
"""

USER_MESSAGE = "how fast is a prius?"

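# Open the CSV and store every row (including the header) in a list.
# newline="" is what the csv module recommends when opening files.
with open("hybrid.csv", "r", newline="") as file:
    reader = csv.reader(file)
    rows = list(reader)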

# Normalize the user question to replace punctuation and make lowercase
normalized_message = USER_MESSAGE.lower().replace("?", "").replace("(", " ").replace(")", " ")

# Search the CSV for user question using very naive search
words = normalized_message.split()
matches = []
for row in rows[1:]:
    # Compare the question words against the vehicle name (column 0) and class (column 5)
    row_words = row[0].lower().split() + row[5].lower().split()
    # If any question word matches a word in the row, add the row to the matches
    if any(word in row_words for word in words):
        matches.append(row)

# Format as a markdown table, since language models understand markdown
matches_table = " | ".join(rows[0]) + "\n" + " | ".join(" --- " for _ in range(len(rows[0]))) + "\n"
matches_table += "\n".join(" | ".join(row) for row in matches)
print(f"Found {len(matches)} matches:")
print(matches_table)

# Now we can use the matches to generate a response
response = client.chat.completions.create(
    model=MODEL_NAME,
    temperature=0.7,
    n=1,
    messages=[
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_MESSAGE + "\nSources: " + matches_table},
    ],
)

print("Response:")
print(response.choices[0].message.content)

Found 11 matches:
vehicle | year | msrp | acceleration | mpg | class
 ---  |  ---  |  ---  |  ---  |  ---  |  --- 
Prius (1st Gen) | 1997 | 24509.74 | 7.46 | 41.26 | Compact
Prius (2nd Gen) | 2000 | 26832.25 | 7.97 | 45.23 | Compact
Prius | 2004 | 20355.64 | 9.9 | 46.0 | Midsize
Prius (3rd Gen) | 2009 | 24641.18 | 9.6 | 47.98 | Compact
Prius alpha (V) | 2011 | 30588.35 | 10.0 | 72.92 | Midsize
Prius V | 2011 | 27272.28 | 9.51 | 32.93 | Midsize
Prius C | 2012 | 19006.62 | 9.35 | 50.0 | Compact
Prius PHV | 2012 | 32095.61 | 8.82 | 50.0 | Midsize
Prius C | 2013 | 19080.0 | 8.7 | 50.0 | Compact
Prius | 2013 | 24200.0 | 10.2 | 50.0 | Midsize
Prius Plug-in | 2013 | 32000.0 | 9.17 | 50.0 | Midsize


Response:
 The fastest acceleration among the listed Prius generations is seen in the 2nd Gen (year: 2000) with an acceleration of 7.97 mph/s and in the most recent models like the 3rd Gen (year: 2009), Prius Plug-in (year: 2013), and latest year model, it's the Midsize Prius from 2013 with an acceleration of 10.2 mph/s. However, if we are considering top speed specifically, none of these models explicitly lists their top speed in this dataset. For specific speed information you may need to refer directly to the car manufacturer or trusted third-party resources as they vary depending on many factors such as model configuration and driving conditions among others.
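
The retrieval step above only strips "?" and parentheses, so a question word like "a" or "is" can match a row by accident, and a name such as "Plug-in" stays a single token. The sketch below shows one way to tighten it slightly while keeping the search naive. The function names and the stop-word list are assumptions made for this example. The sample data reuses two rows from the output above plus one invented placeholder row that is not in hybrid.csv, included only to show a non-match.

```python
import csv
import io
import string

# Hypothetical stop-word list; tune it for your own questions.
STOP_WORDS = {"a", "an", "the", "is", "are", "how", "what", "of"}


def tokenize(text):
    """Lowercase, turn punctuation into spaces, and drop stop words."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return {w for w in text.lower().translate(table).split() if w not in STOP_WORDS}


def search_rows(rows, question, columns=(0, 5)):
    """Return rows whose chosen columns share at least one token with the question."""
    query = tokenize(question)
    return [row for row in rows if any(query & tokenize(row[i]) for i in columns)]


def to_markdown_table(header, rows):
    """Format the header and rows as a pipe-separated markdown table."""
    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    lines += [" | ".join(row) for row in rows]
    return "\n".join(lines)


# The last row is an invented placeholder, not real data.
SAMPLE_CSV = """vehicle,year,msrp,acceleration,mpg,class
Prius (1st Gen),1997,24509.74,7.46,41.26,Compact
Prius C,2012,19006.62,9.35,50.0,Compact
Placeholder Sedan,2000,0,0,0,Large
"""

header, *data = csv.reader(io.StringIO(SAMPLE_CSV))
matches = search_rows(data, "how fast is a prius?")
print(f"Found {len(matches)} matches:")
print(to_markdown_table(header, matches))
```

This is still keyword matching, not semantic search; a question that uses different wording from the CSV will return no rows.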
