# Ollama + OpenAI + JavaScript

## 1. Specify the model name 

If you want to use a different model, you can change the `MODEL_NAME` variable below. This variable is used throughout the notebook.

You'll also need to run `ollama pull <model_name>` in a terminal to download the model files (the `phi3` model has already been downloaded).

In [2]:
const MODEL_NAME = "phi3";

## 2. Set up the OpenAI client

Typically, the OpenAI client is used with the [OpenAI API](https://openai.com/api/) or [Azure OpenAI](https://learn.microsoft.com/azure/ai-services/openai/overview) to interact with large language models. However, it also works with Ollama, because Ollama provides an OpenAI-compatible endpoint at http://localhost:11434/v1.

In [3]:
import { OpenAI } from "npm:openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "__not_needed_by_ollama__",
});
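
To make "OpenAI-compatible" more concrete, here is a minimal sketch of the kind of HTTP request the client sends for a chat completion: a JSON `POST` to the `/chat/completions` path under the base URL. The `buildChatRequest` helper is hypothetical (it is not part of the `openai` package), and it only builds the request, so it runs without Ollama.

```javascript
// Hypothetical helper (not part of the openai SDK): builds the arguments for a
// fetch() call against an OpenAI-compatible chat completions endpoint.
function buildChatRequest(baseURL, model, messages) {
  return {
    // Strip trailing slashes so the path is joined cleanly
    url: `${baseURL.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const request = buildChatRequest("http://localhost:11434/v1", "phi3", [
  { role: "user", content: "Say hello" },
]);

console.log(request.url); // prints "http://localhost:11434/v1/chat/completions"

// With Ollama running, the request could then be sent like this:
// const response = await fetch(request.url, request.init);
// const data = await response.json();
```

In practice the SDK is more convenient, since it also takes care of details such as errors and retries, but the sketch shows why pointing `baseURL` at Ollama is enough to switch providers.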

## 3. Generate a chat completion

Now we can use the OpenAI SDK to generate a response for a conversation. This request should generate a haiku about cats:

In [12]:
const completion = await openai.chat.completions.create({
  model: MODEL_NAME,
  messages: [{ role: "user", content: "Write a haiku about a hungry cat" }],
});

console.log("Response:\n" + completion.choices[0].message.content);

Response:
 Moonlight on her whiskers,
A growling belly beckons—pounce! 
Dreams of mice at dawn.
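
The response text lives at `completion.choices[0].message.content`, and the output above starts with a stray space. The sketch below shows one way to read that field defensively and trim it. The `firstMessageText` helper and the sample object are hypothetical: the object is hand-written to mimic the response shape, not real model output.

```javascript
// Hypothetical helper: returns the trimmed text of the first choice, or a
// fallback when there are no choices or the message content is empty/null.
function firstMessageText(completion, fallback = "(no response)") {
  return completion?.choices?.[0]?.message?.content?.trim() || fallback;
}

// Hand-written object mimicking the response shape (not real model output)
const sampleCompletion = {
  choices: [{ message: { role: "assistant", content: " Hello there! " } }],
};

console.log(firstMessageText(sampleCompletion)); // prints "Hello there!"
console.log(firstMessageText({ choices: [] })); // prints "(no response)"
```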


## 4. Prompt engineering

The first message sent to the language model is called the "system message" or "system prompt", and it sets the overall instructions for the model. You can provide your own custom system prompt to guide a language model to generate output in a different way. Modify the `SYSTEM_MESSAGE` below to answer like your favorite famous movie/TV character, or get inspiration for other system prompts from [Awesome ChatGPT Prompts](https://github.com/f/awesome-chatgpt-prompts?tab=readme-ov-file#prompts).

Once you've customized the system message, provide the first user question in the `USER_MESSAGE`.

> ***Tip:** The `temperature` parameter controls the randomness of the model's output. Lower values make the output more deterministic and repetitive, while higher values make it more creative and unpredictable. The default depends on the server and the model, so the examples below set it explicitly to `0.7`.*

In [11]:
const SYSTEM_MESSAGE = `Act like Yoda from Star Wars.
Respond and answer like Yoda using the tone, manner and vocabulary that Yoda would use.
Do not write any explanations. Only answer like Yoda.
You must know all of the knowledge of Yoda, and nothing more.`;

const USER_MESSAGE = `Hi Yoda, how's the force today?`;

const completion = await openai.chat.completions.create({
  model: MODEL_NAME,
  messages: [
    { role: "system", content: SYSTEM_MESSAGE },
    { role: "user", content: USER_MESSAGE },
  ],
  temperature: 0.7,
});

console.log("Response:\n" + completion.choices[0].message.content);

Response:
 The Force is calm and steady it always remains, even when faced with darkness or light within you too much can be consumed by one’s feelings only if that leads astray from righteousness it does. Peaceful mind keeps us safe in harmony we stay, eh? Yes, my young Padawan is where to focus your energy must go!
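
To build some intuition for the `temperature` tip above, here is a conceptual sketch of temperature-scaled softmax, the usual way raw token scores are turned into sampling probabilities. The scores are made up for illustration, and real inference engines apply further sampling settings on top, so treat this as a simplified model rather than what Ollama does internally.

```javascript
// Converts raw scores (logits) into probabilities, scaled by temperature.
// Assumes temperature > 0 (a temperature of 0 would divide by zero here).
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((x) => x / sum);
}

const logits = [2.0, 1.0, 0.5]; // made-up scores for three candidate tokens

for (const t of [0.2, 0.7, 1.5]) {
  const probs = softmaxWithTemperature(logits, t).map((p) => p.toFixed(3));
  console.log(`temperature ${t}: ${probs.join(", ")}`);
}
```

At low temperature almost all of the probability goes to the highest-scoring token, while at higher temperature the distribution flattens and less likely tokens are picked more often.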


## 5. Few-shot examples

Another way to guide a language model is to provide "few-shot" examples: a sequence of example question/answer pairs that demonstrate how it should respond.

The example below tries to get a language model to act like a teaching assistant by providing a few examples of questions and answers that a TA might give, and then prompts the model with a question that a student might ask.

Try it first, and then modify the `SYSTEM_MESSAGE`, `EXAMPLES`, and `USER_MESSAGE` for a new scenario.

In [26]:
const SYSTEM_MESSAGE = `You are a helpful assistant that helps students with their homework.
Instead of providing the full answer, you respond with a hint or a clue.`;

const EXAMPLES = [
  [
    "What is the capital of France?",
    "Can you remember the name of the city that is known for the Eiffel Tower?",
  ],
  [
    "What is the square root of 144?",
    "What number multiplied by itself equals 144?",
  ],
  [
    "What is the atomic number of oxygen?",
    "How many protons does an oxygen atom have?",
  ],
];

const USER_MESSAGE = "What is the largest planet in our solar system?";

const completion = await openai.chat.completions.create({
  model: MODEL_NAME,
  messages: [
    { role: "system", content: SYSTEM_MESSAGE },
    { role: "user", content: EXAMPLES[0][0] },
    { role: "assistant", content: EXAMPLES[0][1] },
    { role: "user", content: EXAMPLES[1][0] },
    { role: "assistant", content: EXAMPLES[1][1] },
    { role: "user", content: EXAMPLES[2][0] },
    { role: "assistant", content: EXAMPLES[2][1] },
    { role: "user", content: USER_MESSAGE },
  ],
  temperature: 0.7,
});

console.log("Response:\n" + completion.choices[0].message.content);

Response:
 Which one, known for its gas giants, has more than four planets orbiting it?
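
Indexing `EXAMPLES` by hand gets tedious as the list grows. As a possible alternative, the sketch below builds the same alternating user/assistant sequence from any number of pairs. The `buildFewShotMessages` helper is hypothetical, and the system prompt is shortened for the example.

```javascript
// Hypothetical helper: turns [question, answer] pairs into alternating
// user/assistant messages, framed by the system prompt and the real question.
function buildFewShotMessages(systemMessage, examples, userMessage) {
  return [
    { role: "system", content: systemMessage },
    ...examples.flatMap(([question, answer]) => [
      { role: "user", content: question },
      { role: "assistant", content: answer },
    ]),
    { role: "user", content: userMessage },
  ];
}

const fewShotMessages = buildFewShotMessages(
  "You respond with a hint instead of the full answer.",
  [
    ["What is the capital of France?", "Can you remember the name of the city that is known for the Eiffel Tower?"],
    ["What is the square root of 144?", "What number multiplied by itself equals 144?"],
  ],
  "What is the largest planet in our solar system?"
);

console.log(fewShotMessages.map((m) => m.role).join(" -> "));
// prints "system -> user -> assistant -> user -> assistant -> user"
```

The resulting array can be passed directly as the `messages` option of `openai.chat.completions.create`.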


## 6. Retrieval Augmented Generation

RAG (*Retrieval Augmented Generation*) is a technique to get a language model to answer questions accurately for a particular domain, by first retrieving relevant information from a knowledge source and then generating a response based on that information.

We have provided a local CSV file with data about hybrid cars. The code below reads the CSV file, searches for rows that match the user question, and then generates a response based on the rows found. Note that this takes longer than any of the previous examples, because it sends more data to the model. If the answer is still not grounded in the data, you can try adjusting the system prompt or using a different model. Generally, RAG is more effective with either larger models or fine-tuned versions of small language models (SLMs).

In [50]:
import fs from "node:fs";

const SYSTEM_MESSAGE = `Answer questions about cars based on a hybrid car data set.
Use the sources to answer the questions. If there isn't enough data in the provided sources, say that you don't know.
Be brief and straight to the point.`;

const USER_MESSAGE = `what's the fastest prius`;

// Load the CSV file as an array of row strings; the first row holds the column names
const rows = fs.readFileSync("./data/hybrid.csv", "utf8").split("\n");
const columns = rows[0].split(",");

// Search the data using a very naive search
const words = USER_MESSAGE
  .toLowerCase()
  .replaceAll(/[.?!()'":,]/g, "")
  .split(" ")
  .filter((word) => word.length > 2);
const matches = rows.slice(1).filter((row) => words.some((word) => row.toLowerCase().includes(word)));

// Format as a markdown table, since language models understand markdown
const table =
  `| ${columns.join(" | ")} |\n` +
  `|${columns.map(() => "---").join(" | ")}|\n` +
  matches.map((row) => `| ${row.replaceAll(",", " | ")} |\n`).join("");

console.log(`Found ${matches.length} matches:\n${table}`);

// Use the search results to generate a response
const completion = await openai.chat.completions.create({
  model: MODEL_NAME,
  messages: [
    { role: "system", content: SYSTEM_MESSAGE },
    { role: "user", content: `${USER_MESSAGE}\n\nSOURCES:\n${table}` },
  ],
  temperature: 0.7
});

console.log(`Answer for "${USER_MESSAGE}":\n` + completion.choices[0].message.content);

Found 11 matches:
| vehicle | year | msrp | acceleration | mpg | class |
|--- | --- | --- | --- | --- | ---|
| Prius (1st Gen) | 1997 | 24509.74 | 7.46 | 41.26 | Compact |
| Prius (2nd Gen) | 2000 | 26832.25 | 7.97 | 45.23 | Compact |
| Prius | 2004 | 20355.64 | 9.9 | 46.0 | Midsize |
| Prius (3rd Gen) | 2009 | 24641.18 | 9.6 | 47.98 | Compact |
| Prius alpha (V) | 2011 | 30588.35 | 10.0 | 72.92 | Midsize |
| Prius V | 2011 | 27272.28 | 9.51 | 32.93 | Midsize |
| Prius C | 2012 | 19006.62 | 9.35 | 50.0 | Compact |
| Prius PHV | 2012 | 32095.61 | 8.82 | 50.0 | Midsize |
| Prius C | 2013 | 19080.0 | 8.7 | 50.0 | Compact |
| Prius | 2013 | 24200.0 | 10.2 | 50.0 | Midsize |
| Prius Plug-in | 2013 | 32000.0 | 9.17 | 50.0 | Midsize |

Answer for "what's the fastest prius":
 The fastest Hyundai Ioniq from the provided sources is the Ioniq PHEV with acceleration of 9.17, as it's listed alongside a higher Prius Plug-in in terms of performance metrics but still categorized under "Midsize" class 
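
The naive search above keeps every row that contains any of the query words, which can send many loosely related rows to a small model. One possible refinement, sketched below, is to score each row by how many distinct query words it contains and keep only the best few. The `rankRows` helper and its `limit` parameter are hypothetical, and the sample rows are partly reconstructed from the table output above and partly invented.

```javascript
// Hypothetical helper: scores each row by how many distinct query words it
// contains, drops rows with no match, and keeps the top `limit` rows.
function rankRows(rows, query, limit = 5) {
  const words = [...new Set(
    query
      .toLowerCase()
      .replace(/[.?!()'":,]/g, "")
      .split(/\s+/)
      .filter((word) => word.length > 2)
  )];
  return rows
    .map((row) => {
      const text = row.toLowerCase();
      return { row, score: words.filter((word) => text.includes(word)).length };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Three rows reconstructed from the table above, plus one invented non-Prius row
const sampleRows = [
  "Prius (1st Gen),1997,24509.74,7.46,41.26,Compact",
  "Prius Plug-in,2013,32000.0,9.17,50.0,Midsize",
  "Example Hybrid,2013,30000.0,8.0,40.0,Midsize",
  "Prius C,2013,19080.0,8.7,50.0,Compact",
];

const ranked = rankRows(sampleRows, "prius plug-in 2013", 2);
console.log(ranked.map((entry) => `${entry.score}: ${entry.row}`).join("\n"));
// prints:
// 3: Prius Plug-in,2013,32000.0,9.17,50.0,Midsize
// 2: Prius C,2013,19080.0,8.7,50.0,Compact
```

This is still plain keyword matching, not semantic search. It only narrows the sources, so the model has less irrelevant text to sift through.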

## 7. Next steps

You can try different models, system prompts, and examples to see how they affect the model's output. You can also try different user messages to see how the model responds to different questions.

In the `sample` folder of this repository, you'll also find more examples of how to use generative AI models.

To continue on your learning journey, here are some additional resources you might find helpful:

- [Generative AI for beginners](https://github.com/microsoft/generative-ai-for-beginners): a complete guide to learn about generative AI concepts and usage.
- [Phi-3 Cookbook](https://github.com/microsoft/Phi-3CookBook): hands-on examples for working with the Phi-3 model.
- [Build a serverless AI chat with RAG using LangChain.js](https://techcommunity.microsoft.com/t5/apps-on-azure-blog/build-a-serverless-ai-chat-with-rag-using-langchain-js/ba-p/4111041): a follow-up tutorial on building an AI chatbot using Retrieval-Augmented Generation and LangChain.js.