## Step-by-step process

Download, install, and start [Ollama](https://ollama.com/).

Run `ollama run gemma2:2b`

Start chatting with a Gemma model that runs directly on your laptop!
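
Before moving on to the Python cells, it can help to confirm that the local Ollama server is reachable. The sketch below assumes the server listens on Ollama's default address, `http://localhost:11434`; adjust it if your setup differs. The helper name `ollama_is_running` is hypothetical and is not part of the Ollama Python client.

```python
import http.client
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as reply:
            return reply.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error
        # statuses (urllib.error.URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, start Ollama first and run the cells afterwards.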

Then you can explore many more models here!

# Generate content

In [4]:
import ollama
from IPython.display import Markdown, display

def get_response_from_model(prompt, model):
    """Send a single prompt to a local Ollama model and return the generated text."""
    response = ollama.generate(model=model, prompt=prompt)
    return response['response']

# Define the prompt and model
prompt = "Is Ollama the best tool I have ever encountered?"
model = "gemma2:2b"

# Get the response from the model
response = get_response_from_model(prompt, model)

# Display the Markdown
display(Markdown(response))

Whether or not Ollama is the **best** tool you've ever encountered depends entirely on your needs and what you consider "best." 

Here's why:

**Ollama shines in these areas:**

* **Accessibility and affordability:** Being open-source and freely available makes it accessible to a wide range of users, especially those with limited budgets.
* **Flexibility and customization:** Ollama allows for fine-tuning and adaptation to specific tasks and use cases, making it a versatile tool for various projects.
* **Ease of use:**  Ollama offers a user-friendly interface and requires minimal technical expertise to get started.

**However, there are some limitations to consider:**

* **Resource requirements:** While Ollama is relatively lightweight compared to other LLMs, it still demands considerable computational power and resources for optimal performance.
* **Fine-tuning challenges:**  Ollama's flexibility comes with a caveat – fine-tuning can be complex and time-consuming depending on the desired output and dataset. 
* **Limited knowledge scope:**  While Ollama boasts impressive capabilities, it is still under development and its knowledge base might not encompass all the information necessary for complex tasks.

**What makes "best" subjective:**

The best tool depends on your specific context:

* **For research and experimentation:** Ollama's open-source nature and ease of customization make it a strong contender for exploring the potential of LLMs.
* **For text generation and summarization:** Its capabilities in these areas are impressive, especially for user-friendly applications like chatbot development. 
* **For complex tasks requiring specific domain expertise:**  Ollama might be less suitable compared to specialized LLMs focused on particular fields like healthcare or legal jargon.

**Ultimately,** Ollama is a powerful tool with its own strengths and limitations. To determine if it's the "best" for *you*, you need to:

1. **Define your needs:**  What tasks do you want to accomplish? What kind of expertise do you require in the tool?
2. **Compare Ollama to alternatives:** Research other tools available, considering their strengths and weaknesses relative to your specific requirements. 


Instead of seeking the absolute "best," focus on finding the **most suitable** tool for your unique needs and goals. 


# Retrieval-augmented generation (RAG)

In [1]:
# First, pull the embedding model: ollama pull mxbai-embed-large
import ollama
import chromadb

documents = [
  "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
  "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
  "Llamas can grow as much as 6 feet tall though the average llama is between 5 feet 6 inches and 5 feet 9 inches tall",
  "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
  "Llamas are vegetarians and have very efficient digestive systems",
  "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]

client = chromadb.Client()
collection = client.create_collection(name="docs")

# store each document in a vector embedding database
for i, d in enumerate(documents):
  response = ollama.embed(model="mxbai-embed-large", input=d)
  embeddings = response["embeddings"]
  collection.add(
    ids=[str(i)],
    embeddings=embeddings,
    documents=[d]
  )
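
Conceptually, the collection stores one vector per document and answers a query by nearest-neighbor search over those vectors. The toy sketch below illustrates that idea in pure Python. It is not ChromaDB's implementation: `TinyVectorStore` is a hypothetical class, the three-dimensional vectors are made up (real `mxbai-embed-large` embeddings have far more dimensions), and cosine similarity is an assumed metric that may differ from the one your collection is configured to use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._rows = []  # list of (id, embedding, document) tuples

    def add(self, doc_id, embedding, document):
        self._rows.append((doc_id, embedding, document))

    def query(self, query_embedding, n_results=1):
        """Return the n_results documents most similar to the query."""
        ranked = sorted(
            self._rows,
            key=lambda row: cosine_similarity(query_embedding, row[1]),
            reverse=True,
        )
        return [document for _, _, document in ranked[:n_results]]


# Made-up 3-d vectors standing in for real embeddings.
store = TinyVectorStore()
store.add("0", [0.9, 0.1, 0.0], "Llamas are members of the camelid family")
store.add("1", [0.1, 0.9, 0.0], "Llamas are vegetarians")
store.add("2", [0.0, 0.2, 0.9], "Llamas live to be about 20 years old")

best = store.query([0.8, 0.2, 0.1], n_results=1)
print(best)
```

A real vector database adds persistence and approximate indexes so that the search stays fast on large collections, but the stored data and the query have the same shape as in this sketch.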

In [2]:
# an example question (named `question` to avoid shadowing Python's built-in input())
question = "What animals are llamas related to?"

# generate an embedding for the question and retrieve the most relevant doc
response = ollama.embed(
  model="mxbai-embed-large",
  input=question
)
results = collection.query(
  query_embeddings=[response["embeddings"][0]],
  n_results=1
)
data = results['documents'][0][0]

In [3]:
# generate a response combining the prompt and data we retrieved in step 2
output = ollama.generate(
  model="gemma2:2b",
  prompt=f"Using this data: {data}. Respond to this prompt: {question}"
)

print(output['response'])

Llamas are related to **vicuñas** and **camels**. 
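
The prompt in step 3 embeds a single retrieved document. When `n_results` is larger than 1, the retrieved passages have to be combined into one prompt. One possible approach is sketched below; `build_rag_prompt` is a hypothetical helper, and the numbered layout of the context is an assumption, not something Ollama requires.

```python
def build_rag_prompt(question, retrieved_docs):
    """Combine retrieved passages and a question into one prompt string."""
    # Number each passage so the model can tell them apart.
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    return (
        f"Using this data:\n{context}\n\n"
        f"Respond to this prompt: {question}"
    )


docs = [
    "Llamas are members of the camelid family",
    "Llamas are vegetarians and have very efficient digestive systems",
]
rag_prompt = build_rag_prompt("What animals are llamas related to?", docs)
print(rag_prompt)
```

The resulting string could be passed as the `prompt` argument of `ollama.generate`, with `results['documents'][0]` from the query cell supplying the list of passages.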



# A more realistic scenario

In [6]:
import ollama
import chromadb
from datasets import load_dataset

# Load the ag_news dataset
dataset = load_dataset("ag_news", split="train[:1000]")  # Use only the first 1,000 articles to keep the example fast

# Sample documents from the dataset (for simplicity, use the 'text' field)
documents = dataset['text']

client = chromadb.Client()
collection = client.create_collection(name="news")

# Store each document in a vector embedding database
for i, d in enumerate(documents):
  response = ollama.embed(model="mxbai-embed-large", input=d)
  embeddings = response["embeddings"]
  collection.add(
    ids=[str(i)],
    embeddings=embeddings,
    documents=[d]
  )
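
Embedding 1,000 articles with one call per article can be slow. A possible variation is to group the documents into batches and issue one embedding call and one `collection.add` per batch. The sketch below shows only the batching logic, with a placeholder embedding function so that it runs without a model. `batched` and `fake_embed` are hypothetical names, and the batch size of 4 is arbitrary.

```python
def batched(items, batch_size):
    """Yield (start_index, chunk) pairs covering items in order."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


def fake_embed(texts):
    """Placeholder for an embedding call: one tiny vector per text."""
    return [[float(len(t))] for t in texts]


documents = [f"article {n}" for n in range(10)]
stored = []  # stands in for the vector collection

for start, chunk in batched(documents, batch_size=4):
    embeddings = fake_embed(chunk)
    # Keep ids aligned with each document's position in the full list.
    ids = [str(start + offset) for offset in range(len(chunk))]
    # With a real collection this would be a single add() per batch.
    stored.extend(zip(ids, embeddings, chunk))

print(len(stored), stored[0][0], stored[-1][0])
```

In the real loop, `fake_embed(chunk)` would be replaced by an embedding call on the whole chunk. This assumes that your installed `ollama` client accepts a list of strings as `input`, which is worth verifying against its documentation before relying on it.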
