
# Local Retrieval Augmented Generation (RAG) Pipeline in Python via Ollama


## Quick Start Guide for Ollama Setup

**Ollama** is an open-source Large Language Model (LLM) backend server that simplifies running LLMs locally, using both CPU and GPU resources.

1. **Download and Install Ollama**  
   [Download Ollama](https://ollama.com/download) and follow the installation instructions

2. **Select Models**  
   Browse and choose supported models from the [Ollama library](https://ollama.com/library)

3. **Pull the Models**  
   Open a terminal and pull the following models:

> ```bash
> ollama pull llama2      # Language model
> ollama pull all-minilm  # Embedding model
> ```

4. **Install the [Ollama Python library](https://github.com/ollama/ollama-python/blob/main/README.md)**:

> ```bash
> pip install ollama==0.1.8 # Install Ollama Python library (version 0.1.8)
> ```

5. **Verify Model Execution**  
   Run the model in the terminal to verify it works

> ```bash
> ollama run llama2
> ```

6. **Start the Ollama Service for Jupyter Notebook Connection**  
   Run the following command in the terminal to start the Ollama service in the background:

> ```bash
> ollama serve &
> ```
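
Before moving on to the notebook cells, it can help to confirm that the service is actually answering. The following is a minimal sketch using only the standard library. It assumes Ollama listens on its default local address `127.0.0.1:11434` (the port that appears in the `lsof` listing at the end of this notebook) and that the root URL answers with HTTP 200; the helper name `ollama_is_up` is hypothetical.

```python
import http.client
import urllib.request

# Assumption: Ollama's default local address (port 11434, as in the lsof listing)
OLLAMA_URL = "http://127.0.0.1:11434"


def ollama_is_up(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a malformed reply
        return False


print("Ollama reachable:", ollama_is_up())
```

If this prints `False`, the later `ollama.chat` and `ollama.embeddings` calls are likely to fail with a connection error, so it is worth re-running `ollama serve` first.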


#### Key Features of Ollama 

- **Optimized Performance:** Efficiently leverages both CPU and GPU hardware to maximize the speed and performance of supported LLMs
- **Flexible Deployment:** Supports easy setup and deployment on local machines, giving developers full control over model inference
- **Scalable Architecture:** Designed to handle varying workloads, making it suitable for both small-scale projects and large enterprise applications


In [70]:


# Import the Ollama library to interact with the language models
import ollama

# Send a chat request to the 'llama2' model with a user message
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',  # Message content that the user sends to the model
  },
])

# Print the response from the model, displaying the answer to the user's question
print(response['message']['content'])




The sky appears blue because of a phenomenon called Rayleigh scattering, which occurs when sunlight travels through the Earth's atmosphere. The sun emits light across the entire electromagnetic spectrum, but the atmosphere scatters the shorter wavelengths (such as blue and violet) more than the longer wavelengths (such as red and orange). This means that when sunlight reaches our eyes, we see mainly the blue and violet light that has been scattered in all directions by the atmosphere.

The reason for this scattering is due to the tiny molecules of gases present in the atmosphere, such as nitrogen and oxygen. These molecules are much smaller than the wavelength of visible light, so they don't absorb or reflect the light directly. Instead, they scatter it in all directions, giving the sky its blue appearance.

The blue color of the sky can also be affected by other factors such as the presence of aerosols (small particles in the atmosphere) and the angle of the sun. For example, during 

In [71]:


# Generate vector embeddings for the given text prompt using the specified embedding model 'all-minilm'
# Embeddings are used to represent the semantic meaning of the text in a numerical format

ollama.embeddings(model="all-minilm", 
                  prompt="Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels")



{'embedding': [0.0479680672287941,
  0.11637094616889954,
  -0.24570561945438385,
  -0.04406300559639931,
  -0.24932530522346497,
  0.12218563258647919,
  -0.48447176814079285,
  -0.1940533071756363,
  0.27372273802757263,
  0.1956769824028015,
  0.26291224360466003,
  -0.31583428382873535,
  0.28280389308929443,
  0.03434046357870102,
  0.17188657820224762,
  -0.13632719218730927,
  0.03992735221982002,
  0.13770951330661774,
  -0.33948075771331787,
  0.3291258215904236,
  0.1720881313085556,
  0.08957856893539429,
  0.33579421043395996,
  0.1561996340751648,
  -0.11858808994293213,
  0.43885111808776855,
  -0.11902756989002228,
  0.11152736097574234,
  -0.012823819182813168,
  0.16673514246940613,
  -0.34044894576072693,
  0.019135579466819763,
  -0.110402412712574,
  -0.059086285531520844,
  -0.3229251801967621,
  0.10102441906929016,
  0.2166978120803833,
  0.74030601978302,
  0.3397805392742157,
  0.2394002228975296,
  0.11844772100448608,
  -0.1658744364976883,
  0.32743486762046
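
An embedding is useful because texts with similar meaning tend to map to vectors that point in similar directions. One common way to measure this is cosine similarity. The sketch below is illustrative only: the three short vectors are hypothetical stand-ins for real `all-minilm` embeddings, and `cosine_similarity` is a helper written for this example, not part of the Ollama library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors standing in for real embeddings
llama_fact = [0.9, 0.1, 0.0]
llama_question = [0.8, 0.2, 0.1]
unrelated = [0.0, 0.1, 0.9]

print(cosine_similarity(llama_fact, llama_question))  # close to 1
print(cosine_similarity(llama_fact, unrelated))       # close to 0
```

The same function could be applied to two `response["embedding"]` lists returned by `ollama.embeddings` to compare real sentences.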

## Quick Start Guide for Weaviate Setup

**Weaviate** is an open-source, AI-native vector database designed to store and manage large-scale data. It provides advanced capabilities for AI-driven applications, making it easier to handle data ingestion, querying, and search operations.

1. **Install the [Weaviate Python client library](https://weaviate.io/developers/weaviate/client-libraries/python)**:

> ```bash
> pip install -U weaviate-client  # Install the Weaviate client library (-U upgrades to the latest release; use weaviate-client==4.5.5 to pin version 4.5.5)
> ```

#### Key Features of the Weaviate Client Library 

- **Data Ingestion:** Easily add and manage data within your Weaviate instance 
- **Querying:** Execute complex queries to retrieve relevant information 
- **Search Operations:** Perform semantic and vector-based searches for accurate data retrieval


In [72]:


# Import the Weaviate client library to interact with the Weaviate database
import weaviate

# Connect to an embedded Weaviate instance, enabling the Python client to interact with the local server at http://127.0.0.1:8079
# using HTTP for queries and gRPC (Google Remote Procedure Call) for faster data communication
# Start process ID path: /Users/briankaewell/.cache/weaviate-embedded/*
client = weaviate.connect_to_embedded()

# Check if the Weaviate instance is ready and print the connection status
print(client.is_ready())

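# TODO: once the client is no longer needed, close the connection.
# A try/finally block guarantees this even if an error occurs:
#try:
#    pass  # Do something with the client

#finally:
#    client.close()  # Ensure the connection is closed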



{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2024-10-15T16:58:58-04:00"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2024-10-15T16:58:58-04:00"}
{"level":"info","msg":"No resource limits set, weaviate will use all available memory and CPU. To limit resources, set LIMIT_RESOURCES=true","time":"2024-10-15T16:58:58-04:00"}
{"level":"info","msg":"module offload-s3 is enabled","time":"2024-10-15T16:58:58-04:00"}
{"level":"info","msg":"open cluster service","servers":{"Embedded_at_8079":53686},"time":"2024-10-15T16:58:58-04:00"}
{"address":"192.168.0.249:53687","level":"info","msg":"starting cloud rpc server ...","time":"2024-10-15T16:58:58-04:00"}
{"level":"info","msg":"starting raft sub-system ...","time":"2024-10-15T16:5

True


{"action":"hnsw_prefill_cache_async","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-10-15T16:59:01-04:00","wait_for_cache_prefill":false}
{"level":"info","msg":"Completed loading shard docs_GpI2uZlxg3Io in 12.138625ms","time":"2024-10-15T16:59:01-04:00"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-10-15T16:59:01-04:00","took":645083}
{"action":"bootstrap","level":"info","msg":"node reporting ready, node has probably recovered cluster from raft config. Exiting bootstrap process","time":"2024-10-15T16:59:02-04:00"}




<figure>
  <img src="https://weaviate.io/assets/images/rag-ollama-diagram-c71ba5c4e60629e70a2cf334a7716860.png" alt="rag_ollama" width="700" />
  <figcaption>Local Retrieval Augmented Generation (RAG) system with language models via Ollama</figcaption>
</figure>



# Ingest Data

In [73]:


# List contains individual pieces of information (documents) related to llamas, 
# which may be used for processing or data storage tasks
documents = [
  "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
  "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
  "Llamas can grow as much as 6 feet tall though the average llama is between 5 feet 6 inches and 5 feet 9 inches tall",
  "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
  "Llamas are vegetarians and have very efficient digestive systems",
  "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old",
]



# Create Data Structure

In [74]:


# Import specific classes from Weaviate to work with data schema and configs for vector database
import weaviate.classes as wvc
from weaviate.classes.config import Property, DataType

# Define the name of the structure (collection)
collection_name = "docs"

# Delete the collection if it already exists so this cell can be re-run from a clean state
if client.collections.exists(collection_name):
    client.collections.delete(collection_name)

# Create a new collection with the specified name and define its structure properties
collection = client.collections.create(
    collection_name,
    properties=[
        Property(name="text", 
                 data_type=DataType.TEXT), # Name and data type of a single property for simple list of strings
    ],
)



{"action":"hnsw_prefill_cache_async","level":"info","msg":"not waiting for vector cache prefill, running in background","time":"2024-10-15T16:59:53-04:00","wait_for_cache_prefill":false}
{"level":"info","msg":"Created shard docs_RPmnTJ9v9FvR in 10.439ms","time":"2024-10-15T16:59:53-04:00"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"main","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2024-10-15T16:59:53-04:00","took":128750}


# Load Data

In [75]:


# Store each document in a vector embedding database
with collection.batch.dynamic() as batch:
  for i, d in enumerate(documents):
    # For each document, generate its vector embeddings
    response = ollama.embeddings(model="all-minilm", 
                                 prompt=d)
    embedding = response["embedding"]
    # Print text and its embedding
    # display({f'Document {i}': d, "Embedding": embedding}) 
    
    # Store data object with combined text and embedding in the vector embedding database
    batch.add_object(
        properties = {"text" : d},
        vector = embedding,
    )



In [76]:


# Fetch a single stored object (limit=1) together with its vector to confirm
# that the data was loaded; fetch_objects does not rank results by similarity
collection.query.fetch_objects(limit=1, include_vector=True)



QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('18f07307-25b1-45bd-b601-d5ca4440cd98'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'text': "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels"}, references=None, vector={'default': [0.0479680672287941, 0.11637094616889954, -0.24570561945438385, -0.04406300559639931, -0.24932530522346497, 0.12218563258647919, -0.48447176814079285, -0.1940533071756363, 0.27372273802757263, 0.1956769824028015, 0.26291224360466003, -0.31583428382873535, 0.28280389308929443, 0.03434046357870102, 0.17188657820224762, -0.13632719218730927, 0.03992735221982002, 0.13770951330661774, -0.33948075771331787, 0.3291258215904236, 0.1720881313085556, 0.08957856893539429, 0.33579421043395996, 0.1561996340751648, -0.11858808994293213, 0.43885111808776855, -0.11902756989002228, 0.111

# Retrieve Context

In [79]:


# Define the prompt for which you want to find the most relevant document
prompt = "What animals are llamas related to?"

# Generate an embedding for the prompt using the specified model 'all-minilm'
response = ollama.embeddings(
  prompt=prompt,
  model="all-minilm"
)

# Query the collection to retrieve the MOST relevant document (limit=1) based on the prompt's embedding
results = collection.query.near_vector(
    near_vector=response["embedding"],
    limit=1,
)

# Extract and display the text of the most relevant document
data = results.objects[0].properties['text']
print(data)



Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
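
Conceptually, `near_vector` ranks the stored vectors by their distance to the query vector and returns the closest `limit` objects. The Weaviate logs above mention an HNSW index, which approximates this ranking efficiently. The sketch below does the same thing by brute force, assuming cosine distance as the metric. The function names and the tiny 2-dimensional vectors are hypothetical and only for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest(query_vector, items, limit=1):
    """Return the `limit` (text, vector) pairs closest to query_vector."""
    ranked = sorted(items, key=lambda item: cosine_distance(query_vector, item[1]))
    return ranked[:limit]


# Hypothetical 2-dimensional vectors; real all-minilm embeddings are much longer
store = [
    ("fact about llama relatives", [1.0, 0.1]),
    ("fact about llama weight", [0.1, 1.0]),
    ("fact about llama lifespan", [0.5, 0.5]),
]
query = [0.9, 0.2]

for text, _vector in nearest(query, store, limit=2):
    print(text)
```

A brute-force scan like this compares the query against every stored vector, which is fine for six documents but is the cost that an index such as HNSW is meant to avoid on large collections.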


# Augment the Prompt

In [80]:


# Create a prompt template that combines the retrieved context (data) 
# with the original prompt to generate a comprehensive response
prompt_template = f"Using this data: {data}. Respond to this prompt: {prompt}"
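
If the retrieval step is run with a larger `limit`, several documents need to be folded into the prompt. The following is one possible sketch of such a template; `build_prompt` is a hypothetical helper, and the bullet-list layout of the context is an assumption rather than anything required by the model.

```python
def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the user's question into one prompt."""
    if not context_chunks:
        # Nothing retrieved: fall back to the bare question
        return question
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Using this data:\n{context}\nRespond to this prompt: {question}"


chunks = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas are vegetarians and have very efficient digestive systems",
]
print(build_prompt("What animals are llamas related to?", chunks))
```

Keeping the template in a function makes it easy to experiment with different wording without touching the retrieval or generation code.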



# Generate a Response

In [86]:


# Generate a response from the augmented prompt template
output = ollama.generate(
  model="llama2",
  prompt=prompt_template,
)

# Print the generated response to the prompt
print(output['response'])




Llamas are related to several other animals within the camelid family. Specifically, llamas are most closely related to vicuñas and camels. All three of these animals belong to the Camelidae family and share many similarities in terms of their physical characteristics and behaviors.

Vicuñas are the wild ancestors of both llamas and alpacas, and they are found in the Andean regions of South America. Like llamas, vicuñas have a distinctive coat with long guard hairs and a soft undercoat, which makes them well-suited to the cold, dry climate of their native habitat.

Camels, on the other hand, are better adapted to the hot, arid environments of North Africa and the Middle East. Despite their differences in terms of size and coat type, all three camelids share a number of key characteristics, including their ability to go without water for long periods of time and their unique digestive system, which allows them to extract moisture from food even when there is little water available.

Ov
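
The three steps above (retrieve, augment, generate) can be collected into a single function. The sketch below is one way to do that under stated assumptions: `rag_answer` and the three stand-in callables are hypothetical, and the real Ollama and Weaviate calls are passed in as arguments so that the flow can be run and checked without either service.

```python
from typing import Callable, List, Sequence


def rag_answer(
    question: str,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float]], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the question, augment the prompt, and generate."""
    query_vector = embed(question)  # 1. embed the question
    chunks = search(query_vector)   # 2. retrieve relevant text
    context = " ".join(chunks)
    augmented = f"Using this data: {context}. Respond to this prompt: {question}"
    return generate(augmented)      # 3. generate from the augmented prompt


# Stand-in callables so the sketch runs without Ollama or Weaviate
def fake_embed(text):
    return [float(len(text))]


def fake_search(vector):
    return ["Llamas are members of the camelid family"]


def fake_generate(prompt):
    return f"(model output for: {prompt})"


print(rag_answer("What animals are llamas related to?", fake_embed, fake_search, fake_generate))
```

To run it against the real services, the callables could wrap the notebook's own calls: `lambda t: ollama.embeddings(model="all-minilm", prompt=t)["embedding"]` for `embed`, a function returning `[o.properties["text"] for o in collection.query.near_vector(near_vector=v, limit=1).objects]` for `search`, and `lambda p: ollama.generate(model="llama2", prompt=p)["response"]` for `generate`.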

In [88]:
#client.close()

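**The command below lists open network sockets; the `grep LISTEN` filter keeps only the ports that are listening for incoming connections, including Ollama on 11434 and the embedded Weaviate instance on 8079**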

> ``` 
> sudo lsof -i -P -n | grep LISTEN 
> 
> ollama    27370   IPv4 0xdfbc846e3834546d      0t0    TCP 127.0.0.1:11434 (LISTEN) 
>
> weaviate- 35147   IPv6 0x53a13eedd360c342      0t0    TCP *:60186 (LISTEN) 
> weaviate- 35147   IPv6 0x91f60d5cf4124a61      0t0    TCP *:6060 (LISTEN) 
> weaviate- 35147   IPv4 0x2acb043cb27d74a0      0t0    TCP 192.168.0.249:60188 (LISTEN) 
> weaviate- 35147   IPv6 0x41d21a8cc4514707      0t0    TCP *:60187 (LISTEN) 
> weaviate- 35147   IPv4 0xf61841243b061880      0t0    TCP 192.168.0.249:60187 (LISTEN) 
> weaviate- 35147   IPv6 0x515655760ad88e38      0t0    TCP *:50050 (LISTEN) 
> weaviate- 35147   IPv4 0x379147eef7c0a3e5      0t0    TCP 127.0.0.1:8079 (LISTEN)
> ```
