<a href="https://colab.research.google.com/github/BUDparty/AImodel/blob/main/Retrieval_Augmented_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predibase + LlamaIndex: Building a RAG System
The following walkthrough shows you how to use Predibase-hosted LLMs with LlamaIndex to build a retrieval-augmented generation (RAG) system.

There are a few pieces required to build a RAG system:

1. **LLM provider**
   * Predibase is the LLM provider here. We can serve base LLMs and/or fine-tuned LLMs for whatever generative task you have.
2. **Embedding Model**
   * This model generates embeddings (numeric vectors) for the data you store in your Vector Store, and for your queries at query time.
   * In this example you can use either a local HuggingFace embedding model or OpenAI's embedding model.
     * Note: to use the OpenAI embedding model, you need an OpenAI account with funds and an API key.
   * In the near future, you will be able to train and deploy your own embedding models using Predibase.
3. **Vector Store**
   * This is where we store the embedded data that we want to retrieve later at query time.
   * In this example we will use Pinecone as our Vector Store.
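To make the division of labor concrete, here is a toy, dependency-free sketch of the three pieces working together. Everything in it is a stand-in assumed for illustration: `embed` is a bag-of-words counter rather than a real embedding model, `InMemoryVectorStore` is a Python list rather than Pinecone, and `fake_llm` just echoes its prompt instead of calling Predibase.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): term counts instead of a dense float vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy vector store (assumption): keeps (embedding, text) pairs in a list."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.rows.append((embed(text), text))

    def query(self, text: str, top_k: int = 1) -> list[str]:
        q = embed(text)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:top_k]]


def fake_llm(prompt: str) -> str:
    """Toy LLM provider (assumption): returns the prompt it would have sent."""
    return prompt


store = InMemoryVectorStore()
store.add("Pinecone stores embedding vectors for similarity search.")
store.add("Temperature controls how random an LLM's sampling is.")

question = "What does temperature control?"
context = "\n".join(store.query(question, top_k=1))
answer = fake_llm(f"Context:\n{context}\n\nQuestion: {question}")
print(answer)
```

In the real pipeline below, each stand-in is replaced by the corresponding component: a HuggingFace or OpenAI embedding model, a Pinecone index, and a Predibase-hosted LLM.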

## Getting Started

### Predibase
1. If you don't have a Predibase account already, sign up for a free trial.
2. Once you've logged in, navigate to **Settings > My profile**.
3. Generate a new API token.
4. Copy the API token and paste it into the first setup cell below.

### OpenAI (Optional)
1. If you don't have an OpenAI account already, sign up for one.
2. Navigate to OpenAI's API keys page.
3. If you have not already, generate an API key.
4. Copy the API key and paste it into the second setup cell below.

### Pinecone
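1. If you don't have a Pinecone account already, sign up; Pinecone has a free tier available for trial.
2. Navigate to the API Keys page.
3. If you have not already, generate an API key.
4. Copy the API key and paste it into the `pinecone.init` call in Step 3 below.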

## Step 0: Setup

```python
import os

import openai
import pinecone

# This notebook uses the legacy llama_index (< 0.10) and pinecone-client (v2) APIs
from llama_index import ServiceContext, StorageContext, SimpleDirectoryReader, VectorStoreIndex, set_global_service_context
from llama_index.llms import PredibaseLLM
from llama_index.embeddings import HuggingFaceEmbedding, OpenAIEmbedding
from llama_index.vector_stores import PineconeVectorStore

# Replace the placeholder with your Predibase API token
os.environ["PREDIBASE_API_TOKEN"] = "YOUR API TOKEN HERE"
```

The following is only required if you'll be using an OpenAI embedding model.



```python
os.environ["OPENAI_API_KEY"] = "YOUR API TOKEN HERE"
openai.api_key = os.environ["OPENAI_API_KEY"]
```
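Pasting tokens directly into cells makes it easy to share or commit them by accident. As an alternative, here is a minimal sketch of a hypothetical `require_env` helper (not part of any library) that reads a token set outside the notebook, for example exported in your shell, and fails fast if it is missing or still the placeholder text used above:

```python
import os


def require_env(name: str, placeholder: str = "YOUR API TOKEN HERE") -> str:
    """Return an environment variable's value, failing fast if it is unset or still a placeholder."""
    value = os.environ.get(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value
```

For example, `predibase_token = require_env("PREDIBASE_API_TOKEN")` raises a clear error up front instead of an authentication failure several cells later.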

## Step 1: Setting up the Predibase LLM
There are a few parameters to keep in mind while setting up your Predibase LLM:

1. `model_name`: This must be an LLM currently deployed in your Predibase environment.
   * Any of the models shown in the LLM query view dropdown are valid options.
   * If you are running Predibase in a VPC, you'll need to deploy an LLM first.
2. `temperature`: Controls the randomness of your model's responses.
   * A higher value gives the model more creative leeway.
   * A lower value gives more reproducible and consistent responses.
3. `max_new_tokens`: Controls the maximum number of new tokens the model can generate in a response.
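LLM decoders commonly implement `temperature` by dividing the model's raw next-token scores (logits) by the temperature before applying softmax, while `max_new_tokens` caps the generation loop. The sketch below illustrates that standard technique in plain Python; it is an assumption about the general mechanism, not Predibase's serving code, and the logits, the `<eos>` marker, and the helper names are made up for illustration.

```python
import math
from typing import Callable


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; low temperature sharpens, high temperature flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(next_token: Callable[[list[str]], str], prompt_tokens: list[str],
             max_new_tokens: int, eos: str = "<eos>") -> list[str]:
    """Toy decoding loop: stop at the end-of-sequence token or after max_new_tokens new tokens."""
    new_tokens: list[str] = []
    for _ in range(max_new_tokens):
        token = next_token(prompt_tokens + new_tokens)
        if token == eos:
            break
        new_tokens.append(token)
    return new_tokens


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate next tokens
for t in (0.1, 1.0, 2.0):
    print(f"temperature={t}:", [round(p, 3) for p in softmax_with_temperature(logits, t)])

# A "model" that never emits <eos> is still cut off by max_new_tokens
print(generate(lambda tokens: "word", ["Hello"], max_new_tokens=3))
```

With `temperature=0.1`, nearly all probability mass lands on the top-scoring token, which is why low values give more reproducible answers for RAG.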

```python
# Configure Predibase LLM
predibase_llm = PredibaseLLM(model_name="llama-2-13b-chat", temperature=0.1, max_new_tokens=512)
```

## Step 2: Set up the Embedding Model
If you are using a local HuggingFace embedding model, you can use the following code to set up your embedding model:

```python
# loads BAAI/bge-small-en-v1.5
hf_embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```

If you are using OpenAI's embedding model, you can use the following code to set up your embedding model:



```python
# Loads OpenAI's text-embedding-ada-002 embedding model; run this cell only for the OpenAI option
openai_embed_model = OpenAIEmbedding()
```

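With our embedding model set up, we can now create the service context, which tells LlamaIndex which LLM to query and which embedding model to use for our data and queries. The cell below uses the HuggingFace model; if you chose the OpenAI option, pass `embed_model=openai_embed_model` instead.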

```python
# Create a ServiceContext with our Predibase LLM and chosen embedding model
ctx = ServiceContext.from_defaults(llm=predibase_llm, embed_model=hf_embed_model)

# Set the Predibase LLM ServiceContext to the default
set_global_service_context(ctx)
```

## Step 3: Set up Vector Store
As mentioned before, we'll be using Pinecone for this example; its free tier is enough to try it out. You can also swap in any other Vector Store supported by LlamaIndex.

```python
# Initialize the Pinecone client with your Pinecone API key
pinecone.init(api_key="YOUR API TOKEN HERE", environment="gcp-starter")
```

If you are using the HuggingFace embedding model, you can use the following code to set up your Vector Store:

```python
# HF Index - Compatible with local HF embedding model output dimensions
pinecone.create_index("predibase-demo-hf", dimension=384, metric="euclidean", pod_type="p1")
```

If you are using the OpenAI embedding model, you can use the following code to set up your Vector Store:

Note: You need to have OpenAI set up and configured for this option. If you do not have an OpenAI API key, we recommend you go with the HuggingFace Index option above.

```python
# OpenAI Index - Compatible with OpenAI embedding model (text-embedding-ada-002) output dimensions
pinecone.create_index("predibase-demo-openai", dimension=1536, metric="euclidean", pod_type="p1")
```
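The two indexes above differ only in `dimension`: each must match the output size of the embedding model that writes to it (384 for `BAAI/bge-small-en-v1.5`, 1536 for `text-embedding-ada-002`), and `metric="euclidean"` ranks matches by straight-line distance. The hypothetical `ToyIndex` class below is a minimal in-memory sketch of those two rules, not Pinecone's implementation; the vectors are made up.

```python
import math


class ToyIndex:
    """Hypothetical in-memory stand-in for a fixed-dimension index using the euclidean metric."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.vectors: dict[str, list[float]] = {}

    def _check(self, vector: list[float]) -> None:
        # Reject vectors whose length differs from the dimension chosen at creation time
        if len(vector) != self.dimension:
            raise ValueError(f"vector has {len(vector)} dims, index expects {self.dimension}")

    def upsert(self, vector_id: str, vector: list[float]) -> None:
        self._check(vector)
        self.vectors[vector_id] = vector

    def query(self, vector: list[float], top_k: int = 1) -> list[str]:
        self._check(vector)
        # Euclidean metric: a smaller straight-line distance means a closer match
        ranked = sorted(self.vectors, key=lambda vid: math.dist(vector, self.vectors[vid]))
        return ranked[:top_k]


hf_index = ToyIndex(dimension=384)
hf_index.upsert("doc-1", [0.1] * 384)  # OK: matches the HF embedding size
try:
    hf_index.upsert("doc-2", [0.1] * 1536)  # an ada-002-sized vector in the HF-sized index
except ValueError as err:
    print("rejected:", err)
```

This is why the index you select in the next cell has to match the embedding model you set in the service context.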

Finally, we'll connect to our index, create the storage context, and load the documents we want to index.

```python
# Construct the vector store and a custom storage context.
# Use "predibase-demo-openai" instead if you created the OpenAI index above.
pinecone_vector_store = PineconeVectorStore(pinecone.Index("predibase-demo-hf"))
pinecone_storage_context = StorageContext.from_defaults(vector_store=pinecone_vector_store)

# Load in the documents you want to index (replace the path with your own directory)
documents = SimpleDirectoryReader("/Users/connor/Documents/Projects/datasets/huffington_post_pdfs/").load_data()
```

## Step 4: Set up index
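Here we create the index: LlamaIndex embeds your documents with the embedding model from the global service context and writes the vectors to Pinecone, so that any query you make can pull the relevant context from your Vector Store.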

```python
index = VectorStoreIndex.from_documents(documents, storage_context=pinecone_storage_context)
```
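Before embedding, `VectorStoreIndex.from_documents` splits each document into smaller chunks (LlamaIndex calls them nodes), so retrieval can return the relevant passage rather than a whole PDF. LlamaIndex's default splitter is sentence-aware; the sketch below swaps in a simpler technique, a fixed-size word window with overlap, and `chunk_words`, `chunk_size`, and `overlap` are hypothetical names chosen for illustration.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one neighboring chunk, at the cost of storing a few duplicate words.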

## Step 5: Querying the LLM with RAG
Now that we've set up our index, we can ask questions over the documents: Predibase + LlamaIndex will retrieve the most relevant context from the Vector Store and have the LLM answer your question using that context.

```python
# Setup query engine
predibase_query_engine = index.as_query_engine()
```
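At query time, the query engine embeds your question, asks the Vector Store for the most similar chunks (how many is controlled by `similarity_top_k`, e.g. `index.as_query_engine(similarity_top_k=3)`), and inserts them into a prompt for the LLM. The sketch below shows only the last two steps with made-up similarity scores; `top_k_chunks` and `build_prompt` are hypothetical helpers, and the template is illustrative, not LlamaIndex's actual prompt.

```python
import heapq


def top_k_chunks(scores: dict[str, float], k: int) -> list[str]:
    """Return the k chunks with the highest similarity scores, best first."""
    return heapq.nlargest(k, scores, key=scores.__getitem__)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Stuff the retrieved chunks into a single prompt (hypothetical template)."""
    context = "\n---\n".join(chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up similarity scores for three stored chunks
scores = {
    "Chunk about election coverage.": 0.82,
    "Chunk about recipes.": 0.11,
    "Chunk about polling methods.": 0.67,
}
prompt = build_prompt("How were the polls conducted?", top_k_chunks(scores, k=2))
print(prompt)
```

Raising `k` gives the LLM more context to draw on but makes the prompt longer, so it is worth tuning alongside `max_new_tokens` and the model's context window.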

Now we can ask questions over our documents!

```python
response = predibase_query_engine.query("INSERT QUERY HERE")
```

To see the response to your query, pass it to `print` (for example, `print(response)`). Otherwise, you can pass the response object around your system to finish building your RAG solution.