# Predibase + LlamaIndex: Building a RAG System
The following walkthrough shows you how to use Predibase-hosted LLMs with LlamaIndex to build a RAG system.

There are a few pieces required to build a RAG system:

1. **LLM provider**
   * Predibase is the LLM provider here.
2. **Embedding model**
   * This model generates embeddings for the data that you store in your Vector Store.
   * We will use a local HuggingFace embedding model.
   * Note: If you switch to the OpenAI embedding model instead, you need an OpenAI account with funds and an API token.
3. **Vector Store**
   * This is where we store the embedded data that we want to retrieve later at query time.
   * We will use Pinecone for our Vector Store.
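
To make the roles of the three pieces concrete, here is a minimal, library-free sketch of the retrieval half of RAG. It assumes a toy bag-of-words "embedding" and an in-memory list as the "vector store"; the function names (`embed`, `cosine`, `retrieve`) and the sample sentences are made up for illustration and are not part of Predibase, LlamaIndex, or Pinecone.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Made-up documents, embedded once and kept in an in-memory "vector store"
docs = [
    "supply chain disruptions are a risk factor",
    "revenue increased due to higher vehicle deliveries",
    "new factory construction continued this year",
]
store = [(doc, embed(doc)) for doc in docs]

def retrieve(query, k=1):
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

top = retrieve("what is a risk factor")
print(top)
```

In the real system, the embedding model replaces `embed`, Pinecone replaces `store`, and the retrieved text is handed to the LLM as context for the answer.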

## Getting Started

### Predibase
1. If you don't have a Predibase account already, sign up for one.
2. Once you've logged in, navigate to Settings > My profile.
3. Generate a new API token.
4. Copy the API token and paste it into the setup cell below.

### Pinecone
1. If you don't have a Pinecone account already, create one; a free tier is available for trial.
2. Navigate to the API Keys page.
3. If you have not already, generate an API key.

## Step 0: Setup

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%pip install llama-index-llms-predibase
%pip install llama-index-embeddings-huggingface
%pip install llama-index-embeddings-instructor
%pip install llama-index-vector-stores-pinecone

In [None]:
! pip3 install predibase --quiet
! pip3 install sentence-transformers --quiet
! pip3 install pinecone-client
! pip3 install python-dotenv
! pip3 install requests
# Quote the version specifiers so the shell does not treat ">" as an output redirect
! pip3 install "llama-index>=0.9.31" "pinecone-client>=3.0.0"
! pip3 install sec_api
! pip3 install -U langchain


In [None]:
import os

# import openai
import pinecone

from sec_api import ExtractorApi, QueryApi

from llama_index.core import ServiceContext, StorageContext, SimpleDirectoryReader, VectorStoreIndex, set_global_service_context
from llama_index.llms.predibase import PredibaseLLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Paste your Predibase API token here (placeholder shown; never commit real credentials)
os.environ["PREDIBASE_API_TOKEN"] = "<YOUR_PREDIBASE_API_TOKEN>"

# HuggingFace token, required for accessing gated models (like Llama 3 8B Instruct)
hf_token = "<YOUR_HF_TOKEN>"
# SEC-API key
sec_api_key = "<YOUR_SEC_API_KEY>"
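
Rather than pasting tokens directly into a notebook cell, one option is to read them from environment variables and fail early when one is missing. The sketch below assumes a hypothetical helper named `require_env` and a demo variable name `DEMO_API_TOKEN`; neither comes from Predibase, HuggingFace, or SEC-API.

```python
import os

def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value

# Demo value so the sketch runs on its own; in practice the variable is set outside the notebook
os.environ.setdefault("DEMO_API_TOKEN", "demo-token")
demo_token = require_env("DEMO_API_TOKEN")
```

The same helper could be called once per credential used in this walkthrough, which keeps secrets out of the saved notebook.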

## Step 1: Setting up the Predibase LLM
There are a few parameters to keep in mind while setting up your Predibase LLM:

1. model_name: This must be an LLM currently deployed in your Predibase environment.
   * Any of the models shown in the LLM query view dropdown are valid options.
   * If you are running Predibase in a VPC, you'll need to deploy an LLM first.
2. temperature: Controls the randomness of your model responses.
   * A higher value gives the model more creative leeway.
   * A lower value gives a more reproducible and consistent response.
3. max_new_tokens: Controls the maximum number of tokens the model can produce in a response.
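
To build intuition for the temperature setting, the sketch below uses the common textbook formulation, in which raw scores are divided by the temperature before a softmax. This is an illustration only: the source does not describe how Predibase-hosted models apply temperature internally, and the function name and toy scores are assumptions.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(toy_logits, 0.1)
high = softmax_with_temperature(toy_logits, 1.5)
print(low)
print(high)
```

At the low temperature nearly all probability mass sits on the top-scoring token, which is why low values give more reproducible output; at the higher temperature the mass spreads across the alternatives.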

In [None]:
# Configure Predibase LLM
predibase_llm = PredibaseLLM(model_name="solar-1-mini-chat-240612", temperature=0.1, max_new_tokens=512)

## Step 2: Set up Embedding model
If you are using a local HuggingFace embedding model, you can use the following code to set up your embedding model:

In [None]:
# loads BAAI/bge-large-en-v1.5
hf_embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")


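Now with our embedding model set up, we will create the service context that will be used to query the LLM and embed our data and queries. ServiceContext is deprecated as of llama-index 0.10 in favor of the global Settings object, so this cell emits a deprecation warning on 0.10.x but still runs.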

In [None]:
# Create a ServiceContext with our Predibase LLM and chosen embedding model
ctx = ServiceContext.from_defaults(llm=predibase_llm, embed_model=hf_embed_model)

# Set the Predibase LLM ServiceContext to the default
set_global_service_context(ctx)


## Step 3: Set up Vector Store and index
As mentioned before, we'll be using Pinecone for this example. Pinecone has a free tier that you can use to try out this example. You can also swap in any other Vector Store supported by LlamaIndex.

In [None]:
# Initialize the Pinecone client
# (pinecone-client >= 3.0.0 replaces pinecone.init with the Pinecone class)
from pinecone import Pinecone, PodSpec

# Placeholder shown; never commit real credentials
pc = Pinecone(api_key="<YOUR_PINECONE_API_KEY>")

If you are using the HuggingFace embedding model, you can use the following code to set up your Vector Store:

In [None]:
# HF index: dimension=1024 matches the output size of BAAI/bge-large-en-v1.5
pc.create_index(
    "predibase-hf-sec-10-k-chatbot",
    dimension=1024,
    metric="cosine",
    spec=PodSpec(environment="asia-northeast1-gcp", pod_type="s1.x1", pods=1),
)
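
The index dimension has to match the length of the vectors the embedding model produces. A small guard like the sketch below can catch a mismatch before any upsert; the helper name `check_dimension` is hypothetical, and 1024 is the value passed to `create_index` in this walkthrough.

```python
EXPECTED_DIMENSION = 1024  # the dimension used when creating the index in this walkthrough

def check_dimension(vector, expected=EXPECTED_DIMENSION):
    """Raise if an embedding's length does not match the index dimension."""
    if len(vector) != expected:
        raise ValueError(
            f"Embedding has {len(vector)} dimensions, but the index expects {expected}."
        )
    return vector

# A correctly sized dummy vector passes through unchanged
ok_vector = check_dimension([0.0] * 1024)
```

If you swap in a different embedding model, the index would need to be created with that model's output size instead.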

Finally, we'll select our index, create the storage context, and index our documents!

In [None]:
# Extract Filings Function
def get_filings(ticker):
    global sec_api_key

    # Finding Recent Filings with QueryAPI
    queryApi = QueryApi(api_key=sec_api_key)
    query = {
      "query": f"ticker:{ticker} AND formType:\"10-K\"",
      "from": "0",
      "size": "1",
      "sort": [{ "filedAt": { "order": "desc" } }]
    }
    filings = queryApi.get_filings(query)

    # Getting 10-K URL
    filing_url = filings["filings"][0]["linkToFilingDetails"]

    # Extracting Text with ExtractorAPI
    extractorApi = ExtractorApi(api_key=sec_api_key)
    onea_text = extractorApi.get_section(filing_url, "1A", "text") # Section 1A - Risk Factors
    seven_text = extractorApi.get_section(filing_url, "7", "text") # Section 7 - Management’s Discussion and Analysis of Financial Condition and Results of Operations

    # Joining Texts
    combined_text = onea_text  + "\n\n" + seven_text

    return combined_text

In [None]:
# construct vector store and custom storage context
pinecone_index = pc.Index("predibase-hf-sec-10-k-chatbot")
pinecone_vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
pinecone_storage_context = StorageContext.from_defaults(vector_store=pinecone_vector_store)


# Prompt the user to input the stock ticker they want to analyze
# For this walkthrough, we ran this code block multiple times, once each for AAPL and TSLA
ticker = input("What Ticker Would you Like to Analyze? ex. AAPL: ")

print("-----")
print("Getting Filing Data")
# Retrieve the filing data for the specified ticker
filing_data = get_filings(ticker)

print("-----")
print("Initializing Vector Database")
# Initialize a text splitter to divide the filing data into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,         # Maximum size of each chunk
    chunk_overlap = 500,       # Number of characters to overlap between chunks
    length_function = len,     # Function to determine the length of the chunks
    is_separator_regex = False # Whether the separator is a regex pattern
)
# Split the filing data into smaller, manageable chunks (a list of LangChain Document objects)
split_docs = text_splitter.create_documents([filing_data])

# Here we create the index so that any query you make will pull the relevant context from your Vector Store.
from llama_index.core import Document, VectorStoreIndex

# Wrap each chunk's text in a LlamaIndex Document. Use page_content rather than
# str(split_docs): iterating over a string would yield one Document per character.
documents = [Document(text=doc.page_content) for doc in split_docs]

# Build the index; this embeds each document and upserts the vectors into Pinecone
index = VectorStoreIndex.from_documents(documents, storage_context=pinecone_storage_context)

What Ticker Would you Like to Analyze? ex. AAPL: TSLA
-----
Getting Filing Data
-----
Initializing Vector Database
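
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is not how `RecursiveCharacterTextSplitter` works internally (it splits recursively on separators and only approximates these sizes); the function name `sliding_window_chunks` and the sample text are assumptions for illustration.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=500):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# Made-up 2,500-character sample text
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

With an overlap of half the chunk size, most characters land in two chunks, so the number of vectors stored is roughly double what a non-overlapping split would produce. That is a trade-off between retrieval recall at chunk boundaries and storage cost.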



## Step 4: Querying the LLM with RAG
Now that we've set up our index, we can ask questions over the documents. LlamaIndex retrieves the relevant context from the Vector Store, and the Predibase LLM generates a response grounded in that context.
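
Conceptually, the query step combines the retrieved chunks and the question into a single prompt for the LLM. The sketch below shows one way that assembly could look; the template wording and the function name `build_rag_prompt` are assumptions, not LlamaIndex's actual default prompt, and the sample chunks are made up.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Combine retrieved context chunks and a question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Made-up chunks standing in for what the vector store would return
prompt = build_rag_prompt(
    "What risks does the company describe?",
    ["Supply chain disruptions may delay production.", "Competition is increasing."],
)
print(prompt)
```

Because the model only sees what is placed in the prompt, the quality of the retrieved chunks largely determines the quality of the answer.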

In [None]:
# Setup query engine
predibase_query_engine = index.as_query_engine()

Now we can ask questions over our documents!

In [None]:
# response = predibase_query_engine.query("What are the risks that Tesla is currently facing? Explain in detail how these risks could affect stock market performance.")
response = predibase_query_engine.query("What are significant announcements of products for Tesla during fiscal year 2023?")
print(response)


During fiscal year 2023, Tesla made several significant announcements of products. Some of the key announcements include:

1. Tesla Bot: In August 2021, Tesla unveiled a prototype of its humanoid robot, called the Tesla Bot. The robot is designed to perform dangerous or repetitive tasks, and is powered by Tesla's Autopilot software.

2. Cybertruck: Tesla provided updates on the development of its electric pickup truck, the Cybertruck. The company announced that it had begun construction of a new factory in Austin, Texas, which will be dedicated to producing the Cybertruck.

3. Semi Truck: Tesla also provided updates on the development of its electric semi-truck. The company announced that it had begun testing prototypes of the Semi Truck and that it expected to begin production in 2023.

4. Autopilot and Full Self-Driving: Tesla continued to improve its Autopilot and Full Self-Driving features. The company announced that it had begun rolling out a new feature called "Traffic-Aware Cru

To see the response to your query, you can pass the response variable to a print statement. Otherwise, you can pass the response object around your system to finish setting up your RAG solution.