# Knowledge Base RAG

Code for a knowledge base RAG (retrieval-augmented generation) pipeline built with Pinecone, Canopy, and OpenAI. The goal is to reuse the code in this notebook in the research assistant project with little or no refactoring.

The knowledge base RAG supports the following functions:
- Upload: Upload a document in the Canopy format to the Pinecone index.
- Chat: Chat with the knowledge base to get answers to questions.

# Environment Setup

In [None]:
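import os

# Replace each `...` with your own API key string before running this cell;
# os.environ only accepts str values, so a literal Ellipsis raises TypeError.
os.environ["PINECONE_API_KEY"] = ...
os.environ["OPENAI_API_KEY"] = ...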

# Uploading documents

Documents are uploaded to the Pinecone index on demand.

Each document has the following attributes:
- id: unique identifier for the document
- text: the text of the document, in UTF-8 encoding
- source: the source of the document; can be any string, or null. This is used as a reference in the generated context.
- metadata: optional metadata for the document
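
As a minimal sketch of the attributes listed above, the snippet below builds a plain dictionary in that shape and checks it before upload. The helper `validate_document`, the field constants, and the sample values are hypothetical and not part of Canopy; they only illustrate the expected fields.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("id", "text")          # must be non-empty strings
OPTIONAL_FIELDS = ("source", "metadata")  # may be missing or None


def validate_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: check a dict against the attributes listed above."""
    for field in REQUIRED_FIELDS:
        value = doc.get(field)
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{field}' must be a non-empty string")
    source = doc.get("source")
    if source is not None and not isinstance(source, str):
        raise ValueError("'source' must be a string or None")
    metadata = doc.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        raise ValueError("'metadata' must be a dict or None")
    unknown = set(doc) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return doc


# Illustrative values only; none of them come from the real example document.
example = validate_document({
    "id": "doc-001",
    "text": "Plain UTF-8 text of the document.",
    "source": "https://example.com/doc-001",
    "metadata": {"title": "Example"},
})
```

Running a check like this before upload catches a malformed document locally instead of at upsert time.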

## Init Tokenizer

Many of Canopy's components rely on tokenization, the process of splitting text into tokens: basic units of text (such as words or sub-words) used for processing. Canopy therefore uses a singleton `Tokenizer` object, which must be initialized once before any other component is used.

In [3]:
from canopy.tokenizer import Tokenizer
Tokenizer.initialize()
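
To illustrate the "initialize once, share everywhere" pattern described above, here is a toy sketch in plain Python. `ToyTokenizer` is a hypothetical stand-in and is not how Canopy implements its `Tokenizer`; it splits on whitespace only, whereas real tokenizers produce sub-word tokens.

```python
class ToyTokenizer:
    """Toy stand-in for a process-wide tokenizer; NOT Canopy's implementation."""

    _instance = None
    _initialized = False

    def __new__(cls):
        # Refuse to hand out an instance before initialize() has been called.
        if not cls._initialized:
            raise RuntimeError("call ToyTokenizer.initialize() first")
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    @classmethod
    def initialize(cls):
        cls._initialized = True

    def tokenize(self, text):
        # Whitespace splitting only, for illustration.
        return text.split()


ToyTokenizer.initialize()
a = ToyTokenizer()
b = ToyTokenizer()  # same object as `a`
tokens = a.tokenize("sparse mixture of experts")
```

Because every caller receives the same instance, components that count or split tokens stay consistent with one another.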

## Create Pinecone index

In [8]:
from canopy.knowledge_base import KnowledgeBase
from canopy.knowledge_base import list_canopy_indexes

# canopy prefixes index names with "canopy--"
INDEX_NAME = "knowledge-base"

kb = KnowledgeBase(index_name=INDEX_NAME)

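# compare against the full prefixed name so that an unrelated index whose
# name merely ends with INDEX_NAME does not count as a match
if f"canopy--{INDEX_NAME}" not in list_canopy_indexes():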
    kb.create_canopy_index()

kb.connect()

## Upload document to index

I will upload the paper "Mixtral of Experts" to the Pinecone index. The paper is available in the `example-document.json` file.

In [18]:

import json

# use a context manager so the file handle is closed after loading
with open("example-document.json", encoding="utf-8") as f:
    document = json.load(f)
document.keys()

dict_keys(['id', 'source', 'text', 'metadata'])

Convert the JSON document to a Canopy `Document` object.

In [19]:
from canopy.models.data_models import Document

document = Document(**document)
document.id

'2401.04088'

Upload the document object to the index.

In [24]:
def upload(document):
    # upload a single document to the knowledge base
    return kb.upsert([document])

In [25]:
upload(document)
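
If several documents need to be uploaded at once, one option is to send them in fixed-size batches. The sketch below is an assumption about how that could look, not something this notebook does: `upload_in_batches` and its `batch_size` parameter are hypothetical, and the `upsert` argument is any callable that accepts a list of documents.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def upload_in_batches(
    documents: Sequence[T],
    upsert: Callable[[List[T]], object],
    batch_size: int = 2,
) -> int:
    """Hypothetical helper: pass documents to `upsert` in fixed-size batches.

    Returns the number of upsert calls that were made.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    calls = 0
    for start in range(0, len(documents), batch_size):
        upsert(list(documents[start:start + batch_size]))
        calls += 1
    return calls


# A list's append method stands in for the real upsert call here.
received = []
n_calls = upload_in_batches(["a", "b", "c"], received.append, batch_size=2)
```

In this notebook the callable would be `kb.upsert`, which the `upload` function above already calls with a one-element list.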

# Chat with the knowledge base

In [27]:
from canopy.chat_engine import ChatEngine
from canopy.context_engine import ContextEngine

context_engine = ContextEngine(kb)
chat_engine = ChatEngine(context_engine)

In [28]:
from canopy.models.data_models import UserMessage, AssistantMessage

def chat(new_message, history):
    messages = history + [UserMessage(content=new_message)]
    response = chat_engine.chat(messages)
    assistant_response = response.choices[0].message.content
    return assistant_response, messages + [AssistantMessage(content=assistant_response)]
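
The way `chat` threads the conversation history can be tried without any API keys. The sketch below mirrors the same pattern with hypothetical stand-ins: `Message`, `echo_engine`, and `chat_stub` are illustrative names and are not Canopy classes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def echo_engine(messages: List[Message]) -> str:
    """Hypothetical stub standing in for the chat engine call."""
    return f"(reply to: {messages[-1].content})"


def chat_stub(new_message: str, history: List[Message]) -> Tuple[str, List[Message]]:
    # Build new lists instead of mutating `history` in place, so the caller
    # decides whether to keep the updated conversation.
    messages = history + [Message("user", new_message)]
    reply = echo_engine(messages)
    return reply, messages + [Message("assistant", reply)]


history: List[Message] = []
first, history = chat_stub("What is Mixtral?", history)
second, history = chat_stub("How many experts per token?", history)
```

After two turns, `history` holds four messages that alternate between user and assistant, which is the shape the real `chat` function passes back to the engine on the next call.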

In [29]:
history = []

response, history = chat("What are the main ideas presented in the paper \"Mixtral of Experts\"?", history)
print(response)

The main ideas presented in the paper "Mixtral of Experts" include introducing Mixtral as a sparse mixture-of-experts model with a fully dense context length of 32k tokens. This model architecture has a decoder-only design with a sparse mixture-of-experts network that uses a subset of its parameters for every token, allowing for faster inference speeds at low batch sizes and higher throughput at large batch sizes. Mixtral utilizes a mechanism where a router network chooses two expert groups to process each token, which enables the model to control cost and latency effectively. Additionally, Mixtral is shown to outperform other models like Llama 2 70B and GPT-3.5 on various benchmarks, especially excelling in mathematics, code generation, and multilingual tasks. The model also uses significantly fewer active parameters per token while maintaining high performance compared to models with a higher number of parameters per token. Furthermore, Mixtral was pretrained with multilingual data a