# RAG Document QA - Code Development

This notebook is the workspace for developing the code used in this project. It can help a user understand individual components of the program and verify that they work. I've added markdown and comments to explain the development thought process, but I won't add docstrings or worry about PEP 8 in these functions until the code moves into modules.

## Create LLM Communication

Create an object to handle our communications with the LLM. We'll use GPT here - you'll need to have an access token saved to a txt file in order to run this.

In [3]:
import os

# define paths to use throughout this notebook

ACCESS_TOKEN_PATH = os.path.pardir + "/api_keys/openai.key"
SAVE_PATH = os.getcwd() + "/document_store/"

In [2]:
from openai import OpenAI

# check key path is valid
assert os.path.exists(ACCESS_TOKEN_PATH), "Access token key not found"

# create save dir if new run
if not os.path.exists(SAVE_PATH):
    os.mkdir(SAVE_PATH)

In [3]:
from openai import OpenAI
import tiktoken
import logging

# class for GPT communication; won't worry about documentation until moving code to modules

class GPTCommunicator():

    def __init__(
            self, api_key_path: str, model_name: str = "gpt-3.5-turbo",
        ):

        # init client with api key file
        with open(api_key_path) as f:
            self.client = OpenAI(api_key=f.readline().strip())
        
        # context window limits; found at https://platform.openai.com/docs/models
        model_max_tokens = { 
            #"gpt-3.5-turbo-instruct": 4096,
            "gpt-3.5-turbo": 16385,
            "gpt-4": 8192,
            "gpt-4-32k": 32768,
        }

        # check for valid model name input
        if model_name not in model_max_tokens:
            raise ValueError(f"Invalid model name; valid args include: {list(model_max_tokens)}")
        self.model_name = model_name

        # set model attributes
        self.max_prompt_tokens = model_max_tokens[model_name] - 250 # buffer for response tokens
        self.system_role = "You are a helpful AI assistant."
        self.total_tokens_used = 0
        self.last_response = None # set by post_prompt after a successful call
        
    def post_prompt(self, text):

        try:
            response = self.client.chat.completions.create(
                model = self.model_name,
                messages = [
                    {"role": "system", "content": str(self.system_role)},
                    {"role": "user", "content": str(text)}
                ]
            )
            self.last_response = response
            self.total_tokens_used += int(response.usage.total_tokens)

        except Exception as e:
            logging.error(f"Failed to post prompt: {e}")
            return None
        
        return response.choices[0].message.content
    
    def count_tokens(self, text):
        encoding = tiktoken.encoding_for_model(self.model_name)
        num_tokens = len(encoding.encode(text))

        return num_tokens



In [4]:
gpt = GPTCommunicator(ACCESS_TOKEN_PATH)

# test communication
response = gpt.post_prompt("Hello")
response

'Hello! How can I assist you today?'

In [5]:
gpt.total_tokens_used

28

In [6]:
gpt.last_response.usage

CompletionUsage(completion_tokens=9, prompt_tokens=19, total_tokens=28)

In [7]:
gpt.count_tokens(response)

9

In [8]:
gpt.post_prompt("Describe Europe in 3 sentences.")

'Europe is a continent located in the Northern Hemisphere, bordered by the Arctic Ocean to the north, the Atlantic Ocean to the west, the Mediterranean Sea to the south, and Asia to the east. It is known for its rich history, diverse cultures, and stunning landscapes, including mountains, forests, rivers, and coastlines. Europe is home to numerous iconic landmarks, such as the Eiffel Tower in Paris, the Colosseum in Rome, and the Acropolis in Athens.'

In [9]:
gpt.total_tokens_used

151

In [10]:
gpt.last_response.usage

CompletionUsage(completion_tokens=98, prompt_tokens=25, total_tokens=123)

Communication and token tracking are working properly.

## EDA

We need a set of text documents to develop our RAG solution. We'll use the wikitext dataset for this, which contains passages from wikipedia articles that we can use as our documents to QA over. 

There's a good chance GPT and other LLMs already have knowledge of the info in this dataset, so we may have to manipulate some of the information if we want to properly test our RAG system (more on this below).

In [11]:
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-2-raw-v1")
wikitext

DatasetDict({
    test: Dataset({
        features: ['text'],
        num_rows: 4358
    })
    train: Dataset({
        features: ['text'],
        num_rows: 36718
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 3760
    })
})

In [12]:
wikitext["train"]["text"][:10]

['',
 ' = Valkyria Chronicles III = \n',
 '',
 ' Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role @-@ playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in January 2011 in Japan , it is the third game in the Valkyria series . Employing the same fusion of tactical and real @-@ time gameplay as its predecessors , the story runs parallel to the first game and follows the " Nameless " , a penal military unit serving the nation of Gallia during the Second Europan War who perform secret black operations and are pitted against the Imperial unit " Calamaty Raven " . \n',
 " The game began development in 2010 , carrying over a large portion of the work done on Valkyria Chronicles II . While it retained the standard features of the series , it also underwent multiple adjustments , such as making the game more f

**NOTE:**

The text is split into a list of strings. We want to combine the strings to form full articles as individual documents. Based on visual inspection, it seems a delimiter " = " on both sides of a text is used to mark titles, " = = " for headers and " = = = " for sub-headers. We can group the text based on the start of a new title.
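
As a compact illustration of the delimiter convention described above, here is a minimal regex-based sketch. The pattern and the helper name `heading_level` are assumptions made for illustration, not the code this notebook uses; the sketch also assumes every heading line ends with a space and a newline, as in the sample output.

```python
import re

# Assumed layout: a space, N "=" signs separated by single spaces, the heading
# text, the same N "=" signs again, then " \n" at the end of the line.
HEADING_RE = re.compile(r" (=(?: =)*) (.+) \1 \n")


def heading_level(line: str) -> int:
    """Return the number of '=' signs in the delimiter, or 0 for non-headings."""
    match = HEADING_RE.fullmatch(line)
    if match is None:
        return 0
    return match.group(1).count("=")


# 1 = title, 2 = header, 3 = sub-header, 0 = content or empty line
for sample in [" = Valkyria Chronicles III = \n", " = = Gameplay = = \n", ""]:
    print(repr(sample), heading_level(sample))
```
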

In [13]:
import numpy as np 

# confirm " = = = " is the lowest category

np.max([int(t.count(" = ") / 2) for t in wikitext["train"]["text"]])

3

In [14]:
# define a function to classify string as title/header/content based on the delimiters we saw above

def classify_string_type(text):
    
    if text == '':
        return "empty"
    
    title_delimiter = " = "
    header_delimiter = " = = "
    subheader_delimiter = " = = = "

    def check_by_delimiter(t, delimiter):
        # when split by the right delimiter, text will be in the form: ['', text, '\n']
        t_split = t.split(delimiter)

        # for titles and headers, we can expect len(split) == 3 and split[-1] == '\n'
        return len(t_split) == 3 and t_split[-1] == '\n'

    
    if check_by_delimiter(text, subheader_delimiter):
        return "subheader"
    
    elif check_by_delimiter(text, header_delimiter):
        return "header"
    
    elif check_by_delimiter(text, title_delimiter):
        return "title"
    
    else:
        return "content"

In [15]:
text = wikitext["train"]["text"][1]
print(text)
classify_string_type(text)

 = Valkyria Chronicles III = 



'title'

In [16]:
import pandas as pd

text_list = wikitext["train"]["text"]

df = pd.DataFrame()
df["text"] = text_list
df["text_type"] = list(map(lambda t: classify_string_type(t), text_list))
df.head(5)

| | text | text_type |
|---|---|---|
| 0 | | empty |
| 1 | = Valkyria Chronicles III = \n | title |
| 2 | | empty |
| 3 | Senjō no Valkyria 3 : Unrecorded Chronicles (... | content |
| 4 | The game began development in 2010 , carrying... | content |


In [17]:
df.text_type.value_counts()

text_type
content      17870
empty        12951
header        2922
subheader     2346
title          629
Name: count, dtype: int64

In [18]:
title_idx = df.index[df['text_type']=="title"].tolist()
df.iloc[title_idx[0]:title_idx[1]+1]

| | text | text_type |
|---|---|---|
| 1 | = Valkyria Chronicles III = \n | title |
| 2 | | empty |
| 3 | Senjō no Valkyria 3 : Unrecorded Chronicles (... | content |
| 4 | The game began development in 2010 , carrying... | content |
| 5 | It met with positive sales in Japan , and was... | content |
| 6 | | empty |
| 7 | = = Gameplay = = \n | header |
| 8 | | empty |
| 9 | As with previous Valkyira Chronicles games , ... | content |
| 10 | The game 's battle system , the BliTZ system ... | content |


**NOTE:**

Looks like our classifying function is working properly; we can now split the list at the index of each title and group the lines into full passages.
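
The grouping step described above can be sketched on a toy list. The helper names and the toy data below are hypothetical and only illustrate the idea: each passage runs from one title index up to (but not including) the next, and the last passage runs to the end of the list.

```python
def group_by_title(lines, is_title):
    """Group consecutive lines into passages, starting a new one at each title."""
    starts = [i for i, line in enumerate(lines) if is_title(line)]
    ends = starts[1:] + [len(lines)]  # each passage ends where the next begins
    return ["\n".join(lines[s:e]) for s, e in zip(starts, ends)]


def toy_is_title(line):
    # toy stand-in for a real classifier: " = X = " but not " = = X = = "
    return line.startswith(" = ") and not line.startswith(" = = ")


toy_lines = [
    "",
    " = A = \n",
    " text a \n",
    " = = H = = \n",
    " more a \n",
    " = B = \n",
    " text b \n",
]

toy_passages = group_by_title(toy_lines, toy_is_title)
print(len(toy_passages))  # 2
```

Note that any lines before the first title (here the leading empty string) are not part of any passage.
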

In [19]:
passage = "\n".join(df.iloc[title_idx[0]:title_idx[1]]["text"])
print(passage[:3000])

 = Valkyria Chronicles III = 


 Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role @-@ playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in January 2011 in Japan , it is the third game in the Valkyria series . Employing the same fusion of tactical and real @-@ time gameplay as its predecessors , the story runs parallel to the first game and follows the " Nameless " , a penal military unit serving the nation of Gallia during the Second Europan War who perform secret black operations and are pitted against the Imperial unit " Calamaty Raven " . 

 The game began development in 2010 , carrying over a large portion of the work done on Valkyria Chronicles II . While it retained the standard features of the series , it also underwent multiple adjustments , such as making the game more forgiving for series n

In [20]:
print(passage[-3000:])

ation 4 that forms the beginning of a new series within the Valkyria franchise . 


 = = = Adaptations = = = 


 Valkyria Chronicles 3 was adapted into a two @-@ episode original video animation series in the same year of its release . Titled Senjō no Valkyria 3 : Taga Tame no Jūsō ( 戦場のヴァルキュリア３ 誰がための銃瘡 , lit . Valkyria of the Battlefield 3 : The Wound Taken for Someone 's Sake ) , it was originally released through PlayStation Network and Qriocity between April and May 2011 . The initially @-@ planned release and availability period needed to be extended due to a stoppage to PSN during the early summer of that year . It later released for DVD on June 29 and August 31 , 2011 , with separate " Black " and " Blue " editions being available for purchase . The anime is set during the latter half of Valkyria Chronicles III , detailing a mission by the Nameless against their Imperial rivals Calamity Raven . The anime was first announced in November 2010 . It was developed by A @-@ 1 Pictures

## Data Preprocessing

Transform list of single text lines into list of passages. These passages will be saved and used as documents for our vector store.

In [21]:
from collections import Counter

def transform_into_passages(text_list):

    text_type = list(map(lambda t: classify_string_type(t), text_list))

    type_counts = Counter(text_type) #dict storing counts of titles, headers, etc.

    title_idx = np.array([i for i,v in enumerate(text_type) if v == "title"])
    title_idx = np.append(title_idx, len(text_list)) #append for last passage
    title_idx_pairs = np.column_stack((title_idx[:-1], title_idx[1:]))

    passages = []

    for idx_pair in title_idx_pairs:
        start_i, end_i = idx_pair[0], idx_pair[1]
        passage = "\n".join(text_list[start_i:end_i])
        passages.append(passage)

    assert len(passages) == type_counts["title"], "Passage count should match number of titles"

    return passages

passages = transform_into_passages(text_list)
print(f"{passages[0][:100]} ... {passages[0][-100:]}")

 = Valkyria Chronicles III = 


 Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア ... ustrated by Mizuki Tsuge and eventually released in a single volume by Kadokawa Shoten in 2012 . 





In [22]:
print(f"{passages[1][:100]} ... {passages[1][-100:]}")

 = Tower Building of the Little Rock Arsenal = 


 The Tower Building of the Little Rock Arsenal , a ... emen and servicewomen of the United States and commemorate the birthplace of Douglas MacArthur . 





In [23]:
# verify against text_list

print(passages[-1][-1000:])

y , which had not yet been performed in public . He became very attached to the bird and arranged an elaborate funeral for it when it died three years later . It has been suggested that his A Musical Joke ( K. 522 ) might be written in the comical , inconsequential style of a starling 's vocalisation . Other people who have owned common starlings report how adept they are at picking up phrases and expressions . The words have no meaning for the starling , so they often mix them up or use them on what to humans are inappropriate occasions in their songs . Their ability at mimicry is so great that strangers have looked in vain for the human they think they have just heard speak . 

 Common starlings are trapped for food in some Mediterranean countries . The meat is tough and of low quality , so it is casseroled or made into pâté . One recipe said it should be stewed " until tender , however long that may be " . Even when correctly prepared , it may still be seen as an acquired taste . 



In [24]:
text_list[-10:]

[' Western Australia banned the import of common starlings in 1895 . New flocks arriving from the east are routinely shot , while the less cautious juveniles are trapped and netted . New methods are being developed , such as tagging one bird and tracking it back to establish where other members of the flock roost . Another technique is to analyse the DNA of Australian common starling populations to track where the migration from eastern to western Australia is occurring so that better preventive strategies can be used . By 2009 , only 300 common starlings were left in Western Australia , and the state committed a further A $ 400 @,@ 000 in that year to continue the eradication programme . \n',
 ' In the United States , common starlings are exempt from the Migratory Bird Treaty Act , which prohibits the taking or killing of migratory birds . No permit is required to remove nests and eggs or kill juveniles or adults . Research was undertaken in 1966 to identify a suitable avicide that wo

**NOTE:**

Now that we have our passages to be used as documents, we should inspect the token counts and consider filtering some to limit our token usage.

In [25]:
gpt.count_tokens(passages[0])

4486

In [26]:
passage_token_counts = list(map(lambda p: gpt.count_tokens(p), passages))
passage_token_counts[:4]

[4486, 4638, 3913, 832]

In [27]:
print("Passage token counts\n")
print(f"MEAN: {np.mean(passage_token_counts)}")
print(f"STD: {np.std(passage_token_counts)}")
print(f"MIN: {np.min(passage_token_counts)}")
print(f"MAX: {np.max(passage_token_counts)}")

Passage token counts

MEAN: 3875.3465818759937
STD: 3161.958046450358
MIN: 10
MAX: 20498


In [28]:
# let's limit our token usage for now

limit_n_tokens = 5000

print(f"{len([n for n in passage_token_counts if n > limit_n_tokens])} / {len(passages)} passages greater than limit")

169 / 629 passages greater than limit


In [29]:
# eliminate passages greater than our limit

valid_idx = [i for i,v in enumerate(passage_token_counts) if v <= limit_n_tokens]
valid_passages = [passages[i] for i in valid_idx]

# double check these passages are below the limit
print(f"largest passage after trim is {np.max(list(map(lambda p: gpt.count_tokens(p), valid_passages)))} tokens")

largest passage after trim is 4997 tokens


In [30]:
len(valid_passages)

460

**NOTE:**

Before we continue, we should test whether GPT can already answer questions about these passages from its internal knowledge base.

In [31]:
valid_passages[112][:500]

" = You Only Live Twice ( film ) = \n\n\n You Only Live Twice ( 1967 ) is the fifth spy film in the James Bond series , and the fifth to star Sean Connery as the fictional MI6 agent James Bond . The film 's screenplay was written by Roald Dahl , and loosely based on Ian Fleming 's 1964 novel of the same name . It is the first James Bond film to discard most of Fleming 's plot , using only a few characters and locations from the book as the background for an entirely new story . \n\n In the film , Bond"

In [32]:
gpt.post_prompt("What is the fifth film in the James Bond series?")

'The fifth film in the James Bond series is "You Only Live Twice," released in 1967.'

In [33]:
valid_passages[152][:500]

' = Trials and Tribble @-@ ations = \n\n\n " Trials and Tribble @-@ ations " is the 104th episode of the American science fiction television series Star Trek : Deep Space Nine , the sixth episode of the fifth season . It was written as a tribute to the original series of Star Trek , in the 30th anniversary year of the show ; sister series Voyager produced a similar episode , " Flashback " . The idea for the episode was suggested by René Echevarria , and Ronald D. Moore suggested the link to " The Tr'

In [34]:
gpt.post_prompt("What show is the episode 'Trials and Tribble' from?")

'The episode "Trials and Tribble-ations" is from the TV show "Star Trek: Deep Space Nine." It is the 6th episode of the 5th season.'

**NOTE:**

GPT's internal knowledge base already has information about this dataset. Let's manipulate some of the information in our dataset so we can properly test our RAG system.

In [35]:
valid_passages[112] = valid_passages[112].replace("You Only Live Twice", "No, YOLO")
valid_passages[152] = valid_passages[152].replace("Star Trek", "I'm More of a Star Wars Fan")
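
One caveat with this approach: `str.replace` returns the string unchanged when the target text is absent, so a typo in the target would silently leave the passage untouched. A minimal guard could look like the sketch below; `replace_checked` is a hypothetical helper name, not part of this notebook.

```python
def replace_checked(text, old, new):
    """Replace every occurrence of `old`, failing loudly if there is none."""
    if old not in text:
        raise ValueError(f"{old!r} not found; nothing to replace")
    return text.replace(old, new)


sample = " = You Only Live Twice ( film ) = "
print(replace_checked(sample, "You Only Live Twice", "No, YOLO"))
```
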

In [36]:
write_df = pd.DataFrame()
write_df["text"] = valid_passages
write_df.to_csv(SAVE_PATH+"passages.csv")

## Create Vectorstore for RAG

Now we need to create our database of documents to retrieve from. We could do this by embedding each document, storing the embeddings and text in a file, and computing cosine similarity between the stored embeddings and a query embedding to find the top matches. However, it's better to use a vectorstore, since they're optimized for this kind of similarity search. We'll use FAISS through the langchain library, but we can swap it out for another vectorstore if we want.
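
For reference, the manual approach mentioned above (brute-force cosine similarity over stored embeddings) can be sketched in a few lines. This is a toy illustration with made-up 2-D vectors and hypothetical function names; real embeddings have hundreds of dimensions, and this is not how FAISS works internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]  # toy 2-D "embeddings"
print(top_k([1.0, 0.1], doc_vecs, k=2))  # [0, 1]
```

Every query scans every document here, which is the main cost that a dedicated vectorstore index is meant to reduce.
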

In [4]:
from langchain_community.document_loaders import CSVLoader
import random

# use langchain csv loader to load our docs

def load_csv_file(file_path, shuffle=True, seed=None):
    loader = CSVLoader(
        file_path=file_path, encoding="utf-8", csv_args={"delimiter": ","}
    )
    csv_data = loader.load()
    if shuffle:
        random.seed(seed)
        random.shuffle(csv_data)

    return csv_data

passages_csv = load_csv_file(SAVE_PATH+"passages.csv", seed=1)
passages_csv[:3]

[Document(page_content=': 289\ntext: = Freakum Dress = \n\n\n " Freakum Dress " is a song by American singer and songwriter Beyoncé from her second solo studio album B \'Day ( 2006 ) . It was written by Beyoncé , Rich Harrison , and Makeba Riddick . " Freakum Dress " is similar to songs that Destiny \'s Child used to record in the 1990s . The song is complete with whistles , cymbal dominated scatter rhythms and a beat , which is augmented by hi @-@ hats and plinking keyboard pulses . In the song , Beyoncé advises women who have partners with straying eyes to put on alluring dresses and grind on other guys in dance clubs , to regain their affections . \n\n " Freakum Dress " was generally well received by music critics who complimented Beyoncé \'s vocals as well as the assertiveness with which she delivers the lyrics . Many of them also noted that the beat of song melds very well with the vocal arrangement and the instruments used . The music video for the song was directed by Ray Kay , 

In [38]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.vectorstores import VectorStoreRetriever
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS

from tqdm import tqdm
import logging
import sys
import os

logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO, 
    format='%(asctime)s - %(levelname)s - %(message)s'
)

# define class to create, load, and retrieve from vectorstore holding our data

class LangchainVectorstore:
    def __init__(self, embedding_type, processed_csv_path, verbose_info=True):
        self.data = load_csv_file(processed_csv_path)
        self.embedding_type = embedding_type
        self.verbose = verbose_info
        self.vectorstore = None # invoke create/load_local_vectorstore() to set
        self.retriever = None # invoke create_retriever to set

        if self.verbose:
            logging.info("Vectorstore and retriever must be set using the class methods.")

    def chunk_data(self, chunk_size: int = 2048, chunk_overlap: int = 50):
        text_splitter = CharacterTextSplitter(
                    chunk_size=chunk_size, chunk_overlap=chunk_overlap
                )
        self.data = text_splitter.split_documents(self.data)
        if self.verbose:
            logging.info(f"Data chunked to size {chunk_size}")

    def create_local_vectorstore(self, save_path):

        if os.path.exists(save_path):
            overwrite_saved = input(f"Vectorstore already found at {save_path}; overwrite? [y/n]: ").lower()
            if overwrite_saved not in ["y", "n"]:
                raise ValueError("Invalid input; try again with 'y' for yes or 'n' for no.")
            
            if overwrite_saved == "n":
                if self.verbose:
                    logging.info("Keeping saved vectorstore; aborting... ")
                return None # break out of function

        logging.info(f"Creating a new local vectorstore at: {save_path}")
        try:
            # no built in progress bar from their API; using this workaround shared at: https://stackoverflow.com/questions/77836174/how-can-i-add-a-progress-bar-status-when-creating-a-vector-store-with-langchain
            with tqdm(total=len(self.data), desc="Processing documents") as progress_bar:
                for d in self.data:
                    if self.vectorstore:
                        self.vectorstore.add_documents([d])
                    else: # init 
                        self.vectorstore = FAISS.from_documents([d], self.embedding_type)
                    progress_bar.update(1)

            #self.vectorstore = FAISS.from_documents(self.data, self.embedding_type)
            # above function is equivalent to embedding each piece of text, zipping text and embeddings as pairs, and creating index from these pairs
            self.vectorstore.save_local(save_path)
            if self.verbose:
                logging.info(f"Vectorstore successfully set and saved to {save_path}")
        
        except Exception as e:
            # no explicit return needed; a return inside finally would also swallow interrupts
            logging.error(f"Failed to create vectorstore: {e}")
            

    def load_local_vectorstore(self, load_path):
        if not os.path.exists(load_path):
            raise ValueError(f"Failed to find a saved vectorstore at {load_path}; please ensure load_path points to the correct location.")

        try:
            self.vectorstore = FAISS.load_local(load_path, self.embedding_type, allow_dangerous_deserialization=True)

        except Exception as e:
            logging.error(f"Failed to load vectorstore: {e}")

    def create_retriever(self, search_type: str = "similarity", search_kwargs: dict = None):
        if self.vectorstore is None:
            raise ValueError("Vectorstore not set; create or load a vectorstore using class method first.")

        search_types = ["similarity", "mmr", "similarity_score_threshold"]
        if search_type not in search_types:
            raise ValueError(f"Invalid arg for search_type; valid args include: {search_types}")
        
        self.retriever = self.vectorstore.as_retriever(
            search_type=search_type,
            search_kwargs=search_kwargs or {}, # avoids a shared mutable default argument
        )
        if self.verbose:
            logging.info("Retriever successfully set")

    def retrieve_conext(self, query: str):
        if self.retriever is None:
            raise ValueError("Retriever not set; create a retriever using class method first")
        
        retrieved_docs = self.retriever.get_relevant_documents(query)

        # return only the text of each retrieved document
        return [doc.page_content for doc in retrieved_docs]
    


I've created a few methods for this vectorstore class, which include:
 - creating the vectorstore and saving it locally
 - loading the vectorstore from local storage, if already saved
 - creating the retriever to perform a similarity search over the database
 - using the retriever to return the top documents related to a query
 - chunking the data to a specified size (CharacterTextSplitter measures chunk_size in characters by default, not tokens)

We filtered out passages larger than our desired token limit before, so we don't need to chunk now, but having the functionality will be useful in the future.
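
To illustrate what chunk size and chunk overlap mean, here is a simplified fixed-window sketch. It is an assumption-laden toy, not LangChain's implementation: CharacterTextSplitter first splits on a separator and then merges pieces, whereas this hypothetical `chunk_text` helper just slides a character window over the text.

```python
def chunk_text(text, chunk_size=10, chunk_overlap=3):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghijklmnopqrstuvwxyz"))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means the end of one chunk is repeated at the start of the next ("hij" above), which helps keep a sentence that straddles a boundary retrievable from at least one chunk.
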

In [39]:
vs = LangchainVectorstore(
    embedding_type = HuggingFaceEmbeddings(),
    processed_csv_path = SAVE_PATH+"passages.csv",
    verbose_info = True
    )

2024-03-27 20:32:09,476 - INFO - Load pretrained SentenceTransformer: sentence-transformers/all-mpnet-base-v2
2024-03-27 20:32:10,169 - INFO - Use pytorch device: cpu
2024-03-27 20:32:10,195 - INFO - Vectorstore and retriever must be set using the class methods.


In [40]:
#vs.chunk_data()
vs.create_local_vectorstore(save_path=SAVE_PATH+"faiss_index")
#vs.load_local_vectorstore(load_path=SAVE_PATH+"faiss_index")

2024-03-27 20:32:10,200 - INFO - Creating a new local vectorstore at: /Users/dev/projects/Rag_DocumentQA/document_store/faiss_index


Processing documents:   0%|          | 0/460 [00:00<?, ?it/s]

2024-03-27 20:32:10,743 - INFO - Loading faiss.
2024-03-27 20:32:10,757 - INFO - Successfully loaded faiss.


Processing documents: 100%|██████████| 460/460 [01:23<00:00,  5.53it/s]

2024-03-27 20:33:33,328 - INFO - Vectorstore successfully set and saved to /Users/dev/projects/Rag_DocumentQA/document_store/faiss_index





**NOTE:**

When we create the retriever, we can specify how many documents to return with the "k" search kwarg.

In [41]:
vs.create_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 5
    }
)

2024-03-27 20:33:33,335 - INFO - Retriever successfully set


Let's test our RAG search.

In [42]:
query = "What is the fifth film in the James Bond series?"

# use similarity search on vectorstore
top_context = vs.retrieve_conext(query)
print(top_context[0][:1000])

: 112
text: = No, YOLO ( film ) = 


 No, YOLO ( 1967 ) is the fifth spy film in the James Bond series , and the fifth to star Sean Connery as the fictional MI6 agent James Bond . The film 's screenplay was written by Roald Dahl , and loosely based on Ian Fleming 's 1964 novel of the same name . It is the first James Bond film to discard most of Fleming 's plot , using only a few characters and locations from the book as the background for an entirely new story . 

 In the film , Bond is dispatched to Japan after American and Soviet manned spacecraft disappear mysteriously in orbit . With each nation blaming the other amidst the Cold War , Bond travels secretly to a remote Japanese island in order to find the perpetrators and comes face to face with Ernst Stavro Blofeld , the head of SPECTRE . The film reveals the appearance of Blofeld , who was previously a partially unseen character . SPECTRE is extorting the government of an unnamed Asian power , implied to be the People 's Republic

In [43]:
len(top_context)

5

**NOTE:**

Nice, looks like our retrieval is working as expected and returned the passage we manipulated. Before we perform a RAG query, we need to consider the LLM's token limit for a single prompt; we may need to trim some of the retrieved context.

In [44]:
all_context = "\n\n".join(top_context)

prompt = all_context + "\n\nBased on the above context, answer the following question:\n" + query

In [45]:
gpt.count_tokens(prompt)

18593

In [46]:
gpt.max_prompt_tokens


16135

In [47]:
def truncate_text(text, gpt, token_limit):
    # keep whole sentences (split on ".") until the running token count reaches token_limit
    token_count = 0
    truncated_text = ""
    try:
        for line in text.split("."):
            if line.strip() == "":
                continue # skip empty fragments, e.g. after a trailing period

            line = line + "." # restore the period removed by split
            token_count += gpt.count_tokens(line)
            if token_count >= token_limit:
                break

            truncated_text += line

        truncated_text += "\n"

    except Exception as e:
        logging.error(f"Failed to truncate text: {e}")
        return None
    
    return truncated_text

# test function

print(f"Before: {gpt.count_tokens(passages[0])}")
print(f"{passages[0][:100]} ... {passages[0][-100:]}")

truncated_passage = truncate_text(passages[0], gpt, token_limit=1024)
print(f"After: {gpt.count_tokens(truncated_passage)}")
print(f"{truncated_passage[:100]} ... {truncated_passage[-100:]}")

Before: 4486
 = Valkyria Chronicles III = 


 Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア ... ustrated by Mizuki Tsuge and eventually released in a single volume by Kadokawa Shoten in 2012 . 



After: 1009
 = Valkyria Chronicles III = 


 Senjō no Valkyria 3 : Unrecorded Chronicles ( Japanese : 戦場のヴァルキュリア ... signed weapon . Changing class does not greatly affect the stats gained while in a previous class .



In [48]:
# we can build this check into the gpt class to truncate any prompts over the limit before sending

all_context = "\n\n".join(top_context)
query = "\n\nBased on the above context, answer the following question:\n" + query

buffer_token_space = gpt.count_tokens(query)
token_limit = gpt.max_prompt_tokens - buffer_token_space

if gpt.count_tokens(all_context) > token_limit:
    prompt = truncate_text(all_context, gpt, token_limit) + query
else:
    prompt = all_context + query
    
gpt.count_tokens(prompt)

16113
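
The budget check in the cell above could be folded into a single helper, as its first comment suggests. The sketch below is one possible shape under stated assumptions: `build_prompt` and `whitespace_token_count` are hypothetical names, and the whitespace counter is only a stand-in for a real tokenizer such as the notebook's `gpt.count_tokens`.

```python
def build_prompt(context_docs, question, count_tokens, max_prompt_tokens):
    """Join retrieved documents and keep whole sentences until the budget is spent."""
    suffix = "\n\nBased on the above context, answer the following question:\n" + question
    budget = max_prompt_tokens - count_tokens(suffix)  # tokens left for context

    kept, used = [], 0
    for sentence in "\n\n".join(context_docs).split("."):
        if not sentence.strip():
            continue  # skip empty fragments
        sentence += "."  # restore the period removed by split
        cost = count_tokens(sentence)
        if used + cost > budget:
            break
        kept.append(sentence)
        used += cost

    return "".join(kept) + suffix


def whitespace_token_count(text):
    # stand-in tokenizer for this sketch: one token per whitespace-separated word
    return len(text.split())


docs = ["Alpha beta gamma. Delta epsilon.", "Zeta eta theta iota."]
prompt = build_prompt(docs, "What is alpha?", whitespace_token_count, max_prompt_tokens=17)
print(prompt)
```

With a real tokenizer, the sum of per-sentence counts can differ slightly from the count of the joined text, so it may be worth keeping a small safety margin in the budget.
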

**NOTE:**

We will need to adjust the system prompt to ensure GPT only uses the provided document information.

In [49]:
gpt.system_role = "You will answer user queries based on the context provided. You will limit your answers ONLY to the information provided and will NOT provide any external information. If the information needed to answer the query is not present in the input, or no additional context is provided, you will reply with 'I can't answer that based on the provided documents'."

In [50]:
gpt.post_prompt(prompt)

2024-03-27 20:33:35,364 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


"The fifth film in the James Bond series is 'No, YOLO'."

In [51]:
query = "What American science fiction television series is the episode 'Trials and Tribble' from?"

top_context = vs.retrieve_conext(query)
all_context = "\n\n".join(top_context)
query = "\n\nBased on the above context, answer the following question:\n" + query

buffer_token_space = gpt.count_tokens(query)
token_limit = gpt.max_prompt_tokens - buffer_token_space

if gpt.count_tokens(all_context) > token_limit:
    prompt = truncate_text(all_context, gpt, token_limit) + query
else:
    prompt = all_context + query

gpt.post_prompt(prompt)


2024-03-27 20:33:37,412 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


"The American science fiction television series that the episode 'Trials and Tribble' is from is 'I'm More of a Star Wars Fan: Deep Space Nine.'"

In [53]:
# check response when not providing any of our context

gpt.post_prompt(query)

2024-03-27 20:37:26,526 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


'"I can\'t answer that based on the provided documents."'

**NOTE:**

Success! Looks like our RAG system is functioning properly and we've been able to limit GPT to the documents we manipulated. We now have all the pieces we need for our program and can wrap all of this up into a project code base.