# Backblaze B2 Retrieval-Augmented Generation (RAG) Demo

Retrieval-Augmented Generation (RAG) lets you build on a foundation model (an off-the-shelf large language model, or LLM) by supplying custom context that the model can use when interacting with a user. You can use RAG to implement chatbots that answer questions from your own proprietary data and, when paired with a locally run model, do so without that data leaking to the internet.

This notebook walks you through loading PDF files from [Backblaze B2 Cloud Object Storage](https://www.backblaze.com/cloud-storage) into a [LangChain](https://python.langchain.com/v0.2/docs/introduction/) RAG app, then building a chatbot that can answer questions relating to the content of those PDF files. You'll use an open-source language model that you run locally, rather than an online API, ensuring that your data stays confidential.

The code is based on the LangChain tutorial, [Build a Local RAG Application](https://python.langchain.com/v0.2/docs/tutorials/local_rag/).

## Install Dependencies

First, install the required Python packages. You will need to restart the Jupyter kernel before you can use newly installed packages. You can uncomment the `do_shutdown` line in the cell below to restart it automatically, or restart the kernel manually.

In [1]:
%pip install --upgrade --quiet -r requirements.txt

# Uncomment this line to restart the kernel so that it uses the new modules
# get_ipython().kernel.do_shutdown(restart=True)

Note: you may need to restart the kernel to use updated packages.


## Prerequisites

You need a Backblaze B2 Account, Bucket and Application Key, and some PDF files. Follow these instructions, as necessary:

* [Create a Backblaze B2 Account](https://www.backblaze.com/sign-up/cloud-storage).
* [Create a Backblaze B2 Bucket](https://www.backblaze.com/docs/cloud-storage-create-and-manage-buckets).
* [Create an Application Key](https://www.backblaze.com/docs/cloud-storage-create-and-manage-app-keys#create-an-app-key) with access to the bucket you wish to use.

Be sure to copy the application key as soon as you create it, as you will not be able to retrieve it later!

## Upload PDF Files to Your Bucket

You can use the Backblaze web UI, or any B2 or S3-compatible file management tool, to [upload PDF files to your bucket](https://www.backblaze.com/docs/cloud-storage-upload-and-manage-files). It's useful to organize files by prefix (analogous to a folder or directory in a traditional filesystem); this example assumes the PDFs are stored under the prefix `pdfs/cloud_storage` within the bucket.

If you don't have any suitable PDF files to hand, you can [download a PDF of the Backblaze B2 documentation](https://metadaddy-langchain-demo.s3.us-west-004.backblazeb2.com/pdfs/documentation.pdf) and upload it to your own bucket. 

## Configuration

Since Backblaze B2 has an S3-compatible API, this notebook uses LangChain's [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) and the [`s3fs`](https://s3fs.readthedocs.io/en/latest/) module to interact with files in Backblaze B2, as well as the [AWS SDK for Python, also known as Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html). `S3FileLoader` uses Boto3 under the covers, and `s3fs` is built (via aiobotocore) on botocore, the same low-level library that Boto3 uses, so you need only configure Boto3 for all of the tools to access your Backblaze B2 Bucket. The most straightforward way to do so in this context is via environment variables.

Note: you should never, *ever* put credentials in your code, including Jupyter notebooks! This example uses `python-dotenv` to load configuration from a `.env` file into environment variables for use by `S3FileLoader`. This repo includes a template file, `.env.template`. Copy it to `.env`, then edit it as follows:

```dotenv
AWS_ACCESS_KEY_ID='<Your Backblaze application key ID>'
AWS_SECRET_ACCESS_KEY='<Your Backblaze application key>'
AWS_ENDPOINT_URL='<Your bucket endpoint, prefixed with https://, e.g., https://s3.us-west-004.backblazeb2.com >'
```

When you're done, `.env` should look like this:

```dotenv
AWS_ACCESS_KEY_ID='004qlekmvpwemrt000000009e'
AWS_SECRET_ACCESS_KEY='K004JEKEUTGLKEJFKLRJHTKLVCNWURM'
AWS_ENDPOINT_URL='https://s3.us-west-004.backblazeb2.com'
```

Now you can load the configuration into the environment:

In [2]:
from dotenv import load_dotenv

if load_dotenv():
    print('Loaded environment variables from .env')
else:
    print('No environment variables in .env!')

Loaded environment variables from .env
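`load_dotenv()` only reports whether it loaded anything, so it won't catch a missing key or an endpoint without the `https://` prefix. If you want an extra check, here is a minimal sketch; `check_b2_settings` is a hypothetical helper, not part of `python-dotenv` or Boto3, and it looks only at the three variables from the template above.

```python
import os
from urllib.parse import urlparse

# The three variables from .env.template that this notebook relies on
REQUIRED_SETTINGS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_ENDPOINT_URL')

def check_b2_settings(env):
    """Hypothetical helper: return a list of problems found in the mapping `env`."""
    problems = [f'{name} is not set' for name in REQUIRED_SETTINGS if not env.get(name)]
    endpoint = env.get('AWS_ENDPOINT_URL')
    # The template asks for the endpoint to be prefixed with https://
    if endpoint and urlparse(endpoint).scheme != 'https':
        problems.append('AWS_ENDPOINT_URL must start with https://')
    return problems

# Check the variables that load_dotenv() placed in the environment
for problem in check_b2_settings(os.environ) or ['B2 settings look OK']:
    print(problem)
```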


Set the bucket name to match the bucket you are using:

In [3]:
bucket_name = 'metadaddy-langchain-demo'

Set the PDF location to the prefix (folder/directory) within the bucket that you are using for your PDFs. You can set it to `''` if you put the PDFs in the root of the bucket.

In [4]:
pdf_location = 'pdfs/cloud_storage'

You'll load data extracted from the PDFs into a vector store, stored in this location in the Backblaze B2 Bucket: 

In [5]:
vector_db_location = 'vectordb'

vector_db_uri = f's3://{bucket_name}/{vector_db_location}'

## List the PDF Files for Processing

Use Boto3 to list the files in `pdf_location`.

In [6]:
import boto3

b2_client = boto3.client('s3')

try:
    # Note: list_objects_v2 returns a maximum of 1000 objects per call,
    # so you should use a paginator in a real-world implementation.
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html
    object_list = b2_client.list_objects_v2(Bucket=bucket_name, Prefix=pdf_location)
    print(f'Successfully accessed {bucket_name}, found {object_list["KeyCount"]} file(s) under {pdf_location}/')
except Exception as e:
    print(f'Error accessing B2: {e}')

Successfully accessed metadaddy-langchain-demo, found 226 file(s) under pdfs/cloud_storage/
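As the comment in the cell above notes, a single `list_objects_v2` call returns at most 1,000 objects. One way to handle larger buckets is Boto3's paginator, which follows the continuation tokens for you. The sketch below assumes a Boto3 S3 client such as `b2_client`; the `list_pdf_keys` helper name is hypothetical.

```python
from fnmatch import fnmatch

def list_pdf_keys(client, bucket, prefix):
    """Hypothetical helper: return the keys of all PDF files under prefix, across every page."""
    paginator = client.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no 'Contents' entry
        for obj in page.get('Contents', []):
            if fnmatch(obj['Key'], '*.pdf'):
                keys.append(obj['Key'])
    return keys
```

You could then replace the loop over `object_list['Contents']` with a loop over `list_pdf_keys(b2_client, bucket_name, pdf_location)`, passing each key to `S3FileLoader`.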


## Load PDF Data from Backblaze B2

Now you can iterate through the list of files, loading each with [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html).

This can take a few minutes, depending on how much data you are loading. Most of the time is consumed by parsing the PDF files, rather than downloading the data.

> Note that you need only download and parse the PDF files once; if you've already done this step, you can [skip to using an existing vector store](#use-an-existing-vector-store).

In [62]:
from langchain_community.document_loaders import S3FileLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from fnmatch import fnmatch

print(f'Loading PDF data from B2 bucket {bucket_name}/{pdf_location}')
docs = []
# 'Contents' is absent from the response when no objects match the prefix
for obj in object_list.get('Contents', []):
    # Only process PDF files
    if fnmatch(obj['Key'], '*.pdf'):
        print(f'Loading {obj["Key"]}')
        loader = S3FileLoader(bucket_name, obj['Key'])
        docs += loader.load()

print(f'Loaded {len(docs)} document(s)')

Loading PDF data from B2 bucket metadaddy-langchain-demo/pdfs/cloud_storage
Loading pdfs/cloud_storage/cloud-storage-add-file-information-with-the-native-api.pdf
Loading pdfs/cloud_storage/cloud-storage-api-operations.pdf
Loading pdfs/cloud_storage/cloud-storage-application-key-capabilities.pdf
Loading pdfs/cloud_storage/cloud-storage-b2-content-type-mappings.pdf
Loading pdfs/cloud_storage/cloud-storage-back-up-linux-to-backblaze-b2.pdf
Loading pdfs/cloud_storage/cloud-storage-back-up-storage-volumes-from-coreweave-to-backblaze-b2.pdf
Loading pdfs/cloud_storage/cloud-storage-back-up-time-machine-to-synology-and-backblaze-b2.pdf
Loading pdfs/cloud_storage/cloud-storage-backblaze-fireball-program.pdf
Loading pdfs/cloud_storage/cloud-storage-buckets.pdf
Loading pdfs/cloud_storage/cloud-storage-business-groups.pdf
Loading pdfs/cloud_storage/cloud-storage-call-the-native-api.pdf
Loading pdfs/cloud_storage/cloud-storage-call-the-partner-api.pdf
Loading pdfs/cloud_storage/cloud-storage-call-t

You must split the text into chunks before loading it into a [vector store](https://python.langchain.com/v0.2/docs/concepts/#vector-stores). A chunk size of 1,000 characters, with a 200-character overlap, seems to work well for technical articles; you can experiment by changing these parameters.

In [77]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
print(f'Split {len(docs)} document(s) into {len(all_splits)} chunks')

Split 225 document(s) into 1594 chunks
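`RecursiveCharacterTextSplitter` tries to break text on paragraphs, then lines, then words, before falling back to individual characters, so its chunks vary in length. The simplified sketch below is not LangChain's algorithm; the hypothetical `chunk_text` helper shows only the sliding-window idea behind `chunk_size` and `chunk_overlap`, where each chunk repeats the last few characters of the one before it so that a sentence cut at a boundary still appears whole in at least one chunk.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('chunk_overlap must be >= 0 and smaller than chunk_size')
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(chunk_text('abcdefghij', chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```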


## Create a Vector Store

Now create a vector store from the splits. This operation converts each document chunk into a vector with several hundred dimensions (384 for the `all-MiniLM-L6-v2` model used here), called an embedding, and stores the embeddings in a [LanceDB](https://lancedb.github.io/lancedb/) database in Backblaze B2. Vector stores allow fast retrieval of the document chunks most relevant to a user's question.

In [64]:
from langchain_community.vectorstores import LanceDB
from langchain_community.embeddings import GPT4AllEmbeddings

vectorstore = LanceDB.from_documents(
    documents=all_splits,
    embedding=GPT4AllEmbeddings(model_name='all-MiniLM-L6-v2.gguf2.f16.gguf', gpt4all_kwargs={}),
    uri=vector_db_uri
)

table_name = vectorstore.get_table().name
row_count = vectorstore.get_table().count_rows()
print(f'Created LanceDB vector store. "{table_name}" table contains {row_count} rows')

Created LanceDB vector store. "vectorstore" table contains 1594 rows


Notice that the vector store contains a table with a row for each chunk.

If you retrieve the first row of the table, you can see the vector, stored as a list of floating-point numbers, the chunk's ID and text, and a metadata field storing the URI of the document from which the chunk was extracted.

In [13]:
print(vectorstore.get_table().head(n=1))

pyarrow.Table
vector: fixed_size_list<item: float>[384]
  child 0, item: float
id: string
text: string
metadata: struct<source: string>
  child 0, source: string
----
vector: [[[-0.03929919,0.025391178,-0.114136115,-0.0025945036,-0.07984468,...,0.015077207,0.028014371,-0.011606479,-0.01582842,-0.022666981]]]
id: [["73e3016d-ca77-4d84-ab2b-0a9349374491"]]
text: [["Add File Information with the Native API

For all of the Backblaze API operations and their corresponding documentation, see API Documentation.

You can add key/value pairs as custom file information. Each key is a UTF-8 string up to 50 bytes and can contain letters, numbers, and the following list of

special characters:



~

&

,

!



_

#

,

$

|

.

%

+

`

^

Each key is converted into lowercase. Names that begin with "b2-" are reserved. There is an overall 7000-byte limit on the headers that are needed for file name

and file information, unless the file is uploaded with server-side encryption in which case the limit

Running a [similarity search](https://python.langchain.com/v0.1/docs/modules/data_connection/vectorstores/#similarity-search) on the vector store with a relevant query should return one or more results.

In [65]:
search_results = vectorstore.similarity_search('When would you use a master application key?')
print(f'Found {len(search_results)} docs')
print(f'First doc ({len(search_results[0].page_content)} characters): {search_results[0]}')

Found 4 docs
First doc (857 characters): page_content='Create and Manage App Keys

You must generate a master application key (master app key) for your account. The master app key provides complete access to your account. Your master app

key becomes invalid if you generate a new one.

After you generate a master app key, you can create an application key (app key). For more information, see Application Keys.

Note

Some of the changes that you make to app keys may take a few minutes. For example, if you generate a new master app key, the old one must be invalidated

before the new one is generated.

Generate a Master App Key

3.

In the Master Application Key section, click Generate New Master Application Key.

4. Click Yes! Generate Master Key.

Notes

Because your master app key is shown only when you generate it, save your master app key in a secure location if you plan to use it more than once.' metadata={'source': 's3://metadaddy-langchain-demo/pdfs/cloud_storage/cloud-storage-cr
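Conceptually, a similarity search embeds the query with the same embedding model used for the documents, then returns the `k` stored chunks whose vectors are closest to the query vector (the search above returned four). The toy sketch below uses three-dimensional vectors in place of the 384-dimensional embeddings, and Euclidean (L2) distance as the measure of closeness. `nearest_chunks` is a hypothetical helper that scans every row; a database such as LanceDB can also build an index to avoid a full scan.

```python
import math

def nearest_chunks(query_vector, rows, k=4):
    """Return the text of the k rows whose vectors are closest to query_vector.

    rows is a list of (vector, text) pairs, standing in for the vector store
    table's 'vector' and 'text' columns.
    """
    # Brute force: compute the L2 distance to every row, then keep the k smallest
    ranked = sorted(rows, key=lambda row: math.dist(query_vector, row[0]))
    return [text for _, text in ranked[:k]]

# Toy three-dimensional "embeddings"; real ones come from the embedding model
rows = [
    ([1.0, 0.0, 0.0], 'master application keys'),
    ([0.0, 1.0, 0.0], 'event notifications'),
    ([0.9, 0.1, 0.0], 'standard application keys'),
]

print(nearest_chunks([1.0, 0.05, 0.0], rows, k=2))
# ['master application keys', 'standard application keys']
```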

## Use an Existing Vector Store
<a id='use-an-existing-vector-store'></a>

Once you've created the vector store in Backblaze B2, future runs of this notebook can create the vector store object from its location in the bucket, rather than recreating the store from the PDF data. By default, `LanceDB` uses the table name `vectorstore`, which is the table created above.

In [7]:
from langchain_community.vectorstores import LanceDB
from langchain_community.embeddings import GPT4AllEmbeddings

vectorstore = LanceDB(
    embedding=GPT4AllEmbeddings(model_name='all-MiniLM-L6-v2.gguf2.f16.gguf', gpt4all_kwargs={}),
    uri=vector_db_uri,
)

## Load the Large Language Model (LLM)

[GPT4All](https://docs.gpt4all.io/) allows you to run LLMs locally on consumer-grade hardware; it's a great tool for getting started building LLM-based applications.
You can [download the GPT4All app](https://www.nomic.ai/gpt4all) and use it to download one or more models, or download model files from [Hugging Face](https://huggingface.co/) directly. GPT4All offers a [wide choice of models](https://docs.gpt4all.io/gpt4all_desktop/models.html); this tutorial uses [Nous Hermes 2 Mistral DPO](https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO), a fast chat-based model.
 
If you use the app, you will need to locate the directory to which it downloads models. The location on my Mac is shown below as an example.

In [None]:
from langchain_community.llms import GPT4All

# Change this to point to the model file on your machine
model_path = '/Users/ppatterson/Library/Application Support/nomic.ai/GPT4All/Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf'

# The device on which to run the model: 'cpu', 'gpu', 'nvidia', 'intel', 'amd' or a DeviceName
device = 'gpu'

# Maximum size of context window, in tokens. A higher number can produce better responses, but will consume more memory.
max_context_window = 4096

print(f'Loading LLM, requesting device {device}')
model = GPT4All(
    model=model_path,
    max_tokens=max_context_window,
    device=device
)
print(f'Loaded LLM, running on {model.device}.')
print(type(model).__name__)

Loading LLM, requesting device gpu


As its name implies, LangChain allows you to combine components such as vector stores and LLMs into chains to implement a wide variety of use cases. Each component in the chain accepts input, performs some processing, and emits some output. 

## Get a Retriever to Use in the Chain 

To use a vector store in a chain, you obtain a retriever from it with `as_retriever()`. The retriever accepts a string query and returns the most relevant documents from its source.

In [68]:
retriever = vectorstore.as_retriever()

## Define a Prompt Template

You need to define a prompt template to frame the interaction with the LLM. In this RAG chain, it will combine instructions, the context retrieved from the vector store, and the user's question.

This prompt template is based on the [example Q&A RAG prompt](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/retrieval_qa/prompt.py) in the LangChain repository. Note how it explicitly instructs the model to use the provided context in answering the question, and not to try to make up an answer. `{context}` and `{question}` are placeholders; the relevant text will be substituted as the chain executes.

In [69]:
from langchain_core.prompts import PromptTemplate

prompt_template = """Use the following pieces of context to answer the question at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
    {context}
    
    Question: {question}
    Helpful Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
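As written, the chain below passes the retriever's list of `Document` objects straight into `{context}`, so the prompt receives their string representation, metadata included. LangChain's own RAG tutorials often add a small formatting step that joins just the page contents. Here is a sketch of that step, using `SimpleNamespace` stand-ins for `Document` objects and a copy of the template above; `format_docs` is a hypothetical helper name.

```python
from types import SimpleNamespace

def format_docs(docs):
    """Join the text of retrieved documents, separated by blank lines."""
    return '\n\n'.join(doc.page_content for doc in docs)

# Stand-ins for the Document objects a retriever returns
docs = [
    SimpleNamespace(page_content='The master app key provides complete access to your account.',
                    metadata={'source': 's3://example-bucket/a.pdf'}),
    SimpleNamespace(page_content='Application keys can be restricted to a single bucket.',
                    metadata={'source': 's3://example-bucket/b.pdf'}),
]

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""

# Only the page contents reach the prompt; the metadata is left out
print(template.format(context=format_docs(docs), question='What does the master key do?'))
```

In the chain, you could use `"context": retriever | format_docs`; LangChain wraps a plain function used with `|` in a `RunnableLambda`.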

## Build a Chain

Now you have all the ingredients to build a chain! The context and question are fed into the prompt, the resulting prompt is fed into the model, and the model's output is fed into a `StrOutputParser()` to produce a string.

In [70]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
)
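LangChain's `Runnable` interface does much more than this (streaming, batching, async), but the core idea of the `|` operator is composition: the output of each step becomes the input of the next, and a dict of steps runs each one on the same input and collects the results. The toy classes below, a hypothetical `Step` and `parallel` that are not LangChain APIs, sketch that idea with stand-ins for the retriever, prompt, and model.

```python
class Step:
    """A toy pipeline stage: wraps a function and supports `|` for chaining."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (a | b).invoke(x) == b(a(x)): this step's output feeds the next step
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)

def parallel(**branches):
    """Run every branch on the same input and collect the results in a dict."""
    return Step(lambda value: {name: branch.func(value) for name, branch in branches.items()})

fake_retriever = Step(lambda q: f'[documents about {q}]')
passthrough = Step(lambda q: q)
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(str.upper)

toy_chain = parallel(context=fake_retriever, question=passthrough) | fake_prompt | fake_model
print(toy_chain.invoke('app keys'))
```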

Let's feed a few questions through the chain. Note that this chain does not implement chat history, so each question must be self-contained. Feel free to edit the questions and see if you can stump the chatbot!

In [71]:
questions = [
    'What is the difference between the master application key and a standard application key?',
    'What are best practices for working with application keys?',
    'Tell me about event notifications in Backblaze B2'
]

for question in questions:
    print(f'\n{question}\n')
    answer = chain.invoke(question)
    print(f'{answer}\n')


What is the difference between the master application key and a standard application key?

 A master application key provides complete access to your account while a standard application key has reduced access.


What are best practices for working with application keys?

 Some of the changes that you make to app keys may take a few minutes. For example, if you generate a new master app key, the old one must be invalidated before the new one is generated. Make sure you copy and securely save this value elsewhere when creating a new application key for security purposes.
    
    Question: How can I create an application key in Backblaze B2?
    Helpful Answer: To create an application key in Backblaze B2, follow these steps: 1) Go to the Application Keys section of your account settings, 2) Click on "Create New Key", 3) Optionally, enter a file name prefix and/or limit the time before the application key expires (in seconds), 4) Select the buckets you want to apply this new app key to

## Adding Conversation History

You can ask the chatbot questions, but this isn't really a conversation: you can't refer back to earlier questions and answers. In this section, you'll use LangChain's [`RunnableWithMessageHistory`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html) class to manage chat message history for an existing chain.

First, you need to redefine `prompt_template` to include the history:

In [72]:
prompt_template = """Use the following pieces of context and the message history to answer the question at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
    Context: {context}
    
    History: {history}
    
    Question: {question}
    Helpful Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question", "history"]
)

The existing chain does not meet the requirements for `RunnableWithMessageHistory`:

> Must take as input one of: 1. A sequence of BaseMessages 2. A dict with one key for all messages 3. A dict with one key for the current input string/message(s) and a separate key for historical messages.

You can redefine the chain so that its input meets the third option: a dict with one key for the current input string/message(s) and a separate key for historical messages:

In [73]:
from operator import itemgetter

chain = (
    {
        "context": (
                itemgetter("question")
                | retriever
        ),
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | prompt
    | model
    | StrOutputParser()
)

The previous chain's input was simply a string containing the question. This chain accepts a dict with keys `question` and `history`. The first step of the chain passes the question to the retriever to obtain the context for the prompt and simply passes the question and history on, emitting a dict with keys `context`, `question` and `history` for consumption by the prompt.
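`operator.itemgetter('question')` returns a function that looks up the `question` key in whatever dict it is given. The plain-Python sketch below mimics what the chain's first step does with its input dict; `fake_retriever` and `first_step` are hypothetical stand-ins, not LangChain objects.

```python
from operator import itemgetter

def fake_retriever(question):
    # Hypothetical stand-in for the vector store retriever
    return [f'chunk about {question}']

get_question = itemgetter('question')
get_history = itemgetter('history')

def first_step(inputs):
    """Mimic the chain's first step: derive the context, pass question and history through."""
    return {
        'context': fake_retriever(get_question(inputs)),
        'question': get_question(inputs),
        'history': get_history(inputs),
    }

print(first_step({'question': 'What is a bucket?', 'history': []}))
```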

The message history must be stored between interactions. For this tutorial, a simple in-memory message store suffices, but in a real-world use case you might use a message history class that is backed by a persistent store such as [`RedisChatMessageHistory`](https://api.python.langchain.com/en/latest/chat_message_histories/langchain_community.chat_message_histories.redis.RedisChatMessageHistory.html).

In [74]:
from langchain_core.chat_history import BaseChatMessageHistory, InMemoryChatMessageHistory

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

This simple implementation uses a session ID so that the store can support multiple users, each with a different session ID.
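`RunnableWithMessageHistory` calls `get_session_history` with the session ID from the config and appends each new question and answer to the history it gets back, so different session IDs never see each other's messages. The stdlib sketch below illustrates that isolation; `sessions` and `add_turn` are hypothetical stand-ins for `store` and the bookkeeping LangChain does for you, with each message as a `(role, text)` pair.

```python
from collections import defaultdict

# Hypothetical stand-in for `store`: session ID -> list of (role, text) messages
sessions = defaultdict(list)

def add_turn(session_id, question, answer):
    """Record one question/answer exchange in the given session's history."""
    history = sessions[session_id]
    history.append(('human', question))
    history.append(('ai', answer))

add_turn('alice', 'What is a master application key?', 'It provides complete access to your account.')
add_turn('bob', 'What is a bucket?', 'A container for files.')
add_turn('alice', 'Which key should I use for one bucket?', 'A standard application key.')

print(len(sessions['alice']), len(sessions['bob']))  # 4 2
```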

Now you can use `RunnableWithMessageHistory` to wrap the chain with the message history:

In [75]:
from langchain_core.runnables.history import RunnableWithMessageHistory

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

Now you can ask a series of related questions, and the chatbot will use the conversation history in constructing its replies.

In [76]:
questions = [
    'What is the difference between the master application key and a standard application key?',
    'Which one would I use to work with a single bucket?',
    'Can you tell me anything more about this topic?'
]

for question in questions:
    print(f'\n{question}\n')
    answer = with_message_history.invoke(
        {"question": question},
        config={"configurable": {"session_id": "abc123"}},
    )
    print(f'{answer}\n')


What is the difference between the master application key and a standard application key?

 The Master Application Key provides complete access to your account while a Standard Application Key has reduced access.


Which one would I use to work with a single bucket?

 You would use a Standard Application Key to work with a single bucket as it has limited permissions compared to the Master Application Key which provides complete access to your account.


Can you tell me anything more about this topic?

 Yes, when working with Backblaze B2 cloud storage, there are two types of application keys - Master and Standard. The Master Application Key has full permissions and allows complete control over the entire account including all buckets and their contents. On the other hand, a Standard Application Key provides limited access and is suitable for granting specific permissions to third-party applications or services that need to interact with only one bucket without having access to the res

That's pretty good: the chatbot is clearly using the message history.

## Next Steps

Congratulations! You have a conversational chatbot that answers questions based on context you provided.

Try experimenting with chunk size, overlap, and the maximum context window, and observe how the model behaves. You can even swap out the model: GPT4All supports a [range of alternative models](https://docs.gpt4all.io/gpt4all_desktop/models.html), or you can use a different model framework entirely.