# Motivation

We want to build a way to chat with a custom knowledge base about Amazon SageMaker. To do this, we take a SageMaker Q&A dataset and make it available to a large language model (LLM).

Because this knowledge is very domain-specific, a general-purpose LLM can answer questions about Amazon or Amazon Web Services (AWS) in general, but it is likely to struggle with specific details about Amazon SageMaker.

The general idea is: if we have sensitive or company-internal data, we cannot use the OpenAI API, because we would have to send our documents to the API. This has two disadvantages:

- we share our sensitive data with OpenAI
- we need an API key, which means we pay per request, depending on the number of tokens we send to the API

To keep our data private, we show how to run a small LLM locally on our own computer and use a local vector database (ChromaDB) for our domain-specific knowledge. In this setup, no data leaves the local machine. You can verify this by switching off the internet connection after the model has been downloaded.

Let's start by setting up the environment!

# Setting up the environment

Requirements:

- a laptop or desktop computer with sufficient memory (16 GB)
- a Python environment with version 3.10.4

First we need to set up an environment. We recommend using pyenv-virtualenv to pin a specific Python version (here 3.10.4) and Poetry to manage the required Python packages.

To create the environment, run `pyenv virtualenv 3.10.4 llm_chromadb` or use a similar method to get the correct Python version.

If Python 3.10.4 is not installed yet, install it first with `pyenv install 3.10.4`.

Then activate the environment with `pyenv activate llm_chromadb`. Once it is active, run `poetry install` inside this repository folder to install all required packages defined in the [pyproject.toml](pyproject.toml) for this project.

Now we are ready to start!

# Prepare the documents, set up the LLM and create the vector database

First, let's import the required classes from the langchain package:

- `TextLoader` to load the text from our file
- `RecursiveCharacterTextSplitter` to split the text inside the document into chunks
- `Chroma` for the vector database
- `VectorDBQA` to retrieve information from the vector database

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import VectorDBQA

## Load documents

Load the SageMaker Q&A dataset that we want to do question answering over.

In [65]:
loader = TextLoader('Sagemaker_qna.txt')
qna = loader.load()

## Split documents

Now we split the loaded document into smaller chunks. This lets us find the chunks most relevant to a query and pass only those to the LLM.

In [67]:
text_splitter_qna = RecursiveCharacterTextSplitter()
texts_qna = text_splitter_qna.split_documents(qna)
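
To make the idea of chunking more concrete, here is a minimal sketch in plain Python. It is a simplified illustration and not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to cut at separators such as paragraph and line breaks. The helper `split_with_overlap` is a hypothetical name; its parameters are named after the `chunk_size` and `chunk_overlap` arguments that the splitter accepts.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so content that
    sits at a chunk boundary also appears at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
chunks = split_with_overlap(sample, chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Smaller chunks make retrieval more precise but give the LLM less context per chunk, so the chunk size is a trade-off worth experimenting with.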

## Initialize ChromaDB

Create embeddings for each chunk and insert them into the Chroma vector database. For this step, we first need an embedding function that matches the model we want to use later.

We want to use the [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) model from Hugging Face, since it is fairly small and should be sufficient for this tutorial. For our document vector database we want to use embeddings from the same model. For this we use the SentenceTransformerEmbeddings class.

Let's get the model from the Hugging Face Hub. Warning: this model is about 3 GB in size, so please only download it over a stable internet connection. Depending on your connection, this might take a few minutes.


In [85]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings
model_name = "google/flan-t5-large"
embedding_function = SentenceTransformerEmbeddings(model_name=model_name)

No sentence-transformers model found with name /Users/jonasbechthold/.cache/torch/sentence_transformers/google_flan-t5-large. Creating a new one with MEAN pooling.
Some weights of the model checkpoint at /Users/jonasbechthold/.cache/torch/sentence_transformers/google_flan-t5-large were not used when initializing T5EncoderModel: ['lm_head.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
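
The log output above mentions MEAN pooling: because no ready-made sentence-transformers model exists under this name, the token vectors produced by the encoder are averaged into one vector per text. The following plain-Python sketch shows that averaging on toy numbers. The function `mean_pool` and the input vectors are hypothetical, and the sketch ignores details such as padding masks that a real implementation has to handle.

```python
def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average token vectors component-wise into a single sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    # zip(*...) groups the i-th component of every token vector together
    return [sum(component) / n for component in zip(*token_vectors)]


# Three made-up token vectors with two dimensions each
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sentence_vector = mean_pool(tokens)
print(sentence_vector)  # [3.0, 4.0]
```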


Now that we have the embedding function, we can create the vector database by passing our SageMaker Q&A chunks (`texts_qna`) and the embedding function to the `Chroma.from_documents()` method.

In [82]:
vectordb = Chroma.from_documents(texts_qna, embedding_function)
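
Conceptually, a vector database stores one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the embedding of the query. The sketch below illustrates this with cosine similarity in plain Python. It is an assumption-laden toy: the three-dimensional vectors are made up, the names `cosine_similarity`, `top_k` and `toy_store` are hypothetical, and the distance metric Chroma actually uses depends on its configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional embeddings; real embeddings have hundreds of dimensions
toy_store = [
    ("SageMaker pricing", [0.9, 0.1, 0.0]),
    ("R kernel support", [0.0, 0.2, 0.9]),
    ("Notebook instances", [0.6, 0.7, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

With these toy vectors the query lands closest to "SageMaker pricing", followed by "Notebook instances".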

## Create the LLM locally

Now let's load the google/flan-t5-large model itself and run it locally on our computer. For this we get the model and the tokenizer from Hugging Face and create a pipeline for the model.

In [86]:
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

pipe = pipeline(
    "text2text-generation",
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)

## Create the LLM chain consisting of prompt and LLM

Now we initialize the chain we will use for question answering. Here LangChain is just a convenience tool that tells the model which prompt to use with which input question. We define a simple prompt:

In [39]:
from langchain import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step.

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])
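
To see what the model will actually receive, the placeholder can be filled by hand. This sketch assumes the default template format of `PromptTemplate`, which for a simple `{question}` placeholder behaves like Python's built-in `str.format`; the variable name `filled_prompt` is only used for illustration.

```python
template = """Question: {question}

Answer: Let's think step by step.

Answer: """

# Substitute the placeholder the same way a simple format-style template would
filled_prompt = template.format(question="What is the capital of Germany?")
print(filled_prompt)
```

The printed text is the full input string for the model, with the question inserted into the first line.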

Now we connect the prompt with the LLM:

In [40]:
from langchain import LLMChain

llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

Let's ask the LLM a very general question. It should know the answer, since this is general knowledge that is available on Wikipedia:

In [42]:
question1 = "What is the capital of Germany?"

print(llm_chain.run(question1))

The capital of Germany is Berlin. Berlin is located in Germany. So the final answer is Berlin.


## Build the information retrieval from our vector db

Now we build the retrieval chain that extracts relevant knowledge from our vector database, so that we can chat with the LLM and with our Q&A vector database at the same time. For this we pass in the LLM (flan-t5-large), a chain type (stuff) and the vector database (vectordb).

In [46]:
qa = VectorDBQA.from_chain_type(llm=local_llm, chain_type="stuff", vectorstore=vectordb)
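
The "stuff" chain type means that all retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below shows the idea in plain Python. The function `build_stuff_prompt`, the prompt wording and the example chunks are hypothetical stand-ins; the prompt LangChain uses internally is worded differently.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder texts standing in for chunks returned by the vector database
retrieved_chunks = [
    "Amazon SageMaker is a fully managed service.",
    "You can use R within SageMaker Notebook instances.",
]
stuffed_prompt = build_stuff_prompt("Does SageMaker support R?", retrieved_chunks)
print(stuffed_prompt)
```

Because everything ends up in one prompt, this approach only works as long as the retrieved chunks plus the question fit into the model's input length.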

# Ask questions to the LLM and the vector database

To have a baseline without the document database, we first ask the plain LLM a question:

In [47]:
question = "What is amazon sagemaker?"

print(llm_chain.run(question))

The Amazon Sagemaker is a type of adverb. The Amazon Sagemaker is a type of adverb. The answer: adverb.


The answer we get is quite strange. The LLM does not seem to have the appropriate knowledge.

Now we ask the vector database with the LLM on top of it:

In [None]:
query = "What is amazon sagemaker"
qa.run(query)

'Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly'

Now the answer is much more appropriate.

Let's ask our documents a few more questions:

In [61]:
query = "What notebook instance types does amazon sagemaker provide? Whats the price for this?"
qa.run(query)

'Jupyter'

This seems to be at least partly correct, although the answer is very short and ignores the pricing part of the question. We continue with some more specific questions and get quite promising results:

In [63]:
query = "What do you know about the pricing of aws sagemaker?"
qa.run(query)

'AWS pricing'

In [74]:
query = "What do you know about the pricing of aws sagemaker? Give me some example prices!"
qa.run(query)

'Amazon SageMaker pricing is based on the number of instances you need.'

In [75]:
query = "Does Sagemaker support R?"
qa.run(query)

'Yes'

In [76]:
query = "Does sagemaker support R and in which applications?"
qa.run(query)

'Yes, R supported with Amazon SageMaker. You can use R within SageMaker Notebook instances, which include a pre-installed R kernel and the reticulate library.'

In [77]:
query = "How are SageMaker Studio Notebooks different from the notebooks instances?"
qa.run(query)

'With the new notebook experience, you can now quickly launch notebooks without needing to manually provision an instance and waiting for it to be operational'

That's it: we successfully asked our vector database questions and got quite promising answers. Considering the small size of the model, a larger model trained for this specific purpose would be a promising direction for future exploration.