# Goal of this notebook

We want to build a custom knowledge base from Amazon SageMaker Q&As that we can chat with. Because this knowledge is very specific to AWS SageMaker, a general-purpose LLM can answer questions about Amazon or Amazon Web Services (AWS) in general, but it may struggle with specific details about SageMaker.

The general idea is this: if we have sensitive or company-internal data, we cannot use the OpenAI API, because we would have to send our documents to it. This has two disadvantages:

- we share our sensitive data with OpenAI
- we need an API key, which means we pay per request, depending on the number of tokens we send to the API

To keep our data private, we show how to use a local model that runs on our own computer, together with a local vector database for our domain-specific knowledge. In this setup no data leaves our local machine. As a next step, this could be deployed in the cloud, for example on AWS SageMaker or Google Cloud Vertex AI. The cost for this should be fairly low.

# Setting up the environment

Requirements:

- a notebook or computer with sufficient memory (16 GB)
- a Python environment

First we need to set up an environment. For Python, we recommend using pyenv-virtualenv to pin a specific Python version (here Python 3.10.4) and Poetry to manage the required Python packages.

To set up the environment, run `pyenv virtualenv 3.10.4 llm_chatbot`.

If Python 3.10.4 is not installed yet, install it first with `pyenv install 3.10.4`. After the setup, run `pyenv activate llm_chatbot`. Once the environment is active, run `poetry install` to install all packages required for this project.
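
Once the environment is active, a quick sanity check can confirm that the interpreter and packages are in place. The following is only a sketch: the package names in `REQUIRED_PACKAGES` are assumptions based on the imports used later in this notebook, and the project's own Poetry configuration remains the source of truth.

```python
import importlib.util
import sys

# Assumed package names, taken from the imports used later in this notebook.
REQUIRED_PACKAGES = ("langchain", "chromadb", "sentence_transformers", "transformers")


def python_is_recent_enough(minimum=(3, 10)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_is_recent_enough())
    print("Missing packages:", missing_packages())
```

An empty list of missing packages suggests that `poetry install` ran in the right environment.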

# Document Question Answering with a local flan-t5-large

An example of using Chroma DB and LangChain to do question answering over a SageMaker Q&A document with a locally deployed LLM.

In [64]:
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import VectorDBQA
from langchain.document_loaders import TextLoader

## Load documents

Load the documents to do question answering over. If you want to do this over your own documents, this is the section you should replace.

In [65]:
loader = TextLoader('Sagemaker_qna.txt')
qna = loader.load()

## Split documents

Next we split these documents into small chunks, so that we can find the most relevant chunks for a query and pass only those to the LLM.

In [67]:
text_splitter_qna = RecursiveCharacterTextSplitter()
texts_qna = text_splitter_qna.split_documents(qna)
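
To make the idea of chunking concrete, here is a minimal sketch of a fixed-size splitter with overlap, written in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraph breaks first); the function name `split_text` and its parameters are hypothetical and only illustrate the principle.

```python
def split_text(text, chunk_size=200, chunk_overlap=20):
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters, so a sentence cut
    at a chunk boundary still appears in full in one of the two chunks
    (as long as it is shorter than the overlap).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Smaller chunks make retrieval more precise but give the model less context per chunk, so the chunk size is a trade-off worth experimenting with.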

## Initialize ChromaDB

Create embeddings for each chunk and insert them into the Chroma vector database. For this step, we first need an embedding function that matches the model we want to use later.

We want to use the [flan-t5-large](https://huggingface.co/google/flan-t5-large) model with its embeddings for our vector DB.

In [69]:
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

Let's get the model from the Hugging Face Hub. Warning: this model is almost 7GB in size, so only download it over a stable internet connection. Depending on your connection, this may take several minutes.

In [84]:
model_name = "google/flan-t5-large"

In [85]:
# create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name=model_name)

No sentence-transformers model found with name /Users/jonasbechthold/.cache/torch/sentence_transformers/google_flan-t5-large. Creating a new one with MEAN pooling.
Some weights of the model checkpoint at /Users/jonasbechthold/.cache/torch/sentence_transformers/google_flan-t5-large were not used when initializing T5EncoderModel: ['lm_head.weight']
- This IS expected if you are initializing T5EncoderModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5EncoderModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
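
The log output above mentions MEAN pooling. As a rough illustration of what that means: the encoder produces one vector per token, and mean pooling averages those vectors (skipping padding tokens) to get a single vector for the whole text. The sketch below shows this with plain Python lists; the function name `mean_pool` is hypothetical and the numbers in the usage note are made up.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_vectors: list of equal-length lists of floats, one per token.
    attention_mask: list of 0/1 flags; 0 marks padding tokens to ignore.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]
```

For example, pooling the vectors `[1.0, 2.0]` and `[3.0, 4.0]` while masking out a third padding vector gives `[2.0, 3.0]`.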


Create the vector database, using the flan-t5-large embeddings from Hugging Face, and load our knowledge into it.

In [82]:
vectordb = Chroma.from_documents(texts_qna, embedding_function)

## Local model flan-t5-large

In [86]:
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM
from langchain import PromptTemplate, LLMChain

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

pipe = pipeline(
    "text2text-generation",
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)

## Create the chain

Initialize the chain we will use for question answering. In this case we use LangChain simply as a convenience tool to tell the model which prompt to use with which input question.

### Create a template and ask flan-t5 questions

To tell the model how it should answer our question, we define a prompt template with LangChain's PromptTemplate.

In [39]:
template = """Question: {question}

Answer: Let's think step by step.

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])
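
To see what the model actually receives, it can help to fill in the template by hand. The sketch below uses Python's built-in `str.format`, assuming the default f-string style placeholders used in the cell above; `render_prompt` is a hypothetical helper, not part of LangChain.

```python
TEMPLATE = """Question: {question}

Answer: Let's think step by step.

Answer: """


def render_prompt(template, **variables):
    """Substitute the named placeholders in `template` with `variables`."""
    return template.format(**variables)


if __name__ == "__main__":
    # Print the exact text that would be sent to the model.
    print(render_prompt(TEMPLATE, question="What is the capital of Germany?"))
```

The rendered string is the full input to the model, so any wording change in the template directly changes what the model is asked to do.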

We can then use an LLMChain to connect the prompt with the LLM:

In [40]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

Let's ask the LLM a very general question. It should know the answer, since this is general knowledge that is available on Wikipedia:

In [42]:
question1 = "What is the capital of Germany?"

print(llm_chain.run(question1))

The capital of Germany is Berlin. Berlin is located in Germany. So the final answer is Berlin.


# Create the chain with our vector DB

In [46]:
qa = VectorDBQA.from_chain_type(llm=local_llm, chain_type="stuff", vectorstore=vectordb)
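
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below illustrates that flow with plain Python callables. The prompt wording and the helper names (`build_stuff_prompt`, `answer_with_retrieval`) are assumptions for illustration and not LangChain's actual implementation.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate all retrieved chunks and the question into one prompt."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


def answer_with_retrieval(question, retrieve, llm):
    """Retrieve chunks for `question`, build the prompt, and call the model.

    retrieve: callable mapping a question to a list of text chunks.
    llm: callable mapping a prompt string to an answer string.
    """
    chunks = retrieve(question)
    return llm(build_stuff_prompt(question, chunks))
```

Because every retrieved chunk ends up in one prompt, this approach only works while the combined text fits into the model's input length, which is one reason to keep chunks small.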

## Ask questions!

For comparison, we first ask the plain LLM a question without the document database:

In [47]:
question = "What is amazon sagemaker?"

print(llm_chain.run(question))

The Amazon Sagemaker is a type of adverb. The Amazon Sagemaker is a type of adverb. The answer: adverb.


The answer we get is quite strange. The LLM does not seem to have the appropriate knowledge. Now we ask the vector DB with the LLM on top of it:

In [48]:
query = "What is amazon sagemaker"
qa.run(query)

'Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly'

Now the answer is much more appropriate.

Let's ask a few more questions:

In [61]:
query = "What notebook instance types does amazon sagemaker provide? Whats the price for this?"
qa.run(query)

'Jupyter'

This seems to be correct, although the pricing part of the question is left unanswered.

In [63]:
query = "What do you know about the pricing of aws sagemaker?"
qa.run(query)

'AWS pricing'

In [74]:
query = "What do you know about the pricing of aws sagemaker? Give me some example prices!"
qa.run(query)

'Amazon SageMaker pricing is based on the number of instances you need.'

In [75]:
query = "Does Sagemaker support R?"
qa.run(query)

'Yes'

In [76]:
query = "Does sagemaker support R and in which applications?"
qa.run(query)

'Yes, R supported with Amazon SageMaker. You can use R within SageMaker Notebook instances, which include a pre-installed R kernel and the reticulate library.'

In [77]:
query = "How are SageMaker Studio Notebooks different from the notebooks instances?"
qa.run(query)

'With the new notebook experience, you can now quickly launch notebooks without needing to manually provision an instance and waiting for it to be operational'