## This notebook contains:
* Running a Llama model in the cloud, hosted on Replicate
* Using LangChain to ask Llama general questions and follow-up questions
* Using LangChain to load PDF documents (Kenya Law documents) and chat about them
* The end result is a chatbot that can answer questions about data that was not publicly available when the model was trained, or about your own data. Retrieval-augmented generation (RAG) is one way to reduce LLM hallucination.

Let's start by installing the necessary packages:
- sentence-transformers for text embeddings
- chromadb and faiss-cpu for vector storage
- langchain for the RAG tooling used in this demo
- pypdf for reading the PDF files

We also set up the Replicate API token.

In [8]:
%pip install langchain langchain-community langchain-core replicate sentence-transformers chromadb pypdf faiss-cpu langchain-elasticsearch python-dotenv

Note: you may need to restart the kernel to use updated packages.
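
One way to keep the token out of the notebook itself is to read it from the environment and fail early when it is missing. The following is a minimal sketch under that assumption; `require_token` and `mask` are hypothetical helper names, not part of Replicate or LangChain, and the helper accepts any mapping so it can be tried without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_token(name: str = "REPLICATE_API_TOKEN",
                  env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored under `name`, or raise if it is missing or blank."""
    source = os.environ if env is None else env
    token = source.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return token


def mask(token: str, visible: int = 4) -> str:
    """Hide all but the first few characters so the value is safe to log."""
    return token[:visible] + "*" * max(len(token) - visible, 0)
```

If a confirmation message is needed, printing `mask(token)` rather than the token itself avoids leaking the secret into saved notebook output.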


Next we call the Llama model from Replicate. The code below loads `meta/meta-llama-3-70b-instruct` (Llama 3 70B Instruct). You can find more Llama models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).

The model is referenced in the format `owner/model_name`, optionally followed by `:version`.


In [9]:
import os

import replicate
from dotenv import load_dotenv

# Load variables from a local .env file (keep this file out of version control)
load_dotenv()

# Read the token from the environment; never hard-code it or print it in a notebook
print("Getting Replicate API token...")
REPLICATE_API_TOKEN = os.getenv("REPLICATE_API_TOKEN")
if not REPLICATE_API_TOKEN:
    raise RuntimeError("REPLICATE_API_TOKEN is not set; add it to your .env file")
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
print("Replicate API token set.")

print("Initializing Llama3 model...")
llm = replicate.models.get("meta/meta-llama-3-70b-instruct")
print("Llama3 model initialized.")

Getting Replicate API token...
Replicate API token set.
Initializing Llama3 model...
Llama3 model initialized.


In [3]:
# Prepare the input
chat_history = []  # Initialize an empty list to store chat history
input = {
    "prompt": "Very briefly in 1 sentence, explain about the finance act 2024 of Kenya",
    "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "temperature": 0.75,
    "top_p": 0.9,
    "max_tokens": 800,
    "chat_history": chat_history  
}

# To run the model and get the output
output = replicate.run(llm, input=input)
print("".join(output))

# Update the chat history
chat_history.append({"user": input["prompt"], "assistant": "".join(output)})

# To stream the output
# for event in replicate.stream(llm, input=input):
#     print(event, end="")

The Finance Act 2024 of Kenya is a legislation that outlines the country's tax policies and reforms for the fiscal year 2024, aiming to promote economic growth, reduce inequality, and increase government revenue.


With the model set up, it is now possible to ask some questions.

In [30]:
question = "who wrote the book Innovator's dilemma?"
# A Replicate Model object is not callable, so send the prompt through replicate.run
answer = "".join(replicate.run(llm, input={"prompt": question}))
print(answer)

We then follow up with a question asking for more information about the book.

Because the chat history is not passed along, Llama has no context and does not know that the follow-up is about the book, so it treats it as a new query.


In [5]:
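# chat history not passed so Llama doesn't have the context and doesn't know this is more about the book
followup = "tell me more"
# A Replicate Model object is not callable, so send the prompt through replicate.run
followup_answer = "".join(replicate.run(llm, input={"prompt": followup}))
print(followup_answer)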

 Hello! As a helpful assistant, I'm here to assist you with any questions or tasks you may have. Whether it's providing information on a wide range of topics, helping you with a project or task, or simply being a listening ear, I'm here to help in any way I can.

I have access to a vast amount of knowledge and resources, so if there's something you're curious about or need help with, feel free to ask! Some examples of things I can help with include:

* Answering questions on a variety of topics such as history, science, technology, health, and more
* Providing guidance on how to complete a task or project
* Offering suggestions and ideas for creative projects or brainstorming sessions
* Assisting with language translation and communication
* And much more!

Is there anything specific you would like to know or discuss? Please don't hesitate to ask, and I'll do my best to assist you.
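
To make the missing context concrete, the sketch below shows one way earlier turns could be folded into a single prompt by hand, reusing the Llama 3 header tokens from the `prompt_template` shown earlier. `build_llama3_prompt` is a hypothetical helper written for this illustration, and the one-line assistant reply in `history` is a stand-in rather than real model output.

```python
from typing import Dict, List


def build_llama3_prompt(system: str, history: List[Dict[str, str]], question: str) -> str:
    """Assemble a multi-turn prompt from a system message, past turns, and a new question."""

    def turn(role: str, text: str) -> str:
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"

    parts = ["<|begin_of_text|>", turn("system", system)]
    for message in history:
        parts.append(turn("user", message["user"]))
        parts.append(turn("assistant", message["assistant"]))
    parts.append(turn("user", question))
    # Leave the assistant header open so the model writes the next reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


# Stand-in history entry, for illustration only
history = [{"user": "who wrote the book Innovator's dilemma?",
            "assistant": "Clayton Christensen."}]
prompt = build_llama3_prompt("You are a helpful assistant", history, "tell me more")
```

The resulting string could then be sent as the `prompt` input. Whether the hosted model wraps it in its own template again depends on the `prompt_template` input, so that is worth checking before relying on this approach.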


[`ConversationBufferMemory`](https://python.langchain.com/docs/modules/memory/types/buffer) is used to pass the chat history to the model and give it the capability to handle follow up questions.

In [4]:
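# To stream the output
import replicate

streamed_answer = ""
for event in replicate.stream(llm, input=input):
    print(event, end="")
    streamed_answer += str(event)

# Update the chat history once, after the full answer has been streamed
chat_history.append({"user": input["prompt"], "assistant": streamed_answer})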

The Finance Act 2024 of Kenya is a legislation that outlines various tax measures, amendments, and reforms aimed at promoting economic growth, increasing revenue, and enhancing transparency and accountability in the country's financial system.

In [6]:
# using ConversationBufferMemory to pass memory (chat history) for follow up questions
from langchain.chains import ConversationChain
from langchain.llms import Replicate
from langchain.memory import ConversationBufferMemory

# ConversationChain expects a LangChain LLM (a Runnable), not a replicate Model object,
# so wrap the same model in LangChain's Replicate integration.
# NOTE: depending on the LangChain version, the wrapper may need a pinned
# "owner/name:version" string instead of the bare model name.
chat_llm = Replicate(
    model="meta/meta-llama-3-70b-instruct",
    model_kwargs={"temperature": 0.75, "top_p": 0.9, "max_tokens": 800},
)

memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=chat_llm,
    memory=memory,
    verbose=False
)

Once this is set up, let us repeat the steps from before and ask the model a simple question.

Then we pass the question and answer back into the model for context along with the follow up question.

In [7]:
# restart from the original question
answer = conversation.predict(input=question)
print(answer)

 Ah, you're asking about "The Innovator's Dilemma," the iconic book written by Clayton Christensen! He is a renowned author, professor, and business consultant known for his groundbreaking work in the fields of innovation and disruptive technologies.

Published in 1997, "The Innovator's Dilemma" explores why successful companies often struggle to adapt to new technologies and business models that ultimately disrupt their industries. The book introduces the concept of "disruptive innovation," which describes how small, unassuming ideas can eventually upend entire markets and leave established players behind.

Clayton Christensen has since become one of the most influential voices in the business world, and his ideas have been applied across various sectors, from technology and healthcare to finance and education. His follow-up books, such as "The Innovator's Solution" and "Competing Against Luck," further expand on these concepts and offer practical guidance for leaders looking to stay 

In [10]:
# pass context (previous question and answer) along with the follow up "tell me more" to Llama who now knows more of what
memory.save_context({"input": question},
                    {"output": answer})
followup_answer = conversation.predict(input=followup)
print(followup_answer)

NameError: name 'memory' is not defined

Next, we load the documents from the data path.

We use Llama to answer questions with the documents as context. This lets us extend the model's knowledge and give it better context without fine-tuning.

In [10]:
from langchain.document_loaders import PyPDFDirectoryLoader

DATA_PATH = 'data'

def preprocess_text(text):
    # Remove headers, footers, and other irrelevant content
    # Customize this function based on your specific requirements
    processed_text = text.strip()
    # Add more preprocessing steps as needed
    return processed_text

# Load and preprocess the documents
loader = PyPDFDirectoryLoader(DATA_PATH)
docs = loader.load_and_split()

# Preprocess each document
preprocessed_docs = [preprocess_text(doc.page_content) for doc in docs]

In [11]:
import os

# Check the number of PDF files in the data directory
pdf_files = [f for f in os.listdir(DATA_PATH) if f.endswith('.pdf')]
print(f"Number of PDF files: {len(pdf_files)}")


Number of PDF files: 1


Next, we store the documents. LangChain supports more than 30 vector stores (databases).
Here we use [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma), which is lightweight and runs in memory, so it is easy to get started with.

We also import `HuggingFaceEmbeddings` and `RecursiveCharacterTextSplitter` to help store the documents.

Version 1: ChromaDB

In [8]:
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

embeddings = HuggingFaceEmbeddings()
db = Chroma.from_documents(chunks, embeddings)
retriever = db.as_retriever()

  from .autonotebook import tqdm as notebook_tqdm


KeyboardInterrupt: 

Version 2: FAISS

In [14]:
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
chunks = text_splitter.create_documents(preprocessed_docs)

# Create the vector database
embeddings = HuggingFaceEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()

To store the documents, we need to split them into chunks using [`RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and create vector representations of these chunks using [`HuggingFaceEmbeddings`](https://www.google.com/search?q=langchain+hugging+face+embeddings) before storing them in our vector database.

In general, you should use larger chunk sizes for highly structured text such as code and smaller chunk sizes for less structured text. You may need to experiment with different chunk sizes and overlap values to find the best numbers.
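
To see what chunk size and overlap mean in practice, here is a deliberately simplified sketch: a fixed-width character splitter written for illustration. It is not LangChain's algorithm (the LangChain splitters also look for separators such as paragraph breaks), and `split_with_overlap` is a hypothetical name.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is how overlap keeps a sentence that straddles a boundary visible in both chunks.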

In [13]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

# create the vector db to store all the split chunks as embeddings
embeddings = HuggingFaceEmbeddings()
db = Chroma.from_documents(chunks, embeddings)
retriever = db.as_retriever()

We then use `RetrievalQA` to retrieve the relevant documents from the vector database and give the model more context, thereby extending its knowledge.

For each question, LangChain performs a semantic similarity search for it in the vector database, then passes the search results to Llama as context for answering the question.
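
The retrieve-then-answer flow described above can be sketched without any libraries. The example below is an illustration under stated assumptions, not LangChain's implementation: the three-dimensional vectors and section titles are made up, cosine similarity is assumed as the similarity measure, and `top_k` and `stuff_prompt` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(indexed, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def stuff_prompt(context_chunks: List[str], question: str) -> str:
    """Place the retrieved chunks and the question into one prompt body."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy "embeddings", made up for illustration only
index = [
    ("Section on carbon markets", [0.9, 0.1, 0.0]),
    ("Section on road levies", [0.0, 0.2, 0.9]),
    ("Section on emission targets", [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]
hits = top_k(query_vec, index, k=2)
prompt = stuff_prompt(hits, "What changed for carbon markets?")
```

In the real pipeline the embedding model produces the vectors and the vector store performs the ranking; only the two closest chunks reach the model here, which is why an unrelated chunk such as the road levies entry is left out.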

Retriever 1

In [16]:
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from langchain.llms.base import LLM

# Create a prompt template
template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers questions based on law and legal documents.
If the context does not provide enough information to answer the question, answer to the best of your ability using your pretrained data. Remember you are a Law based Chatbot.
<|eot_id|><|start_header_id|>user<|end_header_id|>
Context:
{context}

Question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

class CustomLLMReplicate(LLM):
    def _call(self, prompt, stop=None):
        return llm_replicate(prompt)

    @property
    def _identifying_params(self):
        return {"name": "custom_llm_replicate"}

    @property
    def _llm_type(self):
        return "custom"

def llm_replicate(prompt, **kwargs):
    input_data = {
        "prompt": prompt,
        "temperature": 0.75,
        "top_p": 0.9,
        "max_tokens": 800,
    }
    output = replicate.run(llm, input=input_data)
    return "".join(output)

# Create an instance of CustomLLMReplicate
custom_llm_replicate = CustomLLMReplicate()

# Create a question-answering chain
qa_chain = load_qa_chain(llm=custom_llm_replicate, chain_type="stuff", prompt=prompt)

def ask_question(query):
    preprocessed_docs = retriever.get_relevant_documents(query)
    
    if not preprocessed_docs:
        # If no relevant documents are found, use the Replicate API with Llama3
        return llm_replicate(query)
    
    result = qa_chain.run(input_documents=preprocessed_docs, question=query)
    return result

# Example usage
query = "What are the amendments made in the Climate Change (Amendment) Act Kenya 2023"
answer = ask_question(query)
print(answer)

Retriever 2

In [32]:
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from langchain.llms.base import LLM

DATA_PATH = 'data'

def preprocess_text(text):
    processed_text = text.strip()
    return processed_text

loader = PyPDFDirectoryLoader(DATA_PATH)
docs = loader.load_and_split()
preprocessed_docs = [preprocess_text(doc.page_content) for doc in docs]

text_splitter = CharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
chunks = text_splitter.create_documents(preprocessed_docs)

embeddings = HuggingFaceEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
retriever = db.as_retriever()

template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers questions based on law and legal documents.
If the context does not provide enough information to answer the question, answer to the best of your ability using your pretrained data. Remember you are a Law based Chatbot.
<|eot_id|><|start_header_id|>user<|end_header_id|>
Context:
{context}

Question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

class CustomLLMReplicate(LLM):
    def _call(self, prompt, stop=None):
        return llm_replicate(prompt)

    @property
    def _identifying_params(self):
        return {"name": "custom_llm_replicate"}

    @property
    def _llm_type(self):
        return "custom"

def llm_replicate(prompt, **kwargs):
    input_data = {
        "prompt": prompt,
        "temperature": 0.75,
        "top_p": 0.9,
        "max_tokens": 1200,
    }
    output = replicate.run(llm, input=input_data)
    return "".join(output)

custom_llm_replicate = CustomLLMReplicate()
qa_chain = load_qa_chain(llm=custom_llm_replicate, chain_type="stuff", prompt=prompt)

def ask_question(query):
    preprocessed_docs = retriever.get_relevant_documents(query)
    
    if not preprocessed_docs:
        llm_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers questions based on law and legal documents.
If the context does not provide enough information to answer the question, answer to the best of your ability using your pretrained data. Remember you are a Law based Chatbot.
<|eot_id|><|start_header_id|>user<|end_header_id|>
Question: {query}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
        return llm_replicate(llm_prompt)
    
    result = qa_chain.run(input_documents=preprocessed_docs, question=query)
    return result

query = "What are the amendments made in the Climate Change (Amendment) Act Kenya 2023, answer as briefly as possible"
answer = ask_question(query)
print(answer)

TypeError: 'list' object is not callable

In [18]:
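query = "What are the amendments made in the Climate Change (Amendment) Act Kenya 2023"
chat_history = []

# `rag_chain` is never defined in this notebook, so build the RAG prompt directly from
# the retriever and the prompt template created above, then stream the answer.
context_docs = retriever.get_relevant_documents(query)
context = " ".join(doc.page_content for doc in context_docs)
rag_input = {"prompt": prompt.format(context=context, question=query), "max_tokens": 800}

answer = ""
for output in replicate.stream(llm, input=rag_input):
    print(output, end="", flush=True)
    answer += str(output)

# Record the full answer as a single turn
chat_history.append({"user": query, "assistant": answer})

print("\n")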

In [21]:
import builtins

while True:
    # An earlier cell rebinds the name `input` to a dict, so call the builtin explicitly
    follow_up_query = builtins.input("Ask a follow-up question (or type 'exit' to quit): ")
    if follow_up_query.lower() == "exit":
        break
    
    template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are a helpful assistant that answers questions based on the given context and the conversation history.
<|eot_id|><|start_header_id|>user<|end_header_id|>
Context:
{context}

Conversation History:
{history}

Follow-up Question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
    prompt = PromptTemplate(template=template, input_variables=["context", "history", "question"])
    
    # Retrieve the chunks most relevant to the follow-up question
    docs = retriever.get_relevant_documents(follow_up_query)
    context = " ".join([doc.page_content for doc in docs])
    
    # Flatten the earlier turns into plain text for the prompt
    history = "\n".join([f"User: {msg['user']}\nAssistant: {msg['assistant']}" for msg in chat_history])
    
    llm_input = prompt.format(context=context, history=history, question=follow_up_query)
    
    input_data = {
        "prompt": llm_input,
        "temperature": 0.75,
        "top_p": 0.9,
        "max_tokens": 800,
    }
    
    result = ""
    for output in replicate.stream(llm, input=input_data):
        print(output, end="", flush=True)
        result += str(output)
    
    chat_history.append({"user": follow_up_query, "assistant": result})
    
    print("\n")

In [10]:
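# use LangChain's RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

# RetrievalQA needs a LangChain-compatible LLM, so pass the CustomLLMReplicate instance
# defined above, together with the vector store (`db`) created earlier.
qa_chain = RetrievalQA.from_chain_type(
    custom_llm_replicate,
    retriever=db.as_retriever()
)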


question = "What are the amendments made in the Climate Change (Amendment) Act Kenya 2023"
result = qa_chain({"query": question})
print(result['result'])

 Based on the information provided, I can see that the Climate Change (Amendment) Act Kenya 2023 has undergone several amendments. However, without access to the full text of the act, I cannot provide a comprehensive list of all the amendments made.

Some of the amendments mentioned in the context include:

* Insertion of a new definition for "Central Authority" in the Anti-Money Laundering and Combating of Terrorism Financing Laws (Amendment) Act, 2023.
* Amendment of section 7 of the Kenya Roads Board Act, 1999 to delete paragraph (g) and substitute it with a new paragraph (ga).
* Amendment of section 35 of the Kenya Roads Board Act, 1999 to insert a new subsection (2A) requiring the annual estimates to be submitted together with a collated annual roads programme.
* Amendment of the First Schedule to the Kenya Roads Board Act, 1999 to delete paragraphs 4, 5, and 6.
* Amendment of section 5 of the Kenya Revenue Authority Act, 1995 to delete the words "for the better carrying out of it

Now, let's bring it all together by incorporating follow-up questions.

First we ask a follow-up question without giving the model the context of the previous conversation.
Without this context, the answer we get does not relate to our original question.

In [11]:
# no context passed so Llama2 doesn't have enough context to answer so it lets its imagination go wild
result = qa_chain({"query": "Provide only the amendments that are relevant to climate change"})
print(result['result'])

 Based on the provided text, the following amendments are relevant to climate change:

1. Amendment of section 28 of the Unclaimed Financial Assets Act, 2011, which inserts the words "or such other person as the claimant may designate" immediately after the word "claimant". This amendment allows for the designation of a third party within a group which has existed for at least twenty-four months.
2. Amendment of the Ninth Schedule to the Income Tax Act, which deletes the words "ten per cent" appearing in subparagraph (1) and substitutes them with "twenty per cent". This increase in the tax rate for certain activities may have an impact on climate change efforts.
3. Amendment of section 5 of the Value Added Tax Act, 2013, which deletes paragraph (aa) and (ab). These changes may affect the taxation of certain goods and services related to climate change mitigation and adaptation efforts.


As before, let us use `ConversationalRetrievalChain` to give the model the context of our previous question so that we can ask follow-up questions.
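
Conceptually, a conversational retrieval step first rewrites the follow-up into a standalone question using the chat history, and only then searches the vector database. The sketch below illustrates that rewriting prompt in plain Python. The wording of the instruction is an approximation written for this example, not LangChain's exact built-in prompt, and `format_history` and `condense_prompt` are hypothetical names.

```python
from typing import List, Tuple


def format_history(chat_history: List[Tuple[str, str]]) -> str:
    """Render (question, answer) pairs as alternating Human/Assistant lines."""
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


def condense_prompt(chat_history: List[Tuple[str, str]], followup: str) -> str:
    """Build a prompt asking the model to rewrite the follow-up as a standalone question."""
    return (
        "Given the conversation below and a follow-up question, "
        "rephrase the follow-up as a standalone question.\n\n"
        f"Chat history:\n{format_history(chat_history)}\n\n"
        f"Follow-up: {followup}\n"
        "Standalone question:"
    )


# Placeholder history entry, for illustration only
example_history = [("What does the Act amend?", "It amends the Climate Change Act.")]
example_prompt = condense_prompt(example_history, "Which sections are affected?")
```

The standalone question returned by the model is what gets embedded and searched, so a vague follow-up such as "tell me more" can still retrieve the right chunks.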

In [27]:
# use ConversationalRetrievalChain to pass chat history for follow up questions
from langchain.chains import ConversationalRetrievalChain

# Pass the LangChain-compatible CustomLLMReplicate instance and the vector store (`db`) defined above
chat_chain = ConversationalRetrievalChain.from_llm(custom_llm_replicate, db.as_retriever(), return_source_documents=True)

In [28]:
# let's ask the original question about the Climate Change (Amendment) Act again
result = chat_chain({"question": question, "chat_history": []})
print(result['answer'])

 The text you provided does not contain the word "ll


In [29]:
# this time we pass chat history along with the follow up so good things should happen
chat_history = [(question, result["answer"])]
followup = "what are its use cases?"
followup_answer = chat_chain({"question": followup, "chat_history": chat_history})
print(followup_answer['answer'])

 I don't know the answer to that question. Llama2 is not a term I am familiar with, and I cannot provide information on its typical use cases.


Further follow-ups are possible by updating `chat_history`.

Note that results can get cut off. To avoid this, set the token limit in the Replicate call above to a larger number. The calls in this notebook pass it as `max_tokens`; some models name it `max_new_tokens`, as shown below.

```python
model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 1000}
```

In [None]:
# further follow ups can be made possible by updating chat_history like this:
chat_history.append((followup, followup_answer["answer"]))
more_followup = "what tasks can it assist with?"
more_followup_answer = chat_chain({"question": more_followup, "chat_history": chat_history})
print(more_followup_answer['answer'])

