## This notebook contains:
* Running Llama 3 in the cloud, hosted on Replicate
* Using LangChain to ask Llama general questions and follow-up questions
* Using LangChain to load PDF documents (Kenya Law documents) and chat about them
* The end result is a chatbot that can answer questions about data that was not publicly available when the model was trained, or about your own data. Retrieval-augmented generation (RAG) is one way to reduce LLM hallucinations.

Let's start by installing the necessary packages:
- sentence-transformers for text embeddings
- chromadb for the vector database
- langchain for the RAG tooling used in this demo
- replicate for calling the hosted model
- pypdf for reading PDF files
- python-dotenv for loading the Replicate token from a `.env` file

We then set up the Replicate API token.

In [None]:
%pip install langchain replicate sentence-transformers chromadb pypdf python-dotenv

Next we call the Llama model hosted on Replicate. In this example we use the Llama 3 70B Instruct model. You can find more Llama models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).

The model is specified in the format `owner/model_name`, optionally followed by `:version`.


In [2]:
# set up the Replicate API token
from dotenv import load_dotenv
import os

# Load variables from the .env file into the environment
load_dotenv()

# Read the token from the environment; never hardcode secrets in a notebook
print("Getting Replicate API token...")
REPLICATE_API_TOKEN = os.getenv("REPLICATE_API_TOKEN")
if not REPLICATE_API_TOKEN:
    raise RuntimeError("REPLICATE_API_TOKEN is not set; add it to your .env file.")
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN
print("Replicate API token set.")

# initialize the Llama 3 model (70B Instruct) hosted on Replicate

from langchain.llms import Replicate

print("Initializing Llama3 model...")
llama3_70b = "meta/meta-llama-3-70b-instruct"
llm = Replicate(
    model=llama3_70b,
    model_kwargs={"temperature": 0.75, "top_p": 0.9, "max_new_tokens": 500}
)
print("Llama3 model initialized.")


Getting Replicate API token...
Replicate API token set.
Initializing Llama3 model...
Llama3 model initialized.


[`ConversationBufferMemory`](https://python.langchain.com/docs/modules/memory/types/buffer) passes the chat history to the model, which lets it handle follow-up questions.

In [4]:
# using ConversationBufferMemory to pass memory (chat history) for follow up questions
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)
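
To make the idea of buffer memory concrete, here is a minimal sketch of what such a memory does conceptually: it stores every turn and replays the full transcript in front of each new input. `SimpleBufferMemory` and its `Human:`/`AI:` prompt layout are hypothetical stand-ins for illustration, not LangChain's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleBufferMemory:
    """Hypothetical stand-in: keeps every turn and replays it verbatim."""
    turns: list = field(default_factory=list)

    def save(self, user_input: str, model_output: str) -> None:
        # Record one completed exchange.
        self.turns.append((user_input, model_output))

    def as_prompt(self, new_input: str) -> str:
        # Prepend the whole history so the model can resolve follow-ups.
        lines = []
        for user_input, model_output in self.turns:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {model_output}")
        lines.append(f"Human: {new_input}")
        lines.append("AI:")
        return "\n".join(lines)
```

Because the full transcript is replayed on every call, the prompt grows with each turn, which is the main cost of this kind of memory.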

Once this is set up, let us ask the model a simple question.

The chain's memory then supplies the earlier question and answer as context for the follow-up question.

In [7]:
# ask the original question
# NOTE: the exact wording of the question is not shown in this notebook; this one is inferred from the answer below
question = "Who wrote the book The Innovator's Dilemma?"
answer = conversation.predict(input=question)
print(answer)

 Ah, you're asking about "The Innovator's Dilemma," the iconic book written by Clayton Christensen! He is a renowned author, professor, and business consultant known for his groundbreaking work in the fields of innovation and disruptive technologies.

Published in 1997, "The Innovator's Dilemma" explores why successful companies often struggle to adapt to new technologies and business models that ultimately disrupt their industries. The book introduces the concept of "disruptive innovation," which describes how small, unassuming ideas can eventually upend entire markets and leave established players behind.

Clayton Christensen has since become one of the most influential voices in the business world, and his ideas have been applied across various sectors, from technology and healthcare to finance and education. His follow-up books, such as "The Innovator's Solution" and "Competing Against Luck," further expand on these concepts and offer practical guidance for leaders looking to stay 

In [5]:
# ask the follow-up "tell me more"; conversation.predict already recorded the first turn in memory,
# so no explicit memory.save_context call is needed and Llama knows what the follow-up refers to
followup = "tell me more"
followup_answer = conversation.predict(input=followup)
print(followup_answer)

Next, we use Llama to answer questions using documents as context.
This lets us supplement the model's knowledge with new information without fine-tuning it.

We start by loading the documents from the data path.

In [3]:
from langchain.document_loaders import PyPDFDirectoryLoader

DATA_PATH = 'data'

In [4]:
loader = PyPDFDirectoryLoader(DATA_PATH)
docs = loader.load()

In [5]:
import os

# Check the number of PDF files in the data directory
pdf_files = [f for f in os.listdir(DATA_PATH) if f.endswith('.pdf')]
print(f"Number of PDF files: {len(pdf_files)}")


Number of PDF files: 23
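
Note that this counts files, whereas the loader typically returns one `Document` per page, so `len(docs)` is usually larger than the file count. The check above also skips files with an upper-case `.PDF` extension. A slightly more tolerant count could look like the sketch below; `list_pdfs` is a hypothetical helper, not part of LangChain.

```python
from pathlib import Path


def list_pdfs(directory: str) -> list:
    """Return PDF files directly inside `directory`, ignoring extension case."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

With the notebook's variables, `len(list_pdfs(DATA_PATH))` gives the file count and `len(docs)` the number of loaded documents.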


Next we store the documents. LangChain supports more than 30 vector stores (databases).
In this case we use [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma), which is lightweight and runs in memory, so it is easy to get started with.

We also import `HuggingFaceEmbeddings` and `RecursiveCharacterTextSplitter` to help store the documents.

In [6]:
from langchain.vectorstores import Chroma

# embeddings are numerical representations of the question and answer text
from langchain.embeddings import HuggingFaceEmbeddings

# use a common text splitter to split text into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

To store the documents, we need to split them into chunks using [`RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and create vector representations of these chunks using [`HuggingFaceEmbeddings`](https://www.google.com/search?q=langchain+hugging+face+embeddings) before storing them in our vector database.

In general, you should use larger chunk sizes for highly structured text such as code and smaller chunk sizes for less structured text. You may need to experiment with different chunk sizes and overlap values to find the best combination.
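
To see how chunk size and overlap interact, here is a simplified sketch of fixed-size character chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper that only illustrates the two parameters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

A larger overlap repeats more text across neighbouring chunks, which helps keep sentences that straddle a boundary retrievable, at the cost of storing more embeddings.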

In [7]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(docs)

# create the vector db to store all the split chunks as embeddings
embeddings = HuggingFaceEmbeddings()
vectordb = Chroma.from_documents(
    documents=all_splits,
    embedding=embeddings,
)



We then use `RetrievalQA` to retrieve relevant chunks from the vector database and pass them to the model as additional context about the Kenya Law documents.

For each question, LangChain runs a semantic similarity search against the vector db, then passes the search results to Llama as context for answering the question.
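
The similarity search can be illustrated with a small sketch that ranks chunks by cosine similarity between embedding vectors. The three-dimensional vectors and chunk names in the usage note are made up for illustration; real embeddings have hundreds of dimensions, and Chroma's actual distance metric and index may differ.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs: dict, k: int = 2) -> list:
    """Return the ids of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]
```

For example, with the toy vectors `{"climate": [0.9, 0.1, 0.0], "roads": [0.1, 0.9, 0.0], "tax": [0.0, 0.2, 0.9]}` and the query `[1.0, 0.2, 0.0]`, `top_k` ranks `"climate"` first. The retrieved chunks are only as relevant as the embeddings make them, which is why unrelated amendments can still end up in the context.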

In [10]:
# use LangChain's RetrievalQA, to associate Llama with the loaded documents stored in the vector db
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)


question = "What are the amendments made in the Climate Change (Amendment) Act Kenya 2023"
result = qa_chain({"query": question})
print(result['result'])

 Based on the information provided, I can see that the Climate Change (Amendment) Act Kenya 2023 has undergone several amendments. However, without access to the full text of the act, I cannot provide a comprehensive list of all the amendments made.

Some of the amendments mentioned in the context include:

* Insertion of a new definition for "Central Authority" in the Anti-Money Laundering and Combating of Terrorism Financing Laws (Amendment) Act, 2023.
* Amendment of section 7 of the Kenya Roads Board Act, 1999 to delete paragraph (g) and substitute it with a new paragraph (ga).
* Amendment of section 35 of the Kenya Roads Board Act, 1999 to insert a new subsection (2A) requiring the annual estimates to be submitted together with a collated annual roads programme.
* Amendment of the First Schedule to the Kenya Roads Board Act, 1999 to delete paragraphs 4, 5, and 6.
* Amendment of section 5 of the Kenya Revenue Authority Act, 1995 to delete the words "for the better carrying out of it

Now, let's bring it all together by incorporating follow-up questions.

First we ask a follow-up question without giving the model the context of the previous conversation.
Without this context, the answer we get does not relate to our original question.

In [11]:
# no chat history is passed, so Llama doesn't have enough context to answer and lets its imagination go wild
result = qa_chain({"query": "Provide only the amendments that are relevant to climate change"})
print(result['result'])

 Based on the provided text, the following amendments are relevant to climate change:

1. Amendment of section 28 of the Unclaimed Financial Assets Act, 2011, which inserts the words "or such other person as the claimant may designate" immediately after the word "claimant". This amendment allows for the designation of a third party within a group which has existed for at least twenty-four months.
2. Amendment of the Ninth Schedule to the Income Tax Act, which deletes the words "ten per cent" appearing in subparagraph (1) and substitutes them with "twenty per cent". This increase in the tax rate for certain activities may have an impact on climate change efforts.
3. Amendment of section 5 of the Value Added Tax Act, 2013, which deletes paragraph (aa) and (ab). These changes may affect the taxation of certain goods and services related to climate change mitigation and adaptation efforts.


As we did before with `ConversationBufferMemory`, we now give the model the context of our previous question, this time using `ConversationalRetrievalChain`, so that we can ask follow-up questions.

In [27]:
# use ConversationalRetrievalChain to pass chat history for follow up questions
from langchain.chains import ConversationalRetrievalChain
chat_chain = ConversationalRetrievalChain.from_llm(llm, vectordb.as_retriever(), return_source_documents=True)
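
A chain of this kind generally handles follow-ups in two steps: it first asks the LLM to rewrite the follow-up into a standalone question using the chat history, then runs retrieval with that rewritten question. The sketch below only shows how such a rewriting prompt could be assembled; the prompt wording and both function names are hypothetical and are not LangChain's actual template.

```python
def format_history(chat_history) -> str:
    """Render (question, answer) tuples as a plain-text transcript."""
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


def build_condense_prompt(chat_history, followup: str) -> str:
    """Build a prompt asking the model to make the follow-up self-contained."""
    return (
        "Given the conversation below, rewrite the follow-up question "
        "as a standalone question.\n\n"
        f"Chat history:\n{format_history(chat_history)}\n\n"
        f"Follow-up question: {followup}\n"
        "Standalone question:"
    )
```

Rewriting the question before retrieval matters because a follow-up such as "what are its use cases?" contains nothing the similarity search could match on by itself.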

In [28]:
# let's ask the original question (the `question` defined above) again
result = chat_chain({"question": question, "chat_history": []})
print(result['answer'])

 The text you provided does not contain the word "ll


In [29]:
# this time we pass chat history along with the follow up so good things should happen
chat_history = [(question, result["answer"])]
followup = "what are its use cases?"
followup_answer = chat_chain({"question": followup, "chat_history": chat_history})
print(followup_answer['answer'])

 I don't know the answer to that question. Llama2 is not a term I am familiar with, and I cannot provide information on its typical use cases.


Further follow-ups are possible by updating `chat_history`.

Note that results can get cut off. You can set `max_new_tokens` in the Replicate call above to a larger number (as shown below) to avoid this.

```python
model_kwargs={"temperature": 0.01, "top_p": 1, "max_new_tokens": 1000}
```

In [None]:
# further follow ups can be made possible by updating chat_history like this:
chat_history.append((followup, followup_answer["answer"]))
more_followup = "what tasks can it assist with?"
more_followup_answer = chat_chain({"question": more_followup, "chat_history": chat_history})
print(more_followup_answer['answer'])
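
Since the whole `chat_history` is sent with every call, a long conversation can eventually exceed the model's context window. One simple mitigation, sketched below under the assumption that only recent turns matter, is to keep just the last few exchanges; `append_turn` and its `max_turns` parameter are hypothetical and not part of LangChain.

```python
def append_turn(chat_history, question: str, answer: str, max_turns: int = 5) -> list:
    """Return a new history with the turn added, keeping only the most recent turns."""
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")
    updated = list(chat_history) + [(question, answer)]
    return updated[-max_turns:]
```

It could replace the `chat_history.append(...)` call above, for example `chat_history = append_turn(chat_history, followup, followup_answer["answer"])`.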

