# Chat with Self Help Concepts with RAG

This notebook guides you through setting up the environment, importing documents, and using LangChain for document-based Q&A. We'll cover document preprocessing, question formulation, and analysis of the model's responses.

Whether you're a researcher, student, or professional, this demo shows how LangChain can streamline document exploration and information retrieval.

# Prerequisites

This code installs several Python packages that are required for the project. Explanations happily generated for you by [Chepetto](https://openai.com/blog/chatgpt).

- [`langchain`](<https://python.langchain.com/>) is a framework for building applications with large language models, for example by chaining document retrieval, prompts, and chat memory.
- [`openai`](<https://openai.com/>) is a package for accessing the OpenAI API, which provides access to various language models and AI tools.
- [`pypdf`](<https://pypi.org/project/pypdf/>) is a package for reading PDF files in Python; LangChain's PDF loaders use it to extract text.
- [`tiktoken`](<https://github.com/openai/tiktoken>) is OpenAI's fast BPE tokenizer, used to split text into the tokens that OpenAI models read and to count them.
- [`faiss-cpu`](<https://github.com/facebookresearch/faiss>) is a package for performing efficient similarity searches on large datasets using the FAISS library.


In [62]:
%pip install -qU langchain openai pypdf tiktoken faiss-cpu

Note: you may need to restart the kernel to use updated packages.
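After the install (and a kernel restart, if prompted), a quick standard-library check can confirm that each package is importable. The module names are our assumption of the import names for the pip packages above (`faiss-cpu`, for instance, is imported as `faiss`); `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the pip packages installed above ("faiss-cpu" imports as "faiss").
REQUIRED_MODULES = ["langchain", "openai", "pypdf", "tiktoken", "faiss"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```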


# OpenAI API Key

To use the OpenAI API, you need to obtain an API key from the [OpenAI website](https://platform.openai.com/account/api-keys). The API key is a unique identifier that allows you to access the OpenAI API and make requests to it. By setting the 'OPENAI_API_KEY' environment variable, you can securely provide your API key to the code without hardcoding it into the script.

In [2]:
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass("OPENAI_API_KEY")
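If you rerun the notebook often, you may prefer not to retype the key each time. A minimal sketch, using a hypothetical helper `ensure_openai_key()`, that reuses an existing `OPENAI_API_KEY` environment variable and only prompts when it is missing:

```python
import os
import getpass

def ensure_openai_key(var_name="OPENAI_API_KEY"):
    """Return the API key, prompting only if the environment variable is not already set."""
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed key so it does not end up in the notebook output.
        key = getpass.getpass(f"{var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_openai_key()` in place of the `getpass` line above works both when the key was exported before starting Jupyter and when it was not.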

# Embeddings setup

This code initializes an instance of the [OpenAIEmbeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html?highlight=embeddings#langchain.embeddings.OpenAIEmbeddings) class and assigns it to the variable `embeddings`. An [embedding](https://platform.openai.com/docs/guides/embeddings) represents a piece of text, from a single word to a whole passage, as a numeric vector, so that texts with similar meanings end up close to each other. The `OpenAIEmbeddings` class calls OpenAI's embeddings API, which computes these vectors with a model pre-trained on a large corpus of text.

Once you have initialized `embeddings`, you can use it to obtain the embedding vector for any chunk of text. Embeddings are useful for a variety of [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing) (NLP) tasks, such as search, clustering, and text classification. In this notebook we use them for [semantic search](https://en.wikipedia.org/wiki/Semantic_search) with a [vector database](https://www.youtube.com/watch?v=klTvEwg3oJ4&ab_channel=Fireship).
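To make "semantic search" concrete: once texts are vectors, retrieval means ranking stored vectors by their similarity to a query vector. The sketch below ranks documents by cosine similarity, using tiny made-up 3-dimensional vectors in place of real 1536-dimensional embeddings; `cosine_similarity` and `top_k` are illustrative helpers, not LangChain or FAISS functions. A vector store does the same job at scale and may use a different measure, such as L2 distance, which gives the same ranking when all vectors have unit length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k stored vectors most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for real embeddings (values are made up for illustration).
toy_docs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.7, 0.3, 0.1]]
toy_query = [1.0, 0.0, 0.0]
print(top_k(toy_query, toy_docs, k=2))
```

The query points almost the same way as the first and third vectors, so indices 0 and 2 are returned, in that order.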

## Model

| Name | Tokenizer | Max input tokens | Output dimensions |
| :--- | :--- | ---: | ---: |
| text-embedding-ada-002 | cl100k_base | 8191 | 1536 |




In [3]:
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Splitter setup

The [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/modules/indexes/text_splitters/examples/recursive_text_splitter.html) is a text splitting tool that takes in a large text document as input and splits it into smaller chunks for downstream processing. Here's what each parameter in the splitter setup means:

- `chunk_size`: The maximum size of each chunk the splitter outputs, as measured by `length_function`. Here it is set to 1000 characters.

- `chunk_overlap`: The maximum overlap between adjacent chunks. Here it is set to 50 characters, so the end of one chunk can be repeated at the start of the next.

- `length_function`: The function used to measure the length of text. Here it is `len` (also the default), so lengths are counted in characters rather than tokens.

Together, these parameters determine how the input text is split: chunks of at most 1000 characters, with up to 50 characters of overlap between adjacent chunks, until the entire input text has been processed. Small chunks keep retrieval focused and fit comfortably in the model's context, while the overlap reduces the risk of losing context at the boundaries between chunks.
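The idea of fixed-size chunks with overlap can be sketched in plain Python. This is a simplified fixed-window splitter, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators (by default paragraph breaks, line breaks, and spaces, falling back to single characters), so its chunks usually end at natural boundaries and may be shorter than `chunk_size`. `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=50):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "abcdefghij" * 25  # 250 characters
pieces = split_with_overlap(sample, chunk_size=100, chunk_overlap=20)
print([len(p) for p in pieces])
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so adjacent chunks share exactly `chunk_overlap` characters and only the last chunk may be shorter.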

In [6]:
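from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum characters per chunk
    chunk_overlap=50,     # maximum characters shared between adjacent chunks
    length_function=len,  # measure length in characters (the default)
)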

# Load (and split) documents

This code snippet loads PDF files from a directory named "pdf/" using a [PyPDFDirectoryLoader](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/pdf.html?highlight=PyPDFDirectoryLoader) class from the `langchain.document_loaders` module. The `loader` variable is an instance of `PyPDFDirectoryLoader`, which takes the directory path as an argument.

After instantiating the loader, the code calls the `load_and_split` method to load the PDF files from the directory and split their text using the text splitter we created before.

## Upload your PDFs
Create a folder called `pdf` in the notebook's working directory and add any number of PDFs you'd like to chat with.

> Note: if you run this notebook in a hosted environment, the uploaded PDFs may be deleted once you close the session.
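Before running the loader, it can help to confirm that the folder exists and contains what you expect. A minimal sketch with `pathlib`; `list_pdfs` is a hypothetical helper, and it only looks at files directly inside the folder with a lowercase `.pdf` extension, which may not match exactly what `PyPDFDirectoryLoader` picks up.

```python
from pathlib import Path

def list_pdfs(folder="pdf"):
    """Create the folder if it is missing and return the .pdf files directly inside it."""
    pdf_dir = Path(folder)
    pdf_dir.mkdir(parents=True, exist_ok=True)
    return sorted(p for p in pdf_dir.glob("*.pdf") if p.is_file())

pdfs = list_pdfs("pdf")
print(f"{len(pdfs)} PDF file(s) found:", [p.name for p in pdfs])
```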

In [7]:
from langchain.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("pdf/")
docs = loader.load_and_split(text_splitter=text_splitter)
len(docs)


4427

In [None]:
docs[243].page_content # arbitrary chunk

# Vector store setup

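This cell embeds every chunk with `embeddings` and stores the vectors in a [FAISS](https://github.com/facebookresearch/faiss) index held in memory. Because each chunk is sent to the OpenAI embeddings API, this step can take a while and incurs API costs for a large document set.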

In [11]:
from langchain.vectorstores import FAISS

# Embed every chunk with the OpenAI embeddings model and build an in-memory FAISS index.
faiss_index = FAISS.from_documents(docs, embeddings)
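Since the index lives in memory, a rough size estimate is easy to compute. Assuming a flat, uncompressed index that stores one 32-bit float per dimension (an assumption; FAISS also supports compressed index types), the vectors alone take `chunks × dimensions × 4` bytes. The chunk text kept in LangChain's document store and Python overhead come on top of this. `flat_index_bytes` is an illustrative helper.

```python
def flat_index_bytes(num_vectors, dim, bytes_per_value=4):
    """Approximate raw vector storage of a flat index: one float32 (4 bytes) per component."""
    return num_vectors * dim * bytes_per_value

# 4427 chunks (from the len(docs) output above) x 1536 dimensions (text-embedding-ada-002)
size_bytes = flat_index_bytes(4427, 1536)
print(f"{size_bytes:,} bytes = {size_bytes / 2**20:.1f} MiB")
```

For this document set, the raw vectors come to roughly 26 MiB.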


# Save the db

This code saves the FAISS index created in the previous code cell to disk under the folder name `faiss_index-dsm5-tr`. The `save_local()` method is provided by LangChain's `FAISS` vector store class and writes the index to the local file system.

After executing this code, a folder named `faiss_index-dsm5-tr` should appear in the current working directory. It contains the FAISS index itself (`index.faiss`) and the pickled document store with its ID mapping (`index.pkl`), and can be loaded back into memory later with `FAISS.load_local()`, which avoids re-embedding all chunks.
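On later runs you can reload the saved index instead of re-embedding every chunk. A minimal sketch, assuming `save_local()` writes `index.faiss` and `index.pkl` under the default `index_name` of `"index"`; `saved_index_exists` is a hypothetical helper. The commented `load_local()` call depends on your LangChain version: newer releases also require `allow_dangerous_deserialization=True`, because the `.pkl` file is unpickled.

```python
from pathlib import Path

def saved_index_exists(folder, index_name="index"):
    """Return True if both files expected from FAISS.save_local() are present in folder."""
    base = Path(folder)
    return (base / f"{index_name}.faiss").is_file() and (base / f"{index_name}.pkl").is_file()

# Assumed usage in this notebook (requires `FAISS` and `embeddings` from the cells above):
# if saved_index_exists("faiss_index-dsm5-tr"):
#     faiss_index = FAISS.load_local("faiss_index-dsm5-tr", embeddings)
```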

In [13]:
faiss_index.save_local("faiss_index-dsm5-tr")

# Test the vector store

This code performs a similarity search using the FAISS index created earlier and the query string *"What is a personality disorder?"*.

The `similarity_search()` method is called on `faiss_index` with two arguments: the query string and `k=5`, which returns the 5 chunks most similar to the query. The result is stored in the `query_result` variable.

The code then iterates over the chunks in the `query_result` list and prints the metadata and page content of each `chunk`. Specifically, it prints the page number and source of the document, along with its page content.
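The printed page number comes straight from the chunk metadata. Assuming the PDF loader stores zero-based page indices (our understanding of `PyPDFDirectoryLoader`), a page printed as `979` is page 980 in a PDF viewer. The sketch formats one result per line with a 1-based page number and collapses the PDF line breaks; `Chunk` is a hypothetical stand-in for LangChain's `Document`, which has the same `page_content` and `metadata` attributes, and `format_hit` is also hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for LangChain's Document (same .page_content and .metadata attributes).
Chunk = namedtuple("Chunk", ["page_content", "metadata"])

def format_hit(chunk, preview_chars=80):
    """Return 'source p.N: preview', with N converted to a 1-based page number."""
    page = chunk.metadata["page"] + 1  # assumes the loader stores a 0-based page index
    preview = " ".join(chunk.page_content.split())[:preview_chars]  # collapse PDF line breaks
    return f"{chunk.metadata['source']} p.{page}: {preview}"

example = Chunk("Personality Disorders\nThis chapter\n begins with a general definition",
                {"page": 979, "source": "pdf/DSM5TR.pdf"})
print(format_hit(example))
```

Inside the search loop, `print(format_hit(chunk))` would then print one compact line per result.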

In [21]:
query_result = faiss_index.similarity_search("What is a personality disorder?", k=5)

for chunk in query_result:
    # Page number and source file from the chunk metadata, followed by the chunk text.
    print(f"{chunk.metadata['page']} {chunk.metadata['source']} :", chunk.page_content)

979 pdf/DSM5TR.pdf : Personality Disorders
This chapter
 begins with a general definition of personality disorder that applies to each of
the 10 specific personality disorders. A 
personality disorder
 is an enduring pattern of inner
experience and behavior that deviates markedly from the norms and expectations of the
individual’s culture, is pervasive and inflexible, has an onset in adolescence or early adulthood,
is stable over time, and leads to distress or impairment.
With any ongoing review process, especially one of this complexity, different viewpoints
emerge, and an effort was made to accommodate them. Thus, personality disorders are included
in both Sections II and III. The material in Section II represents an update of text associated with
the same criteria found in DSM-5 (which were carried over from DSM-IV-TR), whereas Section
III includes the proposed model for personality disorder diagnosis and conceptualization
979 pdf/DSM5TR.pdf : developed by the DSM-5 Personality and 

# Chat memory

This code imports the [ConversationBufferWindowMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer_window.html) class from the `langchain.memory` module and creates an instance of it called `memory`. This class stores the conversation as a sliding window: it keeps only the last `k` exchanges (question and answer pairs) and drops older ones.

Two arguments are passed to the constructor here: `memory_key` and `return_messages`. The `memory_key` parameter sets the name of the variable under which the stored history is handed to the chain, and `return_messages` controls whether that history is returned as a list of message objects (`True`) or as a single formatted string (`False`).

In this code, `memory_key` is set to "chat_history", the input name that `ConversationalRetrievalChain` expects for the conversation history. `return_messages` is set to `True`, so the history is returned as `HumanMessage` and `AIMessage` objects, as in the `chat_history` of the chat output at the end of this notebook.
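The windowing behaviour can be modelled with a `deque` whose `maxlen` is the window size: appending a new exchange to a full deque silently drops the oldest one. This is a toy model for intuition, not LangChain's implementation; `WindowMemory` is a hypothetical class, and its `k` mirrors the window-size parameter of `ConversationBufferWindowMemory` (5 by default).

```python
from collections import deque

class WindowMemory:
    """Toy window memory: keeps only the last k (question, answer) exchanges."""

    def __init__(self, k=5):
        # A deque with maxlen discards the oldest item when a new one is appended to a full deque.
        self.turns = deque(maxlen=k)

    def save(self, question, answer):
        self.turns.append((question, answer))

    def as_messages(self):
        """Flatten the window into alternating ("human", ...), ("ai", ...) entries, oldest first."""
        messages = []
        for question, answer in self.turns:
            messages.append(("human", question))
            messages.append(("ai", answer))
        return messages

window = WindowMemory(k=2)
for i in range(3):
    window.save(f"question {i}", f"answer {i}")
print(window.as_messages())  # only exchanges 1 and 2 remain
```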

In [30]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(memory_key="chat_history", return_messages=True)

# Chain setup

This code imports the [ConversationalRetrievalChain](https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html?highlight=ConversationalRetrievalChain) and `ChatOpenAI` classes from the langchain package and creates an instance of `ConversationalRetrievalChain` called `qa`.

The `ConversationalRetrievalChain` class is a high-level chain for retrieval-based question answering over a conversation. In this code, `qa` is built with the `from_llm()` method from three parts: an LLM, a retriever, and the memory buffer.

### LLM
The `ChatOpenAI` class from the `langchain.chat_models` module wraps OpenAI's chat models. In this code, an instance is created for the model "[gpt-3.5-turbo](https://platform.openai.com/docs/models)". Because gpt-3.5-turbo is a chat model, it is used through `ChatOpenAI` rather than the completion-style `OpenAI` class.

### Vector Store
The `faiss_index.as_retriever()` method returns a retriever that wraps the FAISS index created earlier. The retriever fetches the text chunks most relevant to each question, and the LLM then writes the answer from them. The number of chunks to fetch is passed through `search_kwargs={"k": 5}`.

### Chat History Memory
The `memory` variable is the memory buffer created earlier with the `ConversationBufferWindowMemory` class. It stores recent exchanges so that follow-up questions can be interpreted in the context of the conversation.

The `verbose` parameter controls whether the chain prints its intermediate steps. It is set to `False` here; set it to `True` to see the formatted prompts sent to the LLM.

In [None]:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7, max_tokens=2000),  # chat model used for both LLM calls
    faiss_index.as_retriever(search_kwargs={"k": 5}),  # fetch the 5 most similar chunks
    memory=memory,
    verbose=False,  # set to True to print intermediate steps and prompts
)


# Ask away

This code sends a question to the conversational retrieval chain. The question is passed to the chain as a dictionary with a "question" key, and the chain adds the stored chat history from memory.

When the conversation already has history, the chain makes two calls to the LLM ("gpt-3.5-turbo"):

- The first call combines the chat history and the current question and asks the LLM to rewrite it as a standalone question (for example, spelling out what "this entitlement" refers to).
- The chain then embeds the standalone question, queries the FAISS index, and retrieves the most similar text chunks. The second call passes these chunks and the standalone question to the LLM to produce the answer.

On the first turn there is no history, so the question is used as-is and only the answering call is made.

The result is stored in the `chat_result` dictionary, which contains the original `question`, the `chat_history`, and the generated `answer`. The answer itself can be accessed with `chat_result["answer"]`.
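The condense-then-answer flow can be sketched in plain Python with the LLM and the retriever passed in as functions. The prompt wording here is illustrative and is not LangChain's actual prompt template; `condense_question`, `answer_question`, `fake_llm`, and `fake_retrieve` are hypothetical stand-ins, so the example runs without an API key.

```python
def condense_question(llm, chat_history, question):
    """First LLM call: rewrite a follow-up question as a standalone question."""
    if not chat_history:
        return question  # first turn: nothing to resolve, so no LLM call
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)
    prompt = (
        "Rephrase the follow-up question as a standalone question.\n\n"
        f"{history}\nFollow-up: {question}\nStandalone question:"
    )
    return llm(prompt)


def answer_question(llm, retrieve, chat_history, question):
    """Retrieve chunks for the standalone question, then make the answering LLM call."""
    standalone = condense_question(llm, chat_history, question)
    context = "\n\n".join(retrieve(standalone))
    prompt = f"Answer using only this context.\n\n{context}\n\nQuestion: {standalone}\nAnswer:"
    return llm(prompt)


# Stubs so the sketch runs offline; `calls` records every prompt sent to the "LLM".
calls = []

def fake_llm(prompt):
    calls.append(prompt)
    if prompt.endswith("Standalone question:"):
        return "How is entitlement observable in narcissistic personality disorder?"
    return "stub answer"

def fake_retrieve(query):
    return [f"(chunk retrieved for: {query})"]

history = [
    ("Human", "Is this a common personality disorder?"),
    ("Assistant", "The estimated prevalence of narcissistic personality disorder is 1.6%."),
]
print(answer_question(fake_llm, fake_retrieve, history, "How is this entitlement observable?"))
print(len(calls), "LLM calls")
```

With history present, the stub records two calls; calling `answer_question` with an empty history makes only the answering call.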

In [58]:
query = "How is this entitlement observable?"
#query = "Who is Marcus Quintillianus?"

chat_result = qa({"question": query})
chat_result

{'question': 'How is this entitlement observable?',
 'chat_history': [HumanMessage(content='Is this a common personality disorder?', additional_kwargs={}, example=False),
  AIMessage(content='Based on the given information, the estimated prevalence of narcissistic personality disorder ranges from 0.0% to 6.2%, with a median prevalence of 1.6%.', additional_kwargs={}, example=False),
  HumanMessage(content="That's huge!", additional_kwargs={}, example=False),
  AIMessage(content='The estimated prevalence of narcissistic personality disorder is 1.6%.', additional_kwargs={}, example=False),
  HumanMessage(content='What are the gradations?', additional_kwargs={}, example=False),
  AIMessage(content='The different levels or degrees of narcissistic personality disorder are not specified in the given context.', additional_kwargs={}, example=False),
  HumanMessage(content='When does treats become a disorder in this context??', additional_kwargs={}, example=False),
  AIMessage(content='Traits b