# Chat with Self-Help Concepts Using RAG

## Prerequisites


In [1]:
%pip install -qU langchain tiktoken unstructured openai tqdm chromadb  # NOTE: Chroma needs Python 3.10.x

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


# OpenAI API Key

To use the OpenAI API, you need to obtain an API key from the [OpenAI website](https://platform.openai.com/account/api-keys). The key identifies your account and authorizes your requests to the API. The cell below prompts for the key with `getpass` (so it is neither echoed nor saved in the notebook) and stores it in the `OPENAI_API_KEY` environment variable, where LangChain's OpenAI classes look for it. This way the key is never hardcoded in the script.

In [2]:
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass("OPENAI_API_KEY")

# Embeddings setup

This code initializes an instance of the [OpenAIEmbeddings](https://python.langchain.com/en/latest/reference/modules/embeddings.html?highlight=embeddings#langchain.embeddings.OpenAIEmbeddings) class and assigns it to the variable `embeddings`. An [embedding](https://platform.openai.com/docs/guides/embeddings) represents a piece of text as a vector of numbers, so that texts with similar meaning end up close to each other in vector space. The `OpenAIEmbeddings` class does not compute embeddings locally: it sends text to OpenAI's embeddings API and returns the vectors produced by OpenAI's pre-trained embedding model (`text-embedding-ada-002` by default, see the table below).

Once you have an `OpenAIEmbeddings` instance, you can obtain the embedding vector for any chunk of text. Embeddings are useful for a variety of [natural language processing](https://en.wikipedia.org/wiki/Natural_language_processing) (NLP) tasks, such as search, clustering, and text classification. In this notebook we use them for [semantic search](https://en.wikipedia.org/wiki/Semantic_search) with a [vector database](https://www.youtube.com/watch?v=klTvEwg3oJ4&ab_channel=Fireship).
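Semantic search boils down to comparing vectors. The minimal sketch below ranks documents by cosine similarity to a query vector in plain Python. The three-dimensional vectors are made-up toy values standing in for real 1536-dimensional embeddings; in the notebook the vectors would come from `embeddings.embed_query()` and `embeddings.embed_documents()`, and the vector database does the ranking with its own distance function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[str]:
    """Return document names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy 3-d "embeddings" (hypothetical values, for illustration only).
toy_docs = {
    "procrastination": [0.9, 0.1, 0.0],
    "habit loop": [0.2, 0.9, 0.1],
    "deep work": [0.1, 0.2, 0.9],
}
toy_query = [0.8, 0.3, 0.1]  # imagine: "Why do I keep delaying my work?"

print(rank_by_similarity(toy_query, toy_docs))
# ['procrastination', 'habit loop', 'deep work']
```

A real vector store does the same thing at scale, using an approximate nearest-neighbor index instead of scoring every document.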

## Model

| Name | Tokenizer | Max input tokens | Output dimensions |
| :--- | :--- | ---: | ---: |
| text-embedding-ada-002 | cl100k_base | 8191 | 1536 |




In [3]:
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Splitter setup

The [RecursiveCharacterTextSplitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) is a text splitting tool that takes in a large text document as input and splits it into smaller chunks for downstream processing. Here's what each parameter in the splitter setup means:

- `chunk_size`: The maximum size of each chunk, measured with `length_function`. In this case, chunks are at most 750 characters long. (A piece that none of the separators can split further may still exceed this limit.)

- `chunk_overlap`: The maximum number of characters that adjacent chunks may share. In this case, up to 50 characters at the end of one chunk can be repeated at the start of the next.

- `length_function`: The function used to measure the length of a piece of text. In this case, `len` is used, so lengths are counted in characters, not tokens.

- `is_separator_regex`: Whether the entries in `separators` are regular expressions. `False` means they are matched as literal strings.

- `separators`: The strings to split on, tried in order. The splitter splits on the first separator that occurs in the text and only falls back to the following ones for pieces that are still longer than `chunk_size`. With `'.'` first, the text is effectively split into sentences.

Together, these parameters determine how the input text is split into smaller chunks. The splitter cuts the text at sentence boundaries and packs the sentences into chunks of at most 750 characters, with up to 50 characters of overlap between adjacent chunks, until the entire input text has been processed. Because the overlap is built from whole pieces, adjacent chunks often share no text at all when most sentences are longer than 50 characters. The setup balances chunks small enough for precise retrieval against enough overlap to reduce the risk of losing context at chunk boundaries.
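The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified greedy packer in plain Python. `pack_chunks` is a hypothetical helper written for this illustration, loosely modeled on how LangChain merges split pieces; it is not the library's code, and it assumes the text has already been split into sentence pieces.

```python
def pack_chunks(pieces: list[str], chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedily join pieces into chunks of at most chunk_size characters.

    When a chunk is full, trailing pieces totalling at most chunk_overlap
    characters are carried over as the start of the next chunk. A single
    piece longer than chunk_size still becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []  # pieces of the chunk being built

    def size() -> int:
        return sum(len(p) for p in current)

    for piece in pieces:
        if current and size() + len(piece) > chunk_size:
            chunks.append("".join(current))
            # Drop pieces from the front until the remainder fits the overlap
            # budget and leaves room for the new piece.
            while current and (size() > chunk_overlap or size() + len(piece) > chunk_size):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append("".join(current))
    return chunks


sentences = [
    "Habits form loops. ",
    "A cue triggers a routine. ",
    "The routine earns a reward. ",
    "Rewards reinforce the cue. ",
]
for piece in pack_chunks(sentences, chunk_size=60, chunk_overlap=30):
    print(repr(piece))
# 'Habits form loops. A cue triggers a routine. '
# 'A cue triggers a routine. The routine earns a reward. '
# 'The routine earns a reward. Rewards reinforce the cue. '
```

With `chunk_overlap=30`, the 26-character middle sentences fit in the overlap budget and are repeated. With the notebook's setting of 50 and typical book sentences well over 50 characters, the same rule usually carries nothing over.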

In [11]:
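from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=750,            # max characters per chunk
    chunk_overlap=50,          # max characters shared by adjacent chunks
    length_function=len,       # measure length in characters
    is_separator_regex=False,  # separators are literal strings
    # Tried in order: sentence ends first, then line breaks.
    # ('\n\n' comes after '\n', so in practice it is never reached.)
    separators=['.', '!', '?', '\n', '\n\n'],
)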

# Load (and split) documents

In [12]:
from langchain.document_loaders import TextLoader, DirectoryLoader
from tqdm import tqdm

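text_loader_kwargs = {'autodetect_encoding': True}  # let TextLoader detect each file's encoding
loader = DirectoryLoader(
    'txt/',
    glob='**/*.txt',        # every .txt file under txt/, including subfolders
    loader_cls=TextLoader,
    loader_kwargs=text_loader_kwargs,
    show_progress=True,     # progress bar (uses tqdm)
)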

docs = loader.load_and_split(text_splitter=text_splitter)
len(docs)


100%|██████████| 6/6 [00:00<00:00, 661.39it/s]


3914

In [15]:
docs[555]

Document(page_content='. It\'s so fulfilling now when a new artist plays me their single and says, "This is going to be the next HAYA." After Andrew Poll built his pregnancy prediction machine, after he identified hundreds of thousands of female shoppers who were probably pregnant, after someone pointed out that some, in fact, most, of those women might be a little upset if they received an advertisement making it obvious Target knew their reproductive status. Everyone decided to take a step back and consider their options. The marketing department thought it might be wise to conduct a few small experiments before rolling out a national campaign', metadata={'source': 'txt/The Power Of Habit - Charles Duhigg.txt'})
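The leading `. ` in this chunk comes from how the splitter keeps its separators. In the LangChain versions this notebook targets, the default (`keep_separator=True`) splits with a capturing regular expression and attaches each separator to the start of the next piece, so a chunk that begins after a sentence boundary starts with `. ` while its own final period ends up in the following chunk. `split_keep_separator` below is a hypothetical, simplified re-creation of that behavior, not LangChain's code.

```python
import re


def split_keep_separator(text: str, separator: str) -> list[str]:
    """Split `text` on a literal `separator`, attaching each separator to the
    START of the piece that follows it (simplified re-creation for illustration)."""
    # The capturing group makes re.split return the separators too:
    # [text, sep, text, sep, ..., text]
    parts = re.split(f"({re.escape(separator)})", text)
    pieces = [parts[0]] + [parts[i] + parts[i + 1] for i in range(1, len(parts), 2)]
    return [p for p in pieces if p]  # drop empty strings


print(split_keep_separator("Cue. Routine. Reward.", "."))
# ['Cue', '. Routine', '. Reward', '.']
```

This is why the `cleanup_chunk` function in the next cell strips a leading `. ` and appends a period to each chunk.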

In [16]:
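from langchain.schema import Document


def cleanup_chunk(chunk: Document) -> Document:
    """Normalize a chunk produced by the sentence-level splitter (modifies it in place)."""
    text = chunk.page_content

    # The splitter attaches each '.' to the start of the following piece,
    # so chunks begin with ". " -- drop that leftover period.
    text = text.removeprefix(". ")  # str.removeprefix needs Python 3.9+

    # Replace newlines with spaces and collapse runs of spaces into one.
    text = text.replace('\n', ' ')
    while '  ' in text:
        text = text.replace('  ', ' ')
    text = text.strip()

    # Give the chunk back the full stop that was moved to the next chunk.
    if not text.endswith("."):
        text += "."

    chunk.page_content = text
    return chunk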

In [18]:
docs = [cleanup_chunk(chunk) for chunk in tqdm(docs)]
docs[555]

100%|██████████| 3914/3914 [00:00<00:00, 220423.83it/s]


Document(page_content='It\'s so fulfilling now when a new artist plays me their single and says, "This is going to be the next HAYA." After Andrew Poll built his pregnancy prediction machine, after he identified hundreds of thousands of female shoppers who were probably pregnant, after someone pointed out that some, in fact, most, of those women might be a little upset if they received an advertisement making it obvious Target knew their reproductive status. Everyone decided to take a step back and consider their options. The marketing department thought it might be wise to conduct a few small experiments before rolling out a national campaign.', metadata={'source': 'txt/The Power Of Habit - Charles Duhigg.txt'})

In [19]:
chunk_testset = docs[88:93]

for chunk in chunk_testset:
    print(chunk.page_content)
    print('---')

It tastes so good. After all, one dose of processed meat, salty fries, and sugary soda poses a relatively small health risk, right? It's not like you do it all the time. But habits emerge without our permission. Studies indicate that families usually don't intend to eat fast food on a regular basis. What happens is that a once-a-month pattern slowly becomes once a week, and then twice a week as the cues and rewards create a habit until the kids are consuming an unhealthy amount of hamburgers and fries.
---
When researchers at the University of North Texas and Yale tried to understand why families gradually increased their fast food consumption, they found a series of cues and rewards that most customers never knew were influencing their behaviors. They discovered the habit loop. Every McDonald's, for instance, looks the same the company deliberately tries to standardize stores architecture and what employees say to customers, so everything is a consistent cue to trigger eating routines

In [20]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(encoding)

<Encoding 'cl100k_base'>


In [21]:
chunk = chunk_testset[0].page_content
num_tokens = encoding.encode(chunk)  # list of token IDs; its length is the token count
print(num_tokens)

[2181, 36263, 779, 1695, 13, 4740, 682, 11, 832, 19660, 315, 15590, 13339, 11, 74975, 53031, 11, 323, 31705, 661, 39962, 34103, 264, 12309, 2678, 2890, 5326, 11, 1314, 30, 1102, 596, 539, 1093, 499, 656, 433, 682, 279, 892, 13, 2030, 26870, 34044, 2085, 1057, 8041, 13, 19241, 13519, 430, 8689, 6118, 1541, 956, 30730, 311, 8343, 5043, 3691, 389, 264, 5912, 8197, 13, 3639, 8741, 374, 430, 264, 3131, 7561, 23086, 5497, 14297, 9221, 3131, 264, 2046, 11, 323, 1243, 11157, 264, 2046, 439, 279, 57016, 323, 21845, 1893, 264, 14464, 3156, 279, 6980, 527, 35208, 459, 53808, 3392, 315, 57947, 388, 323, 53031, 13]


In [22]:
print(f"{len(num_tokens)} tokens, {len(chunk)} characters.")

107 tokens, 507 characters.


In [23]:
total_chars = 0
total_tokens = 0

for chunk in chunk_testset:
    num_chars = len(chunk.page_content)
    total_chars += num_chars
    num_tokens = len(encoding.encode(chunk.page_content))
    total_tokens += num_tokens
    print(f"{num_tokens} tokens, {num_chars} characters.") 

print('---')
print(f"Total {total_tokens} tokens, {total_chars} characters.")

107 tokens, 507 characters.
83 tokens, 489 characters.
142 tokens, 727 characters.
154 tokens, 735 characters.
155 tokens, 734 characters.
---
Total 641 tokens, 3192 characters.
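The sample totals above imply roughly five characters per token for this text. The short calculation below (plain Python, reusing the printed totals of 3192 characters and 641 tokens) turns that into an estimate of how many tokens a full 750-character chunk costs and compares it with the 8191-token input limit of `text-embedding-ada-002` from the model table. It is an estimate from five chunks only.

```python
# Totals printed by the cell above (five sample chunks).
sample_chars, sample_tokens = 3192, 641
chars_per_token = sample_chars / sample_tokens

chunk_size = 750         # splitter setting: max characters per chunk
max_input_tokens = 8191  # text-embedding-ada-002 input limit (model table)

tokens_per_full_chunk = chunk_size / chars_per_token
print(f"{chars_per_token:.2f} chars/token, ~{tokens_per_full_chunk:.0f} tokens per full chunk, "
      f"{tokens_per_full_chunk / max_input_tokens:.1%} of the input limit")
# 4.98 chars/token, ~151 tokens per full chunk, 1.8% of the input limit
```

So the chunk size here is driven by retrieval precision and prompt length rather than by the embedding model's limit: four retrieved chunks add only about 600 tokens to the answer prompt.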


# Vector store setup

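We use [Chroma](https://www.trychroma.com/) (ChromaDB) as the vector store. Each chunk is stored under a random UUID, and the collection is persisted to `./chroma_db` so the embeddings do not have to be recomputed on every run.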

In [24]:
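import os
import uuid
from langchain.vectorstores import Chroma

PERSIST_DIR = "./chroma_db"
COLLECTION = "pdev-books"

if os.path.isdir(PERSIST_DIR):
    # Reuse the collection saved by a previous run (no new embedding requests)
    db = Chroma(
        collection_name=COLLECTION,
        embedding_function=embeddings,
        persist_directory=PERSIST_DIR,
    )
else:
    # First run: embed every chunk and write the collection to disk
    ids = [str(uuid.uuid4()) for _ in range(len(docs))]  # one random UUID per chunk
    db = Chroma.from_documents(
        docs,
        embeddings,
        ids=ids,
        collection_name=COLLECTION,
        persist_directory=PERSIST_DIR,
    )
    # Older chromadb releases (< 0.4) may also need an explicit db.persist() call here.

query = "What is procrastination?"
results = db.similarity_search(query)  # a new name, so `docs` is not overwritten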



In [39]:
query = "What is deep work?"
answer = db.similarity_search(query)
for a in answer:
    print(a)
    print('---')
    

page_content="In this book, however, I'm interested in his commitment to the following skill, which almost certainly played a key role in his accomplishments. Deep Work. activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate. Deep work is necessary to wring every last drop of value out of your current intellectual capacity. We now know from decades of research, in both psychology and neuroscience, that the state of mental strain that accompanies deep work is also necessary to improve your abilities." metadata={'source': 'txt/Deep Work - Cal Newport.txt'}
---
page_content="The Deep Work Hypothesis. The ability to perform deep work is becoming increasingly rare at exactly the same time it is becoming increasingly valuable in our economy. As a consequence, the few who cultivate this skill and then make it the core of their working life will thrive.

In [42]:
db.similarity_search_with_score('Am I too lazy to take responsibility for a project?')  # score is a distance (Chroma's default space is squared L2): lower is better

[(Document(page_content="So I have invented another myth for myself, that I'm irresponsible. I'm actively irresponsible. I tell everyone I don't do anything. If anyone asks me to be on a committee for admissions, no, I tell them, I'm irresponsible. Feynman was adamant in avoiding administrative duties, because he knew they would only decrease his ability to do the one thing that mattered most in his professional life – to do real good physics work. Feynman, we can assume, was probably bad at responding to emails, and would likely switch universities if you would try to move him into an open office or demand that he tweet. about what matters provides clarity about what does not.", metadata={'source': 'txt/Deep Work - Cal Newport.txt'}),
  0.37731269001960754),
 (Document(page_content="People would have worked with me, coached me. Or, the system let me get away with more and more. I really liked it less and less. He got mad at the system. Hi there, John. This was your life. Ever think of
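Why does ranking by Chroma's default distance give the same order as cosine similarity? OpenAI's documentation states that its embeddings are normalized to length 1, and for unit vectors the squared Euclidean distance equals 2 - 2 × (cosine similarity). The sketch below checks that identity with two hand-picked unit vectors (toy values, not real embeddings) and, assuming the collection uses Chroma's default squared-L2 space, converts the 0.377 score above back to a cosine similarity of about 0.81.

```python
import math


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the metric behind Chroma's default 'l2' space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Two toy unit vectors (length 1), standing in for normalized embeddings.
u = [0.6, 0.8, 0.0]
v = [0.0, 0.6, 0.8]

dist = squared_l2(u, v)
cos_sim = cosine_similarity(u, v)
print(f"squared L2 = {dist:.4f}, 2 - 2*cos = {2 - 2 * cos_sim:.4f}")

# Map the distance score returned above back to a cosine similarity.
score = 0.37731269001960754
print(f"cosine similarity ~ {1 - score / 2:.3f}")  # cosine similarity ~ 0.811
```

If the collection had been created with a different `hnsw:space` setting (for example cosine), the scores would be on a different scale, so it is worth checking before interpreting them.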

# Test the vector store


In [38]:
query_result = db.similarity_search("What is procrastination?", k=4)

for chunk in query_result:
    print(f"{chunk.metadata}: {chunk.page_content}")

{'source': 'txt/Atomic Habits - James Clear.txt'}: Most of us are experts at avoiding criticism. It doesn't feel good to fail or to be judged publicly, so we tend to avoid situations where that might happen. And that's the biggest reason why you slip into motion rather than taking action. You want to delay failure. It's easy to be in motion and convince yourself that you're still making progress. You think, "I've got conversations going with four potential clients right now. This is good. We're moving in the right direction." Or, "I brainstormed some ideas for that book I want to write. This is coming together." Motion makes you feel like you're getting things done. But really, you're just preparing to get something done.
{'source': 'txt/The Power Of Habit - Charles Duhigg.txt'}: When people marshal their will power to quit procrastinating, they often succeed, at first. Over time, though, their will power muscle starts to fade. The book they're supposed to be studying or the memo they'


# Chat memory

This code imports the [ConversationBufferWindowMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer_window.html) class from the `langchain.memory` module and creates an instance of it called `memory`. This class keeps the conversation in a sliding window: only the last `k` exchanges (question and answer pairs) are retained, and older ones are dropped. `k` defaults to 5.

Besides the window size `k`, the constructor takes the two arguments used here: `memory_key` and `return_messages`. The `memory_key` parameter is the name under which the stored history is handed to the chain, so it must match the input name the chain expects. The `return_messages` parameter controls the format of that history: `True` returns a list of message objects (`HumanMessage`, `AIMessage`), while `False` returns a single formatted string.

In this code, `memory_key` is set to "chat_history", the input name that `ConversationalRetrievalChain` expects. `return_messages` is set to `True`, because the chain's default history formatter works with message objects (or question/answer tuples) rather than a pre-formatted string.
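The window behavior can be illustrated with a plain-Python stand-in built on `collections.deque(maxlen=k)`. `WindowMemory` is a hypothetical class, not LangChain's implementation: it keeps the last `k` question/answer exchanges and returns them either as a list of turns (roughly what `return_messages=True` hands to the chain) or as one formatted transcript string (`return_messages=False`).

```python
from collections import deque


class WindowMemory:
    """Hypothetical stand-in: remembers only the last k (question, answer) exchanges."""

    def __init__(self, k: int = 5):
        self.turns = deque(maxlen=k)  # the oldest exchange is dropped automatically

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def load(self, return_messages: bool = True):
        if return_messages:
            return list(self.turns)  # structured turns
        # otherwise flatten everything into one transcript string
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)


memory_demo = WindowMemory(k=2)
memory_demo.save("What is deep work?", "Focused, distraction-free effort.")
memory_demo.save("Why is it rare?", "Distraction is everywhere.")
memory_demo.save("How do I start?", "Schedule blocks of focus time.")
print(memory_demo.load())
# [('Why is it rare?', 'Distraction is everywhere.'), ('How do I start?', 'Schedule blocks of focus time.')]
```

The window keeps prompt size bounded: no matter how long the conversation runs, at most `k` exchanges are sent back to the LLM.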

In [72]:
from langchain.memory import ConversationBufferWindowMemory

# Keep the last 5 exchanges (the default k) as message objects under the "chat_history" key
memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)

# Chain setup

This code imports the classes it needs from the langchain package and creates an instance of the [ConversationalRetrievalChain](https://python.langchain.com/en/latest/modules/chains/index_examples/chat_vector_db.html?highlight=ConversationalRetrievalChain) class called `qa`.

The `ConversationalRetrievalChain` class is a high-level chain for building a conversational agent that answers questions from retrieved documents. In this code, `qa` is built with the `from_llm()` class method, which takes an LLM, a retriever and, optionally, a memory object.

### LLM
The `ChatOpenAI` class from the `langchain.chat_models` module wraps OpenAI's chat models. In this code it is configured with the model "[gpt-3.5-turbo](https://platform.openai.com/docs/models)"; because that is a chat model, `ChatOpenAI` is used rather than the completion-style `OpenAI` class from `langchain.llms`.

### Vector Store
The `db.as_retriever()` method wraps the Chroma vector store created earlier in a retriever. With `search_kwargs={"k": 4}` it returns the four chunks most similar to each question, which the LLM then uses as context for its answer.

### Chat History Memory
The `memory` variable is the `ConversationBufferWindowMemory` created earlier. It supplies the `chat_history` input that the chain requires and stores each new question and answer after the chain runs, so follow-up questions can refer to earlier turns.

The `verbose=True` parameter makes the chain print its intermediate steps (including the prompts it sends to the LLM) while it runs.

In [78]:
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

qa = ConversationalRetrievalChain.from_llm(
    # gpt-3.5-turbo is a chat model, so it goes through the ChatOpenAI wrapper
    ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7, max_tokens=1000),
    # return the 4 most similar chunks from the Chroma store
    db.as_retriever(search_kwargs={"k": 4}),
    # provides the required "chat_history" input and records each exchange
    memory=memory,
    verbose=True,
)


# Ask away

This code snippet involves a conversational agent that performs question-answering tasks. The user inputs a question, which is passed to the agent as a dictionary with a "question" key.

The chain then answers in up to three steps and calls the LLM ("gpt-3.5-turbo") at most twice:

- If there is earlier chat history, the first LLM call combines that history with the current question and asks the model to rephrase it as a standalone question. On the first turn the history is empty, so this call is skipped and the question is used as-is.
- The standalone question is embedded and used to query the Chroma vector store, which returns the most similar text chunks (ranked by vector distance).
- The second LLM call receives the retrieved chunks together with the standalone question and generates the answer.
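The control flow of these steps can be sketched with stub functions. Everything here is hypothetical: `fake_llm` and `fake_retriever` stand in for the chat model and the Chroma retriever so the flow runs without an API key, and `conversational_retrieval` is an illustration of the idea, not LangChain's implementation. In the LangChain versions of this period, the rephrasing call is skipped when the history is empty, which the sketch mirrors.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub: returns the last line of the prompt unchanged."""
    return prompt.splitlines()[-1]


def fake_retriever(question: str, k: int = 4) -> list[str]:
    """Hypothetical retriever stub: returns k placeholder chunks for the question."""
    return [f"chunk {i} for: {question}" for i in range(k)]


def conversational_retrieval(question, chat_history, llm=fake_llm, retriever=fake_retriever):
    """Sketch of the rephrase -> retrieve -> answer flow described above."""
    llm_calls = 0

    # Step 1 (only when there is history): rewrite the follow-up as a standalone question.
    if chat_history:
        history = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
        standalone = llm(f"{history}\nRephrase the follow-up as a standalone question:\n{question}")
        llm_calls += 1
    else:
        standalone = question

    # Step 2: vector search with the standalone question (embedding lookup, no LLM call).
    chunks = retriever(standalone)

    # Step 3: answer from the retrieved chunks, all "stuffed" into a single prompt.
    context = "\n".join(chunks)
    answer = llm(f"Context:\n{context}\nQuestion:\n{standalone}")
    llm_calls += 1

    return {"question": question, "answer": answer, "llm_calls": llm_calls}


first = conversational_retrieval("What is procrastination?", chat_history=[])
follow_up = conversational_retrieval(
    "How do I beat it?", chat_history=[("What is procrastination?", first["answer"])]
)
print(first["llm_calls"], follow_up["llm_calls"])  # 1 2
```

Step 1 is what lets a vague follow-up such as "How do I beat it?" still retrieve relevant chunks: the vector search runs on the rewritten, self-contained question instead of the pronoun-laden follow-up.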

The result is stored in `chat_result`, a dictionary that contains the inputs (the `question` and the `chat_history` supplied by the memory) as well as the generated answer. The answer itself is available under the `"answer"` key.

In [79]:
query = "What are possible ways to counter procrastination?"
#query = "Who is Marcus Quintillianus?"

chat_result = qa({"question": query})
chat_result["answer"]
