# 01. Indexes
In this notebook we cover indexes: their creation, usage and maintenance.

In [2]:
import os
import openai
from dotenv import load_dotenv, find_dotenv
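import langchain as lc
# Import the submodules used below explicitly, so `lc.<submodule>` does not
# depend on what `langchain/__init__.py` happens to import.
import langchain.chains
import langchain.document_loaders
import langchain.embeddings
import langchain.memory
import langchain.text_splitter
import langchain.vectorstores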

import warnings
warnings.filterwarnings('ignore')

load_dotenv(find_dotenv())
openai.api_key = os.environ["OPENAI_API_KEY"]

#### 1.1 Loaders
To use our own dataset with an LLM, we first have to load the documents. The following steps split them into chunks, embed them, and store them in a vector database.

In [3]:
# Assume we have a folder of FAQ text files that the assistant
# should use when answering questions.
loader = lc.document_loaders.DirectoryLoader("./FAQ",
                                             glob="**/*.txt",
                                             loader_cls=lc.document_loaders.TextLoader,
                                             show_progress=True)
docs = loader.load()  # It should load 3 files.

100%|██████████| 3/3 [00:00<00:00, 292.73it/s]
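To make the loading step concrete without LangChain or the `./FAQ` folder, here is a minimal pure-Python sketch of what a directory loader does, under our own assumptions: walk the folder, keep the files matching the glob, and wrap each file's text in a record whose metadata holds its `source` path (like the `metadata={'source': ...}` in the next section's output). `Doc` and `load_directory` are hypothetical names, not LangChain APIs.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_directory(root: str, pattern: str = "**/*.txt") -> list[Doc]:
    """Read every file under `root` whose path matches `pattern` into a Doc."""
    loaded = []
    for path in sorted(Path(root).glob(pattern)):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            loaded.append(Doc(page_content=text, metadata={"source": str(path)}))
    return loaded


# Usage: build a throwaway FAQ folder and load it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "General.txt").write_text("Q: What are your hours?\nA: 11 a.m. to 10 p.m.",
                                        encoding="utf-8")
    Path(tmp, "notes.md").write_text("not matched by the glob", encoding="utf-8")
    faq_docs = load_directory(tmp)

print(len(faq_docs))  # 1
```

The `**/*.txt` pattern matches `.txt` files in the folder itself and in any subfolder, which is why `notes.md` is skipped.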


#### 1.2 Text Splitter
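Next, we split the documents into smaller chunks. Each chunk should be small enough to embed on its own and to fit several chunks into one prompt, while the overlap between neighboring chunks keeps context from being lost at chunk boundaries.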

In [7]:
text_splitter = lc.text_splitter.RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100)

documents = text_splitter.split_documents(docs)
documents

[Document(page_content='Q: What are the hours of operation for your restaurant?\nA: Our restaurant is open from 11 a.m. to 10 p.m. from Monday to Saturday. On Sundays, we open at 12 p.m. and close at 9 p.m.\n\nQ: What type of cuisine does your restaurant serve?\nA: Our restaurant specializes in contemporary American cuisine with an emphasis on local and sustainable ingredients.', metadata={'source': 'FAQ\\General.txt'}),
 Document(page_content='Q: Do you offer vegetarian or vegan options?\nA: Yes, we have a range of dishes to cater to vegetarians and vegans. Please let our staff know about any dietary restrictions you have when you order.', metadata={'source': 'FAQ\\General.txt'}),
 Document(page_content="Q: What are the ingredients in your gluten-free options?\nA: Our gluten-free dishes are prepared using a variety of ingredients that don't contain gluten. Some options include our Quinoa Salad and our Grilled Chicken with Roasted Vegetables.", metadata={'source': 'FAQ\\Health.txt'}),
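The two splitter parameters are easiest to see on a toy string. The sketch below is a deliberately simplified, hypothetical splitter (`split_text` is our own name): it cuts fixed windows of `chunk_size` characters, each starting `chunk_size - chunk_overlap` characters after the previous one. `RecursiveCharacterTextSplitter` is smarter: it first tries to split on natural boundaries such as blank lines, newlines and spaces, so its chunks will not line up exactly with these windows.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters,
    where consecutive windows share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one; with the notebook's 500/100 setting, up to 100 characters of context carry over between neighboring chunks.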


#### 1.3 Embeddings
Now it is time to convert our text chunks into embeddings. Here we use OpenAI's embeddings, but any embedding model supported by LangChain would work as well.

In [8]:
embeddings = lc.embeddings.OpenAIEmbeddings()
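An embedding maps a text to a vector so that texts with similar meaning end up close together. OpenAI's embeddings are dense vectors produced by a trained model; purely for illustration, the sketch below swaps in a toy bag-of-words embedding over a hand-picked vocabulary (our assumption, not how OpenAI's model works) and compares vectors with cosine similarity.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word occurs in the text."""
    tokens = tokenize(text)
    return [float(tokens.count(word)) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocab = ["open", "hours", "vegan", "gluten", "price"]
question_vec = embed("When are you open? What are the hours?", vocab)
hours_vec = embed("Our opening hours: we are open from 11 a.m.", vocab)
vegan_vec = embed("We have vegan and vegetarian options.", vocab)

print(round(cosine_similarity(question_vec, hours_vec), 3))  # 1.0
print(round(cosine_similarity(question_vec, vegan_vec), 3))  # 0.0
```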

#### 1.4 Loading Text Embeddings (Vectors) into a Vector Database (FAISS)

In [12]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.7.4-cp39-cp39-win_amd64.whl (10.8 MB)
     ---------------------------------------- 10.8/10.8 MB 2.6 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.7.4


In [23]:
import pickle

vectorstore = lc.vectorstores.FAISS.from_documents(documents, embeddings)

with open("vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)
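`FAISS.from_documents` embeds every chunk with `embeddings` and stores the vectors in a FAISS index; the retriever later returns the chunks whose vectors lie closest to the embedded query. As far as we know, LangChain's FAISS wrapper uses a flat L2 index by default, which compares the query against every stored vector. The hypothetical `FlatVectorStore` below sketches that exact, brute-force search in plain Python; FAISS performs the same comparison far faster and also offers approximate indexes for large collections.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class FlatVectorStore:
    """Hypothetical in-memory store with exact nearest-neighbor search."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def search(self, query: list[float], k: int = 4) -> list[tuple[str, float]]:
        """Return up to k (text, distance) pairs, nearest first."""
        scored = [(text, l2_distance(query, vec))
                  for text, vec in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1])
        return scored[:k]


toy_index = FlatVectorStore()
toy_index.add("opening hours", [1.0, 0.0])
toy_index.add("vegan options", [0.0, 1.0])
toy_index.add("gluten-free dishes", [0.2, 0.9])

nearest = toy_index.search([0.1, 1.0], k=2)
print([text for text, _ in nearest])  # ['vegan options', 'gluten-free dishes']
```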

#### 1.5 Loading the Database

In [25]:
# The vectorstore is already in memory; this cell shows how to reload it from disk in a later session.

with open("vectorstore.pkl", "rb") as f:
    vectorstore = pickle.load(f)
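Pickling works here, but two caveats apply: `pickle.load` can execute arbitrary code, so only load files you created yourself, and a pickled object is tied to the library versions that produced it. LangChain's FAISS wrapper also provides `save_local` and `FAISS.load_local`, which may be the more robust choice (check their exact signatures for your installed version). The sketch below only illustrates the pickle round trip, on a plain dictionary standing in for the vector store; that dictionary layout is our own assumption.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a vector store: chunk texts plus their vectors.
toy_store = {"texts": ["opening hours", "vegan options"],
             "vectors": [[1.0, 0.0], [0.0, 1.0]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vectorstore.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy_store, f)
    # Only unpickle files you trust: pickle.load can run arbitrary code.
    with open(path, "rb") as f:
        restored_store = pickle.load(f)

print(restored_store == toy_store)  # True
```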

#### 1.6 Prompts
We can also define our own prompt. LangChain fills `{context}` with the retrieved document chunks and `{question}` with the user's query.

In [26]:
prompt_template = """You are a helpful assistant for our restaurant.

{context}

Question: {question}
Answer here:"""
PROMPT = lc.PromptTemplate(
    template=prompt_template, 
    input_variables=["context", "question"]
)
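With the `'stuff'` chain type used in the next section, the retrieved chunks are concatenated and substituted for `{context}`, and the user's query for `{question}`. The sketch below reproduces that substitution with `str.format`; joining chunks with a blank line is our assumption about the chain's default separator, and `TOY_TEMPLATE` and `build_prompt` are hypothetical names.

```python
TOY_TEMPLATE = """You are a helpful assistant for our restaurant.

{context}

Question: {question}
Answer here:"""


def build_prompt(chunks: list[str], question: str) -> str:
    """'Stuff' all retrieved chunks into the {context} slot of the template."""
    context = "\n\n".join(chunks)
    return TOY_TEMPLATE.format(context=context, question=question)


toy_prompt = build_prompt(
    ["Q: What are the hours of operation?\nA: 11 a.m. to 10 p.m."],
    "When does the restaurant open?",
)
print(toy_prompt)
```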

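#### 1.7 Chains
Chains combine the LLM with other processing steps. Here, `RetrievalQA` retrieves the relevant chunks from the vector store, inserts them into our prompt, and passes the result to the LLM.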

In [27]:
llm = lc.OpenAI()
qa = lc.chains.RetrievalQA.from_chain_type(llm=llm,
                                           chain_type='stuff',
                                           retriever=vectorstore.as_retriever(),
                                           chain_type_kwargs={"prompt":PROMPT})
query = "When does the restaurant open?"
qa.run(query)

' Our restaurant is open from 11 a.m. to 10 p.m. from Monday to Saturday. On Sundays, we open at 12 p.m. and close at 9 p.m.'
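Conceptually, `RetrievalQA` with `chain_type='stuff'` chains three steps: retrieve the relevant chunks, stuff them into the prompt, and send the prompt to the LLM. The sketch below wires those steps together with plain functions; `keyword_retriever` and `fake_llm` are hypothetical stand-ins for `vectorstore.as_retriever()` and `lc.OpenAI()`, so it runs without an API key.

```python
from typing import Callable

QA_TEMPLATE = "Use this context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical FAQ snippets keyed by a trigger word.
FAQ_SNIPPETS = {
    "open": "A: We open at 11 a.m. (12 p.m. on Sundays).",
    "vegan": "A: Yes, we have vegan dishes.",
}


def keyword_retriever(question: str) -> list[str]:
    """Stand-in retriever: return snippets whose trigger word occurs in the question."""
    q = question.lower()
    return [text for word, text in FAQ_SNIPPETS.items() if word in q]


sent_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in LLM: records the prompt and returns a canned reply."""
    sent_prompts.append(prompt)
    return "We open at 11 a.m."


def retrieval_qa(question: str,
                 retrieve: Callable[[str], list[str]],
                 llm: Callable[[str], str]) -> str:
    chunks = retrieve(question)                        # 1. retrieve
    prompt = QA_TEMPLATE.format(context="\n\n".join(chunks),
                                question=question)     # 2. stuff into the prompt
    return llm(prompt)                                 # 3. ask the LLM


answer = retrieval_qa("When does the restaurant open?", keyword_retriever, fake_llm)
print(answer)  # We open at 11 a.m.
```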

#### 1.8 Memory
In the example above, each request stands alone. A great strength of an LLM, however, is that it can take the entire chat history into account when responding. For that, a chat history must be built up from the previous questions and answers, which LangChain's memory classes make easy.

In [28]:
memory = lc.memory.ConversationBufferMemory(
    memory_key='chat_history',
    return_messages=True,
    output_key="answer")
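`ConversationBufferMemory` keeps the whole conversation, turn by turn, and hands it to the chain under `memory_key`. The hypothetical `BufferMemory` class below sketches that idea in plain Python: it stores every question/answer pair in order and renders them as text. It does not model `return_messages=True`, which makes LangChain return message objects (as in the output at the end of this notebook) instead of a single string.

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemory:
    """Hypothetical buffer memory: keeps every turn, in order, as (role, text) pairs."""
    messages: list[tuple[str, str]] = field(default_factory=list)

    def save_turn(self, question: str, answer: str) -> None:
        """Append one question/answer exchange to the history."""
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def as_text(self) -> str:
        """Render the whole history as one string, oldest turn first."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


toy_memory = BufferMemory()
toy_memory.save_turn("Do you offer vegan food?", "Yes, we have vegan dishes.")
toy_memory.save_turn("How much does it cost?", "Prices vary by dish.")
print(toy_memory.as_text())
```

Because nothing is ever dropped, a buffer like this grows with every turn; long conversations eventually need trimming or summarizing to fit the model's context window.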

#### 1.9 Use Memory in Chains
The memory object can now be passed to a chain. You can see it working in the example below: when the follow-up question refers to "it", the bot understands that "it" means the vegan food from the previous question.

In [29]:
# Not every chain type supports memory, so we use
# ConversationalRetrievalChain here.

qa = lc.chains.ConversationalRetrievalChain.from_llm(
    llm=llm,
    memory=memory,
    retriever=vectorstore.as_retriever(),
    combine_docs_chain_kwargs={"prompt": PROMPT},
)


query = "Do you offer vegan food?"
qa({"question": query})
qa({"question": "How much does it cost?"})

{'question': 'How much does it cost?',
 'chat_history': [HumanMessage(content='Do you offer vegan food?', additional_kwargs={}, example=False),
  AIMessage(content=' Yes, we have a range of vegan-friendly dishes, including salads, soups, and entrees. Please let our staff know about any dietary restrictions you have when you order.', additional_kwargs={}, example=False),
  HumanMessage(content='How much does it cost?', additional_kwargs={}, example=False),
  AIMessage(content=' The price of our vegan-friendly dishes varies depending on the ingredients used. Please ask our staff for the exact prices when you order.', additional_kwargs={}, example=False)],
 'answer': ' The price of our vegan-friendly dishes varies depending on the ingredients used. Please ask our staff for the exact prices when you order.'}
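Why does "it" resolve to the vegan food? As we understand it, `ConversationalRetrievalChain` first asks the LLM to rewrite the follow-up question, together with the chat history, into a standalone question, and only then retrieves documents and answers. The sketch below imitates that two-step flow with stub functions; the condense template wording, `conversational_turn`, and the stubs are our own assumptions, not LangChain's implementation.

```python
from typing import Callable

# Our own wording for a "condense the question" prompt (not LangChain's exact text).
CONDENSE_TEMPLATE = (
    "Rewrite the follow-up question as a standalone question, "
    "using the conversation for context.\n\n"
    "Conversation:\n{chat_history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def conversational_turn(history: list[tuple[str, str]],
                        question: str,
                        llm: Callable[[str], str],
                        answer_with_retrieval: Callable[[str], str]) -> str:
    """Condense the question using the history, answer it, then record the turn."""
    if history:
        chat = "\n".join(f"{role}: {text}" for role, text in history)
        standalone = llm(CONDENSE_TEMPLATE.format(chat_history=chat, question=question))
    else:
        standalone = question  # nothing to resolve on the first turn
    reply_text = answer_with_retrieval(standalone)
    # The history keeps the user's original wording, not the rewritten question.
    history.append(("human", question))
    history.append(("ai", reply_text))
    return reply_text


# Hypothetical stubs so the sketch runs without an API key.
condense_calls: list[str] = []


def stub_llm(prompt: str) -> str:
    condense_calls.append(prompt)
    return "How much does the vegan food cost?"


def stub_answer(standalone_question: str) -> str:
    return f"(answer to: {standalone_question})"


chat_history: list[tuple[str, str]] = []
conversational_turn(chat_history, "Do you offer vegan food?", stub_llm, stub_answer)
reply = conversational_turn(chat_history, "How much does it cost?", stub_llm, stub_answer)
print(reply)  # (answer to: How much does the vegan food cost?)
```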