# Workflow
1. Download a Llama 2 model (the quantized builds published by TheBloke on Hugging Face).
2. Load the model with ctransformers, since running Llama 2 through transformers is impractical without a GPU.

   ctransformers provides Python bindings for transformer models implemented in C/C++.
3. Sentence transformer -> the embedding model used to create the embeddings.
4. Vector store -> stores the embeddings and supports similarity search over them. Options:

   1. ChromaDB
   2. FAISS (CPU)
   3. Qdrant

Data -> preprocessing (LangChain) -> embedding (sentence transformer) -> vector store (FAISS CPU)


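User -> prompt (screen) -> query against the vector store (FAISS CPU)

In the vector store -> chunks are matched to the query by cosine similarity between embeddings

Low latency, since the FAISS index is searched locally with no network round trip

Metadata (for example the source file and page number) is stored with each chunk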
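
As a rough illustration of what "cosine similarity in the vector store" means, the sketch below ranks a few stored chunks against a query vector in plain Python. The three-dimensional vectors, the chunk labels, and the helper names `cosine_similarity` and `top_k` are made up for this example; a real store such as FAISS works on the much larger sentence-transformer embeddings and uses an optimized index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy "embeddings" (hypothetical values, for illustration only)
store = [
    ("chunk A", [0.9, 0.1, 0.0]),
    ("chunk B", [0.1, 0.9, 0.0]),
    ("chunk C", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]

results = top_k(query, store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The chunks whose vectors point in nearly the same direction as the query come first, which is the behaviour the retrieval step relies on.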

###################################################################################################

# Ingest.py

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings  # if you get an error, replace it with a sentence transformer embedding
from langchain.vectorstores import FAISS

In [4]:
DATA_PATH = 'data/'
DB_FAISS_PATH = 'vectorstores/db_failures'

**Intuition for chunk_size**

GPT-3.5-turbo supports a context window of 4096 tokens. That means input tokens plus generated (completion) output tokens cannot total more than 4096 without hitting an error.

So we must stay below this limit. Assume a very safe budget of ~2000 tokens for the input prompt into gpt-3.5-turbo, which leaves ~2000 tokens for conversation history and completion.

With this ~2000 token limit we may want to include five snippets of relevant information, meaning each snippet can be no more than 400 tokens long.
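
The budget arithmetic above can be written out as a small calculation. This is only a sketch: the constant names are invented for the example, and the figure of roughly 4 characters per token is a common rule of thumb for English text, not an exact tokenizer measurement.

```python
CONTEXT_WINDOW = 4096   # total tokens the model accepts (prompt + completion)
PROMPT_BUDGET = 2000    # tokens reserved for retrieved snippets in the prompt
NUM_SNIPPETS = 5        # how many retrieved chunks we want to include

# Upper bound on the size of each retrieved snippet, in tokens
max_tokens_per_snippet = PROMPT_BUDGET // NUM_SNIPPETS

# Tokens left over for conversation history and the completion
remaining_tokens = CONTEXT_WINDOW - PROMPT_BUDGET

# Assumption: ~4 characters per token (rough heuristic for English text)
CHARS_PER_TOKEN = 4
chunk_size_chars = 500
approx_tokens_per_chunk = chunk_size_chars // CHARS_PER_TOKEN

print(max_tokens_per_snippet, remaining_tokens, approx_tokens_per_chunk)
```

Under that heuristic, a 500-character chunk is well inside the per-snippet limit, so five such chunks fit comfortably in the prompt budget.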

The function below loads the PDF documents, extracts their text content, and splits that content into smaller text chunks.


In [None]:
# create vector DB
def create_vector_db():
    """Build a FAISS index from the PDFs in DATA_PATH and save it to DB_FAISS_PATH."""
    # The input documents are PDFs, so use PyPDFLoader and match *.pdf files in DATA_PATH
    loader = DirectoryLoader(DATA_PATH, glob='*.pdf', loader_cls=PyPDFLoader)
    documents = loader.load()  # list of Document objects (PyPDFLoader yields one per page)

    # chunk_size -> maximum length of each chunk, measured in characters by default (not tokens)
    # chunk_overlap -> number of characters shared by consecutive chunks (sliding window concept)
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    texts = text_splitter.split_documents(documents)

    # Embed each chunk on the CPU with a sentence-transformers model
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                       model_kwargs={'device': 'cpu'})

    # Build the FAISS index from the chunks and persist it to disk
    db = FAISS.from_documents(texts, embeddings)
    db.save_local(DB_FAISS_PATH)

In [5]:
if __name__ == '__main__':
    create_vector_db()
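
To make the `chunk_size` / `chunk_overlap` sliding window concrete, here is a simplified fixed-width splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break at separators such as paragraphs and sentences); the function name `split_with_overlap` is hypothetical and the sketch only shows how consecutive chunks share characters.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window has reached the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a chunk boundary is still seen whole in at least one chunk when the overlap is large enough.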

###################################################################################################