<h2>Agentic RAG App Doc-store (Vector-store) and Embeddings</h2>

The plan is to first set up a document (vector) store and experiment with how it works alongside the Groq LLM. Any information missing from the store will then be filled in using Tavily (internet search). After that, agents with specific roles will be set up with CrewAI. The functions for all of this are written in week 1, and the Streamlit front end is built in week 2.

<h3>Doc-store (Vector-store) and Embeddings</h3>

1. Your documents live locally in your project

2. You turn those documents into embeddings (numbers)

3. You store those numbers locally in FAISS

4. A user asks a question. The question is directed to the 'researcher' agent of CrewAI.

5. The researcher finds the most relevant document chunks using FAISS, or uses Tavily to do an internet search for the information.

6. You send those chunks to Groq. The other agents use Groq to fulfill their roles.

7. Groq writes the answer

Groq never sees embeddings.
FAISS never talks to Groq.
The CrewAI agents each use Groq in a different way to fulfill their role, e.g. find an answer, write an answer, critique an answer.
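
The flow above can be sketched without any of the real libraries, just to make the hand-offs visible. Everything in this block is a hypothetical stand-in: `toy_embed` replaces the embedding model, `ToyIndex` replaces FAISS, `web_search` replaces Tavily, and the `min_score` threshold is an assumed way of deciding when the local store has no good answer. The point it illustrates is that only text chunks reach the prompt, never the vectors.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Toy stand-in for FAISS: keeps (vector, chunk) pairs, brute-force search."""

    def __init__(self, chunks: List[str]):
        self.entries = [(toy_embed(chunk), chunk) for chunk in chunks]

    def search(self, question: str, k: int = 2) -> List[Tuple[float, str]]:
        query_vec = toy_embed(question)
        scored = [(cosine(query_vec, vec), chunk) for vec, chunk in self.entries]
        return sorted(scored, reverse=True)[:k]


def research(question: str, index: ToyIndex,
             web_search: Callable[[str], List[str]],
             min_score: float = 0.3) -> List[str]:
    """Step 5: use local chunks if any score well enough, else search the web."""
    hits = [chunk for score, chunk in index.search(question) if score >= min_score]
    return hits if hits else web_search(question)


def build_prompt(question: str, chunks: List[str]) -> str:
    """Step 6: the LLM receives plain text chunks, not embeddings."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical data and a fake web search, for illustration only.
index = ToyIndex(["FAISS stores vectors on local disk",
                  "Streamlit renders the front end"])
fake_web = lambda q: [f"(web result for: {q})"]

local = research("where does FAISS store vectors", index, fake_web)
remote = research("latest CrewAI release notes", index, fake_web)
prompt = build_prompt("where does FAISS store vectors", local)
```

Here the first question is answered from the local index, while the second finds nothing similar enough and falls through to the web search. In the real app the researcher agent would make that decision, and the threshold (if one is used at all) would need tuning against real similarity scores.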


<h4>Creating the doc store (done using FAISS)</h4>

We need to create a vector store that can take in documents and store them. We need to be able to save the store so we don't have to keep feeding it the same documents between uses. We also want to be able to add new documents as we find them.

In [4]:
import langchain_core
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore

from langchain_core.documents import Document

from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    UnstructuredMarkdownLoader,
    DirectoryLoader,
)

from sentence_transformers import SentenceTransformer

In [5]:
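import os
from typing import List

from langchain_core.embeddings import Embeddings

# Wrapper around sentence_transformers to match the LangChain embeddings interface.
# Subclassing Embeddings matters: the FAISS store checks for it, and otherwise
# treats the object as a plain callable when adding documents or embedding queries.
class SentenceTransformerEmbeddingsWrapper(Embeddings):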
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    def embed_documents(self, texts: List[str]):
        # returns list[list[float]]
        emb = self.model.encode(texts, show_progress_bar=False)
        return emb.tolist()

    def embed_query(self, text: str):
        emb = self.model.encode([text], show_progress_bar=False)
        return emb[0].tolist()

def create_faiss_store_from_documents(documents: List[Document], index_dir: str = "faiss_index", embedding_model: str = "all-MiniLM-L6-v2"):
    """
    Build a FAISS vectorstore from a list of langchain Document objects and save it to disk.
    Returns the in-memory FAISS store.
    """
    os.makedirs(index_dir, exist_ok=True)
    embeddings = SentenceTransformerEmbeddingsWrapper(embedding_model)
    store = FAISS.from_documents(documents, embeddings)
    store.save_local(index_dir)
    return store

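def load_faiss_store(index_dir: str = "faiss_index", embedding_model: str = "all-MiniLM-L6-v2"):
    """
    Load a previously saved FAISS vectorstore from disk.
    The docstore is stored as a pickle file, so only load index directories you created yourself.
    """
    embeddings = SentenceTransformerEmbeddingsWrapper(embedding_model)
    # Recent langchain_community versions refuse to unpickle unless this flag is set.
    return FAISS.load_local(index_dir, embeddings, allow_dangerous_deserialization=True)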

def add_documents_and_save(store: FAISS, new_documents: List[Document], index_dir: str = "faiss_index"):
    """
    Add documents to an existing FAISS store and persist to disk.
    """
    store.add_documents(new_documents)
    store.save_local(index_dir)
    return store

# Example: load documents from a directory and create/save the index
# loader = DirectoryLoader("path/to/docs", loader_cls=TextLoader)
# docs = loader.load()
# store = create_faiss_store_from_documents(docs, index_dir="faiss_index")
# later you can reload with:
# store = load_faiss_store("faiss_index")
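
The steps earlier talk about retrieving document chunks, but the helpers above index whatever `Document` objects the loader returns, which may be whole files. Below is a minimal sketch of fixed-size character chunking with overlap, under the assumption that documents are split before indexing. The function name `split_into_chunks` and its default sizes are hypothetical, and LangChain's own text splitters would likely be the better choice in the real app.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small illustrative input so the overlap is easy to see.
example_chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=1)
```

Each chunk could then be wrapped in a `Document` (keeping the source path in its metadata) before being passed to `create_faiss_store_from_documents`.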