### **Introduction to Retrievers (Concise)**

A **retriever** is an interface that returns documents for an unstructured query: it accepts a string as input and returns a list of documents as output. A retriever only has to return documents, not store them. A vector store can be turned into a retriever with the `as_retriever()` method, but other retrieval backends can sit behind the same interface.
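To make the "string in, list of documents out" contract concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `Doc`, `KeywordRetriever`, and the word-overlap scoring are hypothetical stand-ins chosen only so the example runs without any dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever that ranks documents by shared query words (illustrative only)."""

    def __init__(self, docs, k=3):
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list:
        query_words = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


docs = [
    Doc("Rachel is a little spoiled sometimes", {"source": "a.srt"}),
    Doc("Julie and I have a lot in common", {"source": "a.srt"}),
    Doc("The coffee house is closed today", {"source": "b.srt"}),
]
retriever_demo = KeywordRetriever(docs, k=2)
hits = retriever_demo.invoke("what do Julie and Rachel have in common")
```

The point of the sketch is the shape of `invoke`: any backend that maps a query string to a ranked list of documents can play the retriever role, whether or not a vector store is behind it.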

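**Search Types and Options:**
   - Top k (`search_kwargs={"k": ...}`): the number of documents to return
   - Similarity (`search_type="similarity"`)
   - Maximum Marginal Relevance (MMR) (`search_type="mmr"`)
   - Similarity Score Threshold (`search_type="similarity_score_threshold"`)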
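Of the options listed above, MMR is the least self-explanatory: it trades relevance to the query against redundancy with documents already picked. The following is a rough sketch of that idea in plain Python, not the code a vector store actually runs. The function names, the toy 2-D vectors, and the `lambda_mult` weighting parameter are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by MMR.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [0.9, 0.1],   # 0: most relevant
    [0.9, 0.12],  # 1: near-duplicate of document 0
    [0.6, -0.8],  # 2: less relevant, but points in a different direction
]
picked = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)  # balances both terms
plain = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)   # relevance only
```

With the balanced weighting, the near-duplicate is skipped in favor of the more distinct document, whereas the relevance-only run returns the two most similar vectors.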


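In the snippets below, `db_connection` stands for a vector store object, such as the Chroma instance `db` created in Step 2.

1. **Specify Top k**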
    ```python
    retriever = db_connection.as_retriever(search_kwargs={"k": 3})
    ```
2. **Specify Top k and Search Type**
    ```python
    retriever = db_connection.as_retriever(search_type="similarity", search_kwargs={"k": 3})
    ```
3. **Maximum Marginal Relevance Retrieval**
    ```python
    retriever = db_connection.as_retriever(search_type="mmr")
    ```
4. **Similarity Score Threshold Retrieval**  
    Apply a score threshold: any document whose relevance score falls below the cutoff is not returned, so fewer than `k` documents may come back.
    ```python
    retriever = db_connection.as_retriever(
        search_type="similarity_score_threshold", 
        search_kwargs={"k": 3, "score_threshold": 0.5}
    )
    ```
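The threshold behaviour can be pictured as a filter applied before the top-k cut. The sketch below is plain Python, not the vector store's own logic. `filter_by_threshold` is a hypothetical helper, the scores are invented for illustration, and it assumes that a higher score means a closer match.

```python
def filter_by_threshold(scored_docs, k=3, score_threshold=0.5):
    """Keep (text, score) pairs at or above the threshold, then return the best k."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Subtitle lines paired with made-up relevance scores (assumed: higher is better).
scored_docs = [
    ("Rachel first.", 0.82),
    ("No, Amish boy.", 0.31),
    ("Let's start with the cons", 0.67),
    ("I don't know.", 0.48),
]
top = filter_by_threshold(scored_docs, k=3, score_threshold=0.5)
```

Only two of the four candidates clear the 0.5 cutoff here, so the result is shorter than `k=3`. This is the practical consequence of combining a threshold with top k: `k` becomes an upper bound rather than a guaranteed count.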

### **Step 1: Initialize an Embedding Model**

In [2]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

### **Step 2: Set Up a Connection to ChromaDB**

In [3]:
from langchain_chroma import Chroma

db = Chroma(collection_name="vector_database",
            embedding_function=embedding_model,
            persist_directory='./chroma_db')

In [5]:
# Check how many entries already exist in the collection
print(len(db.get()["ids"]))

1004


### **Step 3: Set Up the Retriever**

In [6]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={'k': 3})

print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>


In [7]:
query = "What is there on Julie vs Rachel's list?"

result = retriever.invoke(query)

In [16]:
result

[Document(metadata={'source': 'example_data\\subtitles\\Friends_2x08.srt'}, page_content="145\n00:08:42,594 --> 00:08:44,425\nNo, Amish boy.\n\n146\n00:08:46,398 --> 00:08:50,061\nLet's start with the cons\nbecause they're more fun.\n\n147\n00:08:50,335 --> 00:08:51,165\nRachel first.\n\n148\n00:08:52,171 --> 00:08:53,331\nI don't know.\n\n149\n00:08:53,839 --> 00:08:55,067\nI mean....\n\n150\n00:08:55,274 --> 00:08:59,802\nAll right, I guess you can say\nshe's a little spoiled sometimes.\n\n151\n00:09:00,245 --> 00:09:01,940\nYou could say that.\n\n152\n00:09:03,816 --> 00:09:07,775\nI guess, sometimes\nshe's a little ditzy, you know?\n\n153\n00:09:08,153 --> 00:09:11,088\nAnd I've seen her be a little\ntoo into her looks.\n\n154\n00:09:11,757 --> 00:09:13,816\nAnd Julie and I have\na lot in common..."),
 Document(metadata={'source': 'example_data\\subtitles\\Friends_2x08.srt'}, page_content="149\n00:08:53,839 --> 00:08:55,067\nI mean....\n\n150\n00:08:55,274 --> 00:08:59,802\nAll rig