#### Document Retriever

A document retriever is Abacus.AI's own vector database. It can be used to:
1. Create embeddings of documents
2. Retrieve document chunks that are semantically close to a phrase that the user passes.
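Here is a minimal, self-contained sketch of what those two steps mean conceptually. It is not Abacus.AI's implementation. The `embed` function is a hypothetical toy that uses bag-of-words counts in place of a learned embedding model. The names `chunk_words`, `cosine` and `retrieve` are illustrative only.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts (hypothetical)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Return the k chunks whose toy embeddings are closest to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), query_vec), reverse=True)
    return ranked[:k]


chunks = [
    "The parties agree to the key terms of this contract",
    "Potatoes grow well in loose soil",
    "Payment is due within thirty days",
]
print(retrieve(chunks, "What are the key terms", k=1))
# ['The parties agree to the key terms of this contract']
```

A real retriever swaps the word counts for dense vectors from a neural model and the linear scan for an index, but the ranking idea is the same.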

#### How to run RAG on the fly with a local document

In [None]:
# Load a document from the notebook's current directory.
# You can add files to a Jupyter notebook by dragging and dropping them into the file browser.
import abacusai
from abacusai.client import BlobInput

client = abacusai.ApiClient('YOUR_API_KEY')
# Replace with the path to your PDF or Word file
document = BlobInput.from_local_file("YOUR_DOCUMENT.pdf")

In [None]:
# Returns document chunks that are relevant to the query and can be used to feed an LLM
# Example using a document blob held in the notebook's memory

relevant_snippets = client.get_relevant_snippets(
    blobs={"document": document.contents},
    query="What are the key terms")

In [None]:
# Returns document chunks that are relevant to the query and can be used to feed an LLM
# Example using documents already stored in the docstore

relevant_snippets = client.get_relevant_snippets(
    doc_ids=['YOUR_DOC_ID_1', 'YOUR_DOC_ID_2'],
    query="What are the key terms")

relevant_snippets
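The snippets are typically pasted into an LLM prompt as context. The sketch below assumes you have already pulled the snippet text out of `relevant_snippets` as plain strings (the exact shape of the returned objects is not covered here). `build_rag_prompt` and its `max_chars` budget are hypothetical, not part of the SDK.

```python
def build_rag_prompt(question: str, snippets: list[str], max_chars: int = 4000) -> str:
    """Assemble a grounded prompt from retrieved snippets (hypothetical helper).

    Snippets are numbered so the model can cite them; snippets that would push
    the context past `max_chars` characters are dropped.
    """
    context_parts = []
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        block = f"[{i}] {snippet.strip()}"
        if used + len(block) > max_chars:
            break  # stay within the rough context budget
        context_parts.append(block)
        used += len(block)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_rag_prompt("What are the key terms?", ["Term A applies.", "Term B applies."]))
```

A character budget is a crude proxy for the model's token limit; swap in a tokenizer count if you need precision.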

#### Using a document retriever as a standalone deployment
You can also use a document retriever on its own, even if no ChatLLM model has been trained.

In [None]:
# First, attach the docstore feature group to the project

client.add_feature_group_to_project(
    feature_group_id='YOUR_FEATURE_GROUP_ID_WITH_DOCUMENTS',
    project_id='YOUR_PROJECT_ID',
    # Optional; defaults to 'CUSTOM_TABLE'. Set it to 'DOCUMENTS' so the document retriever works properly with this feature group.
    feature_group_type='DOCUMENTS'
)

In [None]:
# Check which columns are inferred as DOCUMENT_ID and DOCUMENT for this feature group
ifm = client.infer_feature_mappings(project_id='YOUR_PROJECT_ID', feature_group_id='YOUR_FEATURE_GROUP_ID')
ifm

InferredFeatureMappings(error='',
  feature_mappings=[FeatureMapping(feature_mapping='DOCUMENT_ID',
  feature_name='doc_id'), FeatureMapping(feature_mapping='DOCUMENT',
  feature_name='file_description')])
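Before creating a retriever, it can help to check that the inferred mappings cover both roles that the fix-up cell below sets by hand (`DOCUMENT_ID` and `DOCUMENT`). This sketch assumes you copy the `(feature_mapping, feature_name)` pairs from the `ifm` output into a list; `missing_document_mappings` is a hypothetical helper, not an SDK call.

```python
# Roles the fix-up cell assigns by hand; assumed to be the ones a DOCUMENTS feature group needs
REQUIRED_DOCUMENT_MAPPINGS = ("DOCUMENT_ID", "DOCUMENT")


def missing_document_mappings(pairs: list[tuple[str, str]]) -> list[str]:
    """Return the required mapping roles that no feature is assigned to.

    `pairs` holds (feature_mapping, feature_name) tuples copied from the
    infer_feature_mappings output (hypothetical helper).
    """
    mapped = {mapping for mapping, name in pairs if name}
    return [role for role in REQUIRED_DOCUMENT_MAPPINGS if role not in mapped]


inferred = [("DOCUMENT_ID", "doc_id"), ("DOCUMENT", "file_description")]
print(missing_document_mappings(inferred))                     # []
print(missing_document_mappings([("DOCUMENT_ID", "doc_id")]))  # ['DOCUMENT']
```

Any role it reports is a candidate for the `set_feature_mapping` calls in the next cell.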

In [None]:
# These calls may help fix a feature group so that document retrievers can use it as a docstore.
# Replace the feature names with the matching columns of your own feature group.

# client.set_feature_group_type(project_id='YOUR_PROJECT_ID', feature_group_id='YOUR_FEATURE_GROUP_ID', feature_group_type='DOCUMENTS')
# client.set_feature_mapping(project_id='YOUR_PROJECT_ID', feature_group_id='YOUR_FEATURE_GROUP_ID', feature_name='doc_id', feature_mapping='DOCUMENT_ID')
# client.set_feature_mapping(project_id='YOUR_PROJECT_ID', feature_group_id='YOUR_FEATURE_GROUP_ID', feature_name='page_infos', feature_mapping='DOCUMENT')


In [None]:
# Create a document retriever

document_retriever = client.create_document_retriever(
    project_id='YOUR_PROJECT_ID',
    name='NAME_OF_YOUR_DOCUMENT_RETRIEVER',
    feature_group_id='YOUR_FEATURE_GROUP_ID'
)


In [None]:
# Access a document retriever that has already been created

# dr = client.describe_document_retriever_by_name('DOCUMENT_RETRIEVER_NAME')
# dr
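If you rerun the notebook, you may want to reuse an existing retriever instead of creating a duplicate. The sketch below wraps the create and describe-by-name calls from the cells above in a hypothetical `get_or_create_document_retriever` helper. It assumes that `describe_document_retriever_by_name` raises an exception when no retriever has that name; verify this against the SDK. `FakeClient` is an offline stand-in used only to exercise the helper.

```python
def get_or_create_document_retriever(client, project_id, name, feature_group_id):
    """Return the retriever called `name`, creating it only if the lookup fails.

    Assumption: describe_document_retriever_by_name raises when the name is unknown.
    """
    try:
        return client.describe_document_retriever_by_name(name)
    except Exception:
        return client.create_document_retriever(
            project_id=project_id, name=name, feature_group_id=feature_group_id
        )


class FakeClient:
    """Offline stand-in for abacusai.ApiClient, for illustration only."""

    def __init__(self):
        self.retrievers = {}

    def describe_document_retriever_by_name(self, name):
        if name not in self.retrievers:
            raise KeyError(name)  # mimics the assumed "not found" error
        return self.retrievers[name]

    def create_document_retriever(self, project_id, name, feature_group_id):
        self.retrievers[name] = {"project_id": project_id, "name": name,
                                 "feature_group_id": feature_group_id}
        return self.retrievers[name]


fake = FakeClient()
first = get_or_create_document_retriever(fake, "p1", "my_retriever", "fg1")
second = get_or_create_document_retriever(fake, "p1", "my_retriever", "fg1")
print(first is second)  # True
```

With a real client you would call `get_or_create_document_retriever(client, 'YOUR_PROJECT_ID', 'NAME_OF_YOUR_DOCUMENT_RETRIEVER', 'YOUR_FEATURE_GROUP_ID')`. Catching a broad `Exception` is a simplification; narrow it once you know which error the SDK raises.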

In [None]:
r = client.describe_document_retriever(document_retriever.id)
# Filters restrict, on the fly, which documents the retriever may return, based on columns
# of the feature group that was used to create the document retriever.
# Filters are also available when calling get_chat_response.

client.get_matching_documents(document_retriever_id='DOCUMENT_RETRIEVER_ID',
                              query='WHATEVER_YOU_NEED_TO_ASK',
                              limit=10,
                              filters={"state": ["MICHIGAN", "NATIONAL"]})

[]
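To illustrate what a filter such as `{"state": ["MICHIGAN", "NATIONAL"]}` does, here is a local sketch over plain dictionaries. The semantics are an assumption, not confirmed by the SDK: values inside one list are alternatives (any may match), and separate columns must all match. `apply_filters` and the sample rows are hypothetical.

```python
def apply_filters(rows: list[dict], filters: dict[str, list]) -> list[dict]:
    """Keep rows whose value in every filtered column is one of the allowed values.

    Assumed semantics: OR within a column's list, AND across columns.
    """
    return [
        row
        for row in rows
        if all(row.get(column) in allowed for column, allowed in filters.items())
    ]


documents = [
    {"doc_id": "a", "state": "MICHIGAN"},
    {"doc_id": "b", "state": "OHIO"},
    {"doc_id": "c", "state": "NATIONAL"},
]
kept = apply_filters(documents, {"state": ["MICHIGAN", "NATIONAL"]})
print([row["doc_id"] for row in kept])  # ['a', 'c']
```

Filtering before ranking like this shrinks the candidate set, which is one reason an empty result such as the `[]` above can occur even when the query itself is reasonable.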

In [None]:
# Example of document retriever usage

res = document_retriever.get_matching_documents("Agreement of the Parties")
len(res)

10

In [None]:
# Example that returns no results: required_phrases only allows documents containing 'mars'

res2 = document_retriever.get_matching_documents("planting potatoes on Mars", required_phrases=['mars'])
res2

[]
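The empty result comes from `required_phrases`: no document in the store mentions Mars, so nothing survives. The local sketch below shows that effect on plain strings. It assumes case-insensitive substring matching where every phrase must appear; the real retriever may tokenize or match differently, and `require_phrases` is a hypothetical helper.

```python
def require_phrases(chunks: list[str], required_phrases: list[str]) -> list[str]:
    """Keep only chunks that contain every required phrase.

    Assumption: case-insensitive substring matching.
    """
    phrases = [p.lower() for p in required_phrases]
    return [c for c in chunks if all(p in c.lower() for p in phrases)]


chunks = ["Agreement of the Parties", "Preparing soil for planting potatoes"]
print(require_phrases(chunks, ["mars"]))      # []
print(require_phrases(chunks, ["potatoes"]))  # ['Preparing soil for planting potatoes']
```

A plain substring test would also match words such as "grammars"; a word-boundary regular expression is stricter if that matters for your documents.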