# Tutorial notebook

This notebook shares basic snippets for using the Abacus.AI API with its chat and AI agent functionality.
The API comes in handy for automating recurring tasks.

#### Loading Documents
When documents are uploaded to the platform, they are uploaded as a special class type, `BlobInput`.

Testing BlobInput Locally

In [4]:
from abacusai import ApiClient
client = ApiClient()

# Your project ID is a string of numbers and letters.
# You can find it in the browser URL or on the project's main page.
# Some API calls need it.
# For this example, it should be the ID of your AI Agent project.
project_id = 'your_project_id'
try:
    client.describe_project(project_id)
except Exception as e:
    raise Exception('Provide your current project ID') from e



In [None]:
# Here, we load a document from the notebook's current directory
# You can add files to that directory by dragging and dropping them into Jupyter
from abacusai.client import BlobInput
document = BlobInput.from_local_file("test.docx")
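To get a feel for what `BlobInput.from_local_file` hands you without calling the platform, here is a rough local stand-in. Only `.contents` (the raw bytes) is used in this notebook; the `mime_type`, `filename` and `size` fields and the `LocalBlob` / `read_local_blob` names are hypothetical, and this is not the abacusai class itself.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LocalBlob:
    """Hypothetical stand-in for BlobInput: raw bytes plus a little metadata."""
    contents: bytes
    mime_type: str
    filename: str
    size: int


def read_local_blob(path: str) -> LocalBlob:
    # Read the whole file as bytes, the same kind of value as document.contents
    data = Path(path).read_bytes()
    # Guess the MIME type from the file extension; fall back to a generic binary type
    mime, _ = mimetypes.guess_type(path)
    return LocalBlob(
        contents=data,
        mime_type=mime or "application/octet-stream",
        filename=Path(path).name,
        size=len(data),
    )
```

For example, `read_local_blob("test.docx").contents` gives the same bytes you would pass to `client.extract_document_data` below.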

In [None]:
# document.contents holds the raw bytes of the document
extracted_doc_data = client.extract_document_data(document.contents)

print(extracted_doc_data.pages[0]) # Text from page 0
print(extracted_doc_data.embedded_text) # All text from the document

In [None]:
len(extracted_doc_data.pages)

In [None]:
# Here, we create an upload that turns a file from the notebook into an Abacus dataset we can use later.
# A docstore is a special table format for storing documents

upload = client.create_dataset_from_upload(
    table_name='my_documents_'+client.describe_user().email.split('@')[0],  # the name must be unique within the organization
    file_format='DOCX',
    is_documentset=True
)
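The same per-user name (`'my_documents_' + email.split('@')[0]`) is rebuilt in several cells of this notebook, and so is the retriever name. A small helper keeps them consistent. `docstore_table_name` is a hypothetical name, and it only reproduces the notebook's own convention; it does not check the platform's naming rules.

```python
def docstore_table_name(email: str, prefix: str = "my_documents_") -> str:
    """Build a per-user name: prefix + the part of the email before '@'."""
    # e.g. 'bogdan' from 'bogdan@example.com'
    local_part = email.split("@")[0]
    return prefix + local_part
```

Usage: `docstore_table_name(client.describe_user().email)`, or pass `prefix='demo_document_retriever__'` for the retriever name.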


In [68]:
with open("test.docx", "rb") as file:

    file_uploaded = upload.upload_file(file)
    file_uploaded.wait_for_import()

file_uploaded

Dataset(dataset_id='93c789714',
  source_type='UPLOAD',
  data_source='re://datasets/93c789714',
  created_at='2024-07-18T16:23:47+00:00',
  ephemeral=False,
  feature_group_table_name='my_documents_bogdan',
  incremental=False,
  is_documentset=True,
  extract_bounding_boxes=False,
  merge_file_schemas=True,
  reference_only_documentset=False,
  latest_dataset_version=DatasetVersion(dataset_version='18b974e6a',
  status='CONVERTING',
  dataset_id='93c789714',
  size=136504,
  created_at='2024-07-18T16:23:47+00:00',
  merge_file_schemas=True),
  document_processing_config=DocumentProcessingConfig(extract_bounding_boxes=False, ocr_mode='DEFAULT', use_full_ocr=None, remove_header_footer=False, remove_watermarks=True, convert_to_markdown=False))

In [6]:
# Here we verify the upload and check the structure of the table we created in the docstore

try:
    feature_group = client.describe_feature_group_by_table_name('my_documents_'+client.describe_user().email.split('@')[0])
    print(' feature group found')
except Exception:
    # Not found by table name: get the feature group from the uploaded dataset and materialize it
    feature_group = file_uploaded.describe_feature_group()
    if not feature_group.list_versions():
        print("creating first version")
        feature_group.create_version()
    feature_group.wait_for_materialization()

 feature group found


In [7]:
df = feature_group.load_as_pandas()
df

| | doc_id | page_infos | file_path | file_size_bytes | file_checksum | file_description | mime_type | page_count | token_count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18b974e6a-000000000-5c536d98323d4a361309f5516c... | {'first_page': 0, 'last_page': 57} | uploaded_data.docx | 136504 | SHA512_256:5f4100c619746dc53c62b706540e759ac86... | Microsoft Word 2007+ | application/vnd.openxmlformats-officedocument.... | 58 | 17598 |


In [8]:
df['doc_id'][0]

'18b974e6a-000000000-5c536d98323d4a361309f5516c6282e003a385e11822616975ed720f8d473ba4'

# Extracting the document's text

In [72]:

data_from_docstore = client.get_docstore_document_data(df['doc_id'][0]) # Get data from a document stored in the docstore

Getting document text from uploaded data

In [74]:
# To access the docstore later, or if it was created outside this notebook, look it up
# by table name with describe_feature_group_by_table_name, or by ID with describe_feature_group

df = client.describe_feature_group_by_table_name(feature_group.table_name).load_as_pandas_documents(doc_id_column='doc_id', document_column='page_infos')
df['page_infos'][0].keys()

# pages: the embedded text of the document, page by page
# extracted_text: the OCR-extracted text of the document

dict_keys(['metadata', 'tokens', 'pages', 'doc_id', 'embedded_text', 'extracted_text'])
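If you only need the text of a page range, you can join it from the `pages` entry. This sketch assumes `pages` is a list with one string per page, following the comment above; `join_pages` is a hypothetical helper, and the sample dict in the test is made up.

```python
from typing import Optional


def join_pages(page_info: dict, first: int = 0, last: Optional[int] = None) -> str:
    """Join the per-page text in page_info['pages'] for pages first..last (inclusive)."""
    pages = page_info.get("pages", [])
    # A missing `last` means "through the final page"
    end = len(pages) if last is None else last + 1
    return "\n\n".join(pages[first:end])
```

Usage: `join_pages(df['page_infos'][0], first=0, last=2)` for the first three pages.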

## Creating RAG systems on the fly

How to create a RAG system "on the fly" with an uploaded document

In [None]:
# Returns chunks of documents that are relevant to the query and can be used to feed an LLM
# Example with a blob held in the notebook's memory

relevant_snippets = client.get_relevant_snippets(
        blobs={"document": document.contents},
        query="What are the key terms")

In [None]:
# Returns chunks of documents that are relevant to the query and can be used to feed an LLM
# Example with a document stored in the docstore

relevant_snippets = client.get_relevant_snippets(
        doc_ids = [df['doc_id'][0]],
        # blobs={"document": document.contents},
        query="What are the key terms")

relevant_snippets
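One common way to "feed an LLM" with the snippets is to paste their text into the prompt, as context for the question. This sketch assumes you have already pulled plain text out of each returned snippet (which attribute holds it is not shown in this notebook); `build_rag_prompt` is a hypothetical helper and the prompt wording is just one option.

```python
def build_rag_prompt(query: str, snippets: list) -> str:
    """Assemble a grounded prompt: numbered context snippets followed by the question."""
    # Number the snippets so the answer can refer back to its sources
    context = "\n\n".join(f"[{i}] {text.strip()}" for i, text in enumerate(snippets, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The result can go straight into `client.evaluate_prompt(prompt=..., llm_name="OPENAI_GPT4O")`, covered later in this notebook.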

### Using a Document Retriever as a standalone deployment
You can also use a document retriever, even if no ChatLLM model has been trained!

In [79]:
# First we connect our docstore to our project

client.add_feature_group_to_project(
    feature_group_id=feature_group.id,
    project_id=project_id,
    feature_group_type='DOCUMENTS'  # Optional, defaults to 'CUSTOM_TABLE'. Setting 'DOCUMENTS' matters, as it lets the document retriever work properly with this feature group
)

In [80]:
feature_group.id

'1246377b2c'

In [81]:
ifm = client.infer_feature_mappings(project_id=project_id,feature_group_id=feature_group.id)

# ifm = client.infer_feature_mappings(project_id='15ed76a6a8',feature_group_id='98a8d9cce')
ifm

InferredFeatureMappings(error='',
  feature_mappings=[FeatureMapping(feature_mapping='DOCUMENT_ID',
  feature_name='doc_id'), FeatureMapping(feature_mapping='DOCUMENT',
  feature_name='file_description')])

In [82]:
# These calls may help fix the feature group so that document retrievers can use it as a docstore

# client.set_feature_group_type(project_id, feature_group.id, feature_group_type='DOCUMENTS')
# client.set_feature_mapping(project_id, feature_group.id, feature_name='doc_id', feature_mapping='DOCUMENT_ID')
# client.set_feature_mapping(project_id, feature_group.id, feature_name='page_infos', feature_mapping='DOCUMENT')
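The inferred mappings above pair `DOCUMENT` with `file_description`, while the commented fix maps it to `page_infos`. A quick dict comparison shows which `set_feature_mapping` calls you actually need. This is a hypothetical helper; the inferred dict would be built from the `feature_mapping` / `feature_name` fields seen in the `ifm` output.

```python
def mappings_to_fix(inferred: dict, desired: dict) -> dict:
    """Return the {feature_mapping: feature_name} entries where inferred differs from desired."""
    return {mapping: name for mapping, name in desired.items() if inferred.get(mapping) != name}
```

Usage: build `inferred = {fm.feature_mapping: fm.feature_name for fm in ifm.feature_mappings}`, then for each `mapping, name` in `mappings_to_fix(inferred, {'DOCUMENT_ID': 'doc_id', 'DOCUMENT': 'page_infos'}).items()` call `client.set_feature_mapping(project_id, feature_group.id, feature_name=name, feature_mapping=mapping)`.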


In [10]:
# Creating a document retriever

document_retriever = client.create_document_retriever(
    project_id=project_id,
    name='demo_document_retriever__'+client.describe_user().email.split('@')[0],
    feature_group_id=feature_group.id
)


In [None]:
# Accessing a document retriever that has already been created

# dr = client.describe_document_retriever_by_name('demo_document_retriever__'+client.describe_user().email.split('@')[0])
# dr

In [13]:
r = client.describe_document_retriever(document_retriever.id)
# Filters restrict, on the fly, which documents the retriever can use, based on columns of the feature group that was used as input to the retriever.
# Filters are also available when using get_chat_response

r.get_matching_documents(query = "WHATEVER_YOU_WANT_TO_ASK", filters = {"document_identification":['AXIP-4440']})

[]
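To reason about what a filter keeps, here is a local sketch over plain dict rows. The semantics are an assumption for illustration (values in one column's list are OR-ed, separate columns are AND-ed), not documented behavior. Note that the feature group listed earlier has no `document_identification` column, which is one plausible reason this call came back empty. `apply_filters` is hypothetical.

```python
def apply_filters(rows: list, filters: dict) -> list:
    """Keep rows whose value in every filtered column is one of that column's allowed values."""
    # Assumed semantics: OR within a column's list, AND across columns
    return [
        row
        for row in rows
        if all(row.get(column) in allowed for column, allowed in filters.items())
    ]
```

An empty `filters` dict keeps every row, since `all()` over no conditions is `True`.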

In [21]:
# Examples of document retriever usage

res = document_retriever.get_matching_documents("Agreement of the Parties")
len(res)

10

In [32]:
# Example of getting no results

res2 = document_retriever.get_matching_documents("planting potatoes on a mars", required_phrases=['mars'])
res2

[]

### Calling a Large Language Model
You can use the `evaluate_prompt` method to call the LLM of your choice:
- prompt: the actual message the model receives from the user
- system_message: the instructions the model will follow
- llm_name: the model to call, for example `OPENAI_GPT4O` (optional; the agent example below omits it)

In [14]:
r = client.evaluate_prompt(prompt = "What is the capital of Greece?", system_message = "You should answer all questions with a single word.", llm_name = "OPENAI_GPT4O")

# Response:
print(r.content)

Athens


Calling a Large Language Model with an output schema
You can also use the `json_response_schema` parameter to make the model return its output in a predefined structure, as a JSON string that you parse with `json.loads`

In [36]:
import json

r = client.evaluate_prompt(prompt = "In this course, you will learn about car batteries, car doors, and car suspension system",
                           # system_message = "OPTIONAL, but good to have", 
                           llm_name = 'OPENAI_GPT4O',
                           json_response_schema = {"learning_objectives": {"type": "list", "description": "A list of learning objectives", "is_required": True}}
)
learning_objectives = json.loads(r.content)
learning_objectives

{'learning_objectives': ['Understand the components and functions of car batteries',
  'Learn how to maintain and troubleshoot car batteries',
  'Gain knowledge about the different types of car doors and their mechanisms',
  'Learn how to repair and replace car doors',
  'Understand the principles and components of car suspension systems',
  'Learn how to diagnose and fix common issues in car suspension systems']}
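LLM output can still be malformed, so it is worth checking the parsed reply against the schema you asked for. A minimal sketch: only the `"list"` type and the `is_required` flag appear in this notebook; the `"string"` mapping is an assumed type name, and other type names are left unchecked. `parse_llm_json` is hypothetical.

```python
import json

# Schema type names mapped to Python types. "list" comes from this notebook;
# "string" is an assumption, and unknown type names are not checked.
SCHEMA_TYPES = {"list": list, "string": str}


def parse_llm_json(content: str, schema: dict) -> dict:
    """Parse an LLM reply and check it against a json_response_schema-style dict."""
    data = json.loads(content)  # raises json.JSONDecodeError if the reply is not valid JSON
    for field, spec in schema.items():
        if field not in data:
            # Only missing fields marked as required are errors
            if spec.get("is_required"):
                raise ValueError(f"missing required field: {field}")
            continue
        expected = SCHEMA_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(data[field], expected):
            raise ValueError(f"field {field!r} should be of type {spec['type']}")
    return data
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a single `except ValueError` around `parse_llm_json(r.content, schema)` catches both bad JSON and schema mismatches.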

### Creating a simple AI Agent with workflows

In [38]:
from abacusai import (
    AgentInterface,
    WorkflowGraph,
    WorkflowGraphEdge,
    WorkflowGraphNode,
    WorkflowNodeInputMapping,
    WorkflowNodeInputSchema,
    WorkflowNodeInputType,
    WorkflowNodeOutputMapping,
    WorkflowNodeOutputSchema,
    WorkflowNodeOutputType,
)

### For this agent, you can select one of several preset characters to answer questions

In [95]:
def agent_function(nlp_query, character):
    """
        Args:
            nlp_query (Any): Data row to predict on/with or to pass to the agent for execution
        Returns:
            The result which can be any json serializable python type
    """
    from abacusai import ApiClient

    # Let agent respond like your favorite character.
    char = character or 'Sherlock Holmes'
    response = ApiClient().evaluate_prompt(prompt=nlp_query, system_message=f'Respond like {char}. Prepend your name.')
    return str(response.content)

In [96]:
agent_function('what is your favorite food','Homer Simpson')

'Homer Simpson: Mmm, donuts...'
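Calling `agent_function` always hits the API. If you want to check the prompt logic offline, you can pull the system message into its own function. This is a hypothetical refactor (`character_system_message` is not part of the notebook); it reproduces the same `character or 'Sherlock Holmes'` fallback.

```python
def character_system_message(character=None, default="Sherlock Holmes"):
    """Build the system message agent_function sends, falling back to a default character."""
    # Same fallback as `character or 'Sherlock Holmes'`: both None and "" use the default
    char = character or default
    return f"Respond like {char}. Prepend your name."
```

Inside `agent_function`, you would then pass `system_message=character_system_message(character)` to `evaluate_prompt`.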

In [97]:
package_requirements = []  # e.g. ['numpy==1.2.3', 'pandas>=1.4.0']
description = None
memory = 16
enable_binary_input = True

A WorkflowGraphNode is one building block of an AI Agent

In [99]:
workflow_graph_node = WorkflowGraphNode(
    name="input_text",
    function=agent_function,
    input_mappings=[
        WorkflowNodeInputMapping(
            name="nlp_query",
            variable_type=WorkflowNodeInputType.USER_INPUT,
            # variable_source="obi van"
        ),
        WorkflowNodeInputMapping(
            name="character",
            variable_type=WorkflowNodeInputType.USER_INPUT,
            # variable_source="obi van"
        ),
    ],
    input_schema = WorkflowNodeInputSchema(
        json_schema={
            "type": "object",
            "title": "Get character response",
            "required": ["nlp_query", "character"],
            "properties": {
                "nlp_query": {"type": "string", "title": "Your question"},
                "character": {
                            "type": "string",
                            "title": "Characters",
                            "enum": ["Sherlock Holmes", "Elon Musk", "Homer Simpson"],
                            "default": "Homer Simpson"
                            # "enumNames": ["Sherlock Holmes", "Elon Musk", "Homer Simpson"]
                            }
                # "table_name": {"type": "string", "title": "Table Name"},
                # "document_column_name": {"type": "string", "title": "Document Column Name"},
                # "chunk_size": {"type": "integer", "title": "Chunk Size"},
                # "text_encoder": {"type": "string", "title": "Text Encoder", "enum": [e.value for e in VectorStoreTextEncoder]},
            },
        },
        # ui_schema={
        #     "text_encoder": {"ui:widget": "select"},
        # }
    ),
    output_mappings=[
        WorkflowNodeOutputMapping(
            name="str_out",
            variable_type=WorkflowNodeOutputType.STRING
        ),
    ],
    output_schema=WorkflowNodeOutputSchema({
        "type": "object",
        "properties": {
            "str_out": {"type": "string", "title": "Response"},
        },
    })
)
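The input schema above relies on three JSON Schema features: `required`, `enum` and `default`. This sketch mimics only those three rules on a flat object to show what the form enforces; the platform's form renderer does its own validation, and a full validator would be something like the `jsonschema` package. `check_form_input` is hypothetical.

```python
def check_form_input(json_schema: dict, values: dict) -> dict:
    """Apply defaults, then check required fields and enum choices for a flat object schema."""
    props = json_schema.get("properties", {})
    # Start from declared defaults, then overlay what the user entered
    merged = {key: spec["default"] for key, spec in props.items() if "default" in spec}
    merged.update(values)
    missing = [key for key in json_schema.get("required", []) if key not in merged]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, value in merged.items():
        allowed = props.get(key, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key!r} must be one of {allowed}")
    return merged
```

With the node's schema, `{"nlp_query": "hi"}` picks up `character="Homer Simpson"` from the default, while a character outside the enum is rejected.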

A WorkflowGraph is the final graph of all the nodes and edges that make up the AI Agent's logic

In [100]:
workflow_graph = WorkflowGraph(
    nodes=[
        workflow_graph_node,
    ],
    edges=[],
)
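With a single node, `edges=[]` is enough. Once you chain several nodes, the edges decide what runs after what. As a mental model only, this sketch treats edges as plain `(source, target)` pairs and orders nodes with the standard library's `graphlib` (Python 3.9+). It does not use `WorkflowGraphEdge` and is not how the platform schedules nodes; the node names in the test are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(nodes: list, edges: list) -> list:
    """Return node names in an order that respects every (source, target) edge."""
    # Every node starts with no predecessors
    sorter = TopologicalSorter({name: set() for name in nodes})
    for source, target in edges:
        # target depends on source, so source must come first
        sorter.add(target, source)
    # static_order() raises graphlib.CycleError if the edges form a loop
    return list(sorter.static_order())
```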

In [101]:
client.list_agents(project_id)

[Agent(name='example_agent',
   agent_id='3c981ea4e',
   created_at='2024-06-24T10:47:22+00:00',
   project_id={'projectId': '45d76db9c', 'problemType': 'ai_agent'},
   agent_config={'ENABLE_BINARY_INPUT': True},
   agent_execution_config={'character': ['Elon Musk', 'Joe Biden']},
   latest_agent_version=AgentVersion(agent_version='325916946',
   status='COMPLETE',
   publishing_completed_at='2024-06-24T10:48:09+00:00')),
 Agent(name='example_agent',
   agent_id='d741a5db6',
   created_at='2024-06-25T16:24:16+00:00',
   project_id={'projectId': '45d76db9c', 'problemType': 'ai_agent'},
   agent_config={},
   latest_agent_version=AgentVersion(agent_version='75cdd1928',
   status='COMPLETE',
   publishing_completed_at='2024-07-01T11:20:44+00:00'),
   workflow_graph=WorkflowGraph(nodes=[WorkflowGraphNode()], edges=[], primary_start_node='input_text')),
 Agent(name='Example_Character_Agent',
   agent_id='1000067952',
   created_at='2024-07-19T15:48:24+00:00',
   project_id={'projectId': '45

There are two main types of AI Agents: AgentInterface.DEFAULT and AgentInterface.CHAT

- AgentInterface.DEFAULT is an AI agent that presents forms to fill in and works like an app
- AgentInterface.CHAT reproduces the experience of chatting with an LLM, driven by logic that you create for it

In [103]:
from abacusai import ApiClient
client = ApiClient()
agent_interface: AgentInterface = AgentInterface.DEFAULT
if 'agent_function' not in vars():
    raise Exception('Please define the agent function with the name agent_function')

# Look for an agent with this name that already exists in the project
existing_agents = [x for x in client.list_agents(project_id) if x.name == 'Example_Character_Agent']

if not existing_agents:
    # First run: create the agent from the workflow graph (it is deployed in the next section)
    agent = client.create_agent(project_id=project_id,
                                # function_source_code=agent_function, agent_function_name='agent_function',
                                name='Example_Character_Agent',
                                package_requirements=package_requirements,
                                description=description,
                                # enable_binary_input=enable_binary_input, memory=memory,
                                workflow_graph=workflow_graph, agent_interface=agent_interface)
    agent.wait_for_publish()
else:
    # Later runs: update the existing agent instead of creating a duplicate
    agent = client.update_agent(model_id=existing_agents[0].agent_id,
                                # function_source_code=agent_function, agent_function_name='agent_function',
                                # name='example_agent',
                                # package_requirements=package_requirements,
                                # description=description,
                                # enable_binary_input=enable_binary_input, memory=memory,
                                workflow_graph=workflow_graph, agent_interface=agent_interface)
    agent.wait_for_publish()

agent

Agent(name='Example_Character_Agent',
  agent_id='1000067952',
  created_at='2024-07-19T15:48:24+00:00',
  project_id={'problemType': 'ai_agent', 'allProjectModels': None, 'projectId': '45d76db9c', 'ingressName': None, 'starred': 0, 'tags': None, 'useCase': 'ai_agent', 'name': 'AI_agent_bogdan', 'ingressType': None, 'createdAt': '2024-06-19T12:30:10+00:00', 'deployments': None, 'systemCreated': False, 'info': None, 'updatedAt': '2024-06-19T12:30:10+00:00'},
  notebook_id='5441509c0',
  agent_config={},
  code_source=CodeSource(source_type='TEXT',
  source_code='def agent_function(nlp_query, character):\n    """\n        Args:\n            nlp_query (Any): Data row to predict on/with or to pass to the agent for execution\n        Returns:\n            The result which can be any json serializable python type\n    """\n    from abacusai import ApiClient\n\n    # Let agent respond like your favorite character.\n    char = character or \'Sherlock Holmes\'\n    response = ApiClient().evalua

# Deploying the AI Agent

In addition to being created, the agent must be deployed.
The AI agent can then be reached through Deployments > Predictions Dash inside this project.


In [None]:
client.list_deployment_tokens(project_id)
# Use client.create_deployment_token(project_id) if you don't have any tokens yet

In [69]:
deployment = client.create_deployment(model_id=agent.agent_id)
deployment.wait_for_deployment()

Deployment(deployment_id='69d0022c6',
  name='Example_Character_Agent Deployment',
  status='ACTIVE',
  description='',
  deployed_at='2024-07-19T16:01:54+00:00',
  created_at='2024-07-19T16:01:11+00:00',
  project_id='45d76db9c',
  model_id='1000067952',
  model_version='3da4b4f46',
  calls_per_second=5,
  auto_deploy=True,
  skip_metrics_check=False,
  algo_name='AI Agent',
  regions=[{'name': 'Us East 1', 'value': 'us-east-1'}],
  batch_streaming_updates=False,
  algorithm='2efe1d48f',
  model_deployment_config={'otherModelsForDataClusterTypes': {}, 'streamingFeatureGroupDetails': [], 'modelTrainingType': None})

In [None]:
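# Use the command below to get a response from a deployed ChatLLM model (which may be using RAG under the hood).
# Replace 'your_chatllm_deployment_id' with the ID of your ChatLLM deployment.
deployment_token = client.list_deployment_tokens(project_id)[0].deployment_token  # the token string, not the token object
r = client.get_chat_response(deployment_token=deployment_token, deployment_id='your_chatllm_deployment_id', messages=[{"is_user": True, "text": "What is the meaning of life?"}])
print(r.keys())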
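`messages` is a list of dicts with `is_user` and `text` keys, as in the call above. To continue a conversation you would presumably resend the earlier turns; that previous model replies are sent with `is_user: False` follows from the key's name but is an assumption, not something shown in this notebook. `to_chat_messages` is a hypothetical helper.

```python
def to_chat_messages(history: list) -> list:
    """Convert (speaker, text) pairs into the message dicts used by get_chat_response."""
    # Only "user" turns get is_user=True; any other speaker is treated as the model
    return [{"is_user": speaker == "user", "text": text} for speaker, text in history]
```

Usage: `messages=to_chat_messages([("user", "Hi"), ("assistant", "Hello!"), ("user", "What is the meaning of life?")])`.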

#### Agent JSON Schema
The instructions below are only relevant when you are creating an "AI Agent". The "json_schema" variable lets you create a custom UX that the user can use to interact with the agent. You can find a playground for the JSON schema here: https://rjsf-team.github.io/react-jsonschema-form/.

Below is an example of a json_schema that allows the user to:
1. Upload a document
2. Select an option

In [None]:
# Additional parameters in the JSON Schema let you shape the UX
json_schema = {
    "type": "object",
    "title": "Upload Document and select task",
    "required": ["document", "options"],
    "properties": {
        "document": {
            "type": "string",
            "title": "Upload Document",
            "format": "data-url",
        },
        "options": {
            "type": "string",
            "title": "Options",
            "enum": ["extract_rfp_questions", "complete_rfp_questions"],
            "enumNames": ["Extract RFP Questions", "Complete RFP Questions"],
        },
    },
}

# The WorkflowNodeOutputSchema lets you set the output format to data-url so that the user can download the result.
output_schema = WorkflowNodeOutputSchema(
    json_schema={
        "type": "object",
        "properties": {
            "processed_doc": {
                "type": "string",
                "title": "Responses",
                "format": "data-url",
            }
        },
    }
)

# Here is how the agent response should look. This line belongs inside your agent function,
# where doc_bytes holds the bytes of the generated .docx file; the keyword must match the
# "processed_doc" property of the output schema above:
# return AgentResponse(processed_doc=Blob(doc_bytes, "application/vnd.openxmlformats-officedocument.wordprocessingml.document", filename="result.docx"))
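Fields with `"format": "data-url"` exchange files as data URLs (RFC 2397), shaped like `data:<mime>;name=<file>;base64,<payload>`; the `name=` parameter is what the react-jsonschema-form file widget adds. Whether your agent function receives this raw string or an already-decoded blob is not stated in this notebook, so this standard-library sketch only illustrates the format. `decode_data_url` is hypothetical.

```python
import base64
from urllib.parse import unquote


def decode_data_url(data_url: str):
    """Split a base64 data URL into (mime_type, filename or None, raw bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    # The header looks like 'data:<mime>[;name=<file>][;other=params];base64'
    params = header[len("data:"):-len(";base64")].split(";")
    # RFC 2397: an empty media type defaults to text/plain
    mime_type = params[0] or "text/plain"
    # The file widget URL-encodes the name; other producers may leave it out entirely
    filename = next((unquote(p[len("name="):]) for p in params[1:] if p.startswith("name=")), None)
    return mime_type, filename, base64.b64decode(payload)
```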