# RETRIEVE AGENT

This notebook adds a retrieve agent to the simple_agent.
RetrieveUserProxyAgent can pass documents (many formats are supported) to the coder, which can use them when writing the query.
This should improve the quality of the generated query.
The model we are using already has some knowledge of how to write Cypher, so we decided not to impose rules such as "You must use only the provided documents".

In [1]:
from autogen import AssistantAgent, config_list_from_json
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent, TEXT_FORMATS

In [6]:
# Load the endpoint configuration for the llama3.1 model running in the llm container
config_list = config_list_from_json(env_or_file="CONFIG_LIST", filter_dict={"model": "llama3.1"})

llm_config = {"config_list": config_list, "temperature": 0.2}

prompt = """You are a data scientist who works with Cypher queries.
All you have to do is translate the given question into Cypher queries.

You have to respect these rules:
- You must use the pdf files.
- You must generate the easiest query possible in cypher format.
- You must insert all the information you have and where you found it.
- You must be precise.
- You must re-generate the query if you think it could generate errors.

Now I will give you some information about the database schema.
- nodes -
(:Movie), Describes a movie that has a title and a plot. It can also have the number of likes.
(:Person), Describes actors and directors. They have a name, a birthday and they may also have a death date.

- relationships -
(:Movie) <-[:DIRECTED]- (:Person)
(:Person) -[:ACTED-IN]-> (:Movie)
(:Person) -[:KNOWS]-> (:Person)
(:Movie) -[:type]-> (:Genre)

When you think the query can be run you can submit it as the final answer.
When you have written your final answer you must send 'TERMINATE'.


QUESTION IS:
{input_question}
"""

print(f'Accepted formats for "docs_path": \n{TEXT_FORMATS}')

Accepted formats for "docs_path": 
['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf']
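
The prompt above contains an `{input_question}` placeholder but no `{input_context}` one. As far as we can tell, RetrieveUserProxyAgent fills a `customized_prompt` with Python's `str.format`, passing both the question and the retrieved text; that is an assumption about the library's internals, not something this notebook verifies. Under that assumption, a template without `{input_context}` would drop the retrieved chunks without raising an error. The sketch below uses plain `str.format` on two made-up templates (the names `fill_prompt`, `TEMPLATE_WITHOUT_CONTEXT` and `TEMPLATE_WITH_CONTEXT` are hypothetical, as is the example context string) to show the difference:

```python
# Stand-in for the template-filling step; the real agent is not used here.
TEMPLATE_WITHOUT_CONTEXT = """Translate the question into a Cypher query.

QUESTION IS:
{input_question}
"""

TEMPLATE_WITH_CONTEXT = """Translate the question into a Cypher query.

CONTEXT IS:
{input_context}

QUESTION IS:
{input_question}
"""


def fill_prompt(template: str, question: str, context: str) -> str:
    """Fill a template with str.format (assumed to mirror what the agent does)."""
    # str.format ignores keyword arguments that the template never references.
    return template.format(input_question=question, input_context=context)


question = "How can I retrieve all the nodes of one type?"
context = "MATCH (n:Label) RETURN n"  # hypothetical retrieved chunk

without_ctx = fill_prompt(TEMPLATE_WITHOUT_CONTEXT, question, context)
with_ctx = fill_prompt(TEMPLATE_WITH_CONTEXT, question, context)

print(context in without_ctx)  # False: the retrieved text never reaches the prompt
print(context in with_ctx)     # True
```

If the retrieved documentation is meant to reach the coder, adding a `{input_context}` section to the prompt may be worth trying.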


In [7]:
def termination_msg(x):
    return isinstance(x, dict) and "TERMINATE" == str(x.get("content", ""))[-9:].upper()

doc_retriever = RetrieveUserProxyAgent(
    name="doc_retriever",
    is_termination_msg=termination_msg,
    max_consecutive_auto_reply=3,
    human_input_mode="NEVER",
    retrieve_config={
        "docs_path": "https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf",
        "model": config_list[0]["model"],
        "get_or_create": True,
        "customized_prompt": prompt
    },
    code_execution_config=False,  
    description="Assistant who has extra content retrieval power for solving difficult problems.",
)


coder = AssistantAgent(
    name="coder",
    is_termination_msg=termination_msg,
    system_message=prompt,
    llm_config=llm_config,
    human_input_mode="NEVER"
)

PROBLEM = "How can I retrieve all the nodes of one type?"
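
The `termination_msg` check above only looks at the last nine characters of the message, so a reply that ends with a newline or a period after `TERMINATE` would not stop the chat. The sketch below repeats the notebook's function and compares it with a more tolerant variant; `lenient_termination_msg` and the sample messages are hypothetical and only illustrate the edge cases:

```python
def termination_msg(x):
    # Same check as in the notebook: the last nine characters must spell TERMINATE.
    return isinstance(x, dict) and "TERMINATE" == str(x.get("content", ""))[-9:].upper()


def lenient_termination_msg(x):
    # Hypothetical variant: ignore trailing whitespace and a final period.
    if not isinstance(x, dict):
        return False
    content = str(x.get("content", "")).rstrip().rstrip(".").rstrip()
    return content.upper().endswith("TERMINATE")


samples = [
    {"content": "MATCH (m:Movie) RETURN m\nTERMINATE"},
    {"content": "MATCH (m:Movie) RETURN m\nterminate"},
    {"content": "MATCH (m:Movie) RETURN m\nTERMINATE\n"},
    {"content": "TERMINATE."},
    {"content": None},
]

for sample in samples:
    # Print the raw content next to the verdict of each check.
    print(repr(sample["content"]), termination_msg(sample), lenient_termination_msg(sample))
```

Both functions accept the first two samples and reject the last one; only the lenient variant accepts the samples with a trailing newline or period. Which behavior is preferable depends on how strictly the model follows the instruction to end with 'TERMINATE'.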

In [8]:
doc_retriever.reset()
coder.reset()

doc_retriever.initiate_chat(coder, message=doc_retriever.message_generator, problem=PROBLEM, max_turns=5, n_results=3)

Trying to create collection.

2024-09-09 09:18:47,467 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Use the existing collection `autogen-docs`.
max_tokens is too small to fit a single line of text. Breaking this line:
	          ...
Failed to split docs with must_break_at_empty_line being True, set to False.
2024-09-09 09:18:58,465 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 287 chunks.
Model llama3.1 not found. Using cl100k_base encoding.

VectorDB returns doc_ids:  [['926c7940', '08ef1daa', 'fd25d579']]
Adding content of doc 926c7940 to context.

Model llama3.1 not found. Using cl100k_base encoding.

Adding content of doc 08ef1daa to context.

Model llama3.1 not found. Using cl100k_base encoding.

doc_retriever (to coder):

You are a data scientist that works with Cypher queries.
All you have to do is translate the given answer as Cypher queries.

You have to respect this rules:
- You must use the pdf files. 
- You must generate the easiest query possible in markdown format.
- You must instert all the information you have and where you found them.
- You must be precise. 
- You must re-generate the query if you think it could generates some errors.

Now i will give you some information about the database schema.
- nodes -
(:Movie), Describe a movie that has a title and a plot. It can also have the number of likes.
(:Person), Describe actors and directors. They have a name, a birthday and they may also have the death date.

- relationships -
(:Movie) <-[:DIRECTED]- (:Person)
(:Person) -[:ACTED-IN]-> (:Movie)
(:Person) -[:KNOWS]-> (:Person)
(:Movie) -[:type]-> (:Genre)

When you think the query can be run you can submit it as the final answer.
When you have written your fi

ChatResult(chat_id=None, chat_history=[{'content': "You are a data scientist that works with Cypher queries.\nAll you have to do is translate the given answer as Cypher queries.\n\nYou have to respect this rules:\n- You must use the pdf files. \n- You must generate the easiest query possible in markdown format.\n- You must instert all the information you have and where you found them.\n- You must be precise. \n- You must re-generate the query if you think it could generates some errors.\n\nNow i will give you some information about the database schema.\n- nodes -\n(:Movie), Describe a movie that has a title and a plot. It can also have the number of likes.\n(:Person), Describe actors and directors. They have a name, a birthday and they may also have the death date.\n\n- relationships -\n(:Movie) <-[:DIRECTED]- (:Person)\n(:Person) -[:ACTED-IN]-> (:Movie)\n(:Person) -[:KNOWS]-> (:Person)\n(:Movie) -[:type]-> (:Genre)\n\nWhen you think the query can be run you can submit it as the final 