# Retrieve chat

__RetrieveChat__ is a conversational system for retrieval-augmented code generation and question answering. It uses `RetrieveAssistantAgent` and `RetrieveUserProxyAgent`, which are used in much the same way as `AssistantAgent` and `UserProxyAgent` in other notebooks (e.g., [Automated Task Solving with Code Generation, Execution & Debugging](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_auto_feedback_from_code_execution.ipynb)). Essentially, `RetrieveAssistantAgent` and `RetrieveUserProxyAgent` implement a different auto-reply mechanism that corresponds to the __RetrieveChat__ prompts.



In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
import json
import os

import chromadb

import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# Accepted file formats that can be stored in
# a vector database instance
from autogen.retrieve_utils import TEXT_FORMATS

config_list = [
    {
        "model": "gpt-3.5-turbo", 
        "api_key": os.environ['OPENAI_API_KEY'],
        "api_type": "openai"
    },
]

assert len(config_list) > 0
print("models to use: ", [config_list[i]["model"] for i in range(len(config_list))])

models to use:  ['gpt-3.5-turbo']
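
Indexing `os.environ['OPENAI_API_KEY']` directly raises a bare `KeyError` when the variable is missing. A minimal sketch of a more defensive variant follows. The helper name `build_config_list` is hypothetical and not part of AutoGen, and the dictionary keys simply mirror the cell above.

```python
import os


def build_config_list(model="gpt-3.5-turbo", env_var="OPENAI_API_KEY"):
    """Hypothetical helper: build a one-entry config list from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return [{"model": model, "api_key": api_key, "api_type": "openai"}]
```

Calling `build_config_list()` after `load_dotenv()` would yield the same structure as the hand-written `config_list`, provided the key is defined in the environment or the `.env` file.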


#### Construct agents for RetrieveChat

In [3]:
print("Accepted file formats for `docs_path`:")
print(TEXT_FORMATS)

Accepted file formats for `docs_path`:
['txt', 'json', 'csv', 'tsv', 'md', 'html', 'htm', 'rtf', 'rst', 'jsonl', 'log', 'xml', 'yaml', 'yml', 'pdf']


In [7]:
assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",  
    llm_config={
        "timeout": 120,
        "cache_seed": 82,
        "config_list": config_list,
    },
)

Create the `RetrieveUserProxyAgent` instance named "ragproxyagent":
- By default, `human_input_mode` is "ALWAYS", which means the agent asks for human input at every step. We set it to "NEVER" here.
- `docs_path` is the path to the docs directory. It can also be the path to a single file, the URL of a single file, or a list of such paths and URLs, as in this example. By default, it is set to None, which works only if the collection has already been created.
- `task` indicates the kind of task we're working on. In this example, it's a `code` task.
- `chunk_token_size` is the chunk token size for the retrieve chat. By default, it is set to `max_tokens * 0.6`; here we set it to 2000.
- `custom_text_types` is a list of file types to be processed. The default is `autogen.retrieve_utils.TEXT_FORMATS`. This only applies to files under the directories in `docs_path`. Explicitly included files and URLs are chunked regardless of their types. In this example, we set it to ["non-existent-type"], so no files under `website/docs` are processed, because none of them has that type. The explicitly included URLs are still processed.
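
The interaction between `docs_path` and `custom_text_types` described above can be pictured with a small filter. This is only an illustrative sketch: `select_files` is a hypothetical function, not AutoGen's implementation, and the file names and URL are made up.

```python
from pathlib import Path


def select_files(dir_files, explicit_sources, custom_text_types):
    """Illustrative filter: keep directory files whose extension is allowed,
    and keep every explicitly listed file or URL regardless of its type."""
    allowed = {t.lower() for t in custom_text_types}
    kept = [f for f in dir_files if Path(f).suffix.lstrip(".").lower() in allowed]
    return list(explicit_sources) + kept


# Hypothetical inputs, for illustration only.
dir_files = ["docs/Getting-Started.md", "docs/FAQ.md"]
explicit = ["https://example.com/Research.md"]

selected = select_files(dir_files, explicit, ["non-existent-type"])
print(selected)  # ['https://example.com/Research.md']
```

With `["md"]` instead of `["non-existent-type"]`, the two directory files would be kept as well.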

In [4]:
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "code",
        "docs_path": [
            "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md",
            "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Research.md",
            os.path.join(os.path.abspath(""), "..", "website", "docs"),
        ],
        "custom_text_types": ["non-existent-type"],
        "chunk_token_size": 2000,
        "model": config_list[0]["model"],
        # "client": chromadb.PersistentClient(path="/tmp/chromadb"),  # deprecated, use "vector_db" instead
        "vector_db": "chroma",  # to use the deprecated `client` parameter, set to None and uncomment the line above
        "overwrite": False,  # set to True if you want to overwrite an existing collection
    },
    code_execution_config=False,  # set to False if you don't want to execute the code
)



In [9]:
# Reset the assistant. Always reset the assistant before starting a new conversation.
assistant.reset()

# Given a problem, we use the ragproxyagent to generate a prompt that is sent to the assistant as the initial message.
# The assistant receives the message and generates a response, which is sent back to the ragproxyagent for processing.
# The conversation continues until the termination condition is met. In RetrieveChat, with no human in the loop,
# the conversation terminates when no code block is detected in the reply.
# With a human in the loop, the conversation continues until the user says "exit".
code_problem = "How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached."
chat_result = ragproxyagent.initiate_chat(
    assistant, message=ragproxyagent.message_generator, problem=code_problem, search_string="spark"
)  # search_string is used as an extra filter for the embeddings search, in this case, we only want to search documents that contain "spark".

Trying to create collection.


File c:\Users\ADMIN\OneDrive\Máy tính\Project\Python\LangChain\llm-agent\..\website\docs does not exist. Skipping.
2024-05-10 21:19:55,404 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 2 chunks.
2024-05-10 21:19:55,411 - autogen.agentchat.contrib.vectordb.chromadb - INFO - No content embedding is provided. Will use the VectorDB's embedding function to generate the content embedding.
Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2


VectorDB returns doc_ids:  [['bdfbc921']]
Adding content of doc bdfbc921 to context.
ragproxyagent (to assistant):

You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
For code generation, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```

User's question is: How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached.

Context is: # Integrate - Spark

FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:

- Use Spark ML estimators for AutoML.
- Use Spark 

Number of requested results 60 is greater than number of elements in index 2, updating n_results = 2
Number of requested results 100 is greater than number of elements in index 2, updating n_results = 2
Number of requested results 140 is greater than number of elements in index 2, updating n_results = 2
Number of requested results 180 is greater than number of elements in index 2, updating n_results = 2


VectorDB returns doc_ids:  [['bdfbc921']]
VectorDB returns doc_ids:  [['bdfbc921']]
VectorDB returns doc_ids:  [['bdfbc921']]
VectorDB returns doc_ids:  [['bdfbc921']]
No more context, will terminate.
ragproxyagent (to assistant):

TERMINATE

--------------------------------------------------------------------------------
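
The log above shows the requested number of results growing from 20 to 180 while the index holds only 2 chunks, so every request is reduced to 2 and no new document is ever returned. The sketch below reproduces that arithmetic. The function names are hypothetical, and the odd-multiple pattern is inferred from the log lines rather than taken from AutoGen's source.

```python
def requested_results(base_n_results, attempt):
    """Requested result count for a given retry (attempt 0 is the first query).
    The 1x, 3x, 5x, ... pattern is inferred from the log, not from AutoGen's code."""
    return base_n_results * (2 * attempt + 1)


def clamp_to_index(n_requested, index_size):
    """A vector store cannot return more results than it holds."""
    return min(n_requested, index_size)


schedule = [requested_results(20, k) for k in range(5)]
print(schedule)  # [20, 60, 100, 140, 180]
print([clamp_to_index(n, 2) for n in schedule])  # [2, 2, 2, 2, 2]
```

Because the clamped count never exceeds 2, each retry returns the same documents, which is consistent with the "No more context, will terminate." message.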


In [10]:
assistant.reset()

qa_problem = 'Who is the author of FLAML?'
ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=qa_problem)

Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2


VectorDB returns doc_ids:  [['7968cf3c', 'bdfbc921']]
Adding content of doc 7968cf3c to context.
Adding content of doc bdfbc921 to context.
ragproxyagent (to assistant):

You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
For code generation, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```

User's question is: Who is the author of FLAML?

Context is: # Research

For technical details, please check our research publications.

- [FLAML: A Fast and Lightweight AutoML Library](https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/). Chi Wang, Qingyun W

ChatResult(chat_id=None, chat_history=[{'content': 'You\'re a retrieve augmented coding assistant. You answer user\'s questions based on your own knowledge and the\ncontext provided by the user.\nIf you can\'t answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\nFor code generation, you must obey the following rules:\nRule 1. You MUST NOT install any packages because all the packages needed are already installed.\nRule 2. You must follow the formats below to write your code:\n```language\n# your code\n```\n\nUser\'s question is: Who is the author of FLAML?\n\nContext is: # Research\n\nFor technical details, please check our research publications.\n\n- [FLAML: A Fast and Lightweight AutoML Library](https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/). Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. MLSys 2021.\n\n```bibtex\n@inproceedings{wang2021flaml,\n    title={FLAML: A Fast and Lightweigh

Use `RetrieveChat` to answer questions from the `NaturalQuestion` dataset.

First, we create a new document collection that includes the entire context corpus. Then we choose some questions and use `RetrieveChat` to answer them. For this example, we use the `gpt-3.5-turbo` model and demonstrate `RetrieveChat`'s ability to update the context automatically when the retrieved documents do not contain sufficient information.

In [None]:

corpus_file = "https://huggingface.co/datasets/thinkall/NaturalQuestionsQA/resolve/main/corpus.txt"


embedding_model = "all-MiniLM-L6-v2"

ragproxyagent = RetrieveUserProxyAgent(
    name='ragproxyagent',
    human_input_mode='NEVER',
    max_consecutive_auto_reply=3,
    retrieve_config={
        'task': 'qa',
        'docs_path': corpus_file,
        'embedding_model': embedding_model,
        'chunk_mode': 'one_line',
        'model': config_list[0]['model'],
        'client': chromadb.PersistentClient(path='tmp/chromadb'),
        'chunk_token_size': 2000
    }
)
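
The `'one_line'` value of `chunk_mode` suits a corpus file in which every line is a separate passage. As a rough mental model only, and assuming that each non-empty line becomes its own chunk (AutoGen's actual splitter also enforces `chunk_token_size` and may differ in detail), the idea can be sketched as follows. The function `one_line_chunks` and the sample text are hypothetical.

```python
def one_line_chunks(text):
    """Assumed behaviour of 'one_line' chunking: one chunk per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical two-entry corpus, for illustration only.
sample_corpus = "First passage about topic A.\n\nSecond passage about topic B.\n"
chunks = one_line_chunks(sample_corpus)
print(len(chunks))
```

Keeping one passage per chunk means each retrieved result maps back to exactly one line of the corpus file.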