In [41]:
# !pip install pyautogen==0.2.20
# !pip install openai==1.14.2

# or pip install -r requirements.txt (from the github repo)

Quick explanation for the retrieval-augmented generation example:

[source](https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat#:~:text=Retrieval%20augmentation%20has,for%20the%20context.)

Retrieval augmentation has emerged as a practical and effective approach for mitigating the intrinsic limitations of LLMs by incorporating external documents. The source blog post introduces AutoGen's RAG agents, which enable retrieval-augmented generation. The system consists of two agents: a Retrieval-augmented User Proxy agent, called RetrieveUserProxyAgent, and a Retrieval-augmented Assistant agent, called RetrieveAssistantAgent, both of which extend AutoGen's built-in agents. The overall architecture of the RAG agents is shown in a figure in the source post.

To use Retrieval-augmented Chat, you initialize two agents: the Retrieval-Augmented User Proxy and the Retrieval-Augmented Assistant. Initializing the Retrieval-Augmented User Proxy requires a path to the document collection. The Retrieval-Augmented User Proxy can then download the documents, segment them into chunks of a specific size, compute embeddings, and store them in a vector database. Once a chat is initiated, the agents collaborate on code generation or question answering, following the procedure below:
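
The chunk, embed, store, and retrieve steps described above can be sketched with the standard library alone. This is a toy illustration, not AutoGen's implementation: the bag-of-words `embed` function stands in for a real embedding model, `ToyVectorStore` is a hypothetical in-memory stand-in for a vector database, and the two sample sentences are made up for the example.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=50):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; a real setup would use a vector database."""

    def __init__(self):
        self.entries = []  # list of (doc_id, chunk, embedding)

    def add(self, chunks):
        for chunk in chunks:
            self.entries.append((f"doc_{len(self.entries)}", chunk, embed(chunk)))

    def query(self, question, n_results=2):
        # Never ask for more results than there are stored chunks.
        n_results = min(n_results, len(self.entries))
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine_similarity(q, e[2]), reverse=True)
        return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:n_results]]


store = ToyVectorStore()
# Made-up sample documents, one chunk each.
store.add(split_into_chunks("FLAML can use Spark to run training trials in parallel."))
store.add(split_into_chunks("The research page lists publications related to FLAML."))

top = store.query("How do I run parallel training with Spark?", n_results=20)
print([doc_id for doc_id, _ in top])
```

Requesting 20 results from a store holding two chunks is clamped to two, which is the same situation the retrieval log reports later in this notebook.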

1. The Retrieval-Augmented User Proxy retrieves document chunks based on embedding similarity and sends them, along with the question, to the Retrieval-Augmented Assistant.
2. The Retrieval-Augmented Assistant employs an LLM to generate code or text as answers based on the question and context provided. If the LLM is unable to produce a satisfactory response, it is instructed to reply with “Update Context” to the Retrieval-Augmented User Proxy.
3. If a response includes code blocks, the Retrieval-Augmented User Proxy executes the code and sends the output as feedback. If there are no code blocks or instructions to update the context, it terminates the conversation. Otherwise, it updates the context and forwards the question along with the new context to the Retrieval-Augmented Assistant. Note that if human input solicitation is enabled, individuals can proactively send any feedback, including “Update Context”, to the Retrieval-Augmented Assistant.
4. If the Retrieval-Augmented Assistant receives “Update Context”, it requests the next most similar chunks of documents as new context from the Retrieval-Augmented User Proxy. Otherwise, it generates new code or text based on the feedback and chat history. If the LLM fails to generate an answer, it replies with “Update Context” again. This process can be repeated several times. The conversation terminates if no more documents are available for the context.
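
The "Update Context" loop in the procedure above can be sketched as plain Python. This is a simplified sketch under stated assumptions: `run_retrieve_chat` and `fake_assistant` are hypothetical names, the fake assistant replaces the LLM, and code execution and human feedback are left out.

```python
UPDATE_CONTEXT = "UPDATE CONTEXT"


def run_retrieve_chat(question, ranked_chunks, assistant_reply, max_turns=10):
    """Offer chunks in similarity order until the assistant answers or chunks run out."""
    remaining = list(ranked_chunks)
    for _ in range(max_turns):
        if not remaining:
            return None  # no more documents for the context: the conversation ends
        context = remaining.pop(0)
        reply = assistant_reply(question, context)
        if reply.strip().upper() != UPDATE_CONTEXT:
            return reply  # a real answer, so stop here
        # Otherwise the assistant asked for new context; try the next chunk.
    return None


def fake_assistant(question, context):
    """Hypothetical stand-in for the LLM: answers only if the context mentions Spark."""
    if "spark" in context.lower():
        return f"Answer based on: {context}"
    return UPDATE_CONTEXT


answer = run_retrieve_chat(
    "How do I train in parallel with Spark?",
    ["Research publications.", "Spark integration notes."],
    fake_assistant,
)
print(answer)
```

Here the first chunk is rejected with `UPDATE CONTEXT`, so the loop moves on to the second chunk, which the fake assistant accepts.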

In [42]:
from autogen import config_list_from_json
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
import os

In [43]:
config_list = config_list_from_json("./OAI_CONFIG_LIST")
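
For reference, the `OAI_CONFIG_LIST` file loaded above is expected to hold a JSON list of model configurations. The sketch below writes and reads such a file with the standard `json` module only; it assumes the usual AutoGen 0.2 shape of one dictionary per model with `model` and `api_key` fields, and both the model name and the key shown are placeholders.

```python
import json
import os
import tempfile

# Assumed shape: a list with one dict per model. Values are placeholders.
example_config = [
    {"model": "gpt-4", "api_key": "<your OpenAI API key here>"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "OAI_CONFIG_LIST")
    # Write the example file, then read it back as plain JSON.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example_config, f, indent=2)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded[0]["model"])
```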

In [44]:
assistant = RetrieveAssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config={
        "timeout": 600,
        "cache_seed": 42,
        "config_list": config_list,
    },
)

In [45]:
assistant

<autogen.agentchat.contrib.retrieve_assistant_agent.RetrieveAssistantAgent at 0x1638fca90>

In [46]:

type(assistant)

autogen.agentchat.contrib.retrieve_assistant_agent.RetrieveAssistantAgent

In [47]:
ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    retrieve_config={
        "task": "code",
        "docs_path": [
            "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md",
            "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Research.md",
            os.path.join(os.path.abspath(""), "..", "website", "docs"),
        ],
    },
    code_execution_config=False,  # set to False if you don't want to execute the code
)
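
The `docs_path` list above mixes two URLs with a local directory built relative to the current working directory. If that directory is missing, the chat output further down reports it as skipped. A minimal pre-flight check, using only the standard library, might look like this; `classify_doc_sources` is a hypothetical helper and not part of AutoGen.

```python
import os
from urllib.parse import urlparse


def classify_doc_sources(docs_path):
    """Split entries into URLs, existing local paths, and missing local paths."""
    urls, existing, missing = [], [], []
    for entry in docs_path:
        if urlparse(entry).scheme in ("http", "https"):
            urls.append(entry)  # remote document, not checked here
        elif os.path.exists(entry):
            existing.append(entry)
        else:
            missing.append(entry)
    return urls, existing, missing


sources = [
    "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Research.md",
    os.path.join(os.path.abspath(""), "..", "website", "docs"),
]
urls, existing, missing = classify_doc_sources(sources)
for path in missing:
    print(f"Local docs path not found: {path}")
```

Running a check like this before constructing the agent makes a missing directory visible up front instead of leaving it to the retrieval log.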

In [48]:
assistant.reset()

In [49]:
code_problem = "How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached."
ragproxyagent.initiate_chat(assistant, problem=code_problem, search_string="spark")

Trying to create collection.


File /Users/greatmaster/Desktop/projects/oreilly-live-trainings/oreilly_live_training_autogen/notebooks/../website/docs does not exist. Skipping.
Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2


doc_ids:  [['doc_0']]
Adding doc_id doc_0 to context.
ragproxyagent (to assistant):

You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the
context provided by the user.
If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.
For code generation, you must obey the following rules:
Rule 1. You MUST NOT install any packages because all the packages needed are already installed.
Rule 2. You must follow the formats below to write your code:
```language
# your code
```

User's question is: How can I use FLAML to perform a classification task and use spark to do parallel training. Train 30 seconds and force cancel jobs if time limit is reached.

Context is: # Integrate - Spark

FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:
- Use Spark ML estimators for AutoML.
- Use Spark to run training in parallel spar