## RAG assisted Auto Developer 
-- with LionAGI, LlamaIndex, Autogen and OAI code interpreter


Let us develop a dev bot that can
- read and understand lionagi's existing codebase
- run QA against the codebase to clarify tasks
- produce and test pure Python code with the code interpreter, with automatic follow-up if the quality is lower than expected
- output final, runnable Python code

This tutorial shows how to automatically produce high-quality prototype and draft code customized for your own codebase.

In [None]:
!pip install lionagi llama_index pyautogen networkx

Collecting lionagi
  Downloading lionagi-0.0.112-py3-none-any.whl.metadata (17 kB)
Collecting pyautogen
  Downloading pyautogen-0.2.2-py3-none-any.whl.metadata (15 kB)
Collecting tiktoken==0.5.1 (from lionagi)
  Downloading tiktoken-0.5.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting httpx==0.25.1 (from lionagi)
  Downloading httpx-0.25.1-py3-none-any.whl.metadata (7.1 kB)
Collecting diskcache (from pyautogen)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting flaml (from pyautogen)
  Downloading FLAML-2.1.1-py3-none-any.whl.metadata (15 kB)
Collecting termcolor (from pyautogen)
  Downloading termcolor-2.4.0-py3-none-any.whl.metadata (6.1 kB)
Downloading lionagi-0.0.112-py3-none-any.whl (44 kB)
Downloading httpx-0.25.1-py3-none-any.whl (75 kB)

In [2]:
!pip install --upgrade pip
!pip install networkx



In [1]:
from pathlib import Path
import lionagi as li

In [2]:
WORKSPACE_FOLDER = '/workspaces/ml-learning'

ext = ".py"                                   # extension of files of interest, can be str or list[str]
data_dir = f'{WORKSPACE_FOLDER}/paper-qa'     # directory of source data (the codebase to index)
project_name = "autodev_paper-qa"             # give a project name
output_dir = f'{WORKSPACE_FOLDER}/src/autodev-lionagi/data/log/coder/'   # output dir

### 1. Read files

In [3]:
files = li.dir_to_files(dir=data_dir, ext=ext, clean=True, recursive=True,
                        project=project_name, to_csv=True, timestamp=False)

chunks = li.file_to_chunks(files, chunk_size=512,  overlap=0.1, 
                           threshold=100, to_csv=True, project=project_name, 
                           filename=f"{project_name}_chunks.csv", timestamp=False)

lst 1:  ['.py']
exception l_call: _dir_to_path() got multiple values for argument 'dir'
exception dir_to_path: Given function cannot be applied to the input. Error: _dir_to_path() got multiple values for argument 'dir'


ValueError: Invalid directory or extension, please check the path

In [5]:
print(f"""
      There are in total {sum(li.l_call(files, lambda x: x['file_size'])):,} 
      characters in {len(files)} non-empty files
      """)

lens = li.l_call(files, lambda x: len(x['content']))
min_, max_, avg_ = min(lens), max(lens), sum(lens)/len(lens)

print(f"Minimum length of files is {min_} in characters")
print(f"Maximum length of files is {max_:,} in characters")
print(f"Average length of files is {int(avg_):,} in characters")


      There are in total 107,716 
      characters in 19 non-empty files
      
Minimum length of files is 24 in characters
Maximum length of files is 25,891 in characters
Average length of files is 5,669 in characters


The files are quite uneven in length, which could cause problems in our subsequent analysis, so we standardize them into chunks.
One convenient way to do this is the `file_to_chunks` function used above, which breaks the files into organized chunks.
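
To make the `chunk_size`, `overlap` and `threshold` parameters more concrete, here is a minimal sketch of character-based chunking with overlap. It is an illustration only: `chunk_text` is a hypothetical helper, and the way it pads each chunk and merges a short tail is an assumption, not lionagi's actual implementation.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: float = 0.1,
               threshold: int = 100) -> list[str]:
    """Split text into overlapping character chunks (hypothetical helper)."""
    if len(text) <= chunk_size:
        return [text]  # short files stay in one piece

    pad = int(chunk_size * overlap)  # characters shared with each neighbour
    starts = list(range(0, len(text), chunk_size))

    # Assumption: if the final piece would be shorter than `threshold`,
    # fold it into the previous chunk instead of emitting a tiny fragment.
    if len(starts) > 1 and len(text) - starts[-1] < threshold:
        starts.pop()

    chunks = []
    for i, start in enumerate(starts):
        is_last = i == len(starts) - 1
        end = len(text) if is_last else start + chunk_size
        # Extend the chunk by `pad` characters on both sides, clipped to the text.
        chunks.append(text[max(0, start - pad):min(len(text), end + pad)])
    return chunks


sample = "x" * 1100
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

With these assumptions a chunk can be somewhat longer than `chunk_size`, because the overlap is added on top of the core window.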

In [6]:
lens = li.l_call(li.to_list(chunks, flat=True), lambda x: len(x["chunk_content"]))
min_, max_, avg_ = min(lens), max(lens), sum(lens)/len(lens)

print(f"There are in total {len(li.to_list(chunks,flat=True)):,} chunks")
print(f"Minimum length of content in chunk is {min_} characters")
print(f"Maximum length of content in chunk is {max_:,} characters")
print(f"Average length of content in chunk is {int(avg_):,} characters")
print(f"There are in total {sum(li.l_call(chunks, lambda x: x['chunk_size'])):,} characters")

There are in total 218 chunks
Minimum length of content in chunk is 24 characters
Maximum length of content in chunk is 609 characters
Average length of content in chunk is 539 characters
There are in total 117,666 characters


In [7]:
chunks[0]

{'project': 'autodev_lion',
 'folder': 'lionagi',
 'file': 'version.py',
 'file_size': 24,
 'chunk_overlap': 0.1,
 'chunk_threshold': 100,
 'file_chunks': 1,
 'chunk_id': 1,
 'chunk_size': 24,
 'chunk_content': '__version__ = "0.0.106" '}

### 2. Set up LlamaIndex Vector Index

In [8]:
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import OpenAI
from llama_index.schema import TextNode

# build one TextNode per chunk from our existing chunks
nodes = li.l_call(chunks, lambda x: TextNode(text=x["chunk_content"]))

# set up vector index
llm = OpenAI(temperature=0.1, model="gpt-4-1106-preview")
service_context = ServiceContext.from_defaults(llm=llm)
index1 = VectorStoreIndex(nodes, include_embeddings=True, service_context=service_context)

# set up query engine
query_engine = index1.as_query_engine(include_text=False, response_mode="tree_summarize")
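
For intuition about what a vector index does at query time, the sketch below ranks text chunks by cosine similarity to a query. It is a toy under stated assumptions: `embed` uses word counts instead of a learned embedding model, and `top_k` is a hypothetical helper, not part of LlamaIndex.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "class Session manages a conversation",
    "def dir_to_files reads files from a directory",
    "version string of the package",
]
best = top_k("which class manages a conversation", toy_chunks, k=1)
print(best)
```

A real index replaces the word counts with dense embeddings, but the retrieve-by-similarity step follows the same idea.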

In [13]:
response = query_engine.query("what is session object made of?")

print(response.response)

The `Session` object is made of a class that represents a conversation session with a conversational AI system. This class manages the interactions within the session.


### 3. Using the OpenAI Assistant Code Interpreter with AutoGen

In [9]:
import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": ["gpt-3.5-turbo", "gpt-35-turbo", "gpt-4", "gpt4", "gpt-4-32k", "gpt-4-turbo"],
    },
)
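
The `OAI_CONFIG_LIST` file is a JSON list of model configurations, and `filter_dict` keeps only the entries whose fields match. The sketch below mimics that filtering in plain Python so the behaviour is easy to see. The file contents, the placeholder keys and the `filter_configs` helper are all assumptions for illustration, not AutoGen's own code.

```python
import json

# Hypothetical contents of an OAI_CONFIG_LIST file; the keys are placeholders.
raw = json.dumps([
    {"model": "gpt-4", "api_key": "<your-key>"},
    {"model": "text-davinci-003", "api_key": "<your-key>"},
])


def filter_configs(configs: list[dict], filter_dict: dict[str, list[str]]) -> list[dict]:
    """Keep configs whose value for every filtered key is in the allowed list."""
    return [
        cfg for cfg in configs
        if all(cfg.get(key) in allowed for key, allowed in filter_dict.items())
    ]


configs = json.loads(raw)
selected = filter_configs(configs, {"model": ["gpt-3.5-turbo", "gpt-4"]})
print([cfg["model"] for cfg in selected])
```

If the filtered list comes back empty, the model names in the file probably do not match any name in `filter_dict`.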

In [None]:
coder_instruction = """
        You are an expert at writing Python code. Write pure Python code and run it to validate 
        it, then return the full implementation followed by the word TERMINATE once the task is 
        solved and there are no problems. Reply FAILED if you cannot solve the problem.
        """

In [10]:
from autogen.agentchat.contrib.gpt_assistant_agent import GPTAssistantAgent
from autogen.agentchat import UserProxyAgent

# Initiate an agent equipped with code interpreter
gpt_assistant = GPTAssistantAgent(
    name="Coder Assistant",
    llm_config={
        "tools": [
            {
                "type": "code_interpreter"
            }
        ],
        "config_list": config_list,
    },
    instructions=coder_instruction,
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),  # content may be None
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,  # set to True or image name like "python:3" to use docker
    },
    human_input_mode="NEVER"
)

async def code_pure_python(instruction):
    user_proxy.initiate_chat(gpt_assistant, message=instruction)
    return gpt_assistant.last_message()
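
Note that `code_pure_python` is declared `async` but calls the synchronous `initiate_chat`, so the event loop is blocked while the chat runs. If that matters in your setup, one option is to push the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below shows the pattern with a hypothetical `blocking_chat` stand-in rather than the real agents.

```python
import asyncio
import time


def blocking_chat(instruction: str) -> str:
    """Hypothetical stand-in for a synchronous call such as `initiate_chat`."""
    time.sleep(0.1)  # simulate waiting on a remote API
    return f"done: {instruction}"


async def run_chat(instruction: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_chat, instruction)


async def main() -> list[str]:
    # The two chats overlap instead of running back to back.
    return await asyncio.gather(run_chat("task A"), run_chat("task B"))


results = asyncio.run(main())
print(results)
```

Inside a notebook, where an event loop is already running, use `await main()` instead of `asyncio.run(main())`.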

### 4. Turn the query engine and the OpenAI assistant into tools

In [11]:
tool1 = [
    {
        "type": "function",
        "function": {
            "name": "query_lionagi_codebase",
            "description": "Perform a query to a QA bot with access to a vector index built with package lionagi codebase",
            "parameters": {
                "type": "object",
                "properties": {
                    "str_or_query_bundle": {
                        "type": "string",
                        "description": "a question to ask the QA bot",
                    }
                },
                "required": ["str_or_query_bundle"],
            },
        }
    }
]
tool2=[{
        "type": "function",
        "function": {
            "name": "code_pure_python",
            "description": "Give an instruction to a coding assistant to write pure python codes",
            "parameters": {
                "type": "object",
                "properties": {
                    "instruction": {
                        "type": "string",
                        "description": "coding instruction to give to the coding assistant",
                    }
                },
                "required": ["instruction"],
            },
        }
    }
]

tools = [tool1[0], tool2[0]]
funcs = [query_engine.query, code_pure_python]
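
Each schema tells the model which function it may call and with which arguments. The property names must match the function's keyword arguments, as `str_or_query_bundle` matches the parameter of `query_engine.query`. The sketch below shows, under simplifying assumptions, how an OpenAI-style tool call could be routed to a Python function. `lookup`, `registry` and `dispatch` are hypothetical names, not lionagi's internals.

```python
import json
from typing import Any, Callable


def lookup(question: str) -> str:
    """Hypothetical stand-in for a query function."""
    return f"answer to: {question}"


# Map the schema's "name" field to the callable that implements it.
registry: dict[str, Callable[..., Any]] = {"lookup": lookup}


def dispatch(tool_call: dict) -> Any:
    """Invoke the registered function named in an OpenAI-style tool call."""
    name = tool_call["function"]["name"]
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    # The model sends its arguments as a JSON string keyed by parameter name.
    kwargs = json.loads(tool_call["function"]["arguments"])
    return registry[name](**kwargs)


call = {"function": {"name": "lookup",
                     "arguments": '{"question": "what is a session?"}'}}
result = dispatch(call)
print(result)
```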

### 5. Write Prompts

In [12]:
system = {
    "persona": "a helpful software engineer",
    "requirements": "think step by step before returning a thoughtful answer that follows the instruction, with a clearly and precisely worded answer in a humble yet confident tone",
    "responsibilities": "you are asked to help with coding on the python package of lionagi",
    "tools": "provided with a QA bot for grounding responses, and a coding assistant to write pure python codes"
}

function_call1 = {
    "notice":"""
        At each task step, identified by step number, you must use the tool 
        at least five times. Notice you are provided with a QA bot as your tool; 
        the bot has access to the source code via a queryable index that takes 
        a natural language query and returns a natural language answer. You can 
        decide whether to invoke the function call; you will need to ask the bot 
        when things need clarification or further information. You provide the 
        query by asking a question. Please use the tool as extensively as you 
        can (up to ten times).
        """,}

function_call2 = {
    "notice":"""
        At each task step, identified by step number, you must use the tool 
        at least once, and you must use the tool at least once more if the previous 
        run failed. Notice you are provided with a coding assistant as your tool; the 
        bot can write and run python codes in a sandbox environment. It takes a natural 
        language instruction and returns 'success'/'failed'. The instruction you give 
        needs to be very clear and detailed so that an AI coding assistant can produce 
        excellent output.
        """,}


In [13]:
instruct1 = {
    "task step": "1", 
    "task name": "understand user requirements", 
    "task objective": "get a comprehensive understanding of the task given", 
    "task description": "user provided you with a task, please understand the task, propose plans on delivering it"
}

instruct2 = {
    "task step": "2", 
    "task name": "propose a pure python solution", 
    "task objective": "give detailed instruction on how to achieve above task with pure python as if to a coding bot", 
    "task description": "you are responsible for further customizing the coding task to our lionagi package requirements. You are provided with a QA bot; please keep asking questions if anything is unclear. Your instruction should focus on functionalities and coding logic",
    "function_call": function_call1
}

instruct3 = {
    "task step": "3", 
    "task name": "write pure python codes", 
    "task objective": "write runnable python codes", 
    "task description": "from your improved understanding of the task, please instruct the coding assistant on writing pure python codes. If the coding assistant succeeds, reply with the full implementation in a well structured py format. If it reports back 'failed', run it once more, and after the second failed attempt return 'Task failed' with the most recent effort",
    "function_call": function_call2
}

In [14]:
# solve a coding task in pure python
async def solve_in_python(context, num=10):
    
    # set up session and register both tools to session 
    coder = li.Session(system, dir=output_dir)
    coder.register_tools(tools=tools, funcs=funcs)
    
    # initiate should not use tools
    await coder.initiate(instruct1, context=context, temperature=0.7)
    
    # auto_followup with QA bot tool
    await coder.auto_followup(instruct2, num=num, temperature=0.6, tools=tool1,
                                   tool_parser=lambda x: x.response)
    
    # auto_followup with code interpreter tool
    await coder.auto_followup(instruct3, num=2, temperature=0.5, tools=tool2)
    
    # save to csv
    coder.messages_to_csv()
    coder.log_to_csv()
    
    # return codes
    return coder.conversation.messages[-1]['content']
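
The third step allows at most two attempts (`num=2`): retry once on failure, then give up and report the latest effort. The sketch below illustrates that bounded-retry idea in isolation. `run_with_followup` and `flaky_step` are hypothetical, and the loop is an assumption about the intended behaviour, not the implementation of `auto_followup`.

```python
from typing import Callable


def run_with_followup(step: Callable[[int], str], num: int = 2) -> str:
    """Call `step` up to `num` times, stopping at the first reply containing TERMINATE."""
    last = ""
    for attempt in range(1, num + 1):
        last = step(attempt)
        if "TERMINATE" in last:
            return last  # success: hand back the reply with the implementation
    # All attempts used up: report failure together with the latest reply.
    return f"Task failed. Most recent effort: {last}"


def flaky_step(attempt: int) -> str:
    """Hypothetical step that fails once, then succeeds."""
    return "FAILED" if attempt == 1 else "print('ok')  TERMINATE"


outcome = run_with_followup(flaky_step, num=2)
print(outcome)
```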

### 6. Run the workflow

In [15]:
issue = {
    "raise files and chunks into objects": """
        files and chunks are currently in dict format, please design classes for them, include all 
        members, methods, staticmethods, class methods... if needed. please make sure your work 
        has sufficient content, make sure to include typing and docstrings
        """
    }

In [16]:
response = await solve_in_python(issue)

user_proxy (to Coder Assistant):

Please define a Python class named 'File' with the following specifications:

- Attributes:
  - 'name': A string representing the name of the file.
  - 'size': An integer representing the size of the file in bytes.
  - 'file_type': A string representing the type of the file (e.g., 'txt', 'jpg').

- Methods:
  - '__init__': Constructor that takes 'name', 'size', and 'file_type' as parameters and initializes the respective attributes.
  - 'read': A method that simulates reading the file content. For now, it can simply return a string 'File content of {name}.'
  - 'write': A method that takes a string 'content' as a parameter and simulates writing to the file. It can print 'Writing to {name}: {content}'.
  - 'delete': A method that simulates deleting the file. It can print '{name} deleted.'

Please ensure to include type annotations for all attributes and method parameters, and add docstrings to the class and each method explaining their purpose.

### 7. Output

In [21]:
from IPython.display import Markdown
import json

response = json.loads(response)

In [24]:
Markdown(response['function call result']['content'])

The classes have been defined as per your specification and the methods have been demonstrated with print statements. Below is the full implementation:

```python
class File:
    """Represents a file with a name, size, and type."""
    
    def __init__(self, name: str, size: int, file_type: str) -> None:
        """Initializes the file with a name, size, and type."""
        self.name = name
        self.size = size
        self.file_type = file_type

    def read(self) -> str:
        """Simulates reading of file and returns the content as a string."""
        return f"Reading content of file: {self.name}"

    def write(self, content: str) -> None:
        """Simulates writing content to the file, printing the operation."""
        print(f"Writing to file: {self.name}. Content: {content}")

    def delete(self) -> None:
        """Simulates deleting the file and prints a confirmation message."""
        print(f"File {self.name} deleted.")


class Chunk:
    """Represents a chunk of a file with an index, size, and data."""
    
    def __init__(self, index: int, size: int, data: str) -> None:
        """Initializes the chunk with an index, size, and data."""
        self.index = index
        self.size = size
        self.data = data

    def get_data(self) -> str:
        """Returns the data contained in the chunk."""
        return self.data

    def set_data(self, new_data: str) -> None:
        """Updates the chunk's data with new data."""
        self.data = new_data

# Creating instances of each class and demonstrating their usage
file_example = File(name="example.txt", size=1024, file_type="txt")
chunk_example = Chunk(index=1, size=512, data="This is a piece of data.")

# Demonstrating File methods
print(file_example.read())
file_example.write("Hello World")
file_example.delete()

# Demonstrating Chunk methods
print(chunk_example.get_data())
chunk_example.set_data("New chunk data")
print(chunk_example.get_data())
```

Everything worked as expected.

TERMINATE
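
To turn a reply like the one above into a runnable file, the fenced Python blocks can be pulled out of the Markdown text and saved. The following is a minimal sketch, assuming the reply uses fences tagged `python`. `extract_python_blocks` and the output file name are hypothetical, and a temporary directory is used here only to keep the example free of side effects.

```python
import re
import tempfile
from pathlib import Path

FENCE = "`" * 3  # built programmatically so this example can itself be fenced


def extract_python_blocks(markdown_text: str) -> list[str]:
    """Return the bodies of all fenced python code blocks in a Markdown string."""
    pattern = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)
    return [body.strip() for body in pattern.findall(markdown_text)]


# A small stand-in for the assistant's reply.
reply = f"Below is the code:\n\n{FENCE}python\nx = 1 + 1\nprint(x)\n{FENCE}\n\nTERMINATE"
blocks = extract_python_blocks(reply)

# Write the extracted code to a file (hypothetical name) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "generated_code.py"
    out_path.write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
    saved = out_path.read_text(encoding="utf-8")

print(saved)
```

In the notebook, the text to parse would be `response['function call result']['content']`, and the file would go to a directory of your choice.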
