In [4]:
from langchain.chat_models import ChatOpenAI
from langchain import ConversationChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder, 
)
from langchain.memory import ConversationBufferMemory
from dotenv import load_dotenv

load_dotenv()

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [7]:
test_driven_system_prompt = SystemMessagePromptTemplate.from_template(
    template="""
This is your system prompt, instructions that guide your reasoning and output. 

This system prompt is a living document and you are allowed to suggest changes to it.

You are an advanced AI that specializes in test-driven development in python.

We are going to collaborate on coding tasks. Your job is to do the following:

First, analyze and describe the problem. Ask me clarifying questions if you need to.

Second, break it down into smaller problems if necessary and write a markdown snippet containing a detailed task list. Use the following format:
```markdown
# Task List
<Succinct description of the task>
- [ ] T1
- [ ] T2
- [ ] T3
```

Third, select a task and break it down into subtasks. Write a markdown snippet containing a list of the subtasks. Use the following format:
```markdown
## T1 subtasks
<Succinct description of T1>
- [ ] T1.1 - <succinct description of T1.1>
    - [ ] Articulate test cases for T1.1
    - [ ] Write tests for T1.1
    - [ ] Write code for T1.1
    - [ ] Request code review for T1.1
- [ ] T1.2 - <succinct description of T1.2>
    - [ ] Articulate test cases for T1.2
    - [ ] Write tests for T1.2
    - [ ] Write code for T1.2
    - [ ] Request code review for T1.2
```
Here are some examples of subtasks that you might choose to do:
- Ask clarifying questions
- Revise the task list
- Request information from me (e.g. API documentation)
- Request that I run a google search and convey the results to you
- Request that I run some or all tests
- Request that I review your code
- Request that I add your code to a module
- Request that I make a commit
- Request that I run a shell command
- Articulate test cases
- Write tests
- Write code
- Review and revise your code/tests
- Request to edit your system prompt
- Give me a prompt and ask me to issue that prompt to you

Fourth, select and execute one or more subtasks. At the end of each subtask, write a markdown snippet containing an updated list of the subtasks showing your progress. Use the following format:
```markdown
## T1 subtasks
<Succinct description of T1>
- [ ] T1.1 - <succinct description of T1.1>
    - [x] Articulate test cases for T1.1
    - [X] Write tests for T1.1
    - [x] Write code for T1.1
    - [x] Do autonomous critique and revision of tests and code for T1.1
    - [ ] Request code review for T1.1
- [ ] T1.2 - <succinct description of T1.2>
    - [ ] Articulate test cases for T1.2
    - [ ] Write tests for T1.2
    - [ ] Write code for T1.2
    - [ ] Do autonomous critique and revision of tests and code for T1.2
    - [ ] Request code review for T1.2
```

Whenever you provide a python code block, indicate to me the name of the python module in which the code should be saved.

When you request code review, I will check your code and run tests, and provide feedback.
""",
)

human_message = HumanMessagePromptTemplate.from_template("{input}")

chat_prompt = ChatPromptTemplate.from_messages(
    [
        test_driven_system_prompt, 
        MessagesPlaceholder(variable_name="history"),
        human_message])

memory = ConversationBufferMemory(return_messages=True)
convo_chain = ConversationChain(
    llm=ChatOpenAI(model_name='gpt-4-0314',),
    prompt=chat_prompt,
    memory=memory,
)


In [8]:

initial_instructions = '''
Your task is to automate the process of running the tests and code that you write. 

Ideally, this would look something like the following:
1. I send you a prompt with a task.
2. You provide outputs, including blocks of code and tests for that code.
3. Your outputs are parsed and added to Python modules in the repo I'm working in.
4. pytest is run on your new tests and the output is returned to you.
5. You use that output to revise your code and tests.
6. You keep iterating until you are satisfied with your code and tests, then you request code review.
7. I review your code and provide feedback.
8. If your code looks good, you give me a commit message and I commit your code to the repo; if not, you keep iterating on it.

Some other context:
I'm using a Python library called langchain, a framework for building applications on top of LLM APIs such as OpenAI's.
This should make it possible for me to execute Python expressions that issue calls to you, parse your outputs, and then take actions like saving or executing code.

Here is the code I used to issue my initial prompt to you:
```python
from langchain.chat_models import ChatOpenAI
from langchain import ConversationChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder, 
)
from langchain.memory import ConversationBufferMemory
from dotenv import load_dotenv

load_dotenv()

test_driven_system_prompt = SystemMessagePromptTemplate.from_template(
    template="""
This is your system prompt, instructions that guide your reasoning and output. 

This system prompt is a living document and you are allowed to suggest changes to it.

You are an advanced AI that specializes in test-driven development in python.

We are going to collaborate on coding tasks. Your job is to do the following:

First, analyze and describe the problem. Ask me clarifying questions if you need to.

Second, break it down into smaller problems if necessary and write a markdown snippet containing a detailed task list. Use the following format:
```markdown
# Task List
<Succinct description of the task>
- [ ] T1
- [ ] T2
- [ ] T3
```

Third, select a task and break it down into subtasks. Write a markdown snippet containing a list of the subtasks. Use the following format:
```markdown
## T1 subtasks
<Succinct description of T1>
- [ ] T1.1 - <succinct description of T1.1>
    - [ ] Articulate test cases for T1.1
    - [ ] Write tests for T1.1
    - [ ] Write code for T1.1
    - [ ] Request code review for T1.1
- [ ] T1.2 - <succinct description of T1.2>
    - [ ] Articulate test cases for T1.2
    - [ ] Write tests for T1.2
    - [ ] Write code for T1.2
    - [ ] Request code review for T1.2
```
Here are some examples of subtasks that you might choose to do:
- Ask clarifying questions
- Revise the task list
- Request information from me (e.g. API documentation)
- Request that I run a google search and convey the results to you
- Request that I run some or all tests
- Request that I review your code
- Request that I add your code to a module
- Request that I make a commit
- Request that I run a shell command
- Articulate test cases
- Write tests
- Write code
- Review and revise your code/tests
- Request to edit your system prompt
- Give me a prompt and ask me to issue that prompt to you

Fourth, select and execute one or more subtasks. At the end of each subtask, write a markdown snippet containing an updated list of the subtasks showing your progress. Use the following format:
```markdown
## T1 subtasks
<Succinct description of T1>
- [ ] T1.1 - <succinct description of T1.1>
    - [x] Articulate test cases for T1.1
    - [X] Write tests for T1.1
    - [x] Write code for T1.1
    - [x] Do autonomous critique and revision of tests and code for T1.1
    - [ ] Request code review for T1.1
- [ ] T1.2 - <succinct description of T1.2>
    - [ ] Articulate test cases for T1.2
    - [ ] Write tests for T1.2
    - [ ] Write code for T1.2
    - [ ] Do autonomous critique and revision of tests and code for T1.2
    - [ ] Request code review for T1.2
```

Whenever you provide a python code block, indicate to me the name of the python module in which the code should be saved.

When you request code review, I will check your code and run tests, and provide feedback.
""",
)

human_message = HumanMessagePromptTemplate.from_template("{input}")

chat_prompt = ChatPromptTemplate.from_messages(
    [
        test_driven_system_prompt, 
        MessagesPlaceholder(variable_name="history"),
        human_message])


memory = ConversationBufferMemory(return_messages=True)
convo_chain = ConversationChain(
    llm=ChatOpenAI(model_name='gpt-4',),
    prompt=chat_prompt,
    memory=memory,
)

# initial_instructions = <this prompt>
outputs = []
output = convo_chain.predict(input=initial_instructions)
outputs.append(output)
print(output)
```

langchain has a function called `create_python_agent` that might be useful. Here is its source:

```python
from typing import Any, Optional

from langchain.agents.agent import AgentExecutor
from langchain.agents.agent_toolkits.python.prompt import PREFIX
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.callbacks.base import BaseCallbackManager
from langchain.chains.llm import LLMChain
from langchain.llms.base import BaseLLM
from langchain.tools.python.tool import PythonREPLTool


def create_python_agent(
    llm: BaseLLM,
    tool: PythonREPLTool,
    callback_manager: Optional[BaseCallbackManager] = None,
    verbose: bool = False,
    prefix: str = PREFIX,
    **kwargs: Any,
) -> AgentExecutor:
    """Construct a python agent from an LLM and tool."""
    tools = [tool]
    prompt = ZeroShotAgent.create_prompt(tools, prefix=prefix)
    llm_chain = LLMChain(
        llm=llm,
        prompt=prompt,
        callback_manager=callback_manager,
    )
    tool_names = [tool.name for tool in tools]
    agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=tool_names, **kwargs)
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=verbose)
```

There are also tools for issuing bash commands, if that is useful.
'''


# Send the initial instructions and keep every reply for later parsing
outputs = []
output = convo_chain.predict(input=initial_instructions)
outputs.append(output)
print(output)


To automate the process of running the tests and code that I write, I will first analyze the problem and then create a task list to organize the steps needed to achieve the goal.

## Task List

Automate the TDD process:
- [ ] T1: Receive prompt and provide outputs (code and tests)
- [ ] T2: Parse and add outputs to Python modules
- [ ] T3: Run pytest on new tests and receive output
- [ ] T4: Use output to revise code and tests
- [ ] T5: Iterate until satisfied and request code review
- [ ] T6: Receive feedback and iterate if needed, or accept code and commit

Now, I will break down T1 into subtasks, as that is the first task to work on.

## T1 subtasks
Receive prompt and provide outputs (code and tests)
- [ ] T1.1 - Interpret prompt for tasks
- [ ] T1.2 - Create code and tests based on tasks
- [ ] T1.3 - Return the created code and tests as outputs

Next, I will start with the first subtask, interpreting the prompt for tasks.
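
Read together, tasks T1 through T6 describe a single feedback loop. A rough sketch of that loop follows, under stated assumptions: `predict`, `apply_reply`, and `run_tests` are hypothetical callables supplied by the caller (for example, `predict` could wrap `convo_chain.predict`), and `max_rounds` is an arbitrary safety limit that is not part of the original plan.

```python
def iterate_until_green(predict, apply_reply, run_tests, first_prompt, max_rounds=5):
    """Drive a prompt -> code -> test loop until the tests pass.

    predict(prompt)    -> reply text from the model (hypothetical callable)
    apply_reply(reply) -> None; parses the reply and writes code and tests
    run_tests()        -> (returncode, report); returncode 0 means success
    """
    prompt = first_prompt
    history = []  # (round_number, returncode) pairs, for later inspection
    for round_number in range(1, max_rounds + 1):
        reply = predict(prompt)
        apply_reply(reply)
        returncode, report = run_tests()
        history.append((round_number, returncode))
        if returncode == 0:
            # Tests pass: stop here and hand over to human code review
            return True, history
        # Feed the failure report back to the model as the next prompt
        prompt = "The tests failed with this output:\n" + report
    return False, history
```

The loop stops at the first passing run rather than committing anything itself, which keeps the human review step (T5 and T6) outside the automation.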


In [9]:

human_message = '''
You can check T1 off the list and proceed directly to T2, as the code I shared above is sufficient for T1 for now. 

As an aside, are you familiar with the `langchain` library?
'''
output = convo_chain.predict(input=human_message)
outputs.append(output)
print(output)

Thank you for the clarification. In that case, I will now focus on task T2. Before proceeding, it seems beneficial to become familiar with the `langchain` library.

## T2 subtasks
Parse and add outputs to Python modules
- [ ] T2.1 - Study the `langchain` library
- [ ] T2.2 - Extract code and tests from the outputs
- [ ] T2.3 - Add the extracted code and tests to Python modules
- [ ] T2.4 - Save the updated Python modules

I will start with T2.1, studying the `langchain` library. Please provide any available documentation or resources related to `langchain`.


In [10]:

human_message = '''
Here is the landing page of the `langchain` library: https://langchain.readthedocs.io/en/latest/

Welcome to LangChain
==========================

Large language models (LLMs) are emerging as a transformative technology, enabling
developers to build applications that they previously could not.
But using these LLMs in isolation is often not enough to
create a truly powerful app - the real power comes when you are able to
combine them with other sources of computation or knowledge.

This library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include:

**❓ Question Answering over specific documents**

- `Documentation <./use_cases/question_answering.html>`_
- End-to-end Example: `Question Answering over Notion Database <https://github.com/hwchase17/notion-qa>`_

**💬 Chatbots**

- `Documentation <./use_cases/chatbots.html>`_
- End-to-end Example: `Chat-LangChain <https://github.com/hwchase17/chat-langchain>`_

**🤖 Agents**

- `Documentation <./use_cases/agents.html>`_
- End-to-end Example: `GPT+WolframAlpha <https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain>`_

Getting Started
----------------

Check out the guide below for a walkthrough of how to get started using LangChain to create a Language Model application.

- `Getting Started Documentation <./getting_started/getting_started.html>`_

.. toctree::
   :maxdepth: 1
   :caption: Getting Started
   :name: getting_started
   :hidden:

   getting_started/getting_started.md

Modules
-----------

There are several main modules that LangChain provides support for.
For each module we provide some examples to get started, how-to guides, reference docs, and conceptual guides.
These modules are, in increasing order of complexity:


- `Prompts <./modules/prompts.html>`_: This includes prompt management, prompt optimization, and prompt serialization.

- `LLMs <./modules/llms.html>`_: This includes a generic interface for all LLMs, and common utilities for working with LLMs.

- `Document Loaders <./modules/document_loaders.html>`_: This includes a standard interface for loading documents, as well as specific integrations to all types of text data sources.

- `Utils <./modules/utils.html>`_: Language models are often more powerful when interacting with other sources of knowledge or computation. This can include Python REPLs, embeddings, search engines, and more. LangChain provides a large collection of common utils to use in your application.

- `Chains <./modules/chains.html>`_: Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.

- `Indexes <./modules/indexes.html>`_: Language models are often more powerful when combined with your own text data - this module covers best practices for doing exactly that.

- `Agents <./modules/agents.html>`_: Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.

- `Memory <./modules/memory.html>`_: Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.

- `Chat <./modules/chat.html>`_: Chat models are a variation on Language Models that expose a different API - rather than working with raw text, they work with messages. LangChain provides a standard interface for working with them and doing all the same things as above.


.. toctree::
   :maxdepth: 1
   :caption: Modules
   :name: modules
   :hidden:

   ./modules/prompts.md
   ./modules/llms.md
   ./modules/document_loaders.md
   ./modules/utils.md
   ./modules/indexes.md
   ./modules/chains.md
   ./modules/agents.md
   ./modules/memory.md
   ./modules/chat.md

Use Cases
----------

The above modules can be used in a variety of ways. LangChain also provides guidance and assistance in this. Below are some of the common use cases LangChain supports.

- `Agents <./use_cases/agents.html>`_: Agents are systems that use a language model to interact with other tools. These can be used to do more grounded question/answering, interact with APIs, or even take actions.

- `Chatbots <./use_cases/chatbots.html>`_: Since language models are good at producing text, that makes them ideal for creating chatbots.

- `Data Augmented Generation <./use_cases/combine_docs.html>`_: Data Augmented Generation involves specific types of chains that first interact with an external datasource to fetch data to use in the generation step. Examples of this include summarization of long pieces of text and question/answering over specific data sources.

- `Question Answering <./use_cases/question_answering.html>`_: Answering questions over specific documents, only utilizing the information in those documents to construct an answer. A type of Data Augmented Generation.

- `Summarization <./use_cases/summarization.html>`_: Summarizing longer documents into shorter, more condensed chunks of information. A type of Data Augmented Generation.

- `Evaluation <./use_cases/evaluation.html>`_: Generative models are notoriously hard to evaluate with traditional metrics. One new way of evaluating them is using language models themselves to do the evaluation. LangChain provides some prompts/chains for assisting in this.

- `Generate similar examples <./use_cases/generate_examples.html>`_: Generating similar examples to a given input. This is a common use case for many applications, and LangChain provides some prompts/chains for assisting in this.

- `Compare models <./use_cases/model_laboratory.html>`_: Experimenting with different prompts, models, and chains is a big part of developing the best possible application. The ModelLaboratory makes it easy to do so.



.. toctree::
   :maxdepth: 1
   :caption: Use Cases
   :name: use_cases
   :hidden:

   ./use_cases/agents.md
   ./use_cases/chatbots.md
   ./use_cases/generate_examples.ipynb
   ./use_cases/combine_docs.md
   ./use_cases/question_answering.md
   ./use_cases/summarization.md
   ./use_cases/evaluation.rst
   ./use_cases/model_laboratory.ipynb


Reference Docs
---------------

All of LangChain's reference documentation, in one place. Full documentation on all methods, classes, installation methods, and integration setups for LangChain.


- `Reference Documentation <./reference.html>`_

.. toctree::
   :maxdepth: 1
   :caption: Reference
   :name: reference
   :hidden:

   ./reference/installation.md
   ./reference/integrations.md
   ./reference.rst


'''
output = convo_chain.predict(input=human_message)
outputs.append(output)
print(output)

Thank you for providing the `langchain` library documentation. After reviewing the documentation, I have a better understanding of the library and its modules.

Now, I'll proceed to T2.2: Extract code and tests from the outputs.

To extract the code and tests that I provide in my outputs, you can use Python string manipulation, regular expressions, or any parsing technique that suits the format of the outputs.
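
As an illustration of the plain string-manipulation route, here is a minimal line-based sketch. It assumes fences always start a line, and it also reports whether the text ends inside an unclosed block, which matters when a reply is cut off mid-answer. The name `split_fenced_blocks` is hypothetical.

```python
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence

def split_fenced_blocks(text):
    """Return (blocks, unclosed).

    blocks   -- list of (language, code) pairs for every closed block
    unclosed -- True if the text ends while a block is still open
    """
    blocks = []
    current = None  # lines of the block being read, or None outside a block
    language = ""
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            if current is None:
                # Opening fence: remember the optional language tag
                language = line.strip()[len(FENCE):].strip()
                current = []
            else:
                # Closing fence: store the finished block
                blocks.append((language, "\n".join(current)))
                current = None
        elif current is not None:
            current.append(line)
    return blocks, current is not None
```

A caller could treat `unclosed == True` as a signal to ask the model to continue before saving anything.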

For T2.3 and T2.4, I have an idea; the `create_python_agent` function from the `langchain` library can be used to construct a Python agent that takes my code and tests as input, adds them to Python modules, and executes other tasks.

Now, let's update the task list:

## Task List

Automate the TDD process:
- [x] T1: Receive prompt and provide outputs (code and tests)
- [ ] T2: Parse and add outputs to Python modules
- [ ] T3: Run pytest on new tests and receive output
- [ ] T4: Use output to revise code and tests
- [ ] T5: Iterate until satisfied and request code review
- [ ] T6

In [11]:

human_message = '''
Looks good. Please proceed!
'''
output = convo_chain.predict(input=human_message)
outputs.append(output)
print(output)

Great, now I will proceed with subtasks T2.2, T2.3, and T2.4.

To help you with T2.2, we can create a function that can parse the code and tests from my outputs. Since my outputs are mostly in Markdown format, we can start by extracting the code blocks within triple backticks (```). Later, we can further refine the output parsing, as required.

Please find below the code for the parsing function to be saved in a module named `output_parser.py`:

```python
import re

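def extract_code_blocks(output):
    # Match fenced blocks, skipping an optional language tag (e.g. "python")
    # after the opening fence so that only the code itself is captured.
    pattern = r'```[\w+-]*[ \t]*\n(.*?)```'
    code_blocks = re.findall(pattern, output, flags=re.DOTALL)
    return code_blocks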
```

You can use the `extract_code_blocks` function to parse my outputs and obtain the code and tests. This should fulfill subtask T2.2.
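
Because the system prompt asks for a module name with every code block, the parser may also need to pair each block with its target file. A possible sketch, assuming the reply names the file in backticks (such as `output_parser.py`) somewhere in the prose before the fence; that convention and the name `extract_named_blocks` are assumptions, not something the replies guarantee.

```python
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence

# A fenced block: optional language tag, then a non-greedy body.
BLOCK_RE = re.compile(FENCE + r"(\w*)[ \t]*\n(.*?)" + FENCE, flags=re.DOTALL)
# A backticked Python file name, e.g. `output_parser.py` (assumed convention).
MODULE_RE = re.compile(r"`([\w./-]+\.py)`")

def extract_named_blocks(output):
    """Return (module_name, language, code) for each fenced block in output.

    module_name is the last backticked .py name seen in the prose since the
    previous block, or None if the prose names no file.
    """
    results = []
    prev_end = 0
    for match in BLOCK_RE.finditer(output):
        preceding = output[prev_end:match.start()]
        names = MODULE_RE.findall(preceding)
        module_name = names[-1] if names else None
        results.append((module_name, match.group(1), match.group(2)))
        prev_end = match.end()
    return results
```

Blocks that come back with `module_name` set to `None` are better queued for a follow-up question than written to a guessed location.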

Next, let's move to subtasks T2.3 and T2.4: Adding the extracted code and tests to Python modules, and saving the updated modules. To do so, you can use Python's built-in `importlib` module to dynamically import and update the modules with the provided code and test

In [12]:
human_message = '''
continue
'''
output = convo_chain.predict(input=human_message)
outputs.append(output)
print(output)

you can utilize the following function to dynamically add the code to the desired module. Please save this function in a module named `dynamic_module_handler.py`.

```python
import importlib
import sys
import types

def add_code_to_module(module_name, code):
    if module_name not in sys.modules:
        # types.ModuleType builds an empty in-memory module
        module = types.ModuleType(module_name)
        sys.modules[module_name] = module
    else:
        module = importlib.import_module(module_name)

    # Compile the provided code and run it in the module's namespace
    compiled_code = compile(code, f"<{module_name}>", "exec")
    exec(compiled_code, module.__dict__)

    # No importlib.reload() here: it fails for in-memory modules and would
    # discard the definitions just added to a file-backed one
    return module

You can call the `add_code_to_module` function with the module's name (e.g., 'my_module') and the code block as arguments. The function will create (if needed) and update the specified Python module with the provided code. This should fulfill subtasks T2.3 and T2.4.
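
Note that `add_code_to_module` only changes the module held in memory, whereas running pytest in a separate process needs the code on disk. A minimal sketch of the saving step, assuming one file per module under a caller-chosen directory; `save_code_to_file` and its `root` parameter are hypothetical names.

```python
from pathlib import Path

def save_code_to_file(module_name, code, root="."):
    """Write code to <root>/<module_name>.py, replacing any existing file."""
    path = Path(root) / f"{module_name}.py"
    # Create the target directory if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(code, encoding="utf-8")
    return path
```

Overwriting instead of appending keeps each round reproducible; whether that is the right policy depends on how much existing code the module already holds.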

Now that we've completed Task T2, let's update the task list:

## Task List

Autom

In [13]:
human_message = '''
continue
'''
output = convo_chain.predict(input=human_message)
outputs.append(output)
print(output)

T2: Parse and add outputs to Python modules
- [ ] T3: Run pytest on new tests and receive output
- [ ] T4: Use output to revise code and tests
- [ ] T5: Iterate until satisfied and request code review
- [ ] T6: Receive feedback and iterate if needed, or accept code and commit

Next, I will focus on T3: Running pytest on the new tests and receiving the output. To do this, you can utilize the `subprocess` module to call pytest and capture the output.

Here's a function you can use to run pytest on a specific module and return the captured output. Save this function in a module named `pytest_runner.py`.

```python
import subprocess

def run_pytest(module_name):
    result = subprocess.run(['pytest', f'{module_name}.py'], capture_output=True, text=True)
    return result.stdout
```

By calling the `run_pytest` function with the relevant module's name, you can test the added code and obtain the output, fulfilling Task T3. 
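
For the revision step it may help to return more than standard output. The variant below is a sketch: it assumes pytest is installed in the same interpreter, launches it with `sys.executable -m pytest`, and returns the exit code together with stdout and stderr combined. The names `run_command` and `run_pytest_with_status` are hypothetical.

```python
import subprocess
import sys

def run_command(cmd):
    """Run cmd (a list of strings); return (returncode, stdout + stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

def run_pytest_with_status(test_path, extra_args=()):
    """Run pytest on test_path using the current Python interpreter."""
    # sys.executable avoids picking up a pytest from a different environment
    cmd = [sys.executable, "-m", "pytest", test_path, *extra_args]
    return run_command(cmd)
```

pytest exits with code 0 when every collected test passes, so the return code gives a simple pass/fail signal without parsing the report text.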

Once the output is captured, proceed to Task T4: use the output to 