# Solve SWE-bench issues

This notebook demonstrates how to solve issues in the [SWE-bench dataset](https://www.swebench.com/) using Moatless tools. The philosophy behind Moatless tools is that, at the current level of LLMs, reasoning ability matters less than giving the LLM the correct context in prompts and handling its responses appropriately. Both can be addressed by building good tooling for an agent to use.

## SWE-bench
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub. Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.

To run a specific SWE-bench instance, set the `instance_id` parameter. The dataset is then downloaded and the instance is loaded.

In [1]:
from benchmark import swebench

data_dir = None # Set this to store the dataset to disk

instance_id = 'pytest-dev__pytest-5808'
instance_data = swebench.get_instance(instance_id, dataset_name='princeton-nlp/SWE-bench', split='test', data_dir=data_dir)

swebench.display_problem_statement(instance_data)

### Problem statement
Lexer "python3" in --pastebin feature causes HTTP errors
The `--pastebin` option currently submits the output of `pytest` to `bpaste.net` using `lexer=python3`: https://github.com/pytest-dev/pytest/blob/d47b9d04d4cf824150caef46c9c888779c1b3f58/src/_pytest/pastebin.py#L68-L73

For some `contents`, this will raise a "HTTP Error 400: Bad Request".

As an example:
~~~
>>> from urllib.request import urlopen
>>> with open("data.txt", "rb") as in_fh:
...     data = in_fh.read()
>>> url = "https://bpaste.net"
>>> urlopen(url, data=data)
HTTPError: Bad Request
~~~
with the attached [data.txt](https://github.com/pytest-dev/pytest/files/3561212/data.txt).

This is the underlying cause for the problems mentioned in #5764.

The call goes through fine if `lexer` is changed from `python3` to `text`. This would seem like the right thing to do in any case: the console output of a `pytest` run that is being uploaded is not Python code, but arbitrary text.



### Expected patch
```diff
diff --git a/src/_pytest/pastebin.py b/src/_pytest/pastebin.py
--- a/src/_pytest/pastebin.py
+++ b/src/_pytest/pastebin.py
@@ -65,7 +65,7 @@ def create_new_paste(contents):
     from urllib.request import urlopen
     from urllib.parse import urlencode
 
-    params = {"code": contents, "lexer": "python3", "expiry": "1week"}
+    params = {"code": contents, "lexer": "text", "expiry": "1week"}
     url = "https://bpaste.net"
     try:
         response = (

```


## Index the code base
The code is cloned from GitHub and then indexed in a vector store, which enables semantic search over the codebase. This serves as the entry point: an initial, broad search that identifies potentially relevant files.

In [2]:
from moatless.utils.repo import setup_github_repo

path = setup_github_repo(repo=instance_data['repo'], base_commit=instance_data['base_commit'], base_dir='/tmp/repos')

Repo 'git@github.com:pytest-dev/pytest.git' already exists in '/tmp/repos/pytest-dev_pytest'
HEAD is now at 404cf0c87 Merge pull request #5764 from goerz/pastebin
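
The cloning step can be pictured roughly as follows. This is a minimal sketch, not the actual `setup_github_repo` implementation: the helper names `repo_dir`, `checkout_commands` and `setup_repo` are hypothetical, and the SSH URL and directory naming are only inferred from the log output above.

```python
import subprocess
from pathlib import Path


def repo_dir(repo: str, base_dir: str) -> Path:
    """Map 'owner/name' to '<base_dir>/owner_name' (naming inferred from the log above)."""
    return Path(base_dir) / repo.replace("/", "_")


def checkout_commands(repo: str, base_commit: str, base_dir: str) -> list[list[str]]:
    """Return the git commands needed to get `repo` at `base_commit`."""
    target = repo_dir(repo, base_dir)
    commands = []
    if not target.exists():
        # Assumption: clone over SSH, as the log output suggests.
        commands.append(["git", "clone", f"git@github.com:{repo}.git", str(target)])
    # Check out the exact commit the SWE-bench instance is based on.
    commands.append(["git", "-C", str(target), "checkout", base_commit])
    return commands


def setup_repo(repo: str, base_commit: str, base_dir: str) -> str:
    """Run the commands and return the local path of the checkout."""
    for command in checkout_commands(repo, base_commit, base_dir):
        subprocess.run(command, check=True)
    return str(repo_dir(repo, base_dir))
```

Keeping the command construction separate from `subprocess.run` makes it possible to inspect what would be executed without touching the network.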


### Index the code base
Indexing is done using LlamaIndex. During indexing, the code is divided into chunks by a special code splitter, `EpicSplitter`, which segments code files at the function and class level while retaining contextual information from parent classes and functions.
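
To illustrate what splitting at the function and class level means, here is a minimal sketch built on Python's standard `ast` module. It is not how `EpicSplitter` is implemented; the `Chunk` class and `split_by_definition` function are hypothetical names, and the sketch ignores chunk-size limits and comment handling.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str        # dotted path of the definition, e.g. "Greeter.greet"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str        # source of the chunk, prefixed with its parent context


def split_by_definition(source: str) -> list[Chunk]:
    """Emit one chunk per top-level function and per method of a top-level class."""
    lines = source.splitlines()
    chunks: list[Chunk] = []

    def visit(node: ast.AST, parents: list[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                # Recurse so that each method becomes its own chunk.
                visit(child, parents + [child.name])
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                path = ".".join(parents + [child.name])
                body = "\n".join(lines[child.lineno - 1 : child.end_lineno])
                # Keep the parent path in the text so the context is not lost.
                text = f"# context: {path}\n{body}"
                chunks.append(Chunk(path, child.lineno, child.end_lineno, text))

    visit(ast.parse(source), [])
    return chunks


example = '''
class Greeter:
    def greet(self, name):
        return "Hello, " + name

def main():
    print(Greeter().greet("world"))
'''

for chunk in split_by_definition(example):
    print(chunk.path, chunk.start_line, chunk.end_line)
```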

The vector store is currently a custom `SimpleFaissVectorStore`, a hybrid of Faiss and the `SimpleVectorStore` in LlamaIndex. It can easily be replaced with other implementations available in LlamaIndex.

The pipeline is currently configured to use OpenAI's embedding model `text-embedding-3-small`, so the environment variable `OPENAI_API_KEY` must be set.

In [3]:
from moatless.ingestion import CodeBaseIngestionPipeline
from moatless.splitters.epic_split import CommentStrategy, EpicSplitter
from moatless.store.simple_faiss import SimpleFaissVectorStore

from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.openai import OpenAIEmbedding
from dotenv import load_dotenv

load_dotenv()  # Load OPENAI_API_KEY and other settings from a local .env file, if present

persist_dir = None  # Set this to persist and load data from disk

vector_store = None
docstore = None

if persist_dir:
    try:
        vector_store = SimpleFaissVectorStore.from_persist_dir(persist_dir=persist_dir)
        docstore = SimpleDocumentStore.from_persist_dir(persist_dir=persist_dir)
    except Exception as e:
        print(f"Failed to load stores from {persist_dir}: {e}") 

if not vector_store:
    vector_store = SimpleFaissVectorStore.from_defaults()

if not docstore:
    docstore = SimpleDocumentStore()

embedding = OpenAIEmbedding(model="text-embedding-3-small")

ingestion_pipeline = CodeBaseIngestionPipeline(
    path=path,
    vector_store=vector_store,
    docstore=docstore,
    splitter=EpicSplitter(min_chunk_size=100, chunk_size=750, language="python", comment_strategy=CommentStrategy.ASSOCIATE),
    embed_model=embedding
)
vectors, tokens = ingestion_pipeline.run()
print(f"Indexed {vectors} vectors with {tokens} tokens")

if persist_dir:
    ingestion_pipeline.persist(persist_dir=persist_dir)

Indexed 881 vectors with 235057 tokens
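
Once the chunks are embedded, a semantic search amounts to finding the stored vectors closest to the query vector. The following is a minimal, pure-Python sketch of that idea using cosine similarity, which is one common choice; the metric actually used by `SimpleFaissVectorStore` is not stated here. The function names and the toy three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k indexed vectors most similar to the query."""
    scored = [(cosine_similarity(query, vector), chunk_id) for chunk_id, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]


# Toy index: real embeddings have far more dimensions.
toy_index = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.1, 0.9, 0.2],
    "chunk_c": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_index, k=2))
```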


## Retrieve files
Code snippets are retrieved by ranking them by semantic similarity to the SWE-bench problem statement. Retrieval continues until the context reaches 200k tokens.

In [4]:
from moatless.retriever import CodeSnippetRetriever

retriever = CodeSnippetRetriever(
    vector_store=vector_store,
    docstore=docstore,
    embed_model=embedding,
    max_context_tokens=200000
)

code_snippets = retriever.retrieve(query=instance_data['problem_statement'])

swebench.display_code_snippets(instance_data, code_snippets)

| | File | Block | Start line | End line | Context length |
|---|---|---|---|---|---|
| 0 | src/_pytest/pastebin.py | 1 | 1 | 38 | 268 |
| 1 | src/_pytest/pastebin.py | pytest_terminal_summary | 83 | 104 | 449 |
| 2 | src/_pytest/pastebin.py | create_new_paste | 57 | 80 | 660 |
| 3 | src/_pytest/pastebin.py | pytest_unconfigure | 41 | 54 | 805 |
| 4 | src/pytest.py | 1 | 1 | 107 | 1513 |
| 5 | src/_pytest/faulthandler.py | io | 1 | 36 | 1762 |
| 6 | src/_pytest/pytester.py | pytest_addoption | 32 | 54 | 1906 |
| 7 | src/_pytest/python.py | pytest_addoption | 56 | 108 | 2260 |
| 8 | src/_pytest/skipping.py | pytest_configure | 28 | 65 | 2623 |
| 9 | src/_pytest/python.py | pytest_cmdline_main | 111 | 129 | 2799 |
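
The `Context length` column above grows with every row, which suggests it is a running total of tokens. Under that assumption, filling the context up to a token budget could look like the sketch below. The `Snippet` class and `take_until_budget` function are hypothetical and not part of `CodeSnippetRetriever`; the token counts are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    file_path: str
    block: str
    tokens: int


def take_until_budget(ranked_snippets: list[Snippet], max_context_tokens: int) -> list[tuple[Snippet, int]]:
    """Keep snippets in rank order while the running total stays within the budget.

    Returns (snippet, running_total) pairs.
    """
    selected = []
    total = 0
    for snippet in ranked_snippets:
        if total + snippet.tokens > max_context_tokens:
            break  # the next snippet would exceed the budget
        total += snippet.tokens
        selected.append((snippet, total))
    return selected


ranked = [
    Snippet("src/_pytest/pastebin.py", "1", 268),
    Snippet("src/_pytest/pastebin.py", "pytest_terminal_summary", 181),
    Snippet("src/_pytest/pastebin.py", "create_new_paste", 211),
]
for snippet, running_total in take_until_budget(ranked, max_context_tokens=500):
    print(snippet.block, running_total)
```

Stopping at the first snippet that does not fit is the simplest policy; skipping it and trying smaller, lower-ranked snippets would be an alternative.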


## Select relevant files
The retrieved code snippets are fed into an LLM, which selects the files that are most relevant based on the problem statement.

This step uses Claude 3 Haiku, so an Anthropic API key must be set in the environment variable `ANTHROPIC_API_KEY`.

In [10]:
from moatless.selector import FileSelector

file_selector = FileSelector(repo_path=path, model_name='claude-3-haiku-20240307', file_context_token_limit=50000)

selector_response = file_selector.select_files(query=instance_data['problem_statement'], code_snippets=code_snippets)

swebench.display_files(instance_data, selector_response)

| | File | Context length |
|---|---|---|
| 0 | src/_pytest/pastebin.py | 806 |
| 1 | src/_pytest/config/__init__.py | 3517 |


## Create development plan
Before any code is written, the work is broken down into smaller tasks so that parts of the codebase can be developed separately. Each task names the "code blocks" that need to be updated, where a code block can be, for example, a class or a function.

In [11]:
from moatless.planner import Planner

planner = Planner(repo_path=path, model_name='gpt-4-0125-preview')

file_paths = [file.file_path for file in selector_response.files]
response = planner.plan_development(instructions=instance_data['problem_statement'], files=file_paths)

swebench.display_plan(response)

### Thoughts

> To address the issue with the `--pastebin` option causing HTTP errors due to the `lexer=python3` parameter, we need to update the `create_new_paste` function in `src/_pytest/pastebin.py` to use `lexer=text` instead. This change will ensure that the console output of a `pytest` run, which is arbitrary text and not Python code, is correctly submitted to the pastebin service without causing HTTP 400 errors.

## Planned tasks
### Update src/_pytest/pastebin.py
        
Blocks to update: 

 * `create_new_paste`

#### Instructions
Change the value of the 'lexer' key in the 'params' dictionary from 'python3' to 'text'. This modification ensures that the pastebin service correctly interprets the submitted content as text, avoiding HTTP 400 errors.
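
A plan like the one above can be thought of as a small list of records. The sketch below models it with dataclasses. The attribute names `tasks`, `file_path`, `instructions` and `block_paths` are the ones used in this notebook; the class names, the `thoughts` field and the exact types are assumptions rather than the real `Planner` response model.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    file_path: str
    instructions: str
    # Assumption: each block is identified by a simple name.
    block_paths: list[str] = field(default_factory=list)


@dataclass
class Plan:
    thoughts: str
    tasks: list[Task] = field(default_factory=list)


plan = Plan(
    thoughts="Use lexer=text instead of lexer=python3 in create_new_paste.",
    tasks=[
        Task(
            file_path="src/_pytest/pastebin.py",
            instructions="Change the value of the 'lexer' key in the 'params' "
                         "dictionary from 'python3' to 'text'.",
            block_paths=["create_new_paste"],
        )
    ],
)

for task in plan.tasks:
    print(f"{task.file_path}: update {', '.join(task.block_paths)}")
```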


## Write code
Finally, the code is written, one task at a time. No diff tool is used; instead, only small parts of the codebase are expected to change, and the LLM returns the complete code for each of those parts.

In [12]:
from moatless.coder import Coder

coder = Coder(repo_path=path, model_name='gpt-4-0125-preview')

for task in response.tasks:
    # Use a separate name so the plan stored in `response` is not overwritten.
    code_response = coder.write_code(main_objective=instance_data['problem_statement'],
                                     instructions=task.instructions,
                                     file_path=task.file_path,
                                     block_paths=task.block_paths)

    swebench.display_code_response(code_response)

### Updated src/_pytest/pastebin.py
```diff
--- /tmp/repos/pytest-dev_pytest/src/_pytest/pastebin.py
+++ /tmp/repos/pytest-dev_pytest/src/_pytest/pastebin.py
@@ -65,7 +65,7 @@
     from urllib.request import urlopen
     from urllib.parse import urlencode
 
-    params = {"code": contents, "lexer": "python3", "expiry": "1week"}
+    params = {"code": contents, "lexer": "text", "expiry": "1week"}
     url = "https://bpaste.net"
     try:
         response = (

```
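
The approach described in this section, where the LLM returns a whole block and the tool swaps it into the file, can be sketched with the standard library as follows. This is only an illustration under simplifying assumptions (top-level functions only, no decorators); `replace_function` is a hypothetical helper, not the `Coder` implementation. A diff for display is derived afterwards with `difflib`.

```python
import ast
import difflib


def replace_function(source: str, name: str, new_code: str) -> str:
    """Replace the top-level function `name` in `source` with `new_code`."""
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            lines = source.splitlines()
            start = node.lineno - 1  # 0-based index of the `def` line
            end = node.end_lineno    # exclusive end index of the function body
            updated = lines[:start] + new_code.rstrip("\n").splitlines() + lines[end:]
            return "\n".join(updated) + "\n"
    raise ValueError(f"function {name!r} not found")


original = 'def lexer():\n    return "python3"\n\n\ndef other():\n    return 1\n'
new_block = 'def lexer():\n    return "text"\n'

updated = replace_function(original, "lexer", new_block)

# The diff is computed after the fact, only for display.
diff = difflib.unified_diff(
    original.splitlines(keepends=True),
    updated.splitlines(keepends=True),
    fromfile="a/example.py",
    tofile="b/example.py",
)
print("".join(diff))
```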


## Verify the change
Verify that the generated code is as expected by comparing it with the reference patch in SWE-bench.

In [13]:
swebench.display_patches(data=instance_data, path=path)

### Expected patch
```diff
diff --git a/src/_pytest/pastebin.py b/src/_pytest/pastebin.py
--- a/src/_pytest/pastebin.py
+++ b/src/_pytest/pastebin.py
@@ -65,7 +65,7 @@ def create_new_paste(contents):
     from urllib.request import urlopen
     from urllib.parse import urlencode
 
-    params = {"code": contents, "lexer": "python3", "expiry": "1week"}
+    params = {"code": contents, "lexer": "text", "expiry": "1week"}
     url = "https://bpaste.net"
     try:
         response = (

```

### Generated patch


```diff
diff --git a/src/_pytest/pastebin.py b/src/_pytest/pastebin.py
index 38ff97f2d..77b4e2621 100644
--- a/src/_pytest/pastebin.py
+++ b/src/_pytest/pastebin.py
@@ -65,7 +65,7 @@ def create_new_paste(contents):
     from urllib.request import urlopen
     from urllib.parse import urlencode
 
-    params = {"code": contents, "lexer": "python3", "expiry": "1week"}
+    params = {"code": contents, "lexer": "text", "expiry": "1week"}
     url = "https://bpaste.net"
     try:
         response = (

```
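
The two patches above differ only in the `index` header line. A rough textual check that ignores such headers could look like the sketch below. It is a heuristic written for illustration: `changed_lines` and `same_change` are hypothetical helpers, not part of the `swebench` module used in this notebook, and the check does not compare file names or run any tests.

```python
def changed_lines(patch: str) -> list[str]:
    """Return only the added and removed lines of a unified diff.

    Caveat: a removed line whose content itself starts with "--" looks like a
    file header and is skipped by this simple filter.
    """
    result = []
    for line in patch.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")):
            result.append(line)
    return result


def same_change(expected: str, generated: str) -> bool:
    """True if both patches add and remove exactly the same lines, in order."""
    return changed_lines(expected) == changed_lines(generated)


expected_patch = (
    "diff --git a/f.py b/f.py\n"
    "--- a/f.py\n"
    "+++ b/f.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
)
generated_patch = (
    "diff --git a/f.py b/f.py\n"
    "index 1111111..2222222 100644\n"
    "--- a/f.py\n"
    "+++ b/f.py\n"
    "@@ -1 +1 @@\n"
    "-x = 1\n"
    "+x = 2\n"
)
print(same_change(expected_patch, generated_patch))
```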
