This repository contains the public code for the paper “Agentic Tooling with Model Context Protocol Outperforms RAG and Long-Context Windows for Repository-Level Coding”. To support reproducibility, this public codebase mirrors our data preparation and tool exposure pipeline, which includes Abstract Syntax Tree (AST) parsing, structured JSON references, and MCP tool definitions. The ASK-compliant MCP tools provided here let a language model answer questions about a publicly available open-source repository rather than the internally developed repository used in the paper.
The scikit-learn repository is required as input. For convenience, the `scikit-learn/` repo is referenced as a git submodule. To fetch its contents after cloning this repository, run `git submodule update --init scikit-learn`. Note that the repository is referred to as sklearn throughout our code. The other files in this repository are described below:
- `reference_json_scraping_config.yaml`: Configuration used to parse the sklearn repository; it must adhere to the defined structure. Options include (1) `verbose` (whether to print updates while running), (2) `skip_private_functions` (whether to skip private functions when building the database), (3) `include_classes` (whether to include class information in the database), and (4) which docstring sections to skip, among others.
- `generate_reference_json.py`: Processes the raw sklearn repository and outputs the reference database used for MCP tooling; a minimal sketch of this pipeline appears after this list. Run via:
  `python generate_reference_json.py --config path_to_config_file`
- `sklearn_function_reference.json`: Output of `generate_reference_json.py` using the default configuration in `reference_json_scraping_config.yaml`. This JSON file serves as the reference database that an LLM can query through tool calls.
- `ask_AI_config.yaml`: Configuration used to ask an AI model a query with the MCP tools.
- `ask_AI.py`: Script that asks an OpenAI model a query while providing MCP tool descriptions; a sketch of this tool-calling flow appears after this list. The default tools are defined in `mcp_tools_sklearn.py`, and the parsed tool options are listed in `ask_AI_config.yaml`. The tools let the LLM quickly search and retrieve documentation about the sklearn repository without having to parse source code. To use this script, obtain an OpenAI API key and list it in the configuration file. Customizable options include the user query, the name of the config file, and an optional argument to save the AI model's response to a file. Run via:
  `python ask_AI.py --config_name ask_AI_config.yaml --query "User's sklearn query to ask model"`
  To change optional arguments, follow this template:
  `python ask_AI.py --config_name config_file_name --query "User's sklearn query to ask model" --output_file name_of_output_file`
- `mcp_tools_sklearn.py`: Implements two ASK-compliant tools similar to those exposed to the LLMs in our study; an illustrative tool definition appears in the second sketch after this list.
- `sklearn_meta_prompt.py`: Example of a system prompt used when running the experiments described in the paper. This prompt is prepended to each user query when running `ask_AI.py`.
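
For orientation, here is a minimal sketch of the kind of pipeline `generate_reference_json.py` implements: read the YAML configuration, walk the sklearn sources with Python's `ast` module, and write one JSON record per function. The configuration keys follow the list above, but the record fields (`module`, `qualified_name`, `signature`, `docstring`) are illustrative assumptions rather than the repository's actual schema.

```python
# Illustrative sketch only -- the real generate_reference_json.py may differ.
import ast
import json
from pathlib import Path

import yaml  # pip install pyyaml


def load_config(path: str) -> dict:
    """Read the scraping configuration (keys assumed from the list above)."""
    with open(path) as f:
        return yaml.safe_load(f)


def extract_functions(py_file: Path, skip_private: bool) -> list[dict]:
    """Parse one file with the ast module and collect function metadata."""
    tree = ast.parse(py_file.read_text(encoding="utf-8"))
    records = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            if skip_private and node.name.startswith("_"):
                continue
            records.append(
                {
                    # Field names are illustrative, not the repo's actual schema.
                    "module": str(py_file),
                    "qualified_name": node.name,
                    "signature": ast.unparse(node.args),
                    "docstring": ast.get_docstring(node) or "",
                }
            )
    return records


if __name__ == "__main__":
    config = load_config("reference_json_scraping_config.yaml")
    all_records = []
    for py_file in Path("scikit-learn/sklearn").rglob("*.py"):
        all_records.extend(
            extract_functions(py_file, config.get("skip_private_functions", True))
        )
        if config.get("verbose", False):
            print(f"parsed {py_file}")
    with open("sklearn_function_reference.json", "w") as f:
        json.dump(all_records, f, indent=2)
```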
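
Similarly, the sketch below shows how a documentation-lookup tool could be described to an OpenAI model and invoked from a script like `ask_AI.py`, using the Chat Completions function-calling format. The tool name `lookup_function_doc`, the record fields, and the model name are hypothetical placeholders; the actual tool names, schemas, and request flow are defined in `mcp_tools_sklearn.py` and `ask_AI.py`.

```python
# Illustrative sketch only -- the tool name, schema, and request flow are
# placeholders, not the definitions shipped in mcp_tools_sklearn.py / ask_AI.py.
import json

from openai import OpenAI


def lookup_function_doc(function_name: str,
                        db_path: str = "sklearn_function_reference.json") -> str:
    """Return the stored docstring(s) for a sklearn function from the reference database."""
    with open(db_path) as f:
        records = json.load(f)
    matches = [r for r in records if r.get("qualified_name") == function_name]
    if not matches:
        return f"No entry found for '{function_name}'."
    return "\n\n".join(r.get("docstring", "") for r in matches)


# Tool description in the OpenAI function-calling format.
LOOKUP_TOOL = {
    "type": "function",
    "function": {
        "name": "lookup_function_doc",
        "description": "Retrieve the docstring of a sklearn function from the reference database.",
        "parameters": {
            "type": "object",
            "properties": {
                "function_name": {
                    "type": "string",
                    "description": "Name of the sklearn function to look up.",
                },
            },
            "required": ["function_name"],
        },
    },
}

if __name__ == "__main__":
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    messages = [
        {"role": "system", "content": "You answer questions about the sklearn repository."},
        {"role": "user", "content": "What does sklearn's train_test_split return?"},
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
        tools=[LOOKUP_TOOL],
    )
    reply = response.choices[0].message
    if reply.tool_calls:
        # Execute the requested tool and send its output back to the model.
        call = reply.tool_calls[0]
        args = json.loads(call.function.arguments)
        messages.append(reply)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": lookup_function_doc(**args),
        })
        final = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=[LOOKUP_TOOL]
        )
        print(final.choices[0].message.content)
    else:
        print(reply.content)
```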