This repository contains the public code for the paper “Agentic Tooling with Model Context Protocol Outperforms RAG and Long-Context Windows for Repository-Level Coding”. To support reproducibility, this public codebase mirrors our data preparation and tool exposure pipeline, which includes Abstract Syntax Tree (AST) parsing, structured JSON references, and MCP tool definitions. The ASK-compliant MCP tools provided here let a language model answer questions about a publicly available open-source repository rather than the internally developed repository used in the paper.
The scikit-learn repository is required as input. For convenience, the `scikit-learn/` repo is referenced as a git submodule. To fetch its contents after cloning this repository, run `git submodule update --init scikit-learn`. Note that the repository is referred to as sklearn throughout our code. The other files in this repository are described below:
- `reference_json_scraping_config.yaml`: Configuration used to parse the sklearn repository; it must adhere to the defined structure. Options include (1) `verbose` (whether to print updates while running), (2) `skip_private_functions` (whether to skip private functions when building the database), (3) `include_classes` (whether to include class information in the database), and (4) which docstring sections to skip, among others.
- `generate_reference_json.py`: Processes the raw sklearn repository and outputs the reference database used for MCP tooling; a minimal sketch of this pipeline appears after this list. Run via:
  `python generate_reference_json.py --config path_to_config_file`
- `sklearn_function_reference.json`: Output of `generate_reference_json.py` using the default configuration in `reference_json_scraping_config.yaml`. This JSON file serves as the reference database that an LLM can query through tool calls.
- `ask_AI_config.yaml`: Configuration used to ask an AI model a query with the MCP tools.
- `ask_AI.py`: Script that asks an OpenAI model a query while providing MCP tool descriptions; a sketch of this tool-calling flow appears after this list. The default tools are defined in `mcp_tools_sklearn.py`, and the parsed tool options are listed in `ask_AI_config.yaml`. The tools let the LLM quickly search and retrieve documentation about the sklearn repository without having to parse source code. To use this script, obtain an OpenAI API key and list it in the configuration file. Customizable options include the user query, the name of the config file, and an optional argument to save the AI model's response to a file. Run via:
  `python ask_AI.py --config_name ask_AI_config.yaml --query "User's sklearn query to ask model"`
  To change optional arguments, follow this template:
  `python ask_AI.py --config_name config_file_name --query "User's sklearn query to ask model" --output_file name_of_output_file`
- `mcp_tools_sklearn.py`: Implements two ASK-compliant tools similar to those exposed to the LLMs in our study; an illustrative tool definition appears in the second sketch after this list.
- `sklearn_meta_prompt.py`: Example of a system prompt used when running the experiments described in the paper. This prompt is prepended to each user query when running `ask_AI.py`.
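
For orientation, here is a minimal sketch of the kind of pipeline `generate_reference_json.py` implements: read the YAML configuration, walk the sklearn sources with Python's `ast` module, and write one JSON record per function. The configuration keys follow the list above, but the record fields (`module`, `qualified_name`, `signature`, `docstring`) are illustrative assumptions rather than the repository's actual schema.

```python
# Illustrative sketch only -- the real generate_reference_json.py may differ.
import ast
import json
from pathlib import Path

import yaml  # pip install pyyaml


def load_config(path: str) -> dict:
    """Read the scraping configuration (keys assumed from the list above)."""
    with open(path) as f:
        return yaml.safe_load(f)


def extract_functions(py_file: Path, skip_private: bool) -> list[dict]:
    """Parse one file with the ast module and collect function metadata."""
    tree = ast.parse(py_file.read_text(encoding="utf-8"))
    records = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            if skip_private and node.name.startswith("_"):
                continue
            records.append(
                {
                    # Field names are illustrative, not the repo's actual schema.
                    "module": str(py_file),
                    "qualified_name": node.name,
                    "signature": ast.unparse(node.args),
                    "docstring": ast.get_docstring(node) or "",
                }
            )
    return records


if __name__ == "__main__":
    config = load_config("reference_json_scraping_config.yaml")
    all_records = []
    for py_file in Path("scikit-learn/sklearn").rglob("*.py"):
        all_records.extend(
            extract_functions(py_file, config.get("skip_private_functions", True))
        )
        if config.get("verbose", False):
            print(f"parsed {py_file}")
    with open("sklearn_function_reference.json", "w") as f:
        json.dump(all_records, f, indent=2)
```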
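
Similarly, the sketch below shows how a documentation-lookup tool could be described to an OpenAI model and invoked from a script like `ask_AI.py`, using the Chat Completions function-calling format. The tool name `lookup_function_doc`, the record fields, and the model name are hypothetical placeholders; the actual tool names, schemas, and request flow are defined in `mcp_tools_sklearn.py` and `ask_AI.py`.

```python
# Illustrative sketch only -- the tool name, schema, and request flow are
# placeholders, not the definitions shipped in mcp_tools_sklearn.py / ask_AI.py.
import json

from openai import OpenAI


def lookup_function_doc(function_name: str,
                        db_path: str = "sklearn_function_reference.json") -> str:
    """Return the stored docstring(s) for a sklearn function from the reference database."""
    with open(db_path) as f:
        records = json.load(f)
    matches = [r for r in records if r.get("qualified_name") == function_name]
    if not matches:
        return f"No entry found for '{function_name}'."
    return "\n\n".join(r.get("docstring", "") for r in matches)


# Tool description in the OpenAI function-calling format.
LOOKUP_TOOL = {
    "type": "function",
    "function": {
        "name": "lookup_function_doc",
        "description": "Retrieve the docstring of a sklearn function from the reference database.",
        "parameters": {
            "type": "object",
            "properties": {
                "function_name": {
                    "type": "string",
                    "description": "Name of the sklearn function to look up.",
                },
            },
            "required": ["function_name"],
        },
    },
}

if __name__ == "__main__":
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    messages = [
        {"role": "system", "content": "You answer questions about the sklearn repository."},
        {"role": "user", "content": "What does sklearn's train_test_split return?"},
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
        tools=[LOOKUP_TOOL],
    )
    reply = response.choices[0].message
    if reply.tool_calls:
        # Execute the requested tool and send its output back to the model.
        call = reply.tool_calls[0]
        args = json.loads(call.function.arguments)
        messages.append(reply)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": lookup_function_doc(**args),
        })
        final = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=[LOOKUP_TOOL]
        )
        print(final.choices[0].message.content)
    else:
        print(reply.content)
```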