This repository contains the official code and experiments for the research paper: APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls.
Large Language Models (LLMs) are often trained on static, outdated datasets. This leads to a "time gap" where they may recommend code that uses outdated APIs, which can be insecure or non-functional. These outdated APIs can be deprecated, have known vulnerabilities that are patched in later versions, or have modified usage patterns.
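As a concrete illustration of such a pitfall (this pandas example is ours, not drawn from the paper's dataset): `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so code an LLM learned from older training data may no longer run.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3]})

# Outdated pattern an LLM may still suggest (removed in pandas 2.0):
# merged = a.append(b, ignore_index=True)  # AttributeError on pandas >= 2.0

# Version-aware replacement:
merged = pd.concat([a, b], ignore_index=True)
print(merged["x"].tolist())  # [1, 2, 3]
```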
This project introduces APILOT, a system that uses a Retrieval-Augmented Generation (RAG) approach to solve this problem. APILOT maintains a real-time, updatable dataset of outdated APIs and uses it to navigate LLMs, guiding them to generate secure, version-aware code. Our evaluation shows that APILOT reduces outdated code recommendations by 89.42% on average across seven state-of-the-art LLMs.
- `artifacts/`: Contains the core implementation and evaluation framework for the APILOT system. The main experiment script is `experiment.sh`. TIPS: Reproducing the full experiment may cost several thousand dollars in API usage. To verify Reproducibility and Functionality, please try the demo script below.
- `artifacts/main.py`: Contains the main script to run the experiment. To run the demo experiment, use the command below. The flags are:
  - `--package`: the package to investigate.
  - `--csv_name`: the temperature of the model.
  - `--data_name`: the CSV file listing the outdated APIs.
  - `--model`: the model to investigate.
  - `--gen_time`: the number of candidates the LLM generates per run.
  - `--iter_time`: the number of regeneration iterations with the ban list.
  - `--thres`: the similarity threshold for predicting related outdated APIs before the user's prompt is sent to the LLM.
  - `--instr`: whether to use prediction of related outdated APIs based on user input. The default is `True`; passing this flag disables prediction.
  ```shell
  cd artifacts
  python3.10 main.py --package "networkx" --csv_name "temp_1" --data_name "CaseStudy_networkx.csv" --model "gpt-4o-mini-2024-07-18" --gen_time 1 --iter_time 1 --thres 7 --instr
  ```

  Results will be saved in the `Final_Eval` folder.
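The `--gen_time` and `--iter_time` flags suggest a generate-check-regenerate loop. Below is a minimal sketch of that idea; all function names, the stand-in dataset, and the control flow are our own illustrative assumptions, not APILOT's actual implementation.

```python
# Hypothetical sketch of a ban-list regeneration loop, in the spirit of
# APILOT: generate candidates, reject any that reference a known outdated
# API, and retry with the offending APIs added to a ban list.

OUTDATED_APIS = {"networkx.info", "pandas.DataFrame.append"}  # stand-in dataset

def generate_candidates(prompt, ban_list, n):
    """Stand-in for an LLM call; returns n candidate code snippets.

    A real system would fold the ban list into the prompt, e.g.
    "Do not use these APIs: " + ", ".join(sorted(ban_list)).
    """
    return [f"# code for: {prompt} (banning {sorted(ban_list)})"] * n

def uses_outdated_api(code):
    """Return the set of outdated APIs referenced by a candidate snippet."""
    return {api for api in OUTDATED_APIS if api.split(".")[-1] in code}

def navigate(prompt, gen_time=1, iter_time=1):
    """Return the first candidate free of outdated APIs, or None."""
    ban_list = set()
    for _ in range(iter_time + 1):
        for candidate in generate_candidates(prompt, ban_list, gen_time):
            hits = uses_outdated_api(candidate)
            if not hits:
                return candidate   # accept a clean, version-aware candidate
            ban_list |= hits       # ban the offending APIs and regenerate
    return None                    # no clean candidate within the budget
```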
- `artifacts/optimization/new_prompts`: Includes the prompts that mimic user input.
- `claims/analyze_result`: Performs statistical analysis on the results output by `artifacts/main.py`.
- `claims/eval`: Includes the evaluation scripts for Functionality (`functionality_eval.py`) and Usability (`usability_calculated.py`).
This project requires Python 3.10 or higher. You can install the necessary packages using pip:

```shell
pip install pandas openai replicate mistralai huggingface_hub h5py
```

Before running any experiments, you need to download the necessary data and configure your API keys.
- Configure API Keys: Navigate to `LLM-api/config.py` and replace the placeholder values with your actual API keys.
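The exact contents of `LLM-api/config.py` are not reproduced here; a plausible layout, given the providers the pip dependencies imply, might look like the following (the variable names are assumptions, so check the actual file):

```python
# LLM-api/config.py -- illustrative layout only; the real file's variable
# names may differ. Replace the placeholders with your own keys.
OPENAI_API_KEY = "sk-..."        # for gpt-4o-mini and other OpenAI models
REPLICATE_API_TOKEN = "r8_..."   # for models served via Replicate
MISTRAL_API_KEY = "..."          # for Mistral models
HF_TOKEN = "hf_..."              # for Hugging Face Hub downloads
```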
The main script to run the full suite of experiments and evaluate the APILOT system is artifacts/experiment.sh. This script will iterate through a predefined list of models and packages.