This repository contains the code and data to fully reproduce the results presented in the paper "CP-Model-Zoo: A Natural Language Query System for Constraint Programming Models".
- data/input/csplib: Contains CP problems from CSPLib with MiniZinc implementations and specifications
- data/input/csplib_descriptions_obfuscated: Problem descriptions with problem names removed
- data/input/csplib_models_concat: Compiled MiniZinc implementations for each problem
- data/input/minizinc_source_codes: MiniZinc example files in both .mzn and .txt formats
- data/input/merged_mzn_source_codes: Final database merging CSPLib and MiniZinc example implementations
- data/output/generated_descriptions: Contains generated problem descriptions at three expertise levels:
  - beginner.txt: Simplified problem descriptions
  - medium.txt: Intermediate-level problem descriptions
  - expert.txt: Technical problem descriptions
  - source_code.txt: Original source code
- data/results/exp1: Leave-one-out experiment results with MRR metrics
- data/results/exp2: CSPLib experiment results with MRR metrics
- data/vector_dbs/code_as_text: Vector store indices for different combinations of expertise levels
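Before running anything, it can help to confirm that the data layout listed above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` is hypothetical, and the directory names are copied from the list above and assumed to be relative to the repository root.

```python
from pathlib import Path

# Directories from the data layout above, assumed relative to the repository root.
EXPECTED_DIRS = [
    "data/input/csplib",
    "data/input/csplib_descriptions_obfuscated",
    "data/input/csplib_models_concat",
    "data/input/minizinc_source_codes",
    "data/input/merged_mzn_source_codes",
    "data/output/generated_descriptions",
    "data/results/exp1",
    "data/results/exp2",
    "data/vector_dbs/code_as_text",
]


def missing_dirs(repo_root="."):
    """Return the expected data directories that do not exist under repo_root (hypothetical helper)."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected data directories are present.")
```

Run it from the repository root. An empty result means every listed directory exists, though it does not check the directories' contents.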
# Create a virtual environment
python -m venv venv
# Activate virtual environment
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
# Install the required packages
pip install -r requirements.txt
To use this system, you'll need a Groq API key:
- Generate an API key from Groq
- Create a .env file in the app/assets/env folder with the following content: GROQ_API_KEY=your_groq_api_key
Alternatively, you can pass your API key directly as a command-line argument with the --groq_api_key parameter when
running the scripts.
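The two ways of supplying the key can be combined in a small resolver. This is a sketch under assumptions, not the repository's code: the function names are hypothetical, the command-line value is assumed to take precedence over the .env file, and the file is parsed by hand as simple `KEY=VALUE` lines so no extra package is needed.

```python
import argparse
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    p = Path(path)
    if not p.is_file():
        return values
    for line in p.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_groq_api_key(argv=None, env_path="app/assets/env/.env"):
    """Hypothetical resolver: prefer --groq_api_key, else fall back to GROQ_API_KEY in the .env file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--groq_api_key", default=None)
    # parse_known_args ignores other script flags such as --storage_dir.
    args, _unknown = parser.parse_known_args(argv)
    if args.groq_api_key:
        return args.groq_api_key
    return read_env_file(env_path).get("GROQ_API_KEY")
```

If neither source provides a key, the function returns `None`, so a caller can stop early with a clear error instead of failing on the first API request.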
To create or recreate the vector embedding databases (indices), run the indexing script:
python run_indexing.py
This script performs two main operations:
- Generates problem descriptions at different expertise levels
- Creates vector stores from the generated descriptions
The indices will be saved in the ./data/vector_dbs/code_as_text/ directory with separate subdirectories for each
expertise level.
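The README does not say which embedding model or vector-store library run_indexing.py uses. To show the idea of "one index per expertise level, queried by similarity," the sketch below substitutes plain bag-of-words vectors and cosine similarity for learned embeddings, so it runs with the standard library alone. All names (`build_indices`, `search`) and the sample descriptions are illustrative assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_indices(descriptions):
    """descriptions: {level: {model_name: description_text}} -> one index per level."""
    return {
        level: {name: vectorize(text) for name, text in docs.items()}
        for level, docs in descriptions.items()
    }


def search(index, query, top_k=3):
    """Return (model_name, score) pairs ranked by cosine similarity to the query."""
    q = vectorize(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


descriptions = {
    "beginner": {
        "nqueens": "Place queens on a chess board so that no two attack each other.",
        "knapsack": "Choose items to carry in a bag without going over the weight limit.",
    },
}
indices = build_indices(descriptions)
print(search(indices["beginner"], "queens on a board", top_k=1)[0][0])  # nqueens
```

Keeping one index per level mirrors the per-level subdirectories described above. A real embedding model would also match paraphrases that share no words with the query, which this toy version cannot do.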
Tip: To add a new MiniZinc model to the database, create a .txt file containing the MiniZinc implementation in the data/input/merged_mzn_source_code folder, then rerun the indexing as described above.
This will create a new set of vector stores in the data/vector_dbs folder.
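Copying an existing .mzn model into the database folder as a .txt file can be scripted. This is a minimal sketch with a hypothetical helper name, `add_model`. The target folder is passed explicitly as `db_dir`; use the merged source-code folder named in the tip. The helper refuses to overwrite an existing model file.

```python
import shutil
from pathlib import Path


def add_model(mzn_file, db_dir):
    """Copy a MiniZinc model into db_dir as <name>.txt and return the new path.

    Hypothetical helper: rerun run_indexing.py afterwards so the model is indexed.
    """
    src = Path(mzn_file)
    dest_dir = Path(db_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (src.stem + ".txt")
    if dest.exists():
        # Avoid silently replacing a model that is already in the database.
        raise FileExistsError(f"{dest} already exists; choose another file name")
    shutil.copyfile(src, dest)
    return dest
```

The file contents are copied unchanged. Only the extension differs, matching the tip's requirement of a .txt file containing the MiniZinc implementation.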
To run the experiments described in the paper, execute the experiments script:
python run_experiments.py
This script automatically performs two main experiments:
- Leave-One-Out Experiment: Evaluates the system's ability to retrieve relevant models when given a query derived from a held-out model.
- CSPLib Experiment: Evaluates retrieval performance on the CSPLib problem collection.
Results will be saved in the data/results/exp1 and data/results/exp2 directories, including Mean Reciprocal Rank (MRR) metrics and detailed retrieval analyses.
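MRR averages, over all queries, the reciprocal of the 1-based rank at which the relevant model first appears. A query whose relevant model is not retrieved contributes 0. The sketch below uses the standard definition, and the query data is made up. It assumes each query has exactly one relevant model, as a leave-one-out query built from a single held-out model would. The repository's own scoring code may differ in details such as the retrieval cutoff.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """1/rank of the first occurrence of relevant_id (1-based), or 0.0 if absent."""
    for position, model_id in enumerate(ranked_ids, start=1):
        if model_id == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(results):
    """results: list of (ranked_ids, relevant_id) pairs, one per query."""
    if not results:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in results) / len(results)


queries = [
    (["nqueens", "sudoku", "knapsack"], "nqueens"),   # rank 1 -> 1.0
    (["sudoku", "knapsack", "nqueens"], "knapsack"),  # rank 2 -> 0.5
    (["sudoku", "nqueens"], "golomb_ruler"),          # not retrieved -> 0.0
]
print(mean_reciprocal_rank(queries))  # 0.5
```

An MRR of 1.0 means the correct model was always ranked first. Values near 0 mean it was usually missing or far down the list.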
For debugging and interactive exploration of the system, use the Inference Tool:
python run_inference.py --storage_dir ./data/vector_dbs/code_as_text/medium
The --storage_dir parameter specifies which embedding database (index) you want to query:
- Beginner level: --storage_dir ./data/vector_dbs/code_as_text/beginner
- Medium level: --storage_dir ./data/vector_dbs/code_as_text/medium
- Expert level: --storage_dir ./data/vector_dbs/code_as_text/expert
- Combined levels: --storage_dir ./data/vector_dbs/code_as_text/beginnermediumexpert
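When scripting runs over several indices, building the --storage_dir argument from level names avoids typos. This sketch relies on one assumption: that a combined index directory is named by concatenating level names in order, as in the beginnermediumexpert example above. Only the four directories listed above are documented, so other combinations may not exist on disk. The helper names are hypothetical.

```python
import shlex
from pathlib import Path

LEVELS = ("beginner", "medium", "expert")
BASE_DIR = Path("data/vector_dbs/code_as_text")


def storage_dir(*levels):
    """Map level names to an index directory, concatenating names in the given order."""
    if not levels:
        raise ValueError("at least one level is required")
    for level in levels:
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
    return BASE_DIR / "".join(levels)


def inference_command(*levels):
    """Build the argument list for run_inference.py for the chosen index."""
    return ["python", "run_inference.py", "--storage_dir", "./" + storage_dir(*levels).as_posix()]


print(shlex.join(inference_command("medium")))
```

The list form can be passed directly to `subprocess.run(inference_command("medium"))`, which avoids shell quoting issues. `shlex.join` requires Python 3.8 or later.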
Once running, you can enter questions about constraint programming algorithms and problems. The tool will display ranked results based on relevance to your query. Type 'quit' to exit the program.
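The interaction described above is a read-query, print-ranking loop that ends on 'quit'. The sketch below reproduces that pattern in isolation and is not run_inference.py itself. `search_fn` is a hypothetical retrieval function returning `(model_name, score)` pairs, best first. Input and output are injectable so the loop can be driven by a script. Treating 'quit' case-insensitively is a choice made here, not something the README specifies.

```python
def interactive_loop(search_fn, read=input, write=print, top_k=5):
    """Read queries until 'quit' (or end of input); print ranked (model, score) results.

    search_fn(query, top_k) is a hypothetical retrieval function returning
    a list of (model_name, score) pairs, best first.
    """
    while True:
        try:
            query = read("Query> ").strip()
        except EOFError:
            break
        if query.lower() == "quit":
            break
        if not query:
            continue  # ignore empty input instead of searching for nothing
        for rank, (name, score) in enumerate(search_fn(query, top_k), start=1):
            write(f"{rank}. {name} (score={score:.3f})")
```

With the defaults (`input` and `print`), it behaves like a terminal prompt. Passing custom `read`/`write` callables makes it easy to test.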