DoLQ: Discovering Ordinary Differential Equations with LLM-Based Qualitative and Quantitative Evaluation
DoLQ is a multi-agent framework for discovering governing ordinary differential equations (ODEs) from observational data. It combines language-model reasoning with numerical optimization so the search loop can rank candidate terms by both fit and physical plausibility.
Project site: https://bon99yun.github.io/DoLQ/
The figure below shows how the sampler, optimizer, and scientist components interact in the full loop.
Figure 2. DoLQ framework overview.
The code in this repository follows the paper’s iterative loop:
- Sampler prompt packages the system description, accumulated knowledge, term-by-term evaluation context, removed terms, required conditions, and output examples.
- Sampler LLM response proposes multiple hypotheses, with each hypothesis expressing per-dimension term lists and reasoning.
- Function generation and optimizer converts the sampled terms into executable ODE functions and fits the parameters with numerical optimization.
- Scientist prompt combines the system description, accumulated insights, removed terms, and the latest experiment results for the global best, previous best, and current attempt.
- Quantitative and qualitative evaluation scores term contribution impacts and semantic plausibility before the scientist summarizes the result.
- Feedback synthesis maps the evaluation signals into Keep, Hold, and Remove actions.
- Next iteration carries the accumulated knowledge and removed terms into the next sampler prompt.
- Interpretability first: DoLQ searches over symbolic terms rather than opaque black-box models.
- Qualitative + quantitative evaluation: candidate terms are judged by both fit and domain knowledge.
- Iterative refinement: the system remembers removed terms and prior learnings while it searches.
- Multi-dimensional ODEs: the implementation tracks the full ODE system, not isolated equations.
- Practical experiment logging: each run writes per-iteration JSON plus a final report.
- Benchmark coverage: the provided data support the paper-table benchmark IDs in 2D and 4D in this checkout.
A single experiment follows the figure-backed loop above: load the selected benchmark split, optionally load a natural-language system description, initialize a coupled linear system, sample candidate terms, optimize their coefficients, evaluate terms with the Scientist Agent when enabled, and persist the best ODE system plus iteration artifacts.
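The "optimize their coefficients" step can be illustrated with a minimal sketch. It assumes NumPy and SciPy are installed, and it is not the code in optimization.py: the term list, the search bounds, the loss, and the function names are all hypothetical. The only parts taken from this README are the two stages (Differential Evolution, then BFGS) and the default tolerances; how the repository actually maps `--de_tolerance` and `--bfgs_tolerance` onto solver arguments is an assumption here.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

# Hypothetical candidate terms for one dimension: dx/dt = c0*x + c1*x*y
TERMS = [lambda x, y: x, lambda x, y: x * y]


def predict(coeffs, x, y):
    """Evaluate the candidate right-hand side for the given coefficients."""
    return sum(c * term(x, y) for c, term in zip(coeffs, TERMS))


def mse(coeffs, x, y, target):
    """Mean squared error between predicted and target derivatives."""
    return float(np.mean((predict(coeffs, x, y) - target) ** 2))


def fit_coefficients(x, y, target, use_differential_evolution=True,
                     de_tolerance=1e-5, bfgs_tolerance=1e-9, seed=0):
    """Global search (optional) followed by local BFGS refinement."""
    start = np.zeros(len(TERMS))
    if use_differential_evolution:
        bounds = [(-10.0, 10.0)] * len(TERMS)  # assumed search box
        coarse = differential_evolution(mse, bounds, args=(x, y, target),
                                        tol=de_tolerance, seed=seed)
        start = coarse.x
    refined = minimize(mse, start, args=(x, y, target),
                       method="BFGS", tol=bfgs_tolerance)
    return refined.x, refined.fun
```

Running the global search first and the gradient-based refinement second is a common pairing: the first stage is less sensitive to the starting point, and the second tightens the coefficients once a good basin has been found.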
Implementation entry points:
- main.py orchestrates the experiment.
- evolution.py builds the LangGraph workflow and agent nodes.
- optimization.py runs parameter fitting.
- prompt.py and with_structured_output.py define the LLM prompts and structured outputs.
- io_utils.py writes the iteration JSON, generated-equations log, and final reports.
- data_loader.py loads the benchmark CSVs and optional variable descriptions.
DoLQ/
├── main.py # CLI entry point for experiments
├── evolution.py # LangGraph evolution loop and agent nodes
├── optimization.py # Differential Evolution / BFGS fitting
├── prompt.py # Prompt templates for Sampler and Scientist agents
├── with_structured_output.py # Pydantic schemas for structured LLM outputs
├── init_func_str.py # Initial linear system generation
├── compare.py # Best-system comparison helpers
├── data_loader.py # CSV and description loading
├── io_utils.py # Logging and report writers
├── utils.py # Term / equation conversion helpers
├── config.py # Hyperparameters and model defaults
├── Makefile # `make setup` / `make clean`
├── install_env.sh # Conda environment setup script
├── requirements.txt # Python dependencies
├── docs/ # GitHub Pages project site and shared figures
├── run_bash/ # Example batch-run scripts
└── data/
    ├── 2D/
    ├── 4D/
    ├── json/
    └── source/
Note: this checkout currently includes the paper-table benchmark data under 2D and 4D; there is no data/1D/ or data/3D/ directory in the current benchmark layout.
The project page is available at https://bon99yun.github.io/DoLQ/.
The static site source lives in docs/, with local assets under docs/assets/.
- Python 3.11+
- Conda (Anaconda or Miniconda)
- An OpenRouter API key
- Internet access for dependency installation and model calls
make setup
make setup runs install_env.sh, which:
- creates a Conda environment named DoLQ
- activates that environment
- installs uv
- installs dependencies from requirements.txt
- falls back to pip install -r requirements.txt if the uv install step fails
chmod +x install_env.sh
./install_env.sh
If you prefer to manage the environment yourself, the script’s behavior is straightforward to reproduce: create a Python 3.11 Conda environment named DoLQ, activate it, and install the packages listed in requirements.txt.
If you want a Jupyter notebook / interactive kernel in the DoLQ environment, install it separately:
python -m pip install ipykernel
DoLQ reads OPENROUTER_API_KEY from the environment or from a .env file in the repository root.
conda activate DoLQ
The code loads that file automatically at startup, and evolution.py passes the key to langchain_openai.ChatOpenAI with the OpenRouter base URL.
You can change the models used by the Sampler and Scientist agents with:
- --sampler_model_name
- --scientist_model_name
The defaults are google/gemini-2.5-flash-lite for both agents.
Create a .env file at the repository root:
OPENROUTER_API_KEY=your_openrouter_api_key_here
Both main.py and evolution.py load .env automatically.
The default OpenRouter-compatible model in the code is google/gemini-2.5-flash-lite, but you can override the model names from the CLI.
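To check the key before starting a long run, a small standard-library sketch like the following can help. It is not the repository's loader: the function names are hypothetical, the parser handles only simple KEY=VALUE lines, and giving the shell environment precedence over .env is an assumption.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_file=".env", environ=None):
    """Return OPENROUTER_API_KEY from the environment, else from the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("OPENROUTER_API_KEY")
    if not key:
        key = read_env_file(env_file).get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it or add it to .env"
        )
    return key
```

Calling `resolve_api_key()` from the repository root raises a clear error when neither source provides the key, which is easier to diagnose than a failure inside the LLM client.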
The code reads benchmark data from:
data/{dim}D/{problem_name}/
Each benchmark folder contains:
- {problem_name}_train.csv
- {problem_name}_test_id.csv
- {problem_name}_test_ood.csv
- optional plot images for the generated trajectories and derivatives
The optional system description files live at:
data/json/{problem_name}.json
If --use_var_desc true is set and the description file is missing, the code prints a warning and continues without the description.
The repository snapshot includes the eight paper-table benchmark IDs: seven 2D systems and one 4D Glider variant.
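The layout above can be expressed as a small path helper, which is handy for checking a benchmark folder before launching a run. This is a sketch built only from the paths named in this section; the function names and the `data_root` parameter are hypothetical and do not come from data_loader.py.

```python
from pathlib import Path

# The three CSV splits named in this README.
SPLITS = ("train", "test_id", "test_ood")


def benchmark_files(problem_name, dim, data_root="data"):
    """Map each split name to its expected CSV path."""
    folder = Path(data_root) / f"{dim}D" / problem_name
    return {split: folder / f"{problem_name}_{split}.csv" for split in SPLITS}


def description_file(problem_name, data_root="data"):
    """Expected location of the optional system description."""
    return Path(data_root) / "json" / f"{problem_name}.json"


def missing_files(problem_name, dim, data_root="data"):
    """List expected CSV files that are not present on disk."""
    files = benchmark_files(problem_name, dim, data_root)
    return [str(path) for path in files.values() if not path.is_file()]
```

For example, `missing_files("ID_02", 2)` run from the repository root returns an empty list when all three splits are present.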
The CLI is defined in main.py with argparse. The minimum required arguments are --problem_name, --dim, --max_params, and --evolution_num.
python main.py \
--problem_name ID_02 \
--dim 2 \
--max_params 8 \
--evolution_num 100 \
--use_var_desc true \
--use_scientist true \
--use_differential_evolution true \
--use_gt false \
--sampler_model_name "google/gemini-2.5-flash-lite" \
--scientist_model_name "google/gemini-2.5-flash-lite" \
--num_equations 3
Boolean flags accept common truthy strings such as true, 1, yes, and y.
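String-valued boolean flags are usually implemented with a custom argparse `type` callable. The sketch below shows one way this could look; the helper name `str2bool` is hypothetical, and accepting `n` as a false value is an assumption made by symmetry with `y`. The real parser in main.py may differ.

```python
import argparse

TRUE_STRINGS = {"true", "1", "yes", "y"}
FALSE_STRINGS = {"false", "0", "no", "n"}  # "n" is assumed by symmetry


def str2bool(value):
    """Convert a command-line string into a bool, rejecting anything else."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUE_STRINGS:
        return True
    if text in FALSE_STRINGS:
        return False
    raise argparse.ArgumentTypeError(
        f"expected a boolean string, got {value!r}"
    )


def build_parser():
    """Parser with two of the boolean flags and the defaults listed in this README."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_scientist", type=str2bool, default=False)
    parser.add_argument("--use_differential_evolution", type=str2bool,
                        default=True)
    return parser
```

Rejecting unknown strings, rather than treating them as false, means a typo such as `--use_scientist ture` fails at startup instead of silently disabling the Scientist Agent.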
The run_bash/ directory contains shell wrappers for common benchmark runs:
- run_bash/run_ID_01.bash
- run_bash/run_ID_02.bash
- run_bash/run_ID_03.bash
- run_bash/run_ID_04.bash
- run_bash/run_ID_05.bash
- run_bash/run_ID_06.bash
- run_bash/run_ID_07.bash
- run_bash/run_ID_08.bash
These scripts are useful as templates for nohup runs, but note that the checked-in versions activate the DoLQ Conda environment created by make setup.
Each wrapper creates a unique shell log under run_bash/nohup_log/ and prints the exact path before launching the background process.
The log filename includes the benchmark ID, a UTC timestamp, and the shell PID, for example:
run_bash/nohup_log/ID_02_experiment_20260416T121500Z_12345.log
This avoids accidentally overwriting the previous nohup output while keeping the experiment output directories under logs/ unchanged.
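The naming scheme can be reproduced in a few lines, for instance to locate or generate matching log names from a Python launcher. The wrappers themselves are shell scripts, so this is only an illustration of the pattern; the function name is hypothetical.

```python
import os
from datetime import datetime, timezone


def nohup_log_path(problem_name, now=None, pid=None):
    """Build a log path from the benchmark ID, a UTC timestamp, and a PID."""
    if now is None:
        now = datetime.now(timezone.utc)
    else:
        now = now.astimezone(timezone.utc)
    if pid is None:
        pid = os.getpid()
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return f"run_bash/nohup_log/{problem_name}_experiment_{stamp}_{pid}.log"
```

Because the timestamp has one-second resolution, the PID is what keeps two runs started within the same second from colliding.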
| Argument | Description |
|---|---|
| --problem_name | Benchmark problem identifier, such as ID_02 or ID_08 |
| --dim | ODE system dimension for the bundled data (2 or 4) |
| --max_params | Maximum number of parameters per equation |
| --evolution_num | Number of evolution iterations |
| Argument | Default | Description |
|---|---|---|
| --use_var_desc | false | Load the natural-language problem description from data/json/{problem_name}.json |
| --use_differential_evolution | true | Enable Differential Evolution during optimization |
| --use_scientist | false | Enable the Scientist Agent loop |
| --recursion_limit | 15 | LangGraph recursion limit |
| --timeout | 180 | Timeout in seconds for each LLM call |
| --max_retries | 2 | Maximum retry count for failed LLM calls |
| --sampler_model_name | google/gemini-2.5-flash-lite | Model used by the Sampler Agent |
| --scientist_model_name | google/gemini-2.5-flash-lite | Model used by the Scientist Agent |
| --num_equations | 3 | Number of candidate equations generated per iteration |
| --de_tolerance | 1e-5 | Differential Evolution tolerance |
| --bfgs_tolerance | 1e-9 | BFGS tolerance |
| --use_gt | false | Use ground-truth targets instead of gradient-based targets |
| --forget_prob | 0.01 | Probability of re-exploring previously removed terms |
- --use_var_desc true only helps when a matching JSON description exists.
- --use_gt true switches the target from the gradient-based trajectory to the ground-truth target used by the code.
Boolean flags accept strings such as true/false, yes/no, and 1/0.
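One plausible reading of `--forget_prob` is that each previously removed term is independently released back into the search with that probability on every iteration. The sketch below illustrates that reading only; the function name and the per-term independent draw are assumptions, and the actual logic in the repository may differ.

```python
import random


def terms_to_exclude(removed_terms, forget_prob=0.01, rng=None):
    """Return the removed terms that stay excluded this iteration.

    Assumption: each term is "forgotten" (allowed again) independently
    with probability forget_prob.
    """
    rng = rng or random.Random()
    return [term for term in removed_terms if rng.random() >= forget_prob]
```

With the default of 0.01 almost every removed term stays excluded, so the search keeps its accumulated pruning while still leaving a small chance to revisit a term that was dropped too early.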
Each run writes to a directory under:
logs/{problem_name}/{sampler_model_name_sanitized}/{run_flags_timestamp}/
The run-folder name encodes the experiment flags: desc, de, scientist, gt, forget_prob, and the start timestamp.
If you launch a checked-in run_bash/*.bash wrapper, the wrapper also writes the captured shell/nohup stream to a timestamped file under:
run_bash/nohup_log/{problem_name}_experiment_{YYYYMMDDTHHMMSSZ}_{shell_pid}.log
The wrapper prints this path at startup. These shell logs are separate from the machine-readable experiment artifacts below.
| Path | Content |
|---|---|
| iteration_json/{problem_name}_{iteration}.json | Serialized output for each iteration |
| report/generated_equations.json | Incrementally updated candidate-equation summary |
| Path | Content |
|---|---|
| report/final_report.json | Machine-readable experiment summary |
| report/final_report.txt | Human-readable summary |
The final report includes:
- experiment configuration
- best scores, equations, and parameters per dimension
- the best iteration found
- the research notebook / accumulated scientist notes
- runtime metrics summary
- execution-environment metadata such as CPU, memory, Python version, and NumPy version
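Since the reports are plain JSON under a predictable directory layout, they are easy to pick up from a notebook or a post-processing script. The sketch below finds the most recently modified run for a problem and loads its report; the function names are hypothetical, and it makes no assumption about the keys inside final_report.json.

```python
import json
from pathlib import Path


def latest_run_dir(problem_name, logs_root="logs"):
    """Return the newest run directory that has a final report, or None."""
    pattern = f"{problem_name}/*/*/report/final_report.json"
    reports = sorted(Path(logs_root).glob(pattern),
                     key=lambda path: path.stat().st_mtime)
    # report/final_report.json sits two levels below the run directory.
    return reports[-1].parent.parent if reports else None


def load_final_report(run_dir):
    """Load report/final_report.json from a run directory."""
    report_path = Path(run_dir) / "report" / "final_report.json"
    with open(report_path, encoding="utf-8") as handle:
        return json.load(handle)
```

A typical use is `report = load_final_report(latest_run_dir("ID_02"))` followed by `sorted(report)` to inspect the available top-level fields.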
data_loader.py expects this layout for each run:
data/{dim}D/{problem_name}/
├── {problem_name}_train.csv
├── {problem_name}_test_id.csv
└── {problem_name}_test_ood.csv
Optional variable descriptions live at:
data/json/{problem_name}.json
- 2D benchmark folders: ID_01, ID_02, ID_03, ID_04, ID_05, ID_06, ID_07
- 4D benchmark folders: ID_08
- Description JSON files: ID_01, ID_02, ID_03, ID_04, ID_05, ID_06, ID_07, ID_08
A few practical notes:
- The repository does not currently ship data/1D/ or data/3D/ folders.
- The bundled benchmark data keeps only the noise-free (sigma=0) split.
- If --use_var_desc true is set for a problem without a matching JSON file, the code warns and continues without the description.
- The data/source/ directory stores benchmark source PDFs and is not required to run the CLI.
If the key is unset, the LLM client initialization in evolution.py fails. Add the key to .env or export it in your shell before running main.py.
Run:
make setup
If that still fails, check that Conda is installed and that the DoLQ environment name is available for creation.
Check all parts of the path:
- --problem_name
- --dim
- the problem directory under data/{dim}D/
- the expected CSV filenames
If you enable --use_var_desc true for a problem without data/json/{problem_name}.json, the code prints a warning and continues without a description. Disable the flag or add the missing JSON file.
The checked-in run_bash/*.bash files call conda activate DoLQ, matching the environment created by this repository’s setup script.
Try one or more of the following:
- increase --timeout
- reduce --evolution_num
- lower --num_equations
- use a faster model name for --sampler_model_name and --scientist_model_name
Citation metadata is intentionally left as a skeleton until the final bibliographic record is ready.
@inproceedings{,
title = {},
author = {},
booktitle = {},
year = {},
url = {}
}
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
See: https://creativecommons.org/licenses/by-nc-sa/4.0/
If you run into a bug, open an issue in the repository and include:
- the problem name and dimension
- the command you ran
- the last few lines of the log output for the run
