Description
Describe the bug
When PreProcess.reduce(extend=True) is run on the same model multiple times, it returns different sets of removed reactions. This non-determinism makes downstream analyses impossible to reproduce.
To Reproduce
Steps to reproduce the behavior:
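- Go to the repository root directory (the script loads ext_data/e_coli_core.json by relative path).
- Create a test file.
- Add the following minimal script, which uses e_coli_core.json, and run it to reproduce the error: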
```python
import numpy as np
import cobra.io
from dingo.preprocess import PreProcess


def test_extend_nondeterminism():
    ecoli = cobra.io.load_json_model("ext_data/e_coli_core.json")

    # Run A: seed NumPy's global RNG, then reduce with extend=True.
    np.random.seed(42)
    preprocessor_A = PreProcess(ecoli.copy(), tol=1e-6, verbose=False)
    removed_A, _ = preprocessor_A.reduce(extend=True)

    # Run B: same seed, fresh model copy, same parameters.
    np.random.seed(42)
    preprocessor_B = PreProcess(ecoli.copy(), tol=1e-6, verbose=False)
    removed_B, _ = preprocessor_B.reduce(extend=True)

    # Compare the removed reactions as sets, ignoring order.
    set_A, set_B = frozenset(removed_A), frozenset(removed_B)
    if set_A != set_B:
        print(f"Reactions removed ONLY in Run A: {sorted(set_A - set_B)}")
        print(f"Reactions removed ONLY in Run B: {sorted(set_B - set_A)}")
    # Fail the test (e.g. under pytest) when the two runs disagree.
    assert set_A == set_B, "reduce(extend=True) removed different reactions"


if __name__ == "__main__":
    test_extend_nondeterminism()
```
Expected behavior
Given the same metabolic model and the same algorithm parameters, reduce(extend=True) should always return the same set of removed reactions.
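One way to state this expectation as a check is to call the reduction several times and require every run to yield the same set. The sketch below is generic: assert_same_removed_set is a hypothetical helper, and its reduce_once argument stands in for something like `lambda: PreProcess(ecoli.copy(), tol=1e-6, verbose=False).reduce(extend=True)[0]`. The two toy callables are not dingo code; they only demonstrate the passing and failing cases.

```python
import itertools
from typing import Callable, Iterable


def assert_same_removed_set(
    reduce_once: Callable[[], Iterable[str]], n_runs: int = 5
) -> frozenset:
    """Call reduce_once() n_runs times and raise if the removed sets differ.

    Hypothetical helper: reduce_once must return the removed reaction IDs.
    """
    reference = frozenset(reduce_once())
    for run in range(1, n_runs):
        current = frozenset(reduce_once())
        if current != reference:
            raise AssertionError(
                f"run {run} differs: only in run 0 {sorted(reference - current)}, "
                f"only in run {run} {sorted(current - reference)}"
            )
    return reference


# Toy stand-in (not dingo): always returns the same reactions.
def deterministic_reduce():
    return ["PFK", "PGI"]


# Toy stand-in (not dingo): alternates between two results, mimicking run-to-run drift.
_calls = itertools.count()


def drifting_reduce():
    return ["PFK", "PGI"] if next(_calls) % 2 == 0 else ["PFK", "FBA"]


print(sorted(assert_same_removed_set(deterministic_reduce)))
try:
    assert_same_removed_set(drifting_reduce)
except AssertionError as err:
    print("non-deterministic:", err)
```

Comparing frozensets rather than lists keeps the check insensitive to the order in which reactions are reported.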
Proposed fix
Add an optional steady_states parameter to reduce().
When steady_states is provided by the caller, the internal sampling step is skipped entirely and the provided matrix is used directly for correlation estimation. When steady_states=None (the default), the current internal sampling behaviour is preserved as a fallback for convenience, but a UserWarning is emitted to inform the user that the results may not be reproducible.
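The branching described above can be sketched in isolation. This is a minimal sketch of the proposal, not dingo's implementation: estimate_correlations and its sampler argument are hypothetical names, np.corrcoef is assumed as the correlation estimator, and steady_states is assumed to be shaped (n_reactions, n_samples).

```python
import warnings
from typing import Callable, Optional

import numpy as np


def estimate_correlations(
    steady_states: Optional[np.ndarray] = None,
    sampler: Optional[Callable[[], np.ndarray]] = None,
) -> np.ndarray:
    """Return a reaction-by-reaction correlation matrix.

    Hypothetical sketch: a caller-supplied steady_states matrix skips sampling;
    otherwise fall back to sampler() and emit a UserWarning.
    """
    if steady_states is None:
        if sampler is None:
            raise ValueError("either steady_states or sampler must be given")
        warnings.warn(
            "reduce(extend=True) is sampling internally; results may not be "
            "reproducible. Pass steady_states for reproducible output.",
            UserWarning,
            stacklevel=2,
        )
        steady_states = sampler()
    # Rows are reactions and columns are samples, so np.corrcoef correlates reactions.
    return np.corrcoef(steady_states)


# Caller-controlled sampling: a locally seeded generator gives a fixed input matrix.
rng = np.random.default_rng(42)
samples = rng.normal(size=(4, 200))  # toy matrix: 4 "reactions", 200 samples
corr_a = estimate_correlations(steady_states=samples)
corr_b = estimate_correlations(steady_states=samples)
assert np.array_equal(corr_a, corr_b)  # same matrix in, same correlations out

# Fallback path: internal sampling still works but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    estimate_correlations(sampler=lambda: np.random.default_rng().normal(size=(4, 200)))
print(caught[0].category.__name__)
```

Keeping the fallback while warning avoids breaking existing callers, while the explicit parameter moves control of randomness to the caller.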
Desktop
- OS: WSL / Ubuntu
- Browser: Chrome