MAchine Learning CArbapenemase
Cécile EMERAUD1,2,3, Yahia BENZERARA4, Hippolyte DE SWARDT2,
Alexandra AUBRY5,6, Nicolas VEZIRIS4, Agnès B. JOUSSET1,2,3,
Inès REZZOUG1,2,3, Léna LATOUR2, Alice PAGÈS2,
Sarah RONSIN2, Corentin POIGNON5,6, Rémy A. BONNIN1,2,3,
Mariette MATONDO7, Quentin GIAI GIANETTO7,8, Laurent DORTET1,2,3,
Alexandre GODMER4,5,6,7*
*Corresponding author: alexandre.godmer@aphp.fr

1. Bacteriology–Hygiene Unit, Bicêtre Hospital, AP-HP (Assistance Publique–Hôpitaux de Paris), Le Kremlin-Bicêtre, France
2. Team “Resist”, UMR1184 Immunology of Viral, Auto-Immune, Hematological and Bacterial Diseases (IMVA-HB), INSERM, Université Paris-Saclay, CEA, Le Kremlin-Bicêtre, France
3. Associated French National Reference Center for Antibiotic Resistance: Carbapenemase-Producing Enterobacterales, Le Kremlin-Bicêtre, France
4. Department of Bacteriology, Saint-Antoine Hospital, AP-HP, Sorbonne Université, Paris, France
5. Sorbonne Université, INSERM, U1135, Centre d’Immunologie et des Maladies Infectieuses (Cimi-Paris), Paris, France
6. AP-HP, Sorbonne Université, Pitié-Salpêtrière Hospital, National Reference Center for Mycobacteria and Mycobacterial Drug Resistance, Paris, France
7. Institut Pasteur, Université Paris Cité, Proteomics Platform, Mass Spectrometry for Biology Unit, CNRS UAR 2024, Paris, France
8. Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France

This GitHub repository provides a minimal, single-file R demonstration of a MALCA-like workflow, including:
- generation of synthetic disk diffusion diameter data,
- Random Forest–based Recursive Feature Elimination (RF-RFE),
- training of a Random Forest classifier,
- evaluation on a held-out test set,
- computation of a simple confidence score, defined as the maximum predicted class probability.
Important: This repository uses synthetic data only and is intended solely for methodological demonstration and tutorial purposes.
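The workflow steps listed above can be sketched roughly as follows. This is a minimal illustration and not the contents of malca_demo.R: the class labels other than OXA-48, the disk names, the mean diameters, the 70/30 split, the candidate subset sizes, and the number of trees are all assumptions chosen for the example. The label is written `OXA48` only to keep factor levels syntactically simple. The sketch assumes the caret and randomForest packages are installed.

```r
# Minimal MALCA-like sketch on synthetic data (all names and values are illustrative).
library(caret)         # rfe(), rfeControl(), rfFuncs, predictors(), createDataPartition()
library(randomForest)  # randomForest()

set.seed(42)

# 1. Synthetic disk diffusion diameters (mm): 3 hypothetical classes, 6 hypothetical disks.
classes <- c("OXA48", "KPC", "NDM")
disks <- paste0("disk_", 1:6)
n_per_class <- 60
# One row of mean diameters per class; only the first three disks carry signal.
class_means <- rbind(
  c(12, 22, 25, 20, 20, 20),
  c(20, 10, 24, 20, 20, 20),
  c(21, 23,  9, 20, 20, 20)
)
y <- factor(rep(classes, each = n_per_class), levels = classes)
noise <- matrix(rnorm(length(y) * length(disks), sd = 2), ncol = length(disks))
x <- as.data.frame(class_means[as.integer(y), ] + noise)
names(x) <- disks

# 2. Stratified split into a training set and a held-out test set.
idx <- createDataPartition(y, p = 0.7, list = FALSE)[, 1]
x_train <- x[idx, ]
y_train <- y[idx]
x_test <- x[-idx, ]
y_test <- y[-idx]

# 3. RF-RFE with 5-fold cross-validation on the training set only.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(x = x_train, y = y_train, sizes = c(2, 3, 4), rfeControl = ctrl)
selected <- predictors(rfe_fit)

# 4. Final Random Forest trained on the selected variables.
rf_fit <- randomForest(x = x_train[, selected, drop = FALSE], y = y_train, ntree = 500)

# 5. Held-out evaluation and confidence score (maximum predicted class probability).
prob <- predict(rf_fit, newdata = x_test[, selected, drop = FALSE], type = "prob")
pred <- factor(colnames(prob)[max.col(prob, ties.method = "first")], levels = classes)
confidence <- apply(prob, 1, max)

cat("Selected variables:", selected, "\n")
print(table(predicted = pred, actual = y_test))
cat("Test accuracy:", mean(pred == y_test), "\n")
print(summary(confidence))
```

Feature selection is run on the training partition only, so the held-out test set does not influence which variables are retained.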
- R ≥ 4.0 (a recent version is recommended)
- R packages: caret, randomForest, pROC, MLmetrics
From within R:
## Typical installation time is less than 5 minutes on a standard desktop computer.
install.packages(c("caret", "randomForest", "pROC", "MLmetrics"))
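Before running the demo, it can help to confirm that the environment meets the requirements. The check below is a small base-R sketch, not part of the repository; the variable names are illustrative.

```r
# Check the R version and the four packages listed in the requirements.
required <- c("caret", "randomForest", "pROC", "MLmetrics")

# requireNamespace() returns TRUE or FALSE without attaching the package.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (getRversion() < "4.0.0") {
  message("R >= 4.0 is required; found R ", as.character(getRversion()))
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```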
## Run the demo
## From the repository root, run:
Rscript malca_demo.R
## Expected demo run time: approximately 3-5 minutes on a standard desktop CPU (Windows 11 x64; R 4.3.2).
The script outputs:
- selected variables from RF-RFE,
- model summary,
- confusion matrix on the held-out test set,
- performance on a high-confidence prediction subset,
- a one-vs-rest AUC example (OXA-48 vs. others),
- sessionInfo() for reproducibility.
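To make the high-confidence subset and the one-vs-rest AUC concrete, here is a small base-R sketch on made-up probabilities. The class labels other than OXA-48, the probability values, and the 0.8 confidence threshold are assumptions for illustration. The AUC is computed with the rank-based (Mann-Whitney) formula rather than with pROC, so it is a stand-in for whatever the demo script actually calls.

```r
# Made-up predicted class probabilities for six test isolates (rows sum to 1).
classes <- c("OXA-48", "KPC", "NDM")
prob <- matrix(c(
  0.90, 0.05, 0.05,
  0.65, 0.25, 0.10,
  0.10, 0.85, 0.05,
  0.20, 0.35, 0.45,
  0.05, 0.05, 0.90,
  0.60, 0.25, 0.15
), ncol = 3, byrow = TRUE, dimnames = list(NULL, classes))
truth <- factor(c("OXA-48", "KPC", "KPC", "NDM", "NDM", "OXA-48"), levels = classes)

# Predicted class and confidence score (maximum predicted class probability).
pred <- factor(classes[max.col(prob, ties.method = "first")], levels = classes)
confidence <- apply(prob, 1, max)

# Performance on the high-confidence subset (threshold is a hypothetical choice).
threshold <- 0.8
high_conf <- confidence >= threshold
overall_accuracy <- mean(pred == truth)                       # 5/6
high_conf_accuracy <- mean(pred[high_conf] == truth[high_conf])  # 1
coverage <- mean(high_conf)                                   # 0.5

# One-vs-rest AUC via the rank-sum formula; average ranks handle ties.
auc_one_vs_rest <- function(score, is_positive) {
  r <- rank(score)
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc_oxa48 <- auc_one_vs_rest(prob[, "OXA-48"], truth == "OXA-48")  # 0.875

cat("Overall accuracy:", overall_accuracy, "\n")
cat("High-confidence accuracy:", high_conf_accuracy, "at coverage", coverage, "\n")
cat("AUC (OXA-48 vs. others):", auc_oxa48, "\n")
```

Reporting coverage alongside high-confidence accuracy shows how many predictions are set aside to obtain the higher accuracy.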
This repository is distributed under the MALCA Software License — Evaluation and Non-Commercial Research Only
(see the LICENSE file for full terms).
Permitted uses:
- Internal evaluation
- Non-commercial academic research
Uses that are not permitted:
- Clinical or diagnostic use
- Regulatory use
- Any commercial deployment
Redistribution, sublicensing, or making the Software available to third parties is not permitted without prior written authorization from the rightsholder(s).
No patent rights are granted under this license, including, but not limited to, rights under FR2415430 and related patent applications.
For any use of MALCA in a product, clinical workflow, or commercial setting, please contact the rightsholder institution(s) to discuss a separate licensing agreement.
This public repository is a toy implementation designed to illustrate the methodological structure
(RF-RFE + Random Forest) without releasing clinical isolate-level data or the full patented implementation.
In the associated study, model development and validation were conducted on clinical collections under applicable data governance, ethical, and intellectual property constraints.
For scientific evaluation requests (e.g., reviewers or editors), please contact the corresponding author to discuss controlled access to additional materials, subject to institutional policies and IP agreements.