agodmer/MALCA_experiments

MALCA - Toy Demonstration (RF-RFE + Random Forest)

Direct carbapenemase typing from disc diffusion antibiograms with MALCA

MAchine Learning CArbapenemase

Cécile EMERAUD (1,2,3), Yahia BENZERARA (4), Hippolyte DE SWARDT (2),
Alexandra AUBRY (5,6), Nicolas VEZIRIS (4), Agnès B. JOUSSET (1,2,3),
Inès REZZOUG (1,2,3), Léna LATOUR (2), Alice PAGÈS (2),
Sarah RONSIN (2), Corentin POIGNON (5,6), Rémy A. BONNIN (1,2,3),
Mariette MATONDO (7), Quentin GIAI GIANETTO (7,8), Laurent DORTET (1,2,3),
Alexandre GODMER (4,5,6,7)*

*Corresponding author: alexandre.godmer@aphp.fr

ORCID iD: https://orcid.org/0000-0002-5211-5796


Affiliations

  1. Bacteriology–Hygiene Unit, Bicêtre Hospital, AP-HP (Assistance Publique–Hôpitaux de Paris), Le Kremlin-Bicêtre, France
  2. Team “Resist”, UMR1184 Immunology of Viral, Auto-Immune, Hematological and Bacterial Diseases (IMVA-HB),
    INSERM, Université Paris-Saclay, CEA, Le Kremlin-Bicêtre, France
  3. Associated French National Reference Center for Antibiotic Resistance:
    Carbapenemase-Producing Enterobacterales, Le Kremlin-Bicêtre, France
  4. Department of Bacteriology, Saint-Antoine Hospital, AP-HP, Sorbonne Université, Paris, France
  5. Sorbonne Université, INSERM, U1135, Centre d’Immunologie et des Maladies Infectieuses (Cimi-Paris), Paris, France
  6. AP-HP, Sorbonne Université, Pitié-Salpêtrière Hospital,
    National Reference Center for Mycobacteria and Mycobacterial Drug Resistance, Paris, France
  7. Institut Pasteur, Université Paris Cité, Proteomics Platform,
    Mass Spectrometry for Biology Unit, CNRS UAR 2024, Paris, France
  8. Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France

Repository content

This GitHub repository provides a minimal, single-file R demonstration of a MALCA-like workflow, including:

  • generation of synthetic disc diffusion diameter data,
  • Random Forest–based Recursive Feature Elimination (RF-RFE),
  • training of a Random Forest classifier,
  • evaluation on a held-out test set,
  • computation of a simple confidence score, defined as the maximum predicted class probability.
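The steps listed above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the contents of malca_demo.R: the class labels (KPC, NDM, OXA48), the placeholder antibiotic names (ATB1 to ATB10), the sample sizes, the class shifts, and the RFE subset sizes are all hypothetical choices made for the example.

```r
# Illustrative sketch only: every name and parameter below is an assumption.
library(caret)
library(randomForest)

set.seed(42)

# 1. Synthetic disc diffusion diameters (mm)
classes     <- c("KPC", "NDM", "OXA48")   # hypothetical labels (valid R names)
antibiotics <- paste0("ATB", 1:10)        # hypothetical placeholder names
n_per_class <- 60
y <- factor(rep(classes, each = n_per_class), levels = classes)
n <- length(y)

X <- matrix(rnorm(n * length(antibiotics), mean = 20, sd = 3),
            nrow = n, dimnames = list(NULL, antibiotics))
# Only the first three antibiotics carry class signal; the rest are noise.
shift <- c(KPC = -8, NDM = 0, OXA48 = 8)
X[, 1:3] <- X[, 1:3] + shift[as.character(y)]
X <- as.data.frame(round(pmax(X, 6)))     # assumed floor of 6 mm per reading

# 2. Stratified train / held-out test split
idx     <- createDataPartition(y, p = 0.7, list = FALSE)
X_train <- X[idx, ];  y_train <- y[idx]
X_test  <- X[-idx, ]; y_test  <- y[-idx]

# 3. RF-RFE with 5-fold cross-validation on the training set only
ctrl     <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit  <- rfe(x = X_train, y = y_train, sizes = c(2, 3, 5), rfeControl = ctrl)
selected <- predictors(rfe_fit)

# 4. Random Forest trained on the selected variables
rf_fit <- randomForest(x = X_train[, selected, drop = FALSE],
                       y = y_train, ntree = 200)

# 5. Evaluation on the held-out test set
newdata <- X_test[, selected, drop = FALSE]
pred    <- predict(rf_fit, newdata, type = "response")
prob    <- predict(rf_fit, newdata, type = "prob")
cm      <- confusionMatrix(pred, y_test)

# 6. Confidence score: maximum predicted class probability per isolate
confidence <- apply(prob, 1, max)

print(selected)
print(cm$table)
summary(confidence)
```

Running feature selection inside the training split only keeps the test set untouched until the final evaluation, which is the usual reason for placing the split before RF-RFE.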

Important
This repository uses synthetic data only and is intended solely for methodological demonstration and tutorial purposes.


Requirements

  • R ≥ 4.0 (a recent version is recommended)
  • R packages:
    • caret
    • randomForest
    • pROC
    • MLmetrics

Install dependencies

From within R:

## Typical installation time is less than 5 minutes on a standard desktop computer.
install.packages(c("caret", "randomForest", "pROC", "MLmetrics"))

Run the demo

From the repository root, run:

Rscript malca_demo.R

Expected demo run time: approximately 3-5 minutes on a standard desktop CPU (Windows 11 x64; R 4.3.2).

The script outputs:

  • selected variables from RF-RFE,
  • model summary,
  • confusion matrix on the held-out test set,
  • performance on a high-confidence prediction subset,
  • a one-vs-rest AUC example (OXA-48 vs. others),
  • sessionInfo() for reproducibility.
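As an illustration of the last few outputs, the sketch below computes accuracy on a high-confidence subset and a one-vs-rest AUC with pROC. It does not reproduce the script: the probabilities are simulated rather than taken from a fitted model, and the class labels and the 0.6 confidence threshold are hypothetical values chosen for the example.

```r
# Illustrative sketch only: simulated probabilities, hypothetical threshold.
library(pROC)

set.seed(1)
classes <- c("KPC", "NDM", "OXA48")       # hypothetical labels
n       <- 150
truth   <- factor(sample(classes, n, replace = TRUE), levels = classes)

# Stand-in for predict(model, type = "prob"): random scores with a bonus
# for the true class, normalised so that each row sums to 1.
scores   <- matrix(runif(n * 3), nrow = n, dimnames = list(NULL, classes))
true_pos <- cbind(seq_len(n), as.integer(truth))
scores[true_pos] <- scores[true_pos] + 0.5
prob <- scores / rowSums(scores)

pred       <- factor(classes[max.col(prob, ties.method = "first")],
                     levels = classes)
confidence <- apply(prob, 1, max)         # maximum predicted class probability

# Performance restricted to high-confidence predictions
threshold <- 0.6                          # hypothetical cut-off
high      <- confidence >= threshold
acc_all   <- mean(pred == truth)
acc_high  <- mean(pred[high] == truth[high])
coverage  <- mean(high)                   # share of isolates kept

# One-vs-rest AUC: OXA48 (case, 1) against all other classes (control, 0)
is_oxa  <- as.integer(truth == "OXA48")
roc_oxa <- roc(response = is_oxa, predictor = prob[, "OXA48"],
               levels = c(0, 1), direction = "<", quiet = TRUE)
auc_oxa <- as.numeric(auc(roc_oxa))

cat(sprintf("Accuracy (all): %.3f\n", acc_all))
cat(sprintf("Accuracy (confidence >= %.1f): %.3f, coverage %.3f\n",
            threshold, acc_high, coverage))
cat(sprintf("AUC, OXA48 vs. others: %.3f\n", auc_oxa))
```

Reporting coverage alongside the subset accuracy matters because a stricter threshold raises accuracy only by setting aside more isolates.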

License, intellectual property, and patent notice

This repository is distributed under the MALCA Software License — Evaluation and Non-Commercial Research Only
(see the LICENSE file for full terms).

Permitted use

  • Internal evaluation
  • Non-commercial academic research

Prohibited use

  • Clinical or diagnostic use
  • Regulatory use
  • Any commercial deployment

Redistribution

Redistribution, sublicensing, or making the Software available to third parties is not permitted without prior written authorization from the rightsholder(s).

Patent notice

No patent rights are granted under this license.
This includes, but is not limited to, FR2415430 and related patent applications.

For any use of MALCA in a product, clinical workflow, or commercial setting, please contact the rightsholder institution(s) to discuss a separate licensing agreement.


Notes on transparency

This public repository is a toy implementation designed to illustrate the methodological structure
(RF-RFE + Random Forest) without releasing clinical isolate-level data or the full patented implementation.

In the associated study, model development and validation were conducted on clinical collections under applicable data governance, ethical, and intellectual property constraints.

For scientific evaluation requests (e.g., reviewers or editors), please contact the corresponding author to discuss controlled access to additional materials, subject to institutional policies and IP agreements.
