This repository contains the data used to train and test the models from the DeepMDC repository, together with the supplementary material of the related article.
All genomic data provided here is from Klebsiella pneumoniae and was retrieved from this article (Training and Test 1) and this project (Test 2).
Training Samples: k_pneumoniae_meropenem.tar.xz
| Class | n |
|---|---|
| R | 775 |
| S | 715 |
| I (S) | 60 |
| Total | 1550 |
Test 1 Samples: k_pneumoniae_meropenem_test1.tar.xz
| Class | n |
|---|---|
| R | 77 |
| S | 72 |
| I (S) | 6 |
| Total | 155 |
Test 2 Samples: k_pneumoniae_meropenem_test2.tar.xz
| Class | n |
|---|---|
| R | 51 |
| S | 46 |
| I (S) | 3 |
| Total | 100 |
Training Samples: k_pneumoniae_gentamicin.tar.xz
| Class | n |
|---|---|
| R | 975 |
| S | 858 |
| I (S) | 117 |
| Total | 1950 |
Test 1 Samples: k_pneumoniae_gentamicin_test1.tar.xz
| Class | n |
|---|---|
| R | 97 |
| S | 86 |
| I (S) | 12 |
| Total | 195 |
Test 2 Samples: k_pneumoniae_gentamicin_test2.tar.xz
| Class | n |
|---|---|
| R | 42 |
| S | 68 |
| I (S) | 3 |
| Total | 113 |
Training Samples: k_pneumoniae_ceftazidime.tar.xz
| Class | n |
|---|---|
| R | 1470 |
| S | 287 |
| I (S) | 38 |
| Total | 1795 |
Test 1 Samples: k_pneumoniae_ceftazidime_test1.tar.xz
| Class | n |
|---|---|
| R | 147 |
| S | 29 |
| I (S) | 4 |
| Total | 180 |
Test 2 Samples: k_pneumoniae_ceftazidime_test2.tar.xz
| Class | n |
|---|---|
| R | 80 |
| S | 18 |
| I (S) | 7 |
| Total | 105 |
Training Samples: k_pneumoniae_cefepime.tar.xz
| Class | n |
|---|---|
| R | 1030 |
| S | 519 |
| I (S) | 186 |
| Total | 1735 |
Test 1 Samples: k_pneumoniae_cefepime_test1.tar.xz
| Class | n |
|---|---|
| R | 103 |
| S | 51 |
| I (S) | 19 |
| Total | 173 |
Test 2 Samples: k_pneumoniae_cefepime_test2.tar.xz
| Class | n |
|---|---|
| R | 75 |
| S | 9 |
| I (S) | 10 |
| Total | 94 |
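
The archives listed above are xz-compressed tarballs, so they can be inspected with Python's standard library before extracting anything. The following is a minimal sketch, assuming one of the archives has been downloaded to the working directory. The internal layout of the archives is not described here, so the helper (`list_archive_members`, a hypothetical name) only lists member names and makes no assumption about how samples or labels are organised inside.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the names of all regular files stored in a .tar.xz archive."""
    # "r:xz" opens the tarball for reading with xz (LZMA) decompression.
    with tarfile.open(archive_path, mode="r:xz") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


if __name__ == "__main__":
    # Assumption: the archive sits next to this script; adjust the path as needed.
    archive = Path("k_pneumoniae_meropenem_test2.tar.xz")
    if archive.exists():
        names = list_archive_members(archive)
        print(f"{archive.name}: {len(names)} files")
        # Show a few entries to get a feel for the layout.
        for name in names[:5]:
            print("  ", name)
```

Comparing the number of listed files with the totals in the tables is a quick sanity check after download, although the archives may contain more than one file per sample.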
This folder contains all ORFs extracted from the data directory, stored in JSON format along with their annotations generated with Bakta. These data (all_ORFs.json.xz) were used to retrieve biological information from the generated models.
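
Since all_ORFs.json.xz is an xz-compressed JSON file, it can be read directly without decompressing it to disk first. The sketch below assumes only that the file is valid UTF-8 JSON; the schema of the ORF records is not documented here, so the example just reports the top-level type and size. `load_xz_json` is a hypothetical helper name.

```python
import json
import lzma
from pathlib import Path


def load_xz_json(path):
    """Decompress an xz file on the fly and parse its contents as JSON."""
    # Text mode ("rt") lets json.load read decoded characters directly.
    with lzma.open(path, mode="rt", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Assumption: the file is in the current working directory.
    orf_file = Path("all_ORFs.json.xz")
    if orf_file.exists():
        orfs = load_xz_json(orf_file)
        # The structure is not specified in this README, so only report
        # the top-level container type and how many entries it holds.
        print(type(orfs).__name__, len(orfs))
```

Note that `json.load` reads the whole document into memory, which may be substantial if the file holds ORFs for every sample.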