# Reproduction of DeepMicro: Deep Representation Learning for Disease Prediction based on Microbiome Data
#### Team Members
- Joaquin Ugarte (jugarte2@illinois.edu)
- Prasad Gole (gole2@illinois.edu)
- Ehit Agarwal (ehitda2@illinois.edu)

#### GitHub Repository
https://github.com/Prasad-py/DLH_Project

# Introduction
The gut microbiome is a vital component of human health.$^1$ Recent evidence suggests that it can be used to predict various diseases.$^2$ 16S rRNA gene sequencing technology can profile the most common components of the human microbiome in a cost-effective way.$^3$ Similarly, shotgun metagenomic sequencing can provide a deeper, strain-level profile of the microbial community.$^4$ As the cost of obtaining microbiome data decreases, there is a new opportunity to apply machine learning techniques to the early prediction of diseases. Detecting diseases early could not only decrease healthcare costs but also improve patient outcomes.

Microbiome datasets often contain between 10 and 1000 samples, while the data itself can have hundreds of thousands of dimensions. This makes it challenging to train machine learning models directly on the highly sparse data: the large number of features is computationally expensive, and the relatively small number of samples makes the models less generalizable to other datasets. As of April 2024, state-of-the-art techniques in this field use Conditional Generative Adversarial Networks (C-GANs)$^5$ to artificially augment the size of the dataset and Variational Information Bottlenecks (VIBs)$^{6,7}$ to extract only the features relevant to disease prediction while filtering out redundant information.

At the time of publication of the DeepMicro study$^8$, there had been little work on deep learning applications for microbiome data with a rigorous evaluation scheme. DeepMicro transforms high-dimensional microbiome data into a robust low-dimensional representation using an autoencoder and then applies machine learning classifiers to the learned representation. A thorough validation scheme optimizes hyperparameters using a grid search, where the test set is excluded during cross-validation to ensure fairness. DeepMicro outperforms the previous best approaches based on the strain-level marker profile$^9$ on five datasets: IBD (AUC=0.955), EW-T2D (AUC=0.899), C-T2D (AUC=0.763), Obesity (AUC=0.659) and Cirrhosis (AUC=0.940). On the Colorectal dataset, DeepMicro performs slightly worse than the best approach (DeepMicro's AUC=0.803 vs. MetAML's AUC=0.811). Additionally, reducing the dimensionality sped up model training and hyperparameter tuning by 8 to 30 times.$^8$

# Scope of Reproducibility:
- Hypothesis 1: Training classifiers on a lower-dimensional representation will result in more accurate predictions, as evaluated by the area under the ROC curve (AUC).
- Hypothesis 2: Training classifiers on a lower-dimensional representation will speed up model training and hyperparameter tuning.

The DeepMicro paper thoroughly describes its procedures. The datasets contain about 1000 samples in total, and all of them are publicly available on the DeepMicro GitHub repository$^{10}$. The models are implemented in a clear and straightforward manner. The paper does not report the absolute time required for training or evaluation; however, running the original DeepMicro code took about 2 hours on our slightly more powerful machine. For these reasons, we expect the paper to be highly reproducible and the hypotheses to be testable within the scope of this project.

# Methodology

In the early stages of our project, a deep autoencoder (DAE) and an MLP classifier have been used on the IBD cohort to test the hypotheses. The IBD cohort contains 25 IBD patients and 85 healthy controls. The dataset is split with stratification, so that the training set of 88 samples (80%) preserves the class ratio. A deep autoencoder with three encoder layers of 256, 128 and 64 units is trained on the strain-level marker profile. The original features and the encoded features are each used to train an MLP with two hidden layers of 32 and 16 units. Both models are evaluated on the remaining 22 samples (20%).
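The split sizes quoted above follow from stratified sampling with `train_test_split`. A minimal sketch with synthetic labels reproduces the arithmetic; the random placeholder features and `random_state=0` are assumptions added only for illustration. Stratification allocates 80% of each class to training, so 68 of the 85 controls and 20 of the 25 patients end up in the 88 training samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the IBD labels: 25 patients (1) and 85 controls (0).
# The feature matrix is random placeholder data, not the real marker profile.
y = np.array([1] * 25 + [0] * 85)
X = np.random.default_rng(0).random((y.size, 10))

# Stratified 80/20 split, as in process_data(); random_state is added here
# only to make the sketch repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(len(y_train), int(y_train.sum()))  # 88 training samples, 20 of them IBD
print(len(y_test), int(y_test.sum()))    # 22 test samples, 5 of them IBD
```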

In [None]:
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Packages for unused portions of code
'''
import matplotlib
matplotlib.use('agg')

import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# importing sklearn
from sklearn.model_selection import StratifiedKFold
from sklearn.decomposition import PCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn import cluster
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score


# importing util libraries
import datetime
import time
import math
import os
import importlib

import sys
'''

##  Data

Publicly available human gut metagenomic samples from six disease cohorts covering five diseases (inflammatory bowel disease, type 2 diabetes, obesity, liver cirrhosis and colorectal cancer) are obtained from the MetAML study; type 2 diabetes is represented by two cohorts (the `T2D` and `WT2D` files). The data can be downloaded from the DeepMicro GitHub repository.$^{10}$ The train/test split is 80/20. The abundance and marker datasets are preprocessed according to their feature prefixes, and the features and labels are extracted.
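The preprocessing below relies on the layout of the DeepMicro files: rows are metadata fields and features, columns are samples, and feature rows are recognized by a prefix (`k__` for abundance, `gi|` for marker data). The sketch uses a hypothetical three-sample miniature of a marker file with made-up values to show the selection and transposition steps; the real files contain more metadata rows and far more features.

```python
import io
import pandas as pd

# Hypothetical miniature of a DeepMicro marker file: tab-separated, the first
# column holds row names, metadata rows (e.g. 'disease') precede feature rows,
# and each remaining column is one sample. Values here are made up.
tsv = (
    "sampleID\ts1\ts2\ts3\n"
    "disease\tn\tibd_crohn_disease\tn\n"
    "gi|1001\t0\t1\t0\n"
    "gi|1002\t1\t1\t0\n"
    "gi|1003\t0\t0\t1\n"
)
raw = pd.read_csv(io.StringIO(tsv), sep="\t", header=None, index_col=0)

# Select feature rows by their prefix; regex=False treats '|' literally.
is_feature = raw.index.str.contains("gi|", regex=False)
n_features = int(is_feature.sum())  # count of True values, not the mask's size

# Transpose so rows are samples and columns are features.
X = raw.loc[is_feature].T.astype("float64")
y = raw.loc["disease"]
print(n_features, X.shape)  # 3 (3, 3)
```

Note that `regex=False` matters for marker data: as a regular expression, `gi|` would match every row name.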

In [None]:
# A dict to select a dataset
datasets = {
    0: 'abundance_Cirrhosis.txt',
    1: 'abundance_Colorectal.txt',
    2: 'abundance_IBD.txt',
    3: 'abundance_Obesity.txt',
    4: 'abundance_T2D.txt',
    5: 'abundance_WT2D.txt',
    6: 'marker_Cirrhosis.txt',
    7: 'marker_Colorectal.txt',
    8: 'marker_IBD.txt',
    9: 'marker_Obesity.txt',
    10: 'marker_T2D.txt',
    11: 'marker_WT2D.txt'
}
dataset_file = 'data/' + datasets[8]

# Train/test split is 80/20
test_size = 0.2

# Load the dataset into a dataframe
def load_raw_data(dataset_file):
    with open(dataset_file, 'r') as fo:
        df = pd.read_csv(fo, sep='\t', header=None, index_col=0, low_memory=False)
        return df

raw_data = load_raw_data(dataset_file)

# Preprocessing for abundance and marker datasets
feature_string = ''
file_name = dataset_file.split('/')[-1].split('.')[0]
if file_name.split('_')[0] == 'abundance':
    feature_string = 'k__'
elif file_name.split('_')[0] == 'marker':
    feature_string = 'gi|'

# Compute descriptive statistics
def calculate_stats(raw_data):
    if file_name.split('_')[1] == 'Obesity':
        negative_samples = raw_data.loc['disease'].value_counts()['leaness']
    else:
        negative_samples = raw_data.loc['disease'].value_counts()['n']
    positive_samples = raw_data.shape[1] - negative_samples

    print(file_name)
    print(f'\tTotal Samples: {raw_data.shape[1]}')
    print(f'\tPositive Samples: {positive_samples}')
    print(f'\tHealthy Controls: {negative_samples}')
    print(f'\tFraction of Positive Samples: {round(positive_samples/raw_data.shape[1], 3)}')
    # Count the feature rows: sum the boolean mask (its .size would count every row)
    print(f'\tNumber of Features: {raw_data.index.str.contains(feature_string, regex=False).sum()}')

calculate_stats(raw_data)

# Extract features and labels into tensors
def process_data(raw_data):
    label_dict = {
        # Controls
        'n': 0,
        # Cirrhosis
        'cirrhosis': 1,
        # Colorectal Cancer
        'cancer': 1, 'small_adenoma': 0,
        # IBD
        'ibd_ulcerative_colitis': 1, 'ibd_crohn_disease': 1,
        # T2D and WT2D
        't2d': 1,
        # Obesity
        'leaness': 0, 'obesity': 1,
    }

    X = raw_data.loc[raw_data.index.str.contains(feature_string, regex=False)].T
    y = raw_data.loc['disease']
    y = y.replace(label_dict)

    X_train, X_test, y_train, y_test = train_test_split(X.values.astype('float64'), y.values.astype('int'), test_size=test_size, stratify=y.values)

    return torch.Tensor(X_train), torch.Tensor(X_test), torch.Tensor(y_train), torch.Tensor(y_test)

processed_data = process_data(raw_data)

##   Model

The pipeline consists of two models: an autoencoder and a classifier.

### Autoencoders

Four types of autoencoders will be used in the project: Shallow Autoencoder (SAE), Deep Autoencoder (DAE), Convolutional Autoencoder (CAE) and Variational Autoencoder (VAE). Code for all four has been drafted, but currently only the SAE and DAE are integrated into the pipeline (the SAE is the special case of the DAE with `degree=0`). In the configuration used here (`hidden_dim=64`, `degree=2`), the encoder contains three layers with 256, 128 and 64 units, and the decoder is symmetric with 128, 256 and `input_dim` units. ReLU is the activation function for all layers except the last decoder layer, which uses Sigmoid. MSE is the loss function for the abundance data, while BCE is used for the marker data. Adam is used as the optimizer.
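The layer widths follow directly from `hidden_dim` and `degree`: the encoder halves the width at each step down to `hidden_dim`, and the decoder mirrors that path. The helper below is a hypothetical sketch that only reproduces this arithmetic (it does not build the PyTorch model), and `input_dim=1000` is an arbitrary placeholder rather than the real marker dimension.

```python
def dae_layer_widths(input_dim, hidden_dim=32, degree=0):
    """Mirror the width arithmetic of the DAE class (sketch, not the class itself).

    The encoder goes input_dim -> hidden_dim*2**degree -> ... -> hidden_dim,
    halving at each step; the decoder reverses that path back to input_dim.
    """
    encoder = [input_dim] + [hidden_dim * 2**i for i in range(degree, -1, -1)]
    decoder = encoder[::-1]
    return encoder, decoder

# The configuration used in this notebook: dm.encode(64, 2).
enc, dec = dae_layer_widths(input_dim=1000, hidden_dim=64, degree=2)
print(enc)  # [1000, 256, 128, 64]
print(dec)  # [64, 128, 256, 1000]

# degree=0 gives the shallow autoencoder: a single 64-unit hidden layer.
print(dae_layer_widths(input_dim=1000, hidden_dim=64, degree=0))  # ([1000, 64], [64, 1000])
```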

In [None]:
# Deep AutoEncoder
class DAE(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim=32, degree=0):
        super().__init__()

        self.encoder = torch.nn.Sequential()
        # First layer
        self.encoder.append(torch.nn.Linear(input_dim, hidden_dim*2**degree))
        self.encoder.append(torch.nn.ReLU())

        # Iteratively add layers to the encoder
        for i in range(degree, 0, -1):
            self.encoder.append(torch.nn.Linear(hidden_dim*2**i, hidden_dim*2**(i - 1)))
            self.encoder.append(torch.nn.ReLU())

        # Iteratively add layers to the decoder
        self.decoder = torch.nn.Sequential()
        for i in range(degree):
            self.decoder.append(torch.nn.Linear(hidden_dim*2**i, hidden_dim*2**(i + 1)))
            self.decoder.append(torch.nn.ReLU())

        # Last layer
        self.decoder.append(torch.nn.Linear(hidden_dim*2**degree, input_dim))
        self.decoder.append(torch.nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))



In [None]:
# CAE and VAE have not been implemented yet
'''
class ConvAutoencoder(nn.Module):
    def __init__(self, channels, kernel_sizes, strides, paddings, latent_dims):
        super(ConvAutoencoder, self).__init__()
        self.encoder = nn.Sequential()
        for i in range(len(channels) - 1):
            self.encoder.add_module(f"conv_{i}", nn.Conv2d(channels[i], channels[i+1], kernel_sizes[i], strides[i], paddings[i]))
            self.encoder.add_module(f"relu_{i}", nn.ReLU(True))

        self.decoder = nn.Sequential()
        rev_channels = channels[::-1]
        rev_kernel_sizes = kernel_sizes[::-1]
        rev_strides = strides[::-1]
        rev_paddings = paddings[::-1]
        for i in range(len(rev_channels) - 1):
            self.decoder.add_module(f"deconv_{i}", nn.ConvTranspose2d(rev_channels[i], rev_channels[i+1], rev_kernel_sizes[i], rev_strides[i], rev_paddings[i]))
            self.decoder.add_module(f"relu_{i}", nn.ReLU(True))

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x



class VariationalAutoencoder(nn.Module):
    def __init__(self, input_dims, hidden_dims, latent_dims, device='cpu'):
        super(VariationalAutoencoder, self).__init__()
        self.device = torch.device(device)
        self.input_dims = input_dims
        self.fc1 = nn.Linear(input_dims, hidden_dims).to(self.device)
        self.fc2_mean = nn.Linear(hidden_dims, latent_dims).to(self.device)
        self.fc2_logvar = nn.Linear(hidden_dims, latent_dims).to(self.device)
        self.fc3 = nn.Linear(latent_dims, hidden_dims).to(self.device)
        self.fc4 = nn.Linear(hidden_dims, input_dims).to(self.device)

    def encode(self, x):
        x = torch.tensor(x, dtype=torch.float32)
        h1 = F.relu(self.fc1(x))
        return self.fc2_mean(h1), self.fc2_logvar(h1)

    def reparameterize(self, mean, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mean + eps * std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        # Adjust x if necessary (handled outside the model)
        mean, logvar = self.encode(x.to(self.device))
        z = self.reparameterize(mean, logvar)
        return self.decode(z), mean, logvar


def vae_loss(recon_x, x, mu, logvar):
    # Assuming x is your input tensor and it originally has a shape compatible with [batch_size, 200]
    # You need to ensure recon_x and x have the same shape for BCE calculation
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')

    # Calculation of KL Divergence remains the same
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
'''

### Classifier

Three types of classifiers will be used in the project: Multi-Layer Perceptron (MLP), Support Vector Machine (SVM) and Random Forest (RF). Code for the SVM and RF has been drafted but is not yet integrated into the pipeline. In the configuration used here (`hidden_dim=32`, `num_layers=2`), the MLP contains two hidden layers with 32 and 16 units and an output layer with a single unit. ReLU is the activation function for the hidden layers, while Sigmoid is used for the output layer. BCE is the loss function and Adam is the optimizer.
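Like the drafted `classification` function, an SVM or RF could be tuned with scikit-learn's `GridSearchCV`, `StratifiedKFold` and ROC AUC scoring, fitting on the training split only and touching the test split once at the end. The sketch below is a minimal illustration under stated assumptions: `make_classification` generates a synthetic stand-in for the encoded features, and the hyperparameter grids are illustrative, not DeepMicro's actual search spaces.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for an encoded representation (110 samples, 64 features,
# roughly 23% positives); real runs would use the autoencoder output instead.
X, y = make_classification(n_samples=110, n_features=64, n_informative=8,
                           weights=[0.77], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Illustrative grids only; DeepMicro's actual search spaces may differ.
candidates = {
    "svm": (SVC(probability=True, random_state=0),
            {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
    "rf": (RandomForestClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_features": ["sqrt", "log2"]}),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    # The grid search only ever sees the training split, so the test set
    # stays untouched until the final evaluation below.
    search = GridSearchCV(estimator, grid, cv=cv, scoring="roc_auc")
    search.fit(X_train, y_train)
    y_prob = search.predict_proba(X_test)[:, 1]
    results[name] = round(roc_auc_score(y_test, y_prob), 4)
    print(name, search.best_params_, results[name])
```

Because `GridSearchCV` refits the best configuration on the full training split by default, `predict_proba` on the test split uses that refitted model.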

In [None]:
# Multi-Layer Perceptron
class MLP(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim=32, num_layers=1):
        super().__init__()

        self.net = torch.nn.Sequential()
        # Input layer
        self.net.append(torch.nn.Linear(input_dim, hidden_dim))
        self.net.append(torch.nn.ReLU())
        # Iteratively add hidden layers
        for i in range(num_layers - 1):
            self.net.append(torch.nn.Linear(hidden_dim, int(hidden_dim/2)))
            self.net.append(torch.nn.ReLU())
            hidden_dim = int(hidden_dim/2)
        # Output layer
        self.net.append(torch.nn.Linear(hidden_dim, 1))
        self.net.append(torch.nn.Sigmoid())

    def forward(self, x):
        return self.net(x)

In [None]:
# SVM and RFs have not been implemented yet
'''
# run exp function
def run_exp(seed):
    # Initialize the DeepMicrobiome instance
    data_file = args.data + '.txt' if args.data else args.custom_data
    dm = DeepMicrobiome(data=data_file, seed=seed, data_dir=args.data_dir)
            # Load data based on the specified dataset type
    if args.data:
        feature_string = "k__" if "abundance" in args.data else "gi|" if "marker" in args.data else ''
        dm.loadData(feature_string=feature_string, label_string='disease', label_dict=label_dict, dtype=dtypeDict[args.dataType])
    elif args.custom_data:
        if args.custom_data_labels:
            dm.loadCustomDataWithLabels(label_data=args.custom_data_labels, dtype=dtypeDict[args.dataType])
        else:
            dm.loadCustomData(dtype=dtypeDict[args.dataType])
    else:
        print("[Error] No input file specified. Use -h for help.")
        exit()
            # Representation learning (Dimensionality reduction)
    if args.pca:
        dm.pca()
    if args.ae:
        dm.ae(dims=[int(i) for i in args.dims.split(',')], act=args.act, epochs=args.max_epochs, loss=args.aeloss,
              latent_act=args.ae_lact, output_act=args.ae_oact, patience=args.patience, no_trn=args.no_trn)
    if args.vae:
        dm.vae(dims=[int(i) for i in args.dims.split(',')], act=args.act, epochs=args.max_epochs, loss=args.aeloss, output_act=args.ae_oact,
               patience=25 if args.patience == 20 else args.patience, beta=args.vae_beta, warmup=args.vae_warmup, warmup_rate=args.vae_warmup_rate, no_trn=args.no_trn)
    if args.cae:
        dm.cae(dims=[int(i) for i in args.dims.split(',')], act=args.act, epochs=args.max_epochs, loss=args.aeloss, output_act=args.ae_oact,
               patience=args.patience, rf_rate=args.rf_rate, st_rate=args.st_rate, no_trn=args.no_trn)
    if args.rp:
        dm.rp()
            # Write the learned representation to a file if required
    if args.save_rep and (args.pca or args.ae or args.rp or args.vae or args.cae):
        rep_file = f"{dm.data_dir}results/{dm.prefix}{dm.data}_rep.csv"
        X_train_flat = dm.X_train.view(dm.X_train.size(0), -1)  # or you could use numpy: dm.X_train.numpy().reshape(80, -1)
        # Convert the flattened tensor to a numpy array and then to a DataFrame
        X_train_df = pd.DataFrame(X_train_flat.numpy())
        # Save the DataFrame to CSV
        X_train_df.to_csv(rep_file, header=None, index=None)
        print(f"The learned representation of the training set has been saved in '{rep_file}'")
    else:
        print("Warning: No representation learning performed, so nothing was saved.")

def classification(self, hyper_parameters, method='svm', cv=5, scoring='roc_auc', n_jobs=1, cache_size=10000):
    clf_start_time = time.time()
    # Convert PyTorch tensors to numpy arrays for sklearn models
    X_train_np = self.X_train.cpu().detach().numpy()
    y_train_np = self.y_train.cpu().detach().numpy()
    X_test_np = self.X_test.cpu().detach().numpy()
    y_test_np = self.y_test.cpu().detach().numpy()
    print("# Tuning hyper-parameters")
    print(X_train_np.shape, y_train_np.shape)

    if method == 'all' or method == 'svm_rf':
        methods = ['svm', 'rf'] if method == 'svm_rf' else ['svm', 'rf', 'mlp']
    else:
        methods = [method]

    for m in methods:
        if m == 'svm':
            clf = GridSearchCV(SVC(probability=True, cache_size=cache_size), hyper_parameters, cv=StratifiedKFold(n_splits=cv, shuffle=True), scoring=scoring, n_jobs=n_jobs, verbose=1)
            clf.fit(X_train_np, y_train_np)
        elif m == 'rf':
            clf = GridSearchCV(RandomForestClassifier(n_jobs=-1, random_state=0), hyper_parameters, cv=StratifiedKFold(n_splits=cv, shuffle=True), scoring=scoring, n_jobs=n_jobs, verbose=1)
            clf.fit(X_train_np, y_train_np)
        elif m == 'mlp':
            model = KerasClassifier(build_fn=mlp_model, input_dim=X_train_np.shape[1], verbose=0)
            clf = GridSearchCV(estimator=model, param_grid=hyper_parameters, cv=StratifiedKFold(n_splits=cv, shuffle=True), scoring=scoring, n_jobs=n_jobs, verbose=1)
            clf.fit(X_train_np, y_train_np)

        print(f"Best parameters set found on development set for {m}:", clf.best_params_)
        y_pred = clf.predict(X_test_np)
        y_prob = clf.predict_proba(X_test_np)[:, 1] if m != 'mlp' else clf.predict_proba(X_test_np)[:, 1]

        metrics = [round(roc_auc_score(y_test_np, y_prob), 4),
                   round(accuracy_score(y_test_np, y_pred), 4),
                   round(recall_score(y_test_np, y_pred), 4),
                   round(precision_score(y_test_np, y_pred), 4),
                   round(f1_score(y_test_np, y_pred), 4)]

        print(f'Metrics for {m} [AUC, ACC, Recall, Precision, F1]:', metrics)

        # Save metrics to a file
        metrics.append(str(datetime.datetime.now()))
        metrics.append(round((time.time() - self.t_start), 2))
        metrics.append(round((time.time() - clf_start_time), 2))
        metrics.append(str(clf.best_params_))

        res = pd.DataFrame([metrics], index=[self.prefix + m])
        with open(os.path.join(self.data_dir, "results", f"{self.data}_result.txt"), 'a') as f:
            res.to_csv(f, header=None)
'''

### DeepMicro Framework

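The `DeepMicro` class holds the train/test tensors and provides three methods: `encode` trains the autoencoder and replaces the stored features with their encoded representation, `classify` trains the MLP, and `evaluate` reports the ROC AUC on the test set.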

In [None]:
class DeepMicro():
    def __init__(self, processed_data):
        # Load preprocessed data into tensors
        self.X_train, self.X_test, self.y_train, self.y_test = processed_data

    def encode(self, hidden_dim=32, degree=0, rho=0.90):
        # Create AE object
        ae = DAE(input_dim=self.X_train.shape[1], hidden_dim=hidden_dim, degree=degree)
        # Select loss function according to dataset
        if file_name.split('_')[0] == 'abundance':
            loss_func = torch.nn.MSELoss()
        elif file_name.split('_')[0] == 'marker':
            loss_func = torch.nn.BCELoss()
        optimizer = torch.optim.Adam(ae.parameters())
        # Train the AE
        print('Training AutoEncoder')
        ae.train()
        losses = []
        for i in range(31):
            optimizer.zero_grad()
            X_hat = ae(self.X_train)
            loss = loss_func(X_hat, self.X_train)
            loss.backward()
            optimizer.step()
            losses.append(loss.detach().numpy())
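            # Early Stopping: every 5 epochs, compare the training loss with the
            # reconstruction loss on the held-out set. Note that the test set doubles
            # as the validation set here, whereas the original study keeps the test
            # set out of model selection.
            if i%5 == 0:
                print(f'Epoch {i} Loss: {loss.item():.4f}')
                with torch.no_grad():
                    val_loss = loss_func(ae(self.X_test), self.X_test)
                # Stop once the training loss falls below rho * validation loss (overfitting)
                if loss < rho*val_loss and i > 5:
                    break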

        # Update features
        self.X_train = ae.encoder(self.X_train).detach()
        self.X_test = ae.encoder(self.X_test).detach()

        return losses

    def classify(self, hidden_dim=32, num_layers=1, rho=0.99):
        # Create classifier object
        self.clf = MLP(input_dim=self.X_train.shape[1], hidden_dim=hidden_dim, num_layers=num_layers)
        loss_func = torch.nn.BCELoss()
        optimizer = torch.optim.Adam(self.clf.parameters(), lr=1e-4)
        # Train the classifier
        print('Training Classifier')
        self.clf.train()
        losses = []
        min_val_loss = 1e10
        for i in range(501):
            optimizer.zero_grad()
            y_hat = self.clf(self.X_train)
            loss = loss_func(y_hat.squeeze(dim=-1), self.y_train)
            loss.backward()
            optimizer.step()
            losses.append(loss.detach().numpy())
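            # Early Stopping: every 5 epochs, track the best loss on the held-out set
            # and stop (after at least 50 epochs) once it no longer improves by a
            # factor of rho. Note that the test set doubles as the validation set here,
            # whereas the original study keeps the test set out of model selection.
            if i%5 == 0:
                print(f'Epoch {i} Loss: {loss.item():.4f}')
                with torch.no_grad():
                    val_loss = loss_func(self.clf(self.X_test).squeeze(-1), self.y_test)
                if val_loss < rho*min_val_loss:
                    min_val_loss = val_loss
                elif i >= 50:
                    break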
        return losses

    def evaluate(self):
        # Evaluate on test set with ROC AUC
        self.clf.eval()
        with torch.no_grad():
            y_hat = self.clf(self.X_test).squeeze(-1)
            y_pred = (y_hat > 0.5).type(torch.int)
            auc = round(roc_auc_score(self.y_test, y_hat), 4)
            print(f'\tROC AUC: {auc}')
        return auc

In [None]:
# Training the model
dm = DeepMicro(processed_data)

In [None]:
%%time
# Classify directly on the features
clf_losses = dm.classify(32, 2)
clf_auc = dm.evaluate()

In [None]:
%%time
# Train the AutoEncoder and update the feature representation
en_losses = dm.encode(64, 2)

In [None]:
%%time
# Classify using the encoded representation
en_clf_losses = dm.classify(32, 2)
en_clf_auc = dm.evaluate()

# Results

The results of evaluating the models on the test set are displayed below. The area under the Receiver Operating Characteristic curve (ROC AUC) is used as the evaluation metric. Figure 1 shows the training loss for the autoencoder and Figure 2 shows the training loss for the two classifiers.
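ROC AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below checks this pairwise definition against scikit-learn's `roc_auc_score` on made-up labels and scores; `pairwise_auc` is a hypothetical helper written only for illustration.

```python
import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in itertools.product(pos, neg))
    return wins / (len(pos) * len(neg))

# Toy labels and scores, made up for illustration.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(pairwise_auc(y_true, y_score), roc_auc_score(y_true, y_score))
```

Here 8 of the 9 positive/negative pairs are ranked correctly (only 0.35 vs. 0.4 is not), so both values are about 0.889. With only 22 test samples, a single misranked pair can move the AUC noticeably.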

In [None]:
# Print AUC metric
print(f'AUC for classifier: {clf_auc}')
print(f'AUC for encoder-classifier: {en_clf_auc}')

In [None]:
# Plot loss for Autoencoder
plt.style.use('fivethirtyeight')
plt.title('Figure 1. AE Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.plot(en_losses, label='AutoEncoder')
plt.legend()
plt.show()

In [None]:
# Plot loss for Classifiers
plt.style.use('fivethirtyeight')
plt.title('Figure 2. CLF vs. En-CLF')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.plot(clf_losses, label='Classifier')
plt.plot(en_clf_losses, label='Encoder-Classifier')
plt.legend()
plt.show()

# Discussion

The training loss after encoding is consistently higher than the training loss from running the classifier directly on the dataset. The area under the ROC curve is also consistently higher without using the encoder. Moreover, although the classifier trains faster on the learned representation, the total time taken for training both the encoder and the classifier is much higher than just using the classifier. These results are the opposite of the findings of the original study. The following points discuss possible explanations for these results.

- Selection of the best hyperparameters using cross-validation and grid search has not been implemented yet in our project. As a result, our models are likely inferior to those in the DeepMicro study.
- CAE, VAE, SVM and RFs have also not been implemented yet. Their implementation could greatly improve performance of the models.
- It appears that the encoder trains well but the MLP classifier does not train well with the encoded features (training loss does not decrease by much). The original study suggests that RFs work best with a DAE for the IBD cohort.
- The DeepMicro study implements its models in Keras, whereas ours are implemented in PyTorch. Training-related hyperparameters (e.g. learning rate and early stopping) need to be customized for each dataset. Keras provides built-in utilities for part of this work, such as the `EarlyStopping` callback, which must be written by hand in PyTorch.
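Keras' `EarlyStopping` callback monitors a validation metric and stops training after `patience` epochs without improvement. A hypothetical plain-Python equivalent that could be called from the PyTorch training loops is sketched below; the class name, the default `patience=20` and the example losses are assumptions. In practice, the validation losses would come from a split carved out of the training data, keeping the test set excluded as in the original study.

```python
class EarlyStopping:
    """Patience-based early stopping on a monitored validation loss.

    A sketch modelled loosely on the behaviour of Keras' EarlyStopping callback
    (monitor, min_delta, patience); not a port of its implementation.
    """

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.wait = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.wait = val_loss, epoch, 0
        else:
            self.wait += 1
        return self.wait >= self.patience

# Made-up validation losses that stop improving after epoch 3.
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.55, 0.58]
stopper = EarlyStopping(patience=3)
for epoch, v in enumerate(val_losses):
    if stopper.step(epoch, v):
        break
print(epoch, stopper.best_epoch, stopper.best)  # 6 3 0.55
```

To also restore the best weights, the training loop could store `copy.deepcopy(model.state_dict())` whenever `best_epoch` changes and reload it with `model.load_state_dict(...)` after stopping.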

Once the project is fully implemented, we expect to replicate the results of the original study.

The models above run in under a minute, so the computational requirements for this project are low. Running the entire DeepMicro pipeline took about 2 hours on our machine. Even without a GPU, we expect to finish all computations in a reasonable time (less than 24 hours).

Implementing all the models in PyTorch was straightforward. Tuning the hyperparameters manually, however, is difficult and needs to be automated. Additionally, we need to determine how best to choose the training-specific hyperparameters, such as the learning rate and early-stopping criteria.
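Because the PyTorch models are not scikit-learn estimators, `GridSearchCV` cannot wrap them directly. One way to automate the search is a small cross-validation loop over scikit-learn's `ParameterGrid`. In the sketch below, `grid_search_cv` and the `train_and_score` callback are hypothetical names, and the dummy scorer only stands in for actually training a DAE and MLP. Passing in only the training split keeps the test set excluded, as in the original study's scheme.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, StratifiedKFold

def grid_search_cv(train_and_score, param_grid, X, y, n_splits=5, seed=0):
    """Cross-validated grid search for models that are not sklearn estimators.

    `train_and_score(params, X_tr, y_tr, X_val, y_val)` is a hypothetical
    callback: it should build and train a model (e.g. a DAE + MLP) with
    `params` and return a validation score such as ROC AUC. Pass only the
    training data as X, y so that the test set stays excluded.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    best_params, best_score = None, -np.inf
    for params in ParameterGrid(param_grid):
        scores = [train_and_score(params, X[tr], y[tr], X[va], y[va])
                  for tr, va in cv.split(X, y)]
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Toy usage: a dummy scorer that simply prefers hidden_dim=64 and lr=1e-3.
def dummy_score(params, X_tr, y_tr, X_va, y_va):
    return -abs(params["hidden_dim"] - 64) - abs(np.log10(params["lr"]) + 3)

X = np.random.default_rng(0).random((40, 5))
y = np.array([0, 1] * 20)
grid = {"hidden_dim": [32, 64, 128], "lr": [1e-4, 1e-3]}
best, score = grid_search_cv(dummy_score, grid, X, y)
print(best, score)
```

An alternative is a wrapper library such as skorch, which exposes PyTorch modules through the scikit-learn estimator interface so that `GridSearchCV` can be used directly.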

Going forward we plan to implement all the missing models and a proper hyperparameter search. Additionally, we plan to add new datasets obtained from the phylaGAN$^5$ and MV-CVIB$^6$ studies.

# References
1. Cho, I. & Blaser, M. J. The human microbiome: at the interface of health and disease. Nature Reviews Genetics 13, 260 (2012).
2. Eloe-Fadrosh, E. A. & Rasko, D. A. The human microbiome: from symbiosis to pathogenesis. Annual review of medicine 64, 145-163 (2013).
3. Hamady, M. & Knight, R. Microbial community profiling for human microbiome projects: tools, techniques, and challenges. Genome research 19, 1141-1152 (2009).
4. Scholz, M. et al. Strain-level microbial epidemiology and population genomics from shotgun metagenomics. Nature methods 13, 435 (2016).
5. Sharma, D., Lou, W. & Xu, W. phylaGAN: data augmentation through conditional GANs and autoencoders for improving disease prediction accuracy using microbiome data. Bioinformatics, btae161 (2024). https://doi.org/10.1093/bioinformatics/btae161
6. Cui, Z., Wu, Y., Zhang, Q.-H., Wang, S.-G., He, Y. & Huang, D.-S. MV-CVIB: a microbiome-based multi-view convolutional variational information bottleneck for predicting metastatic colorectal cancer. Frontiers in Microbiology 14, 1238199 (2023). https://doi.org/10.3389/fmicb.2023.1238199
7. Elgün Çiftcioğlu, U. G. & Nalbanoglu, O. U. DeepGum: deep feature transfer for gut microbiome analysis using bottleneck models. Biomedical Signal Processing and Control 91, 105984 (2024). https://doi.org/10.1016/j.bspc.2024.105984
8. Oh, M. & Zhang, L. DeepMicro: deep representation learning for disease prediction based on microbiome data. Scientific Reports 10, 1-9 (2020). https://doi.org/10.1038/s41598-020-63159-5
9. Pasolli, E., Truong, D. T., Malik, F., Waldron, L. & Segata, N. Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS computational biology 12, e1004977 (2016).
10. https://github.com/minoh0201/DeepMicro