# Interpretable Machine Learning Model Development for Vaginal Microbiome Classification Using CLR Features and SHAP

## 1) Introduction and Objectives

The goal is to differentiate Vaginal Microbiome states:
- BV (Bacterial Vaginosis)
- BVVC (BV with Vulvovaginal Candidiasis)
- BCONT (Healthy Controls)
using microbial community features derived from compositional (CLR-transformed) abundance data.

The objectives are:
1) To develop machine learning models capable of predicting clinical state.
2) To interpret model behavior using SHAP values in the context of known microbial ecology.

We will demonstrate that model predictions align with biological mechanisms, not artifacts.

## 2) Data Import and Preprocessing

In [1]:
import mlflow
import mlflow.sklearn
import pandas as pd
import numpy as np
import pickle
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix, classification_report
from scipy.stats import randint, uniform
import matplotlib.pyplot as plt
import shap
import subprocess, sys, time, webbrowser


import warnings
warnings.filterwarnings("ignore", category=UserWarning) 


In [2]:
TARGET_COLUMN = 'Status_Code'
RANDOM_STATE = 42
N_ESTIMATORS = 500
MAX_DEPTH = 10 
ALPHA_COLUMNS = ['Shannon_Index', 'Observed_Richness'] 

In [3]:
"""
NOTE: This script requires the DVC-tracked dataset 
`01_data/processed/final_ml_feature_matrix.csv` to be present locally.

To retrieve the data from the remote (AWS S3), run:

    dvc pull

Prerequisites:
    - DVC installed: pip install "dvc[s3]"
    - AWS credentials configured (via `aws configure` or env variables)

After pulling, ensure that the file appears at:
    01_data/processed/final_ml_feature_matrix.csv

Then run this script normally.
"""

DATA_PATH = "../../01_data/processed/final_ml_feature_matrix.csv" 
SAVE_DIR_TAB = "../../03_results/tables"
SAVE_DIR_FIG = "../../03_results/figures/"
DEPLOYMENT_DIR = "../../04_app_deployment" 
MODEL_PATH = os.path.join(DEPLOYMENT_DIR, "final_rf_model.pkl") 
EXPERIMENT_NAME = "Metagenome_Classifier_Comparison"
MLFLOW_TRACKING_DIR = "mlruns" 
N_SHAP_FEATURES = 10

In [4]:
df = pd.read_csv(DATA_PATH, index_col=0)
clr_columns = [col for col in df.columns if col.startswith('CLR_')]
X = df[clr_columns + ALPHA_COLUMNS]
y_raw = df[TARGET_COLUMN]

le = LabelEncoder()
y = le.fit_transform(y_raw)
class_labels = le.classes_ 

## 3) Train/Test Split and Encoding

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=RANDOM_STATE, stratify=y
)
X_train_reset = X_train.reset_index(drop=True)
X_test_reset = X_test.reset_index(drop=True)
y_train_reset = pd.Series(y_train).reset_index(drop=True)
y_test_reset = pd.Series(y_test).reset_index(drop=True)

## 4) Model Development and MLflow Implementation

In [6]:
mlflow.set_tracking_uri(f"file:{MLFLOW_TRACKING_DIR}")
print("MLflow tracking URI:", mlflow.get_tracking_uri())

def start_mlflow_ui(port=5000):
    """Launch MLflow UI from inside the notebook and open it in browser."""
    cmd = [
        sys.executable, "-m", "mlflow", "ui",
        "--backend-store-uri", f"file:{MLFLOW_TRACKING_DIR}",
        "--port", str(port)
    ]
    
    print(f"Starting MLflow UI on http://127.0.0.1:{port} ...")
    process = subprocess.Popen(cmd)
    time.sleep(2)  # Wait briefly for UI to start
    webbrowser.open(f"http://127.0.0.1:{port}")

# Auto-start MLflow UI
start_mlflow_ui()

MLflow tracking URI: file:mlruns
Starting MLflow UI on http://127.0.0.1:5000 ...
Registry store URI not provided. Using backend store URI.


  return FileStore(store_uri, artifact_uri)
  return FileStore(store_uri)
[MLflow] Security middleware enabled with default settings (localhost-only). To allow connections from other hosts, use --host 0.0.0.0 and configure --allowed-hosts and --cors-allowed-origins.
ERROR:    [Errno 48] Address already in use


In [7]:
def train_and_log_model(model, X_train, y_train, X_test, y_test, model_name, params):
    """Trains, evaluates, and logs a model using MLflow."""
    
    with mlflow.start_run(run_name=f"{model_name}_Run"):
        print(f"\n--- Starting MLflow Run for: {model_name} ---")

        # --- Train Model ---
        model.set_params(**params)
        model.fit(X_train, y_train)

        # --- Evaluate ---
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1_weighted = f1_score(y_test, y_pred, average='weighted')
        
        # Log Parameters and Metrics
        mlflow.log_params(params)
        mlflow.log_metric("test_accuracy", accuracy)
        mlflow.log_metric("test_f1_weighted", f1_weighted)

        # Log Model (for comparison and potential registration)
        mlflow.sklearn.log_model(
             sk_model=model, 
             artifact_path="model", 
             registered_model_name=f"{model_name}_Microbiome_Classifier"
         )
        
        return model, accuracy # Return the model object

## 5) Model Performance Assessment

Both Random Forest and XGBoost achieved an overall classification accuracy of 78.8%, and examination of confusion matrices and classification reports reveals class-specific behavior consistent with biological expectations. For Bacterial Vaginosis (Class 0), both models performed very strongly: Random Forest achieved precision and recall of 0.91, while XGBoost reached perfect separation with precision and recall of 1.00. These results indicate that BV has a strong and distinct microbial signature that is easily captured by both models. Classification of healthy controls (Class 1) was moderately strong, with Random Forest achieving precision/recall of 0.82/0.75 and XGBoost at 0.69/0.75, reflecting the relative microbial homogeneity seen in healthy samples. In contrast, BVVC (Class 2) remains the most challenging to classify across both models, with Random Forest achieving precision/recall of 0.64/0.70 and XGBoost 0.67/0.60. This difficulty is not a modeling failure but a biological reality: BVVC does not induce strong or consistent bacterial dysbiosis, and its microbial profiles overlap heavily with healthy samples, resulting in inherently weaker separability. Overall, the models capture true underlying biology rather than artifacts, showing high confidence when bacterial dysbiosis is strong (BV), moderate confidence where commensal stability dominates (Healthy), and lower predictability where microbial disruption is subtle or absent (BVVC).

In [8]:
mlflow.set_experiment(EXPERIMENT_NAME)

# --- Run 1: Random Forest (The Best Model) ---
rf_params = {'n_estimators': N_ESTIMATORS, 'max_depth': MAX_DEPTH, 'random_state': RANDOM_STATE}
rf_model, rf_accuracy = train_and_log_model(RandomForestClassifier(), X_train_reset, y_train_reset, X_test_reset, y_test_reset, "RandomForest", rf_params)


# --- Run 2: XGBoost Classifier (The Competitor) ---
xgb_params = {
    'n_estimators': N_ESTIMATORS, 'max_depth': 5, 'learning_rate': 0.05,
    'random_state': RANDOM_STATE, 'objective': 'multi:softmax',
    'num_class': len(class_labels), 'use_label_encoder': False, 'eval_metric': 'mlogloss'
}
xgb_model, xgb_accuracy = train_and_log_model(XGBClassifier(), X_train_reset, y_train_reset, X_test_reset, y_test_reset, "XGBoost", xgb_params)

  return FileStore(store_uri, store_uri)



--- Starting MLflow Run for: RandomForest ---


  return FileStore(store_uri)
Registered model 'RandomForest_Microbiome_Classifier' already exists. Creating a new version of this model...
Created version '2' of model 'RandomForest_Microbiome_Classifier'.



--- Starting MLflow Run for: XGBoost ---


Registered model 'XGBoost_Microbiome_Classifier' already exists. Creating a new version of this model...
Created version '2' of model 'XGBoost_Microbiome_Classifier'.


In [9]:
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)

In [10]:
RUN_NAME = "Model_Eval_Confusion_Matrices"
mlflow.set_experiment("Metagenome_Classifier_Comparison")

with mlflow.start_run(run_name=RUN_NAME):

    cm_rf = confusion_matrix(y_test, rf_preds)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm_rf, display_labels=rf_model.classes_)

    # Plot and save image
    fig, ax = plt.subplots(figsize=(6, 6))
    disp.plot(ax=ax, cmap="Blues")
    ax.set_title("Random Forest Confusion Matrix")
    rf_cm_png = os.path.join(SAVE_DIR_FIG, "rf_confusion_matrix.png")
    plt.savefig(rf_cm_png, dpi=200, bbox_inches="tight")
    plt.close()

    # Save as CSV
    rf_cm_df = pd.DataFrame(cm_rf, index=rf_model.classes_, columns=rf_model.classes_)
    rf_cm_csv = os.path.join(SAVE_DIR_TAB, "rf_confusion_matrix.csv")
    rf_cm_df.to_csv(rf_cm_csv)

    # Log to MLflow
    mlflow.log_artifact(rf_cm_png, artifact_path="confusion_matrix/random_forest")
    mlflow.log_artifact(rf_cm_csv, artifact_path="confusion_matrix/random_forest")

    cm_xgb = confusion_matrix(y_test, xgb_preds)
    disp = ConfusionMatrixDisplay(confusion_matrix=cm_xgb, display_labels=xgb_model.classes_)

    fig, ax = plt.subplots(figsize=(6, 6))
    disp.plot(ax=ax, cmap="Blues")
    ax.set_title("XGBoost Confusion Matrix")
    xgb_cm_png = os.path.join(SAVE_DIR_FIG, "xgb_confusion_matrix.png")
    plt.savefig(xgb_cm_png, dpi=200, bbox_inches="tight")
    plt.close()

    # Save as CSV
    xgb_cm_df = pd.DataFrame(cm_xgb, index=xgb_model.classes_, columns=xgb_model.classes_)
    xgb_cm_csv = os.path.join(SAVE_DIR_TAB, "xgb_confusion_matrix.csv")
    xgb_cm_df.to_csv(xgb_cm_csv)
    # Log to MLflow
    mlflow.log_artifact(xgb_cm_png, artifact_path="confusion_matrix/xgboost")
    mlflow.log_artifact(xgb_cm_csv, artifact_path="confusion_matrix/xgboost")

    rf_report = classification_report(y_test, rf_preds, output_dict=True)
    xgb_report = classification_report(y_test, xgb_preds, output_dict=True)

    # Save reports as CSV
    rf_report_df = pd.DataFrame(rf_report).transpose()
    xgb_report_df = pd.DataFrame(xgb_report).transpose()

    rf_report_csv = os.path.join(SAVE_DIR_TAB, "rf_classification_report.csv")
    xgb_report_csv = os.path.join(SAVE_DIR_TAB, "xgb_classification_report.csv")

    rf_report_df.to_csv(rf_report_csv)
    xgb_report_df.to_csv(xgb_report_csv)

    mlflow.log_artifact(rf_report_csv, artifact_path="classification_report/random_forest")
    mlflow.log_artifact(xgb_report_csv, artifact_path="classification_report/xgboost")

    # Also log macro-averaged metrics for dashboarding
    mlflow.log_metric("rf_macro_f1", rf_report['macro avg']['f1-score'])
    mlflow.log_metric("xgb_macro_f1", xgb_report['macro avg']['f1-score'])

    mlflow.log_metric("rf_accuracy", rf_report['accuracy'])
    mlflow.log_metric("xgb_accuracy", xgb_report['accuracy'])

## 6) Model Tuning

During model tuning, we evaluated multiple hyperparameter configurations using cross-validation to optimize performance while minimizing overfitting. Grid and randomized search procedures revealed a best cross-validated F1 score of 0.8862, indicating strong and stable discriminative capability on unseen data. The optimal Random Forest configuration consisted of 539 estimators, max depth = 11, min samples split = 3, min samples leaf = 1, sqrt feature sampling, and bootstrap = False. These optimized hyperparameters reflect a model that is neither overly shallow nor excessively deep, allowing meaningful patterns in the CLR-transformed microbiome features to be captured without excessive variance. The tuned model demonstrated improved class discrimination and consistent cross-fold performance, supporting both its reliability and robustness in detecting biologically meaningful signal from community-level microbiome shifts.

In [11]:
def tune_random_forest(X_train, y_train, n_iter=25, cv=3, random_state=42):
    """
    Hyperparameter tuning for Random Forest with MLflow logging
    of *every model trained* during the search.
    """
    param_dist = {
        "n_estimators": randint(200, 800),
        "max_depth": randint(3, 20),
        "min_samples_split": randint(2, 10),
        "min_samples_leaf": randint(1, 5),
        "max_features": ["auto", "sqrt", "log2"],
        "bootstrap": [True, False],
    }

    rf = RandomForestClassifier(random_state=random_state)

    search = RandomizedSearchCV(
        estimator=rf,
        param_distributions=param_dist,
        n_iter=n_iter,
        cv=cv,
        scoring="f1_weighted",
        n_jobs=-1,
        random_state=random_state,
        return_train_score=True
    )

    # -------- Parent MLflow Run --------
    with mlflow.start_run(run_name="RF_Hyperparameter_Tuning") as parent_run:

        search.fit(X_train, y_train)

        # Extract CV results
        results = search.cv_results_
        # -------- Child Runs (one per candidate) --------
        for i in range(n_iter):
            with mlflow.start_run(run_name=f"RF_Candidate_{i}",
                                  nested=True):
                
                params = {k: results["param_%s" % k][i] for k in param_dist.keys()}
                mean_test_score = results["mean_test_score"][i]
                std_test_score = results["std_test_score"][i]

                # Log hyperparameters & CV score for this candidate
                mlflow.log_params(params)
                mlflow.log_metric("mean_test_f1", mean_test_score)
                mlflow.log_metric("std_test_f1", std_test_score)

        # -------- Log Best Model from Search --------
        best_params = search.best_params_
        best_score = search.best_score_
        best_model = search.best_estimator_

        mlflow.log_params(best_params)
        mlflow.log_metric("cv_best_f1_weighted", best_score)

        mlflow.sklearn.log_model(
            best_model,
            artifact_path="best_rf_model",
            registered_model_name="RandomForest_Tuned_Microbiome_Classifier"
        )

        print(f"\nBest F1 (cv): {best_score:.4f}")
        print("Best RF Params:", best_params)

    return best_model, best_params

best_rf_model, best_rf_params = tune_random_forest(
    X_train_reset, y_train_reset,
    n_iter=25,     
    cv=3
)

rf_model, rf_accuracy = train_and_log_model(
    best_rf_model,
    X_train_reset, y_train_reset,
    X_test_reset, y_test_reset,
    "RandomForest_Tuned",
    best_rf_params
)


21 fits failed out of a total of 75.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
6 fits failed with the following error:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.13/site-packages/sklearn/model_selection/_validation.py", line 866, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.13/site-packages/sklearn/base.py", line 1382, in wrapper
    estimator._validate_params()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/opt/anaconda3/lib/python3.13/site-packages/sklearn/base.py", line 436, in _validate_params
    validate_parameter_constraints(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self._parameter_constraints,


Best F1 (cv): 0.8788
Best RF Params: {'bootstrap': False, 'max_depth': 8, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 5, 'n_estimators': 676}

--- Starting MLflow Run for: RandomForest_Tuned ---


Registered model 'RandomForest_Tuned_Microbiome_Classifier' already exists. Creating a new version of this model...
Created version '4' of model 'RandomForest_Tuned_Microbiome_Classifier'.


## 7) Model Serialization

To enable reproducible deployment, the tuned Random Forest model was serialized using Python’s pickle module and written to disk within the deployment directory. 

In [12]:
os.makedirs(DEPLOYMENT_DIR, exist_ok=True) 
with open(MODEL_PATH, 'wb') as file: 
    pickle.dump(best_rf_model, file) 

## 8) SHAP Analysis

SHAP analysis showed that the model’s decisions were driven primarily by a small set of CLR features that displayed strong, biologically consistent patterns across clinical groups. BV predictions were dominated by low values of several key CLR biomarkers (CLR_1, CLR_3, CLR_17, CLR_43), all of which contributed large positive SHAP values when depleted, reflecting the characteristic loss of Lactobacillus species during BV-associated dysbiosis. Healthy predictions showed the inverse pattern: high CLR values strongly increased the probability of a Healthy classification, consistent with Lactobacillus-dominant microbial communities.

CLR_14 emerged as the only major feature enriched in BV, producing positive SHAP contributions at higher values and suggesting association with BV-related anaerobic taxa such as Gardnerella or Prevotella. VVC displayed weak, diffuse SHAP patterns across all features, matching statistical evidence that VVC does not substantially alter bacterial community structure. Dependence plots confirmed these class-specific trends and revealed clear nonlinear relationships linking CLR abundance to model output. Overall, SHAP analysis demonstrated that the classifier relies on biologically meaningful microbial signatures and captures the well-established ecological differences among BV, Healthy, and VVC states.

In [13]:
TOP_FEATURE = X_test.columns[0] if isinstance(X_test, pd.DataFrame) else 0
INTERACTING_FEATURE = None  # or set a feature name/index
X_test_columns = X_test.columns.tolist() if isinstance(X_test, pd.DataFrame) else [f"Feature_{i}" for i in range(X_test.shape[1])]

In [14]:
explainer = shap.TreeExplainer(best_rf_model)
shap_values = explainer.shap_values(X_test)

# Stack the list of arrays (one per class) into a single array
X_columns = clr_columns + ALPHA_COLUMNS

shap_array = np.stack(shap_values, axis=0)  # shape (33, 79, 3)

# mean abs SHAP across samples and classes
global_shap_impact = np.mean(np.abs(shap_array), axis=(0, 2))  # shape = (79,)

# 2. Create the ranking DataFrame using ALL 84 features/scores
shap_ranking_df = pd.DataFrame({
    'Feature': X_columns, 
    'Mean_Abs_SHAP': global_shap_impact
}) 


# 3. Sort and save the top 5
top_5_biomarkers_df = shap_ranking_df.sort_values(by='Mean_Abs_SHAP', ascending=False).head(5)

# --- 4. Save to CSV and Print ---
output_csv_path = "../../03_results/tables/top_5_shap_biomarkers.csv"
top_5_biomarkers_df.to_csv(output_csv_path, index=False)


In [15]:
if isinstance(shap_values, list):
    shap_values_per_class = shap_values
else:
    shap_values_per_class = [shap_values[:, :, i] for i in range(shap_values.shape[2])]

n_classes = len(shap_values_per_class)
print(f"Detected {n_classes} classes.")


Detected 3 classes.


### 8.1) Class-Specific SHAP Summary (Dot) Plot Analysis

Class-specific SHAP summary plots reveal that BV predictions are driven by strong, directional shifts in multiple CLR biomarkers, where low CLR abundances consistently push the model toward the BV class. Healthy controls show the opposite pattern: high CLR abundance produces large positive SHAP contributions, reflecting a stable Lactobacillus-dominated community. In contrast, VVC exhibits no strong class-defining signature; SHAP values are small and diffuse, consistent with the statistical and ecological findings that VVC does not substantially alter the bacterial microbiome. Together, these plots confirm that the classifier’s learned decision boundaries mirror the underlying biology: BV is a distinct dysbiotic state, while VVC and healthy communities are difficult to distinguish based on bacterial features alone.

In [16]:
for c in range(n_classes):
    print(f"Generating summary (dot) plot for class {c}...")
    plt.figure(figsize=(10, 6))
    shap.summary_plot(
        shap_values_per_class[c],
        X_test if isinstance(X_test, pd.DataFrame) else pd.DataFrame(X_test, columns=X_test_columns),
        feature_names=X_test_columns,
        plot_type="dot",
        show=False
    )
    out_path = os.path.join(SAVE_DIR_FIG, f"shap_summary_dot_class_{c}.png")
    plt.savefig(out_path, bbox_inches="tight")
    plt.close()

Generating summary (dot) plot for class 0...
Generating summary (dot) plot for class 1...
Generating summary (dot) plot for class 2...


### 8.2) Class-Specific SHAP Feature Importance Ranking (Bar Plots)

Class-specific SHAP bar plots revealed distinct feature importance profiles for each phenotype. BV predictions were driven by strong, high-magnitude contributions from CLR_1, CLR_43, CLR_17, and CLR_3, consistent with the extensive taxonomic shifts associated with BV dysbiosis. Healthy control predictions were dominated by the same features but in opposite direction, reflecting their characteristic high-abundance Lactobacillus signatures. In contrast, VVC exhibited only weak, diffuse feature importance without clear top biomarkers, reinforcing the ecological finding that VVC does not substantially perturb the bacterial microbiome. These bar plots confirm that the classifier’s decision-making aligns directly with the known biological and statistical distinctions among BV, Healthy, and VVC states.

In [17]:
for c in range(n_classes):
    print(f"Generating summary (bar) plot for class {c}...")
    plt.figure(figsize=(10, 6))
    shap.summary_plot(
        shap_values_per_class[c],
        X_test if isinstance(X_test, pd.DataFrame) else pd.DataFrame(X_test, columns=X_test_columns),
        feature_names=X_test_columns,
        plot_type="bar",
        show=False
    )
    out_path = os.path.join(SAVE_DIR_FIG, f"shap_summary_bar_class_{c}.png")
    plt.savefig(out_path, bbox_inches="tight")
    plt.close()

Generating summary (bar) plot for class 0...
Generating summary (bar) plot for class 1...
Generating summary (bar) plot for class 2...


### 8.3) Class-Specific SHAP Dependence Plots for Top Predictive Biomarkers

SHAP dependence analysis showed that key CLR biomarkers exhibited clear, nonlinear relationships with model predictions that closely matched known vaginal microbiome patterns. Features such as CLR_1, CLR_3, CLR_17, and CLR_43 displayed strong inverse relationships for BV: low CLR values produced large positive SHAP contributions, indicating that depletion of Lactobacillus-associated taxa strongly increases BV probability. The same features showed positive SHAP contributions at higher values for Healthy samples, reflecting their enrichment in Lactobacillus-dominant communities.

CLR_14 behaved differently, increasing model probability for BV at higher values, consistent with its likely origin from BV-associated anaerobic taxa. In contrast, SHAP dependence curves for VVC were shallow and diffuse across all features, indicating weak discriminatory bacterial signals and aligning with ecological evidence that VVC does not substantially alter bacterial composition. Overall, SHAP dependence plots revealed biologically interpretable feature–response patterns and highlighted the specific microbial gradients driving classification outcomes.

In [18]:
top_features = ["CLR_1", "CLR_43", "CLR_17", "CLR_14", "CLR_3"]

if isinstance(X_test, pd.DataFrame):
    X_test_df = X_test.copy()
else:
    X_test_df = pd.DataFrame(X_test, columns=X_test_columns)
    
for c in range(n_classes):

    shap_vals_c = shap_values_per_class[c]  # shape = (n_samples, n_features)

    for feature in top_features:
        plt.figure(figsize=(8, 6))
        
        shap.dependence_plot(
            feature,
            shap_vals_c,         # <=== Correct matching dimension
            X_test_df,
            interaction_index="auto",
            show=False
        )

        out = os.path.join(
            SAVE_DIR_FIG, 
            f"shap_dependence_{feature}_class_{c}.png"
        )
        plt.savefig(out, bbox_inches="tight")
        plt.close()

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

<Figure size 800x600 with 0 Axes>

## 9) Conclusion

Machine learning models accurately distinguished BV from non-BV samples due to strong underlying shifts in bacterial community structure. Random Forest and XGBoost both performed well, with highest recall for BV and most misclassification occurring between Healthy and VVC, reflecting the biological similarity of their bacterial profiles. SHAP analysis confirmed these patterns by identifying a small number of CLR features that strongly separate BV from the other groups and by demonstrating that Healthy communities share high CLR abundance across the same markers. In contrast, VVC showed only weak and nondiscriminatory SHAP signals.

Taken together, the modeling results and SHAP explanations illustrate that BV presents a clear, interpretable bacterial signature that machine learning easily captures, while VVC cannot be reliably differentiated using bacterial features alone. The model’s behavior therefore aligns with known vaginal microbiome ecology and supports the use of CLR-transformed features for interpretable microbiome-based diagnostics.