# StrepSuis-AMRVirKM: K-Modes Clustering Analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MK-vet/strepsuis-amrvirkm/blob/main/notebooks/AMRVirKM_Analysis.ipynb)

**Professional bioinformatics tool for clustering analysis of antimicrobial resistance and virulence profiles**

---

## Overview

This notebook provides an interactive interface for running StrepSuis-AMRVirKM cluster analysis in Google Colab.

### Features:
- ✅ K-Modes clustering with automatic optimization
- ✅ Multiple Correspondence Analysis (MCA)
- ✅ Feature importance ranking
- ✅ Association rule discovery
- ✅ Bootstrap confidence intervals
- ✅ Publication-quality reports
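
For readers new to K-Modes, the following sketch illustrates the general idea on binary presence/absence profiles: each strain is assigned to the nearest cluster mode by simple-matching (Hamming) dissimilarity, and each mode is then recomputed as the per-feature majority value. It is a pure-Python illustration under stated assumptions, not the package's implementation; the function names, the toy profiles, and the hand-picked initial modes are all hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions where two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most common value in each column (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(profiles, initial_modes, max_iter=20):
    """Toy K-Modes: alternate assignment and mode update until modes are stable."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest mode by Hamming dissimilarity.
        labels = [
            min(range(len(modes)), key=lambda k: hamming(p, modes[k]))
            for p in profiles
        ]
        # Update step: recompute each mode from its current members.
        new_modes = []
        for k, old_mode in enumerate(modes):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            new_modes.append(column_modes(members) if members else old_mode)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes

# Hypothetical presence/absence profiles: six strains, four features.
profiles = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 0),
    (0, 0, 1, 1),
]
labels, modes = k_modes(profiles, initial_modes=[profiles[0], profiles[3]])
print(labels)  # cluster index per strain
print(modes)   # one representative profile per cluster
```

Real K-Modes implementations usually add a smarter initialization and several restarts; the sketch omits both to keep the two alternating steps visible.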

### Usage:
1. Run the installation cell
2. Upload your CSV files
3. Configure parameters (optional)
4. Run analysis
5. Download results

---

## 1. Installation

Install the package directly from GitHub. The notebook imports the package code instead of duplicating it in cells:

In [None]:
%%capture
# Install StrepSuis-AMRVirKM from GitHub.
# %%capture hides pip output; remove it if you need to debug a failed install.
!pip install git+https://github.com/MK-vet/strepsuis-amrvirkm.git
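
Because `%%capture` hides any pip error, it can help to confirm that the installation actually succeeded before importing. The sketch below uses the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `strepsuis-amrvirkm` is an assumption based on the repository name, so adjust it if pip registers the package under a different name.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Assumed distribution name; check `pip list` if this reports "not found".
version = installed_version("strepsuis-amrvirkm")
if version:
    print(f"Installed version: {version}")
else:
    print("Package not found; re-run the install cell without %%capture.")
```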

## 2. Import Libraries

In [None]:
from strepsuis_amrvirkm import ClusterAnalyzer, Config, __version__
from google.colab import files
import os
import zipfile
from pathlib import Path

print(f"StrepSuis-AMRVirKM v{__version__} loaded successfully!")

## 3. Upload Data Files

Upload your CSV files:
- **Required**: `MIC.csv`, `AMR_genes.csv`, `Virulence.csv`
- **Optional**: `MLST.csv`, `Serotype.csv`, `Plasmid.csv`, `MGE.csv`

**Data Format:**
- Each file must include a `Strain_ID` column
- Binary features: 0 = absence, 1 = presence
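
A quick pre-flight check can catch format problems before the analysis runs. The sketch below uses only the standard library and assumes that every column other than `Strain_ID` is binary, as the format note states; files with categorical columns would need a looser check. The helper names `missing_required` and `check_binary_csv` are hypothetical and not part of the package.

```python
import csv
from pathlib import Path

REQUIRED_FILES = ("MIC.csv", "AMR_genes.csv", "Virulence.csv")

def missing_required(data_dir):
    """Return the required file names that are absent from data_dir."""
    return [name for name in REQUIRED_FILES if not (Path(data_dir) / name).is_file()]

def check_binary_csv(path, id_column="Strain_ID"):
    """Return a list of human-readable problems found in one CSV file."""
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if id_column not in (reader.fieldnames or []):
            return [f"{path}: missing '{id_column}' column"]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            for column, value in row.items():
                if column == id_column:
                    continue
                # Extra cells have a None key; missing cells have a None value.
                cell = value.strip() if isinstance(value, str) else None
                if column is None or cell not in {"0", "1"}:
                    problems.append(f"{path}:{line_no}: {column}={value!r} is not 0 or 1")
    return problems

# Example run once files are in "data/" (the directory used by the upload cell).
for name in missing_required("data"):
    print(f"Missing required file: {name}")
for csv_path in sorted(Path("data").glob("*.csv")):
    for problem in check_binary_csv(csv_path):
        print(problem)
```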

In [None]:
# Create data directory
os.makedirs("data", exist_ok=True)

print("Please upload your CSV files:")
uploaded = files.upload()

# Move uploaded files to the data directory.
# Python calls avoid shell-quoting problems with unusual file names.
for filename in uploaded:
    os.replace(filename, os.path.join("data", filename))
    print(f"✓ {filename} uploaded successfully")

# List uploaded files
print("\nUploaded files:")
!ls -lh data/

## 4. Configure Analysis Parameters

Adjust parameters as needed:

In [None]:
# Create configuration
config = Config(
    data_dir="data",
    output_dir="output",
    max_clusters=10,           # Largest number of clusters to test
    min_clusters=2,            # Smallest number of clusters to test
    bootstrap_iterations=500,  # Bootstrap resamples used for confidence intervals
    fdr_alpha=0.05,            # False discovery rate (FDR) threshold
    random_seed=42,            # Fixed seed for reproducibility
    mca_components=2,          # Number of MCA dimensions to retain
    generate_html=True,        # Generate the HTML report
    generate_excel=True,       # Generate the Excel report
    save_png_charts=True,      # Save charts as PNG files
)

print("Configuration:")
print(f"  Data directory: {config.data_dir}")
print(f"  Output directory: {config.output_dir}")
print(f"  Cluster range: {config.min_clusters}-{config.max_clusters}")
print(f"  Bootstrap iterations: {config.bootstrap_iterations}")
print(f"  Random seed: {config.random_seed}")
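
To give a feel for what `bootstrap_iterations` and `random_seed` control, here is a minimal percentile-bootstrap sketch for the prevalence of a single feature. It illustrates the general technique only and makes no claim about how the package computes its intervals; the function `bootstrap_ci`, its `alpha` argument, and the toy data are hypothetical.

```python
import random

def bootstrap_ci(values, n_iterations=500, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_iterations))
    lower = means[int((alpha / 2) * n_iterations)]
    upper = means[min(n_iterations - 1, int((1 - alpha / 2) * n_iterations))]
    return lower, upper

# Hypothetical presence/absence calls for one gene across 20 strains.
gene_present = [1] * 12 + [0] * 8
low, high = bootstrap_ci(gene_present)
prevalence = sum(gene_present) / len(gene_present)
print(f"Prevalence {prevalence:.2f}, 95% CI approx. ({low:.2f}, {high:.2f})")
```

More iterations give a smoother estimate of the interval endpoints at the cost of run time, which is the trade-off behind the `bootstrap_iterations` setting.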

## 5. Run Analysis

Execute the cluster analysis pipeline:

In [None]:
# Create analyzer
analyzer = ClusterAnalyzer(config)

# Run analysis
print("Starting cluster analysis...")
print("This may take several minutes depending on data size.")
print()

results = analyzer.run()

print("\n" + "="*50)
print("✓ Analysis completed successfully!")
print("="*50)

## 6. View Results

Check generated files:

In [None]:
print("Generated files:")
print()
!ls -lh output/

if os.path.exists("output/png_charts"):
    print("\nGenerated charts:")
    !ls -lh output/png_charts/
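
The `!ls` shell commands only work inside IPython-based environments such as Colab. As a portable alternative, the sketch below lists every generated file with its size using `pathlib`; the helper name `list_outputs` is hypothetical, and the `output` directory name is taken from the configuration cell.

```python
from pathlib import Path

def list_outputs(output_dir="output"):
    """Return (relative path, size in bytes) for every file under output_dir."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )

for name, size in list_outputs("output"):
    print(f"{size:>12,d}  {name}")
```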

## 7. Download Results

Download all results as a ZIP file:

In [None]:
# Create ZIP archive of all results
import shutil
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
archive_base = f"AMRVirKM_Results_{timestamp}"
zip_filename = f"{archive_base}.zip"

# Create the ZIP file; make_archive appends ".zip" to the base name
shutil.make_archive(archive_base, "zip", "output")

print(f"Results packaged: {zip_filename}")
print("Downloading...")

# Download
files.download(zip_filename)

print("\n✓ Download complete!")
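
Before downloading, it can be worth confirming that the archive is readable and not empty. The sketch below wraps `shutil.make_archive` with a `zipfile` integrity check; the helper name `package_results` and the archive name in the example are hypothetical. Keep the archive outside the directory being zipped so it does not end up inside itself.

```python
import shutil
import zipfile
from pathlib import Path

def package_results(output_dir, archive_stem):
    """Zip output_dir into <archive_stem>.zip and return (path, member names)."""
    archive_path = shutil.make_archive(archive_stem, "zip", output_dir)
    with zipfile.ZipFile(archive_path) as archive:
        bad_member = archive.testzip()  # None when every CRC check passes
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad_member}")
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    if not names:
        raise ValueError(f"No files found in {output_dir!r}; nothing to download")
    return archive_path, sorted(names)

# Example run, only if the analysis has produced an output directory.
if Path("output").is_dir():
    archive_path, members = package_results("output", "AMRVirKM_Results_check")
    print(f"{archive_path}: {len(members)} files")
```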

---

## Help & Documentation

- **Documentation**: [https://github.com/MK-vet/strepsuis-amrvirkm](https://github.com/MK-vet/strepsuis-amrvirkm)
- **Issues**: [https://github.com/MK-vet/strepsuis-amrvirkm/issues](https://github.com/MK-vet/strepsuis-amrvirkm/issues)
- **Example Data**: Download from the repository linked above

### Citation

```bibtex
@software{strepsuis_amrvirkm2025,
  title = {StrepSuis-AMRVirKM: K-Modes Clustering of Antimicrobial Resistance and Virulence Profiles},
  author = {MK-vet},
  year = {2025},
  url = {https://github.com/MK-vet/strepsuis-amrvirkm},
  version = {1.0.0}
}
```

---