# NanoMamba - Interspeech 2026 Full Training (GPU)

**NanoMamba: Noise-Robust KWS with SA-SSM**

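| Cell | Contents | Estimated time |
|:----:|------|:---------:|
| 1 | Environment setup + GSC V2 download | ~5 min |
| 2 | **Full training + evaluation in one run** | ~8-12 hours |
| 3 | Download results | Instant |

⚠️ **Runtime → Change runtime type → GPU (T4)** must be selected!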

In [4]:
#@title Cell 1: Environment setup + data download
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    raise RuntimeError("GPU not available! Change runtime type to GPU.")

# Ensure we are in /content before cloning
import os
os.chdir('/content') # Change to /content

# Clean up existing repo to prevent nesting issues
if os.path.exists('NanoMamba-Interspeech2026'):
    print("Removing existing NanoMamba-Interspeech2026 directory...")
    !rm -rf NanoMamba-Interspeech2026
    print("Removed.")

# Clone repo
!git clone https://github.com/DrJinHoChoi/NanoMamba-Interspeech2026.git
%cd NanoMamba-Interspeech2026

# Download Google Speech Commands V2
DATA_DIR = './data'
os.makedirs(DATA_DIR, exist_ok=True)

if not os.path.exists(os.path.join(DATA_DIR, 'speech_commands_v0.02')):
    print("\n Downloading Google Speech Commands V2...")
    !wget -q http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz -O /tmp/gsc_v2.tar.gz
    !mkdir -p {DATA_DIR}/speech_commands_v0.02
    !tar -xzf /tmp/gsc_v2.tar.gz -C {DATA_DIR}/speech_commands_v0.02
    !rm /tmp/gsc_v2.tar.gz
    print("Download complete!")
else:
    print("Data already exists.")

# Verify
classes = [d for d in os.listdir(f'{DATA_DIR}/speech_commands_v0.02')
           if os.path.isdir(f'{DATA_DIR}/speech_commands_v0.02/{d}') and not d.startswith('_')]
print(f"\nFound {len(classes)} keyword classes")
print("Ready to train!")

PyTorch: 2.10.0+cu128
CUDA available: True
GPU: Tesla T4
VRAM: 15.6 GB
Cloning into 'NanoMamba-Interspeech2026'...
remote: Enumerating objects: 115, done.
remote: Counting objects: 100% (115/115), done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 115 (delta 51), reused 77 (delta 24), pack-reused 0 (from 0)
Receiving objects: 100% (115/115), 1.03 MiB | 27.81 MiB/s, done.
Resolving deltas: 100% (51/51), done.
/content/NanoMamba-Interspeech2026

 Downloading Google Speech Commands V2...
Download complete!

Found 35 keyword classes
Ready to train!
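
Cell 1 finds 35 keyword folders, while the training logs in this notebook report 12 classes. That gap is consistent with the common 12-class Speech Commands setup (ten target words plus `unknown` and `silence`), but whether `train_all_models.py` uses exactly this mapping is an assumption. A minimal sketch of such a mapping, with hypothetical names `TARGET_WORDS`, `LABELS`, and `to_12_class`:

```python
# Assumed 12-class mapping for Google Speech Commands V2 (not taken from the repo).
TARGET_WORDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
LABELS = TARGET_WORDS + ["unknown", "silence"]  # 12 labels in total

def to_12_class(folder_name: str) -> str:
    """Map a dataset folder name to one of the 12 labels."""
    if folder_name in TARGET_WORDS:
        return folder_name
    if folder_name == "_background_noise_":
        return "silence"  # silence clips are commonly cut from the background-noise folder
    return "unknown"      # the remaining 25 words collapse into a single class

print(len(LABELS), to_12_class("stop"), to_12_class("bed"))
```

Under this assumption, the 35 folders counted in Cell 1 and the 12 classes in the training logs describe the same data at different label granularity.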


In [9]:
# Environment setup
!git clone https://github.com/DrJinHoChoi/NanoMamba-Interspeech2026.git 2>/dev/null; \
cd /content/NanoMamba-Interspeech2026 && git pull && \
pip install -q torch torchaudio numpy

# Train TinyConv2D models (Tiny-TC + Tiny-WS-TC)
!cd /content/NanoMamba-Interspeech2026 && python -u train_all_models.py \
  --data_dir /content/speech_commands \
  --output_dir /content/drive/MyDrive/NanoMamba \
  --models NanoMamba-Tiny-TC,NanoMamba-Tiny-WS-TC \
  --epochs 30 --batch_size 128 --lr 0.002


Already up to date.
usage: train_all_models.py [-h] [--data_dir DATA_DIR]
                           [--checkpoint_dir CHECKPOINT_DIR] [--epochs EPOCHS]
                           [--batch_size BATCH_SIZE] [--lr LR] [--eval_only]
                           [--models MODELS] [--quick] [--seed SEED]
                           [--noise_types NOISE_TYPES] [--snr_range SNR_RANGE]
                           [--per_class] [--teacher TEACHER]
                           [--teacher_checkpoint TEACHER_CHECKPOINT]
                           [--kd_alpha KD_ALPHA]
                           [--kd_temperature KD_TEMPERATURE]
train_all_models.py: error: unrecognized arguments: --output_dir /content/drive/MyDrive/NanoMamba
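
The run above stops before training because the script's parser has no `--output_dir` flag; the usage message lists `--checkpoint_dir` instead. The sketch below reproduces that behaviour with a stand-in `argparse` parser that copies only a few flags from the usage message. `build_parser` and `unknown_flags` are hypothetical helpers, not part of the repository.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser: a subset of the flags shown in the usage message above.
    parser = argparse.ArgumentParser(prog="train_all_models.py")
    parser.add_argument("--data_dir")
    parser.add_argument("--checkpoint_dir")
    parser.add_argument("--models")
    parser.add_argument("--epochs", type=int)
    return parser

def unknown_flags(argv):
    """Return the arguments that the stand-in parser does not recognize."""
    _, unknown = build_parser().parse_known_args(argv)
    return unknown

bad = ["--data_dir", "/content/speech_commands",
       "--output_dir", "/content/drive/MyDrive/NanoMamba"]
good = ["--data_dir", "/content/speech_commands",
        "--checkpoint_dir", "/content/drive/MyDrive/NanoMamba"]
print(unknown_flags(bad))   # the rejected flag and its value
print(unknown_flags(good))  # empty list: every flag is known
```

A check like this can be run before a multi-hour job to catch a mistyped flag in seconds.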


In [None]:
# Environment setup
!git clone https://github.com/DrJinHoChoi/NanoMamba-Interspeech2026.git 2>/dev/null; \
cd /content/NanoMamba-Interspeech2026 && git pull && \
pip install -q torch torchaudio numpy

# Train TinyConv2D models (Tiny-TC + Tiny-WS-TC)
!cd /content/NanoMamba-Interspeech2026 && python -u train_all_models.py \
  --data_dir /content/speech_commands \
  --checkpoint_dir /content/drive/MyDrive/NanoMamba \
  --models NanoMamba-Tiny-TC,NanoMamba-Tiny-WS-TC \
  --epochs 30 --batch_size 128 --lr 0.002


Already up to date.

  SmartEar KWS - Complete Training Pipeline
  Device: cuda
  Data: /content/speech_commands
  Epochs: 30, Seed: 42
  Noise types: factory,white,babble,street,pink
  SNR range: -15,-10,-5,0,5,10,15
  Time: 2026-02-23 03:11:39

  Loading Google Speech Commands V2...
  Downloading Google Speech Commands V2 to /content/speech_commands...
100% 2.26G/2.26G [01:43<00:00, 23.4MB/s]
  torchaudio download failed: unsupported operand type(s) for /: 'str' and 'str'
  Trying manual download...
  Manual download complete!
  [training] 86843 samples, 12 classes
  [validation] 10481 samples, 12 classes
  [testing] 11505 samples, 12 classes

  Train: 86843, Val: 10481, Test: 11505

  Model Summary:
  Name                   |     Params |  Size (KB)
  --------------------------------------------------
  NanoKWS-Tiny           |      1,354 |       5.3
  NanoKWS-Small          |      2,144 |       8.4
  NanoKWS-Base           |      4,428 |      17.3
  NanoMamba-Tiny         |      4,

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!find /content -name "best.pt" -path "*Tiny-TC*" 2>/dev/null

In [8]:
# Environment setup
!git clone https://github.com/DrJinHoChoi/NanoMamba-Interspeech2026.git 2>/dev/null; \
cd /content/NanoMamba-Interspeech2026 && git pull && \
pip install -q torch torchaudio numpy

# Train TinyConv2D models (Tiny-TC + Tiny-WS-TC)
!cd /content/NanoMamba-Interspeech2026 && python -u train_all_models.py \
  --data_dir /content/speech_commands \
  --output_dir /content/drive/MyDrive/NanoMamba \
  --models NanoMamba-Tiny-TC,NanoMamba-Tiny-WS-TC \
  --epochs 30 --batch_size 128 --lr 0.002


Already up to date.
usage: train_all_models.py [-h] [--data_dir DATA_DIR]
                           [--checkpoint_dir CHECKPOINT_DIR] [--epochs EPOCHS]
                           [--batch_size BATCH_SIZE] [--lr LR] [--eval_only]
                           [--models MODELS] [--quick] [--seed SEED]
                           [--noise_types NOISE_TYPES] [--snr_range SNR_RANGE]
                           [--per_class] [--teacher TEACHER]
                           [--teacher_checkpoint TEACHER_CHECKPOINT]
                           [--kd_alpha KD_ALPHA]
                           [--kd_temperature KD_TEMPERATURE]
train_all_models.py: error: unrecognized arguments: --output_dir /content/drive/MyDrive/NanoMamba


In [6]:
!python -u train_all_models.py \
    --data_dir ./data \
    --checkpoint_dir ./checkpoints_moe \
    --epochs 30 --batch_size 64 --seed 42 \
    --models NanoMamba-Tiny-MoE,NanoMamba-Tiny-WS-MoE \
    --noise_types factory,white,babble \
    --snr_range=-15,-10,-5,0,5,10,15 \
    --per_class


  SmartEar KWS - Complete Training Pipeline
  Device: cuda
  Data: ./data
  Epochs: 30, Seed: 42
  Noise types: factory,white,babble
  SNR range: -15,-10,-5,0,5,10,15
  Time: 2026-02-23 02:26:17

  Loading Google Speech Commands V2...
  Downloading Google Speech Commands V2 to data...
100% 2.26G/2.26G [01:43<00:00, 23.6MB/s]
  torchaudio download failed: unsupported operand type(s) for /: 'str' and 'str'
  Trying manual download...
  Manual download complete!
  [training] 86843 samples, 12 classes
  [validation] 10481 samples, 12 classes
  [testing] 11505 samples, 12 classes

  Train: 86843, Val: 10481, Test: 11505

  Model Summary:
  Name                   |     Params |  Size (KB)
  --------------------------------------------------
  NanoKWS-Tiny           |      1,354 |       5.3
  NanoKWS-Small          |      2,144 |       8.4
  NanoKWS-Base           |      4,428 |      17.3
  NanoMamba-Tiny         |      4,636 |      18.1
  NanoMamba-Small        |     12,035 |      47.0
  Na
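
The commands in this notebook pass the SNR list as `--snr_range=-15,-10,...`, with an equals sign rather than a space. Assuming the script uses `argparse` (the usage message format suggests so), this matters: a separate token that starts with `-` and is not a plain number is read as another option, so the flag ends up without a value. A small sketch with a hypothetical one-flag parser:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--snr_range", type=str, default="0")

def parse_snr(argv):
    """Parse argv and return the SNR values as a list of ints."""
    args = parser.parse_args(argv)
    return [int(s) for s in args.snr_range.split(",")]

# Works: the value is attached to the flag with '='.
print(parse_snr(["--snr_range=-15,-10,-5,0,5,10,15"]))

# Fails: '-15,...' as a separate token looks like another option to argparse.
try:
    parse_snr(["--snr_range", "-15,-10,-5,0,5,10,15"])
except SystemExit as exc:
    print("argparse rejected the separate-token form, exit code", exc.code)
```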

In [None]:
#@title Cell 2: Full training + noise evaluation (single run)
#@markdown ### Models trained (9)
#@markdown - NanoMamba-Tiny (4,634), Small (12,032)
#@markdown - BC-ResNet-1 (7,464), BC-ResNet-3 (43,200), DS-CNN-S (23,756)
#@markdown - SA-SSM Ablation: Full, dt_only, b_only, Standard
#@markdown ### Noise evaluation
#@markdown - 3 noise types (factory, white, babble) x 7 SNR levels (-15 to +15 dB)

import subprocess, sys

ALL_MODELS = ",".join([
    # Proposed
    "NanoMamba-Tiny", "NanoMamba-Small",
    # Baselines
    "BC-ResNet-1", "BC-ResNet-3", "DS-CNN-S",
    # SA-SSM Ablation
    "NanoMamba-Tiny-Full", "NanoMamba-Tiny-dtOnly",
    "NanoMamba-Tiny-bOnly", "NanoMamba-Tiny-Standard",
])

cmd = [
    sys.executable, "-u", "train_all_models.py",
    "--data_dir", "./data",
    "--checkpoint_dir", "./checkpoints_full",
    "--epochs", "30",
    "--batch_size", "64",
    "--seed", "42",
    "--models", ALL_MODELS,
    "--noise_types", "factory,white,babble",
    "--snr_range=-15,-10,-5,0,5,10,15",
    "--per_class",
]

print(f"Running: {' '.join(cmd)}\n")
process = subprocess.Popen(cmd, stdout=sys.stdout, stderr=sys.stderr)
process.wait()

if process.returncode == 0:
    print("\n" + "="*60)
    print("  ALL TRAINING + EVALUATION COMPLETE!")
    print("="*60)
else:
    print(f"\nProcess exited with code {process.returncode}")
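
Cell 2 hands `sys.stdout` and `sys.stderr` directly to `Popen`. In a notebook kernel those objects are not always backed by a real file descriptor, so depending on the kernel version the child's output may not reach the cell or the call may fail. One alternative, sketched here with a hypothetical helper `stream_command`, is to pipe the output and echo it line by line:

```python
import subprocess
import sys

def stream_command(cmd):
    """Run cmd, echo its combined stdout/stderr line by line, return (exit_code, lines)."""
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as proc:
        for line in proc.stdout:
            print(line, end="")  # appears in the cell output as it arrives
            lines.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode, lines

# Tiny stand-in for the training command so the sketch runs anywhere.
code, out = stream_command(
    [sys.executable, "-u", "-c", "print('epoch 1'); print('epoch 2')"]
)
print("exit code:", code)
```

Passing the `cmd` list from Cell 2 to `stream_command` would keep the same success check on the returned exit code.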

In [1]:
cd /content/NanoMamba-Interspeech2026 && git pull && python -u train_all_models.py \
    --data_dir ./data \
    --checkpoint_dir ./checkpoints_freqconv \
    --epochs 30 --batch_size 64 --seed 42 \
    --models NanoMamba-Tiny-FC \
    --noise_types factory,white,babble \
    --snr_range=-15,-10,-5,0,5,10,15 \
    --per_class

SyntaxError: invalid syntax (ipython-input-1468492428.py, line 1)
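
The cell above has no leading `!`, so IPython parsed the shell line as Python and raised `SyntaxError`. Besides adding `!`, another option is to build the argument list in Python and launch it with `cwd=`, which needs neither `cd` nor shell syntax. The sketch uses only flags that appear in this notebook's commands; `build_train_cmd` is a hypothetical helper, and it does not run `git pull`.

```python
import sys

def build_train_cmd(models, checkpoint_dir, data_dir="./data", epochs=30,
                    batch_size=64, seed=42,
                    noise_types=("factory", "white", "babble"),
                    snr_range=(-15, -10, -5, 0, 5, 10, 15), per_class=True):
    """Assemble a train_all_models.py argument list from the flags used in this notebook."""
    cmd = [
        sys.executable, "-u", "train_all_models.py",
        "--data_dir", data_dir,
        "--checkpoint_dir", checkpoint_dir,
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        "--seed", str(seed),
        "--models", ",".join(models),
        "--noise_types", ",".join(noise_types),
        # '=' form, as in the notebook's own commands, keeps the leading minus with the flag
        "--snr_range=" + ",".join(str(s) for s in snr_range),
    ]
    if per_class:
        cmd.append("--per_class")
    return cmd

cmd = build_train_cmd(["NanoMamba-Tiny-FC"], "./checkpoints_freqconv")
print(" ".join(cmd[1:]))

# To launch it from the repository directory without `cd`:
# import subprocess
# subprocess.run(cmd, cwd="/content/NanoMamba-Interspeech2026", check=True)
```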

In [3]:
# Key model: WS-MoE (half the size of BC-ResNet-1)
!cd /content/NanoMamba-Interspeech2026 && git pull && python -u train_all_models.py \
    --data_dir ./data \
    --checkpoint_dir ./checkpoints_moe \
    --epochs 30 --batch_size 64 --seed 42 \
    --models NanoMamba-Tiny-MoE,NanoMamba-Tiny-WS-MoE \
    --noise_types factory,white,babble \
    --snr_range=-15,-10,-5,0,5,10,15 \
    --per_class

/bin/bash: line 1: cd: /content/NanoMamba-Interspeech2026: No such file or directory
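
The `cd` fails because the repository had not been cloned in this session; a new Colab runtime starts without it. A guard such as the following, run before any training cell, would clone the repository only when it is missing. `ensure_repo` is a hypothetical helper; the URL is the one used elsewhere in this notebook.

```python
import os
import subprocess

REPO_URL = "https://github.com/DrJinHoChoi/NanoMamba-Interspeech2026.git"

def ensure_repo(path, url=REPO_URL):
    """Clone url into path unless a git checkout already exists there."""
    if os.path.isdir(os.path.join(path, ".git")):
        return "present"
    subprocess.run(["git", "clone", url, path], check=True)
    return "cloned"

# In Colab this might be called as:
# ensure_repo("/content/NanoMamba-Interspeech2026")
```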


In [None]:
#@title Cell 3: View results + download
import json
import glob

# Find result files
result_files = glob.glob('checkpoints_full/results/*.json')
print(f"Found {len(result_files)} result files:")
for f in sorted(result_files):
    print(f"  {f}")

# Show latest results
if result_files:
    latest = sorted(result_files)[-1]
    with open(latest) as f:
        results = json.load(f)

    print(f"\n{'='*70}")
    print(f"  Results from: {latest}")
    print(f"{'='*70}")

    # Clean accuracy table
    if 'model_results' in results:
        print(f"\n{'Model':<30} {'Params':>8} {'Val':>8} {'Test':>8}")
        print('-' * 58)
        for name, data in results['model_results'].items():
            val = data.get('best_val_acc', '-')
            test = data.get('test_acc', '-')
            params = data.get('params', '-')
            val_str = f"{val:.2f}%" if isinstance(val, (int, float)) else str(val)
            test_str = f"{test:.2f}%" if isinstance(test, (int, float)) else str(test)
            print(f"{name:<30} {str(params):>8} {val_str:>8} {test_str:>8}")

    # Noise robustness table
    if 'noise_results' in results:
        print(f"\n{'='*70}")
        print("  Noise Robustness Results")
        print(f"{'='*70}")
        noise_data = results['noise_results']
        for model_name, model_noise in noise_data.items():
            print(f"\n  {model_name}:")
            for noise_type, snr_results in model_noise.items():
                snr_str = ", ".join([f"{snr}dB:{acc:.1f}%"
                                     for snr, acc in sorted(snr_results.items(),
                                     key=lambda x: float(x[0]) if x[0] != 'clean' else 999)])
                print(f"    {noise_type}: {snr_str}")

# Zip all results for download
!zip -r /content/smartear_results.zip checkpoints_full/

from google.colab import files
files.download('/content/smartear_results.zip')
print("\nResults downloaded!")
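
Cell 3 prints one accuracy per SNR level. For comparing models, a per-noise-type average can be easier to read. The sketch assumes the JSON layout implied by Cell 3, `noise_results[model][noise_type][snr] -> accuracy`; `mean_noise_accuracy` is a hypothetical helper and the numbers in `example` are placeholders, not measured results.

```python
def mean_noise_accuracy(noise_results):
    """Average accuracy over all non-clean SNR levels, per model and noise type."""
    summary = {}
    for model_name, model_noise in noise_results.items():
        summary[model_name] = {}
        for noise_type, snr_results in model_noise.items():
            accs = [acc for snr, acc in snr_results.items() if snr != "clean"]
            summary[model_name][noise_type] = sum(accs) / len(accs) if accs else None
    return summary

# Placeholder numbers for illustration only, not measured results.
example = {"ModelA": {"white": {"-5": 50.0, "0": 70.0, "5": 90.0, "clean": 95.0}}}
print(mean_noise_accuracy(example))
```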

In [None]:
from google.colab import files
files.download('/content/NanoMamba-Interspeech2026/checkpoints_full/NanoMamba-Small/best.pt')


In [None]:
# ============================================================
# BC-ResNet-3 teacher training (new Colab session)
# ============================================================

# 1. Clone the repository + prepare data
!git clone https://github.com/DrJinHoChoi/NanoMamba-Interspeech2026.git
%cd /content/NanoMamba-Interspeech2026

# 2. Train BC-ResNet-3 (~2 hours)
!python -u train_all_models.py \
    --data_dir /content/NanoMamba-Interspeech2026/data \
    --checkpoint_dir /content/NanoMamba-Interspeech2026/checkpoints_teacher \
    --epochs 30 --batch_size 64 --seed 42 \
    --models BC-ResNet-3


Cloning into 'NanoMamba-Interspeech2026'...
remote: Enumerating objects: 99, done.
remote: Counting objects: 100% (99/99), done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 99 (delta 40), reused 72 (delta 21), pack-reused 0 (from 0)
Receiving objects: 100% (99/99), 1.02 MiB | 18.97 MiB/s, done.
Resolving deltas: 100% (40/40), done.
/content/NanoMamba-Interspeech2026

  SmartEar KWS - Complete Training Pipeline
  Device: cuda
  Data: /content/NanoMamba-Interspeech2026/data
  Epochs: 30, Seed: 42
  Noise types: factory,white,babble,street,pink
  SNR range: -15,-10,-5,0,5,10,15
  Time: 2026-02-22 02:31:10

  Loading Google Speech Commands V2...
  Downloading Google Speech Commands V2 to /content/NanoMamba-Interspeech2026/data...
100% 2.26G/2.26G [00:10<00:00, 242MB/s]
  torchaudio download failed: unsupported operand type(s) for /: 'str' and 'str'
  Trying manual download...
  Manual download complete!
  [training] 86843 samples, 12 classes
  [validation] 10

In [None]:
# Train FF + WS variants
!cd /content/NanoMamba-Interspeech2026 && python -u train_all_models.py \
    --data_dir ./data \
    --checkpoint_dir ./checkpoints_variants \
    --epochs 30 --batch_size 64 --seed 42 \
    --models NanoMamba-Tiny-FF,NanoMamba-Small-FF,NanoMamba-Tiny-WS,NanoMamba-Tiny-WS-FF \
    --noise_types factory,white,babble \
    --snr_range=-15,-10,-5,0,5,10,15 \
    --per_class



  SmartEar KWS - Complete Training Pipeline
  Device: cuda
  Data: ./data
  Epochs: 30, Seed: 42
  Noise types: factory,white,babble
  SNR range: -15,-10,-5,0,5,10,15
  Time: 2026-02-22 07:37:43

  Loading Google Speech Commands V2...
  [training] 86843 samples, 12 classes
  [validation] 10481 samples, 12 classes
  [testing] 11505 samples, 12 classes

  Train: 86843, Val: 10481, Test: 11505

  Model Summary:
  Name                   |     Params |  Size (KB)
  --------------------------------------------------
  NanoKWS-Tiny           |      1,354 |       5.3
  NanoKWS-Small          |      2,144 |       8.4
  NanoKWS-Base           |      4,428 |      17.3
  NanoMamba-Tiny         |      4,636 |      18.1
  NanoMamba-Small        |     12,035 |      47.0
  NanoMamba-Base         |     40,738 |     159.1
  NanoMamba-Tiny-FF      |      4,893 |      19.1
  NanoMamba-Small-FF     |     12,292 |      48.0
  NanoMamba-Tiny-WS      |      3,761 |      14.7
  NanoMamba-Tiny-WS-FF   |      4
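
The `Size (KB)` column in the model summaries is consistent with storing every parameter as a 4-byte float32 value and dividing by 1024. That reading is inferred from the printed numbers, not confirmed from the script. A quick check against rows copied from the summary, using a hypothetical helper `size_kb`:

```python
def size_kb(num_params, bytes_per_param=4):
    """Model size in KB, assuming float32 weights (4 bytes each) and 1 KB = 1024 bytes."""
    return num_params * bytes_per_param / 1024

# (params, reported size in KB) pairs copied from the model summary output
rows = {
    "NanoKWS-Tiny": (1354, 5.3),
    "NanoMamba-Tiny": (4636, 18.1),
    "NanoMamba-Small": (12035, 47.0),
    "NanoMamba-Base": (40738, 159.1),
}
for name, (params, reported) in rows.items():
    print(f"{name}: {size_kb(params):.1f} KB (reported {reported})")
```

Under the same assumption, 8-bit weights would cut each figure to a quarter, which can be checked with `size_kb(n, bytes_per_param=1)`.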