# Dementia Classification (Audio + ASR Text)

## Abstract
*(Write one paragraph: problem, data source, methods, and the insights you plan to extract.)*



## Introduction

## Problem Addressed

## Motivation

## Previous Work

## Dataset + EDA

## Project Schedule and Budget

## Technical Approach

## Main Results

## Explainability + Robustness

## Discussion

## Future Work



In [None]:
# Keep code minimal in the notebook; import from dementia_project/ modules.
import json
from pathlib import Path

import pandas as pd


def load_metrics(run_dir: str) -> dict:
    """Load the metrics.json written by a training run into a dict."""
    return json.loads(Path(run_dir, "metrics.json").read_text())


runs = {
    "nonml_scaled": "runs/nonml_baseline_scaled",
    "wav2vec2_full_cuda": "runs/wav2vec2_baseline_full_cuda",
    "densenet_full_cuda": "runs/densenet_spec_full_cuda",
}

rows = []
for name, rdir in runs.items():
    m = load_metrics(rdir)
    for split in ["train", "valid", "test"]:
        rows.append(
            {
                "model": name,
                "split": split,
                "accuracy": m[split].get("accuracy"),
                "f1": m[split].get("f1"),
                "roc_auc": m[split].get("roc_auc"),
            }
        )

df_results = pd.DataFrame(rows)
df_results



## Step-by-step: What code runs (module-by-module)

This section is a **walkthrough of every Python module** in `dementia_project/`, in the order we run them.

### 0) Project entrypoints (where things live)
- Code package: `dementia_project/`
- Config: `configs/default.yaml`
- Processed artifacts: `data/processed/`
- Experiment outputs: `runs/`



### 1) Build metadata (audio inventory + join to CSV)
**Module**: `dementia_project/data/build_metadata.py`

**What it does**
- Scans both class folders for `.wav`
- Computes audio duration/sample rate
- Joins dementia-side subjects to `DementiaNet - dementia.csv`
- Derives control-subject identities from the control folder names (controls have no CSV to join against)

**Produces**
- `data/processed/metadata.csv`
- `data/processed/dropped.csv`
- `data/processed/metadata_report.json`

**Command**
```bash
poetry run python -m dementia_project.data.build_metadata \
  --dementia_dir "dementia-20251217T041331Z-1-001" \
  --control_dir "nodementia-20251217T041501Z-1-001" \
  --dementia_csv "DementiaNet - dementia.csv" \
  --out_dir "data/processed"
```

**Helper used**
- `dementia_project/data/name_normalize.py`: `normalize_person_name()` used for robust matching.
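
To illustrate why normalization matters for the folder-to-CSV join, here is a minimal sketch of a name normalizer. It is not the project's `normalize_person_name()`; the function name `normalize_person_name_sketch` and the specific rules (strip accents, lowercase, collapse punctuation and whitespace) are assumptions chosen for illustration.

```python
import re
import unicodedata


def normalize_person_name_sketch(name: str) -> str:
    """Hypothetical normalizer; the real normalize_person_name() may differ."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, turn runs of punctuation/underscores/spaces into one space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", unaccented.lower())
    return cleaned.strip()


# Folder-style and CSV-style spellings map to the same key.
print(normalize_person_name_sketch("Jane_Doe"))
print(normalize_person_name_sketch("  jane  DOE "))
```

With rules like these, spelling variants of the same subject collapse to one join key, and rows that still fail to match can be routed to `dropped.csv`.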



### 2) Build splits (subject-level train/valid/test)
**Modules**
- `dementia_project/data/splitting.py`: implements the hybrid split logic.
- `dementia_project/data/build_splits.py`: CLI wrapper that writes outputs.

**What it does**
- Creates `train/valid/test` splits
- Enforces **subject-level separation** using `person_name_norm`
- Uses the `datasplit` column from the CSV when available; otherwise assigns each subject to a split deterministically
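
The hybrid rule above could be sketched as follows. This is an illustration, not the code in `splitting.py`: the function name `assign_split_sketch`, the hash-based fallback, and the 10%/10% validation/test fractions are all assumptions.

```python
import hashlib
from typing import Optional


def assign_split_sketch(
    person_name_norm: str,
    csv_split: Optional[str] = None,
    valid_frac: float = 0.1,  # assumed fraction, not taken from the project
    test_frac: float = 0.1,   # assumed fraction, not taken from the project
) -> str:
    """Hypothetical hybrid rule: trust the CSV split, else hash the subject."""
    if csv_split in {"train", "valid", "test"}:
        return csv_split
    # A stable hash (unlike Python's salted hash()) gives the same bucket every run.
    digest = hashlib.sha256(person_name_norm.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + valid_frac:
        return "valid"
    return "train"
```

Because the split is a function of the subject key rather than of the individual recording, every clip from one person lands in the same split, which is what subject-level separation requires.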

**Produces**
- `data/processed/splits.csv`
- `data/processed/splits_report.json`

**Command**
```bash
poetry run python -m dementia_project.data.build_splits \
  --metadata_csv "data/processed/metadata.csv" \
  --out_dir "data/processed"
```

**Small I/O helpers**
- `dementia_project/data/io.py`: `load_metadata()` and `load_splits()`.



### 3) Segmentation manifests (time windows)
**Modules**
- `dementia_project/segmentation/time_windows.py`: generates window start/end times.
- `dementia_project/segmentation/build_manifests.py`: CLI wrapper that writes outputs.

**What it does**
- Creates fixed-length, overlapping windows (e.g., 2 s windows with a 0.5 s hop) for the audio baselines.
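
A minimal sketch of fixed-length windowing, assuming that only full windows are kept and a trailing partial window is dropped (the behavior of `time_windows.py` at the clip boundary is not stated here, so that choice is an assumption, as is the name `time_windows_sketch`):

```python
def time_windows_sketch(duration_sec, window_sec=2.0, hop_sec=0.5):
    """Return (start, end) pairs for every full window that fits in the clip."""
    if window_sec <= 0 or hop_sec <= 0:
        raise ValueError("window_sec and hop_sec must be positive")
    windows = []
    index = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while index * hop_sec + window_sec <= duration_sec + 1e-9:
        start = index * hop_sec
        windows.append((round(start, 6), round(start + window_sec, 6)))
        index += 1
    return windows


print(time_windows_sketch(3.0))
```

Under this rule a clip of duration $D$ yields $\lfloor (D - W)/H \rfloor + 1$ windows for window length $W$ and hop $H$ (zero if $D < W$), so a 3 s clip with the defaults produces three windows.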

**Produces**
- `data/processed/time_segments.csv`

**Command**
```bash
poetry run python -m dementia_project.segmentation.build_manifests \
  --metadata_csv "data/processed/metadata.csv" \
  --splits_csv "data/processed/splits.csv" \
  --out_dir "data/processed" \
  --window_sec 2.0 \
  --hop_sec 0.5
```



### 4) Baseline 1 — Non-ML audio (MFCC + pause stats)
**Modules**
- `dementia_project/features/audio_features.py`: MFCC + RMS + pause proxy features
- `dementia_project/train/train_nonml.py`: trains/evaluates Logistic Regression baseline
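
As an illustration of the RMS and pause-proxy part of these features, the sketch below computes per-frame RMS energy and simple pause statistics in plain Python. The function names, the relative silence threshold of 10% of peak RMS, and the returned statistics are assumptions rather than the contents of `audio_features.py`; MFCC extraction is omitted because it needs an audio library.

```python
import math


def frame_rms(samples, frame_len, hop_len):
    """RMS energy for each full frame of a sequence of float samples."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def pause_stats_sketch(rms, hop_sec, silence_ratio=0.1):
    """Count frames at or below silence_ratio * peak RMS as pauses (assumed rule)."""
    if not rms:
        return {"pause_fraction": 0.0, "num_pauses": 0, "longest_pause_sec": 0.0}
    threshold = silence_ratio * max(rms)
    silent = [value <= threshold for value in rms]
    num_pauses, longest, run = 0, 0, 0
    for is_silent in silent:
        if is_silent:
            run += 1
            if run == 1:  # a new run of silent frames starts a new pause
                num_pauses += 1
            longest = max(longest, run)
        else:
            run = 0
    return {
        "pause_fraction": sum(silent) / len(silent),
        "num_pauses": num_pauses,
        "longest_pause_sec": longest * hop_sec,
    }
```

Statistics like these give a fixed-length vector per recording regardless of duration, which is what a Logistic Regression classifier needs as input.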

**Produces**
- `runs/nonml_baseline_scaled/metrics.json`
- `runs/nonml_baseline_scaled/confusion_matrix_test.png`

**Command**
```bash
poetry run python -m dementia_project.train.train_nonml \
  --metadata_csv "data/processed/metadata.csv" \
  --splits_csv "data/processed/splits.csv" \
  --out_dir "runs/nonml_baseline_scaled"
```

**Plot helper**
- `dementia_project/viz/metrics.py`: writes the confusion matrix PNG.



### 5) Baseline 2 — Audio-only Wav2Vec2 embeddings
**Modules**
- `dementia_project/features/wav2vec2_embed.py`: loads Wav2Vec2 + mean-pools embeddings
- `dementia_project/train/train_wav2vec2_nonml.py`: trains/evaluates sklearn classifier on embeddings
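
Mean pooling turns the model's sequence of per-frame vectors into one fixed-size embedding per clip. The sketch below shows the operation on plain Python lists; the name `mean_pool_sketch` and the optional padding mask are assumptions, and the project module presumably does the equivalent on tensors.

```python
def mean_pool_sketch(frames, mask=None):
    """Average frame embeddings over time, skipping frames where mask is 0."""
    if mask is None:
        mask = [1] * len(frames)
    kept = [frame for frame, keep in zip(frames, mask) if keep]
    if not kept:
        raise ValueError("no valid frames to pool")
    dim = len(kept[0])
    # One mean per embedding dimension, taken across the kept frames.
    return [sum(frame[d] for frame in kept) / len(kept) for d in range(dim)]


# Three 2-dimensional frames; the last one is padding and is masked out.
print(mean_pool_sketch([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], mask=[1, 1, 0]))
```

The pooled vector has the same dimensionality for every clip, so it can be fed directly to an sklearn classifier.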

**Produces**
- `runs/wav2vec2_baseline_full_cuda/metrics.json`
- `runs/wav2vec2_baseline_full_cuda/confusion_matrix_test.png`

**Command (full dataset)**
```bash
poetry run python -m dementia_project.train.train_wav2vec2_nonml \
  --metadata_csv "data/processed/metadata.csv" \
  --splits_csv "data/processed/splits.csv" \
  --out_dir "runs/wav2vec2_baseline_full_cuda" \
  --max_audio_sec 10
```

**Note on CUDA**
- The Poetry environment was switched to a CUDA build of PyTorch (`torch 2.6.0+cu124`), so embedding extraction runs on the GPU.



### 6) Baseline 3 — DenseNet on spectrograms
**Modules**
- `dementia_project/features/spectrograms.py`: creates log-mel spectrogram tensors
- `dementia_project/train/train_densenet_spec.py`: trains/evaluates DenseNet baseline

**Produces**
- `runs/densenet_spec_full_cuda/metrics.json`
- `runs/densenet_spec_full_cuda/confusion_matrix_test.png`

**Command (full dataset)**
```bash
poetry run python -m dementia_project.train.train_densenet_spec \
  --metadata_csv "data/processed/metadata.csv" \
  --splits_csv "data/processed/splits.csv" \
  --out_dir "runs/densenet_spec_full_cuda" \
  --epochs 5 \
  --batch_size 16 \
  --max_audio_sec 8
```



### 7) ASR (audio → transcript + word timestamps)
**Modules**
- `dementia_project/asr/transcribe.py`: Whisper ASR backend (transformers pipeline) producing `words.json`
- `dementia_project/asr/run_asr.py`: CLI runner + caching + `asr_manifest.csv`

**Produces**
- `data/processed/asr_whisper/<audio_id>/transcript.json`
- `data/processed/asr_whisper/<audio_id>/words.json`
- `data/processed/asr_whisper/asr_manifest.csv`

**Command (example sanity run)**
```bash
poetry run python -m dementia_project.asr.run_asr \
  --metadata_csv "data/processed/metadata.csv" \
  --out_dir "data/processed/asr_whisper" \
  --limit 5 \
  --model_name "openai/whisper-tiny" \
  --language en \
  --task transcribe
```

**Command (full run, resumable)**
```bash
poetry run python -m dementia_project.asr.run_asr \
  --metadata_csv "data/processed/metadata.csv" \
  --out_dir "data/processed/asr_whisper" \
  --model_name "openai/whisper-tiny" \
  --language en \
  --task transcribe
```
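
The resumable behavior of the full run could be sketched as below: an item is skipped when its `words.json` already exists. This is an assumption about how the caching in `run_asr.py` might work, not a copy of it; `run_cached_sketch`, the `transcribe_fn` callback and its return shape, and the manifest columns are all hypothetical.

```python
import json
from pathlib import Path


def run_cached_sketch(audio_ids, out_dir, transcribe_fn):
    """Call transcribe_fn only for ids without a words.json; return manifest rows."""
    rows = []
    for audio_id in audio_ids:
        item_dir = Path(out_dir) / audio_id
        words_path = item_dir / "words.json"
        cached = words_path.exists()
        if not cached:
            item_dir.mkdir(parents=True, exist_ok=True)
            # Hypothetical callback returning {"text": str, "words": list}.
            result = transcribe_fn(audio_id)
            (item_dir / "transcript.json").write_text(
                json.dumps({"text": result["text"]})
            )
            # Write words.json last so its presence marks a completed item.
            words_path.write_text(json.dumps(result["words"]))
        rows.append(
            {"audio_id": audio_id, "words_path": str(words_path), "cached": cached}
        )
    return rows
```

With this pattern, re-running the same command after an interruption only transcribes the items that are still missing.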



### 8) (Next) Text-only + Fusion model
We will add next:
- **Text-only baseline**: Transformer classifier on `transcript.json`
- **Fusion model**: cross-attention between text embeddings and word-level audio embeddings

Planned new modules will live under:
- `dementia_project/models/`
- `dementia_project/train/`
- `dementia_project/segmentation/` (word-level segments derived from `words.json`)
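
As a rough sketch of how word-level segments might be derived, the function below converts a list of timed words into segment rows. The `words.json` schema assumed here (a list of objects with `word`, `start`, and `end` in seconds) is a guess, as are the function name and output fields; words with missing or invalid timestamps are skipped.

```python
def word_segments_sketch(words, pad_sec=0.0, clip_duration=None):
    """Turn timed words into segment rows, skipping words without usable times."""
    segments = []
    for index, item in enumerate(words):
        start, end = item.get("start"), item.get("end")
        if start is None or end is None or end <= start:
            continue  # ASR output may lack a timestamp for some words
        seg_start = max(0.0, start - pad_sec)
        seg_end = end + pad_sec
        if clip_duration is not None:
            seg_end = min(seg_end, clip_duration)
        segments.append(
            {
                "word_index": index,  # position in the original word list
                "word": item.get("word", ""),
                "start_sec": seg_start,
                "end_sec": seg_end,
            }
        )
    return segments
```

Keeping `word_index` preserves the alignment between each audio segment and its token position in the transcript, which a text/audio fusion model would need.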

