Streamlit app for fine-tuning NVIDIA Cosmos Reason 2 on elder care safety classification. Supports Full SFT, LoRA, and QLoRA training methods with built-in dataset management.
Create a GPU VM on Nebius AI Cloud with the following specs:
| Setting | QLoRA / LoRA (recommended) | Full SFT |
|---|---|---|
| GPU | 1x H100 80GB or 1x H200 141GB | 8x H100 or 8x H200 |
| Preset | 1gpu-16vcpu-200gb | 8gpu-128vcpu-1600gb |
| vCPUs | 16 | 128 |
| RAM | 200 GiB | 1600 GiB |
| Boot disk | 200 GiB | 200 GiB |
| OS | Ubuntu 22.04 LTS (with CUDA) | Ubuntu 22.04 LTS (with CUDA) |
QLoRA on the 2B Cosmos Reason 2 model fits in ~5 GB of VRAM, so a single H100 is more than enough. Full SFT via cosmos-rl needs multi-GPU parallelism (tp_size + dp_shard_size).
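For Full SFT, the parallel layout is set in the training config. A hedged sketch of the relevant section, assuming a cosmos-rl-style TOML layout like templates/sft_template.config.toml (the exact section and key names in your template may differ):

```toml
# Hypothetical excerpt -- check templates/sft_template.config.toml for the real keys.
[policy.parallelism]
tp_size = 4        # tensor parallelism: each layer split across 4 GPUs
dp_shard_size = 2  # data-parallel sharding across the remaining GPUs
# tp_size * dp_shard_size should match the GPU count (8x H100/H200 above)
```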
SSH into your Nebius instance, then:
```bash
git clone <repo-url> && cd post-train-app
chmod +x run.sh download_datasets.sh
./run.sh                    # installs deps, downloads datasets, clones cosmos libs
source .venv/bin/activate
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

Or step by step:
```bash
pip install -r requirements.txt   # Python deps
./download_datasets.sh            # GMDC + Harvard FallVision datasets
streamlit run app.py
```

Downloaded automatically by download_datasets.sh into datasets/:
| Dataset | Videos | Source | Auto-label |
|---|---|---|---|
| GMDC-SA24 | 160 | GitHub / Zenodo | CSV descriptions -> 8 classes |
| Harvard FallVision | 200+ | Harvard Dataverse | All fall (class 0) |
| Personal clips | - | Manual | bad/ = fall, good/ = daily |
| Custom | - | Upload in app | Manual annotation |
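The GMDC-SA24 auto-labeling maps free-text CSV descriptions onto the eight safety classes. A minimal keyword-based sketch of that idea (the keyword table and function name here are illustrative, not the app's actual rules in src/dataset_manager.py):

```python
# Illustrative keyword rules; the real mapping lives in src/dataset_manager.py.
RULES = [
    ("fall", 0),      # Fall Detected
    ("immobile", 1),  # Immobility Alert
    ("unsteady", 2),  # Unsteady Movement
    ("distress", 3),  # Distress Posture
    ("walk", 4),      # Normal Walking
    ("sit", 5),       # Normal Sitting
    ("sleep", 7),     # Resting or Sleeping
]

def auto_label(description: str) -> int:
    """Return the first matching class ID, defaulting to Normal Daily Activity (6)."""
    text = description.lower()
    for keyword, class_id in RULES:
        if keyword in text:
            return class_id
    return 6
```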
- Upload -- Upload video files
- Convert -- Transcode to MP4 via ffmpeg
- Annotate -- Label videos (freeform QA, MCQ, or safety classification)
- Dataset Builder -- Scan external datasets, merge with custom annotations, stratified train/test split, class distribution chart
- Post-train -- Train with Full SFT (cosmos-rl), LoRA, or QLoRA (TRL/PEFT). Merge adapter into base model for standalone inference.
- Evaluate -- Compare base vs fine-tuned model accuracy
- Export to HF -- Upload merged model to Hugging Face Hub
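The Dataset Builder's stratified split keeps each class's train/test ratio roughly equal, which matters here because fall classes are far rarer than normal-activity classes. A minimal stdlib-only sketch of the idea (function name and 80/20 ratio are illustrative; the app's implementation is in src/dataset_manager.py):

```python
import random
from collections import defaultdict

def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split {"video", "class_id"} records so each class keeps ~test_ratio in test."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample["class_id"]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for class_id, group in by_class.items():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_ratio))  # at least 1 per class
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```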
| Method | Backend | VRAM | Output |
|---|---|---|---|
| Full SFT | cosmos-rl | High (multi-GPU) | Full checkpoint |
| LoRA | TRL + PEFT | Medium | adapter/ + merged/ |
| QLoRA | TRL + PEFT + BitsAndBytes | Low (~5GB) | adapter/ + merged/ |
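The VRAM gap between the rows comes from what gets trained: LoRA/QLoRA freeze the base weights and learn only a low-rank update W + (alpha/r)·B·A, which the app later merges into the base model. A numpy sketch of that adapter math (hidden size and rank are illustrative, not the app's hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16          # hidden size, adapter rank, scaling factor

W = rng.standard_normal((d, d))  # frozen base weight (not trained in LoRA)
A = rng.standard_normal((r, d))  # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection, zero-initialized

# Effective weight after merging the adapter (what the merged/ output contains)
W_merged = W + (alpha / r) * B @ A

# Trainable parameters drop from d*d to 2*d*r
full_params = d * d              # 4096
lora_params = 2 * d * r          # 1024
```

QLoRA additionally stores the frozen W in 4-bit precision (via BitsAndBytes), which is what pushes the footprint down to the ~5 GB figure above.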
```
post-train-app/
├── app.py                       # Streamlit app
├── run.sh                       # One-command setup (deps + datasets + cosmos libs)
├── download_datasets.sh         # Dataset downloader
├── requirements.txt
├── src/
│   ├── dataset_manager.py       # Dataset scanning, LLaVA conversion, merge, split
│   ├── train_trl.py             # LoRA/QLoRA training script (subprocess)
│   ├── post_train_cosmosrl.py   # cosmos-rl SFT + TRL subprocess launchers
│   ├── llava_builder.py         # LLaVA format builders (MCQ, freeform, AngelCare)
│   ├── paths.py                 # App directory structure
│   ├── annotations.py           # Annotation persistence
│   ├── video_convert.py         # ffmpeg conversion
│   ├── evaluate_cosmos.py       # Evaluation runner
│   └── hf_export.py             # HuggingFace upload
├── templates/
│   ├── sft_template.config.toml
│   └── eval_config.yaml
├── datasets/                    # Downloaded by download_datasets.sh (gitignored)
└── tmp/                         # Runtime artifacts (gitignored)
```
| ID | Label | Risk |
|---|---|---|
| 0 | Fall Detected | CRITICAL |
| 1 | Immobility Alert | HIGH |
| 2 | Unsteady Movement | MEDIUM |
| 3 | Distress Posture | HIGH |
| 4 | Normal Walking | SAFE |
| 5 | Normal Sitting | SAFE |
| 6 | Normal Daily Activity | SAFE |
| 7 | Resting or Sleeping | SAFE |
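Annotations are serialized as LLaVA-format conversation records (built by src/llava_builder.py). A hedged sketch of what one safety-classification sample might look like, using the common LLaVA conversation schema (the app's exact field names and prompt wording may differ):

```python
# Illustrative record in the common LLaVA conversation schema;
# see src/llava_builder.py for the app's actual builders.
def build_safety_sample(video_path: str, class_id: int, label: str) -> dict:
    return {
        "video": video_path,
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nClassify the safety status of the person (0-7).",
            },
            {"from": "gpt", "value": f"{class_id}: {label}"},
        ],
    }
```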
```bibtex
@article{alam2024,
  title={GMDCSA24: A Dataset for Human Fall Detection in Videos},
  author={Alam, Ekram and Sufian, Abu and Dutta, Paramartha and Leo, Marco},
  journal={Data in Brief},
  year={2024}
}

@data{DVN/75QPKK,
  title={FallVision: A benchmark video dataset for fall detection},
  publisher={Harvard Dataverse},
  doi={10.7910/DVN/75QPKK}
}
```