Streamlit app for fine-tuning NVIDIA Cosmos Reason 2 on elder care safety classification. Supports Full SFT, LoRA, and QLoRA training methods with built-in dataset management.
Create a GPU VM on Nebius AI Cloud with the following specs:
| Setting | QLoRA / LoRA (recommended) | Full SFT |
|---|---|---|
| GPU | 1x H100 80GB or 1x H200 141GB | 8x H100 or 8x H200 |
| Preset | 1gpu-16vcpu-200gb | 8gpu-128vcpu-1600gb |
| vCPUs | 16 | 128 |
| RAM | 200 GiB | 1600 GiB |
| Boot disk | 200 GiB | 200 GiB |
| OS | Ubuntu 22.04 LTS (with CUDA) | Ubuntu 22.04 LTS (with CUDA) |
QLoRA on the 2B Cosmos Reason 2 model fits in ~5 GB of VRAM, so a single H100 is more than enough. Full SFT via cosmos-rl needs multi-GPU parallelism (tp_size + dp_shard_size).
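For Full SFT, the parallel layout is set in the training config. A hedged sketch of the relevant section, assuming a cosmos-rl-style TOML layout like templates/sft_template.config.toml (the exact section and key names in your template may differ):

```toml
# Hypothetical excerpt -- check templates/sft_template.config.toml for the real keys.
[policy.parallelism]
tp_size = 4        # tensor parallelism: each layer split across 4 GPUs
dp_shard_size = 2  # data-parallel sharding across the remaining GPUs
# tp_size * dp_shard_size should match the GPU count (8x H100/H200 above)
```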
SSH into your Nebius instance, then:
```bash
git clone <repo-url> && cd post-train-app
chmod +x run.sh download_datasets.sh
./run.sh                    # installs deps, downloads datasets, clones cosmos libs
source .venv/bin/activate
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
```

Or step by step:
```bash
pip install -r requirements.txt   # Python deps
./download_datasets.sh            # GMDC + Harvard FallVision datasets
streamlit run app.py
```

Downloaded automatically by download_datasets.sh into datasets/:
| Dataset | Videos | Source | Auto-label |
|---|---|---|---|
| GMDC-SA24 | 160 | GitHub / Zenodo | CSV descriptions -> 8 classes |
| Harvard FallVision | 200+ | Harvard Dataverse | All fall (class 0) |
| Personal clips | - | Manual | bad/ = fall, good/ = daily |
| Custom | - | Upload in app | Manual annotation |
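The GMDC-SA24 auto-labeling maps free-text CSV descriptions onto the eight safety classes. A minimal keyword-based sketch of that idea (the keyword table and function name here are illustrative, not the app's actual rules in src/dataset_manager.py):

```python
# Illustrative keyword rules; the real mapping lives in src/dataset_manager.py.
RULES = [
    ("fall", 0),      # Fall Detected
    ("immobile", 1),  # Immobility Alert
    ("unsteady", 2),  # Unsteady Movement
    ("distress", 3),  # Distress Posture
    ("walk", 4),      # Normal Walking
    ("sit", 5),       # Normal Sitting
    ("sleep", 7),     # Resting or Sleeping
]

def auto_label(description: str) -> int:
    """Return the first matching class ID, defaulting to Normal Daily Activity (6)."""
    text = description.lower()
    for keyword, class_id in RULES:
        if keyword in text:
            return class_id
    return 6
```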
- Upload -- Upload video files
- Convert -- Transcode to MP4 via ffmpeg
- Annotate -- Label videos (freeform QA, MCQ, or safety classification)
- Dataset Builder -- Scan external datasets, merge with custom annotations, stratified train/test split, class distribution chart
- Post-train -- Train with Full SFT (cosmos-rl), LoRA, or QLoRA (TRL/PEFT). Merge adapter into base model for standalone inference.
- Evaluate -- Compare base vs fine-tuned model accuracy
- Export to HF -- Upload merged model to Hugging Face Hub
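The Dataset Builder's stratified split keeps each class's train/test ratio roughly equal, which matters here because fall classes are far rarer than normal-activity classes. A minimal stdlib-only sketch of the idea (function name and 80/20 ratio are illustrative; the app's implementation is in src/dataset_manager.py):

```python
import random
from collections import defaultdict

def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split {"video", "class_id"} records so each class keeps ~test_ratio in test."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample["class_id"]].append(sample)
    rng = random.Random(seed)
    train, test = [], []
    for class_id, group in by_class.items():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_ratio))  # at least 1 per class
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```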
| Method | Backend | VRAM | Output |
|---|---|---|---|
| Full SFT | cosmos-rl | High (multi-GPU) | Full checkpoint |
| LoRA | TRL + PEFT | Medium | adapter/ + merged/ |
| QLoRA | TRL + PEFT + BitsAndBytes | Low (~5GB) | adapter/ + merged/ |
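The VRAM gap between the rows comes from what gets trained: LoRA/QLoRA freeze the base weights and learn only a low-rank update W + (alpha/r)·B·A, which the app later merges into the base model. A numpy sketch of that adapter math (hidden size and rank are illustrative, not the app's hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16          # hidden size, adapter rank, scaling factor

W = rng.standard_normal((d, d))  # frozen base weight (not trained in LoRA)
A = rng.standard_normal((r, d))  # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection, zero-initialized

# Effective weight after merging the adapter (what the merged/ output contains)
W_merged = W + (alpha / r) * B @ A

# Trainable parameters drop from d*d to 2*d*r
full_params = d * d              # 4096
lora_params = 2 * d * r          # 1024
```

QLoRA additionally stores the frozen W in 4-bit precision (via BitsAndBytes), which is what pushes the footprint down to the ~5 GB figure above.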
```
post-train-app/
├── app.py                       # Streamlit app
├── run.sh                       # One-command setup (deps + datasets + cosmos libs)
├── download_datasets.sh         # Dataset downloader
├── requirements.txt
├── src/
│   ├── dataset_manager.py       # Dataset scanning, LLaVA conversion, merge, split
│   ├── train_trl.py             # LoRA/QLoRA training script (subprocess)
│   ├── post_train_cosmosrl.py   # cosmos-rl SFT + TRL subprocess launchers
│   ├── llava_builder.py         # LLaVA format builders (MCQ, freeform, AngelCare)
│   ├── paths.py                 # App directory structure
│   ├── annotations.py           # Annotation persistence
│   ├── video_convert.py         # ffmpeg conversion
│   ├── evaluate_cosmos.py       # Evaluation runner
│   └── hf_export.py             # HuggingFace upload
├── templates/
│   ├── sft_template.config.toml
│   └── eval_config.yaml
├── datasets/                    # Downloaded by download_datasets.sh (gitignored)
└── tmp/                         # Runtime artifacts (gitignored)
```
| ID | Label | Risk |
|---|---|---|
| 0 | Fall Detected | CRITICAL |
| 1 | Immobility Alert | HIGH |
| 2 | Unsteady Movement | MEDIUM |
| 3 | Distress Posture | HIGH |
| 4 | Normal Walking | SAFE |
| 5 | Normal Sitting | SAFE |
| 6 | Normal Daily Activity | SAFE |
| 7 | Resting or Sleeping | SAFE |
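Annotations are serialized as LLaVA-format conversation records (built by src/llava_builder.py). A hedged sketch of what one safety-classification sample might look like, using the common LLaVA conversation schema (the app's exact field names and prompt wording may differ):

```python
# Illustrative record in the common LLaVA conversation schema;
# see src/llava_builder.py for the app's actual builders.
def build_safety_sample(video_path: str, class_id: int, label: str) -> dict:
    return {
        "video": video_path,
        "conversations": [
            {
                "from": "human",
                "value": "<video>\nClassify the safety status of the person (0-7).",
            },
            {"from": "gpt", "value": f"{class_id}: {label}"},
        ],
    }
```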
```bibtex
@article{alam2024,
  title={GMDCSA24: A Dataset for Human Fall Detection in Videos},
  author={Alam, Ekram and Sufian, Abu and Dutta, Paramartha and Leo, Marco},
  journal={Data in Brief},
  year={2024}
}

@data{DVN/75QPKK,
  title={FallVision: A benchmark video dataset for fall detection},
  publisher={Harvard Dataverse},
  doi={10.7910/DVN/75QPKK}
}
```