Reliability-Aware Trimodal Disaster Severity Assessment Demo Package

This directory contains the self-contained demo and inference package for the Reliability-Aware Trimodal Disaster Severity Assessment project. It provides all model architectures, locked configurations, final checkpoints, and aligned test dataset features required to run the GUI demo on another machine.

1. What This Package Contains

Final Fusion B2 Model: The promoted calibrated reliability-weighted fusion model (checkpoints/fusion_b2_best_model.pt).
Branch Checkpoints: Best checkpoints for the Satellite (SiameseUNet), Social Text (BiLSTM & RoBERTa), and Social Image (ResNet34 & EfficientNet-B2) branches.
Aligned Precomputed Test Data ($N=368$): Extracted predictions, probabilities, reliability scores, and embeddings for all branches, enabling instant Quick Mode execution.
Quick Inference Script: demo_inference.py for running inference on test samples, listing test cases, and explaining B2 gating decisions.
Package Verification Script: verify_demo_package.py to check package integrity: file hashes, sizes, array shapes, and checkpoints.

2. System Overview

Our trimodal framework integrates satellite imagery and social media streams into a unified severity classifier:

```
Pre/Post Satellite Pair ────→ [SiameseUNet] ───→ Probabilities & Reliability ┐
                                                                             │
Social Text Tweet Text ─────→ [BiLSTM] ────────→ Probabilities & Reliability ├─→ [Fusion B2] ─→ Final Severity
                                                                             │   (Gated Weighting)
Social Media Image ─────────→ [EfficientNet] ──→ Probabilities & Reliability ┘
```

Modality Gating

Fusion B2 (Calibrated Reliability-Aware Fusion) projects heterogeneous branch outputs into a unified fusion space.
Each branch is weighted dynamically by its computed reliability score, sharpened by a temperature parameter ($\tau=0.5$) and combined with a focal-loss calibration layer, so that more weight is concentrated on the more reliable branches.
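
The effect of the temperature can be illustrated with a minimal sketch. It assumes the gate is a softmax over reliability scores divided by $\tau$ and, as a late-fusion simplification, averages the branch class probabilities with those weights. The function names are hypothetical, the focal-loss calibration is not modelled, and the real Fusion B2 in fusion_model.py operates in a learned fusion space, so treat this only as an illustration of how $\tau$ concentrates weight.

```python
import math


def reliability_gate(reliabilities, tau=0.5):
    """Hypothetical gate: softmax over reliability / tau.

    A lower tau sharpens the distribution, putting more weight on the most
    reliable branch; a higher tau flattens it toward equal weights.
    """
    scaled = [r / tau for r in reliabilities]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_probabilities(branch_probs, reliabilities, tau=0.5):
    """Weighted average of per-branch class probabilities (late-fusion simplification)."""
    weights = reliability_gate(reliabilities, tau)
    n_classes = len(branch_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, branch_probs))
        for c in range(n_classes)
    ]
    return fused, weights


if __name__ == "__main__":
    # Hypothetical outputs over (None, Minor, Severe) for satellite, text, image.
    probs = [[0.2, 0.3, 0.5], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
    fused, weights = fuse_probabilities(probs, reliabilities=[0.4, 0.9, 0.6])
    print("weights:", [round(w, 3) for w in weights])
    print("fused:", [round(p, 3) for p in fused])
```

Because the weights always sum to 1, the fused vector remains a valid probability distribution whenever each branch output is one.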

3. Datasets

xBD (Satellite): Overlapping pre- and post-disaster RGB tile pairs for damage segmentation.
HumAID (Social Text): Crisis-related tweets categorized by humanitarian tasks.
CrisisMMD (Social Image + Text): Geotagged tweets containing text-image pairs with severity annotations.

Note

Why Harvey and Mexico Only? The trimodal alignment is restricted to Hurricane Harvey and the Mexico Earthquake because these are the only events represented in all three otherwise disjoint source datasets (xBD, HumAID, and CrisisMMD).

4. Locked Test Set ($N=368$)

The official locked evaluation set contains 368 aligned samples:

Hurricane Harvey: 309 samples
Mexico Earthquake: 59 samples
Sample IDs are listed in config/eval_sample_ids.txt (MD5: 1f6c42d86c5630841a35c5ccc741a079, computed after normalizing line endings to LF).
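
Because the hash is defined on LF-normalized content, a file checked out with CRLF line endings (e.g. on Windows) will only match after normalization. The sketch below assumes the whole file is hashed after converting CRLF and lone CR to LF, and that the file holds one ID per line; whether a trailing newline is part of the hashed content is not stated here, so compare against config/eval_sample_ids_hash.txt or verify_demo_package.py if the result disagrees. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "1f6c42d86c5630841a35c5ccc741a079"  # value quoted in this README
EXPECTED_TOTAL = 368  # 309 Harvey + 59 Mexico


def md5_lf_normalized(data: bytes) -> str:
    """MD5 hex digest after converting CRLF and lone CR line endings to LF."""
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.md5(normalized).hexdigest()


def check_eval_ids(path="config/eval_sample_ids.txt"):
    """Return (hash_matches, n_ids) for the locked evaluation ID list."""
    data = Path(path).read_bytes()
    n_ids = sum(1 for line in data.splitlines() if line.strip())
    return md5_lf_normalized(data) == EXPECTED_MD5, n_ids
```

Run from the package root, `check_eval_ids()` should report a matching hash and 368 IDs.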

5. Canonical Metrics

The B2 model was promoted based on the following Set B canonical evaluation metrics:

| Metric | Canonical Value (Set B) | Legacy Value (Set A)* | Status vs. Promotion Gate |
|---|---|---|---|
| Accuracy | 0.8179 | 0.7989 | ✅ PASSED |
| Macro-F1 | 0.7678 | 0.7516 | ✅ PASSED (> 0.7403) |
| Weighted-F1 | 0.8153 | 0.7963 | ✅ PASSED |
| F1-None | 0.7191 | 0.6974 | ✅ PASSED |
| F1-Minor | 0.7039 | 0.6779 | ✅ PASSED |
| F1-Severe | 0.8803 | 0.8647 | ✅ PASSED |
| Harvey Macro-F1 | 0.7755 | 0.7533 | ✅ PASSED |
| Mexico Macro-F1 | 0.7096 | 0.7096 | ✅ PASSED (> 0.50) |
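
Macro-F1 is the unweighted mean of the per-class F1 scores, so the Set B Macro-F1 can be cross-checked against the three per-class rows of the table. The snippet below does that with the Set B values copied from the table; Weighted-F1 cannot be rechecked this way because it also needs the per-class sample counts, which are not listed here.

```python
# Set B per-class F1 scores, copied from the table above.
SET_B_PER_CLASS_F1 = {"None": 0.7191, "Minor": 0.7039, "Severe": 0.8803}
SET_B_MACRO_F1 = 0.7678
PROMOTION_GATE_MACRO_F1 = 0.7403


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1.values()) / len(per_class_f1)


recomputed = macro_f1(SET_B_PER_CLASS_F1)
print(f"recomputed Macro-F1: {recomputed:.4f}")
assert round(recomputed, 4) == SET_B_MACRO_F1
assert recomputed > PROMOTION_GATE_MACRO_F1
```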

Warning

Canonical Results Warning: Always use the Set B metrics for publication and demo validation. The legacy training-loop metrics (Set A: Accuracy=0.7989, Macro-F1=0.7516) are obsolete.

6. Reliability Formulas

Branch reliability scores are derived directly from model confidence and, for the image branch, annotation confidence:

Satellite Branch Reliability: $$\text{Reliability} = \text{mean\_confidence} \times (1 - \text{entropy\_norm})$$ where $\text{entropy\_norm}$ is the Shannon entropy of the pixel classification maps normalized by $\log_2(5)$, the maximum entropy over five classes, so that it lies in $[0, 1]$.
Social Text Reliability: $$\text{Reliability} = 0.5 \times \text{model\_confidence} + 0.5$$
Social Image Reliability: $$\text{Reliability} = 0.4 \times \text{annotation\_confidence} + 0.4 \times \text{model\_confidence} + 0.2$$
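
The three formulas translate directly into code. For the satellite branch the README does not spell out how the map is reduced, so this sketch assumes `prob_map` is an `(H, W, 5)` array of per-pixel softmax probabilities, that mean_confidence is the mean of the per-pixel maximum probability, and that entropy_norm is the mean per-pixel entropy in bits divided by $\log_2(5)$. The function names are hypothetical.

```python
import numpy as np

N_SAT_CLASSES = 5  # the log2(5) normalizer implies five pixel classes


def satellite_reliability(prob_map):
    """mean_confidence * (1 - entropy_norm), under the assumptions stated above."""
    prob_map = np.asarray(prob_map, dtype=np.float64)
    mean_confidence = prob_map.max(axis=-1).mean()
    safe = np.clip(prob_map, 1e-12, 1.0)  # avoid log2(0); zero probs still contribute 0
    pixel_entropy = -(prob_map * np.log2(safe)).sum(axis=-1)
    entropy_norm = pixel_entropy.mean() / np.log2(N_SAT_CLASSES)
    return float(mean_confidence * (1.0 - entropy_norm))


def text_reliability(model_confidence):
    """0.5 * model_confidence + 0.5, so the score lies in [0.5, 1]."""
    return 0.5 * model_confidence + 0.5


def image_reliability(annotation_confidence, model_confidence):
    """0.4 * annotation_confidence + 0.4 * model_confidence + 0.2, in [0.2, 1]."""
    return 0.4 * annotation_confidence + 0.4 * model_confidence + 0.2
```

A perfectly confident one-hot satellite map scores 1, while a uniform map (confidence 0.2, normalized entropy 1) scores 0. The text and image formulas have floors of 0.5 and 0.2, so those branches never receive zero reliability.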

7. Quick-Start Commands

Environment Setup

```
pip install -r requirements_demo.txt
```

Package Verification

Run the verification suite to check that all array dimensions, file hashes, and checkpoints are intact:

```
python verify_demo_package.py
```
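
One of the invariants behind the verification step is that every aligned file in test_data/ has exactly one entry per test sample. The sketch below checks only that row-count invariant. It assumes each `*_368.npy` array stores samples along its first axis and that sample_ids_368.txt holds one ID per line; it is not the logic of verify_demo_package.py, which also checks hashes and checkpoints. The function name is hypothetical.

```python
from pathlib import Path

import numpy as np

N_TEST = 368  # size of the locked test set


def check_row_counts(test_dir="test_data", n=N_TEST):
    """Return {name: problem} for test_data files whose row count is not n."""
    root = Path(test_dir)
    if not root.is_dir():
        return {str(test_dir): "missing directory"}
    problems = {}
    # Every *_368.npy array (embeddings, labels) should have one row per sample.
    for npy_path in sorted(root.glob("*_368.npy")):
        rows = np.load(npy_path, mmap_mode="r").shape[0]
        if rows != n:
            problems[npy_path.name] = f"{rows} rows, expected {n}"
    # The ID list should contain one non-empty line per sample.
    ids_path = root / "sample_ids_368.txt"
    if not ids_path.exists():
        problems[ids_path.name] = "missing"
    else:
        lines = ids_path.read_text(encoding="utf-8").splitlines()
        ids = [line for line in lines if line.strip()]
        if len(ids) != n:
            problems[ids_path.name] = f"{len(ids)} IDs, expected {n}"
    return problems
```

Called from the package root, an empty dict means every checked file has 368 entries.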

Running Inference (Quick Mode - Default)

List samples in the test set:
```
python demo_inference.py --list-samples
```
Inference by Index (0 to 367):
```
python demo_inference.py --index 0
```

Inference by Sample ID:

```
python demo_inference.py --sample-id 905930890735439873
```

Filter by Event:

```
python demo_inference.py --event hurricane_harvey --limit 5
```

Running Inference (Checkpoints Mode)

To run the Fusion B2 PyTorch forward pass from its checkpoint on the precomputed embeddings:

```
python demo_inference.py --run-fusion --index 0
```

8. GUI Integration Guide

For wrapping this package in a GUI frontend (e.g., Streamlit, Gradio, or Electron):

Default Examples: Query test_data/demo_sample_index.json to populate a select box with six hand-picked showcase samples from Harvey and Mexico, covering successes, failures, and low-reliability edge cases.
Table Views: Load test_data/aligned_fusion_test_368.csv to display the overall performance grid.
Visualization Dashboard: Plot the branch gating weights (fusion_weight_sat, fusion_weight_text, fusion_weight_image) as a pie chart or horizontal bar chart to illustrate the dynamic trimodal attention.
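
A framework-neutral way to feed the gating weights into such a chart is sketched below. It assumes aligned_fusion_test_368.csv has a header row containing the three `fusion_weight_*` columns named above, and it renders a plain-text bar chart; in Streamlit, Gradio, or Electron you would pass the same per-row dict to that framework's own chart component instead. The function names are hypothetical.

```python
import csv

# Column names as listed in this README; assumed to be present in the CSV header.
WEIGHT_COLUMNS = ("fusion_weight_sat", "fusion_weight_text", "fusion_weight_image")


def load_gating_weights(csv_path="test_data/aligned_fusion_test_368.csv"):
    """Return one {column: weight} dict per row of the aligned test CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [
            {col: float(row[col]) for col in WEIGHT_COLUMNS}
            for row in csv.DictReader(f)
        ]


def text_bar_chart(weights, width=40):
    """Render branch weights as a plain-text horizontal bar chart."""
    lines = []
    for col, w in weights.items():
        label = col.removeprefix("fusion_weight_")  # "sat", "text", "image"
        bar = "#" * round(w * width)
        lines.append(f"{label:<6}{bar:<{width}} {w:.2f}")
    return "\n".join(lines)
```

For example, `print(text_bar_chart(load_gating_weights()[0]))` shows the branch weights for test index 0.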

9. Folder Structure

```
demo_model/
├── README.md               # Presentation readme
├── requirements_demo.txt   # Demo requirements
├── fusion_model.py         # Fusion B2 model definition
├── fusion_dataset.py       # Aligned dataset loader
├── demo_inference.py       # CLI inference script
├── verify_demo_package.py  # Integrity verification script
├── social_text_baseline_b.py   # Text BiLSTM architecture
├── social_text_baseline_c.py   # Text RoBERTa architecture
├── social_image_baseline_a.py  # Image ResNet34 architecture
├── social_image_baseline_bc.py # Image EfficientNet architecture
├── models/                 # Satellite models
│   ├── siamese_unet.py
│   ├── satellite_unet_baseline.py
│   └── changeformer.py
├── config/                 # Locked config and ID files
│   ├── test_ids.json
│   ├── taxonomy_map.json
│   ├── reliability_weights.json
│   ├── blacklist_unique.json
│   ├── satellite_branch_locked.json
│   ├── branch_file_hashes.json
│   ├── eval_sample_ids.txt
│   └── eval_sample_ids_hash.txt
├── checkpoints/            # Best-model PyTorch weight checkpoints
│   ├── satellite_siamese_best_model.pt
│   ├── text_bilstm_best_model.pt
│   ├── text_roberta_best_model.pt
│   ├── image_resnet34_best_model.pt
│   ├── image_efficientnet_best_model.pt
│   └── fusion_b2_best_model.pt
├── test_data/              # Precomputed aligned test features (N=368)
│   ├── aligned_fusion_test_368.csv
│   ├── aligned_metadata_368.json
│   ├── demo_sample_index.json
│   ├── sat_embeddings_368.npy
│   ├── sat_predictions_368.csv
│   ├── text_bilstm_embeddings_368.npy
│   ├── text_bilstm_predictions_368.csv
│   ├── text_roberta_embeddings_368.npy
│   ├── text_roberta_predictions_368.csv
│   ├── image_resnet34_embeddings_368.npy
│   ├── image_resnet34_predictions_368.csv
│   ├── image_effnet_embeddings_368.npy
│   ├── image_effnet_predictions_368.csv
│   ├── labels_368.npy
│   └── sample_ids_368.txt
├── raw_branch_exports/     # Complete branch outputs for auditability
└── logs/                   # Verification logs
    ├── file_hashes.json
    ├── missing_files.json
    └── verification_report.md
```
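
The `--sample-id` option implies a mapping from tweet ID to row position in the aligned arrays. The sketch below shows one way a GUI could build that mapping itself. It assumes test_data/sample_ids_368.txt lists one ID per line in the same order as the rows of the `*_368.npy` files; the function names are hypothetical and not part of demo_inference.py.

```python
from pathlib import Path


def load_sample_ids(path="test_data/sample_ids_368.txt"):
    """Read one sample ID per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_id_index(sample_ids):
    """Map each sample ID (as a string) to its row position."""
    return {sid: row for row, sid in enumerate(sample_ids)}


def row_for(sample_id, id_index):
    """Return the row index for sample_id, accepting str or int IDs."""
    try:
        return id_index[str(sample_id)]
    except KeyError:
        raise KeyError(f"{sample_id} is not in the locked test set") from None
```

Under that ordering assumption, `row = row_for(905930890735439873, build_id_index(load_sample_ids()))` gives the row to read from, for example, `np.load("test_data/text_roberta_embeddings_368.npy")`.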

10. Limitations

Small Test Set: Trimodal overlap limits the evaluation test set to 368 samples.
Geographic Domain Gap: Satellite damage detection fails on the Mexico Earthquake because training is dominated by flood-damaged Harvey tiles (flood textures differ from building collapses).
Quick Mode Dependency: Quick Mode relies on precomputed arrays; real-time feature extraction on new raw tweets or images would require the heavy encoders (RoBERTa, EfficientNet) that Quick Mode bypasses.
