Hi CompVis team,
While auditing the PyTorch code in depthfm/dfm.py, I found two math/logic bugs: one can cause silent NaN propagation, and the other can heavily bias the ensemble output.
I have included the exact line numbers, the fixes, and a zero-dependency pure-Python reproducer script below.
1. Division by zero / NaN propagation in per_sample_min_max_normalization
Location: depthfm/dfm.py line 156
If the model produces a perfectly uniform depth prediction for an image (e.g., a featureless sky, a solid color, or a temporary model collapse during fine-tuning), max_val equals min_val and the denominator (max_val - min_val) becomes 0. The resulting 0/0 division fills the tensor with NaNs. PyTorch raises no error, so the NaNs propagate silently and corrupt or crash downstream operations.
Fix: Add a small epsilon guard to the division.
# depthfm/dfm.py
- x_ = (x_ - min_val) / (max_val - min_val)
+ x_ = (x_ - min_val) / (max_val - min_val + 1e-8)
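The effect of the guard can be sketched in pure Python. This is a minimal illustration, not the DepthFM code: `min_max_normalize` is a hypothetical stand-in that works on a flat list of floats, and the `eps` parameter is an assumption added for the demo. Note that pure Python raises `ZeroDivisionError` on the unguarded division, whereas a float tensor would silently produce NaN.

```python
def min_max_normalize(values, eps=0.0):
    """Scale a flat list of floats to [0, 1]; eps guards the denominator."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


uniform = [3.0, 3.0, 3.0]  # stand-in for a perfectly uniform depth map

try:
    min_max_normalize(uniform)
    unguarded_failed = False
except ZeroDivisionError:
    # Pure Python raises here; a float tensor would yield NaN instead.
    unguarded_failed = True

# With the epsilon guard, the uniform input maps to all zeros.
guarded = min_max_normalize(uniform, eps=1e-8)
print(unguarded_failed, guarded)  # True [0.0, 0.0, 0.0]
```

With the guard, a uniform prediction normalizes to all zeros instead of NaN, while non-uniform inputs change only by a negligible amount.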
2. Scale Bias in Ensemble Averaging
Location: depthfm/dfm.py lines 86-90
In the forward() method, the averaging over ensemble members (when ensemble_size > 1) happens before normalization:
if ensemble_size > 1:
    depth = depth.mean(dim=0, keepdim=True)
depth = per_sample_min_max_normalization(depth.exp())
Latent diffusion outputs can have very different global logit scales depending on the noise seed. Because the code averages the raw ensemble outputs first and only then exponentiates and normalizes the result to [0, 1], whichever ensemble member happens to produce the largest raw logits can dominate the mean. This undermines the variance reduction that ensembling is meant to provide.
Fix: Exponentiate and normalize the ensemble members individually, then average them.
# depthfm/dfm.py
- if ensemble_size > 1:
-     depth = depth.mean(dim=0, keepdim=True)
- depth = per_sample_min_max_normalization(depth.exp())
+ depth = per_sample_min_max_normalization(depth.exp())
+ if ensemble_size > 1:
+     depth = depth.mean(dim=0, keepdim=True)
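The difference between the two orderings can be sketched with toy numbers in pure Python. Everything here is an assumption for illustration: the two three-pixel "ensemble members" are made-up values, and `min_max_normalize` and `average` are hypothetical helpers, not functions from depthfm/dfm.py.

```python
import math


def min_max_normalize(values):
    """Scale a flat list of floats to [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def average(members):
    """Element-wise mean across ensemble members."""
    return [sum(px) / len(members) for px in zip(*members)]


# Two hypothetical ensemble members: raw (log-space) values for 3 pixels.
member_a = [0.0, 1.0, 2.0]  # small scale
member_b = [0.0, 5.0, 1.0]  # much larger scale, different structure
ensemble = [member_a, member_b]

# Current order: average raw outputs, then exponentiate and normalize.
current = min_max_normalize([math.exp(v) for v in average(ensemble)])

# Proposed order: exponentiate and normalize each member, then average.
proposed = average(
    [min_max_normalize([math.exp(v) for v in m]) for m in ensemble]
)

print([round(v, 3) for v in current])   # [0.0, 1.0, 0.182]
print([round(v, 3) for v in proposed])  # [0.0, 0.634, 0.506]
```

In this toy case the current ordering ranks the pixels almost exactly as the large-scale member does (the third pixel ends up near 0.18 even though the other member puts it at 1.0), while the proposed ordering gives both members equal weight.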
🐍 Pure Python Reproducer
You can run this zero-dependency script to verify both mathematical flaws instantly:
Click to expand test_bugs.py
Let me know if you would like me to open a Pull Request with these fixes!
Best regards, Sivaaditya Panuganti