IGiotto12/SlotPhys
SlotPhys: Object-Centric Physical Reasoning via DINOv2 Slot Attention

Unsupervised object discovery, trajectory modelling, and collision prediction from raw CLEVRER video — no annotated masks, no supervised detectors.


Pipeline

Raw CLEVRER Video (480×320, ~128 frames @ 25 fps)
    │
    ├─ [P1 Perception] DINOv2 slot attention (unsupervised)
    │       → K slot centroids (y, x) + velocities (vy, vx) per frame
    │
    ├─ [P2 Dynamics] NRI-style GNN over slot state sequences
    │       → Multi-step trajectory rollout (T_pred steps)
    │
    └─ [Collision Head] Pairwise MLP over predicted trajectories
            → Collision probability per object pair
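The P1 stage follows the standard slot-attention recipe: slots compete for encoder patch features via a softmax taken over the slot axis, so each patch is distributed among slots. A minimal sketch of one attention iteration, assuming generic projection layers (this is illustrative, not the repo's actual `SlotAttention` module):

```python
import torch

def slot_attention_step(slots, inputs, q_proj, k_proj, v_proj, eps=1e-8):
    """One slot-attention iteration: slots compete for input patches.

    slots:  (B, K, D) current slot states
    inputs: (B, N, D) encoder patch features (e.g. DINOv2 tokens)
    """
    D = slots.shape[-1]
    q = q_proj(slots)   # (B, K, D)
    k = k_proj(inputs)  # (B, N, D)
    v = v_proj(inputs)  # (B, N, D)
    logits = torch.einsum("bkd,bnd->bkn", q, k) / D ** 0.5
    # softmax over the SLOT axis: patches are divided among slots
    attn = logits.softmax(dim=1) + eps              # (B, K, N)
    attn = attn / attn.sum(dim=-1, keepdim=True)    # weighted mean over patches
    updates = torch.einsum("bkn,bnd->bkd", attn, v)
    return updates, attn
```

In the full model this step is iterated a few times per frame, and the per-slot attention maps are what later yield centroids and velocities.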

Experiment matrix:

| Condition            | Perception                 | States       | GNN Rollout MSE | Slot-CV MSE          |
|----------------------|----------------------------|--------------|-----------------|----------------------|
| E1 (oracle)          | —                          | GT           | 1.567           | 105.87               |
| E2 (P1-Recon)        | DINOv2 slots               | Slot-derived | 67.71           | 50.19                |
| E2+ (P1-Motion)      | DINOv2 slots + motion loss | Slot-derived | 40.08           | 21.24                |
| E3 (P1-JEPA)         | DINOv2 slots + JEPA        | Slot-derived | 39.64           | 26.91                |
| P1-SAM2 (reference)  | SAM2 video seg             | Slot-derived | —               | 8.25 (tracking MSE)  |

Data Setup

CLEVRER dataset layout:

data/
  clevrer/
    videos_train/video_00000-01000/video_00000.mp4 ...
    videos_validation/video_10000-11000/video_10000.mp4 ...
    annotations_train/annotation_00000.json ...
    annotations_validation/annotation_10000.json ...
  clevrer_states/val/state_video_XXXXX.pt   # GT states (centers, velocities)
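The `.pt` state files can be read with `torch.load`. A small loader sketch; the key names (`centers`, `velocities`) and shapes are assumptions inferred from the comment above, so verify them against your own files:

```python
import torch

def load_gt_state(path):
    """Load a cached ground-truth state file.

    Assumed schema (not verified against the repo): a dict with
    'centers' of shape (T, K, 2) in (y, x) order and
    'velocities' of shape (T, K, 2) as (vy, vx).
    """
    state = torch.load(path, map_location="cpu")
    return state["centers"], state["velocities"]
```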

Pre-cache DINOv2 features (speeds up P1 training ~5×):

python scripts/precompute_dino_features.py \
    --video_root data/clevrer/videos_train --out_dir data/clevrer_dino_cache_train
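The cache amounts to running the frozen DINOv2 backbone once per frame and saving the patch tokens. A sketch of the core step, assuming the standard `torch.hub` DINOv2 API (`forward_features(...)["x_norm_patchtokens"]`); the repo's script presumably also handles batching and resizing:

```python
import torch

@torch.no_grad()
def cache_dino_patches(model, frames, out_path):
    """Save per-frame DINOv2 patch tokens to disk.

    frames: (T, 3, H, W) with H, W divisible by the ViT patch size (14).
    """
    feats = model.forward_features(frames)["x_norm_patchtokens"]  # (T, N, D)
    torch.save(feats.cpu(), out_path)
    return tuple(feats.shape)

# Example (downloads weights on first run):
# model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
```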

Training

P1 Perception (frame-level slot attention)

python scripts/train_slots_clevrer.py --config configs/p1_frame_slots_feature_v3.yaml

Key config options: motion_weight=3.0 (P1-Motion), num_slots=7, encoder_type=dinov2. Checkpoint → checkpoints/slots_clevrer_p1B_v3/best_slots.pt

Extract P1 slot states

python scripts/prepare_p1_slot_states.py \
    --p1_ckpt checkpoints/slots_clevrer_p1B_v3/best_slots.pt \
    --video_root data/clevrer/videos_validation \
    --output_root data/clevrer_p1B_v3_states_val
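One plausible way to turn per-frame slot attention maps into the (y, x) centroids and (vy, vx) velocities the GNN consumes is a soft argmax over the patch grid plus forward differences. This is a sketch of that idea, not necessarily what `prepare_p1_slot_states.py` does:

```python
import torch

def slots_to_states(attn, grid_hw):
    """Convert slot attention masks to centroid + velocity states.

    attn: (T, K, N) attention over N = H*W patches. Centroids are the
    attention-weighted mean grid position, in patch units; velocities are
    forward differences between consecutive frames.
    """
    H, W = grid_hw
    ys = torch.arange(H).float().repeat_interleave(W)   # (N,) row index per patch
    xs = torch.arange(W).float().repeat(H)              # (N,) column index per patch
    w = attn / attn.sum(-1, keepdim=True).clamp_min(1e-8)
    cy = torch.einsum("tkn,n->tk", w, ys)
    cx = torch.einsum("tkn,n->tk", w, xs)
    centers = torch.stack([cy, cx], dim=-1)             # (T, K, 2)
    vel = torch.zeros_like(centers)
    vel[1:] = centers[1:] - centers[:-1]                # (vy, vx)
    return centers, vel
```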

P2 GNN Dynamics

# Oracle (GT states)
python scripts/train_gnn_dynamics_clevrer.py --config configs/gnn_dynamics_gt.yaml

# Slot-derived states
python scripts/train_p2_gnn_from_p1.py \
    --p1_states_root data/clevrer_p1B_v3_states_train \
    --out_dir checkpoints/gnn_dynamics_p1B_v3
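The NRI-style dynamics model passes messages over all slot pairs and predicts a per-slot state delta, which is then rolled out autoregressively for multi-step prediction. A compact sketch of that pattern, assuming a 4-d state (y, x, vy, vx); the repo's `NRIDynamics` with its encoder/decoder split is more elaborate:

```python
import torch
import torch.nn as nn

class PairwiseDynamics(nn.Module):
    """One-step dynamics over K object states via pairwise messages."""

    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, state_dim))

    def forward(self, x):                      # x: (B, K, state_dim)
        B, K, D = x.shape
        src = x.unsqueeze(2).expand(B, K, K, D)
        dst = x.unsqueeze(1).expand(B, K, K, D)
        msgs = self.edge_mlp(torch.cat([src, dst], dim=-1))  # (B, K, K, H)
        agg = msgs.sum(dim=1)                  # incoming messages per node
        return x + self.node_mlp(torch.cat([x, agg], dim=-1))

def rollout(model, x0, steps):
    """Autoregressive multi-step prediction from initial states x0."""
    traj = [x0]
    for _ in range(steps):
        traj.append(model(traj[-1]))
    return torch.stack(traj, dim=1)            # (B, steps + 1, K, D)
```

Training minimizes rollout MSE against the (GT or slot-derived) state sequences.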

Collision Head

python scripts/train_collision_head_clevrer.py --config configs/collision_head.yaml
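The collision head scores every ordered object pair from the predicted trajectories. A minimal sketch that flattens each object's predicted center track and feeds pair concatenations through an MLP; the repo's `CollisionHead` may use richer features (e.g. relative distances):

```python
import torch
import torch.nn as nn

class PairCollisionHead(nn.Module):
    """Collision probability per object pair from predicted trajectories."""

    def __init__(self, t_pred, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * t_pred * 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, traj):                    # traj: (B, T, K, 2) centers
        B, T, K, _ = traj.shape
        per_obj = traj.permute(0, 2, 1, 3).reshape(B, K, T * 2)
        a = per_obj.unsqueeze(2).expand(B, K, K, T * 2)
        b = per_obj.unsqueeze(1).expand(B, K, K, T * 2)
        logits = self.mlp(torch.cat([a, b], dim=-1)).squeeze(-1)  # (B, K, K)
        return torch.sigmoid(logits)
```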

Evaluation

python scripts/evaluate_pipeline.py \
    --slots_ckpt checkpoints/slots_clevrer_p1B_v3/best_slots.pt \
    --p1_gnn_ckpt checkpoints/gnn_dynamics_p1B_v3/best_gnn_dynamics_p1.pt \
    --p1_states_root data/clevrer_p1B_v3_states_val \
    --val_videos_root data/clevrer/videos_validation \
    --val_states_root data/clevrer_states/val \
    --max_videos 200 \
    --output outputs/eval_p1B_v3_final.json

Metrics: recon_mse, tracking_traj_mse (Hungarian-matched, patch units), vel_error, gnn_rollout_mse, constant_vel_baseline, collision P/R/F1.
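Because slots are unordered, the tracking metric first matches slots to ground-truth objects before computing MSE. A sketch of Hungarian-matched trajectory MSE with `scipy.optimize.linear_sum_assignment`, consistent with the metric's description above (the repo's implementation may normalise differently):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_traj_mse(pred, gt):
    """Trajectory MSE under the best slot-to-object assignment.

    pred, gt: (T, K, 2) centroid tracks in patch units.
    """
    # cost[i, j] = mean squared distance between slot i and GT object j
    diff = pred[:, :, None, :] - gt[:, None, :, :]     # (T, K, K, 2)
    cost = (diff ** 2).sum(-1).mean(0)                 # (K, K)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()
```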


Repository Structure

models/
  slots.py            - SlotAttentionAutoencoder, SlotAttention, slot_grounding_loss
  video_slots.py      - SAViModel, RecurrentSlotAttention, SlotPredictor
  frame_encoder.py    - FrameObjectEncoder (DINOv2 + slot attention, P1)
  gnn_dynamics.py     - NRIEncoder, NRIDecoder, NRIDynamics
  collision_head.py   - CollisionHead, GeometricCollisionBaseline

metrics/
  slots_clevrer.py    - Centroid extraction, Hungarian matching, traj/vel MSE
  ari.py              - ARI excluding background
  consistency.py      - velocity_error, trajectory_mse, rollout_mse

scripts/
  train_slots_clevrer.py          - P1 frame slot training
  train_video_slots_clevrer.py    - SAVi video slot training
  train_gnn_dynamics_clevrer.py   - GNN dynamics (GT states)
  train_p2_gnn_from_p1.py         - GNN dynamics (P1 slot states)
  train_collision_head_clevrer.py - Collision head training
  prepare_p1_slot_states.py       - Extract P1 states for GNN training
  prepare_p1_sam2_states.py       - SAM2 video segmentation baseline
  precompute_dino_features.py     - Cache DINOv2 features to disk
  evaluate_pipeline.py            - End-to-end evaluation
  eval_sam2_tracking.py           - SAM2 tracking MSE evaluation
  visualize_example.py            - Single-video slot/trajectory visualization

configs/
  p1_frame_slots_feature_v3.yaml  - P1-Motion (motion-weighted, best)
  p1_frame_slots_feature_sc.yaml  - P1-SC (InfoNCE contrastive)
  gnn_dynamics_gt.yaml            - GNN on GT states (E1 oracle)
  gnn_dynamics_slots.yaml         - GNN on slot states (E2/E3)
  collision_head.yaml             - Collision head

data/
  clevrer_config.py   - Resolution constants (CLEVRER_H=320, CROP_H=CROP_W=320)

Requirements

pip install torch torchvision numpy scipy pyyaml pillow opencv-python

DINOv2 weights are downloaded automatically via torch.hub on first run. The SAM2 baseline additionally requires: pip install git+https://github.com/facebookresearch/sam2.git

About

Project submission for Cogs181 - Jiaqi Wu
