Hi, thanks for the great work!
When selecting features for each object instance, the original 3D points are projected to 640×480 image coordinates, but the processor resizes and center-crops the RGB image to 224×224 before DINOv2 feature extraction (resulting in a 16×16 patch grid).
Could this cause misalignment when directly using the original (x, y) coordinates to select instance features? Should the same resize and crop be applied to (x, y) before mapping to patch indices?
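
To make the question concrete, here is a minimal sketch of the mapping I would expect. It is not taken from the repo: the function name `pixel_to_patch` is hypothetical, and the intermediate resize to a shortest edge of 256 and the patch size of 14 are my assumptions about the processor config (224 / 14 gives the 16×16 grid). Sub-pixel conventions (pixel centers vs. corners) are ignored.

```python
from typing import Optional, Tuple


def pixel_to_patch(
    x: float,
    y: float,
    orig_w: int = 640,
    orig_h: int = 480,
    shortest_edge: int = 256,  # assumed resize target, check the processor config
    crop: int = 224,
    patch: int = 14,           # assumed patch size (224 / 14 = 16)
) -> Optional[Tuple[int, int, int]]:
    """Map an (x, y) pixel in the original image to (row, col, flat_index)
    in the patch grid, or None if the point is removed by the center crop."""
    # 1. Resize so the shorter side equals `shortest_edge`, keeping aspect ratio.
    if orig_w <= orig_h:
        new_w, new_h = shortest_edge, orig_h * shortest_edge // orig_w
    else:
        new_w, new_h = orig_w * shortest_edge // orig_h, shortest_edge

    # 2. Offsets of the center crop inside the resized image.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2

    # 3. Apply the same resize and crop to the point.
    xc = x * new_w / orig_w - left
    yc = y * new_h / orig_h - top
    if not (0 <= xc < crop and 0 <= yc < crop):
        return None  # the point lies outside the cropped region

    # 4. Convert cropped pixel coordinates to patch indices.
    grid = crop // patch
    row, col = int(yc // patch), int(xc // patch)
    return row, col, row * grid + col


print(pixel_to_patch(200, 100))
print(pixel_to_patch(0, 0))
```

Under these assumptions, the point (200, 100) lands in patch (row 2, col 3), whereas scaling the original coordinates straight onto a 16×16 grid would give (row 3, col 5). Points near the left and right borders of the 640×480 image return `None` because the center crop removes them.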