Dear StreamVLN authors,
Thank you very much for your excellent work and open-sourcing the StreamVLN project! It is really helpful for my research on vision-language navigation.
I have a small question about the camera intrinsic parameters defined in http_realworld_server.py (lines 166-169):
"camera_intrinsic": np.array([[192., 0.,   191.42857143, 0.],
                              [0.,   192., 191.42857143, 0.],
                              [0.,   0.,   1.,           0.],
                              [0.,   0.,   0.,           1.]]),
Could you please clarify the origin of these values? Specifically:
1. Are these values based on a 384×384 resolution with 90° HFOV in Habitat?
2. How was this camera intrinsic matrix derived?
3. Does this matrix describe the RGB sensor configuration in the Habitat-sim simulator, or the intrinsic parameters of the Intel RealSense D400 series camera used for real-world deployment?
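
For context, here is a minimal sketch of how I would expect an ideal pinhole intrinsic matrix to be computed from a resolution and a horizontal FOV. The helper name `pinhole_intrinsic`, the square-pixel assumption, and the centered principal point are my own assumptions, not taken from the StreamVLN code. With 384×384 and 90° HFOV it gives a focal length of about 192, which matches the matrix above, but a principal point of 192 rather than 191.42857143, which is part of why I am asking.

```python
import math


def pinhole_intrinsic(width, height, hfov_deg):
    """Hypothetical helper: 4x4 intrinsic matrix for an ideal pinhole camera.

    Assumptions (mine, not from StreamVLN): square pixels (fy == fx) and a
    principal point at the exact image center.
    """
    # Focal length in pixels from the horizontal field of view.
    fx = width / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))
    fy = fx  # assumption: square pixels
    cx = width / 2.0  # assumption: principal point at image center
    cy = height / 2.0
    return [
        [fx, 0.0, cx, 0.0],
        [0.0, fy, cy, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Guessed configuration from question 1: 384x384 image, 90 degree HFOV.
K = pinhole_intrinsic(384, 384, 90.0)
for row in K:
    print(row)
```

The nested list can be wrapped in `np.array(...)` to match the format used in the server file.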