This repository contains a navigation model that predicts short waypoint trajectories from multimodal observations. The main model used here is ViLiNT, a diffusion-based policy conditioned on image context, 3D observations, robot state, and a goal direction.
For environment and installation instructions, see SETUP.md.
ViLiNT takes as input:
- a short history of RGB images,
- a 3D observation stream,
- a robot's embodiment vector,
- a goal direction in the robot frame.
It outputs a short trajectory of future position waypoints that can then be converted into velocity commands by a PD controller.
- Image encoder: DUNEViT.
- 3D encoder: point cloud encoder PointTransformerV3.
The 3D input is first encoded by the point-cloud backbone. Then it is converted into a small set of LiDAR tokens with a tokenizer that groups the scene into angular sectors and radial rings. These 3D tokens are concatenated with the image-history tokens, physics token, and goal token before multimodal fusion with the transformer.
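To make the sector-and-ring grouping concrete, here is a minimal sketch of how points could be assigned to polar bins. It is not the repository's tokenizer: the helper names `polar_bin_index` and `group_points`, the bin counts, and the maximum range are illustrative assumptions, and the real tokenizer works on encoded point features rather than raw coordinates.

```python
import math


def polar_bin_index(x, y, num_sectors=16, num_rings=4, max_range=20.0):
    """Return (sector, ring) for a robot-frame point, or None if out of range.

    Hypothetical helper: the bin counts and max_range are illustrative values.
    """
    r = math.hypot(x, y)
    if r >= max_range:
        return None
    # Map the angle from [-pi, pi] to [0, 2*pi) and split it into equal sectors.
    theta = math.atan2(y, x) % (2.0 * math.pi)
    sector = min(int(theta / (2.0 * math.pi) * num_sectors), num_sectors - 1)
    # Split the radial distance into equal-width rings.
    ring = min(int(r / max_range * num_rings), num_rings - 1)
    return sector, ring


def group_points(points, **kwargs):
    """Group (x, y, z) points by their (sector, ring) bin."""
    bins = {}
    for p in points:
        idx = polar_bin_index(p[0], p[1], **kwargs)
        if idx is not None:
            bins.setdefault(idx, []).append(p)
    return bins
```

Each non-empty bin could then be pooled into one token, which keeps the number of 3D tokens bounded by `num_sectors * num_rings` regardless of the point count.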
The main training entry point is:
cd train
python train.py --config config/vilint.yaml
With Slurm:
cd train
sbatch train.sh
The main file to edit is:
train/config/vilint.yaml
The training loader in train/mnt_train/data/vilint_dataset.py does not train directly from loose jpg / npy files. The final per-trajectory format is archive-based:
- images.tar for RGB frames
- points.zarr for stacked point clouds
- traj_data.pkl or pose arrays under trajectory.zarr/... for robot poses
- width_curve.zarr for the clearance distance ground truth.
The train/test split folders referenced in train/config/vilint.yaml should point to splits whose trajectory names match these archived trajectory folders.
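A quick sanity check before training can catch trajectory folders that were never archived. The sketch below only tests for the file names mentioned in this section; the helper name `missing_archives` is hypothetical, and treating width_curve.zarr as required is an assumption that may be stricter than what the loader actually enforces.

```python
from pathlib import Path

# Entries assumed to be required in every archived trajectory folder.
REQUIRED = ("images.tar", "points.zarr", "width_curve.zarr")
# Robot poses are assumed to come from either of these sources.
POSE_SOURCES = ("traj_data.pkl", "trajectory.zarr")


def missing_archives(traj_dir):
    """Return the names of expected entries missing from one trajectory folder."""
    traj_dir = Path(traj_dir)
    missing = [name for name in REQUIRED if not (traj_dir / name).exists()]
    if not any((traj_dir / name).exists() for name in POSE_SOURCES):
        missing.append("traj_data.pkl or trajectory.zarr")
    return missing
```

Running it over every trajectory name listed in a split gives a list of folders to re-process before launching training.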
The ROS2 deployment entry point is:
cd deployment/src
python3 deploy_vilint.py --model vilint --imgwaypoints
The full tmux launcher is:
cd deployment/src
bash deploy_vilint.sh
This starts:
- the ViLiNT inference node,
- the PD waypoint controller,
- RViz,
- a rosbag recorder.
Deployment is configured in the following files:
- deployment/config/models.yaml
  - set ckpt_path to the chosen checkpoint to deploy,
  - enable or disable mask_image, mask_lidar, and heuristics.
- deployment/config/robot.yaml
  - set robot velocity limits,
  - set control topic names,
  - adjust robot dimensions if needed.
- deployment/src/topics_names.py
  - set the subscribed ROS topics:
    - IMAGE_TOPIC
    - LIDAR_TOPIC
    - ODOM_TOPIC
    - GOAL_TOPIC
deploy_vilint.py reads the trained model checkpoint, subscribes to image / LiDAR / odometry / goal topics, predicts waypoints, and publishes them on the waypoint topic. pd_controller.py then converts the waypoint stream into velocity commands.
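As an illustration of the waypoint-to-velocity step, here is a minimal PD-style sketch for a differential-drive robot. It is not the code in pd_controller.py: the class name `WaypointPD`, the gains, and the velocity limits are assumptions chosen for readability, and the waypoint is assumed to be expressed in the robot frame.

```python
import math


class WaypointPD:
    """Hypothetical PD law mapping a robot-frame waypoint to (v, w) commands."""

    def __init__(self, kp_lin=0.8, kp_ang=1.5, kd_ang=0.2, v_max=0.5, w_max=1.0):
        self.kp_lin, self.kp_ang, self.kd_ang = kp_lin, kp_ang, kd_ang
        self.v_max, self.w_max = v_max, w_max
        self.prev_heading_err = 0.0

    def step(self, wx, wy, dt):
        """Return (linear, angular) velocity for waypoint (wx, wy) and time step dt."""
        dist = math.hypot(wx, wy)
        heading_err = math.atan2(wy, wx)
        d_err = (heading_err - self.prev_heading_err) / dt
        self.prev_heading_err = heading_err
        # Proportional term on distance, scaled down when the waypoint is off-axis.
        v = self.kp_lin * dist * max(0.0, math.cos(heading_err))
        # Proportional-derivative term on the heading error.
        w = self.kp_ang * heading_err + self.kd_ang * d_err
        # Clamp both commands to the (assumed) robot limits.
        v = max(0.0, min(v, self.v_max))
        w = max(-self.w_max, min(w, self.w_max))
        return v, w
```

In a real deployment the limits would come from deployment/config/robot.yaml rather than from hard-coded defaults.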
A Docker image can be found in test/docker. It lets you deploy the model alongside the IsaacSim simulation running with ROS2. It targets x86 machines equipped with NVIDIA GPUs.
NB: Building the image is computationally intensive because of the flash-attention build step.
There are two extraction scripts:
- ROS1 bags: train/mnt_train/process_data/process_bags.py
- ROS2 bags: train/mnt_train/process_data/process_bags_ros2.py
These scripts first extract each trajectory into a folder of raw files such as:
- traj_data.pkl
- 0.jpg, 1.jpg, ...
- 0.npy, 1.npy, ...
Then generate the width curves ground truth with:
cd train/mnt_train/process_data
python process_lidar_collision.py /path/to/dataset/datas/trajectories -d dataset_name
After that, build the archive format used by ViLiNT_Dataset with:
cd train/mnt_train/process_data
python build_archives.py --root /path/to/dataset/datas/trajectories --overwrite
build_archives.py creates, inside each trajectory folder:
- images.tar
- points.zarr
- aligned_indices.txt
- build_summary.json
and converts width_curve.npy to width_curve.zarr when present.
Example for ROS2 bags:
cd train/mnt_train/process_data
python process_bags_ros2.py \
  --dataset-name husky \
  --input-dir /path/to/ros2_bags \
  --output-dir /path/to/processed_dataset \
  --sample-rate 4.0
Example for ROS1 bags:
cd train/mnt_train/process_data
python process_bags.py \
  --dataset-name husky \
  --input-dir /path/to/ros1_bags \
  --output-dir /path/to/processed_dataset \
  --sample-rate 4.0
The bag-processing topic selection is configured in:
train/mnt_train/process_data/process_bags_config.yaml
If your camera, LiDAR, or odometry message format is different, add or adapt the processing functions in:
- train/mnt_train/process_data/process_data_utils.py for ROS1
- train/mnt_train/process_data/process_data_utils_ros2.py for ROS2
This is where image conversion, point-cloud parsing, and odometry-to-(x, y, yaw) conversion are defined.
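For reference, the odometry-to-(x, y, yaw) step usually comes down to extracting yaw from the pose quaternion. The sketch below uses the standard ZYX Euler formula; the function names are hypothetical, and the (x, y, z, w) argument order is an assumption that mirrors the ROS quaternion message fields, so check it against the functions actually defined in the utility files.

```python
import math


def quaternion_to_yaw(qx, qy, qz, qw):
    """Return the yaw angle (rotation about z, in radians) of a unit quaternion."""
    siny_cosp = 2.0 * (qw * qz + qx * qy)
    cosy_cosp = 1.0 - 2.0 * (qy * qy + qz * qz)
    return math.atan2(siny_cosp, cosy_cosp)


def pose_to_xy_yaw(position, orientation):
    """Reduce a 3D pose to (x, y, yaw).

    position is (x, y, z) and orientation is (qx, qy, qz, qw); both layouts
    are assumptions made for this example.
    """
    x, y, _z = position
    return x, y, quaternion_to_yaw(*orientation)
```

A new odometry message type would need its own small adapter that extracts the position and quaternion before calling a conversion like this.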
- Run process_bags.py or process_bags_ros2.py to extract raw images, LiDAR frames, and traj_data.pkl.
- Run build_archives.py on the generated trajectory folders to create images.tar and points.zarr.
- Point the dataset entries in train/config/vilint.yaml to the resulting dataset root and splits.