1 change: 1 addition & 0 deletions .gitignore
@@ -215,3 +215,4 @@ compile_commands.json

*.pt
/.vscode
semantic_inference_venv/
250 changes: 248 additions & 2 deletions README.md
@@ -1,5 +1,5 @@
# semantic_inference
A ros1 version of semantic inference package.
A ROS 1 version of the semantic_inference package for open-set and closed-set semantic segmentation.

<div align="center">
<img src="docs/media/ade20k_segmentation_efficientvit.png"/>
@@ -100,7 +100,7 @@ $ rosbag decompress uHumans2_office_s1_00h.bag
```
If you don't decompress the rosbag before playing it, additional CPU cycles are needed to decompress the data while reading it. Decompression can be CPU-intensive, especially with BZ2, which is known for high compression ratios but slower decompression speeds.

## Usage
## Closed-set Segmentation Usage

More details about integrating these two modes into a larger project can be found in the [closed-set](docs/closed_set.md#using-closed-set-segmentation-online) and [open-set](docs/open_set.md#using-open-set-segmentation-online) documentation.

@@ -186,5 +186,251 @@ List of ros topics `/nodelet_manager` node is publishing:
* /semantic_inference/semantic_overlay/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
```

## Modify the default pretrained models
Create a virtual environment with Python 3.10 and do not upgrade the pip version; we need to install the TensorRT Python package at version 8.6.1.

```bash
cd ~/sematic_overlay_ws/src/semantic_inference/
python3.10 -m venv semantic_inference_venv
source semantic_inference_venv/bin/activate
pip3 install onnx onnxruntime tensorrt==8.6.1
cd semantic_inference_ros/scripts/
```

### Change the input shape of the pretrained models
#### From .onnx to .onnx
The default pretrained models expect an input tensor of size `1x3x512x512` and produce an output tensor of size `1x512x512`. If your image streams have a `1920x1080` resolution, change the input of the default pretrained model to `1x3x1080x1920` like this:
```bash
python3 change_input_dimension_of_onnx.py
```
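
The script itself is not reproduced in this README; as a rough, hedged sketch of what it might do (the file names and the single-input assumption below are illustrative), the input dimensions of an ONNX graph can be rewritten with the `onnx` package:

```python
import onnx

# Hypothetical file names; point these at the actual pretrained model.
model = onnx.load("model.onnx")

# Assume a single NCHW image input.
dims = model.graph.input[0].type.tensor_type.shape.dim
dims[0].dim_value = 1     # batch size
dims[1].dim_value = 3     # RGB channels
dims[2].dim_value = 1080  # height
dims[3].dim_value = 1920  # width

onnx.checker.check_model(model)
onnx.save(model, "updated_model.onnx")
```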

### Change the input and output shape of pretrained models
#### From .onnx to .onnx
The default pretrained models expect an input tensor of size `1x3x512x512` and produce an output tensor of size `1x512x512`. If your image streams have a `1920x1080` resolution and you also want the output to match, change the input to `1x3x1080x1920` and the output to `1x1080x1920` like this:
```bash
python3 change_input_output_dimension_of_onnx.py
```
N.B.: Normally you do not need to change the output dimension of the ONNX file; the segmentation node still publishes the semantically segmented image streams at the input dimension, i.e. `1920x1080`.
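
If you do need the output resized as well, a similarly hedged sketch (again with illustrative file names) rewrites the output shape through `model.graph.output`:

```python
import onnx

model = onnx.load("model.onnx")  # hypothetical path

# Assume a single [N x H x W] label-map output, matching the 1x512x512 default.
out_dims = model.graph.output[0].type.tensor_type.shape.dim
out_dims[0].dim_value = 1     # batch size
out_dims[1].dim_value = 1080  # height
out_dims[2].dim_value = 1920  # width

onnx.checker.check_model(model)
onnx.save(model, "updated_model.onnx")
```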

#### From .onnx to .trt
If you want to convert an ONNX model to a TensorRT engine `offline`, without going through the default pipeline, use:
```bash
python3 convert_and_change_input_output_dimension_of_tensorrt.py
```
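
For reference, a minimal sketch of such an offline conversion with the TensorRT 8.6 Python API (the input tensor name `input` and the file names are assumptions, not guaranteed by the package) could look like:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the (possibly dimension-modified) ONNX model.
with open("updated_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

# Optimization profile for a dynamic-shape input assumed to be named "input".
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  (1, 3, 256, 256),    # minimum
                  (1, 3, 1080, 1920),  # optimum
                  (1, 3, 2048, 2048))  # maximum
config.add_optimization_profile(profile)

# Build and serialize the engine.
engine_bytes = builder.build_serialized_network(network, config)
with open("updated_model.trt", "wb") as f:
    f.write(engine_bytes)
```

This mirrors the optimization-profile logic described for `trt_utilities.cpp` below.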
If you want to keep the parameters as they are, use:
```bash
trtexec --onnx=updated_model.onnx --saveEngine=updated_model.trt --explicitBatch
```
If you instead want to change the dimensions that the `semantic_inference` code uses when it converts the ONNX model to a TensorRT engine online, change these lines:
```
profile->setDimensions(name, OptProfileSelector::kMIN, replaceDynamic(dims, 100));
profile->setDimensions(name, OptProfileSelector::kOPT, replaceDynamic(dims, 500));
profile->setDimensions(name, OptProfileSelector::kMAX, replaceDynamic(dims, 800));
config->addOptimizationProfile(profile);
```
inside the `trt_utilities.cpp` file. These are the minimum, optimum, and maximum input dimensions for the TensorRT engine generated during this conversion. For a segmentation model that typically processes 1920x1080 images but occasionally handles smaller or larger ones, you might define:

```
profile->setDimensions(name, OptProfileSelector::kMIN, replaceDynamic(dims, 256)); // Smallest valid size
profile->setDimensions(name, OptProfileSelector::kOPT, replaceDynamic(dims, 1920)); // Most common size
profile->setDimensions(name, OptProfileSelector::kMAX, replaceDynamic(dims, 2048)); // Largest valid size
```
This ensures the engine supports inputs in the range `[256x256]` to `[2048x2048]` while being optimized for 1920x1080.

### Pretrained Models Input Layout:

For an image input of size 1920×1080, the layout `[1 x 3 x 1080 x 1920]` represents a single RGB image: batch size `1`, 3 color channels (Red, Green, Blue), 1080 rows (height), and 1920 columns (width), with pixel values stored as 32-bit floating point numbers. This format follows the NCHW convention: N = batch size, C = number of channels, H = height, W = width.
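
As a small illustrative sketch (not part of the package), packing a `1080x1920` RGB image into this NCHW float32 layout with NumPy looks like:

```python
import numpy as np

# Hypothetical HWC uint8 RGB image of size 1080x1920.
image_hwc = np.zeros((1080, 1920, 3), dtype=np.uint8)

# HWC -> CHW, prepend the batch dimension, and cast to float32.
tensor_nchw = image_hwc.transpose(2, 0, 1)[np.newaxis].astype(np.float32)
print(tensor_nchw.shape)  # (1, 3, 1080, 1920)
```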

### Pretrained Models Output Layout:

An output tensor of layout `[1 x 512 x 512]` (INT32) has `1` as the batch size and `512 x 512` as the spatial dimensions of the output tensor (the segmentation map); the values are stored as 32-bit integers and used as segmentation labels.
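
For illustration only, such an INT32 label map can be colorized with a lookup table in the spirit of the CSV color maps under `semantic_inference/config/colors/` (only the first three entries of `ade20k_indoor.csv` are used here):

```python
import numpy as np

# Hypothetical 1x512x512 INT32 segmentation output.
labels = np.zeros((1, 512, 512), dtype=np.int32)

# Truncated colormap (id -> RGB); the real CSVs define many more classes.
colormap = np.array([[80, 252, 17],     # 0: Unknown
                     [0, 102, 51],      # 1: Wall
                     [191, 191, 181]],  # 2: Floor
                    dtype=np.uint8)

color_image = colormap[labels[0]]  # shape (512, 512, 3), dtype uint8
print(color_image.shape)
```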


### Pretrained Models Input and Output Image Tensor Size:
To read the `.trt` file and print the model's input shape using TensorRT, run:
```bash
python3 input_shape_of_image_as_tensorrt.py
```
The output is:
```
Input Name: input
Input Shape: (1, 3, -1, -1)
```
The first dimension (1) is the batch size, the second dimension (3) is the number of color channels (RGB), and the third and fourth dimensions (-1, -1) represent the dynamic height and width of the input image, which need to be set when you run inference.
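
The script is not reproduced here, but a hedged sketch of such an inspection with the TensorRT 8.6 Python API (the engine path is an assumption) could be:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Hypothetical engine path.
with open("updated_model.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print each input binding and its (possibly dynamic, -1) shape.
for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        print("Input Name:", engine.get_binding_name(i))
        print("Input Shape:", engine.get_binding_shape(i))
```

The input shape stored in the ONNX model can be printed in a similar way: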

```bash
python3 input_shape_of_image_tensor_as_onnx.py
```
The output is:
```
Input Name: input
Input Shape: [1, 3, 0, 0]
```
In both cases, the height and width are dynamic and are set by the user at run time.
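
Under the hood, a sketch of that ONNX-side check (with an assumed model path) might be:

```python
import onnx

model = onnx.load("updated_model.onnx")  # hypothetical model path

for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print("Input Name:", inp.name)
    print("Input Shape:", dims)  # dynamic dimensions are reported as 0
```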

## Open-set Segmentation Usage
More details about integrating these two modes into a larger project can be found in the [closed-set](docs/closed_set.md#using-closed-set-segmentation-online) and [open-set](docs/open_set.md#using-open-set-segmentation-online) documentation.

Now, launch the semantic segmentation inference node in the following way:
```
roslaunch semantic_inference_ros semantic_inference.launch
```
In the launch file, the subscribed topic is already remapped from `/semantic_inference/color/image_raw` to `/tesse/left_cam/rgb/image_raw`, so you can simply play the rosbag:
```
rosbag play ~/uHumans2_office_s1_00h.bag --clock
```

If it is not, here is how to play the rosbag while remapping the topic:
```
rosbag play ~/uHumans2_office_s1_00h.bag --clock /tesse/left_cam/rgb/image_raw:=/semantic_inference/color/image_raw
```

List of ros topics being published:
```
/semantic_inference/semantic/image_raw
...
/semantic_inference/semantic_color/image_raw
...
/semantic_inference/semantic_overlay/image_raw
...
```
List of ros topics being subscribed:
```
/semantic_inference/color/image_raw
```

List of ros nodes running:
```
/nodelet_manager
/rosout
/semantic_inference
```

List of ros topics `/semantic_inference` node is publishing:
```
* /rosout [rosgraph_msgs/Log]
```
List of ros topics `/semantic_inference` node is subscribing:
```
Subscriptions: None
```
List of ros topics `/nodelet_manager` node is subscribing:
```
* /semantic_inference/color/image_raw [unknown type]
```
List of ros topics `/nodelet_manager` node is publishing:
```
* /rosout [rosgraph_msgs/Log]
* /semantic_inference/semantic/image_raw [sensor_msgs/Image]
* /semantic_inference/semantic/image_raw/compressed [sensor_msgs/CompressedImage]
* /semantic_inference/semantic/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic/image_raw/compressedDepth [sensor_msgs/CompressedImage]
* /semantic_inference/semantic/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic/image_raw/theora [theora_image_transport/Packet]
* /semantic_inference/semantic/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_color/image_raw [sensor_msgs/Image]
* /semantic_inference/semantic_color/image_raw/compressed [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_color/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_color/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_color/image_raw/compressedDepth [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_color/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_color/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_color/image_raw/theora [theora_image_transport/Packet]
* /semantic_inference/semantic_color/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_color/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_overlay/image_raw [sensor_msgs/Image]
* /semantic_inference/semantic_overlay/image_raw/compressed [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_overlay/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_overlay/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_overlay/image_raw/compressedDepth [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_overlay/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_overlay/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_overlay/image_raw/theora [theora_image_transport/Packet]
* /semantic_inference/semantic_overlay/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_overlay/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
```

1 change: 1 addition & 0 deletions exporting/export_efficientvit.py
@@ -57,6 +57,7 @@ def __init__(self, weight_path, name="l2", dataset="ade20k"):
def forward(self, img):
"""Run inference."""
img = F.interpolate(img, size=(512, 512), mode="bilinear")
# img = F.interpolate(img, size=(1080, 1920), mode="bilinear")
ret = self.model(img)
ret = F.interpolate(ret, size=(img.shape[2], img.shape[3]), mode="bilinear")
return torch.argmax(ret, dim=1)
22 changes: 22 additions & 0 deletions semantic_inference/config/colors/ade20k_indoor.csv
@@ -0,0 +1,22 @@
name,red,green,blue,alpha,id
Unknown,80,252,17,255,0
Wall,0,102,51,255,1
Floor,191,191,181,255,2
Ceiling,127,255,127,255,3
Door,255,255,0,255,4
Stairs,4,86,162,255,5
Structure,109,124,135,255,6
Shelf,0,255,255,255,7
Plant,0,0,127,255,8
Bed,151,69,237,255,9
Storage,247,105,134,255,10
Table,204,0,102,255,11
Chair,255,255,127,255,12
Wall_Decoration,0,127,255,255,13
Couch,204,0,0,255,14
Light,10,209,135,255,15
Appliance,255,0,255,255,16
Thing,182,133,15,255,17
Deformable,0,0,255,255,18
Dynamic_NonHuman,110,61,1,255,19
Human,127,255,255,255,20
52 changes: 52 additions & 0 deletions semantic_inference/config/colors/ade20k_mit.csv
@@ -0,0 +1,52 @@
name,red,green,blue,alpha,id
unknown,115,22,236,255,0
sky,135,206,235,255,1
tree,0,108,0,255,2
water,0,0,225,255,3
ground,139,69,19,255,4
grass,0,127,255,255,5
sand,146,197,157,255,6
sidewalk,220,220,220,255,7
dock,248,62,35,255,8
road,100,100,100,255,9
path,201,182,90,255,10
vehicle,0,255,255,255,11
building,186,85,211,255,12
shelter,0,73,153,255,13
signal,40,61,249,255,14
rock,0,0,0,255,15
fence,39,196,191,255,16
boat,0,255,127,255,17
sign,37,49,65,255,18
hill,225,225,0,255,19
bridge,48,135,55,255,20
wall,75,0,130,255,21
floor,128,128,128,255,22
ceiling,49,131,181,255,23
door,179,137,19,255,24
stairs,73,168,252,255,25
pole,127,127,255,255,26
rail,255,127,0,255,27
structure,113,179,22,255,28
window,255,120,120,255,29
surface,225,0,225,255,30
flora,0,255,0,255,31
flower,198,157,180,255,32
bed,244,194,148,255,33
box,0,161,110,255,34
storage,68,253,46,255,35
barrel,202,17,112,255,36
bag,127,0,0,255,37
basket,255,127,255,255,38
seating,225,0,0,255,39
flag,76,249,211,255,40
decor,255,255,127,255,41
light,115,61,165,255,42
appliance,127,255,0,255,43
trash,76,209,99,255,44
bicycle,121,20,73,255,45
food,244,68,252,255,46
clothes,12,179,12,255,47
thing,153,253,112,255,48
animal,189,83,80,255,49
human,251,59,163,255,50
38 changes: 38 additions & 0 deletions semantic_inference/config/colors/ade20k_outdoor.csv
@@ -0,0 +1,38 @@
name,red,green,blue,alpha,id
Unknown,10,45,76,255,0
Sky,135,206,235,255,1
Tree,0,108,0,255,2
Water,0,0,225,255,3
Ground,139,69,19,255,4
Sidewalk,220,220,220,255,5
Road,100,100,100,255,6
Vehicle,0,255,255,255,7
Building,186,85,211,255,8
Signal,209,151,185,255,9
Rock,0,0,0,255,10
Fence,182,175,79,255,11
Boat,44,146,171,255,12
Sign,0,127,255,255,13
Slope,225,225,0,255,14
Bridge,76,254,194,255,15
Wall,75,0,130,255,16
Floor,128,128,128,255,17
Ceiling,107,127,251,255,18
Door,118,174,7,255,19
Stairs,149,251,101,255,20
Structure,254,205,114,255,21
Window,255,120,120,255,22
Shelf,119,12,236,255,23
Plant,0,255,0,255,24
Bed,255,127,0,255,25
Storage,186,30,114,255,26
Table,225,0,225,255,27
Chair,45,194,230,255,28
Wall_Decoration,11,74,175,255,29
Couch,225,0,0,255,30
Light,127,0,0,255,31
Appliance,12,180,15,255,32
Thing,127,255,0,255,33
Deformable,21,219,128,255,34
Dynamic_NonHuman,255,127,255,255,35
Human,3,140,92,255,36