1 change: 1 addition & 0 deletions .gitignore
@@ -215,3 +215,4 @@ compile_commands.json

*.pt
/.vscode
semantic_inference_venv/
250 changes: 248 additions & 2 deletions README.md
@@ -1,5 +1,5 @@
# semantic_inference
A ros1 version of semantic inference package.
A ROS 1 version of the semantic_inference package for open-set and closed-set semantic segmentation.

<div align="center">
<img src="docs/media/ade20k_segmentation_efficientvit.png"/>
@@ -100,7 +100,7 @@ $ rosbag decompress uHumans2_office_s1_00h.bag
```
If you don't decompress the rosbag before playing it, additional CPU cycles are needed to decompress the data while reading it. Decompression can be CPU-intensive, especially with BZ2, which is known for high compression ratios but slower decompression speeds.

## Usage
## Closed-set Segmentation Usage

More details about integrating these two modes into a larger project can be found in the [closed-set](docs/closed_set.md#using-closed-set-segmentation-online) and [open-set](docs/open_set.md#using-open-set-segmentation-online) documentation.

@@ -186,5 +186,251 @@ List of ros topics `/nodelet_manager` node is publishing:
* /semantic_inference/semantic_overlay/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
```

## Modify the default pretrained models
Create a virtual environment with Python 3.10 and do not upgrade the pip version; we need to install the TensorRT Python package at version 8.6.1.

```bash
cd ~/sematic_overlay_ws/src/semantic_inference/
python3.10 -m venv semantic_inference_venv
source semantic_inference_venv/bin/activate
pip3 install onnx onnxruntime tensorrt==8.6.1
cd semantic_inference_ros/scripts/
```

### Change the input shape of the pretrained models
#### From .onnx to .onnx
The default pretrained models expect an input tensor of size `1x3x512x512` and produce an output tensor of size `1x512x512`. If your image streams have a `1920x1080` resolution, change the input of the default pretrained model to `1x3x1080x1920` like this:
```bash
python3 change_input_dimension_of_onnx.py
```
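
The script itself is not reproduced in this README; as a rough, hedged sketch of what it might do (the file names and the single-input assumption below are illustrative), the input dimensions of an ONNX graph can be rewritten with the `onnx` package:

```python
import onnx

# Hypothetical file names; point these at the actual pretrained model.
model = onnx.load("model.onnx")

# Assume a single NCHW image input.
dims = model.graph.input[0].type.tensor_type.shape.dim
dims[0].dim_value = 1     # batch size
dims[1].dim_value = 3     # RGB channels
dims[2].dim_value = 1080  # height
dims[3].dim_value = 1920  # width

onnx.checker.check_model(model)
onnx.save(model, "updated_model.onnx")
```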

### Change the input and output shape of pretrained models
#### From .onnx to .onnx
The default pretrained models expect an input tensor of size `1x3x512x512` and produce an output tensor of size `1x512x512`. If your image streams have a `1920x1080` resolution and you also want the output to match, change the input to `1x3x1080x1920` and the output to `1x1080x1920` like this:
```bash
python3 change_input_output_dimension_of_onnx.py
```
N.B.: Normally you do not need to change the output dimension of the ONNX file; the segmentation node still publishes the semantically segmented image streams at the input dimension, i.e. `1920x1080`.
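
If you do need the output resized as well, a similarly hedged sketch (again with illustrative file names) rewrites the output shape through `model.graph.output`:

```python
import onnx

model = onnx.load("model.onnx")  # hypothetical path

# Assume a single [N x H x W] label-map output, matching the 1x512x512 default.
out_dims = model.graph.output[0].type.tensor_type.shape.dim
out_dims[0].dim_value = 1     # batch size
out_dims[1].dim_value = 1080  # height
out_dims[2].dim_value = 1920  # width

onnx.checker.check_model(model)
onnx.save(model, "updated_model.onnx")
```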

#### From .onnx to .trt
If you want to convert an ONNX model to a TensorRT engine `offline`, without going through the default pipeline, use:
```bash
python3 convert_and_change_input_output_dimension_of_tensorrt.py
```
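
For reference, a minimal sketch of such an offline conversion with the TensorRT 8.6 Python API (the input tensor name `input` and the file names are assumptions, not guaranteed by the package) could look like:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the (possibly dimension-modified) ONNX model.
with open("updated_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

# Optimization profile for a dynamic-shape input assumed to be named "input".
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  (1, 3, 256, 256),    # minimum
                  (1, 3, 1080, 1920),  # optimum
                  (1, 3, 2048, 2048))  # maximum
config.add_optimization_profile(profile)

# Build and serialize the engine.
engine_bytes = builder.build_serialized_network(network, config)
with open("updated_model.trt", "wb") as f:
    f.write(engine_bytes)
```

This mirrors the optimization-profile logic described for `trt_utilities.cpp` below.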
If you want to keep the parameters as they are, use:
```bash
trtexec --onnx=updated_model.onnx --saveEngine=updated_model.trt --explicitBatch
```
If you instead want to change the dimensions that the `semantic_inference` code uses when it converts the ONNX model to a TensorRT engine online, change these lines:
```
profile->setDimensions(name, OptProfileSelector::kMIN, replaceDynamic(dims, 100));
profile->setDimensions(name, OptProfileSelector::kOPT, replaceDynamic(dims, 500));
profile->setDimensions(name, OptProfileSelector::kMAX, replaceDynamic(dims, 800));
config->addOptimizationProfile(profile);
```
inside the `trt_utilities.cpp` file. These are the minimum, optimum, and maximum input dimensions for the TensorRT engine generated during this conversion. For a segmentation model that typically processes 1920x1080 images but occasionally handles smaller or larger ones, you might define:

```
profile->setDimensions(name, OptProfileSelector::kMIN, replaceDynamic(dims, 256)); // Smallest valid size
profile->setDimensions(name, OptProfileSelector::kOPT, replaceDynamic(dims, 1920)); // Most common size
profile->setDimensions(name, OptProfileSelector::kMAX, replaceDynamic(dims, 2048)); // Largest valid size
```
This ensures the engine supports inputs in the range `[256x256]` to `[2048x2048]` while being optimized for 1920x1080.

### Pretrained Models Input Layout:

For an image input of size 1920×1080, the layout `[1 x 3 x 1080 x 1920]` represents a single RGB image: batch size `1`, 3 color channels (Red, Green, Blue), 1080 rows (height), and 1920 columns (width), with pixel values stored as 32-bit floating point numbers. This format follows the NCHW convention: N = batch size, C = number of channels, H = height, W = width.
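
As a small illustrative sketch (not part of the package), packing a `1080x1920` RGB image into this NCHW float32 layout with NumPy looks like:

```python
import numpy as np

# Hypothetical HWC uint8 RGB image of size 1080x1920.
image_hwc = np.zeros((1080, 1920, 3), dtype=np.uint8)

# HWC -> CHW, prepend the batch dimension, and cast to float32.
tensor_nchw = image_hwc.transpose(2, 0, 1)[np.newaxis].astype(np.float32)
print(tensor_nchw.shape)  # (1, 3, 1080, 1920)
```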

### Pretrained Models Output Layout:

An output tensor of layout `[1 x 512 x 512]` (INT32) has `1` as the batch size and `512 x 512` as the spatial dimensions of the output tensor (the segmentation map); the values are stored as 32-bit integers and used as segmentation labels.
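
For illustration only, such an INT32 label map can be colorized with a lookup table in the spirit of the CSV color maps under `semantic_inference/config/colors/` (only the first three entries of `ade20k_indoor.csv` are used here):

```python
import numpy as np

# Hypothetical 1x512x512 INT32 segmentation output.
labels = np.zeros((1, 512, 512), dtype=np.int32)

# Truncated colormap (id -> RGB); the real CSVs define many more classes.
colormap = np.array([[80, 252, 17],     # 0: Unknown
                     [0, 102, 51],      # 1: Wall
                     [191, 191, 181]],  # 2: Floor
                    dtype=np.uint8)

color_image = colormap[labels[0]]  # shape (512, 512, 3), dtype uint8
print(color_image.shape)
```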


### Pretrained Models Input and Output Image Tensor Size:
To read the `.trt` file and print the model's input shape using TensorRT, run:
```bash
python3 input_shape_of_image_as_tensorrt.py
```
The output is:
```
Input Name: input
Input Shape: (1, 3, -1, -1)
```
The first dimension (1) is the batch size, the second dimension (3) is the number of color channels (RGB), and the third and fourth dimensions (-1, -1) represent the dynamic height and width of the input image, which need to be set when you run inference.
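
The script is not reproduced here, but a hedged sketch of such an inspection with the TensorRT 8.6 Python API (the engine path is an assumption) could be:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Hypothetical engine path.
with open("updated_model.trt", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print each input binding and its (possibly dynamic, -1) shape.
for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        print("Input Name:", engine.get_binding_name(i))
        print("Input Shape:", engine.get_binding_shape(i))
```

The input shape stored in the ONNX model can be printed in a similar way: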

```bash
python3 input_shape_of_image_tensor_as_onnx.py
```
The output is:
```
Input Name: input
Input Shape: [1, 3, 0, 0]
```
In both cases, the height and width are dynamic and are set by the user at run time.
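
Under the hood, a sketch of that ONNX-side check (with an assumed model path) might be:

```python
import onnx

model = onnx.load("updated_model.onnx")  # hypothetical model path

for inp in model.graph.input:
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print("Input Name:", inp.name)
    print("Input Shape:", dims)  # dynamic dimensions are reported as 0
```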

## Open-set Segmentation Usage
More details about integrating these two modes into a larger project can be found in the [closed-set](docs/closed_set.md#using-closed-set-segmentation-online) and [open-set](docs/open_set.md#using-open-set-segmentation-online) documentation.

Now, launch the semantic segmentation inference node in the following way:
```
roslaunch semantic_inference_ros semantic_inference.launch
```
In the launch file, the subscribed topic is already remapped from `/semantic_inference/color/image_raw` to `/tesse/left_cam/rgb/image_raw`, so you can simply play the rosbag:
```
rosbag play ~/uHumans2_office_s1_00h.bag --clock
```

If it is not, here is how to play the rosbag while remapping the topic:
```
rosbag play ~/uHumans2_office_s1_00h.bag --clock /tesse/left_cam/rgb/image_raw:=/semantic_inference/color/image_raw
```

List of ros topics being published:
```
/semantic_inference/semantic/image_raw
...
/semantic_inference/semantic_color/image_raw
...
/semantic_inference/semantic_overlay/image_raw
...
```
List of ros topics being subscribed:
```
/semantic_inference/color/image_raw
```

List of ros nodes running:
```
/nodelet_manager
/rosout
/semantic_inference
```

List of ros topics `/semantic_inference` node is publishing:
```
* /rosout [rosgraph_msgs/Log]
```
List of ros topics `/semantic_inference` node is subscribing:
```
Subscriptions: None
```
List of ros topics `/nodelet_manager` node is subscribing:
```
* /semantic_inference/color/image_raw [unknown type]
```
List of ros topics `/nodelet_manager` node is publishing:
```
* /rosout [rosgraph_msgs/Log]
* /semantic_inference/semantic/image_raw [sensor_msgs/Image]
* /semantic_inference/semantic/image_raw/compressed [sensor_msgs/CompressedImage]
* /semantic_inference/semantic/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic/image_raw/compressedDepth [sensor_msgs/CompressedImage]
* /semantic_inference/semantic/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic/image_raw/theora [theora_image_transport/Packet]
* /semantic_inference/semantic/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_color/image_raw [sensor_msgs/Image]
* /semantic_inference/semantic_color/image_raw/compressed [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_color/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_color/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_color/image_raw/compressedDepth [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_color/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_color/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_color/image_raw/theora [theora_image_transport/Packet]
* /semantic_inference/semantic_color/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_color/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_overlay/image_raw [sensor_msgs/Image]
* /semantic_inference/semantic_overlay/image_raw/compressed [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_overlay/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_overlay/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_overlay/image_raw/compressedDepth [sensor_msgs/CompressedImage]
* /semantic_inference/semantic_overlay/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_overlay/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
* /semantic_inference/semantic_overlay/image_raw/theora [theora_image_transport/Packet]
* /semantic_inference/semantic_overlay/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
* /semantic_inference/semantic_overlay/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
```

1 change: 1 addition & 0 deletions exporting/export_efficientvit.py
@@ -57,6 +57,7 @@ def __init__(self, weight_path, name="l2", dataset="ade20k"):
def forward(self, img):
"""Run inference."""
img = F.interpolate(img, size=(512, 512), mode="bilinear")
# img = F.interpolate(img, size=(1080, 1920), mode="bilinear")
ret = self.model(img)
ret = F.interpolate(ret, size=(img.shape[2], img.shape[3]), mode="bilinear")
return torch.argmax(ret, dim=1)
22 changes: 22 additions & 0 deletions semantic_inference/config/colors/ade20k_indoor.csv
@@ -0,0 +1,22 @@
name,red,green,blue,alpha,id
Unknown,80,252,17,255,0
Wall,0,102,51,255,1
Floor,191,191,181,255,2
Ceiling,127,255,127,255,3
Door,255,255,0,255,4
Stairs,4,86,162,255,5
Structure,109,124,135,255,6
Shelf,0,255,255,255,7
Plant,0,0,127,255,8
Bed,151,69,237,255,9
Storage,247,105,134,255,10
Table,204,0,102,255,11
Chair,255,255,127,255,12
Wall_Decoration,0,127,255,255,13
Couch,204,0,0,255,14
Light,10,209,135,255,15
Appliance,255,0,255,255,16
Thing,182,133,15,255,17
Deformable,0,0,255,255,18
Dynamic_NonHuman,110,61,1,255,19
Human,127,255,255,255,20
52 changes: 52 additions & 0 deletions semantic_inference/config/colors/ade20k_mit.csv
@@ -0,0 +1,52 @@
name,red,green,blue,alpha,id
unknown,115,22,236,255,0
sky,135,206,235,255,1
tree,0,108,0,255,2
water,0,0,225,255,3
ground,139,69,19,255,4
grass,0,127,255,255,5
sand,146,197,157,255,6
sidewalk,220,220,220,255,7
dock,248,62,35,255,8
road,100,100,100,255,9
path,201,182,90,255,10
vehicle,0,255,255,255,11
building,186,85,211,255,12
shelter,0,73,153,255,13
signal,40,61,249,255,14
rock,0,0,0,255,15
fence,39,196,191,255,16
boat,0,255,127,255,17
sign,37,49,65,255,18
hill,225,225,0,255,19
bridge,48,135,55,255,20
wall,75,0,130,255,21
floor,128,128,128,255,22
ceiling,49,131,181,255,23
door,179,137,19,255,24
stairs,73,168,252,255,25
pole,127,127,255,255,26
rail,255,127,0,255,27
structure,113,179,22,255,28
window,255,120,120,255,29
surface,225,0,225,255,30
flora,0,255,0,255,31
flower,198,157,180,255,32
bed,244,194,148,255,33
box,0,161,110,255,34
storage,68,253,46,255,35
barrel,202,17,112,255,36
bag,127,0,0,255,37
basket,255,127,255,255,38
seating,225,0,0,255,39
flag,76,249,211,255,40
decor,255,255,127,255,41
light,115,61,165,255,42
appliance,127,255,0,255,43
trash,76,209,99,255,44
bicycle,121,20,73,255,45
food,244,68,252,255,46
clothes,12,179,12,255,47
thing,153,253,112,255,48
animal,189,83,80,255,49
human,251,59,163,255,50
38 changes: 38 additions & 0 deletions semantic_inference/config/colors/ade20k_outdoor.csv
@@ -0,0 +1,38 @@
name,red,green,blue,alpha,id
Unknown,10,45,76,255,0
Sky,135,206,235,255,1
Tree,0,108,0,255,2
Water,0,0,225,255,3
Ground,139,69,19,255,4
Sidewalk,220,220,220,255,5
Road,100,100,100,255,6
Vehicle,0,255,255,255,7
Building,186,85,211,255,8
Signal,209,151,185,255,9
Rock,0,0,0,255,10
Fence,182,175,79,255,11
Boat,44,146,171,255,12
Sign,0,127,255,255,13
Slope,225,225,0,255,14
Bridge,76,254,194,255,15
Wall,75,0,130,255,16
Floor,128,128,128,255,17
Ceiling,107,127,251,255,18
Door,118,174,7,255,19
Stairs,149,251,101,255,20
Structure,254,205,114,255,21
Window,255,120,120,255,22
Shelf,119,12,236,255,23
Plant,0,255,0,255,24
Bed,255,127,0,255,25
Storage,186,30,114,255,26
Table,225,0,225,255,27
Chair,45,194,230,255,28
Wall_Decoration,11,74,175,255,29
Couch,225,0,0,255,30
Light,127,0,0,255,31
Appliance,12,180,15,255,32
Thing,127,255,0,255,33
Deformable,21,219,128,255,34
Dynamic_NonHuman,255,127,255,255,35
Human,3,140,92,255,36