A PyTorch implementation of EfficientSegNet for real-time instance segmentation of 3D point clouds with integrated Bayesian uncertainty quantification. Designed for robotic applications requiring both speed and confidence estimates.
- Hierarchical Coarse-to-Fine Architecture: 3.3× speedup over standard learnable region-growing methods
- Bayesian Uncertainty Quantification:
- Epistemic uncertainty (model uncertainty) via Monte Carlo dropout
- Aleatoric uncertainty (data uncertainty) with learned per-point estimates
- Class-Agnostic: Single model segments objects of any class without shape/size assumptions
- Real-Time Performance: ~720ms inference on GPU for 4096-point clouds
- Robot-Ready: Integration with grasp planning and manipulation workflows
- Python 3.8+
- PyTorch 1.9+
- NumPy
- Matplotlib (for visualization)
- tqdm
# Clone or download the implementation
cd efficient_segnet
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install torch numpy matplotlib tqdm
# Verify installation
python -c "import torch; print(f'PyTorch version: {torch.__version__}')"from inference_efficient_segnet import SegmentationInference, PostProcessor
import numpy as np
# Initialize inference engine
segmenter = SegmentationInference('model_checkpoint.pt', device='cuda')
# Load point cloud (N, 3) array
points = np.random.randn(5000, 3) # Replace with real data
# Run segmentation
result = segmenter.process_point_cloud(
points,
return_uncertainty=True,
confidence_threshold=0.5,
)
# Extract instances
instances = PostProcessor.extract_instances(result, points)
print(f"Detected {result.num_instances} instances")
print(f"Inference time: {result.inference_time:.3f}s")from train_efficient_segnet import Trainer, S3DISDataset, PointCloudCollator
from efficient_segnet_model import EfficientSegNet
import torch
from torch.utils.data import DataLoader
# Create model
model = EfficientSegNet(
in_channels=3,
feature_dim=512,
hidden_dim=256,
max_instances=32,
num_mc_samples=5,
)
# Create data loaders
train_dataset = S3DISDataset(split='train', num_points=4096)
val_dataset = S3DISDataset(split='val', num_points=4096)
collator = PointCloudCollator(max_instances=32)
train_loader = DataLoader(train_dataset, batch_size=8, collate_fn=collator, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=8, collate_fn=collator)
# Train
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
trainer = Trainer(model, train_loader, val_loader, device=device, num_epochs=100)
metrics = trainer.train()from inference_efficient_segnet import SegmentationInference, RobotGraspPlanner
from utils_efficient_segnet import PostProcessor
import numpy as np
# Initialize components
segmenter = SegmentationInference('model_checkpoint.pt')
grasp_planner = RobotGraspPlanner(gripper_width=0.1)
# Get point cloud from sensor (e.g., RGB-D camera)
points = get_point_cloud_from_sensor() # Your sensor interface
# Segment point cloud
result = segmenter.process_point_cloud(points, return_uncertainty=True)
# Extract instances
instances = PostProcessor.extract_instances(result, points, min_points=50)
# Plan grasps for each instance
all_grasps = []
for inst_id, inst_points in instances.items():
# Get confidence of this instance
inst_confidence = result.confidence[result.instance_labels == inst_id].mean()
# Skip low-confidence predictions
if inst_confidence < 0.5:
continue
# Plan grasps
grasps = grasp_planner.plan_grasps(inst_id, inst_points, inst_confidence)
all_grasps.extend(grasps)
# Select best grasp with quality threshold
best_grasp = grasp_planner.select_best_grasp(all_grasps, quality_threshold=0.6)
if best_grasp:
# Execute grasp with robot
execute_grasp(best_grasp)from inference_efficient_segnet import SegmentationInference
from utils_efficient_segnet import PointCloudVisualizer, EvaluationMetrics
import numpy as np
# Run inference
segmenter = SegmentationInference('model_checkpoint.pt')
result = segmenter.process_point_cloud(points, return_uncertainty=True)
# Visualize uncertainty
fig = PointCloudVisualizer.visualize_uncertainty(
points,
result.epistemic_uncertainty,
result.aleatoric_uncertainty,
)
plt.show()
# Evaluate uncertainty quality
metrics = EvaluationMetrics.compute_all_metrics(
result.instance_labels,
ground_truth_labels,
result.confidence,
)
print(f"Calibration Error: {metrics['ECE']:.4f}")
print(f"ROC-AUC: {metrics['AUC_ROC']:.4f}")-
Coarse Stage: Downsamples to 20% density via farthest point sampling
- PointNet-style hierarchical feature extraction
- Outputs coarse instance proposals and confidence scores
-
Refinement Stage: Confidence-guided processing
- Only refines high-confidence regions (confidence > 0.7)
- Reduces computation by 3-5× compared to uniform processing
- Iterative refinement using GRU cells
- Adaptive iteration counts (2-8 iterations) based on confidence
- Predicts add/remove probabilities for boundary points
-
Epistemic Uncertainty: Monte Carlo dropout with concrete dropout
- Computed as variance of predictions across T=5 dropout samples
- Captures model uncertainty about predictions
-
Aleatoric Uncertainty: Learned per-point with probabilistic loss
- Predicts data-dependent uncertainty
- Higher for ambiguous points (e.g., boundaries)
- Embedding-based approach for instance discrimination
- Multi-resolution feature fusion
- Supports up to 32 instances per point cloud
model = EfficientSegNet(
in_channels=3, # Point cloud feature dimension
feature_dim=512, # Hidden feature dimension
hidden_dim=256, # GRU hidden dimension
downsample_ratio=0.2, # Coarse stage downsampling
num_mc_samples=5, # MC dropout samples
max_instances=32, # Maximum instances
)
# Forward pass
outputs = model(
points, # (B, N, 3) point coordinates
return_uncertainty=True, # Compute uncertainty
mc_samples=5, # Number of MC samples
)
# Output keys:
# - instance_labels: (B, N) predicted instance IDs
# - confidence: (B, N) per-point confidence scores
# - epistemic_uncertainty: (B, N) model uncertainty
# - aleatoric_uncertainty: (B, N) data uncertainty
# - embeddings: (B, N, 16) learned representationssegmenter = SegmentationInference(
model_path='checkpoint.pt',
device='cuda',
num_mc_samples=5,
)
result = segmenter.process_point_cloud(
points, # (N, 3) numpy array
return_uncertainty=True, # Compute uncertainties
confidence_threshold=0.5, # Filter low-confidence points
max_points=10000, # Downsample if needed
)metrics = EvaluationMetrics.compute_all_metrics(
predictions, # (N,) predicted labels
targets, # (N,) ground truth labels
confidence, # (N,) confidence scores
)
# Returns: accuracy, mAP, mIoU, mPQ, mSQ, mRQ, AUC_ROC, ECEplanner = RobotGraspPlanner(
gripper_width=0.1, # Maximum gripper opening (meters)
max_grasp_quality=1.0,
)
grasps = planner.plan_grasps(
instance_id=1, # Instance to grasp
instance_points=points, # (N, 3) instance point cloud
confidence=0.8, # Segmentation confidence
num_candidates=10, # Number of grasp candidates
)
# Select best grasp
best_grasp = planner.select_best_grasp(grasps, quality_threshold=0.6)- Coarse Stage: 50-100ms (downsampled 20% of points)
- Refinement: 500-600ms (adaptive, confidence-dependent)
- MC Dropout: +30-40ms per sample
- Total: ~720ms per 4096-point cloud on single GPU
| Metric | ESN | Region Growing [4] |
|---|---|---|
| mAP | 63.8% | 64.3% |
| mPQ | 50.7% | 51.2% |
| mSQ | 77.9% | 78.1% |
| mRQ | 65.2% | 65.6% |
- Pick Success Rate: 82.1% (vs 74.2% baseline)
- Task Success Rate: 71.2% (vs 60.3% baseline)
- 18% improvement with uncertainty filtering
{
"model": {
"in_channels": 3,
"feature_dim": 512,
"hidden_dim": 256,
"max_instances": 32,
"num_mc_samples": 5
},
"training": {
"batch_size": 8,
"learning_rate": 0.001,
"weight_decay": 1e-4,
"num_epochs": 100,
"num_points": 4096
},
"inference": {
"confidence_threshold": 0.5,
"num_mc_samples": 5,
"max_points": 10000
}
}points = np.array([
[x1, y1, z1],
[x2, y2, z2],
...
[xN, yN, zN],
]) # Shape: (N, 3), dtype: float32instance_labels = np.array([
0, # Background
1, # Instance 1
1, # Instance 1
2, # Instance 2
...
]) # Shape: (N,), dtype: int32# Reduce batch size
train_loader = DataLoader(..., batch_size=4)
# Reduce number of points
result = segmenter.process_point_cloud(
points,
max_points=5000 # Default is 10000
)
# Disable MC dropout during inference
outputs = model(points, return_uncertainty=False)# Increase number of MC samples
model = EfficientSegNet(num_mc_samples=10)
# Use higher learning rate for aleatoric loss
loss_fn = EfficientSegNetLoss(lambda_aleatoric=1.0)# Use deterministic mode (single forward pass)
outputs = model(points, return_uncertainty=False) # ~500ms
# Reduce point cloud resolution
result = segmenter.process_point_cloud(
points,
max_points=2000,
)
# Use CPU inference for small batches
model = model.to('cpu')efficient_segnet/
├── efficient_segnet_model.py # Core model implementation
├── train_efficient_segnet.py # Training loop and data loading
├── inference_efficient_segnet.py # Inference engine and robot integration
├── utils_efficient_segnet.py # Utilities, metrics, visualization
├── example_usage.py # Complete usage examples
├── README.md # This file
└── requirements.txt # Dependencies
If you use this implementation, please cite the accompanying research paper:
@article{efficientSegNet2026,
title={Real-Time Uncertainty-Aware Point Cloud Segmentation for Autonomous Robotic Manipulation},
author={Shraddha Sharma},
year={2026}
}MIT License - see LICENSE file for details
For issues, questions, or contributions:
- Check existing issues and documentation
- Refer to the research paper for methodology details
- Review example scripts in
example_usage.py
- J Chen et al. "LRGNet: Learnable Region Growing for Class-Agnostic Point Cloud Segmentation"
- Qi et al. (2017). "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space"
- Gal & Ghahramani (2016). "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning"
- Kendall & Gal (2017). "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?"
- Berman et al. (2018). "The Lovász-Softmax Loss: A Tractable Surrogate for the Optimization of the Intersection-over-Union Measure in Neural Networks"
- Core model architecture with hierarchical encoder
- Bayesian uncertainty quantification (epistemic + aleatoric)
- Training pipeline with S3DIS/ScanNet support
- Inference engine with real-time performance
- Robot manipulation pipeline integration
- Comprehensive evaluation metrics
- Visualization and debugging tools