A Python implementation of classical monocular Visual Odometry using feature-based tracking and geometric pose estimation.
Visual Odometry (VO) estimates the trajectory of a camera by analyzing the motion of features in sequential images. This project demonstrates the fundamental pipeline used in robotics, autonomous vehicles, and AR/VR systems.
Key Features:
- FAST corner detection for robust feature extraction
- KLT optical flow for efficient feature tracking
- Essential Matrix decomposition for camera pose estimation
- Real-time trajectory visualization
┌──────────────────┐
│   Video/Image    │
│     Sequence     │
└────────┬─────────┘
         │
         v
┌──────────────────┐
│ Feature Detection│  ← FAST Algorithm
│   (Frame N-1)    │
└────────┬─────────┘
         │
         v
┌──────────────────┐
│ Feature Tracking │  ← Lucas-Kanade Optical Flow
│    (Frame N)     │
└────────┬─────────┘
         │
         v
┌──────────────────┐
│ Pose Estimation  │  ← Essential Matrix + RANSAC
│      (R, t)      │
└────────┬─────────┘
         │
         v
┌──────────────────┐
│    Trajectory    │  ← Integrate Motion
│      Update      │
└──────────────────┘
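The stages above map onto a simple per-frame loop. A minimal sketch of that loop is shown below; the structure and thresholds (e.g. re-detecting below 500 tracked points) are illustrative, not the project's actual API:

```python
import cv2
import numpy as np

def run_vo(cap, K):
    """Minimal monocular VO loop over a cv2.VideoCapture; K is the 3x3 intrinsic matrix."""
    detector = cv2.FastFeatureDetector_create(threshold=20)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p0 = np.float32([kp.pt for kp in detector.detect(prev)]).reshape(-1, 1, 2)
    R_g, t_g = np.eye(3), np.zeros((3, 1))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Track features, then estimate relative pose from the inlier correspondences
        p1, st, _ = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None)
        good0, good1 = p0[st.flatten() == 1], p1[st.flatten() == 1]
        E, _ = cv2.findEssentialMat(good1, good0, K, method=cv2.RANSAC)
        _, R, t, _ = cv2.recoverPose(E, good1, good0, K)

        # Integrate motion (scale fixed at 1.0: monocular scale is ambiguous)
        t_g = t_g + R_g @ t
        R_g = R_g @ R

        # Re-detect when tracks run low, then advance to the next frame
        if len(good1) < 500:
            good1 = np.float32([kp.pt for kp in detector.detect(curr)]).reshape(-1, 1, 2)
        prev, p0 = curr, good1.reshape(-1, 1, 2)
```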
- Python 3.8+
- pip
```bash
# Clone the repository
git clone https://github.com/yourusername/VisualOdometry.git
cd VisualOdometry

# Install dependencies
pip install -r requirements.txt
```

Run on your own video:

```bash
python main.py --video path/to/your/video.mp4
```

Run on a KITTI sequence:

- Download a sequence from the KITTI Odometry Dataset
- Run:

```bash
python main.py --kitti path/to/dataset/sequences/00
```

Controls:

- ESC: Exit the application
Detects corner features in the image using the FAST (Features from Accelerated Segment Test) algorithm:

```python
detector = cv2.FastFeatureDetector_create(threshold=20)
keypoints = detector.detect(image)
```
Tracks features from frame N-1 to frame N using Lucas-Kanade optical flow:

```python
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, p0, None)
```
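A slightly fuller sketch with explicit tracker parameters. The 21×21 window matches the configuration table below; the pyramid depth and termination criteria are illustrative choices:

```python
import cv2

lk_params = dict(
    winSize=(21, 21),  # search window per pyramid level
    maxLevel=3,        # number of pyramid levels
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_img, curr_img, p0, None, **lk_params)

# Keep only the successfully tracked correspondences
good_old = p0[status.flatten() == 1]
good_new = p1[status.flatten() == 1]
```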
Computes the Essential Matrix E, which encodes the relative camera motion:

E = K^T @ F @ K

Where:

- `K` = camera intrinsic matrix
- `F` = fundamental matrix

In practice, OpenCV estimates E directly from point correspondences with RANSAC:

```python
E, mask = cv2.findEssentialMat(points1, points0, focal, pp, method=cv2.RANSAC)
```
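The same call with the RANSAC settings from the configuration table (0.999 confidence, 1.0 px threshold); `focal` and `pp` are the focal length and principal point from the camera intrinsics:

```python
E, mask = cv2.findEssentialMat(
    points1, points0,
    focal=focal, pp=pp,
    method=cv2.RANSAC, prob=0.999, threshold=1.0,
)
```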
Decomposes E into rotation (R) and translation (t):

```python
_, R, t, mask = cv2.recoverPose(E, points1, points0, focal, pp)
```

Updates the global pose by integrating the relative motion:

```python
t_global = t_global + R_global @ (scale * t)
R_global = R_global @ R
```
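A minimal, self-contained version of this integration step in NumPy; `scale` defaults to 1.0 here since monocular VO only recovers relative scale (see Limitations):

```python
import numpy as np

R_global = np.eye(3)         # accumulated rotation
t_global = np.zeros((3, 1))  # accumulated camera position

def integrate(R, t, scale=1.0):
    """Compose the relative motion (R, t) from recoverPose into the global pose."""
    global R_global, t_global
    t_global = t_global + R_global @ (scale * t)
    R_global = R_global @ R
```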
The epipolar constraint states that for corresponding normalized image points p1 and p2:

p2^T @ E @ p1 = 0

Properties of E:

- Rank-2 matrix (singular)
- 5 degrees of freedom
- Decomposition:
E = [t]_x @ R
Where [t]_x is the skew-symmetric matrix of the translation vector t.
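These properties are easy to check numerically. A self-contained sanity check, building E = [t]_x @ R from an arbitrary rotation and translation (values illustrative):

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x, so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

theta = 0.1  # small rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 0.2, 0.0])

E = skew(t) @ R
print(np.linalg.matrix_rank(E))            # 2 -> rank-deficient, as expected
print(np.linalg.svd(E, compute_uv=False))  # singular values ~ (s, s, 0)
```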
| Parameter | Value | Purpose |
|---|---|---|
| FAST threshold | 20 | Corner detection sensitivity |
| KLT window size | 21×21 | Optical flow search area |
| RANSAC probability | 0.999 | Outlier rejection confidence |
| RANSAC threshold | 1.0 px | Inlier distance threshold |
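Collecting these values in one place makes tuning easier; a hypothetical `config.py` mirroring the table above:

```python
# Hypothetical central configuration (values from the table above)
FAST_THRESHOLD = 20      # corner detection sensitivity
KLT_WIN_SIZE = (21, 21)  # optical flow search window
RANSAC_PROB = 0.999      # outlier rejection confidence
RANSAC_THRESHOLD = 1.0   # inlier distance threshold, pixels
```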
- Scale Ambiguity: Monocular VO recovers the trajectory only up to an unknown global scale factor; absolute distances cannot be determined from images alone (a common workaround is sketched after this list)
- Drift: Without loop closure, position error accumulates over time
- Lighting Sensitivity: FAST features may fail in extreme lighting conditions
- Rotation-Only Motion: Fails when camera only rotates (no translation)
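On KITTI, a common workaround for the scale ambiguity is to take the per-frame scale from the ground-truth poses. An illustrative sketch, assuming `gt` is an N×12 array of flattened 3×4 pose matrices loaded from a KITTI poses file:

```python
import numpy as np

def get_absolute_scale(gt, frame_id):
    """Distance between consecutive ground-truth camera positions."""
    prev_xyz = gt[frame_id - 1].reshape(3, 4)[:, 3]
    curr_xyz = gt[frame_id].reshape(3, 4)[:, 3]
    return np.linalg.norm(curr_xyz - prev_xyz)
```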
- Add depth estimation using pre-trained neural networks
- Implement bundle adjustment for trajectory optimization
- Add loop closure detection
- Replace FAST with learned features (SuperPoint)
- Add IMU fusion (Visual-Inertial Odometry)
- Implement local mapping (Visual SLAM)
- Nistér, D. (2004). "An Efficient Solution to the Five-Point Relative Pose Problem"
- Scaramuzza, D., & Fraundorfer, F. (2011). "Visual Odometry: Part I & II"
- Hartley, R., & Zisserman, A. (2003). *Multiple View Geometry in Computer Vision*
Contributions are welcome! Feel free to open issues or submit pull requests.
MIT License - see LICENSE file for details
- KITTI Vision Benchmark Suite for datasets
- OpenCV community for excellent documentation
- NASA JPL for inspiring Mars rover navigation work
Built for learning Visual Odometry fundamentals and preparing for Computer Vision interviews in Robotics and AR/VR.