# Dataset Visual Odometry / SLAM Evaluation

1. [Download odometry data set (grayscale, 22 GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_odometry_gray.zip)
2. [Download odometry data set (color, 65 GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_odometry_color.zip)
3. [Download odometry data set (velodyne laser data, 80 GB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_odometry_velodyne.zip)
4. [Download odometry data set (calibration files, 1 MB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_odometry_calib.zip)
5. [Download odometry ground truth poses (4 MB)](https://s3.eu-central-1.amazonaws.com/avg-kitti/data_odometry_poses.zip)



# Sensor setup 
<img src="images/setup_top_view.png" />

<img src="images/passat_sensors_920.png" />



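- $P0$: Reference camera (left grayscale camera, stereo pair 1); its extrinsics are the identity.
- $P1$: Right grayscale camera of stereo pair 1; its extrinsics include the baseline offset relative to $P0$.
- $P2$: Left color camera of stereo pair 2; its offset relative to $P0$ is encoded in the fourth column of its projection matrix.
- $P3$: Right color camera of stereo pair 2; its offset relative to $P0$ is encoded in the fourth column of its projection matrix.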


---

Camera: $P0$:

```
Projection Matrix:
[[707.0912   0.     601.8873   0.    ]
 [  0.     707.0912 183.1104   0.    ]
 [  0.       0.       1.       0.    ]]
Intrinsic Matrix:
[[707.0912   0.     601.8873]
 [  0.     707.0912 183.1104]
 [  0.       0.       1.    ]]
Rotation Matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Translation Vector:
[[0.]
 [0.]
 [0.]]
```
---

Camera: $P1$:
```
Projection Matrix:
[[ 707.0912    0.      601.8873 -379.8145]
 [   0.      707.0912  183.1104    0.    ]
 [   0.        0.        1.        0.    ]]
Intrinsic Matrix:
[[707.0912   0.     601.8873]
 [  0.     707.0912 183.1104]
 [  0.       0.       1.    ]]
Rotation Matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Translation Vector:
[[ 5.37150653e-01]
 [-1.34802944e-17]
 [ 0.00000000e+00]]
```

In the sensor setup image above, the two cameras of a stereo pair are `0.54` m apart along the $x$ axis. Decomposing the projection matrix of $P1$ gives `5.37150653e-01`, which matches that baseline.
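The baseline can also be checked by hand from the matrix entries. For a rectified camera with identity rotation, the top-right entry of the projection matrix equals $-f_x \cdot b$, so $b = -P_{03} / f_x$. A minimal sketch using the $P1$ values printed above (the variable names are only illustrative):

```python
# Projection matrix of P1, copied from the decomposition above.
P1 = [
    [707.0912, 0.0, 601.8873, -379.8145],
    [0.0, 707.0912, 183.1104, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

fx = P1[0][0]
# With identity rotation, P1[0][3] = -fx * baseline.
baseline = -P1[0][3] / fx
print(f"baseline = {baseline:.6f} m")
```

The result is about `0.537` m, consistent with the translation vector reported by the decomposition.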

Refs: [1](https://www.cvlibs.net/datasets/kitti/setup.php)






Refs: [1](https://stackoverflow.com/questions/29407474/how-to-understand-the-kitti-camera-calibration-files), [2](https://github.com/yanii/kitti-pcl/blob/master/KITTI_README.TXT), [3](https://www.cvlibs.net/datasets/kitti/eval_odometry.php), [4](https://github.com/avisingh599/mono-vo/)



Refs: [1](https://rpg.ifi.uzh.ch/docs/VO_Part_I_Scaramuzza.pdf), [2](https://rpg.ifi.uzh.ch/docs/VO_Part_II_Scaramuzza.pdf), [3](https://rpg.ifi.uzh.ch/docs/Visual_Odometry_Tutorial.pdf), [4](https://github.com/alishobeiri/Monocular-Video-Odometery), [5](https://avisingh599.github.io/vision/monocular-vo/)


# Ground Truth Poses
Each row of the pose file has 12 columns, obtained by flattening (row by row) the `3x4` transformation matrix of the left camera:

```
r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz
```
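As a small illustration of this layout, the sketch below turns one row back into a `3x4` matrix and splits it into rotation and translation. The function name and the sample row are made up for the example.

```python
def parse_pose_line(line):
    """Turn one 12-value row into a 3x4 matrix (three rows of four floats)."""
    values = [float(v) for v in line.split()]
    if len(values) != 12:
        raise ValueError(f"expected 12 values, got {len(values)}")
    return [values[0:4], values[4:8], values[8:12]]


# Hypothetical row: identity rotation, translation (0.5, 0, 2).
row = "1 0 0 0.5 0 1 0 0 0 0 1 2.0"
pose = parse_pose_line(row)
rotation = [r[:3] for r in pose]    # r11..r33
translation = [r[3] for r in pose]  # tx, ty, tz
```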





## Reconstruct Sparse/Dense Model From Known Camera Poses with COLMAP

Your data should have the following structure: 

```
├── database.db
├── dense
│   └── sparse
│       └── model
│           └── 0
├── images
│   ├── 00000.png
│   ├── 00001.png
│   ├── 00002.png
│   └── 00003.png
└── sparse
    └── model
        └── 0
            ├── cameras.txt
            ├── images.txt
            └── points3D.txt
```

1. `cameras.txt`: should look like this:

```
# Camera list with one line of data per camera:
#   CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
# Number of cameras: 3
1 SIMPLE_PINHOLE 3072 2304 2559.81 1536 1152
2 PINHOLE 3072 2304 2560.56 2560.56 1536 1152
3 SIMPLE_RADIAL 3072 2304 2559.69 1536 1152 -0.0218531
```

2. `images.txt`: should look like this (the second line of each image, the `POINTS2D[]` list, is left empty):

```
# Image list with two lines of data per image:
#   IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
#   POINTS2D[] as (X, Y, POINT3D_ID)
# Number of images: 4, mean observations per image: 0
1 0.695104 0.718385 -0.024566 0.012285 -0.046895 0.005253 -0.199664 1 00000.png

2 0.696445 0.717090 -0.023185 0.014441 -0.041213 0.001928 -0.134851 1 00001.png

3 0.697457 0.715925 -0.025383 0.018967 -0.054056 0.008579 -0.378221 1 00002.png

4 0.698777 0.714625 -0.023996 0.021129 -0.048184 0.004529 -0.313427 1 00003.png
```
and finally:

3. `points3D.txt`: This file should be empty.
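For the KITTI setup described here, a single `PINHOLE` entry in `cameras.txt` should be enough. The sketch below formats such a line from the $P0$ intrinsics listed earlier. The helper name is hypothetical, and the image size (`1226 x 370`) is an assumption that should be checked against your own images.

```python
def pinhole_camera_line(camera_id, width, height, fx, fy, cx, cy):
    """Format one cameras.txt entry for the PINHOLE model (params: fx, fy, cx, cy)."""
    return f"{camera_id} PINHOLE {width} {height} {fx} {fy} {cx} {cy}"


# Intrinsics from the P0 matrix; width and height are assumed values.
line = pinhole_camera_line(1, 1226, 370, 707.0912, 707.0912, 601.8873, 183.1104)
print(line)
```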

The KITTI format for ground truth poses (for instance, for the file `data/kitti/odometry/05/poses/05.txt`) is:

```
# r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz
```
The COLMAP format for `images.txt` is:

```
# colmap format:
# IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
```
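The sketch below shows one way to convert between the two formats; it is not the content of `kitti_to_colmap.py`. It assumes that a KITTI row is a camera-to-world transform, while COLMAP expects world-to-camera, so the pose is inverted ($R^\top$, $-R^\top t$) before the rotation is converted to a quaternion. All function names are illustrative.

```python
import math


def rotation_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a unit quaternion (qw, qx, qy, qz)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)
        return (0.25 * s, (R[2][1] - R[1][2]) / s,
                (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s)
    if R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])
        return ((R[2][1] - R[1][2]) / s, 0.25 * s,
                (R[0][1] + R[1][0]) / s, (R[0][2] + R[2][0]) / s)
    if R[1][1] > R[2][2]:
        s = 2.0 * math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2])
        return ((R[0][2] - R[2][0]) / s, (R[0][1] + R[1][0]) / s,
                0.25 * s, (R[1][2] + R[2][1]) / s)
    s = 2.0 * math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1])
    return ((R[1][0] - R[0][1]) / s, (R[0][2] + R[2][0]) / s,
            (R[1][2] + R[2][1]) / s, 0.25 * s)


def kitti_row_to_colmap(row):
    """Return (quaternion, translation) of the inverted pose for one KITTI row."""
    v = [float(x) for x in row.split()]
    R = [v[0:3], v[4:7], v[8:11]]
    t = [v[3], v[7], v[11]]
    # Invert the rigid transform: R_inv = R^T, t_inv = -R^T t.
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [0.0 - sum(R_inv[i][j] * t[j] for j in range(3)) for i in range(3)]
    return rotation_to_quaternion(R_inv), t_inv


def images_txt_entry(image_id, q, t, camera_id, name):
    """Format the pose line plus the empty POINTS2D line for images.txt."""
    numbers = " ".join(f"{x:.6f}" for x in (*q, *t))
    return f"{image_id} {numbers} {camera_id} {name}\n\n"


q, t = kitti_row_to_colmap("1 0 0 0 0 1 0 0 0 0 1 0")
entry = images_txt_entry(1, q, t, 1, "00000.png")
```

Each entry ends with an empty line because no 2D observations are known before triangulation.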

Run the script [kitti_to_colmap.py](../scripts/kitti/kitti_to_colmap.py). It writes its output to the `images.txt` file.


To add noise, you can run the script [kitti_to_colmap_noise.py](../scripts/kitti/kitti_to_colmap_noise.py) instead.

Inside `~/colmap_projects/kitti_noisy`, create a soft link pointing to the KITTI images:

`ln -s <path-to-kitti-odometry-image> images`

In my case:

```
 ln -s /home/$USER/workspace/OpenCVProjects/data/kitti/odometry/05/image_0/ images
```

### Setting up parameters

Then set the camera parameters (`fx,fy,cx,cy` for the `PINHOLE` model, taken from the intrinsic matrix of $P0$):

```
CAM=707.0912,707.0912,601.8873,183.1104
```
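The same string can be derived from a calibration line instead of typing it by hand. The sketch below assumes the line has the form `P0:` followed by the 12 row-major values of the projection matrix. The function name and the sample line are illustrative, built from the values shown earlier.

```python
def camera_params_from_calib_line(line):
    """Return 'fx,fy,cx,cy' from a 'P0: <12 values>' calibration line."""
    _, _, rest = line.partition(":")
    values = [float(v) for v in rest.split()]
    if len(values) != 12:
        raise ValueError(f"expected 12 values, got {len(values)}")
    # Row-major 3x4 layout: fx at index 0, cx at 2, fy at 5, cy at 6.
    fx, cx = values[0], values[2]
    fy, cy = values[5], values[6]
    return f"{fx},{fy},{cx},{cy}"


sample = ("P0: 7.070912e+02 0.0 6.018873e+02 0.0 "
          "0.0 7.070912e+02 1.831104e+02 0.0 "
          "0.0 0.0 1.0 0.0")
cam = camera_params_from_calib_line(sample)
print(f"CAM={cam}")
```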

Set the project name and path:
```
project_name=kitti_noisy
DATASET_PATH=/home/$USER/colmap_projects/$project_name
```

### Feature extraction

Extract the features:
```
colmap feature_extractor  \
--database_path $DATASET_PATH/database.db  \
--image_path $DATASET_PATH/images  \
--ImageReader.single_camera=true --ImageReader.camera_model=PINHOLE --ImageReader.camera_params=$CAM \
--SiftExtraction.use_gpu 1 \
--SiftExtraction.estimate_affine_shape=true \
--SiftExtraction.domain_size_pooling=true
```

or, with the default SIFT options:

```
colmap feature_extractor  \
--database_path $DATASET_PATH/database.db  \
--image_path $DATASET_PATH/images  \
--ImageReader.single_camera=true --ImageReader.camera_model=PINHOLE --ImageReader.camera_params=$CAM
```

### Matcher
Run the matcher:

```
colmap sequential_matcher \
   --database_path $DATASET_PATH/database.db \
   --SequentialMatching.overlap=3 \
   --SequentialMatching.loop_detection=true \
   --SequentialMatching.loop_detection_period=2 \
   --SequentialMatching.loop_detection_num_images=50 \
   --SequentialMatching.vocab_tree_path="$DATASET_PATH/../vocab_tree/vocab_tree_flickr100K_words256K.bin" \
   --SiftMatching.use_gpu 1 --SiftMatching.gpu_index=-1  --SiftMatching.guided_matching=true 
```

Create the following directories inside the project folder:

```
dense/sparse/model/0
dense/refined/model/0
```
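If you prefer to script this step, a minimal sketch with `pathlib` is shown below. It assumes both folders are relative to the project directory; the function name is hypothetical.

```python
from pathlib import Path


def create_model_dirs(dataset_path):
    """Create the output folders for triangulation and bundle adjustment."""
    created = []
    for sub in ("dense/sparse/model/0", "dense/refined/model/0"):
        target = Path(dataset_path) / sub
        # parents=True builds intermediate folders; exist_ok=True allows reruns.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

For the project used here, this would be called as `create_model_dirs(Path.home() / "colmap_projects" / "kitti_noisy")`.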

### Triangulation
Then run the point triangulator:

```
colmap point_triangulator \
    --database_path $DATASET_PATH/database.db \
    --image_path $DATASET_PATH/images \
    --input_path $DATASET_PATH/sparse/model/0 \
    --output_path $DATASET_PATH/dense/sparse/model/0
```

Now run the bundle adjuster so that it optimizes only the extrinsics (camera positions and orientations) and **NOT** the intrinsics (camera parameters):


```
colmap bundle_adjuster  \
  --input_path $DATASET_PATH/dense/sparse/model/0 \
  --output_path $DATASET_PATH/dense/refined/model/0 \
  --BundleAdjustment.refine_focal_length  0 \
  --BundleAdjustment.refine_principal_point   0 \
  --BundleAdjustment.refine_extra_params  0 \
  --BundleAdjustment.refine_extrinsics  1
```


Refs: [1](https://colmap.github.io/faq.html#reconstruct-sparse-dense-model-from-known-camera-poses)