Commit bcb3178

Pushing README and an example dataset. Final changes from project.
atomoclast committed May 2, 2018
1 parent 86bb74c commit bcb3178
Showing 11 changed files with 215 additions and 12 deletions.
97 changes: 97 additions & 0 deletions README.md
@@ -0,0 +1,97 @@
# cSfM

This is an early prototype project to develop a Structure from Motion (SfM) pipeline.

![Example Output](images/SFM_Condo.png)

## Requirements and Dependencies

This package depends on the following third-party packages:

1. [gtsam](https://bitbucket.org/gtborg/gtsam)
2. [PMVS2](https://www.di.ens.fr/pmvs/documentation.html)

You will need to install GTSAM on your own machine. I used [this tag](https://bitbucket.org/gtborg/gtsam/src/c82fe1fde2fc988b6bde5e4798b66129bbb5da19/?at=4.0.0-alpha2) (4.0.0-alpha2) for my testing.
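
For reference, a typical out-of-source CMake build looks roughly like this (a sketch assuming GTSAM's standard CMake workflow and that the `4.0.0-alpha2` tag can be checked out by name; defer to GTSAM's own README for the authoritative steps):

```
git clone https://bitbucket.org/gtborg/gtsam.git
cd gtsam
git checkout 4.0.0-alpha2
mkdir build && cd build
cmake ..
make
sudo make install
```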

For PMVS2, I have included it in this repo so the pipeline can make local calls to it to build the actual point cloud `*.ply` file.

## How To Build

Once you have installed GTSAM, you will need to navigate to the root of this directory and run:

```
mkdir build
cd build
cmake ..
make
```

You should now see the `cSfM` executable in the `build` directory.

## Usage

To use cSfM, you will need a directory containing several images of your item of interest. You will also need to pass in the *FULL* path to a text file that lists the names of the images (in that same folder) you wish to build the point cloud from, along with the focal length of your camera (in pixels) and the center of your image in x and y.
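
For reference, the example dataset added in this commit includes `image_sets/Condo/images.txt`, which lists one image filename per line:

```
01.jpg
02.jpg
03.jpg
04.jpg
```

The focal length and image center for that dataset live alongside it in `image_sets/Condo/params_fourth.txt` (an example appears below).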

If you are using a [calibrated camera](https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html), these values should be easy to find as elements of the resulting camera (intrinsic) matrix.

However, if you are using images from a smartphone or another uncalibrated camera, you can estimate these parameters.

To estimate the focal length of your images in pixels, I have written a simple helper Python script. Use it as follows:

```
python calculate_focal_length.py [pxW] [f_mm] [fov]
```

where:

1. `pxW` is the width of the images in pixels.
2. `f_mm` is the focal length of the camera that was used to capture the image in [mm].
3. `fov` is the camera's Field of View in degrees.

To get this information, you can look at the [EXIF](https://en.wikipedia.org/wiki/Exif) metadata that your images should carry.


This should print out a relatively accurate estimate of the focal length of your images in pixel space.
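
For example, with the values that were previously hard-coded in the script (4032 px wide images from a 4.25 mm lens with a 69.4 degree field of view), f_px = (4032 / 2) / tan(69.4 / 2 degrees) ≈ 2912:

```
python calculate_focal_length.py 4032 4.25 69.4
```

This should print a focal length of roughly 2912 pixels.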

For a rough estimate of the center of the image in x and y, you can use:

1. cx = PixelWidth/2
2. cy = PixelHeight/2

This should be accurate enough, as the algorithm will attempt to compute a more accurate camera matrix as it triangulates points in space.
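
As a concrete example, the values in the included `image_sets/Condo/params_fourth.txt` are consistent with quarter-scale (1008x756) images; that scale is my inference from the file name and the numbers, not something stated in the repo:

```
f 728
cx 504
cy 378
```

Here cx = 1008/2, cy = 756/2, and 728 is close to the ~2912 px focal length computed above divided by 4.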

## Assumptions and Caveats

This is still a first attempt at SfM. It is a very complex problem, and there is still a lot of research into how to handle different cameras and improve results. That being said, here are some thoughts and tips to get good results:

1. Scale down your images. This pipeline does not use a GPU-based method to calculate poses in space via triangulation, so large images may take a long time (see the resizing sketch after this list).

2. Try to use images that have large overlaps. Right now the pipeline matches keypoints across all the images, so if there is not enough overlap, it will struggle to recover camera poses.

3. Use a few images at first to test things out. Do not underestimate how computationally expensive the problem becomes as the size of the images and the number of images in the stack increase.
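
A minimal image-resizing sketch (a hypothetical helper, not part of this repo; it assumes OpenCV's Python bindings and the `images.txt` layout shown above) that quarter-scales every listed image in place:

```
#!/usr/bin/env python
# Hypothetical helper (not part of this repo): downscale every image named in
# <image_dir>/images.txt in place. Assumes OpenCV's Python bindings (cv2).
import os
import sys

import cv2

def downscale_images(image_dir, scale=0.25):
    with open(os.path.join(image_dir, "images.txt")) as f:
        names = [line.strip() for line in f if line.strip()]
    for name in names:
        path = os.path.join(image_dir, name)
        img = cv2.imread(path)
        if img is None:
            continue  # skip unreadable entries
        # INTER_AREA is the usual interpolation choice when shrinking.
        small = cv2.resize(img, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_AREA)
        cv2.imwrite(path, small)

if __name__ == "__main__":
    downscale_images(sys.argv[1])
```

If you downscale the images, remember to scale `f`, `cx`, and `cy` in your parameter file by the same factor.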

## Future Improvements

I hope to continue to develop this further by adding the following improvements over time as a side project:

1. Move away from AKAZE for feature matching and towards optical flow. While working on this project I learned that although fields like photojournalism use unsorted image sets for SfM, most of my image sets came from walking around with a camera to different points in space. This lends itself more to the Visual Odometry side of SfM, as well as to robotics applications and more traditional computer vision. I would also like to integrate this pipeline into ROS and use it with Visual Odometry.

2. In the past I started to write a [Monocular Visual Odometry ROS package](https://github.com/atomoclast/ros_mono_vo). I would really like to apply what I learned here to improve that error-prone implementation, and to see how I can integrate VO and SfM together.

3. Look further into OpenMVG, OpenMVS, and PCL. One issue with PMVS is that it produces a single static `.ply` file. If I wanted to integrate this into ROS for dynamic point cloud generation, I would need to figure out how to build up point cloud data structures and publish them as data types that ROS can handle.


## References

1. http://www.cse.psu.edu/~rtc12/CSE486/lecture19.pdf
2. https://docs.opencv.org/3.0-beta/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
3. https://docs.opencv.org/3.1.0/da/de9/tutorial_py_epipolar_geometry.html
4. http://nghiaho.com/?p=2379
5. https://www.di.ens.fr/pmvs/pmvs-1/index.html
6. http://www.cs.cornell.edu/~snavely/bundler/
7. https://robotics.stackexchange.com/questions/14456/determine-the-relative-camera-pose-given-two-rgb-camera-frames-in-opencv-python
8. https://www.cc.gatech.edu/grads/j/jdong37/files/gtsam-tutorial.pdf


93 changes: 86 additions & 7 deletions cSfM.h
@@ -154,13 +154,13 @@ void findMatches(vector<cv::Mat> &images, SFM_Tracker &track)
 }
 }
 
-// Match images and visualize it:
-Mat img_matches;
-cout<< "about to match the images? " <<endl;
-drawMatches(img_pose_m.img, img_pose_m.keypoints, img_pose_n.img, img_pose_n.keypoints, matches, img_matches);
-resize(img_matches, img_matches, img_matches.size());
-imshow("img", img_matches);
-waitKey(0);
+// Match images and visualize it:
+// Mat img_matches;
+// cout<< "about to match the images? " <<endl;
+// drawMatches(img_pose_m.img, img_pose_m.keypoints, img_pose_n.img, img_pose_n.keypoints, matches, img_matches);
+// resize(img_matches, img_matches, img_matches.size());
+// imshow("img", img_matches);
+// waitKey(0);
 
 }
 }
@@ -227,8 +227,48 @@ void triangulateSFMPoints(SFM_Tracker &track, cv::Mat K)
Mat E = findEssentialMat(destination, source, K.at<double>(0,0), pp, RANSAC, 0.999, 1.0, status);
Mat local_R, local_t;

/*
Mat findEssentialMat(InputArray points1, InputArray points2, double focal=1.0, Point2d pp=Point2d(0, 0), int method=RANSAC, double prob=0.999, double threshold=1.0, OutputArray mask=noArray() )
Parameters:
points1 – Array of N (N >= 5) 2D points from the first image. The point coordinates should be floating-point (single or double precision).
points2 – Array of the second image points of the same size and format as points1 .
focal – focal length of the camera. Note that this function assumes that points1 and points2 are feature points from cameras with same focal length and principle point.
pp – principle point of the camera.
method –
Method for computing a fundamental matrix.
RANSAC for the RANSAC algorithm.
MEDS for the LMedS algorithm.
threshold – Parameter used for RANSAC. It is the maximum distance from a point to an epipolar line in pixels, beyond which the point is considered an outlier and is not used for computing the final fundamental matrix. It can be set to something like 1-3, depending on the accuracy of the point localization, image resolution, and the image noise.
prob – Parameter used for the RANSAC or LMedS methods only. It specifies a desirable level of confidence (probability) that the estimated matrix is correct.
mask – Output array of N elements, every element of which is set to 0 for outliers and to 1 for the other points. The array is computed only in the RANSAC and LMedS methods.
*/

cout<< "Recovering Pose." << endl;
// https://docs.opencv.org/3.0-beta/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html

/*
int point_count = 100;
vector<Point2f> points1(point_count);
vector<Point2f> points2(point_count);
for( int i = 0; i < point_count; i++ )
{
points1[i] = ...;
points2[i] = ...;
}
double focal = 1.0;
cv::Point2d pp(0.0, 0.0);
Mat E, R, t, mask;
E = findEssentialMat(points1, points2, focal, pp, RANSAC, 0.999, 1.0, mask);
recoverPose(E, points1, points2, R, t, focal, pp, mask);
*/

recoverPose(E, destination, source, local_R, local_t, K.at<double>(0,0), pp, status);

// local transform
@@ -251,6 +291,21 @@ void triangulateSFMPoints(SFM_Tracker &track, cv::Mat K)

cur.P = P;


/*
void triangulatePoints(InputArray projMatr1, InputArray projMatr2, InputArray projPoints1, InputArray projPoints2, OutputArray points4D)
Parameters:
projMatr1 – 3x4 projection matrix of the first camera.
projMatr2 – 3x4 projection matrix of the second camera.
projPoints1 – 2xN array of feature points in the first image. In case of c++ version it can be also a vector of feature points or two-channel matrix of size 1xN or Nx1.
projPoints2 – 2xN array of corresponding points in the second image. In case of c++ version it can be also a vector of feature points or two-channel matrix of size 1xN or Nx1.
points4D – 4xN array of reconstructed points in homogeneous coordinates.
*/


Mat points4D;
triangulatePoints(prev.P, cur.P, source, destination, points4D);

@@ -417,6 +472,11 @@ void bundleAdjustment(SFM_Tracker &track, gtsam::Cal3_S2& K, gtsam::Values& resu

Pose3 pose(R, t);

// // Add a prior on pose x1. This indirectly specifies where the origin is.
// noiseModel::Diagonal::shared_ptr poseNoise = noiseModel::Diagonal::Sigmas((Vector(6) << Vector3::Constant(0.3), Vector3::Constant(0.1)).finished()); // 30cm std on x,y,z 0.1 rad on roll,pitch,yaw
// graph.emplace_shared<PriorFactor<Pose3> >(Symbol('x', 0), poses[0], poseNoise); // add directly to graph


// First image special case to make it really noisy and lower the confidence on it.
if(i == 0)
{
@@ -428,6 +488,15 @@ void bundleAdjustment(SFM_Tracker &track, gtsam::Cal3_S2& K, gtsam::Values& resu

// iterate through common points between images:

// // Simulated measurements from each camera pose, adding them to the factor graph
// for (size_t i = 0; i < poses.size(); ++i) {
// SimpleCamera camera(poses[i], *K);
// for (size_t j = 0; j < points.size(); ++j) {
// Point2 measurement = camera.project(points[j]);
// graph.emplace_shared<GenericProjectionFactor<Pose3, Point3, Cal3_S2> >(measurement, measurementNoise, Symbol('x', i), Symbol('l', j), K);
// }
// }

for (size_t k=0; k<imgPose.keypoints.size(); k++)
{
if(imgPose.keypoint_3d_exist(k))
Expand Down Expand Up @@ -467,6 +536,13 @@ void bundleAdjustment(SFM_Tracker &track, gtsam::Cal3_S2& K, gtsam::Values& resu
{
initPrior=true;

// // Because the structure-from-motion problem has a scale ambiguity, the problem is still under-constrained
// // Here we add a prior on the position of the first landmark. This fixes the scale by indicating the distance
// // between the first camera and the first landmark. All other landmark positions are interpreted using this scale.
// noiseModel::Isotropic::shared_ptr pointNoise = noiseModel::Isotropic::Sigma(3, 0.1);
// graph.emplace_shared<PriorFactor<Point3> >(Symbol('l', 0), points[0], pointNoise); // add directly to graph
// graph.print("Factor Graph:\n");

noiseModel::Isotropic::shared_ptr point_noise = noiseModel::Isotropic::Sigma(3, 0.1);
Point3 p;
p(0) = track.vGlobalPoint[i].point.x;
@@ -482,6 +558,9 @@ void bundleAdjustment(SFM_Tracker &track, gtsam::Cal3_S2& K, gtsam::Values& resu

result = LevenbergMarquardtOptimizer(graph, initial).optimize();

// /* Optimize the graph and print results */
// Values result = DoglegOptimizer(graph, initialEstimate).optimize();

cout<<"\n%%%%%%%%%%%%%%%%%%%%%"<<endl;
cout<<"Initial Graph Error = " <<graph.error(initial) <<endl;
cout<<"Final Graph Error = " << graph.error(result) <<endl;
26 changes: 21 additions & 5 deletions calculate_focal_length.py
100644 → 100755
@@ -1,10 +1,26 @@
 #!/usr/bin/env python
 
 from math import tan
+from sys import argv, exit
 
-pxW = 4032
-f_mm = 4.25 #mm
-fov = 69.4 # deg
+# pxW = 4032
+# f_mm = 4.25 #mm
+# fov = 69.4 # deg
+
+def calculate_fpx(pxW, f_mm, fov):
+
+    f_px = (pxW * 0.5) / tan(fov*0.5*(3.1415/180))
+
+    print "Focal length in px: ", f_px
+
+if __name__ == "__main__":
 
-f_px = (pxW * 0.5) / tan(fov*0.5*(3.1415/180))
-print "Focal length in px: ", f_px
+    if(len(argv) != 4):
+        print "Usage: calculate_focal_length.py <pxW> <f_mm> <fov>"
+        exit()
+
+    pxW = float(argv[1])
+    f_mm = float(argv[2])
+    fov = float(argv[3])
+
+    calculate_fpx(pxW, f_mm, fov)
Binary file added image_sets/Condo/01.jpg
Binary file added image_sets/Condo/02.jpg
Binary file added image_sets/Condo/03.jpg
Binary file added image_sets/Condo/04.jpg
4 changes: 4 additions & 0 deletions image_sets/Condo/images.txt
@@ -0,0 +1,4 @@
01.jpg
02.jpg
03.jpg
04.jpg
3 changes: 3 additions & 0 deletions image_sets/Condo/params_fourth.txt
@@ -0,0 +1,3 @@
f 728
cx 504
cy 378
Binary file added images/SFM_Condo.png
4 changes: 4 additions & 0 deletions main.cpp
@@ -106,6 +106,10 @@ int main(int argc, char** argv ) {
// double cy = kMat.at<double>(1,2);

gtsam::Values result;

// // Define the camera calibration parameters
// Cal3_S2::shared_ptr K(new Cal3_S2(50.0, 50.0, 0.0, 50.0, 50.0));

gtsam::Cal3_S2 K(f, f, 0, cx, cy); // fx, fy, skew (zero), cx, cy
bundleAdjustment(tracker, K,result);

