⭐ RANSAC is an algorithm used for fiting models with posible large amount of noise that can get wors performance due to this outlier points. In this project, its build a model in C++
to fit noisy linear data.
Author -> Arnau Pérez Pérez | Github | LinkedIn | 01arnauperez@gmail.com
nData
|nSamples
|nInliers
: original, random samples and inliers size.data
: original (x, y) coordinate pairs.sampleData
: random (x, y) coordinate pairs taken from data.currentInliers
: (x, y) coordinate pairs that are inferred as inliers.bestInliers
: (x, y) coordinate pairs of inliers that have the greater number of inliers.maxIter
: number of times that a random sample is chosen.maxIterLR
: number of times each random sample is trained.currentModel
: m and b parameters at each iteration.bestModel
: m and b parameters corresponding to model that infers the greater number of inliers.lr (learning rate)
: a hyperparameter used to set the rate at which a model updates its parameters (m, b).threshold
: criterion to decide whether a point is an inlier or not.costFunction
: function used for accepting points as inliers.
- Read the data into nData pairs of (x, y) coordinates.
- Get a random set of points, randomData, of size nSample > 1.
- Perform a LR(maxIterLR times) over randomData resulting in a currentModel.
- Count nInliers using currentModel over data and store this points in currentInliers.
- If currentModel has more nInliers than than bestModel, replace bestModel by currentModel parameters, otherwise continue to the next step.
- Repeat steps 2 to 5 until maxIter.
- At that point you have a bestModel with m and b parameters and bestInliers with (x, y) coordinates, where nInliers ≤ nData.
(Optional)
Perform a LR(maxIterLR times) over bestInliers resulting in a bestModel (superbestModel if you'd like to call it).
The threshold values must align in the same data magnitude. To ensure this alignment, the median absolute deviation (MAD)
is employed and calculated as follows [3]:
For empirical validation of the model's efficacy in generating noisy linear data, a concise program dataBuilder.cpp has been developed and executed.
Two different cost functions are proposed to be used in Inliers counter process:
-
Absolute function
->$|y_{pred} - y|$ -
Square error function
->$(y_{pred} - y)^{2}$
Figure 1.
Comparative analysis of distinct scale magnitudes delineated by the linear data slope values. Gray data points represent outliers, while black data points represent inliers as determined by the model. Additionally, the performance of both the RANSAC model and standard Linear Regression is shown.
Scale | Slope (m) | Intercept (b) | Observations |
---|---|---|---|
Small | 0.002 | 5 | Due to the small differences between predicted and actual values, the Squared Error function make these distances even smaller. |
Medium | -4 | 12 | Both cost functions demonstrate effectiveness; however, in this particular example, the Squared Error function appears superior. |
Large | 4000 | 300 | Larger scale values amplify existing distances. If the data exhibits minimal deviation, the Absolute Error function may offer a more appropriate choice. |
Table 1.
presents the data scale (magnitude) accompanied by arbitrarily generated linear data with parameters m and b along with added noise. Additionally, notable observations are documented, concluded from the accompanying figures.
-
Enhance the research environment
by incorporating additional functionalities to facilitate algorithm testing. -
Conduct an in-depth investigation
into cost functions to ensure model magnitude independence. -
Conduct comprehensive testing
with diverse parameter sets to assess their impact on model performance. -
Implement an estimation for maxIter
by introducing a confidence formula. [4]
There are three distinct executable files in the project:
Two C++
files for generating testing data dataBuilder.cpp and running tests main.cpp.One Python
file for plotting results plot.py.
For dataBuilder.cpp:
g++ -o dataBuilder.exe dataBuilder.cpp
For main.cpp:
g++ -std=c++11 -o main.exe main.cpp includes/logFunctionalities.cpp includes/utils.cpp includes/linearRANSAC.cpp includes/fileFunctionalities.cpp
After compiling, run the executables with the -h
flag to view the instructions:
./dataBuilder.exe -h
./main.exe -h
For the Python file, there is also a help flag:
python plot.py -h
[1] Outlier detection using the RANSAC algorithm | https://medium.com/mlearning-aioutlier-detection-using-the-ransac-algorithm-de52670adb4a
[2] Dealing with Outliers: RANSAC | Image Stitching | https://www.youtube.com/watch?v=EkYXjmiolBg
[3] Python scikit learn library | MAD threshold | https://github.com/scikit-learn/scikit-learn/blob/84c3ff0fc122a5aa16b224ac081856808f42f2e5/sklearn/linear_model/_ransac.py#L72
[4] Overview of the RANSAC Algorithm, Konstantinos G. Derpanis | https://rmozone.com/snapshots/2015/07/cdg-room-refs/ransac.pdf