RANSAC

⭐ RANSAC is an algorithm used for fiting models with posible large amount of noise that can get wors performance due to this outlier points. In this project, its build a model in C++ to fit noisy linear data.

Author -> Arnau Pérez Pérez | Github | LinkedIn | 01arnauperez@gmail.com

Parameters and variables description

nData | nSamples | nInliers: original, random samples and inliers size.
data: original (x, y) coordinate pairs.
sampleData: random (x, y) coordinate pairs taken from data.
currentInliers: (x, y) coordinate pairs that are inferred as inliers.
bestInliers: (x, y) coordinate pairs of inliers that have the greater number of inliers.
maxIter: number of times that a random sample is chosen.
maxIterLR: number of times each random sample is trained.
currentModel: m and b parameters at each iteration.
bestModel: m and b parameters corresponding to model that infers the greater number of inliers.
lr (learning rate): a hyperparameter used to set the rate at which a model updates its parameters (m, b).
threshold: criterion to decide whether a point is an inlier or not.
costFunction: function used for accepting points as inliers.

Algorithm steps to follow

Read the data into nData pairs of (x, y) coordinates.
Get a random set of points, randomData, of size nSample > 1.
Perform a LR(maxIterLR times) over randomData resulting in a currentModel.
Count nInliers using currentModel over data and store this points in currentInliers.
If currentModel has more nInliers than than bestModel, replace bestModel by currentModel parameters, otherwise continue to the next step.
Repeat steps 2 to 5 until maxIter.
At that point you have a bestModel with m and b parameters and bestInliers with (x, y) coordinates, where nInliers ≤ nData.
(Optional) Perform a LR(maxIterLR times) over bestInliers resulting in a bestModel (superbestModel if you'd like to call it).

Model performance

The threshold values must align in the same data magnitude. To ensure this alignment, the median absolute deviation (MAD) is employed and calculated as follows [3]: $$MAD(y)=median(y−median(y))$$

For empirical validation of the model's efficacy in generating noisy linear data, a concise program dataBuilder.cpp has been developed and executed.

Two different cost functions are proposed to be used in Inliers counter process:

Absolute function -> $|y_{pred} - y|$
Square error function -> $(y_{pred} - y)^{2}$

Figure 1. Comparative analysis of distinct scale magnitudes delineated by the linear data slope values. Gray data points represent outliers, while black data points represent inliers as determined by the model. Additionally, the performance of both the RANSAC model and standard Linear Regression is shown.

Scale	Slope (m)	Intercept (b)	Observations
Small	0.002	5	Due to the small differences between predicted and actual values, the `Squared Error` function make these distances even smaller.
Medium	-4	12	Both cost functions demonstrate effectiveness; however, in this particular example, the `Squared Error` function appears superior.
Large	4000	300	Larger scale values amplify existing distances. If the data exhibits minimal deviation, the `Absolute Error` function may offer a more appropriate choice.

Table 1. presents the data scale (magnitude) accompanied by arbitrarily generated linear data with parameters m and b along with added noise. Additionally, notable observations are documented, concluded from the accompanying figures.

Next Steps

Enhance the research environment by incorporating additional functionalities to facilitate algorithm testing.
Conduct an in-depth investigation into cost functions to ensure model magnitude independence.
Conduct comprehensive testing with diverse parameter sets to assess their impact on model performance.
Implement an estimation for maxIter by introducing a confidence formula. [4]

Short guide for using the code

There are three distinct executable files in the project:

Two C++ files for generating testing data dataBuilder.cpp and running tests main.cpp.
One Python file for plotting results plot.py.

For dataBuilder.cpp:

g++ -o dataBuilder.exe dataBuilder.cpp

For main.cpp:

g++ -std=c++11 -o main.exe main.cpp includes/logFunctionalities.cpp includes/utils.cpp includes/linearRANSAC.cpp includes/fileFunctionalities.cpp

After compiling, run the executables with the -h flag to view the instructions:

./dataBuilder.exe -h

./main.exe -h

For the Python file, there is also a help flag:

python plot.py -h

References

[1] Outlier detection using the RANSAC algorithm | https://medium.com/mlearning-aioutlier-detection-using-the-ransac-algorithm-de52670adb4a

[2] Dealing with Outliers: RANSAC | Image Stitching | https://www.youtube.com/watch?v=EkYXjmiolBg

[3] Python scikit learn library | MAD threshold | https://github.com/scikit-learn/scikit-learn/blob/84c3ff0fc122a5aa16b224ac081856808f42f2e5/sklearn/linear_model/_ransac.py#L72

[4] Overview of the RANSAC Algorithm, Konstantinos G. Derpanis | https://rmozone.com/snapshots/2015/07/cdg-room-refs/ransac.pdf

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
assets		assets
includes		includes
.gitignore		.gitignore
README.md		README.md
dataBuilder.cpp		dataBuilder.cpp
main.cpp		main.cpp
plot.py		plot.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RANSAC

Parameters and variables description

Algorithm steps to follow

Model performance

Next Steps

Short guide for using the code

References

About

Releases

Packages

Languages

arnaupy/RANSAC

Folders and files

Latest commit

History

Repository files navigation

RANSAC

Parameters and variables description

Algorithm steps to follow

Model performance

Next Steps

Short guide for using the code

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages