
# Flaws of floating-point computing

Floating-point numbers can represent a very wide range of values, from very small to very large magnitudes, much like scientific notation. They are the preferred types for scientific computing. Yet one must be aware of the many **rounding errors** they entail.

First, to check the accuracy of some calculations visually, let's raise the output stream precision to 18 digits (the default is 6).

In [2]:
#include <iostream>
std::cout.precision(18) ;

The floating-point types have a limited number of digits available for their internal representation. Many numbers, such as `1./3.`, cannot be represented exactly:

In [3]:
std::cout << (1./3.) << std::endl ;

0.333333333333333315


Less intuitively, some numbers that look very simple to humans do not have an exact base-two representation:

In [4]:
std::cout << 0.1 << std::endl ;

0.100000000000000006


In [5]:
#include <iomanip>
std::cout << std::hexfloat << 0.1 << std::endl ;

0x1.999999999999ap-4
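A practical consequence is that decimal identities which look obvious may not hold once each operand has been rounded to base two. The sketch below uses a hypothetical helper, `sum_matches`, which is not part of the original notebook; it only tests whether a sum compares strictly equal to an expected value.

```cpp
// Hypothetical helper (not in the original notebook):
// returns true when a+b compares strictly equal to `expected`.
bool sum_matches( double a, double b, double expected )
 {
  double sum = a + b ;
  return (sum==expected) ;
 }
```

With IEEE-754 doubles, `sum_matches(0.1,0.2,0.3)` is expected to return `false`, whereas `sum_matches(0.5,0.25,0.75)` returns `true`, because 0.5, 0.25 and 0.75 are exact sums of powers of two.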


Even simple operations can accumulate rounding errors, which complicates the comparison of floating-point numbers:

In [6]:
double d1 = 1. ;
double d2 = .1+.1+.1+.1+.1+.1+.1+.1+.1+.1 ;

// std::hexfloat, set in the previous cell, is still in effect here
std::cout << d1 << std::endl ;
std::cout << d2 << std::endl ;

if (d1==d2)
 { std::cout<<"numbers are the same"<<std::endl ; }
else
 { std::cout<<"numbers differ !"<<std::endl ; }

0x1p+0
0x1.fffffffffffffp-1
numbers differ !
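To get a feeling for how small this discrepancy is, one can express it in units of the machine epsilon (the gap between 1 and the next representable `double`). This is only a sketch: the helper names `repeated_sum` and `error_in_epsilons` are illustrative and do not come from the original notebook.

```cpp
#include <limits>

// Hypothetical helper: adds `step` to an accumulator `count` times,
// in the same left-to-right order as the expression in the cell above.
double repeated_sum( double step, int count )
 {
  double total = 0. ;
  for ( int i = 0 ; i < count ; ++i )
   { total += step ; }
  return total ;
 }

// Gap between 1 and the sum of ten 0.1, in units of machine epsilon.
double error_in_epsilons()
 {
  double gap = 1. - repeated_sum(.1,10) ;
  return gap / std::numeric_limits<double>::epsilon() ;
 }
```

Given the hexadecimal output above, `error_in_epsilons()` should return 0.5 on a platform with IEEE-754 doubles: the sum lands on the representable value just below 1.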


## Good old-fashioned practice: epsilon

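When comparing floating-point numbers, always allow for a small difference (an epsilon) instead of testing strict equality. The machine epsilon used below is the gap between 1 and the next representable value, so this absolute test is only meaningful for values whose magnitude is close to 1.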

In [22]:
#include <cmath>
#include <limits>

In [20]:
bool compare( double val1, double val2 )
 {
  double epsilon = std::numeric_limits<double>::epsilon() ;
  return (std::abs(val1-val2)<epsilon) ;
 }

In [21]:
if (compare(1.,.1+.1+.1+.1+.1+.1+.1+.1+.1+.1 ))
 { std::cout<<"numbers are the same"<<std::endl ; }
else
 { std::cout<<"numbers differ !"<<std::endl ; }

numbers are the same
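For values far from 1, a fixed machine epsilon is too strict, and a common alternative is to scale the tolerance by the magnitude of the operands. The sketch below is one possible variant, not the notebook's method: `compare_relative`, its `tolerance` parameter and the helper `sum_of_tenths` are assumptions introduced for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Absolute comparison, same logic as the cell above.
bool compare_absolute( double val1, double val2 )
 {
  double epsilon = std::numeric_limits<double>::epsilon() ;
  return (std::abs(val1-val2)<epsilon) ;
 }

// Hypothetical variant: the allowed difference grows with the operands.
bool compare_relative( double val1, double val2, double tolerance )
 {
  double scale = std::max(std::abs(val1),std::abs(val2)) ;
  return (std::abs(val1-val2)<=tolerance*scale) ;
 }

// Hypothetical helper: sum of `count` copies of 0.1.
double sum_of_tenths( int count )
 {
  double total = 0. ;
  for ( int i = 0 ; i < count ; ++i )
   { total += .1 ; }
  return total ;
 }
```

Summing 0.1 a thousand times accumulates far more than one machine epsilon of error, so `compare_absolute(100.,sum_of_tenths(1000))` is expected to fail, while `compare_relative(100.,sum_of_tenths(1000),1e-9)` accepts the two values. The right tolerance depends on the computation and has to be chosen case by case.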


## Good old-fashioned practice: make the precision adjustable

The first and simplest step is to define a type alias and use it throughout the code:

In [3]:
#include <iostream>
#include <cmath>
#include <limits>

In [4]:
typedef double real ;

In [7]:
bool compare( real val1, real val2 )
 {
  real epsilon = std::numeric_limits<real>::epsilon() ;
  std::cout<<"(~"<<epsilon<<") " ;
  return (std::abs(val1-val2)<epsilon) ;
 }

In [8]:
if (compare(1.,.1+.1+.1+.1+.1+.1+.1+.1+.1+.1 ))
 { std::cout<<"numbers are the same"<<std::endl ; }
else
 { std::cout<<"numbers differ !"<<std::endl ; }

(~2.22045e-16) numbers are the same


If you wish to perform the same calculations in single precision, just change the alias to `typedef float real ;`, recompile and run.
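Since C++11, the same alias can also be written with `using`. The sketch below selects `float` and exposes two properties of whichever type the alias currently names; the functions `current_epsilon` and `current_digits` are hypothetical helpers added for illustration.

```cpp
#include <limits>

// Single point of change: replace `float` with `double` to switch precision.
using real = float ;

// Machine epsilon of the type currently named by `real`.
real current_epsilon()
 { return std::numeric_limits<real>::epsilon() ; }

// Number of decimal digits that `real` can represent without loss.
int current_digits()
 { return std::numeric_limits<real>::digits10 ; }
```

With `real` set to `float`, `current_digits()` returns 6; with `double` it would return 15.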

## More flexible (but more difficult) approach: templates

Make every computing function a template, with the floating-point type as a parameter. This makes it possible to mix different precisions in different steps of a scientific computing application.

In [9]:
template< typename Real >
bool compare( Real val1, Real val2 )
 {
  Real epsilon = std::numeric_limits<Real>::epsilon() ;
  std::cout<<"(~"<<epsilon<<") " ;
  return (std::abs(val1-val2)<epsilon) ;
 }

In [10]:
// double literals: the sum is computed in double, then rounded once to float
float f1 = 1. ;
float f2 = .1+.1+.1+.1+.1+.1+.1+.1+.1+.1 ;

if (compare(f1,f2))
 { std::cout<<"numbers are the same"<<std::endl ; }
else
 { std::cout<<"numbers differ !"<<std::endl ; }

(~1.19209e-07) numbers are the same


In [12]:
// float literals: every intermediate addition is rounded to float
float f1 = 1.f ;
float f2 = .1f+.1f+.1f+.1f+.1f+.1f+.1f+.1f+.1f+.1f ;

if (compare(f1,f2))
 { std::cout<<"numbers are the same"<<std::endl ; }
else
 { std::cout<<"numbers differ !"<<std::endl ; }

(~1.19209e-07) numbers differ !


In [13]:
float f1 = 1.f ;
float f2 = 10.f*.1f ;

if (compare(f1,f2))
 { std::cout<<"numbers are the same"<<std::endl ; }
else
 { std::cout<<"numbers differ !"<<std::endl ; }

(~1.19209e-07) numbers are the same
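The same template mechanism can be used to run one computation at several precisions and compare the resulting errors. This is only a sketch: `sum_steps` and `accumulation_error` are hypothetical helpers, not functions from the original notebook.

```cpp
#include <cmath>

// Hypothetical helper: accumulates `count` copies of `step` entirely in type Real.
template< typename Real >
Real sum_steps( Real step, int count )
 {
  Real total = 0 ;
  for ( int i = 0 ; i < count ; ++i )
   { total += step ; }
  return total ;
 }

// Absolute error of summing `count` tenths in type Real,
// measured in double against the expected value count/10.
template< typename Real >
double accumulation_error( int count )
 {
  Real total = sum_steps<Real>(Real(0.1),count) ;
  return std::abs(static_cast<double>(total)-count/10.) ;
 }
```

Calling `accumulation_error<float>(1000)` and `accumulation_error<double>(1000)` shows how much faster the error grows in single precision, which helps decide which steps of an application can afford the cheaper type.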


## Conclusion

Modern C++ does not bring any silver bullet for the rounding problems of floating-point computing. You still have to rely on a few good old-fashioned practices, and on external tools that help locate the largest errors (CADNA, verificarlo, verrou).

## Questions?

# References

* [learncpp.com tutorial on floating-point numbers](https://www.learncpp.com/cpp-tutorial/floating-point-numbers/)
* [What Every Programmer Should Know About Floating-Point Arithmetic](https://floating-point-gui.de/)
* [IEEE-754 Floating-Point Conversion](https://babbage.cs.qc.cuny.edu/IEEE-754.old/Decimal.html)

© *CNRS 2020*  
*This document was created by David Chamont and translated by Olga Abramkina. It is available under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/)*