# Numbers

C++ benefits for scientific computing:
* performance and precision are the primary priorities
* compatibility with C

Drawbacks:
* portability is not the primary priority
* some features inherited from C, such as implicit conversions, may lead to tricky numerical errors.

## Uninitialized variables

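The first thing to be very careful about: if a local variable of a basic type is defined without any initial value, there is no guarantee that its value is set to 0. Its value is indeterminate, and reading it before assigning it is undefined behavior. The C designers probably thought that zeroing it would be a waste of precious runtime...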

Recommendation: **always give an initial value to your variables**.

If you use an uninitialized variable by mistake, the compiler will often warn you (for example GCC or Clang with `-Wall`), but this is not guaranteed: it depends on the compiler, its options, and how complex the code is.

Recommendation: **take compiler warnings with the utmost seriousness**.

## Unportable numeric types

The size of numeric variable types in C++ depends on the implementation. This may impede the portability of the code.

For example, the following rules are imposed on integer types by the C++ standard:
* `short` : a width of at least 16 bits.
* `int`   : a width of at least 16 bits.
* `long`  : a width of at least 32 bits.
* `sizeof(short)` <= `sizeof(int)` <= `sizeof(long)`

The rules on floating point types are not strict either:
* `sizeof(float)` <= `sizeof(double)` <= `sizeof(long double)`
* `float`  : typically 32 bits *(IEEE 754, 6-9 significant digits, typically 7)*.
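* `double` : typically 64 bits *(IEEE 754, 15-17 significant digits, typically 16)*.
* `long double` : at least as precise as `double`; typically 80-bit x87 extended precision *(18-21 significant digits)* or 128-bit quadruple precision *(33-36 significant digits)*, but simply the same as `double` with some compilers, such as MSVC.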

For a given platform, the standard class template `std::numeric_limits` (header `<limits>`) helps check the sizes and ranges:

In [1]:
#include <iostream>

In [16]:
#include <limits>

std::cout
 << "type\tbits\tmin\t\tmax\n"
 << "int\t" << sizeof(int)*8 << "\t"
 << std::numeric_limits<int>::min() << '\t'
 << std::numeric_limits<int>::max() << '\n'
 << "float\t" << sizeof(float)*8 << "\t"
 << std::numeric_limits<float>::min() << '\t'
 << std::numeric_limits<float>::max() << '\n' 
 << "double\t" << sizeof(double)*8 << "\t"
 << std::numeric_limits<double>::min() << '\t' 
 << std::numeric_limits<double>::max() << '\n' ;

type	bits	min		max
int	32	-2147483648	2147483647
float	32	1.17549e-38	3.40282e+38
double	64	2.22507e-308	1.79769e+308


## Unpredictable precision

When an operation mixes variables of different types, the compiler converts all the operands to the type with the *best* precision (the so-called *usual arithmetic conversions*).

In [4]:
float f = std::numeric_limits<float>::max() ;
std::cout<<( f * 10.f )<<std::endl ;
std::cout<<( f * 10.  )<<std::endl ;

inf
3.40282e+39


Floating point types are considered better than integer types.

In [5]:
float f = std::numeric_limits<float>::max() ;
std::cout<<( f * 10 )<<std::endl ;

inf


`unsigned` types are considered better than their `signed` flavor...

In [6]:
unsigned int i = 1 ;
int j = -1 ;
std::cout<<( i * j )<<std::endl ;

4294967295


Sometimes, computation is performed with a precision higher than that of the original variables. For example, a `short`-based operation is always evaluated as an `int` (*integral promotion*), because `int` is meant to have the natural size for the target processor.

In [8]:
short s1 = std::numeric_limits<short>::max() ;
short s2 = 1 ;
short s3 = (s1+s2) ;
std::cout << (s1+s2) << std::endl ;
std::cout << s3 << std::endl ;

32768
-32768


**BEWARE**: the legacy x87 floating-point unit of Intel processors computes `double` operations with extra digits, in 80-bit registers (modern x86-64 compilers generally use SSE2 instructions instead, which stick to 64 bits). One may think **the higher the precision, the better**. But it implies a **portability issue**, because the results will differ when running your code on a different processor, when vectorizing, when porting the code to GPU...

## Unnoticed conversions

For simplicity of coding, C compilers, and therefore C++ compilers as well, are allowed to perform many conversions between predefined numeric types. These so-called **implicit conversions** are automatic and often unnoticed by the developer.

Some of these allowed implicit conversions can introduce a loss of precision, for example a conversion from a floating-point number to an integer. The compiler assumes that such ***narrowing*** is done on purpose. Is it a reasonable assumption for a codebase of thousands of lines?

In [8]:
double pi = 3.1416 ; 
int i = pi ;
std::cout<<"double "<<pi<<" => int "<<i<<std::endl ;

double 3.1416 => int 3


In [9]:
long lmax = std::numeric_limits<long>::max() ;
short s = lmax ;
std::cout<<"long "<<lmax<<" => short "<<s<<std::endl ;

long 9223372036854775807 => short -1


In [10]:
double dmax = std::numeric_limits<double>::max() ;
float f = dmax ;
std::cout<<"double "<<dmax<<" => float "<<f<<std::endl ;

double 1.79769e+308 => float inf


Even worse, the compiler silently converts signed integers into unsigned ones, and vice versa!

In [11]:
void display_signed( short v )
 { std::cout<<v<<std::endl ; }

unsigned short us = 42000 ;
display_signed(us) ;

-23536


In [12]:
void display_unsigned( unsigned short v )
 { std::cout<<v<<std::endl ; }

short s = -42 ;
display_unsigned(s) ;

65494


**BEWARE**: when mixing signed and unsigned integers in an expression, the compiler will consider the unsigned flavor as the better one, and convert all the integers accordingly. Once again, **paying attention to compiler warnings will help you out**.

## Disputed practice: never use unsigned numbers

...if you can!
- There are some contexts (embedded computing) where every bit is worth saving.
- The standard library designers chose unsigned integers for the sizes and indexes of all the containers :(

## Good old-fashioned practice: make all conversions explicit

In a large program, implicit conversions are more of a hindrance than a help. It is advised to set the warning level to the maximum, carefully scrutinize all compiler warnings, and make explicit any conversion you identify in the code.

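The C way to make a conversion explicit is to write the type name in parentheses before the value, as in `(short)i`. C++ also accepts the functional notation, which behaves the same: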

In [17]:
unsigned short i = 42000 ;
short j = short(i) ; 
std::cout<<j<<std::endl ;

-23536


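Better, C++ comes with a set of explicit type casting operators. They are easier to spot and to search for, and they refuse some dangerous conversions that a C-style cast silently accepts (for instance, `static_cast` cannot remove constness). The one to be used by default is `static_cast`: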

In [10]:
unsigned short i = 42000 ;
short j = static_cast<short>(i) ;
std::cout<<j<<std::endl ;

-23536


Three other type casting operators are available, for rare and specific use-cases:
* `const_cast` : in rare cases, when one wants to get rid of the constness of a variable;
* `dynamic_cast` : to go down (or across) a hierarchy of polymorphic classes, with a runtime check;
* `reinterpret_cast` : in very rare cases, when one wants to change the way a memory chunk is interpreted.

## User-defined numerical types

If you are gritty enough to develop your own numerical type, you may want to provide a constructor and/or a conversion operator, so as to ease interaction with functions which generate and/or require doubles.

In [18]:
#include <iostream>

In [19]:
class Half
 {
  public :
    Half( double f ) { std::cout<<"Half::Half(double)"<<std::endl ; }
    operator double() { std::cout<<"Half::operator double())"<<std::endl ; return 0 ; }
 } ;  

In [20]:
void display_double( double ) { std::cout<<"display_double()"<<std::endl ; }

In [21]:
void display_half( Half ) { std::cout<<"display_half()"<<std::endl ; }

In [22]:
Half value = 3.14 ;
display_double(value) ;
display_half(3.14) ;

Half::Half(double)
Half::operator double())
display_double()
Half::Half(double)
display_half()


**BEWARE**: the unary constructor and the conversion operator open the door to **implicit conversions**. The compiler can even combine them with standard conversions, such as `short` => `double` => `Half` below, or the reverse (it will however apply at most one user-defined conversion in a row).

In [23]:
short value = 42 ;
display_half(value) ;

Half::Half(double)
display_half()


In [25]:
void display_short( short ) { std::cout<<"display_short()"<<std::endl ; }

In [26]:
Half value = 42 ;
display_short(value) ;

Half::Half(double)
Half::operator double())
display_short()


## Good old-fashioned practice: make unary constructors explicit

The problem of implicit conversions does not apply only to numerical classes. It concerns any class which has a unary constructor (a constructor callable with a single argument).

The keyword `explicit` forbids the use of those unary constructors for implicit conversions. It should be used almost always.

In [None]:
#include <iostream>

In [8]:
class Half
 { 
  public :
    explicit Half( double f ) { std::cout<<"construct"<<std::endl ; }
    operator double() { std::cout<<"convert"<<std::endl ; return 0 ; }
 } ;

In [9]:
Half value = 3.14 ;
display_double(value) ;

input_line_15:2:7: error: no viable conversion from 'double' to '__cling_N58::Half'
 Half value = 3.14 ;
      ^       ~~~~
input_line_14:1:7: note: candidate constructor (the implicit copy constructor) not viable: no known conversion from 'double' to 'const __cling_N58::Half &' for 1st argument
class Half
      ^
input_line_14:1:7: note: candidate constructor (the implicit move constructor) not viable: no known conversion from 'double' to '__cling_N58::Half &&' for 1st argument


Interpreter Error:

Since C++11, the keyword `explicit` can also be applied to conversion operators.

# Take Away

Implicit conversions are a major source of bugs in legacy C++, and modern C++ tries hard to make them visible, forbid them, or control them, thanks to various mechanisms such as **universal initialization**.

# Questions?


© *CNRS 2020*  
*This document was created by David Chamont and translated by Olga Abramkina. It is available under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/)*