# Basics I - Data Types

# Table of contents

1 [Executive Summary](#summary)\
2 [Integers](#int)\
3 [Floats](#float)\
4 [Booleans](#bool)\
5 [Strings](#str)

# 1. Executive Summary <a name="summary"></a>

Informally, the type of a variable determines how its value is stored in memory: how many bits are reserved for it and how those bits are interpreted.

The Python interpreter determines the type of a variable at run time: Python is a _dynamically typed_ language. This contrasts with compiled languages like C or C++, where the type of a variable has to be declared when the variable identifier (its name) is introduced in the code; such languages are _statically typed_.

The function `type()` can be called on any defined variable to retrieve its type.
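A minimal illustration of dynamic typing, using a hypothetical variable `x`: the same name can be bound to objects of different types while the program runs, and `type()` always reports the type of the object the name currently refers to. The built-in `isinstance()` is shown as a way to test a type instead of printing it.

```python
# The same name can refer to objects of different types at run time
x = 10
print(type(x))  # <class 'int'>

x = 2.5
print(type(x))  # <class 'float'>

x = "ten"
print(type(x))  # <class 'str'>

# isinstance() checks whether an object has a given type
print(isinstance(x, str))  # True
```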

The rest of this notebook is organized as follows:
- In Sec. [2](#int) we introduce integer numbers (or `int`), which are the Python data type to represent integers like 1, 2, 3,...
- In Sec. [3](#float) we introduce floating-point numbers (or `float`), which are the Python data type to (approximately) represent fractions like 1/2, 0.25 or real numbers like $e$, $\pi$,...
- In Sec. [4](#bool) we introduce booleans (or `bool`), which are the Python data type to represent the logical values `True` or `False`.
- In Sec. [5](#str) we introduce strings (or `str`), which are the Python data type to represent text like "this one".

# 2. Integers <a name="int"></a>

Integers like 1, 2, 3, ... are represented in Python as `int` data.

In [None]:
n = 10
type(n)

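The number of bits needed to represent an `int` depends on its value. For `n = 10` it is 4 bits, as returned by the `.bit_length()` method of `int` objects (the memory that CPython actually allocates for an `int` object is larger, because it includes some bookkeeping overhead, but it too grows with the size of the number):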

In [None]:
n.bit_length()

Indeed, it's simple to see that (check this [decimal-to-binary converter](https://www.rapidtables.com/convert/number/decimal-to-binary.html))

$$
10 = (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (0 \times 2^0) = 8 + 0 + 2 + 0
$$

therefore the 4 binary digits (bits) `1010` are sufficient to represent the integer number 10.
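The conversion above can be reproduced by repeated division by 2: each remainder is the next bit, starting from the least significant one. The helper `to_binary` below is a hypothetical name used only for illustration; the built-ins `bin()` and `int(..., 2)` are used to cross-check it.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer n, by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # remainder = next least-significant bit
        n //= 2                  # integer division drops that bit
    return "".join(reversed(bits))  # most significant bit first


print(to_binary(10))       # 1010
print(bin(10))             # 0b1010 (built-in, with the '0b' prefix)
print(int("1010", 2))      # 10 (parse a base-2 string back to an int)
print(len(to_binary(10)))  # 4, the same as (10).bit_length()
```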

Python can represent integers of arbitrary size, limited only by the available memory, like $10^{100}$ (known as a [Googol](https://en.wikipedia.org/wiki/Googol)):

In [7]:
m = 10**100
print(m)

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000


In [8]:
m.bit_length()

333
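As a sanity check on the 333 above: for a positive integer $m$, `m.bit_length()` is the number $k$ of bits such that $2^{k-1} \le m < 2^{k}$, which is also $\lfloor \log_2 m \rfloor + 1$ (the logarithm form goes through floating point, so the exact integer comparison is the safer check). The last line compares memory footprints with `sys.getsizeof()`; the exact byte counts are a CPython implementation detail, so only the comparison is shown.

```python
import math
import sys

m = 10**100
k = m.bit_length()
print(k)  # 333

# k is the number of bits needed: 2**(k-1) <= m < 2**k
print(2**(k - 1) <= m < 2**k)  # True

# Same value via logarithms (relies on floating point, fine at this size)
print(math.floor(math.log2(m)) + 1)  # 333

# Python ints are not limited to 64 bits: bigger values take more memory
print(sys.getsizeof(m) > sys.getsizeof(10))  # True
```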

# 3. Floats <a name="float"></a>

Non-integer numbers are represented in Python as `float` data.

In [9]:
q = 1/4
print(q)

0.25


In [10]:
type(q)

float

As we see, the fraction $\frac{1}{4}$ is represented _exactly_ by the `float` 0.25. This is because 0.25 has an exact (and short) binary representation in terms of negative powers of the base 2:

$$
\frac{1}{4} = (0 \times 2^{0}) + (0 \times 2^{-1}) + (1 \times 2^{-2}) = \left(0 \times 1 \right) + \left(0 \times \frac{1}{2} \right) + \left(1 \times \frac{1}{4} \right) = 0.25
$$

where the bits associated with the smaller powers of 2, $2^{-3}, 2^{-4}, \dots$, are all zero.

Therefore, in a [_fixed-point_](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) binary representation (that is, a binary representation using a fixed number of bits after the binary point '.', like the one above), the decimal number 0.25 can be represented as the binary number `0.01` (check this [decimal-to-binary converter](https://www.rapidtables.com/convert/number/decimal-to-binary.html)), that is, using only the first two leftmost bits after the '.' (which are the most significant).
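The bits after the binary point can be computed by repeated doubling: multiplying $q$ by 2 shifts the point one place right, so the integer part of $2q$ is the next bit. The sketch below uses `fractions.Fraction` so that the arithmetic itself is exact; the helper `frac_to_binary` and its `max_bits` cap are hypothetical names chosen for illustration. It recovers `0.01` for $\frac{1}{4}$, while the expansion of $\frac{1}{10}$ keeps going until the cap is hit.

```python
from fractions import Fraction


def frac_to_binary(q, max_bits):
    """Binary digits of 0 <= q < 1 after the point, by repeated doubling.

    Stops as soon as the expansion terminates; otherwise returns max_bits digits.
    """
    bits = []
    for _ in range(max_bits):
        if q == 0:
            break              # expansion terminated: remaining bits are all zero
        q *= 2
        bit = int(q >= 1)      # integer part of 2q is the next bit
        bits.append(str(bit))
        q -= bit               # keep only the fractional part
    return "0." + "".join(bits)


print(frac_to_binary(Fraction(1, 4), 10))   # 0.01
print(frac_to_binary(Fraction(1, 10), 12))  # 0.000110011001 (cut off at 12 bits)
```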

The binary representation of `float` numbers is not always _exact_. That is, it is not always true that a decimal number $0 < q < 1$ can be represented exactly as the series

$$
q = \sum_{i=1}^{k} b_i \times 2^{-i}
$$

where $b_i \in \{0, 1\}$ is the $i$-th bit. In particular, it may happen that:

- the series is infinite ($k = \infty$), as for $q = 0.1$, whose binary expansion $0.0\overline{0011}$ never terminates;

- the series is finite but requires more bits than are available. That is, given a finite number of available bits, say $k_{MAX}$, it may be that $k > k_{MAX}$.

In both cases, the best we can do is to _truncate_ the series. That is, $q$ will be approximately represented as

$$
q \approx \sum_{i=1}^{k_{MAX}} b_i \times 2^{-i}
$$
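A small numerical check of the truncation formula, under the simplified fixed-point model of this section (not the IEEE 754 format discussed next): keeping only the first $k_{MAX}$ bits of $q = 0.1$ leaves an error $q - \sum_{i=1}^{k_{MAX}} b_i 2^{-i}$ that is never negative and always smaller than $2^{-k_{MAX}}$. The helper `truncate_binary` is a hypothetical name, and `Fraction` keeps the sums exact so that only the truncation error is measured.

```python
from fractions import Fraction


def truncate_binary(q, k_max):
    """Sum of the first k_max terms b_i * 2**-i of the binary expansion of 0 <= q < 1."""
    approx = Fraction(0)
    for i in range(1, k_max + 1):
        q *= 2
        b_i = int(q >= 1)               # i-th bit
        q -= b_i
        approx += b_i * Fraction(1, 2**i)
    return approx


q = Fraction(1, 10)
for k_max in (4, 8, 16, 32):
    error = q - truncate_binary(q, k_max)
    # the truncation error is non-negative and below 2**-k_max
    print(k_max, float(error), 0 <= error < Fraction(1, 2**k_max))
```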


In real life things are more complicated. In particular, the IEEE 754 [double-precision](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) standard, which Python's `float` follows on virtually all modern platforms, reserves 64 bits to represent a real number, but the bits are not simply associated with negative and decreasing powers of the base 2, $2^{-1}, 2^{-2}, \dots$, as in the [fixed-point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) binary representation considered before. The IEEE 754 standard prescribes a [_floating-point_](https://en.wikipedia.org/wiki/Double-precision_floating-point_format) format, where the meaning and role of each bit depend on its position. In particular, for your knowledge (more information on [Wikipedia](https://en.wikipedia.org/wiki/Double-precision_floating-point_format)):
- 1 bit (the $1$st one) represents the sign;
- 11 bits (from the $2$nd to the $12$th) represent an exponent;
- 52 bits (from the $13$th to the $64$th) represent the fractional part (significand).

For the same number of bits (64), this representation covers a much wider range of magnitudes than a fixed-point one (the largest finite double is about $1.8 \times 10^{308}$). This increase in range comes at the cost of precision, which is _relative_ rather than absolute: in the IEEE 754 double-precision format, numbers carry about 15 to 16 significant decimal digits.
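The three bit fields listed above can be inspected directly with the standard `struct` module: pack the `float` as an 8-byte big-endian double, reread the same bytes as a 64-bit unsigned integer, and slice its 64-character bit string. The helper `float_bits` is a hypothetical name. The exponent bias of 1023 used below is part of the IEEE 754 double-precision format, but is not described in the text above; likewise `sys.float_info` reports the platform's double parameters (53 significant bits counts the 52 stored ones plus an implicit leading 1).

```python
import struct
import sys


def float_bits(x):
    """Split the IEEE 754 double encoding of x into (sign, exponent, fraction) bit strings."""
    # '>d' packs x as a big-endian 64-bit double; '>Q' rereads those bytes as an unsigned int
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = f"{as_int:064b}"                  # 64 characters, most significant bit first
    return bits[0], bits[1:12], bits[12:]    # 1 sign bit, 11 exponent bits, 52 fraction bits


sign, exponent, fraction = float_bits(0.25)
print(sign, exponent, fraction)
# 0.25 = 1.0 * 2**-2, so the stored exponent is -2 + 1023 (the bias)
print(int(exponent, 2) - 1023)  # -2

print(sys.float_info.mant_dig)  # 53 significant bits
print(sys.float_info.dig)       # 15 decimal digits always preserved
print(sys.float_info.max)       # largest finite double
```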

The _finite precision_ of the binary representation of decimal numbers leads to results that look as expected, like:

In [23]:
q = 0.25 + 0.1
q

0.35

but also to unexpected ones like:

In [25]:
q = 0.35 + 0.1  # should be 0.45
q

0.44999999999999996
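Because of results like this, a common practice (sketched here, not part of the original notebook) is to avoid testing floats with `==` and to compare them within a tolerance using `math.isclose()`, keeping `round()` for display. Printing a float with more digits than Python shows by default also reveals that a literal like `0.1` is stored only approximately.

```python
import math

q = 0.35 + 0.1
print(q == 0.45)              # False: q and the literal 0.45 are different doubles
print(math.isclose(q, 0.45))  # True: equal within a relative tolerance (default rel_tol=1e-09)
print(round(q, 2))            # 0.45: rounding is fine for display

# More digits than the default repr shows: 0.1 is not stored exactly
print(f"{0.1:.20f}")
```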

Nevertheless, the [`decimal`](https://docs.python.org/3/library/decimal.html) module lets us work with decimal numbers at an arbitrary, user-set precision (we won't use it, but it is good to know):

In [56]:
import decimal 
from decimal import Decimal

In [57]:
decimal.getcontext()

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])

By default, the precision is 28 significant digits (`prec=28`), counted from the first non-zero digit:

In [58]:
q = Decimal(1) / Decimal(17)
q

Decimal('0.05882352941176470588235294118')

The precision can be changed to any number of significant digits (again counted from the first non-zero digit, not from the '.'):

In [66]:
decimal.getcontext().prec = 3

q = Decimal(1) / Decimal(17)
q

Decimal('0.0588')
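Note that assigning `decimal.getcontext().prec` changes the precision for every later `Decimal` computation in the session. A sketch of a more contained alternative, using the standard `decimal.localcontext()`, is shown below: the precision change applies only inside the `with` block. It also shows that building a `Decimal` from a string keeps the decimal digits exactly, while building it from a `float` captures that float's binary approximation.

```python
import decimal
from decimal import Decimal

# localcontext() copies the current context; changes apply only inside the with-block
with decimal.localcontext() as ctx:
    ctx.prec = 50
    r = Decimal(1) / Decimal(17)
    print(r)  # 50 significant digits

# Strings keep the decimal digits exactly
print(Decimal("0.35") + Decimal("0.1"))  # 0.45
# Floats carry their binary rounding error into the Decimal
print(Decimal(0.35) + Decimal(0.1))
```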

# 4. Booleans <a name="bool"></a>

# 5. Strings <a name="str"></a>