# Basic data types

The states we have manipulated up until now are too limited in what they can contain: only integer numbers. This limitation makes it much more difficult to encode information that is not naturally expressed as an integer, such as text (a name, a surname, a country, ...), a yes/no piece of information (whether or not a user has a driver's license), a rational number (the precise length of something in meters: 1.79), and more.

We might go to the extra length of encoding these other types of data by using only integers in a clever way, and indeed we could define:
- numbers for letters (`0` for `a`, `1` for `b`, etc.);
- `0` for _no_, `1` for _yes_;
- two numbers for fractions (`1` and `2` for `1/2 = 0.5`);
- ...

The issue with such a strategy is the confusion that comes from overloading a single construct, integers, with many unrelated meanings. Moreover, adding two rational numbers would become much more complex: instead of just being able to write `0.5 + 0.25`, we would need to add `1/2 + 1/4`, which in turn requires many steps. Programs written this way would become long, needlessly complex, and unwieldy.
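
To make the "many steps" concrete, here is a minimal sketch of adding two fractions that are encoded as pairs of integers. The function name `add_fractions` and the `(numerator, denominator)` encoding are assumptions made for this illustration, and denominators are assumed to be positive.

```python
from math import gcd

def add_fractions(a, b):
    """Add two fractions, each encoded as a (numerator, denominator) pair of integers."""
    (n1, d1), (n2, d2) = a, b
    # Bring both fractions to the common denominator d1 * d2 and add the numerators.
    numerator = n1 * d2 + n2 * d1
    denominator = d1 * d2
    # Reduce the result to lowest terms.
    divisor = gcd(numerator, denominator)
    return (numerator // divisor, denominator // divisor)

half = (1, 2)
quarter = (1, 4)
print(add_fractions(half, quarter))  # (3, 4)
print(0.5 + 0.25)                    # 0.75
```

The last line does the same work with a single operator, which is the convenience we are after.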

Moreover, since these concepts are very important when encoding the solutions to many recurring problems, we want to treat them as _primitives_, instead of derived objects.


## Data types and operators
In order to be able to tackle this issue, we will now introduce the concept of **data types**.

A data type is a set, $T$, equipped with a series of distinct operations $Ops$.

The set, $T = \{ a, b, c, d, \dots \}$, contains all the elements that make up the data type. This is the static foundation of the data type, and for integers it would contain all the whole numbers, positive and negative: $T = \mathbb{Z} = \{ 0, 1, -1, 2, -2, 3, -3, \dots \}$.

A data type is more than just an unstructured collection of values: it also has important connections between its elements. We call these connections *structure*, as they form a network of ties between the elements, and these ties are the only paths that can be followed.

These connections are represented by the available operators, $Ops = \{ op^{a_1}_1, op^{a_2}_2, \dots \}$. An operator $op_i^a$ takes as input (connects) $a$ values from $T$, and yields as output a single value in $T$. The number $a$ is called the _arity_ of the operator.

In the case of integers this set could for example be: $\{ +^2, -^2, -^1, \times^2, /^2 \}$. Notice that there are two distinct meanings for the same symbol, depending on its arity: $-^2$ and $-^1$. Indeed, subtraction and negation both use the minus symbol $-$, but with two arguments and one argument respectively. We see both in action in an expression such as $4 -^2 (-^1 3)$.

Notice that we usually do not bother with specifying the arity of very well known operators, but the arity is still quite important and will need, at some point, to be defined. When it is clear from context (almost always), we will not write it as a superscript to the operator: $4 -^2 (-^1 3)$ then becomes the well known $4 - (-3)$.
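
As a small illustration, Python's standard `operator` module exposes the two meanings of the minus symbol as two separate functions, one for each number of operands. This is only a sketch to make the distinction visible; the variable names are chosen for this example.

```python
import operator

# Binary minus (subtraction) takes two operands.
subtraction = operator.sub
# Unary minus (negation) takes one operand.
negation = operator.neg

# 4 - (-3), written so that each minus shows how many operands it takes.
result = subtraction(4, negation(3))
print(result)     # 7
print(4 - (-3))   # 7
```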

The fact that operators simply represent connections between elements of $T$ can be understood visually. Let us consider, for example, negation. Negation connects each element of $\mathbb{Z}$ to an element of $\mathbb{Z}$:

<img src="images/integer_negation_set.png" alt="Integer negation set" style="width: 400px;"/>

Negation is an operator of arity one, which makes it simple to visualise. We can take a first step in visualising operators of higher arity by showing $T$ multiple times, and drawing the operator as picking first one element from the first $T$, then another from the second $T$, etc., and finally arriving at the resulting element in the final $T$:

<img src="images/integer_sum_two_steps.png" alt="Integer sum as pairs to singles" style="width: 800px;"/>

Another notation, perhaps less visually suggestive but very often used, groups elements in all possible combinations of the desired arity. The set of all tuples of a given length is called the _Cartesian product_ of the set with itself. This leads us to define an operator with arity greater than one as a link from a _Cartesian product_ $T \times T \times \dots \times T$ into $T$ itself:

<img src="images/integer_sum_pairs.png" alt="Integer sum as pairs to singles" style="width: 400px;"/>

Following the notations above, we will define an operator $op^n$ in a data type on $T$ as 

$$op^n : (T_1, T_2, \dots, T_n) \rightarrow T$$

The colon and the commas in the definition above tell us that $op^n$ will *accept* (or *take*) a series of $n$ parameters taken from $T$. The names $T_1$, $T_2$, etc. all denote the same set $T$; the subscript only records the position of each parameter.
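
The view of an operator as a link from a Cartesian product into $T$ can be sketched in code. Since the integers are infinite, the example below assumes a tiny stand-in set `T = [0, 1, 2]` and a hypothetical operator `add_mod3` (addition that wraps around), chosen only so that every result stays inside `T`.

```python
from itertools import product

# A small finite stand-in for T (an assumption for illustration only).
T = [0, 1, 2]

def add_mod3(a, b):
    """A binary operator on T: addition that wraps around so results stay in T."""
    return (a + b) % 3

# T x T: every ordered pair of elements of T.
pairs = list(product(T, repeat=2))

# The operator seen as a set of arrows from pairs in T x T to elements of T.
arrows = {pair: add_mod3(*pair) for pair in pairs}

print(len(pairs))       # 9
print(arrows[(2, 2)])   # 1
```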

An alternative notation would emphasize the fact that the operands are given to the operator in a specific order, and as such giving an operand already corresponds to "following an arrow":

$$op^n : T_1 \rightarrow T_2 \rightarrow \dots \rightarrow T_n \rightarrow T$$


For those familiar with functional programming, the two notations above are, respectively, the uncurried and curried versions of the operator.
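
A minimal sketch of the two notations in Python, using addition as the operator. The names `add_uncurried` and `add_curried` are hypothetical and exist only for this comparison.

```python
def add_uncurried(a, b):
    """Takes both operands at once: (T, T) -> T."""
    return a + b

def add_curried(a):
    """Takes the first operand and returns a function awaiting the second: T -> T -> T."""
    def add_a(b):
        return a + b
    return add_a

print(add_uncurried(3, 2))   # 5
add_three = add_curried(3)   # one arrow followed, one operand still missing
print(add_three(2))          # 5
```

In the second form, giving a single operand already produces something usable: a new function that remembers the first operand.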


### About arrows

Notice how we are not saying $3 + 2 = 5$, but rather $3 + 2 \rightarrow 5$. This apparent difference between what is traditionally seen in basic arithmetic and what we are presenting here is not accidental. When we say $3 + 2$ in programming, we are not defining a number, but rather a specification used to determine a number via computation. We call such a specification an _expression_, and expressions and statements are combined together to form programs.

Evaluating an expression is the process of following the arrows defined, for example, in the pictures above in order to reduce the expression, one piece at a time, to a simpler and simpler form. We stop when we have reached a value of $T$, which we cannot simplify any further.

This means that $3 + 2$ _is not_ $5$, but we can **go** _from_ $3 + 2$ _to_ $5$.
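
The idea of going from an expression to a value one arrow at a time can be sketched as a tiny evaluator. The encoding below is an assumption made for this example: an expression is either an `int` (a value) or a tuple `(symbol, left, right)`, and the helper names `step` and `evaluate` are hypothetical.

```python
import operator

# Hypothetical encoding: an expression is either an int (a value) or a
# tuple (symbol, left, right) describing a binary operation still to be done.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def step(expr):
    """Follow exactly one arrow: reduce the leftmost operation that is ready."""
    symbol, left, right = expr
    if not isinstance(left, int):
        return (symbol, step(left), right)
    if not isinstance(right, int):
        return (symbol, left, step(right))
    return OPERATORS[symbol](left, right)

def evaluate(expr):
    """Keep following arrows until only a value is left; return every form visited."""
    trace = [expr]
    while not isinstance(expr, int):
        expr = step(expr)
        trace.append(expr)
    return trace

# Prints each intermediate form of (3 + 2) + (2 * 4), ending with the value 13.
for form in evaluate(("+", ("+", 3, 2), ("*", 2, 4))):
    print(form)
```

Each printed line is one stop along the path; the last line is a value of $T$ and cannot be reduced further.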

Moreover, since our goal is automation, we require that following arrows must be a clear, unambiguous process: at every step of the computation it must be that at most one arrow can be followed. Notice that the arrow $3 + 2 \rightarrow 5$ has a clear direction: left to right. Arrows (or computations) move from a complex specification to a simpler answer, but not backwards. Indeed, suppose we were trying to go backwards from a simple answer such as $5$. How did we get to $5$? There is no way to determine this uniquely, as it could have been determined by:
- $0 + 5 \rightarrow 5$
- $1 + 4 \rightarrow 5$
- $2 + 3 \rightarrow 5$
- $10 - 5 \rightarrow 5$

and infinitely many other possibilities.
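
A quick way to see this is to search for the sums and differences that reach $5$. The sketch below only looks at operands between `0` and `10`, a range picked arbitrarily for this illustration; the complete list never ends.

```python
# Search range chosen arbitrarily for illustration; the full set is infinite.
candidates = range(0, 11)

sums = [(a, b) for a in candidates for b in candidates if a + b == 5]
differences = [(a, b) for a in candidates for b in candidates if a - b == 5]

print(sums)         # [(0, 5), (1, 4), (2, 3), (3, 2), (4, 1), (5, 0)]
print(differences)  # [(5, 0), (6, 1), (7, 2), (8, 3), (9, 4), (10, 5)]
```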

This means that our notion of *computing*, which is based on following arrows, trades time and information (the previous program and state are usually lost) for a simpler (but equivalent) formulation that is, hopefully, closer to the final result we are seeking.

## Some concrete data types

- let us move back into the realm of programming constructs
- what other data types make sense?
    - let us begin with the simplest data type of all: `unit`
        - `unit` has only one value: `None` (often called `null`)
        - `unit` supports no operations; it just exists
    - the next data type has a little more structure: `bool`
        - `bool` has two values: `True` and `False`
        - draw the set of values
        - we use `bool` whenever we must model two different states/situations
            - `is_subscribed`
            - `wants_info_by_email`
            - `shields_on`
            - ...
        - `bool`, like all non-trivial data types, has operators to manipulate one or more `bool`'s:
            - the simplest operator is `not`, which flips values around
                - `eval_expr(<not True>, S)` $\rightarrow$ `False`
                - `eval_expr(<not False>, S)` $\rightarrow$ `True`
            - `not: bool` $\rightarrow$ `bool`
            - draw `not`
            - the first combination operator is `and`, which gives `False` if any of the operands are `False`
                - `eval_expr(<False and E>, S)` $\rightarrow$ `False`
                - `eval_expr(<True and E>, S)` $\rightarrow$ `<E>`
                - why does it look like this?
            - `and: bool` $\times$ `bool` $\rightarrow$ `bool`
            - draw `and`
            - the second combination operator is `or`, which gives `True` if any of the operands are `True`
                - notice the definition symmetry of `and` and `or` 
                - `eval_expr(<True or E>, S)` $\rightarrow$ `True`
                - `eval_expr(<False or E>, S)` $\rightarrow$ `<E>`
                - why does it look like this?
            - `or: bool` $\times$ `bool` $\rightarrow$ `bool`
            - draw `or`
    - we know `int` already
        - we use it whenever we need to *count* something
        - operators are the usual arithmetic operators `+`, `*`, ...
        - there are also division operators `/`, `//`, `%`, ..., but these are slightly problematic (division by zero, modulus/remainder of negative numbers)
        - remember that `5 // 2` $\rightarrow$ `2`
    - `float` is similar to `int`: it has the same operators and precedences, but it also supports fractional numbers (only those that can be written exactly in binary; all others are approximated)
        - in `float`, `5.0 / 2.0` $\rightarrow$ `2.5`
        - also, `5 / 2` $\rightarrow$ `2.5`
    - some languages use different operators for `int` and `float`
        - Python uses `//` for `int` division and `/` for `float` division (**we will use this from now on!**)
    - in many languages the conversion between `int` and `float` happens automatically as we are computing, in order not to lose information
        - show an example of automatic conversion
    - text is represented as the `string` data type:
        - the simplest string is the empty string `""`
        - we add an alphabet (Unicode characters: 'a', 'b', ..., '♥', ...): all elements of the alphabet are strings
        - given two string constants, we can concatenate them by writing them next to each other
        - show this visually as a set with levels: all strings of length 0, length 1, length 2, etc. and draw lines to connect the strings from the previous levels into the new level
            - watch out for associativity-induced equivalences: "a" + "ba" = "ab" + "a" = ...
        - the only combination operator is `+`, which concatenates strings (and string variables)
          - `+: string` $\times$ `string` $\rightarrow$ `string`
          - draw `+` (implicitly giving many examples)
- it is possible to mix values from different data types, that is to define operators that take as input values from one type and return a value from another type
    - draw examples of each
    - `< : int` $\times$ `int` $\rightarrow$ `bool`
    - `> : int` $\times$ `int` $\rightarrow$ `bool`
    - `<= : int` $\times$ `int` $\rightarrow$ `bool`
    - `!= : int` $\times$ `int` $\rightarrow$ `bool`
    - ... (GIYF)
    - `str` converts a value of another primitive data type into a `string`
    - `int : string` $\rightarrow$ `int`
    - `float : string` $\rightarrow$ `float`
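
Several of the points in the outline above can be tried directly in Python. The snippet below is a minimal sketch, and the variable names are chosen only for this example. The right operand `1 // 0 == 0` is deliberately one that would fail if it were evaluated, which shows that `and` and `or` skip it when the left operand already decides the result.

```python
# bool: `not` flips a value; `and` / `or` only evaluate the right operand when needed.
flipped = not True
# The right operand would divide by zero, but it is never evaluated.
short_and = False and (1 // 0 == 0)
short_or = True or (1 // 0 == 0)
print(flipped, short_and, short_or)   # False False True

# int division versus float division.
print(5 // 2)       # 2
print(5 / 2)        # 2.5
print(5.0 / 2.0)    # 2.5

# Automatic conversion: mixing an int with a float yields a float.
mixed = 1 + 0.5
print(mixed)        # 1.5

# string concatenation, including an associativity-induced equivalence.
print("a" + "ba" == "ab" + "a")   # True
print("ab" "cd")                  # abcd (adjacent string constants)

# Operators that cross data types.
print(3 < 5)                # True
print(str(42) + "!")        # 42!
print(int("42") + 1)        # 43
print(float("2.5") * 2)     # 5.0
```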