# Type Annotations in Python

## Introduction

Many people find scripting languages attractive because they enable fast prototyping of new code and quick development of extensive programs. Simply calling the interpreter to execute the code in our source files is certainly much easier, and gives better feedback, than invoking makefiles and compilers and sorting through their error messages.

## Intermission: Build systems and compilers

Luckily, much progress has been made on this front. In most cases there is no need to manually write makefiles, figure out appropriate compiler flags, and list the source files of your project, since many new build systems and tools have been developed to do just that.

For "older" languages like C and C++, check out:
- [CMake](https://cmake.org/)
- [build2](https://build2.org/)
- [Conan](https://conan.io/) (this is more of a package manager and build orchestrator)
- [xmake](https://xmake.io/)
- [premake](https://premake.github.io/)

Of course, this diversity raises new problems. How do we integrate packages that use different build systems? What happens if a user cannot install the build system we use to develop our project (e.g. because it is not available on Windows)? Containers and images, like the ones provided by [Docker](https://docker.com), can help with this problem, but that is another topic entirely.

Newer compiled languages like [Go](https://golang.org/) and [Rust](https://www.rust-lang.org/) ship with their own build automation and package management solutions, so there is no need for third-party software.

## Intermission over

Quick development is also helped by the lack of static typing. Most scripting languages, including Python, are dynamically typed. This does not mean that there are absolutely no types inside the language, but that the type information of a variable is only available at runtime, not at compile time. (The Python interpreter does generate bytecode files, which live in the `__pycache__` directories; these allow it to skip parsing a source file if its contents have not changed.)
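As a minimal sketch of what "types exist, but only at runtime" means, the snippet below rebinds one name to objects of different types. The variable name `value` is purely illustrative.

```python
# The same name can be bound to objects of different types over time.
value = 42
print(type(value).__name__)  # int

value = "forty-two"
print(type(value).__name__)  # str

# The type belongs to the object, so it can be inspected while the program runs.
print(isinstance(value, str))  # True
```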

It is possible to check type information at runtime like so:

In [1]:
def append_to_string(text):
    assert(isinstance(text, str))
    return text + "appended_text"

print(append_to_string("123"))
append_to_string(1.0)


123appended_text


AssertionError: 

But this is not considered idiomatic Python and is not feasible in the long term, since the developer has to insert these type assertions everywhere in the code.

One might think that this is not a problem since the language was designed to be dynamically typed and there is no need for type checking since bugs and variables of the wrong types will be discovered while using the program. Before responding to this let me mention one more thing.

## Intermission: "Old" statically typed languages

Many people have had bad experiences with statically typed, compiled languages like C, C++, and Fortran (e.g. strange and long compiler messages, long compile times, frustrating debugging sessions). Because of these experiences they consider a type system a burden and a nuisance instead of a helping hand. I cannot really blame them. Take a look at the example below, written in C++:

```c++
vector<int> v = {1, 2, 3};
for (vector<int>::iterator i = v.begin(); i != v.end(); ++i) {
   // ... use *i ...
}
```

Here we iterate over the elements of variable `v`. Let's look at the same example in Python:

```python
v = [1, 2, 3]

for elem in v:
    print(elem)  # use elem
```

This is much clearer and more readable. Two things to mention here:

1. Compiled languages have come a long way, so the same `for` loop written in modern C++ is almost as clear as the one written in Python.
2. The type annotations I will touch upon are much lighter than the type system of "old" C++.

This was just an example of how bad experiences with older compiled languages could leave a sour taste regarding statically enforced types.

## Intermission over

Back to the issue at hand. Why would we want to use statically checked types in Python? Many libraries have been written without them and they seem to work fine. It may seem that adding some kind of type system to Python would be detrimental, until you:

- have written a Python library with more than 500-1000 SLOC (source lines of code: the lines in the source files that contain code, i.e. not comments or empty lines)
- try to refactor said library (e.g. rename a function or variable, or change the behaviour of a code block)
- try to figure out the arguments of a function, or their order, when calling it

There are quite a few cases where declaring the types of your variables and function arguments can be really useful. Indeed, in the documentation of many Python libraries the authors refer to the types of function arguments (see the example from numpy below). Before Python 3.5, docstrings like this were the usual way to communicate type hints.

In [6]:
import numpy as np
help(np.asarray)

Help on function asarray in module numpy:

asarray(a, dtype=None, order=None)
    Convert the input to an array.
    
    Parameters
    ----------
    a : array_like
        Input data, in any form that can be converted to an array.  This
        includes lists, lists of tuples, tuples, tuples of tuples, tuples
        of lists and ndarrays.
    dtype : data-type, optional
        By default, the data-type is inferred from the input data.
    order : {'C', 'F'}, optional
        Whether to use row-major (C-style) or
        column-major (Fortran-style) memory representation.
        Defaults to 'C'.
    
    Returns
    -------
    out : ndarray
        Array interpretation of `a`.  No copy is performed if the input
        is already an ndarray with matching dtype and order.  If `a` is a
        subclass of ndarray, a base class ndarray is returned.
    
    See Also
    --------
    asanyarray : Similar function which passes through subclasses.
    ascontiguousarray : Convert input 

Python 3.5 introduced so-called [type annotations](https://www.python.org/dev/peps/pep-0484/). These annotations hint to the user what kinds of arguments a function expects. I will demonstrate using the example from the start.

In [7]:
def append_to_string(text: str) -> str:
    return text + "appended_text"

The expected type of the argument `text` is given by the type name (`str` in our case) written after the colon (`:`). Since the colon also marks the start of the function body, the return type is given after an arrow (`->`), made up of the characters `-` and `>`.

Let's try it out!

In [8]:
append_to_string("abc")

'abcappended_text'

In [9]:
append_to_string(1.0)

TypeError: unsupported operand type(s) for +: 'float' and 'str'

It looks like we did not get any type checks or warnings: the `TypeError` above comes from the `+` operator inside the function, not from the annotation.

It is important to note that these annotations do not result in any runtime or compile-time type checks, and they do not decrease runtime performance either.
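To make this concrete, here is a small sketch showing that annotations are stored as plain metadata on the function object and are not consulted when the function is called. It reuses `append_to_string` from above; the `try`/`except` wrapper is only there so that the snippet runs to completion.

```python
def append_to_string(text: str) -> str:
    return text + "appended_text"


# Annotations are kept as a plain dictionary on the function object.
print(append_to_string.__annotations__)
# {'text': <class 'str'>, 'return': <class 'str'>}

# Nothing checks them at call time: the error below is raised by the `+`
# operator inside the function body, not by the annotation.
try:
    append_to_string(1.0)
except TypeError as exc:
    print("TypeError:", exc)
```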

So how can these help?

First of all, types help with documenting our code and functions as they show up in the built-in Python help:

In [10]:
help(append_to_string)

Help on function append_to_string in module __main__:

append_to_string(text: str) -> str



Second, there are a couple of external tools that can use the type information to check whether we pass values of the right types to functions:

- [mypy](http://mypy-lang.org/) is probably the oldest of the Python static type checkers. It works similarly to a compiler: you give it a filename and it checks the correctness of the types
- [pyre](https://pyre-check.org/) is a relatively new tool that is more performant than mypy and also includes linting capabilities
- [pyright](https://github.com/microsoft/pyright) is a type checker from Microsoft that supports both a command line and a [language server](https://langserver.org/) mode
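As a rough sketch of how such a checker is used, consider the file below. The file name `example.py` and the function `buggy_caller` are hypothetical and only serve as illustration. The ill-typed call is placed in a function that is never executed, so the script itself runs fine, while a static checker is expected to flag the call without running anything.

```python
# example.py (hypothetical file name)

def append_to_string(text: str) -> str:
    return text + "appended_text"


def buggy_caller() -> str:
    # A float is passed where `str` is annotated. A static type checker is
    # expected to report this line; at runtime it would raise a TypeError.
    return append_to_string(1.0)


if __name__ == "__main__":
    # Only the well-typed call is executed here.
    print(append_to_string("123"))
```

Assuming mypy is installed (for example with `pip install mypy`), running `mypy example.py` should report the incompatible argument type in `buggy_caller`, even though `python example.py` finishes without an error.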

## Intermission: Code linting

I would really recommend using some kind of linter, not just when writing Python code but when writing code in general. A linter is a tool that checks source files for syntax errors and other kinds of problems. It can detect bad coding style, missing variables, and other gotchas.

Usually they are command line programs that write their output to the terminal, but nowadays there are plenty of extensions for mainstream text editors that integrate them. This means that you do not have to jump back and forth between your text editor and command line to see the warnings given by the linter. Instead the text editor will display errors and warnings next to the relevant line.

A widely used editor is [Visual Studio Code](https://code.visualstudio.com/) (note: this is different from Visual Studio, the proprietary IDE), which supports pretty much all programming languages and many Python linters.

If you decide to use VS Code, consider checking out [VSCodium](https://vscodium.com/) as an alternative. This version does not have the built-in telemetry that sends your editing data to Microsoft.

## Intermission over

## New style classes, `dataclass` and `NamedTuple`

### Creating Python classes the hard way

In [3]:
class MinMax(object):
    def __init__(self, min, max):
        self.min, self.max = min, max

In [4]:
m = MinMax(min=1.0, max=2.0)

In [5]:
class MinMax(object):
    def __init__(self, min, max):
        self.min, self.max = min, max
    
    def shift(self, val):
        self.min += val
        self.max += val

In [6]:
m = MinMax(min=1.0, max=2.0)

In [7]:
m.shift(5.0)

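There is a problem when we print the object: the default representation only shows the class name and a memory address, not the values stored inside.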

In [8]:
m

<__main__.MinMax at 0x7f3d39471a00>

Back to the drawing board. We implement the `__str__` method, and let `__repr__` reuse it.

In [12]:
class MinMax(object):
    def __init__(self, min, max):
        self.min, self.max = min, max
    
    def __str__(self):
        return "MinMax(min=%s, max=%s)" % (self.min, self.max)
    
    def __repr__(self):
        return self.__str__()
    
    def shift(self, val):
        self.min += val
        self.max += val

In [17]:
m = MinMax(min=1.0, max=2.0)

In [18]:
m

MinMax(min=1.0, max=2.0)

In [19]:
m.shift(0.5)

In [20]:
m

MinMax(min=1.5, max=2.5)
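For comparison, here is a hedged sketch of the same class written with the standard library's `dataclass` decorator (available since Python 3.7), which generates `__init__` and `__repr__` from the annotated fields. The class name `MinMaxDC` and the choice of `float` for the field types are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class MinMaxDC:  # hypothetical name, chosen to avoid clashing with MinMax
    min: float
    max: float

    def shift(self, val: float) -> None:
        # Dataclass instances are mutable by default, so in-place updates work.
        self.min += val
        self.max += val


m = MinMaxDC(min=1.0, max=2.0)
m.shift(0.5)
print(m)  # MinMaxDC(min=1.5, max=2.5)
```

The annotations here do double duty: they document the expected types and tell `dataclass` which attributes are fields.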

In [21]:
from typing import NamedTuple

class MinMaxNM(NamedTuple):
    min: object
    max: object

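Unlike instances of the hand-written class, instances of a `NamedTuple` are immutable: their fields cannot be reassigned. To get a modified version, use `_replace`, which returns a new instance and leaves the original unchanged.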
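A short sketch of how immutability and `_replace` behave on a `NamedTuple`. The fields are annotated as `float` here instead of `object`, which is an assumption made for the illustration, and the variable name `shifted` is hypothetical.

```python
from typing import NamedTuple


class MinMaxNM(NamedTuple):
    min: float
    max: float


m = MinMaxNM(min=1.0, max=2.0)
print(m)  # MinMaxNM(min=1.0, max=2.0)

# Fields cannot be reassigned on an existing instance.
try:
    m.min = 5.0
except AttributeError as exc:
    print("AttributeError:", exc)

# _replace builds a new tuple and leaves the original untouched.
shifted = m._replace(min=m.min + 0.5, max=m.max + 0.5)
print(shifted)  # MinMaxNM(min=1.5, max=2.5)
print(m)        # MinMaxNM(min=1.0, max=2.0)
```

Because a `shift` method cannot modify the instance in place, the natural design for a `NamedTuple` is to return a new instance instead.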