In [1]:
import numpy as np

# Code Formatting, Docstrings, & Type Hints

**Content**:
+ Code Formatting
    + `Flake8`
    + `Black`
+ Docstrings
    + Function docstrings
    + Class docstrings
+ Typing
    + `mypy` 

**Sources**:

*Formatting*:
   + [list of Flake8 rules](https://www.flake8rules.com/)
   + [Flake8 documentation](https://flake8.pycqa.org/en/latest/user/invocation.html)
   + [Flake8 tutorial](https://gamedevacademy.org/flake8-tutorial-complete-guide/)
   + [Black documentation](https://black.readthedocs.io/en/stable/index.html)
    
*Documentation*:
   + [PEP 257 standards](https://peps.python.org/pep-0257/)
   + [Numpydoc standards](https://numpydoc.readthedocs.io/en/latest/format.html)
   + [Tutorial on Real Python](https://realpython.com/documenting-python-code/#docstrings-background)
   + [docstring styles in Pandas](https://pandas.pydata.org/docs/development/contributing_docstring.html)

*Type hints*:
   + [`mypy` documentation](https://mypy.readthedocs.io/en/stable/index.html)
   + [Daily Dose of Data Science](https://blog.dailydoseofds.com/p/10-ways-to-declare-type-hints-in)
   + [Exploring the Power of Python's `typing` Library](https://blog.dailydoseofds.com/p/10-ways-to-declare-type-hints-in)

## Code formatting
### Flake8
+ `flake8` checks whether your code follows the PEP 8 style guide
+ it analyses a specified Python file and reports every place where the code violates the style rules
+ each violation is reported together with a specific *error code*
+ an overview of the *error codes* can be found, among other places, here: [https://www.flake8rules.com/](https://www.flake8rules.com/)

#### Installation
+ Installation: `pip install flake8`
+ after having installed `flake8` we can check the code of a specific Python file
    + to do this, open the terminal (e.g., GitBash) and type `flake8 <path-to-file>`
    + try for example in the Spyder console: `!flake8 py_scripts\plotting.py` (using the file created during the `intro-spyder session`)
+ to check the code in a Jupyter notebook cell, we use the extension `pycodestyle_magic`
    + install the extension via `pip install pycodestyle pycodestyle_magic`
    + load the extension via writing `%load_ext pycodestyle_magic` in a cell
    + use the extension by including `%%pycodestyle` into the cell in which we want to check the code

In [2]:
# load the pycodestyle extension for notebooks
%load_ext pycodestyle_magic

#### Options with `flake8`
Consider in the following that the Python file we want to check is called `example.py`
+ ignore specific error codes: `flake8 --ignore=E303,E302 example.py`
+ change default maximum limit of characters for a line (=79): `flake8 --max-line-length=100 example.py` 
+ exclude certain files or directories from being checked: `flake8 --exclude=tests/* example.py`
+ report only specific errors: `flake8 --select=E303 example.py`

Instead of typing your custom options as part of every command, you can store all settings in a **configuration file**
+ create a file named `.flake8` in your main directory
    + GitBash console: `$ touch .flake8` (creates a new file in the working directory)
+ open `.flake8` in Spyder and copy & paste the following lines
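
A minimal sketch of such a file, assuming the INI layout that `flake8` reads from a `[flake8]` section. The option values are chosen to match the settings described in this section. Note that `ignore` replaces flake8's default ignore list, whereas `extend-ignore` would add to it.

```ini
[flake8]
max-line-length = 120
ignore = E226,E302,E41
exclude = tests,__init__.py
```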

This configuration file 
+ sets the maximum line length as 120 characters,
+ ignores errors E226, E302, and E41, and
+ excludes the `tests` directory and `__init__.py` files from the check.

#### Example

In [None]:
%%pycodestyle
def someFunction():
    """This is a docstring with a longer description of the function's behavior including $\LaTeX$"""
        I  =0
        while I<5:
            if ( x<= 10) and (x > =5):
              print("x between 5 and 10")
            else:
                
                print("x is sth. else")
    

Formatted code that adheres to PEP standards:

Note: the comment `# noqa` at the end of a code line tells flake8 to skip this line during checking.
If you want to suppress only a specific error code for this line, append the code to the comment: `# noqa: E501`. The line is then still checked for all other errors; only E501 is ignored.

In [None]:
%%pycodestyle
def someFunction():
    r"""This is a docstring with a longer description of the function's behavior including $\LaTeX$"""  # noqa: E501
    i = 0
    while i < 5:
        if (x <= 10) and (x >= 5):
            print("x between 5 and 10")
        else:
            print("x is sth. else")

#### Use flake8 outside a Jupyter notebook
+ Let's try `flake8` on a .py file
+ open a new file in Spyder and save it as `.../python-class-25/py_scripts/test_formatters.py` 
+ copy & paste the following code into your Python file and save it

In [None]:
#%% test_formatters.py

import os

# get current working directory
os.getcwd()

# change current directory
# os.chdir("path-to-folder")

x = 4

def someFunction():
    """This is a docstring with a longer description of the function's behavior including $\LaTeX$"""
    I  =0
    while I<5:
        if ( x<= 10) and (x >=5):
            print("x between 5 and 10")
        else:
            
            print("x is sth. else")

+ open GitBash Terminal pointing to the `python-class-25` directory (e.g., via GitHub Desktop > Repository > Open in GitBash)
+ activate the created conda environment
    + `$ conda activate python-course-25-env`
+ if you forgot the name of your environment, you can list all existing conda environments:
    + `$ conda env list`
+ check that `flake8` is installed in your environment (if not, install it)
    + `$ conda list flake8` (should give information about version etc.) 
+  run flake8 for the file `test_formatters.py`
    + `$ flake8 py_scripts/test_formatters.py`
+ you should see something like the following: 

+ alternatively run flake8 directly in the Spyder Console
    + `In [1]: !flake8 py_scripts/test_formatters.py`

### `Black` for automatic formatting
+ Now you can go step by step through the complaints raised by `flake8`. 
+ However, can we not automate this step, so that our code is formatted according to PEP standards for us?
+ Yes, this is possible. One tool for automatic formatting is [Black](https://black.readthedocs.io/en/stable/index.html)
+ let's install `black` via GitBash
    + `$ pip install black`
+ check installation in your environment
    + `$ conda list black`
+ apply black to `test_formatters.py`
    + `$ black py_scripts/test_formatters.py`
+ you should get in GitBash something like the following:

+ check the Python script in your editor (do you notice the changes?)
+ let's check the file again via `flake8`
    + `$ flake8 py_scripts/test_formatters.py`
    + Tip: in your terminal (e.g., GitBash), use the up and down arrow keys to scroll through your command history (this saves you from re-typing the same command over and over)
+ you should get output in your terminal similar to the following:

+ thus, `black` has indeed reformatted many aspects of our code
+ we get far fewer error messages from `flake8`
+ some errors remain that we have to fix by hand:
    + the maximum line length
    + putting an `r` in front of a docstring that contains a backslash: `r"""docstring with \"""` 
    + (ambiguous) variable names


## Docstrings
+ a docstring is a string literal that occurs as the first statement in a module, function, class, or method definition
+ a docstring becomes the `__doc__` special attribute of the corresponding object
+ use `"""triple double quotes"""` around docstrings
+ use `r"""raw triple double quotes"""` if your docstring contains any *backslashes* (i.e., put an `r` in front of the docstring)

In [34]:
def test_doc():
    """This is the form of a docstring.
    
    It can be spread over several lines.

    """

# see docstring when using help()
help(test_doc)

# see docstring via the special .__doc__ attribute
test_doc.__doc__

# example of docstring with backslashes
def test_doc2():
    r""" This docstring includes Latex: $y_i \sim Normal(\mu, \sigma)$ """

# see docstring via the special .__doc__ attribute
test_doc2.__doc__


' This docstring includes Latex: $y_i \\sim Normal(\\mu, \\sigma)$ '

### Standards for docstrings
+ some standards regarding docstrings exist, which make them easier to read
+ a very general standard is [PEP 257](https://peps.python.org/pep-0257/)
+ a more specific standard is [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) docstring
+ in the following we will see some example docstrings that follow the `numpydoc` style

### `numpydoc` style
The following information is taken from [https://numpydoc.readthedocs.io/en/latest/format.html](https://numpydoc.readthedocs.io/en/latest/format.html).
+ length of docstring lines should be kept to 75 characters to facilitate reading the docstrings in text terminals.

#### Sections in a function docstring
+  **short summary**: one-line summary
+  **deprecation warning**: warn users that the object is deprecated. Should include information about
    + when object was deprecated (in which version)
    + reasons for deprecation
    + new recommended way of obtaining the same functionality
+ **extended summary**: few sentences giving an extended description
+ **parameters**: description of the function arguments, keywords and their respective types
    +  colon must be preceded by a space, or omitted if the type is absent
    +  optional keyword parameters have default values, which are displayed
    +  parameters that only assume one of a fixed set of values: values can be listed in braces, with the default appearing first
    +  two or more input parameters with exactly the same type, shape and description can be combined
    +  when documenting variable-length positional or keyword arguments, leave the leading star(s) in front of the name and do not specify a type
+ **returns**: explanation of the returned values and their types
+ **raises**: optional section detailing which errors get raised and under what conditions
+ **notes**: optional section that provides additional information about the code, possibly including a discussion of the algorithm.
+ **references**: references cited in the Notes section may be listed here, e.g., an article cited in the text as `[1]_`
+ **examples**: optional section for examples meant to illustrate usage of the function
+ Further sections are **yields, receives, warns, warnings, see also**

Examples:
+ [numpy.outer()](https://github.com/numpy/numpy/blob/v2.2.0/numpy/_core/numeric.py#L876-L961)
+ [numpy.linalg.eigh()](https://github.com/numpy/numpy/blob/v2.2.0/numpy/linalg/_linalg.py#L1485-L1630)

In [None]:
def example_doc(y, x_1=-1, x_2=-1, order="C", *args, **kwargs):
    r"""The sum of two numbers.

    .. deprecated:: 1.6.0
          `ndobj_old` will be removed in NumPy 2.0.0, it is replaced by
          `ndobj_new` because the latter works also with array subclasses.

    Parameters
    ----------
    x_1, x_2 : type
        Description of parameters `x_1`, `x_2`. (the default is -1, which implies summation
        over all axes).
    y
        Description of parameter `y` (with type not specified).
    order : {'C', 'F', 'A'}
        Description of `order`.
    *args
        Additional arguments should be passed as keyword arguments
    **kwargs
        Extra arguments to `metric`: refer to each metric documentation for a
        list of all possible arguments.

    Returns
    -------
    int
        Description of anonymous integer return value.

    Raises
    ------
    LinAlgException
        If the matrix is not numerically invertible.

    Notes
    -----
    The FFT is a fast implementation of the discrete Fourier transform [1]_:
    
    .. math:: X(e^{j\omega } ) = x(n)e^{ - j\omega n}

    .. [1] O. McNoleg, "The integration of GIS, remote sensing,
       expert systems and adaptive co-kriging for environmental habitat
       modelling of the Highland Haggis using object-oriented, fuzzy-logic
       and neural-network techniques," Computers & Geosciences, vol. 22,
       pp. 585-588, 1996.

    Examples
    --------
    >>> np.add(1, 2)
    3
    
    Comment explaining the second example.
    
    >>> np.add([[1, 2], [3, 4]],
    ...        [[5, 6], [7, 8]])
    array([[ 6,  8],
           [10, 12]])
    """
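
The template above mixes snippets from several sources. As a complement, here is a minimal sketch of a complete, working function documented in `numpydoc` style. The function `scaled_sum` and its parameters are hypothetical and serve only as an illustration.

```python
def scaled_sum(x, y, scale=1.0):
    """Return the scaled sum of two numbers.

    Parameters
    ----------
    x, y : float
        The two numbers to add.
    scale : float, optional
        Factor applied to the sum (the default is 1.0, i.e. no scaling).

    Returns
    -------
    float
        The value ``scale * (x + y)``.

    Raises
    ------
    ValueError
        If `scale` is negative.

    Examples
    --------
    >>> scaled_sum(1, 2)
    3.0
    >>> scaled_sum(1, 2, scale=2)
    6.0
    """
    if scale < 0:
        raise ValueError("scale must be non-negative")
    return float(scale * (x + y))


# the summary line is the first line of the __doc__ attribute
print(scaled_sum.__doc__.splitlines()[0])
```

The examples in the `Examples` section use the doctest format, so they can also be checked automatically, for instance with the standard-library `doctest` module.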

#### The Class docstring
+ same sections as outlined for function docstrings (except Returns)
+ special sections for classes:
    + **attributes**: located below the Parameters section, may be used to describe non-method attributes of the class
    + **methods**: optional section that lists the class methods. It is generally not necessary, but it is useful when a class has many methods of which only a few are relevant for the user. Methods that are not part of the public API have names that start with an underscore.

Example:
+ [pandas class Grouper](https://github.com/pandas-dev/pandas/blob/main/pandas/core/groupby/grouper.py)
+ [pandas class ExtensionArray](https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/base.py#L786)
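
A minimal sketch of a class docstring with the `Attributes` and `Methods` sections. The class `StepCounter` is a hypothetical example and not taken from any library.

```python
class StepCounter:
    """Count in configurable steps.

    Parameters
    ----------
    start : int, optional
        Initial value of the counter (the default is 0).

    Attributes
    ----------
    value : int
        Current value of the counter.

    Methods
    -------
    increment(step=1)
        Increase the counter by `step`.
    """

    def __init__(self, start=0):
        self.value = start

    def increment(self, step=1):
        """Increase the counter by `step` and return the new value."""
        self.value += step
        return self.value


counter = StepCounter()
counter.increment()
counter.increment(step=2)
print(counter.value)
```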

## Commenting Code via Type Hinting
### Why should we use *typing*?

+ improved code **readability and maintainability**
    + makes code more self-documenting
    + better traceability of the input-output structure in a computational workflow
+ **reduced errors**
    + type checkers (such as `mypy`) can catch type errors early
+ **easier onboarding** of new developers
    + new developers can understand the codebase faster

### Tool for type checking: `mypy` and `nb_mypy`

+ `mypy` is an optional static type checker for Python.
+ to use `mypy` in Jupyter notebooks we need the extension `nb_mypy`, which automatically runs `mypy` on notebook cells as they are executed while retaining information about the execution history.
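
"Static" means that the hints are checked without running the code; Python itself does not enforce them at runtime. A small sketch of this behavior (the function `double` is a hypothetical example):

```python
def double(x: int) -> int:
    """Return twice the input."""
    return 2 * x


# the annotations are stored on the function object ...
print(double.__annotations__)

# ... but they are not enforced when the function runs:
# a static checker such as mypy would flag this call (str passed for int),
# yet Python executes it without complaint
result = double("ab")
print(result)
```

This is why a separate tool such as `mypy` is needed to benefit from type hints.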

#### Install `mypy` and `nb_mypy`

+ open GitBash Terminal and activate conda environment
+ install `mypy` and `nb_mypy`
    + `$ pip install mypy`
    + `$ pip install nb_mypy`
+ check installation in your environment
    + `$ conda list mypy` (both `mypy` and the extension `nb-mypy` should be listed)
+ back in the Jupyter notebook, restart the kernel (either by pressing `0` twice on your keyboard or by using the button with the circular arrow in the upper toolbar)
+ load extension in Jupyter notebook:

In [None]:
# load mypy extension in jupyter notebook
%load_ext nb_mypy

### Basic data types
It is possible to provide type hints for basic data types.

In [37]:
# basic type hints
name: str = "Anna"
year: int = 1990
countries: list = ["Germany", "Peru", "Italy"]
password: dict = {"user": 123}
shape: tuple = (2, 3)


Using the `typing` module for annotations gives us more flexibility.

In [None]:
from typing import List, Tuple, Dict

countries: List[str] = ["Germany", "Peru", "Italy"]
password: Dict[str, int] = {"user": 123}
double_password: Dict[str, List[int]] = {"user": [123, 234], "user2": [123, 234], "user3": [123, 234]}
shape: Tuple[int, int] = (2, 3)

# when you want to declare tuples of varying length
shape2: Tuple[int, int] = (2, 3, 4)  # mypy raises an error
shape3: Tuple[int, ...] = (2, 3, 4)  # works
# let us redefine shape3 as a singleton
shape3 = (2,)
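
Since Python 3.9 the built-in container types can be subscripted directly, so the imports from `typing` are optional for these cases. A short sketch, assuming Python 3.9 or newer (the variable `shape_any` is a hypothetical name):

```python
# built-in generic types, no import from typing needed (Python 3.9+)
countries: list[str] = ["Germany", "Peru", "Italy"]
password: dict[str, int] = {"user": 123}
shape: tuple[int, int] = (2, 3)

# the ellipsis marks a tuple of arbitrary length whose items are all int
shape_any: tuple[int, ...] = (2, 3, 4)
shape_any = (2,)
```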


### Function annotations
In the function below the arguments, `x`, `y`, as well as the return type are expected to be integers.

In [None]:
def addition(x: int, y: int) -> int:
    return x + y

# working example
addition(1, 2)


# mypy raises a conflict:
addition(1.5, 2)

### Unions (`Union[T,U]`)
The union type `Union[T,U]` means either type `T` or `U`.

In [None]:
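from typing import Union

# from Python 3.10 onwards: the | operator
# when input and output can be of different types
def multiply(x: int | float, y: int | float) -> int | float:
    return x * y

# alternative using Union
def multiply2(x: Union[int, float], y: Union[int, float]) -> Union[int, float]:
    return x * y

# integer as input
multiply(2, 3)

# mix as input
multiply(2, 3.5)

# invalid input -> mypy raises an error
multiply("2", 3)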


additional properties of `Union`
+ unions of unions are flattened
+ redundant arguments are skipped
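
Because a value annotated with a union can be of either type, code usually has to check which one it received. A minimal sketch of this pattern; the function `to_text` is a hypothetical example:

```python
from typing import Union


def to_text(value: Union[int, str]) -> str:
    """Convert an int or a str to an upper-case string."""
    if isinstance(value, int):
        # inside this branch a type checker treats `value` as int
        return str(value + 1)
    # here only str remains as a possibility
    return value.upper()


print(to_text(1))
print(to_text("a"))
```

Type checkers such as `mypy` follow the `isinstance` check and narrow the type in each branch, so neither `value + 1` nor `value.upper()` is reported as an error.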

In [42]:
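# unions of unions are flattened
print(Union[Union[int, str], float] == Union[int, str, float])

# redundant arguments are skipped
print(Union[int, str, int] == Union[int, str])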



### Type aliases

A type alias is defined by assigning the type to the alias.
In the following example, `Union[int, float]` and `IntFloat` will be treated as synonyms.

In [44]:
 

IntFloat = Union[int, float]

def multiply3(x: IntFloat, y: IntFloat) -> IntFloat:
    return x * y

### Callables (`Callable`)
We can also indicate that the type is a function or class object via the `Callable` type.
The type hint is used as follows: `Callable[[Arg1Type, Arg2Type], ReturnType]`.

In [None]:
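from typing import Callable

# write your own function
def div(x: float, y: float) -> float:
    return x / y

# let's create a type alias
DivMethod = Callable[[float, float], float]

# write a custom function that takes two values and
# returns the result when applying our custom method
def math_op(a: float, b: float, div_method: DivMethod) -> float:
    return div_method(a, b)

# works
math_op(4, 2, div)

# mypy raises an error (a str is not callable; this call also fails at runtime)
math_op(4, 2, "div")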



### Arbitrary types (`Any`)
+ [Source: Any](https://mypy.readthedocs.io/en/stable/kinds_of_types.html)
+ Any is compatible with every other type, and vice versa. 
+ You can freely assign a value of type Any to a variable with a more precise type:
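
A minimal sketch of such an assignment, following the pattern described in the mypy documentation (variable names are illustrative):

```python
from typing import Any

a: Any = 2
s: str = ""

# accepted by a type checker because `a` is declared as Any,
# although `s` now holds an int at runtime
s = a
print(type(s).__name__)
```

The example also shows the price of `Any`: the checker can no longer detect that `s` does not contain a string.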

In [54]:
from typing import Any
# when the type of the variable can be of any kind
def misc(a: Any) -> Any:
    return a
    
# string
type(misc("Hello"))
# float
type(misc(1.2))
# list
type(misc([1,"A",sum]))

list

### Typed Dictionaries (`TypedDict`)
+ [Source: TypedDict](https://mypy.readthedocs.io/en/stable/typed_dict.html)
+ `TypedDict` lets you give precise types for dictionaries that represent objects with a fixed schema
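
A minimal sketch of the class-based syntax; the type `Point` is a hypothetical example. At runtime a `TypedDict` instance is an ordinary `dict`, and the declared keys and value types are only used by the type checker.

```python
from typing import TypedDict


class Point(TypedDict):
    x: int
    y: int


p: Point = {"x": 1, "y": 2}

# at runtime the object is a plain dict
print(type(p).__name__, p["x"] + p["y"])
```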

In [56]:
from typing import Any, Dict

# use Any for indicating different possible dict values
def user_input(seed: int, weight: float, method: str) -> Dict[str, Any]:
    res_dict = dict(
        seed=float(seed),     # assume some internal process changes the type
        weight=int(weight),   # for seed and weight
        method=method,
    )
    return res_dict

# changes in type are not recognized by Any
user_input(seed=10, weight=1.0, method="parametric")
 


{'seed': 10.0, 'weight': 1, 'method': 'parametric'}

In [None]:
from typing import TypedDict

# create a TypedDict that represents the structure
class UserInputType(TypedDict):
    seed: int
    weight: float
    method: str

def user_input2(seed: int, weight: float, method: str) -> UserInputType:
    res_dict: UserInputType = dict(
        seed=float(seed),     # assume some internal process changes the type
        weight=int(weight),   # for seed and weight
        method=method,
    )
    return res_dict

# changes in type are recognized and mypy raises an error
user_input2(seed=10, weight=1.0, method="parametric")



### Generics (`TypeVar`)
Type variables (`TypeVar`) serve as parameters for generic types as well as for generic function definitions.
Let's see how generic functions work:


In [96]:
from typing import TypeVar, Sequence

T = TypeVar('T')  # Can be anything
S = TypeVar('S', bound=str)  # Can be any subtype of str
A = TypeVar('A', float, int)  # Must be exactly float or int

def repeat(x: T, n: int) -> Sequence[T]:
    """Return a list containing n references to x."""
    return [x]*n

print(repeat("Hello World!", 3))

def print_capitalized(x: S) -> S:
    """Print x capitalized, and return x."""
    print(x.capitalize())
    return x

print_capitalized("this is a capitalized sentence.")

def add_a(x: A, y: A) -> A:
    """Add two integers or floats together."""
    return x+y

print(add_a(1,2))

# mixing int and float: A has to be solved as a single type; because type
# checkers accept int where float is expected, mypy typically solves A as float
print(add_a(1,2.))

['Hello World!', 'Hello World!', 'Hello World!']
This is a capitalized sentence.
3
3.0
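
Another small sketch of a generic function: the type variable links the element type of the input to the return type. The names `ItemT` and `first` are hypothetical.

```python
from typing import Sequence, TypeVar

ItemT = TypeVar("ItemT")  # stands for the element type of the sequence


def first(items: Sequence[ItemT]) -> ItemT:
    """Return the first element of a non-empty sequence."""
    return items[0]


print(first([3, 1, 2]))  # a checker infers int as the return type
print(first("abc"))      # a checker infers str as the return type
```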


### Optional (`Optional`)
This concept is useful when an explicit value of `None` is allowed. Note that such a parameter does not have to be *optional* in the sense of having a default value.
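
A short sketch of both points: `Optional[X]` is shorthand for `Union[X, None]`, and a parameter annotated with it can still be required. The function `describe` is a hypothetical example.

```python
from typing import Optional, Union

# Optional[X] is equivalent to Union[X, None]
print(Optional[int] == Union[int, None])


def describe(value: Optional[int]) -> str:
    """`value` must be passed, but None is an accepted value."""
    if value is None:
        return "no value"
    return f"value is {value}"


print(describe(None))
print(describe(3))
```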


In [None]:
from typing import List, Optional

def greeting(names: List[str], greet: Optional[str] = None) -> None:
    # default greeting is "Hello"
    if greet is None:
        greet = "Hello"
    # print greeting
    for name in names:
        print(f"{greet} {name}")

# working example
greeting(["Anna", "Ben"])

# raises an error
greeting(["Anna", "Ben"], greet=1)



### `Final`
`Final` is a special typing construct that tells type checkers that a name cannot be re-assigned, or overridden in a subclass.
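
A minimal sketch of the subclass case; the classes `Connection` and `SlowConnection` are hypothetical. As with all type hints, the restriction is reported by the type checker only and is not enforced at runtime.

```python
from typing import Final


class Connection:
    TIMEOUT: Final[int] = 10


class SlowConnection(Connection):
    # a type checker reports this override of a Final attribute;
    # Python itself accepts it at runtime
    TIMEOUT = 60


print(Connection.TIMEOUT, SlowConnection.TIMEOUT)
```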

In [None]:
# working with constants
from typing import Final

PI: Final = 3.14159

def area_circle(radius: float) -> float:
    return PI * radius**2

area_circle(2)

# try to change PI
PI = 3  # mypy reports that a final name cannot be re-assigned
 

### `Literal`
`Literal` can be used to indicate that a variable or function parameter may only take one of the listed literal values.
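
Since `Literal` is only checked statically, a runtime check has to be written separately. A small sketch that reuses the allowed values via `typing.get_args`; the alias `Grade` and the function `is_valid_grade` are hypothetical names.

```python
from typing import Literal, get_args

# allowed values, e.g. school grades in Germany
Grade = Literal[1, 2, 3, 4, 5, 6]


def is_valid_grade(value: int) -> bool:
    """Check at runtime whether `value` is one of the allowed grades."""
    return value in get_args(Grade)


print(get_args(Grade))
print(is_valid_grade(3), is_valid_grade(7))
```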


In [None]:
# working with fixed sets
# Grades in school in Germany: 1,2,3,4,5,6

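from typing import List, Literal

def get_gpa(grade_ger: List[Literal[1, 2, 3, 4, 5, 6]]) -> float:
    return float(np.mean(grade_ger, dtype=float))

# hand over a list of numbers within the range
get_gpa([1, 2, 2, 3])

# what happens if we put a number outside the range?
get_gpa([1, 2, 7])  # runs, but mypy reports the invalid value 7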

