# Numeric Python

## introduction

Python was initially designed in the late 80s to be easy and intuitive, suitable for everyday tasks (quoting former BDFL Guido van Rossum).
It then became very popular for text processing, web development, and as a glue scripting language.

However, due to its interpreted nature, it was poorly suited to computationally intensive tasks.

From the mid-1990s, a library called *Numeric* (merged with *numarray* into *numpy* in 2006) implemented fast array operations in C and wrappers for the low-level linear algebra libraries BLAS (since 1979!) and LAPACK (since the early 1990s). It uses Python's object system (operator overloading) to make this transparent to the user.
This made vectorized operations efficient and provided the foundation of a matlab-like environment, which was later complemented by many scientific routines, packaged as the scipy library.

More recently, Python 3.5 (PEP 465) introduced a dedicated matrix multiplication operator, `@`, to facilitate the development of scientific code.
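For illustration, a minimal sketch of what `@` means for numpy arrays, as opposed to `*`; the two matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

C = A @ B   # matrix product, dispatched to A.__matmul__(B)
D = A * B   # for numpy arrays, * stays element-wise

print(C)    # [[2 1]
            #  [4 3]]
print(D)    # [[0 2]
            #  [3 0]]
```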




## numpy

Numpy implements an array object, called `ndarray`. It is a container for homogeneous data: all elements share the same type.

(remark: numpy also has a `matrix` class, restricted to 2d, for which `*` means matrix multiplication instead of element-wise multiplication; it is now discouraged in favor of regular arrays and the `@` operator)


### array definition

In [8]:
import numpy as np

In [None]:
# create an empty array

In [None]:
# create an array full of zeros

In [9]:
# modify values 
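One possible way to fill the three cells above (the shapes and values are arbitrary choices for illustration):

```python
import numpy as np

# np.empty allocates memory without initializing it: the values are arbitrary
e = np.empty((2, 3))

# np.zeros allocates and fills with 0.0 (float64 by default)
z = np.zeros((2, 3))

# values are modified in place by indexing and assignment
z[0, 1] = 5.0     # a single element
z[1, :] = 1.0     # a whole row
print(z)
# [[0. 5. 0.]
#  [1. 1. 1.]]
```

`np.empty` is only useful when every value will be overwritten anyway; otherwise `np.zeros` is the safer default.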

In [None]:
# create an array from a python list

In [None]:
# from iterators

In [None]:
# from a list of lists
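A sketch of the three ways of building an array from existing python data listed above (the data is made up):

```python
import numpy as np

# from a python list: the dtype is inferred (a platform-dependent integer type here)
a = np.array([1, 2, 3])

# from an iterator: np.fromiter consumes it, and needs the dtype up front.
# (np.array(generator) would NOT consume it: it gives a 0-d object array)
squares = np.fromiter((i ** 2 for i in range(5)), dtype=float)

# from a list of lists: each inner list becomes a row
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)   # (2, 3)
```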

In [None]:
# create a range vector

In [40]:
# create a linearly spaced vector

In [49]:
# create diagonal/identity matrices

In [50]:
# create random matrices
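A possible solution for the range, diagonal and random cells. It uses the generator returned by `np.random.default_rng` (numpy ≥ 1.17); the older `np.random.random`, used later in this notebook, works as well.

```python
import numpy as np

r = np.arange(0, 10, 2)     # like range: start, stop (excluded), step -> [0 2 4 6 8]
x = np.linspace(0, 1, 5)    # 5 points, both ends included -> [0.   0.25 0.5  0.75 1.  ]
I = np.eye(3)               # 3x3 identity matrix
D = np.diag([1, 2, 3])      # diagonal matrix built from a vector

rng = np.random.default_rng(seed=0)   # seeded generator, for reproducibility
U = rng.random((2, 2))                # uniform samples in [0, 1)
N = rng.normal(size=(2, 2))           # standard normal samples
```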

### basic operations

In [47]:
# transpose

In [44]:
# addition, subtraction

In [45]:
# element-wise multiplication (default)

In [None]:
# matrix multiplication

In [46]:
# tensor reduction
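A sketch covering the basic operations above. "Tensor reduction" is interpreted here as sums along axes and as a general contraction with `np.einsum`; other readings (e.g. `np.tensordot`) are possible.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
b = np.ones((2, 3))

t = a.T          # transpose, shape (3, 2); a view, no copy
s = a + b        # element-wise addition
d = a - b        # element-wise subtraction
p = a * a        # element-wise product (not a matrix product!)
mm = a @ a.T     # matrix product, shape (2, 2): [[14 32] [32 77]]

# reductions collapse one or more axes
total = a.sum()            # 21
col_sums = a.sum(axis=0)   # [5 7 9]
row_sums = a.sum(axis=1)   # [ 6 15]

# general contraction: sum over the shared index j, same as a @ a.T
mm2 = np.einsum("ij,kj->ik", a, a)
```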

### indexing

In [None]:
# extract one element

In [None]:
# extract a slice

In [52]:
# multidimensional slicing

In [85]:
# boolean indexing

In [101]:
# ellipsis (inplace operations)
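One way to illustrate the indexing cells above on a small, arbitrary $3 \times 4$ array:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

x = a[1, 2]        # one element: 6
row = a[1]         # second row: [4 5 6 7]
sl = a[0, 1:3]     # slice, stop excluded: [1 2]
sub = a[::2, 1:]   # every other row, columns 1 to end: [[ 1  2  3] [ 9 10 11]]

mask = a % 2 == 0  # boolean array with the same shape as a
evens = a[mask]    # 1d array of the selected values: [ 0  2  4  6  8 10]

b = a.copy()
b[b > 5] = 0       # boolean indexing also works for assignment

c = np.zeros((2, 3))
alias = c          # second name for the same array
c[...] = 7.0       # ellipsis: write into the existing buffer, in place
# (c = 7.0 would only rebind the name c to a float; alias would be unchanged)

first_col = a[..., 0]   # ellipsis also means "all remaining axes": same as a[:, 0]
```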

### casting and broadcasting

Element-wise operations are defined for arrays that have exactly the same shape.
However, when two arrays have the same number of dimensions, a dimension of length $1$ is implicitly stretched (broadcast) to match the corresponding dimension of length $N$ of the other array. When an array has fewer dimensions than the other, dimensions of length $1$ are added at the beginning of its shape.

In [97]:
# examples

a = np.array([[1,2,3],[4,5,6]]) # 2x3 matrix
v = np.array([0.1,0.2,0.3]).reshape((1,3))
a + v

array([[1.1, 2.2, 3.3],
       [4.1, 5.2, 6.3]])

In [None]:
# this works too
a = np.array([[1,2,3],[4,5,6]]) # 2x3 matrix
v = np.array([0.1,0.2,0.3])
a + v

In [98]:
# this doesn't
a = np.array([[1,2,3],[4,5,6]]) # 2x3 matrix
v = np.array([0.1,0.2])
a + v

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

New dimensions of length $1$ can be added by indexing with `None` (an alias for `np.newaxis`). I recommend always adding them explicitly rather than relying on the implicit rules.

In [100]:
a = np.array([[1,2,3],[4,5,6]]) # 2x3 matrix
v = np.array([0.1,0.2])
a+v[:,None]

array([[1.1, 2.1, 3.1],
       [4.2, 5.2, 6.2]])
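Explicit `None` indexing also makes "outer" operations easy to write. A sketch computing all pairwise differences between two arbitrary vectors without a loop:

```python
import numpy as np

x = np.array([0.0, 1.0, 3.0])
y = np.array([1.0, 2.0])

# x[:, None] has shape (3, 1) and y[None, :] has shape (1, 2):
# broadcasting stretches both to (3, 2), so diff[i, j] == x[i] - y[j]
diff = x[:, None] - y[None, :]
print(diff.shape)   # (3, 2)
```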

### data types

Like matlab, which is *double*-centric, numpy uses *float64* (i.e. double precision) as its default floating point type, but it can handle other precisions and types just as well.

In [24]:
# float values (16, 32, 64 bits, ...)

In [25]:
# int values

In [26]:
# str values (fixed maximum string length)

In [27]:
# object (don't do that)
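A sketch of the data types listed above; the values are chosen only to show sizes, integer overflow, fixed-width strings and object arrays.

```python
import numpy as np

f32 = np.zeros(3, dtype=np.float32)           # single precision, 4 bytes per value
f16 = np.array([1.0, 2.0], dtype=np.float16)  # half precision, 2 bytes per value
i8 = np.array([100, 27], dtype=np.int8)       # 8-bit integers: range -128..127
s = np.array(["a", "bcd"])                    # fixed-width unicode strings, dtype <U3
o = np.array([1, "a", None], dtype=object)    # python objects: no vectorized speedup

print(f32.dtype, f32.itemsize)   # float32 4
print(i8 + i8)                   # integer overflow wraps around: [-56  54]
s[0] = "wxyz"                    # silently truncated to the 3-character width
print(s[0])                      # wxy
print(f32.astype(np.float64).dtype)   # explicit conversion: float64
```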

### (advanced) data ordering

There are two main ways to store multidimensional data:

![ordering](column_order.svg)

Default ordering of data with numpy is row-major, also known as C order (the last index varies fastest).


In [78]:
a = np.arange(6).reshape((2,3))
print(a)
a.flags

[[0 1 2]
 [3 4 5]]


  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

Numpy can also deal with arrays in Fortran order.

In [80]:
a = np.arange(6).reshape((2,3), order='F')
print(a)
print()
print(a.flags)

[[0 2 4]
 [1 3 5]]

  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


Arrays in non-`C` order are usually obtained as the result of an operation, like transposition, axis swapping, or slicing.

In [64]:
a = np.array([
        [1,2,3],
        [4,5,6]
    ])
print(a.T)
print(a.T.flags)


[[1 4]
 [2 5]
 [3 6]]
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False


In [65]:
b = np.random.random((3,3,3))
b.swapaxes(2,1).flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [70]:
c = np.random.random((5,5))
c[2:4,2:4]
c[2:4,2:4].flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

The result of most operations doesn't depend on underlying ordering, but it is an important factor performance-wise: it is faster to access contiguous data.

In all cases, it is possible to get a new contiguous array with the `copy()` method.
Note that in this case, the data is no longer shared with the initial array.

In [71]:
c[2:4,2:4].copy().flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
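To check the view/copy distinction directly, a sketch using `np.shares_memory` (the random array is arbitrary). It also shows `np.ascontiguousarray`, which copies only when the input is not already C-contiguous.

```python
import numpy as np

c = np.random.random((5, 5))
view = c[2:4, 2:4]           # slicing returns a view on c's buffer
copy = c[2:4, 2:4].copy()    # copy() allocates a new, C-contiguous buffer

print(np.shares_memory(c, view))   # True
print(np.shares_memory(c, copy))   # False

view[0, 0] = -1.0   # writes through to c
print(c[2, 2])      # -1.0
copy[0, 0] = -2.0   # does not affect c

d = np.ascontiguousarray(view)     # copies here, since view is not contiguous
print(d.flags["C_CONTIGUOUS"])     # True
```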

### Differences between numpy's arrays and matlab:

- numpy has 0, 1, 2, 3, ... dimensional arrays while matlab treats scalars and vectors as degenerate 2d matrices
- default data order is row-major (like C), while matlab is column-major (like Fortran)
- indexing is zero-based, consistent with python's conventions
- all dimensions are treated symmetrically: for instance, for a 3d array `a`, `a[:,:,0]` and `a[:,0,:]` are both 2d arrays (in matlab, only trailing singleton dimensions are dropped, so `a(:,1,:)` stays 3d until `squeeze` is called)
- many operations in numpy return views instead of copies
- broadcasting is very powerful and consistent
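A short sketch of some of these differences (dimension handling, 0d and 1d arrays, views); the shapes are arbitrary.

```python
import numpy as np

a = np.zeros((2, 3, 4))
print(a[:, :, 0].shape)    # (2, 3)
print(a[:, 0, :].shape)    # (2, 4): an integer index drops the axis, whatever its position
print(a[:, 0:1, :].shape)  # (2, 1, 4): a slice keeps the axis

s = np.float64(3.0)
v = np.arange(3)
print(np.ndim(s), v.ndim)  # 0 1: scalars and vectors are not degenerate 2d matrices

w = v[1:]    # basic slicing returns a view, not a copy
w[0] = 10
print(v)     # [ 0 10  2]: the original array was modified
```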


### Other subpackages

The basic object of numpy is the `ndarray`, but numpy also contains many other functions and submodules.

In [104]:
# math functions are optimized for vectorized evaluation
# sin, cos, log, abs, ...
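A minimal example of these element-wise math functions (so-called ufuncs) applied to whole arrays, with no python loop:

```python
import numpy as np

x = np.linspace(0, np.pi, 5)
y = np.sin(x) ** 2 + np.cos(x) ** 2   # evaluated element-wise on the whole array
print(np.allclose(y, 1.0))            # True
print(np.abs([-1.5, 2.0]))            # [1.5 2. ]
z = np.log(np.exp(x))                 # composition, still element-wise
```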



In [105]:
# linalg: linear algebra
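A sketch of a few `np.linalg` routines on a small symmetric system, chosen so that the solution is $(2, 3)$:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)   # solves A x = b; preferable to np.linalg.inv(A) @ b
print(x)                    # approximately [2. 3.]

eigvals, eigvecs = np.linalg.eigh(A)   # eigendecomposition of a symmetric matrix
det = np.linalg.det(A)                 # approximately 5.0
nrm = np.linalg.norm(b)                # Euclidean norm of b
```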

In [106]:
# polynomials
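A sketch using the `numpy.polynomial` package, where coefficients are given in increasing degree order (the older `np.polyfit` / `np.poly1d` use decreasing order instead). The polynomial is an arbitrary example.

```python
import numpy as np
from numpy.polynomial import Polynomial

# p(x) = 2 - 3x + x^2 = (x - 1)(x - 2)
p = Polynomial([2, -3, 1])
print(p(3.0))       # 2.0
print(p.roots())    # approximately [1. 2.]
dp = p.deriv()      # -3 + 2x

# least-squares fit of noise-free samples with a degree-2 polynomial;
# .convert() maps the result back from the scaled fitting domain
x = np.linspace(-1, 1, 20)
fit = Polynomial.fit(x, 2 - 3 * x + x ** 2, deg=2)
print(fit.convert().coef)   # approximately [ 2. -3.  1.]
```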


In [107]:
# ...

## scipy

The scipy library builds on numpy to provide many other functions:
- optimization
- integration
- interpolation
- sparse matrices
- more linear algebra
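A sketch touching four of the areas listed above, assuming scipy is installed; the functions and data are toy examples with known answers.

```python
import numpy as np
from scipy import integrate, interpolate, optimize, sparse

# optimization: minimum of (x - 2)^2 + 1
res = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print(res.x)          # approximately 2.0

# integration: the integral of sin over [0, pi] is 2
val, err = integrate.quad(np.sin, 0.0, np.pi)
print(val)            # approximately 2.0

# interpolation: a cubic spline through samples of x^3
xs = np.linspace(0, 2, 5)
spline = interpolate.CubicSpline(xs, xs ** 3)
print(spline(1.5))    # approximately 3.375

# sparse matrices: only the nonzero entries are stored
S = sparse.identity(1000, format="csr")
print(S.nnz)          # 1000
```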

## matplotlib

### matlab-like interface
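A sketch of the matlab-like `pyplot` interface. It selects the non-interactive `Agg` backend and saves to an in-memory PNG so that it also runs outside a notebook; in a notebook, one would typically just display the figure.

```python
import io

import numpy as np
import matplotlib
matplotlib.use("Agg")             # non-interactive backend, e.g. outside a notebook
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
plt.figure()
plt.plot(x, np.sin(x), label="sin")
plt.plot(x, np.cos(x), "--", label="cos")   # matlab-style format string
plt.xlabel("x")
plt.legend()

buf = io.BytesIO()
plt.savefig(buf, format="png")    # write the figure as PNG bytes
plt.close()
```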

### other plotting options

- seaborn (statistical plots, such as distributions, built on matplotlib)
- ipympl (interactive matplotlib)
- bqplot (interactive plots built on Jupyter widgets, with a grammar of graphics interface)
- plotly (interactive, share plots online)

### grammar of graphics approach

- ggplot
- altair (based on vega-lite)


## performance optimization (if time)

Numpy encourages you to write vectorized computations, because interpreted code is slow: vectorization avoids explicit Python loops. It is also possible to compile Python code just-in-time with numba, which removes the interpretation overhead and can limit the memory footprint (no temporary arrays).
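A sketch comparing an explicit loop compiled with numba's `njit` to the vectorized numpy version. It assumes numba is installed; if not, a hypothetical fallback decorator leaves the function as plain (slow) Python so the code still runs.

```python
import numpy as np

try:
    from numba import njit      # assumption: numba is installed
except ImportError:
    def njit(func):             # hypothetical fallback: no compilation, plain Python
        return func

@njit
def sum_of_squares_loop(a):
    # explicit loop: slow when interpreted, fast once compiled by numba
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * a[i]
    return total

def sum_of_squares_vectorized(a):
    # vectorized: fast, but allocates a temporary array for a * a
    return np.sum(a * a)

a = np.random.random(100_000)
print(np.isclose(sum_of_squares_loop(a), sum_of_squares_vectorized(a)))   # True
```

The first call to a numba function includes compilation time, so benchmarks should time subsequent calls.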