<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="media/ensmp-25-alpha.png" /></span>
</div>

# **numpy**

#### references
   - https://docs.scipy.org/doc/numpy/reference/
   - http://scipy-lectures.org/intro/index.html
   - https://www.labri.fr/perso/nrougier/from-python-to-numpy/

#### this course is inspired by:
   - Arnaud Legout, Inria
   - Thierry Parmentelat, Inria
   - the numpy documentation
   - stackoverflow
   - ...

#### a brief history of **numpy**
   - Numpy 1.0 was released in 2006
   - it reused two older projects, Numeric and Numarray
   - there was no single coordinator and no single aim
   - Guido refused to add Numeric to the Python
      standard library
      +  Guido deemed the code not maintainable
   - Travis Oliphant merged Numeric and Numarray to create numpy

Many $\texttt{numpy}$ operations are implemented in C, which makes them fast.

# importing the $\texttt{numpy}$ **library** 

   - $\texttt{numpy}$ is **by convention** and **for short** named **np** 

In [62]:
import numpy as np

# Creating arrays in **numpy**: $\texttt{numpy.ndarray}$ 

## internal memory layout: arrays are homogeneous

$\texttt{numpy.ndarray}$
   - it is the **core datatype** of $\texttt{numpy}$

a $\texttt{numpy.ndarray}$:
   - is a **contiguous one-dimensional segment of computer memory**
   - is **combined** with **indexing schemes** to handle **multiple dimensions**
   - each **indexing scheme** is a **view** on the **underlying segment**
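
The idea of one memory segment seen through several indexing schemes can be sketched as follows. This is a minimal illustration, and the names `segment` and `grid` are only chosen for the example. Reshaping does not copy the data: it attaches another indexing scheme to the same segment.

```python
import numpy as np

segment = np.arange(12)        # one contiguous block of 12 integers
grid = segment.reshape(3, 4)   # a 2-D indexing scheme on the same block

# both objects look at the same memory
same_memory = np.shares_memory(segment, grid)

# the strides give the number of bytes to jump along each dimension
row_step, col_step = grid.strides

# writing through the view changes the underlying segment
grid[0, 0] = 100
first = segment[0]
```

Moving one row down in `grid` skips four elements of the segment, while moving one column right skips a single element.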

#### elements
  
   - **all** the elements have the **same data type** (**unlike** python's containers)   
   - elements are **directly** accessed

#### why ? 
   - for the sake of **speed**
   - it is **much faster** to access memory along a **single segment** (each **cell** has the same size)
   - than to access **small portions** of memory (the cells) scattered **everywhere** in the memory of the computer (cf. NTFS vs ext4 fragmentation)
   - (as is the case with python lists!)
   - but python lists have other **advantages** 

#### in consequence
   - a $\texttt{numpy.ndarray}$ is very **compact**
   - the methods we can apply are **optimized**
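
To get a rough feeling for the compactness, the sketch below compares the data buffer of an array with the memory taken by an equivalent Python list. The list figure is an estimate that assumes CPython, where each integer is a separate object; the exact numbers depend on the interpreter version.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# data buffer of the array: 1000 elements of 8 bytes each
array_bytes = arr.nbytes

# the list stores pointers, and each integer is a separate Python object
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
```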

#### you can create $\texttt{numpy.ndarray}$
   - from **existing** sequences (e.g. $\texttt{python}$ lists or tuples)
   - by **constructing** directly $\texttt{numpy.ndarray}$ **objects**
   - by **reading files** with data such as **csv** files

## 2) creating $\texttt{numpy.ndarray}$ from existing $\texttt{python}$ lists and tuples

In [None]:
tab = [1, 2, 3, 4, 5, 6, 7]
np.array(tab)        # from a list

# or

np.array([1, 2, 3, 4, 5, 6, 7])  # from a list
np.array((1, 2, 3, 4, 5, 6, 7))  # from a tuple

In [None]:
type(np.array ([1, 2, 3, 4, 5, 6, 7])) # we create a numpy.ndarray

   - you can **create** a $\texttt{numpy.ndarray}$ from a **python** generator

In [None]:
p = (i for i in range(20) if i%2 == 0) # p is a generator of the even numbers from 0 to 18

In [None]:
p

In [None]:
a = np.fromiter(p, dtype = np.int8)
a
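
One point worth keeping in mind is that a generator can be consumed only once. The sketch below (with illustrative names) uses the optional `count` argument of `np.fromiter` to read a fixed number of items, then shows that nothing is left once the generator is exhausted.

```python
import numpy as np

evens = (i for i in range(20) if i % 2 == 0)

first_three = np.fromiter(evens, dtype=np.int8, count=3)  # reads only 3 items
rest = np.fromiter(evens, dtype=np.int8)                  # reads what is left
empty = np.fromiter(evens, dtype=np.int8)                 # the generator is exhausted
```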

## 3) deducing the type of the elements

   - **without** any indication $\texttt{numpy}$ **decides** on **its own**

### a) initialising with a homogeneous array

#### an **integer** array

In [None]:
np.array([0, 1, 2, 3, 4, 5, 6, 7]).dtype
# elements are all integers
# the array will be an integer-typed array

   - here we **get** integers on $64$ bits (the default integer type on most platforms)

#### a **float** array

In [None]:
np.array([0.72, -1.45, 2.29]).dtype
# elements are all floats
# the array will be a floating-point typed array

#### a **Boolean** array

In [None]:
np.array([True, False]).dtype
# elements are all booleans (scalar type)
# the array will be a boolean typed array

#### a **character string** array

In [None]:
np.array(['héllo', 'world!']).dtype

   - the elements must have the same **type**
   - the **longest** string has $6$ characters
   - **all** the elements will be **strings with $6$ characters**

#### the type is '<U6'
   - **U** for Unicode
   - $6$ is the number of characters needed to hold the longest string (here 'world!'); each character is stored on $4$ bytes, so one element takes $24$ bytes
   - (< is for little endian, i.e. the order in which the bytes are stored in memory)
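
A consequence of this fixed width is that a longer string assigned later is silently cut. The following sketch, with an illustrative array named `words`, checks the element size and shows the truncation.

```python
import numpy as np

words = np.array(['héllo', 'world!'])

kind = words.dtype.kind          # 'U' for Unicode
bytes_per_item = words.itemsize  # 6 characters of 4 bytes each

# the array keeps its 6-character width: the new value is cut
words[0] = 'a much longer string'
stored = words[0]
```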

## exercise$^{(*)}$


(\*) *the exercises must be completed in the students' own time*

### b) initialisation with a heterogeneous array

   - for example, we mix **integers** and **floats**

In [None]:
[0, 1., 2, 3, 4, 5, 6, 7.]

   - containers in python are **heterogeneous**
   - **no** conversion is done

   - but in $\texttt{numpy}$ the **array type** is homogeneous
   - **conversions** must be **performed**

In [None]:
np.array([0, 1., 2, 3, 4, 5, 6, 7.]).dtype

   - $0$, $1$, ..., $6$ are **converted** to **floats**

#### the data type
   - is **deduced** as the ***smallest*** **type** that can **hold** all the elements
   - $\texttt{numpy}$ tries **not** to **lose** information
   - the elements are converted **automatically** and **silently**
   - it is **very different** from the $\texttt{Python}$ **philosophy**

   - **False** is converted to the **integer** $0$

In [None]:
np.array([False, 1, 2, 3, 4, 5, 6, 7]).dtype

   - **False** is converted to the **float** $0.$

In [None]:
np.array([False, 1., 2, 3, 4, 5, 6, 7]).dtype
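
The type that $\texttt{numpy}$ picks when types are mixed can also be queried without building an array. The sketch below uses `np.result_type` and `np.promote_types`; the variable names are only illustrative.

```python
import numpy as np

int_and_float = np.result_type(np.int64, np.float64)   # integers fit in floats
bool_and_int = np.result_type(np.bool_, np.int64)      # booleans fit in integers
bool_and_float = np.result_type(np.bool_, np.float64)  # booleans fit in floats

# int8 and uint8 cannot hold each other's values: a wider type is needed
signed_and_unsigned = np.promote_types(np.int8, np.uint8)
```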

   - mixing **numbers** and **strings**

In [None]:
np.array([0, 1, 2, 3, 4, 5, 6, '7'])

   - mixing **strings** and **numbers**

In [None]:
np.array(['0', 1, 2, 3, 4, 5, 6, 7])

in such cases $\texttt{numpy}$ converts **every** element to a **string**; for the details ask **stackoverflow** or do not try to understand ...
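
A quick way to see what happened in such a mix is to inspect the resulting type and, if numbers are really wanted, to convert explicitly. This is a minimal sketch with an illustrative array named `mixed`.

```python
import numpy as np

mixed = np.array([0, 1, 2, '7'])

kind = mixed.dtype.kind            # 'U': the array holds Unicode strings
as_list = mixed.tolist()           # the numbers have become strings

numbers = mixed.astype(np.int64)   # explicit conversion back to integers
```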

### c) modifying an element in a $\texttt{numpy.ndarray}$ can result in a loss of precision

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
a.dtype

In [None]:
a[0] = 3.14159
a[0]

   - you will **not** get a **float**
   - it will be **converted** to **integer**
   - you have lost **precision**
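
One possible way to avoid this loss, sketched below with illustrative names, is to convert the array to a floating-point type before assigning the value.

```python
import numpy as np

ints = np.array([0, 1, 2, 3])
ints[0] = 3.14159            # the array stays an integer array
truncated = ints[0]          # the fractional part is gone

floats = ints.astype(np.float64)   # a new array that can hold floats
floats[0] = 3.14159
kept = floats[0]
```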

   - you can **forbid** modifying the **elements** of an array by setting $\texttt{numpy.ndarray.flags.writeable}$ to $\texttt{False}$
   - i.e. the array becomes **read-only** (an array with constant elements)

In [None]:
a = np.random.randint(1, 20, 10)

In [None]:
a[0] = 99

In [None]:
a.flags.writeable = False

In [None]:
try:
    a[0] = 100
except ValueError as e:
    print(e)

## 4) giving the type of the elements using $\texttt{dtype}$

$\texttt{numpy}$ types are listed here: https://docs.scipy.org/doc/numpy/user/basics.types.html

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
a.dtype

###### computer memory
   - a **chunk** of memory of $8$ **bits** forms a **byte** or an **octet**
   - sometimes, we speak of **types** as multiples of one **byte**
   - $4$ bytes is $32$ bits, $8$ bytes is $64$ bits ($64 = 8 \times 8$)

   - you can **access** the number of **bytes** an **element** is **stored on**

In [None]:
a.itemsize # 8 bytes (of 8 bits)

#### how many distinct integers can be stored on a chunk of memory of $8$ bits ?
   - $2^{8} = 256$
   - from $0$ to $255$ if **unsigned** ($0$ to $2^n-1$, with $n = 8$)
   - from $-128$ to $127$ if **signed** ($-2^{n-1}$ to $2^{n-1} -1$)
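
These bounds can be checked against what $\texttt{numpy}$ reports, and going past them wraps around silently in array arithmetic. A minimal sketch, with illustrative variable names:

```python
import numpy as np

n = 8
signed = np.iinfo(np.int8)      # limits of signed 8-bit integers
unsigned = np.iinfo(np.uint8)   # limits of unsigned 8-bit integers

# the formulas from the list above
signed_range = (-2**(n - 1), 2**(n - 1) - 1)
unsigned_range = (0, 2**n - 1)

# adding 1 to the greatest int8 value wraps around to the smallest one
wrapped = np.array([127], dtype=np.int8) + np.array([1], dtype=np.int8)
```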

In [None]:
a = np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.int8)
a

In [None]:
a.itemsize # one byte

   - the size in **bytes** of the **array**

In [None]:
a.nbytes # the number of bytes

   - here we **only** calculated the size of the data **buffer underlying** the $\texttt{ndarray}$
   - the $\texttt{ndarray}$ has also some **overhead** of memory to store other attributes (meta data) 
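
The overhead can be observed with `sys.getsizeof`, which, for an array that owns its data, reports the object header plus the data buffer. The exact overhead is an implementation detail and varies between versions, so the sketch only compares the two numbers.

```python
import sys
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6], dtype=np.int8)

data_bytes = a.nbytes            # size * itemsize: the data buffer only
total_bytes = sys.getsizeof(a)   # the buffer plus the array object itself
overhead = total_bytes - data_bytes
```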

#### conversion from 64-bit integers to 32-bit or 16-bit integers
   - is not **safe**
   - i.e. values can be **truncated**

#### you must master what you do
   - because $\texttt{numpy}$ will obey your wishes

#### integers can be truncated

In [None]:
nmax = 2**64 - 1  # greatest unsigned 64-bit integer

In [None]:
np.array([nmax])

In [None]:
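# passing dtype=np.uint32 directly to np.array raises an OverflowError in recent numpy versions
np.array([nmax]).astype(np.uint32)  # only the low 32 bits are kept: we get the greatest 32-bit unsigned integer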

In [None]:
2**32-1

#### floats will be truncated too

In [None]:
np.array([1.22, 2.34, 3.57, 4.99], dtype=np.int32)  # you will not obtain floats !

your floats have been **truncated** without hesitation !
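
Truncation drops the fractional part, which is not the same as rounding. If rounding is what is wanted, one option is to round first and convert afterwards, as in this sketch (the names are illustrative):

```python
import numpy as np

values = np.array([1.22, 2.34, 3.57, 4.99, -1.7])

truncated = values.astype(np.int32)         # drops the fractional part (towards zero)
rounded = np.rint(values).astype(np.int32)  # rounds to the nearest integer first
```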

#### character strings

In [None]:
np.array(['0', '1', '2', '3', '4', '5', '6', '7'], dtype='<U4') # each string can hold up to 4 characters rather than 1

In [None]:
np.array(['0', '1', '2', '3', '4', '5', '6', '7'], dtype='int16') # you will obtain integers on 16 bits

In [None]:
np.array(['0', '1', True], dtype=np.int16) # you will obtain integers on 16 bits

#### if you ask for nonsense, errors will be raised

In [None]:
np.array(['0', '1', 'True'], dtype=np.int16) # now a ValueError is raised: 'True' is not an integer

In [None]:
# in order to avoid interrupting the execution: we can catch the error 
try:
    np.array(['0', '1', 'True'], dtype=np.int16)
except ValueError as e:
    print("we are here")
    print(e)    

### conversion from one type to another with the method $\texttt{numpy.ndarray.astype}$

In [None]:
d = np.array([1, 2, 3, 4, 5])
d.dtype

In [None]:
d.astype(np.int16)  # cast to another type

In [None]:
d.dtype # the original array has not changed

#### the method returns a new array
   - it does **not modify** the **original** one
   - you have to **assign** the new array to some variable

In [None]:
d = np.array([1, 2, 3, 4, 5])
d = d.astype(np.int32)
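
That the converted array is independent from the original can be checked directly. The sketch below also uses the `copy=False` option of `astype`, which returns the input array itself when no conversion is needed.

```python
import numpy as np

d = np.array([1, 2, 3, 4, 5], dtype=np.int64)

converted = d.astype(np.int32)
independent = not np.shares_memory(d, converted)

converted[0] = 99        # does not touch the original array
original_first = d[0]

# same type requested and copy=False: no new array is created
same_object = d.astype(np.int64, copy=False) is d
```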

#### the conversion is not **safe**: values can be truncated

In [None]:
np.can_cast(np.int32, np.int64)  # ok to convert integers from 32-bits to 64-bits 

In [None]:
np.can_cast(np.int64, np.int32)  # it is not safe to convert from 64-bits to 32-bits

#### you can ask the method to **refuse** an **unsafe** conversion
   - regardless of whether the values would actually be modified

#### it is **not safe** to cast from an integer 64-bits to an integer 32-bits

In [None]:
try:
    d = np.array([1, 2, 3, 4, 5], dtype='int64')
    d.astype(np.int32, casting='safe')
except TypeError as e:
    print(e)

#### it is **safe** to cast from an unsigned 16-bit integer to a 32-bit integer

In [None]:
d = np.array([1, 2, 3, 4, 5], dtype='uint16')
d.astype(np.int32, casting='safe')
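
The same casting rules can be tested beforehand with `np.can_cast`, which accepts a `casting` argument as well. A minimal sketch comparing the `'safe'` rule (the default) with the more permissive `'same_kind'` rule:

```python
import numpy as np

# 'safe': every value of the source type must fit in the target type
widen = np.can_cast(np.uint16, np.int32)     # all uint16 values fit in int32
narrow = np.can_cast(np.int64, np.int32)     # large values would not fit
sign = np.can_cast(np.uint64, np.int64)      # large unsigned values would not fit

# 'same_kind': narrowing is accepted as long as the kind (integer, float) is kept
narrow_same_kind = np.can_cast(np.int64, np.int32, casting='same_kind')
float_to_int = np.can_cast(np.float64, np.int32, casting='same_kind')
```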

### predefined types for $\texttt{numpy.ndarray}$

https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html

In [None]:
np.sctypes # u for unsigned (np.sctypes was removed in numpy 2.0; np.sctypeDict is still available)

### **min** and **max** values of **numpy** **types**

   - integer info

In [None]:
np.iinfo(np.int8).min

In [None]:
np.iinfo(np.int64).max

   - float info

In [None]:
np.finfo(np.float32).min

In [None]:
np.finfo(np.float64).max
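
These limits make it possible to check a value before storing it. The helper `fits_in` below is hypothetical (it is not part of $\texttt{numpy}$) and only handles integer types.

```python
import numpy as np

def fits_in(value, dtype):
    """Return True if the integer `value` can be stored in the integer type `dtype`."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max

ok_uint8 = fits_in(255, np.uint8)
too_big_uint8 = fits_in(256, np.uint8)
too_small_int8 = fits_in(-129, np.int8)
```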

### size of arrays

| attribute                         | what it gives                               |
|-----------------------------------|---------------------------------------------|
| $\texttt{numpy.ndarray.size}$     | total number of elements in the array       |
| $\texttt{numpy.ndarray.itemsize}$ | size in bytes of a single item              |
| $\texttt{numpy.ndarray.nbytes}$   | total size in bytes of the underlying array |
| $\texttt{numpy.ndarray.shape}$    | **dimensional shape** of the array          |
| $\texttt{numpy.ndarray.ndim}$     | **number of dimensions** of the array       |

In [None]:
d = np.array([1, 2, 3, 10, 20, 30])

In [None]:
print(f'array\n {d}\n') # formatting a string using f and {}

In [None]:
print(f'd.size is {d.size} (number of elements)\n')

In [None]:
print(f'd.itemsize is {d.itemsize} (number of bytes of a single element)\n')

In [None]:
print(f'd.nbytes is {d.nbytes} (number of bytes of the elements in the array)\n')

In [None]:
print(f'shape {d.shape} (dimensional structure of the array)\n')

In [None]:
print(f'ndim {d.ndim} (number of dimensions of the array)\n')
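
The attributes are easier to tell apart on an array with more than one dimension. A minimal sketch with an illustrative $3 \times 4$ array of 32-bit floats:

```python
import numpy as np

m = np.zeros((3, 4), dtype=np.float32)

shape = m.shape        # (3, 4): 3 rows and 4 columns
ndim = m.ndim          # 2 dimensions
size = m.size          # 3 * 4 elements
itemsize = m.itemsize  # 4 bytes for a 32-bit float
nbytes = m.nbytes      # size * itemsize
```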

## 5) $\texttt{na}$, $\texttt{NaN}$, ...
   - Not Available
   - Not a Number

   - $\texttt{numpy.nan}$ is a **float**   
   - there is **no** equivalent for **integers**

In [None]:
type(np.nan)

  - **NaN** can be tested:

In [None]:
np.isnan([np.log(-1.), 1., np.log(0)])    # np.log(-1) is an invalid value i.e. it will be a NaN

In [None]:
np.log(0) == -np.inf

   - $\texttt{numpy.log(0)}$ is $-\infty$ and not **NaN**
   - see $\texttt{numpy}$ **constants** https://www.numpy.org/devdocs/reference/constants.html
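
A few practical consequences, sketched with illustrative names: **NaN** is not equal to itself, so it must be detected with `np.isnan`; it propagates through computations unless a NaN-aware function such as `np.nanmean` is used; and an integer array has to be converted to floats before it can hold a **NaN**.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

nan_equals_itself = (np.nan == np.nan)  # comparison with NaN is always False
mask = np.isnan(x)                      # the reliable way to find NaN

plain_mean = np.mean(x)                 # NaN propagates
safe_mean = np.nanmean(x)               # NaN values are ignored

ints = np.array([1, 2, 3])
floats = ints.astype(np.float64)        # integers have no NaN: convert first
floats[1] = np.nan
```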