# Introduction to `numpy`

This Notebook provides an overview of the capabilities of the `numpy` module. It covers Sect. II of [Modules_in__python.ipynb](Modules_in__python.ipynb). 

## Table of Contents

- [II. Numpy](#II)
    * [II.1 Array Definition and construction](#II.1)
    * [II.2 Array copies and views](#II.2)
    * [II.3 Shape manipulation](#II.3)
    * [II.4 What makes numpy Arrays useful structures ?](#II.4)
        - [II.4.1 ufunc](#II.4.1)
        - [II.4.2 Aggregation](#II.4.2)
        - [II.4.3 Broadcasting](#II.4.3)
        - [II.4.4 Slicing, masking, fancy indexing](#II.4.4)
    * [II.5 Reading arrays from a file and string formatting](#II.5)
    * [II.6 Summary](#II.6)
    * [II.7 References](#VI)

## II. `numpy`:  <a class="anchor" id="II"></a>

`numpy` provides fast implementations of mathematical functions and operations for the Python language. It also introduces one key object: the array (`np.ndarray`). 

### II.1 `array` definition and construction:  <a class="anchor" id="II.1"></a>

- A `numpy` array is an object of the type `np.ndarray` (although this type specifier is rarely used directly). Instead, arrays are usually created in one of the following ways: 

``` python
import numpy as np
np.array([1,2,3,4])   # creates an array from a python list
np.array([[0, 1, 2], [3, 4, 5]])   # creates a 2D array from a nested python list
np.empty(shape=(2,3)) # creates an "empty" (entries not initialised) array with 2 rows and 3 columns
np.arange(5) # similar to the built-in range() function: array([0, 1, 2, 3, 4])
np.linspace(1, 10, 10) # creates an array of 10 evenly spaced elements from 1 to 10 (both included)
np.zeros(10)  # creates an array of 10 elements filled with 0
np.ones(10)   # creates an array of 10 elements filled with 1
np.zeros((2, 5))  # multidimensional array of 2 rows and 5 columns

```
- 2-D arrays of `shape=(r, c)` are arrays with `r` *rows* and `c` *columns*. 
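A small sketch (not part of the original notebook) contrasting the end-point conventions of `np.arange()` and `np.linspace()`, a common source of off-by-one mistakes. The `endpoint` keyword of `np.linspace()` is not used elsewhere in this notebook:

```python
import numpy as np

# np.arange(start, stop, step): like range(), the stop value is EXCLUDED
r = np.arange(0, 10, 2)
print(r)          # [0 2 4 6 8]

# np.linspace(start, stop, num): num evenly spaced points, stop is INCLUDED by default
l = np.linspace(0, 10, 5)
print(l)          # 0, 2.5, 5, 7.5 and 10 (as floats)

# endpoint=False excludes the stop value, as arange does
l_open = np.linspace(0, 10, 5, endpoint=False)
print(l_open)     # 0, 2, 4, 6 and 8 (as floats)
```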

In [1]:
# Let's try the above commands and visualise the output. 
import numpy as np
a = np.array([[1,2,3], [3,5,5]])
a

array([[1, 2, 3],
       [3, 5, 5]])

In [2]:
np.shape(a)

(2, 3)

In [3]:
empty_array = np.empty(shape=(2,3))
empty_array

array([[1.13000982e-42, 7.87665127e-71, 5.39936977e-62],
       [5.05254058e-38, 1.15389091e-71, 7.40696866e-38]])

In [4]:
zero_array = np.zeros(shape=(2,3))
zero_array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [5]:
ones_array = np.ones(shape=(2,2,3))
ones_array  # shape (2, 2, 3): 2 blocks of 2 rows and 3 columns

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [6]:
type(zero_array)

numpy.ndarray

In [7]:
zero_array.dtype

dtype('float64')

In [8]:
array_of_string = np.array(['qqqq', 'a', 'f'], dtype=str)
array_of_string

array(['qqqq', 'a', 'f'], dtype='<U4')

In [9]:
for i in range(5):
    print(i)

0
1
2
3
4


In [10]:
np.arange(0., 5., 0.5)

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [11]:
np.linspace(0, 5, 9)

array([0.   , 0.625, 1.25 , 1.875, 2.5  , 3.125, 3.75 , 4.375, 5.   ])

- `numpy` also has tools to create arrays filled with random elements:

``` python
np.random.random(size=4)  # uniformly distributed in [0, 1)
np.random.normal(size=4)  # drawn from a standard normal distribution (mean 0, std 1)

```
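The values above change at every call. A minimal sketch, assuming NumPy ≥ 1.17 (which introduced `np.random.default_rng()`), of how to make random draws reproducible by seeding a generator; the seed value 42 is arbitrary:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence of numbers
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

u1 = rng1.random(4)          # uniform in [0, 1)
u2 = rng2.random(4)          # same seed -> identical values
g = rng1.normal(size=4)      # standard normal (mean 0, std 1)
print(u1, u2, g, sep="\n")
```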

In [12]:
# Try out the above commands
np.random.random(size=4)

array([0.56038104, 0.52118484, 0.66049889, 0.89974892])

In [13]:
np.random.normal(size=4)

array([-1.76978657,  1.50850182,  0.0306952 , -0.92057114])

- You can explicitly specify which **data-type** you want:

``` python 
c = np.array([1, 2, 3], dtype=float)
c.dtype
    Out: dtype('float64')
```

In [14]:
# Try out the above commands 
c = np.array([1, 2, 3], dtype=float)
c.dtype

dtype('float64')

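When no data type is specified, it is inferred from the input (e.g. a list of Python integers gives an integer array, `int64` on most platforms), while functions such as `np.zeros()`, `np.ones()` and `np.empty()` create floating-point (`float64`) arrays by default. Other possible data types are: 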

* **COMPLEX** numbers: 
``` python
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype
    Out: dtype('complex128')
```

In [15]:
# Try out the above commands 
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

dtype('complex128')

* **BOOL**:
``` python
e = np.array([True, False, False, True])
e.dtype
    Out: dtype('bool')
```

In [17]:
# Try out the above commands 
e = np.array([True, False, False, True])
e.dtype

dtype('bool')

* **String**:
``` python
f = np.array(['abc', 'eddafg', 'hjk'])
f.dtype
    Out: dtype('<U6')   # <--- Unicode string of up to 6 characters (by default, the length of the longest element)
```

In [18]:
# Try out the above commands 
f = np.array(['abc', 'eddafg', 'hjk'])
f.dtype

dtype('<U6')

* **Other data types**:  `int32`, `int64`, `uint32`, `uint64`

Note that `type(f)` tells you that `f` is a numpy array, while `f.dtype` gives you the *type of the elements* contained in `f`. `dtype` is an attribute of `np.ndarray` objects: if you try to access the attribute `dtype` of a list, you will get an error message. 

In [19]:
# Difference between type/dtype; application to List/arrays.
f = np.array(['abc', 'eddafg', 'hjk'])
print(type(f))
print(f.dtype)
print('----------')
L = ['abc', 'eddafg', 'hjk']
print(type(L))
print(L.dtype)

<class 'numpy.ndarray'>
<U6
----------
<class 'list'>


AttributeError: 'list' object has no attribute 'dtype'
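The element type of an existing array can be changed with the `astype()` method, which returns a new array (the original is left untouched). A short sketch; the input values are arbitrary:

```python
import numpy as np

c = np.array([1.7, 2.2, -3.9])
print(c.dtype)                       # float64

# astype() returns a NEW array with the requested element type
i = c.astype(np.int64)               # float -> int truncates towards zero
print(i)                             # [ 1  2 -3]

s = np.array([1, 2, 3]).astype(str)  # integers -> strings
print(s)                             # ['1' '2' '3']
```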

- Last but not least, `numpy` is also the package that allows you to calculate many common mathematical functions (see also [`ufunc`](#II.4.1)): `np.log10()` (base 10 log), `np.log()` (natural log), `np.exp()`, `np.sin()`, `np.cos()`, etc. See the list of `numpy` mathematical functions [here](https://docs.scipy.org/doc/numpy/reference/routines.math.html)

In [20]:
# create an array of floats and calculate its log / sin / ... 
#x = np.linspace(-2*np.pi, 2*np.pi, 20)   # the number of points must be an integer
np.log(2.3)

0.8329091229351039
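Following the commented-out line in the cell above, a short sketch applying these functions to a whole array at once rather than to a single number (the third argument of `np.linspace()`, the number of points, is an integer):

```python
import numpy as np

x = np.linspace(-2 * np.pi, 2 * np.pi, 20)   # 20 points between -2*pi and 2*pi
y = np.sin(x)                                # sin() is applied to every element
print(y.shape)                               # (20,)

# log10 of powers of 10: the results are 0, 1 and 3
p = np.log10(np.array([1.0, 10.0, 1000.0]))
print(p)
```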

**Exercise:**   
For the array:
``` python
a = np.array([[1,2,3,4], [4,5,6,7], [2,3,4,5] ])
```
- What is the output of `a.ndim`, `a.shape`, `len(a)`?     
- How do the above commands relate to the rows, columns and dimensions?       
- How do you access the 2nd item of the first row?   

*Note:* 
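Try to do the same with the following "ragged" array (its rows have different lengths). Recent NumPy versions (≥ 1.24) raise a `ValueError` for such input unless `dtype=object` is given explicitly:
``` python
b = np.array([[1,2,3], []], dtype=object)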
```

In [21]:
a = np.array([ [1,2,3,4], [4,5,6,7], [2,3,4,5] ])
a.ndim

2

In [22]:
a.shape

(3, 4)

In [23]:
len(a)

3

In [24]:
biga = np.array([ a, a ])
len(biga)

2

In [26]:
biga.shape

(2, 3, 4)

In [27]:
biga.ndim

3

In [29]:
a

array([[1, 2, 3, 4],
       [4, 5, 6, 7],
       [2, 3, 4, 5]])

In [28]:
# Second item of first row 
a[0, 1]

2

In [31]:
b = np.array([[1,2,3], []], dtype=object)   # dtype=object is required for ragged input in NumPy >= 1.24
b

array([list([1, 2, 3]), list([])], dtype=object)

In [32]:
b.ndim

1

In [33]:
b.shape

(2,)

In [34]:
len(b)

2

In [49]:
a = np.array([[1,2,3,4], [4,5,6,7], [2,3,4,5] ])
print('a.ndim = ', a.ndim)
print('len(a) =', len(a))
print('a.shape =', a.shape)
print('a.size = ', a.size)
print(a)
print('2nd item 1st row = ', a[0,1])
# len(a) gives the number of rows, 
# shape returns a tuple containing the number of elements along each dimension. 
# ndim gives the number of dimensions (axes) of the array
# size = total number of elements 
print('----------------- redefine the array: np.array([[1.,2.,3.], []])')
b = np.array([[1.,2.,3.], []], dtype=object)   # dtype=object is required for ragged input in NumPy >= 1.24
print('b.ndim = ', b.ndim)
print('len(b)=', len(b))
print('b.shape = ', b.shape)
print('b.size = ', b.size)
# b is a 1D array (ndim = 1) of 2 python lists (dtype=object): it has no columns, so b[0, 1] fails; index the list instead with b[0][1]
print(b)
print('2nd item 1st row = ', b[0][1])
print(type(b))

a.ndim =  2
len(a) = 3
a.shape = (3, 4)
a.size =  12
[[1 2 3 4]
 [4 5 6 7]
 [2 3 4 5]]
2nd item 1st row =  2
----------------- redefine the array: np.array([[1.,2.,3.], []])
b.ndim =  1
len(b)= 2
b.shape =  (2,)
b.size =  2
[list([1.0, 2.0, 3.0]) list([])]
2nd item 1st row =  2.0
<class 'numpy.ndarray'>


**Exercise:** Elementwise operations

In the code cell below, try simple arithmetic elementwise operations: 
- add each even-indexed element to the following odd-indexed element (a[0]+a[1], a[2]+a[3], ...) using 2 different techniques (slicing and list comprehension)
- Time the two solutions using %timeit.
- Generate an array from a list made of strings and floats. What is the final array type?
- Generate 2 arrays whose elements are as follows:    
   `[2**0, 2**1, 2**2, 2**3, 2**4]`    
   `a_i = 2^(3*i) - i `    


In [51]:
# Slicing
a = np.arange(10) 
b = a[0::2] + a[1::2]
print(b)
print('Indeed, even elements are ', a[0::2])
print('and odd elements are ', a[1::2])

[ 1  5  9 13 17]
Indeed, even elements are  [0 2 4 6 8]
and odd elements are  [1 3 5 7 9]


In [55]:
# list comprehension
a = np.arange(10) 
b = np.array( [a[2*i] + a[2*i+1] for i in range(5)] ) 
print(b)

[ 1  5  9 13 17]


In [58]:
%timeit b = a[0::2] + a[1::2]

The slowest run took 15.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.59 µs per loop


In [59]:
%timeit b = np.array( [a[2*i] + a[2*i+1] for i in range(5)] ) 

The slowest run took 6.15 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.57 µs per loop


In [61]:
mixed_array = np.array([[0,1,2,3,4], ['a', 'b', 'c', 'd', 'e']])
mixed_array

array([['0', '1', '2', '3', '4'],
       ['a', 'b', 'c', 'd', 'e']], dtype='<U21')

In [63]:
#Generate:
#   `[2**0, 2**1, 2**2, 2**3, 2**4]`
#   `a_i = 2^(3*i) - i `
a = np.array([2**i for i in np.arange(5)])
print(a)
a = np.array([2**(3*i) - i for i in range(4)])
print(a) 
b = np.array([2**(3*i) - i for i in np.arange(4)])
print(b)
%timeit a = np.array([2**(3*i) - i for i in range(4)])
%timeit b = np.array([2**(3*i) - i for i in np.arange(4)])

[ 1  2  4  8 16]
[  1   7  62 509]
[  1   7  62 509]
100000 loops, best of 5: 2.51 µs per loop
The slowest run took 5.03 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.72 µs per loop


### II.2 `array` copies and views:   <a class="anchor" id="II.2"></a>

A slicing operation creates a **view** on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory. You can use `np.may_share_memory()` to check if two arrays share the same memory block.
To avoid this behaviour, i.e. to create a brand new array from a slice so that modifying it leaves the original array untouched, use the method `copy()`: `c = a[0:2].copy()` creates a **new array** that is a **copy** of the first two elements of `a`. 

**When modifying the view, the original array is modified as well**. Try the cells below to understand how memory allocation works. 

In [64]:
#import numpy as np
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [65]:
b = a[::2]
b

array([0, 2, 4, 6, 8])

In [66]:
np.may_share_memory(a, b)

True

In [67]:
b[0] = 12
b

array([12,  2,  4,  6,  8])

In [68]:
a   # (!)

array([12,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [71]:
a = np.arange(10)
print(a) 
c = a[::2].copy()  # force a copy
c[0] = 12
print('a=', a)
print('c=', c)

[0 1 2 3 4 5 6 7 8 9]
a= [0 1 2 3 4 5 6 7 8 9]
c= [12  2  4  6  8]


In [70]:
np.may_share_memory(a, c)

False

In [74]:
L = list(np.arange(10))
L2 = L[0::2]
L2[0] = 12
L, L2

([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [12, 2, 4, 6, 8])

In [78]:
np.may_share_memory(L, L2)

False

In [79]:
# But this is tricky! Assignment does not copy: L3 is just another name for the same list
L3 = L
L3[0] = 12
L

[12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
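Not every indexing operation returns a view. A quick check with `np.shares_memory()` (a stricter variant of `np.may_share_memory()`, not used elsewhere in this notebook) shows that basic slicing returns a view, whereas fancy indexing and boolean masking (see II.4.4) return copies:

```python
import numpy as np

a = np.arange(10)
view = a[2:5]          # basic slicing -> view on a
fancy = a[[2, 3, 4]]   # fancy indexing (list of indices) -> copy
masked = a[a > 6]      # boolean masking -> copy

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, masked))  # False

fancy[0] = 100         # modifying the copy ...
print(a[2])            # ... leaves a untouched: 2
```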

### II.3 Array shape manipulation <a class="anchor" id="II.3"></a>

- **II.3.1 Flattening**:    
The method `ravel()` flattens the array into a 1D array (the rows of the array are concatenated one after the other). 

In [80]:
#import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a) 
print(a.ravel())
a

[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]


array([[1, 2, 3],
       [4, 5, 6]])

In [81]:
a.T   # Transpose the array

array([[1, 4],
       [2, 5],
       [3, 6]])

In [82]:
a.T.ravel()

array([1, 4, 2, 5, 3, 6])

**Note**: `a.T` is an attribute (property) of the array `a` that returns the transposed array, while `np.transpose(a)` (or the method `a.transpose()`) is a function call. Both return a **view** of `a`, not a copy. As `a.T` is a simple attribute access, it can be marginally quicker than a function call, as you can test using the `%timeit` magic command (in practice the difference is negligible). For N-dimensional arrays, `transpose()` allows a bit more than just transposing (see II.3.4 below).

In [83]:
%timeit(a.transpose())
%timeit(a.T)

The slowest run took 52.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 167 ns per loop
The slowest run took 22.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 149 ns per loop


- **II.3.2 Reshaping**:   
The method `reshape(newshape)` reorganises the elements of an array into a "new" array (see the warning below) with a different shape. The total number of elements must stay the same! 

In [96]:
a = np.zeros((3, 2))   # an array of zeros with 3 rows and 2 columns
print(a) 
a.shape

[[0. 0.]
 [0. 0.]
 [0. 0.]]


(3, 2)

In [97]:
b = a.ravel()
b = b.reshape((2, 3))
b

array([[0., 0., 0.],
       [0., 0., 0.]])

In [98]:
# Alternatively 
a.reshape((2, -1))    # unspecified (-1) value is inferred

array([[0., 0., 0.],
       [0., 0., 0.]])

In [99]:
a

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

**WARNING:** Reshaping may return a **view** or a **copy** !

In [89]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b=a.ravel()
b=b.reshape((2,3))
# Let's modify b and show a to see if we have a view or a copy ... 
b[0, 0] = 99
a

array([[99,  2,  3],
       [ 4,  5,  6]])

In [92]:
# let's now create an array with np.zeros and reshape it 
a = np.zeros((3, 2))
b = a.T.reshape(3*2)
b[0] = 9
a


array([[0., 0.],
       [0., 0.],
       [0., 0.]])

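To understand this in detail you need to learn more about the memory layout of a numpy array, which is beyond the scope of this lecture. In short, `reshape()` returns a view when the new shape is compatible with the way the data are laid out in memory; `a.T` is not stored row by row (it is not "C-contiguous"), so flattening it requires a copy. 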
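A minimal sketch of that explanation, using the `flags` attribute and `np.shares_memory()` (neither is used elsewhere in this notebook) to see when `reshape()` returns a view and when it has to copy:

```python
import numpy as np

a = np.zeros((3, 2))
b1 = a.reshape(6)      # a is stored row by row (C-contiguous) -> reshape returns a view
b2 = a.T.reshape(6)    # a.T is not C-contiguous -> reshape must copy

print(a.flags['C_CONTIGUOUS'], a.T.flags['C_CONTIGUOUS'])   # True False
print(np.shares_memory(a, b1), np.shares_memory(a, b2))     # True False
```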

- **II.3.3 Adding a dimension**:

Indexing with the `np.newaxis` object allows us to add an axis to an array. You can also use the `reshape` method.  

In [93]:
z = np.array([1, 2, 3])
print(z.shape)
z

(3,)


array([1, 2, 3])

In [94]:
print(z[:, np.newaxis])
z[:, np.newaxis].shape

[[1]
 [2]
 [3]]


(3, 1)

In [95]:
z

array([1, 2, 3])

In [100]:
z[np.newaxis, :]

array([[1, 2, 3]])

In [101]:
z[np.newaxis, :].shape

(1, 3)

In [102]:
# An alternative is to reshape your array
y = np.array([1, 2, 3])

# When one shape dimension is -1, the value is inferred from the length of the array and remaining dimensions.
y = y.reshape((-1,1))   
y.shape

(3, 1)

In [103]:
y = np.array([1, 2, 3])
y = y.reshape((1,-1))
y.shape

(1, 3)

- **II.3.4. Dimension shuffling**:

In [104]:
a = np.arange(4*3*2).reshape(4, 3, 2)
a.shape

(4, 3, 2)

In [105]:
a

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])

In [106]:
a[1, 2, 0]

10

In [107]:
b = a.transpose(1, 2, 0)
b.shape

(3, 2, 4)

In [108]:
b

array([[[ 0,  6, 12, 18],
        [ 1,  7, 13, 19]],

       [[ 2,  8, 14, 20],
        [ 3,  9, 15, 21]],

       [[ 4, 10, 16, 22],
        [ 5, 11, 17, 23]]])

In [109]:
b[2, 1, 0]

5

In [110]:
# a is unchanged: transpose() only returns a view of the same data with the axes reordered
a

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[ 6,  7],
        [ 8,  9],
        [10, 11]],

       [[12, 13],
        [14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21],
        [22, 23]]])
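A sketch making the index mapping of `transpose(1, 2, 0)` explicit, and checking that the result is a view (modifying `b` modifies `a`):

```python
import numpy as np

a = np.arange(4 * 3 * 2).reshape(4, 3, 2)
b = a.transpose(1, 2, 0)        # axis i of b is axis (1, 2, 0)[i] of a
print(b.shape)                  # (3, 2, 4)

# Element mapping: b[i, j, k] == a[k, i, j]
print(b[2, 1, 0], a[0, 2, 1])   # 5 5

# transpose() returns a view: changing b changes a
print(np.shares_memory(a, b))   # True
b[0, 0, 0] = -1
print(a[0, 0, 0])               # -1
```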

- **II.3.5. Resizing**: 

Size of an array can be changed with `ndarray.resize`:

In [111]:
a = np.arange(4)
print(a)
a.resize((8,))   # you give as argument the new shape of the array
a


[0 1 2 3]


array([0, 1, 2, 3, 0, 0, 0, 0])

However, the `resize()` method fails if the array is referenced by another name or array:

In [112]:
b = a
a.resize((4,))   

ValueError: cannot resize an array that references or is referenced
by another array in this way.
Use the np.resize function or refcheck=False
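The error message suggests the function `np.resize()`. A short sketch of how it differs from the method: it returns a new array, so other references are not a problem, and it fills the extra space by repeating the original data instead of padding with zeros:

```python
import numpy as np

a = np.arange(4)
b = a                      # a second reference to the same array

c = np.resize(a, (8,))     # NEW array; the data of a are repeated to fill it
print(c)                   # [0 1 2 3 0 1 2 3]
print(a.shape)             # (4,): a itself is unchanged
```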

**Exercises:**

- Use flatten as an alternative to ravel. What is the difference? (Hint: check which one returns a view and which a copy)
- Experiment with transpose for dimension shuffling.


In [113]:
a = np.arange(12).reshape(3,4)
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [117]:
a = np.arange(12).reshape(3,4)
r = a.ravel()
f = a.flatten()
r[3] = 99
f[5]=101
print('Ravelled array r=', r)
print('flattened array f=', f)
print(a)

Ravelled array r= [ 0  1  2 99  4  5  6  7  8  9 10 11]
flattened array f= [  0   1   2   3   4 101   6   7   8   9  10  11]
[[ 0  1  2 99]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [118]:
# Experiment with transpose()
a = np.arange(48).reshape(2,6,-1)
print(a.shape)
print(a.transpose(1,2,0).shape)
print(a.transpose().shape)

(2, 6, 4)
(6, 4, 2)
(4, 6, 2)


=> `ravel()` returns a view (whenever possible), while `flatten()` always returns a copy. 

- **II.3.6. Meshgrid**: 

A very useful function that returns coordinate matrices from coordinate vectors. This is extremely useful when you want to evaluate a function on a grid (i.e. $z = f(x, y)$) ... which is something very common in observational astronomy! It is also useful for contour plots (e.g. to interpolate over a regular grid). 

The way to proceed is to define your `x` and `y` vectors (corresponding to the (x,y) coordinates of the grid points) and pass them to `np.meshgrid()`:
``` python
import numpy as np
x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)
X, Y = np.meshgrid(x_vec, y_vec)

# Now you can evaluate the function z = (x**2 + y**2)
Z = X**2 + Y**2
```

`X` and `Y` created  with meshgrid() are now arrays of shape (3, 6) (3 rows and 6 columns) containing respectively coordinate x (for X) and y (for Y) of each grid element. This can be generalised to larger dimensions !

So, the array `Z` of shape (3,6) corresponds to points with the following coordinates:

['(0.0,0.0)', '(1.0,0.0)', '(2.0,0.0)', '(3.0,0.0)', '(4.0,0.0)', '(5.0,0.0)']   
['(0.0,2.5)', '(1.0,2.5)', '(2.0,2.5)', '(3.0,2.5)', '(4.0,2.5)', '(5.0,2.5)']   
['(0.0,5.0)', '(1.0,5.0)', '(2.0,5.0)', '(3.0,5.0)', '(4.0,5.0)', '(5.0,5.0)']   

**Note:**    
This function supports both indexing conventions through the `indexing` keyword argument. Giving the string 'ij' returns a meshgrid with matrix indexing, while 'xy' (the default) returns a meshgrid with Cartesian indexing. In the 2-D case with inputs of length M and N, the outputs are of shape (N, M) for 'xy' indexing and (M, N) for 'ij' indexing. In the 3-D case with inputs of length M, N and P, outputs are of shape (N, M, P) for 'xy' indexing and (M, N, P) for 'ij' indexing. In other words, in 2-D the outputs obtained with 'ij' are the transpose of those obtained with 'xy'. See `help(np.meshgrid)` for more details. 
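A short sketch checking these shapes on the vectors used above (M = 6, N = 3) and evaluating $z = x^2 + y^2$ on the grid:

```python
import numpy as np

x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)   # M = 6, N = 3

X_xy, Y_xy = np.meshgrid(x_vec, y_vec)                  # default: indexing='xy'
X_ij, Y_ij = np.meshgrid(x_vec, y_vec, indexing='ij')
print(X_xy.shape, X_ij.shape)                           # (3, 6) (6, 3)

# In 2-D the 'ij' outputs are the transpose of the 'xy' outputs
print(np.array_equal(X_xy, X_ij.T))                     # True

# Evaluate z = x**2 + y**2 on the grid (rows follow y, columns follow x)
Z = X_xy**2 + Y_xy**2
print(Z[2, 5])                                          # 50.0 (x = 5, y = 5)
```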

In [120]:
#import numpy as np 
x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)
print(x_vec, y_vec)
X, Y = np.meshgrid(x_vec,y_vec)
print('')
print(Y)

[0. 1. 2. 3. 4. 5.] [0.  2.5 5. ]

[[0.  0.  0.  0.  0.  0. ]
 [2.5 2.5 2.5 2.5 2.5 2.5]
 [5.  5.  5.  5.  5.  5. ]]


In [123]:
# Experiment with "meshgrid()" following the code above. 
#import numpy as np

# Try to write a command that prints at the screen the coordinates of the grid elements (as above) (TIP: you do not need meshgrid)
row = []
for i in range(3):
    col = []
    for j in range(6):
        val = str((float(x_vec[j]), float(y_vec[i])))   # float() avoids the 'np.float64(...)' display of NumPy >= 2
        col.append(val)
    print(col)
#    row.append(col)
#row
# This can also be done with list comprehension

['(0.0, 0.0)', '(1.0, 0.0)', '(2.0, 0.0)', '(3.0, 0.0)', '(4.0, 0.0)', '(5.0, 0.0)']
['(0.0, 2.5)', '(1.0, 2.5)', '(2.0, 2.5)', '(3.0, 2.5)', '(4.0, 2.5)', '(5.0, 2.5)']
['(0.0, 5.0)', '(1.0, 5.0)', '(2.0, 5.0)', '(3.0, 5.0)', '(4.0, 5.0)', '(5.0, 5.0)']
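As suggested in the last comment of the cell above, the same coordinates can be produced with a nested list comprehension. In this sketch `.tolist()` converts the elements to plain Python floats, so that the tuples print as `(0.0, 0.0)` whatever the NumPy version:

```python
import numpy as np

x_vec, y_vec = np.linspace(0, 5, 6), np.linspace(0, 5, 3)

# outer comprehension: one row per y value; inner comprehension: one entry per x value
coords = [[str((x, y)) for x in x_vec.tolist()] for y in y_vec.tolist()]
for row in coords:
    print(row)
```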


**Exercise**: 

We will use meshgrid [later](Modules_in__python_matplotlib.ipynb#meshgrid), after we have learned how to visualise results with `python`. 

### II.4 What makes `numpy` arrays useful structures ?  <a class="anchor" id="II.4"></a>

Python is fast *for coding and developing*, but it is slow when it comes to *execution*, especially when executing `for` loops.    
One reason for this is that Python is dynamically typed: in a loop such as `for a in range(10): a + b`, the interpreter has to check the `type` of `a` and of `b` at *each iteration* before it can perform the addition. 

`numpy` helps speed up code through 4 strategies:
1. `ufunc`
2. aggregation
3. broadcasting
4. slicing, masking, fancy indexing

#### II.4.1 `ufunc`: operates elementwise on objects. <a class="anchor" id="II.4.1"></a>

These `ufunc`s (universal functions) are implemented in compiled code within `numpy` and perform fast elementwise operations. They include: 

- all arithmetic operations: `+`, `-`, `*`, `/`, `**` 
- mathematical functions: `sin`, `exp`, `cos`, `log10`, ... 
- comparison operators: `<`, `>`, `==`, ...
- etc. 

**Example:**
``` python
import numpy as np
# Basic python
a = [1,2,3,4,5]
b = [ val + 5 for val in a]   # add 5 to each element of the list  
# In numpy
a = np.array(a)
b = a + 5                     # add 5 to each element of the array.
```
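A few more elementwise operations, one from each family listed above. `np.add()` is the function form of the `+` operator:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
squares = a ** 2            # arithmetic ufunc, elementwise: [ 1  4  9 16 25]
plus5 = np.add(a, 5)        # function form of a + 5
is_big = a > 2              # comparison ufunc -> boolean array
exps = np.exp(np.zeros(3))  # mathematical ufunc: exp(0) = 1 for every element
print(squares, plus5, is_big, exps, sep="\n")
```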

In [124]:
a = [1,2,3,4,5]
b = [ val + 5 for val in a]   # add 5 to each element of the list
b

[6, 7, 8, 9, 10]

In [125]:
a = np.array(a)
b = a + 5   
b

array([ 6,  7,  8,  9, 10])

In [126]:
# implement the above example for a list of 1000 elements 
# use %timeit before calculating b to see improvement in speed
a_a = np.arange(1000)
%timeit a_a+5

The slowest run took 13.71 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.5 µs per loop


In [127]:
a_l = range(1000)
%timeit [val + 5 for val in a_l]

10000 loops, best of 5: 53.5 µs per loop


#### II.4.2 *aggregation*:   <a class="anchor" id="II.4.2"></a>

Functions which summarize values of an array such as `min`, `max`, `sum`, `mean`, ... 

**Example:**

``` python
# python version of an aggregation
from random import random
c = [ random() for i in range(10000) ]
%timeit min(c)
#same in numpy:
c = np.array(c)
%timeit c.min()  
```
This also works on multidimensional arrays: 

``` python 
M = np.random.randint(0, 10, (10,4))
M.sum(axis=0)   # sum over the rows: one value per column
M.sum(axis=1)   # sum over the columns: one value per row
```

Available aggregation functions include: 
`np.sum()`, `np.min()`, `np.max()`, `np.prod()`, `np.mean()`, `np.std()`, `np.median()`, `np.any()`, `np.all()`, `np.argmin()`, `np.argmax()`, `np.percentile()`, ..., as well as NaN-safe versions that ignore NaN values (`np.nanmin()`, `np.nansum()`, `np.nanmean()`, ...).
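A sketch of the `axis` argument and of a NaN-safe aggregation, on a deterministic array so that the results can be checked by hand:

```python
import numpy as np

M = np.arange(12).reshape(3, 4)
col_sums = M.sum(axis=0)     # collapse the rows: one value per column -> [12 15 18 21]
row_sums = M.sum(axis=1)     # collapse the columns: one value per row -> [ 6 22 38]

x = np.array([1.0, np.nan, 3.0])
print(np.mean(x))            # nan: a single NaN propagates to the result
print(np.nanmean(x))         # 2.0: NaN values are ignored

print(np.argmax(M))          # 11: index in the flattened array
print(np.argmax(M, axis=1))  # [3 3 3]: column index of the maximum of each row
```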


In [128]:
from random import random
c = [ random() for i in range(10000) ]
%timeit min(c)
#same in numpy:
c = np.array(c)
%timeit c.min()  

10000 loops, best of 5: 170 µs per loop
The slowest run took 9.49 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 6.28 µs per loop


#### II.4.3 *Broadcasting*:   <a class="anchor" id="II.4.3"></a>

Set of rules by which `ufuncs` operate on arrays of different sizes and/or dimensions. 

The term [broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) describes how `numpy` treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.
Application to three cases: 

![From astroML book](../Figures/fig_broadcast_visual_1.png)



The rules / how this works:

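* If the two arrays have a different number of dimensions, the shape of the one with fewer dimensions is padded with 1s on its left side.
* If the shapes do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other one.
* If in some dimension the sizes differ and neither is 1, an error is raised.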

This broadcasting strategy allows one to avoid doing `for` loops for some operations. 
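A sketch of these rules on a few shapes (illustrative choices, not necessarily the exact arrays of the figure), including a case where they fail:

```python
import numpy as np

a = np.arange(3)                     # shape (3,)

r1 = a + 5                           # the scalar is stretched to shape (3,)
r2 = np.ones((3, 3)) + a             # (3,) is padded to (1, 3), then stretched to (3, 3)
r3 = a[:, np.newaxis] + a            # (3, 1) + (1, 3) -> (3, 3): r3[i, j] = i + j

failed = False
try:
    np.ones((3, 2)) + a              # (3, 2) vs (1, 3): last sizes 2 and 3, neither is 1
except ValueError as err:
    failed = True
    print("broadcast error:", err)
print(r1, r2, r3, sep="\n")
```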


#### II.4.4 Slicing, masking and fancy indexing:    <a class="anchor" id="II.4.4"></a>
	 
- **Mask**: a mask is a boolean array used to select elements of another array of the same shape: only the elements where the mask is `True` are kept: 

``` python
mask = np.array([False, False, True, False, True, False])
c = np.array([1, 3, 6, 9, 10, 2])
c[mask]
    Out: array([6, 10])
    
mask = (c < 4) | (c > 8)
c[mask]
    Out: array([1, 3, 9, 10, 2])
```
 

In [129]:
mask = np.array([False, False, True, False, True, False])
c = np.array([1, 3, 6, 9, 10, 2])
c[mask]

array([ 6, 10])

In [130]:
mask = (c < 4) | (c > 8)
c[mask]

array([ 1,  3,  9, 10,  2])
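A few more things you can do with masks, sketched on the same array `c`. `~` inverts a mask, and `np.nonzero()` (not used elsewhere in this notebook) returns the indices where it is `True`:

```python
import numpy as np

c = np.array([1, 3, 6, 9, 10, 2])

# Parentheses are required: & and | bind more tightly than the comparisons < and >
mask = (c > 2) & (c < 10)
print(c[mask])                # [3 6 9]
print(c[~mask])               # [ 1 10  2]: ~ inverts the mask
print(mask.sum())             # 3: True counts as 1, so this counts the selected elements
print(np.nonzero(mask)[0])    # [1 2 3]: indices where the mask is True

c[c > 8] = 0                  # masks can also be used to assign values
print(c)                      # [1 3 6 0 0 2]
```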

- **Fancy indexing**: passing a list or array of indices returns the corresponding elements of a numpy array (this only works for arrays, not for lists!). This avoids looping over the indices. 

``` python
ind = [1, 3, 4]
c[ind]  
   Out: array([3, 9, 10])
```

In [131]:
ind = [1, 3, 4]
c[ind]  

array([ 3,  9, 10])
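Two further sketches of fancy indexing. `np.argsort()` is not mentioned above; it is used here only as a natural source of an index array. With one index list per dimension, the lists are paired element by element:

```python
import numpy as np

c = np.array([1, 3, 6, 9, 10, 2])
order = np.argsort(c)          # indices that would sort c: [0 5 1 2 3 4]
print(c[order])                # [ 1  2  3  6  9 10]

M = np.arange(12).reshape(3, 4)
# selects M[0, 1] and M[2, 3], not a 2x2 block
picked = M[[0, 2], [1, 3]]
print(picked)                  # [ 1 11]
```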

- **Multi-dimensional** array: 

Masks and fancy indexing also work on multidimensional arrays.   
Remember that the first index is the row, and the second is the column.   
Remember how slicing works: `a[start:end:step]` (the element at `end` is excluded): 
- Omitting `start` starts from the beginning of the sequence, omitting `end` goes up to its end (for a positive step). 
- Omitting the second colon implies `step=1`.  
- With a negative step you count backwards.
- `start`/`end` can be either positive or negative indices (negative indices count from the end). 
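A compact sketch of the slicing rules above on a 1D array:

```python
import numpy as np

a = np.arange(10)          # [0 1 2 3 4 5 6 7 8 9]
print(a[2:8:2])            # [2 4 6]   start=2, end=8 (excluded), step=2
print(a[:3])               # [0 1 2]   start omitted -> from the beginning
print(a[7:])               # [7 8 9]   end omitted -> up to the end
print(a[-3:])              # [7 8 9]   negative start counts from the end
print(a[8:2:-2])           # [8 6 4]   negative step counts backwards
print(a[::-1])             # the whole array reversed
```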

In [132]:
a = np.arange(10)
print(a)
a[a>3]

[0 1 2 3 4 5 6 7 8 9]


array([4, 5, 6, 7, 8, 9])

In [134]:
M = np.arange(12).reshape((3,4))
print(M)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]



In [135]:
print(M[0,1]) # gives value at row 0 and column 1. 

1


In [137]:
print(M[:, 1])  # Combines slices and indices -> all rows of column one

[1 5 9]


In [138]:
M[M-3 < 2]# can also do masking of n dimensional array

array([0, 1, 2, 3, 4])

In [139]:
M[[1,0], :2] # Use fancy indexing and slicing - first 2 elements of rows 1 and 0 (in this order)

array([[4, 5],
       [0, 1]])

In [140]:
M[M.sum(axis=1) > 2, 4:] # mixing masking and slicing; empty result, as M only has columns 0 to 3

array([], shape=(3, 0), dtype=int64)

``` python
M = np.arange(12).reshape((3,4))
    Out: 
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])

M[0,1] # gives value at row 0 and column 1. 
M[:, 1]  # Combines slices and indices -> all rows of column one
M[M-3 < 2]# can also do masking of n dimensional array
M[[1,0], :2] # Use fancy indexing and slicing - first 2 elements of rows 1 and 0 (in this order)
M[M.sum(axis=1) > 2, 4:] # mixing masking and slicing; empty result, as M only has columns 0 to 3
```

An illustration of indexing in numpy arrays:
![Illustration of `np` indexing](../Figures/numpy_indexing.png)

**Exercise**:
- Try the different flavours of slicing, using start, end and step: starting from a linspace, try to obtain odd numbers counting backwards, and even numbers counting forwards.

- Reproduce the slices in the diagram above. You may use the following expression to create the array:    
`np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]`

In [142]:
aa = np.linspace(0, 10, 11, dtype=int)
# odd numbers counting backwards
print(aa[-2::-2])
print(aa[::-1][1::2])   # using only slicing
print(aa[aa%2 == 1][::-1])  # using masking and slicing
# Even numbers counting forward
print(aa[0::2])   # slicing
print(aa[aa%2 == 0])  # using masking 

[9 7 5 3 1]
[9 7 5 3 1]
[9 7 5 3 1]
[ 0  2  4  6  8 10]
[ 0  2  4  6  8 10]


In [144]:
# Implement the exercise above
a = np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]

In [145]:
a[0, 3:5]

array([3, 4])

### II.5 Reading arrays from a file and string formatting:    <a class="anchor" id="II.5"></a>

Reading tables saved in a formatted text file can be done with `numpy.loadtxt('myfile.txt')`, while saving an array `arr` is done with `numpy.savetxt('myfile.txt', arr)`.   
Clever loading of text/csv files: `numpy.genfromtxt()`/`numpy.recfromcsv()` (in NumPy ≥ 2.0 `recfromcsv()` is no longer available in the main namespace; `numpy.genfromtxt(..., delimiter=',')` can be used instead). Those commands can fill missing values in a table, read column names, exclude some columns, and guess data-type using `dtype = None`.   
Fast and efficient, but numpy-specific, binary format: `numpy.save()`/`numpy.load()`.
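A minimal round-trip sketch of the text and binary formats. The two rows reuse values from `data.txt` (ID, RA, DEC); the file names are hypothetical and a temporary folder is used so that nothing is overwritten:

```python
import os
import tempfile
import numpy as np

data = np.array([[11.0, 69.60398, -12.33483],
                 [38.0, 69.5969, -12.32047]])
folder = tempfile.mkdtemp()                     # temporary folder

# Text format: human readable; the header is written after a '# ' comment prefix
txt_path = os.path.join(folder, "demo.txt")     # hypothetical file name
np.savetxt(txt_path, data, header="ID RA DEC")
from_txt = np.loadtxt(txt_path)                 # lines starting with '#' are skipped

# Binary .npy format: fast and exact, but only readable with numpy
npy_path = os.path.join(folder, "demo.npy")     # hypothetical file name
np.save(npy_path, data)
from_npy = np.load(npy_path)
print(from_txt.shape, from_npy.shape)           # (2, 3) (2, 3)
```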

There is another flexible way to read/write files: the file object returned by the built-in function `open()`. Three operations are generally needed: 
``` python
f = open('myfile.txt', 'r')  # 'r' for read mode, 'w' for write mode, 'a' for append mode
f.read()  # reads the whole file as a single string; other methods allow more flexible reading
f.close() 
```
If you call `f.read()` twice, the second call returns an empty string: after the first read, the file object "points" to the end of the file, and there is nothing left to read. The methods that read from a file object go sequentially through its content. With `read()` you get the whole content as a single string (which could be a problem memory-wise if the file is large!).    

There are several ways to read a file piece by piece instead. One is to iterate over the file object with a `for` loop, which reads one line at a time:
``` python
f = open('myfile.txt', 'r')
for line in f:
    print(repr(line))
f.close()
```
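A common alternative to calling `close()` by hand is the `with` statement, which closes the file automatically at the end of the block. A sketch, assuming a small hypothetical file that contains a trimmed excerpt of `data.txt` (first 4 columns only), from which the RA column is extracted:

```python
import os
import tempfile

# Write a small example file: a trimmed excerpt of data.txt (first 4 columns only)
path = os.path.join(tempfile.mkdtemp(), "example.txt")   # hypothetical file name
with open(path, "w") as f:
    f.write("# Name ID RA DEC\n"
            "0011 11.0 69.60398 -12.33483\n"
            "0038 38.0 69.5969 -12.32047\n")

ra = []
with open(path, "r") as f:        # the file is closed automatically when the block ends
    for line in f:                # one line at a time, as in the loop above
        if line.startswith("#"):  # skip the header line
            continue
        columns = line.split()    # split on whitespace (this also drops the trailing '\n')
        ra.append(float(columns[2]))

print(ra)                         # [69.60398, 69.5969]
print(f.closed)                   # True
```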

In [147]:
f = open('data.txt', 'r')
f.read()

'# Name ID RA DEC z z_err zQF \n0011 11.0 69.60398 -12.33483 0.14502699731 5.39539923983e-05 0.0 \n0038 38.0 69.5969 -12.32047 0.502938602945 0.000222303791245 1.0 \n0042 42.0 69.57136 -12.31795 0.0 0.001 0.0 \n0055 55.0 69.59832 -12.31157 0.0 0.001 0.0 \n0057 57.0 69.5978429 -12.311442 0.0 0.001 0.0 \n0072 72.0 69.61111 -12.3037 0.0 0.001 0.0\n0080 80.0 69.55023 -12.30339 0.0 0.001 0.0\n0083 83.0 69.58752 -12.30232 0.0 0.001 0.0\n0085 85.0 69.56567 -12.30147 0.0 0.001 0.0\n0111 111.0 69.59927 -12.29414 0.0 0.001 0.0\n0114 114.0 69.52129 -12.2893 0.0 0.001 0.0\n0119 119.0 69.5651 -12.28924 0.0 0.001 0.0\n0125 125.0 69.53808 -12.28883 0.0 0.001 0.0\n0126 126.0 69.54177 -12.28782 0.0 0.001 0.0\n0128 128.0 69.60646 -12.28668 0.369864207858 8.03355766875e-05 0.0\n0164 164.0 69.52007 -12.24671 0.581369862213 0.000216496440347 2.0\n0182 182.0 69.53533 -12.25025 0.0 0.001 0.0\n0185 185.0 69.52459 -12.24947 0.585115787002 0.000104162060008 0.0\n0190 190.0 69.55228 -12.25292 0.0 0.001 0.0\n'

In [148]:
f.read()

''

In [149]:
f.close()

In [153]:
f = open('data.txt', 'r')
for line in f:
    print(repr(line))   # repr(object) return the canonical string representation of the object

'# Name ID RA DEC z z_err zQF \n'
'0011 11.0 69.60398 -12.33483 0.14502699731 5.39539923983e-05 0.0 \n'
'0038 38.0 69.5969 -12.32047 0.502938602945 0.000222303791245 1.0 \n'
'0042 42.0 69.57136 -12.31795 0.0 0.001 0.0 \n'
'0055 55.0 69.59832 -12.31157 0.0 0.001 0.0 \n'
'0057 57.0 69.5978429 -12.311442 0.0 0.001 0.0 \n'
'0072 72.0 69.61111 -12.3037 0.0 0.001 0.0\n'
'0080 80.0 69.55023 -12.30339 0.0 0.001 0.0\n'
'0083 83.0 69.58752 -12.30232 0.0 0.001 0.0\n'
'0085 85.0 69.56567 -12.30147 0.0 0.001 0.0\n'
'0111 111.0 69.59927 -12.29414 0.0 0.001 0.0\n'
'0114 114.0 69.52129 -12.2893 0.0 0.001 0.0\n'
'0119 119.0 69.5651 -12.28924 0.0 0.001 0.0\n'
'0125 125.0 69.53808 -12.28883 0.0 0.001 0.0\n'
'0126 126.0 69.54177 -12.28782 0.0 0.001 0.0\n'
'0128 128.0 69.60646 -12.28668 0.369864207858 8.03355766875e-05 0.0\n'
'0164 164.0 69.52007 -12.24671 0.581369862213 0.000216496440347 2.0\n'
'0182 182.0 69.53533 -12.25025 0.0 0.001 0.0\n'
'0185 185.0 69.52459 -12.24947 0.585115787002 0.000104162060008 

In [151]:
f = open('data.txt', 'r')
a = f.readlines()
a

['# Name ID RA DEC z z_err zQF \n',
 '0011 11.0 69.60398 -12.33483 0.14502699731 5.39539923983e-05 0.0 \n',
 '0038 38.0 69.5969 -12.32047 0.502938602945 0.000222303791245 1.0 \n',
 '0042 42.0 69.57136 -12.31795 0.0 0.001 0.0 \n',
 '0055 55.0 69.59832 -12.31157 0.0 0.001 0.0 \n',
 '0057 57.0 69.5978429 -12.311442 0.0 0.001 0.0 \n',
 '0072 72.0 69.61111 -12.3037 0.0 0.001 0.0\n',
 '0080 80.0 69.55023 -12.30339 0.0 0.001 0.0\n',
 '0083 83.0 69.58752 -12.30232 0.0 0.001 0.0\n',
 '0085 85.0 69.56567 -12.30147 0.0 0.001 0.0\n',
 '0111 111.0 69.59927 -12.29414 0.0 0.001 0.0\n',
 '0114 114.0 69.52129 -12.2893 0.0 0.001 0.0\n',
 '0119 119.0 69.5651 -12.28924 0.0 0.001 0.0\n',
 '0125 125.0 69.53808 -12.28883 0.0 0.001 0.0\n',
 '0126 126.0 69.54177 -12.28782 0.0 0.001 0.0\n',
 '0128 128.0 69.60646 -12.28668 0.369864207858 8.03355766875e-05 0.0\n',
 '0164 164.0 69.52007 -12.24671 0.581369862213 0.000216496440347 2.0\n',
 '0182 182.0 69.53533 -12.25025 0.0 0.001 0.0\n',
 '0185 185.0 69.52459 -12.24

In [154]:
a[10].replace('.', ',')

'0111 111,0 69,59927 -12,29414 0,0 0,001 0,0\n'

In [155]:
a   # unchanged: str.replace() returns a new string and does not modify a[10]

['# Name ID RA DEC z z_err zQF \n',
 '0011 11.0 69.60398 -12.33483 0.14502699731 5.39539923983e-05 0.0 \n',
 '0038 38.0 69.5969 -12.32047 0.502938602945 0.000222303791245 1.0 \n',
 '0042 42.0 69.57136 -12.31795 0.0 0.001 0.0 \n',
 '0055 55.0 69.59832 -12.31157 0.0 0.001 0.0 \n',
 '0057 57.0 69.5978429 -12.311442 0.0 0.001 0.0 \n',
 '0072 72.0 69.61111 -12.3037 0.0 0.001 0.0\n',
 '0080 80.0 69.55023 -12.30339 0.0 0.001 0.0\n',
 '0083 83.0 69.58752 -12.30232 0.0 0.001 0.0\n',
 '0085 85.0 69.56567 -12.30147 0.0 0.001 0.0\n',
 '0111 111.0 69.59927 -12.29414 0.0 0.001 0.0\n',
 '0114 114.0 69.52129 -12.2893 0.0 0.001 0.0\n',
 '0119 119.0 69.5651 -12.28924 0.0 0.001 0.0\n',
 '0125 125.0 69.53808 -12.28883 0.0 0.001 0.0\n',
 '0126 126.0 69.54177 -12.28782 0.0 0.001 0.0\n',
 '0128 128.0 69.60646 -12.28668 0.369864207858 8.03355766875e-05 0.0\n',
 '0164 164.0 69.52007 -12.24671 0.581369862213 0.000216496440347 2.0\n',
 '0182 182.0 69.53533 -12.25025 0.0 0.001 0.0\n',
 '0185 185.0 69.52459 -12.24

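Each line is returned as a string. Notice the `\n` at the end of each line: this is the newline character, which marks the end of a line. Note also that `a[10].replace('.', ',')` returns a *new* string: Python strings are immutable, so the list `a` is left unchanged, as the second display of `a` shows.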
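A minimal sketch of both points, using a hypothetical line in the format of `data.txt` (no file is needed). `print()` renders the trailing `\n` as an invisible line break, while `repr()` shows it explicitly; `str.replace()` builds a new string and leaves the original untouched.

```python
# A hypothetical line, in the format returned by f.readlines() for data.txt
line = '0111 111.0 69.59927 -12.29414 0.0 0.001 0.0\n'

print(line)        # the trailing '\n' is printed as an (invisible) line break
print(repr(line))  # repr() shows the '\n' explicitly

# str.replace() returns a new string; the original string is not modified
new_line = line.replace('.', ',')
print(repr(new_line))
print(line == new_line)  # False: `line` still contains dots
```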

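Instead of `f.readlines()`, you can also iterate directly over the file object:
``` python
f = open('myfile.txt', 'r')
for line in f:          # reads the file one line at a time
    print(repr(line))
f.close()
```
You could also write `for line in f.readlines():` (as in the cell below), BUT `f.readlines()` reads in the whole file at once and splits it into a **list** of lines, while `for line in f` reads one line at a time. For large files, `f.readlines()` can therefore be memory intensive, and iterating directly over `f` is preferred.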
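A file opened with `open()` should always be closed. A common idiom, not otherwise used in this notebook, is the `with` statement, which closes the file automatically at the end of the indented block. The sketch below first writes a small hypothetical file, `example.txt`, so that it runs on its own, then reads it back line by line.

```python
# Create a small hypothetical file so that the example is self-contained
with open('example.txt', 'w') as f:
    f.write('# x y\n1.0 2.0\n3.0 4.0\n')

# `with` closes the file automatically when the indented block ends
n_lines = 0
with open('example.txt', 'r') as f:
    for line in f:          # one line at a time, no full list kept in memory
        print(repr(line))
        n_lines += 1

print(n_lines)   # 3
print(f.closed)  # True: the file was closed on leaving the `with` block
```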

In [156]:
f = open('data.txt', 'r')
for line in f.readlines():
    print(repr(line))

'# Name ID RA DEC z z_err zQF \n'
'0011 11.0 69.60398 -12.33483 0.14502699731 5.39539923983e-05 0.0 \n'
'0038 38.0 69.5969 -12.32047 0.502938602945 0.000222303791245 1.0 \n'
'0042 42.0 69.57136 -12.31795 0.0 0.001 0.0 \n'
'0055 55.0 69.59832 -12.31157 0.0 0.001 0.0 \n'
'0057 57.0 69.5978429 -12.311442 0.0 0.001 0.0 \n'
'0072 72.0 69.61111 -12.3037 0.0 0.001 0.0\n'
'0080 80.0 69.55023 -12.30339 0.0 0.001 0.0\n'
'0083 83.0 69.58752 -12.30232 0.0 0.001 0.0\n'
'0085 85.0 69.56567 -12.30147 0.0 0.001 0.0\n'
'0111 111.0 69.59927 -12.29414 0.0 0.001 0.0\n'
'0114 114.0 69.52129 -12.2893 0.0 0.001 0.0\n'
'0119 119.0 69.5651 -12.28924 0.0 0.001 0.0\n'
'0125 125.0 69.53808 -12.28883 0.0 0.001 0.0\n'
'0126 126.0 69.54177 -12.28782 0.0 0.001 0.0\n'
'0128 128.0 69.60646 -12.28668 0.369864207858 8.03355766875e-05 0.0\n'
'0164 164.0 69.52007 -12.24671 0.581369862213 0.000216496440347 2.0\n'
'0182 182.0 69.53533 -12.25025 0.0 0.001 0.0\n'
'0185 185.0 69.52459 -12.24947 0.585115787002 0.000104162060008 

In [157]:
print(repr(line))
a = line.strip().split()  
float(a[2])

'0190 190.0 69.55228 -12.25292 0.0 0.001 0.0\n'


69.55228

Once a line is read, you can apply string methods to it, as to any other string:
- Remove leading and trailing whitespace, including the final `\n`: `line.strip()`
- Split the string into a list of strings (by default at each run of whitespace): `line.split()`
- Replace a specific character by another: `line.replace(',', '.')` replaces each comma by a dot.
- Access a specific element of the split list and convert it to a float: `float(line.split()[2])`
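Put together, these methods turn one line of text into numbers. A minimal sketch, assuming a hypothetical line in the format of `data.txt` (columns `Name ID RA DEC z z_err zQF`):

```python
# A hypothetical line in the format of data.txt
line = '0128 128.0 69.60646 -12.28668 0.369864207858 8.03355766875e-05 0.0\n'

fields = line.strip().split()   # remove the trailing '\n', then split on whitespace
print(fields)                   # a list of 7 strings

ra = float(fields[2])           # third column (RA) converted to a float
z = float(fields[4])            # fifth column (redshift z)
print(ra, z)                    # 69.60646 0.369864207858

# A value written with a decimal comma must be fixed before conversion
print(float('69,60646'.replace(',', '.')))   # 69.60646
```

Note that `fields[0]` stays the string `'0128'`, so its leading zero is preserved as long as you do not convert it.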

To write a file, you follow the same procedure, but open the file in write mode (`'w'`, which overwrites any existing file with the same name):
``` python
mylist_of_lines = ['line 1\n', 'line 2\n']   # the lines you want to write: each must end with '\n'

f = open('myfile.txt', 'w')
f.writelines(mylist_of_lines)
f.close()

# you can also join the lines into a single string and use write():
f = open('myfile.txt', 'w')
f.write(''.join(mylist_of_lines))   # the argument of join() can be a list comprehension
f.close()
```
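`writelines()` writes the strings exactly as given and does not add line breaks itself. A minimal sketch of what happens when the `\n` is forgotten, and how a list comprehension can add it; `write_then_read()` and `demo.txt` are hypothetical names used only for this illustration:

```python
def write_then_read(filename, lines):
    """Write `lines` with writelines() and return the file content as one string."""
    f = open(filename, 'w')
    f.writelines(lines)       # writelines() does NOT add line breaks
    f.close()
    f = open(filename, 'r')
    content = f.read()
    f.close()
    return content

rows = ['a 1', 'b 2']         # hypothetical lines without a trailing '\n'

print(repr(write_then_read('demo.txt', rows)))                      # 'a 1b 2'
print(repr(write_then_read('demo.txt', [r + '\n' for r in rows])))  # 'a 1\nb 2\n'
```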

**Exercise:**

Read the file `data.txt` and display some columns of interest:
- using the file object;
- using `numpy.loadtxt()`;
- using `numpy.genfromtxt()`.

Bonus:
- Build a numpy array from the data in `data.txt` as read using `f = open('data.txt')`.
- Replace one column with 0 and write the result to `data_new.txt`.

**Note:**

These functions for reading ASCII files are not always well suited to tables containing both strings and floats (for instance, `np.loadtxt()` below reads the `Name` entry `0011` as the float `11.0`, losing its leading zeros). Other packages, such as `pandas` and `astropy`, offer more flexible functions to read a large variety of table formats.

In [159]:
data_ltxt = np.loadtxt('data.txt',skiprows=1,  dtype=None)
print(data_ltxt.shape)
print(data_ltxt[0])
print(type(data_ltxt))

(19, 7)
[ 1.10000000e+01  1.10000000e+01  6.96039800e+01 -1.23348300e+01
  1.45026997e-01  5.39539924e-05  0.00000000e+00]
<class 'numpy.ndarray'>
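The reverse operation is `np.savetxt()`. A minimal sketch of a round trip, assuming a small hypothetical array of coordinates and a hypothetical file name `radec.txt`; the `fmt` argument uses the printf-style syntax described in the string formatting section below.

```python
import numpy as np

# A hypothetical 3x2 array (RA and DEC of three objects)
data = np.array([[69.60398, -12.33483],
                 [69.59690, -12.32047],
                 [69.57136, -12.31795]])

# fmt uses printf-style syntax; the header line is written prefixed with '# '
np.savetxt('radec.txt', data, fmt='%.5f', header='RA DEC')

# loadtxt skips lines starting with '#' by default (comments='#')
back = np.loadtxt('radec.txt')
print(back.shape)   # (3, 2)

# usecols selects the columns to read, here only DEC
dec = np.loadtxt('radec.txt', usecols=1)
print(dec)
```

Since lines starting with `#` are skipped by default, `skiprows=1` is not strictly needed for `data.txt`, whose header line also starts with `#`.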


In [146]:
# Note about the existence of a dedicated tool to read and manipulate tables in astropy 
from astropy.table import Table  
apy_tab = Table.read('data.txt', format='ascii')
apy_tab

| Name | ID | RA | DEC | z | z_err | zQF |
|---|---|---|---|---|---|---|
| int64 | float64 | float64 | float64 | float64 | float64 | float64 |
| 11 | 11.0 | 69.60398 | -12.33483 | 0.14502699731 | 5.39539923983e-05 | 0.0 |
| 38 | 38.0 | 69.5969 | -12.32047 | 0.502938602945 | 0.000222303791245 | 1.0 |
| 42 | 42.0 | 69.57136 | -12.31795 | 0.0 | 0.001 | 0.0 |
| 55 | 55.0 | 69.59832 | -12.31157 | 0.0 | 0.001 | 0.0 |
| 57 | 57.0 | 69.5978429 | -12.311442 | 0.0 | 0.001 | 0.0 |
| 72 | 72.0 | 69.61111 | -12.3037 | 0.0 | 0.001 | 0.0 |
| 80 | 80.0 | 69.55023 | -12.30339 | 0.0 | 0.001 | 0.0 |
| 83 | 83.0 | 69.58752 | -12.30232 | 0.0 | 0.001 | 0.0 |
| 85 | 85.0 | 69.56567 | -12.30147 | 0.0 | 0.001 | 0.0 |
| 111 | 111.0 | 69.59927 | -12.29414 | 0.0 | 0.001 | 0.0 |


#### Formatting Strings

It often happens that you do not need to save all the decimals of a number, or that you would like to display it in scientific notation. There are [multiple ways to do it](https://docs.python.org/3/tutorial/inputoutput.html). One could spend (boring) hours describing all possible ways to format strings; the 2 main options are described below. You may look at https://pyformat.info/ to skim through various examples of formatting. The options below explain the basics and point you to the relevant documentation.

- **Option 1**: `printf`-style formatting (old style: simple, but less flexible)

You can use the `%` operator to specify how a variable is formatted when it is displayed on screen or saved to a file. The variable does not appear inside the string: it comes after the string, preceded by the `%` operator. Within the string, each `%` starts a conversion specifier, such as `%f` for a float or `%e` for scientific notation, which marks where (and how) the value is inserted. For example, `'%.2f' % variable` converts `variable` into a float with 2 digits after the decimal point. To format several variables at once, put them in a tuple after the `%` operator: they are matched, in order, with the conversion specifiers in the string, so you need one specifier per variable.

Example:
``` python
print('%i is the square of %i' %(4.000, 2))
    Out: 4 is the square of 2
```
Here are some commonly used conversion specifiers:
- `%s`: String (or any object with a string representation, like numbers)
- `%d` or `%i`: Integers
- `%.<number_of_digits>f`: Floating point numbers with a fixed number of digits to the right of the decimal point.
- `%.<number_of_digits>e`: Scientific notation with a fixed number of digits to the right of the decimal point.

You may find more about this style of string formatting in the [python 2 documentation](https://docs.python.org/2/library/stdtypes.html#string-formatting).
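A few printf-style conversions on hypothetical values, as a quick sketch. The width field in `%10.2f` is not in the list above: it pads the result to 10 characters, right-aligned.

```python
x = 1234.56789
n = 42
name = 'RA'

print('%s = %.2f' % (name, x))   # RA = 1234.57  (tuple: one value per specifier)
print('%d items' % n)            # 42 items      (a single value needs no tuple)
print('%.3e' % x)                # 1.235e+03
print('[%10.2f]' % x)            # [   1234.57]  (padded to a width of 10)

# The number of values must match the number of specifiers
try:
    '%d and %d' % (1,)
except TypeError as err:
    print('TypeError:', err)
```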


In [None]:
# Experiment with the above examples 

- **Option 2**: `str.format()` method

This is a much more flexible and general method, described in detail at https://docs.python.org/3/library/string.html#formatstrings. Format strings contain `replacement fields` surrounded by curly braces `{}`. Anything that is not contained in braces is considered literal text, which is *copied unchanged to the output*. If you need to include a brace character in the literal text, escape it by doubling it: `{{` and `}}`.

A `replacement field` can start with a `field_name` that specifies the object whose value is to be formatted and inserted into the output in place of the replacement field. The `field_name` is optionally followed by a `conversion field`, preceded by an exclamation point `!`, and a `format_spec`, preceded by a colon `:`. These specify a non-default format for the replacement value. The `conversion field` causes a type coercion before formatting; you can generally ignore it. The `format_spec` is more advanced than in the printf style, allowing for alignment, sign handling, filling of empty space, etc. See [here](https://docs.python.org/3/library/string.html#format-specification-mini-language) and [here](https://pyformat.info/) for more details and examples.

Example:
``` python
print('{0:d} is the square of {1:d}'.format(4, 2))
    Out: 4 is the square of 2
```
`d` outputs the base 10 representation of an integer; it raises a `ValueError` if the value is a float (e.g. `4.000`). If you wish a float representation with 2 decimals, use `.2f` instead: `'{0:.2f}'.format(4.000)` gives `'4.00'`.

You can also use the positional indices to change the order of the output:
``` python
print('{1:d} is the square of {0:d}'.format(2, 4))
    Out: 4 is the square of 2
```
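A short sketch of other forms the `field_name` can take, on hypothetical values: indices can be reused, arguments can be passed by keyword, a field can index into its argument, and doubled braces are copied as literal braces.

```python
# Indices can be reused and given in any order
s1 = '{0} is the square of {1}; {1} squared is {0}'.format(4, 2)
print(s1)   # 4 is the square of 2; 2 squared is 4

# Keyword arguments make long format strings easier to read
s2 = 'RA={ra:.3f}, DEC={dec:.3f}'.format(ra=69.60398, dec=-12.33483)
print(s2)   # RA=69.604, DEC=-12.335

# A field_name can also index into its argument, e.g. a list of coordinates
coords = [69.60398, -12.33483]
s3 = 'RA={0[0]:.2f}, DEC={0[1]:.2f}'.format(coords)
print(s3)   # RA=69.60, DEC=-12.33

# Doubled braces are copied to the output as literal braces
s4 = '{{literal braces}} and {0}'.format(1)
print(s4)   # {literal braces} and 1
```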

**Note**: 
- About `conversion field`: There are 3 possible conversion flags: `!s` which calls [str()](https://docs.python.org/3/library/stdtypes.html#str) on the value, `!r` which calls [repr()](https://docs.python.org/3/library/functions.html#repr) and `!a` which calls [ascii()](https://docs.python.org/3/library/functions.html#ascii).
- About `format_spec`: The general form of the formatter is
``` python
format_spec     ::=  [[fill]align][sign][#][0][width][grouping_option][.precision][type]
fill            ::=  <any character>
align           ::=  "<" | ">" | "=" | "^"
sign            ::=  "+" | "-" | " "
width           ::=  digit+
grouping_option ::=  "_" | ","
precision       ::=  digit+
type            ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
```
See https://docs.python.org/3/library/string.html#format-specification-mini-language
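A sketch of several parts of the `format_spec` grammar above, applied to hypothetical values (fill, align, sign, `0`, width, grouping option, precision and type), plus the `!r`/`!s` conversion flags:

```python
x = 1234567.891

print('[{:>12.2f}]'.format(x))    # [  1234567.89]  right-aligned, width 12
print('[{:<12.2f}]'.format(x))    # [1234567.89  ]  left-aligned
print('[{:*^12.1f}]'.format(x))   # centred, empty space filled with '*'
print('{:+.2e}'.format(x))        # +1.23e+06      sign always shown
print('{:,.2f}'.format(x))        # 1,234,567.89   ',' as thousands separator
print('{:08.3f}'.format(3.14159)) # 0003.142       zero-padded to width 8
print('{:.1%}'.format(0.256))     # 25.6%          multiplied by 100, with '%'
print('{0!r} vs {0!s}'.format('abc'))   # 'abc' vs abc  repr() vs str()
```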

In [None]:
# Experiment with the above examples 

In [161]:
'{0:.3f}'.format(12.23325)

'12.233'

In [None]:
# Create three float variables a, b, c and give them some value (e.g. a=2.0, b=3.0, c=-5.0).
# Print the sentence `a=2.00, b=3 and c=-5.00e+00` using the formatting syntax described above.

In [None]:
# Create a 1-D array of 5 floats and print their values with 2 decimals. TIP: use a list comprehension

**Note**: There is another very useful way in Python to save "full objects" and later reload them with all their characteristics. This can be done with the `pickle` [module](https://docs.python.org/2/library/pickle.html). In Python 2, the faster [cPickle](http://docs.python.org/library/pickle.html#module-cPickle) module was the better choice; in Python 3, `pickle` automatically uses its fast C implementation. To write an object into a pickle file, open the file in binary write mode (`pkl_file = open('myfile.pkl', 'wb')`), use `pickle.dump(obj, pkl_file, protocol=-1)`, and close the file (`pkl_file.close()`). To read an object saved in a pickle file, follow the same procedure, but open the file with `'rb'` and use `obj = pickle.load(pkl_file)` instead of `pickle.dump()`. The `pandas` module also allows you to read/write pickle objects: see `pandas.read_pickle()` and `pandas.to_pickle()`.
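A minimal pickle round trip, as a sketch: the dictionary and the file name `myobject.pkl` are hypothetical. Pickle files contain bytes, so in Python 3 they must be opened in binary mode (`'wb'`/`'rb'`).

```python
import pickle

# Any picklable Python object: here a hypothetical dictionary
obj = {'name': '0011', 'z': 0.14502699731, 'flags': [0, 1, 2]}

# Write: binary mode; protocol=-1 selects the highest protocol available
pkl_file = open('myobject.pkl', 'wb')
pickle.dump(obj, pkl_file, protocol=-1)
pkl_file.close()

# Read it back into a new, equal object
pkl_file = open('myobject.pkl', 'rb')
obj_back = pickle.load(pkl_file)
pkl_file.close()

print(obj_back == obj)   # True
```

Only load pickle files from sources you trust: unpickling can execute arbitrary code.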

### II.6 Summary:   <a class="anchor" id="II.6"></a>

What do you need to know to get started?

- Know how to create arrays : `np.array`, `np.arange`, `np.ones`, `np.zeros`, `np.linspace()`.

- Know the shape of the array with `array.shape`, then use *slicing* to obtain different views of the array: `array[start:end:step]` (and variations around that syntax). Adjust the shape of the array with `reshape()`, or flatten it with `ravel()`.

- Obtain a subset of the elements of an array and/or modify their values with masks (`a[a < 0] = 0`).

- Know miscellaneous operations on arrays, such as finding the mean or the maximum (aggregation methods: `array.max()`, `array.mean()`). Get into the habit of searching the documentation (online docs, `help()`, `np.lookfor()` in NumPy versions before 2.0) when you do not remember the exact syntax of a function!

- Master *indexing* with arrays of integers (fancy indexing), as well as *broadcasting*. Learn more NumPy functions to handle various array operations.

- Be able to read data from and write data to a file, and to format numbers on screen (or when writing them into files): `open()`, `close()`, `np.savetxt()`/`np.loadtxt()`, the `%` operator and the `.format()` string method.


### II.7 References and supplementary material: <a class="anchor" id="VI"></a>

- Excellent video introducing numpy (and that inspired part of the numpy section of this notebook) by J. Vandeplas: https://www.youtube.com/watch?v=EEUXKG97YRw

- Numpy quick-start: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

- About string formatting: https://docs.python.org/3/tutorial/inputoutput.html