# Introduction to `numpy` - Part 1

This Notebook provides an overview of the capabilities of the `numpy` module. It covers Sect. II of [Modules_in__python.ipynb](Modules_in__python.ipynb). 

## Table of Contents

- [II. Numpy](#II)
    * [II.1 Array Definition and construction](#II.1)
    * [II.2 Array copies and views](Modules_in__python_numpy_Part2.ipynb/#II.2)
    * [II.3 Shape manipulation](Modules_in__python_numpy_Part2.ipynb/#II.3)
    * [II.4 What makes numpy Arrays useful structures ?](Modules_in__python_numpy_Part2.ipynb/#II.4)
        - II.4.1 ufunc
        - II.4.2 Aggregation
        - II.4.3 Broadcasting
        - II.4.4 Slicing, masking, fancy indexing
    * [II.5 Reading arrays from a file and string formatting](Modules_in__python_numpy_Part2.ipynb/#II.5)
    * [II.6 Useful Numpy functions](Modules_in__python_numpy_Part2.ipynb/#II.6)
    * [II.7 Summary](Modules_in__python_numpy_Part2.ipynb/#II.7)
    * [II.8 References](#VI)

## II. `numpy`:  <a class="anchor" id="II"></a>

`numpy` provides efficient implementations of mathematical functions and operations for the python language. It also introduces one key object: the `array`. 

### II.1 `array` definition and construction:  <a class="anchor" id="II.1"></a>

- A `numpy` array is an object of the type `np.ndarray` (although this type specifier is rarely used directly). Instead one can create arrays in several ways: 

``` python
import numpy as np
np.array([1,2,3,4])   # creates an array from a python list
np.array([[0, 1, 2], [3, 4, 5]])   # Creates a 2D array from a python list
np.empty(shape=(2,3)) # Creates an "empty" array with 2 rows and 3 columns (entries are not initialised, so they hold arbitrary values)
np.arange(5) # similar to the built-in range() function.
np.linspace(1, 10, 10) # creates an array of 10 elements from 1 to 10
np.zeros(10)  # creates an array of 10 elements filled with 0
np.ones(10)   # creates an array of 10 elements filled with 1
np.zeros((2, 5))  # multidimensional arrays of 2 rows and 5 columns

```
- 2-D arrays of `shape=(r, c)` are arrays with `r` *rows* and `c` *columns*. 

In [1]:
import numpy as np
a = np.array([1,2,3,4])
a

array([1, 2, 3, 4])

In [2]:
b = [1,2,3,4]

In [3]:
print(a*2)
print(b*2)

[2 4 6 8]
[1, 2, 3, 4, 1, 2, 3, 4]


In [4]:
print(a+2)
print(b+[2])

[3 4 5 6]
[1, 2, 3, 4, 2]


In [5]:
np.array([[0, 1, 2], 
          [3, 4, 5]])

array([[0, 1, 2],
       [3, 4, 5]])

In [6]:
np.empty(shape=(2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [7]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [8]:
np.linspace(1, 10, 5)

array([ 1.  ,  3.25,  5.5 ,  7.75, 10.  ])

In [9]:
np.geomspace(1, 10, 5)

array([ 1.        ,  1.77827941,  3.16227766,  5.62341325, 10.        ])

In [10]:
print(np.zeros(10))
print(np.ones(10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [11]:
np.zeros((2, 5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [12]:
# Let's try the above commands and visualise the output. 
a = np.array([[1,22,3], [3,5,5]])
a

array([[ 1, 22,  3],
       [ 3,  5,  5]])

### Intermezzo: python is an object-oriented language

At this stage, it is important to make one thing clear: in python, creating a variable of a given type generally corresponds to creating an object. You do not need a deep understanding of object-oriented programming (OOP) to work in python (at least at the introductory level of this course), but a few concepts are helpful to know. We will illustrate the jargon with the example of the `array` object in numpy, but the concepts illustrated below are *not* specific to numpy. 

- The object that `numpy` works with is called an `array`. 
- Once you assign an array to a variable, such as  
```python   
a = np.array([[1,2,3], [3,5,5]])
```
you create an **instance** of an array object (in practice you mostly allocate memory). Note that you can work with arrays created on the fly without saving them as a variable. You can see this as a dummy instantiation.  
- An object has **attributes**. Just as a chair can be made of wood, plastic or metal, objects have properties that describe them. For example, the `shape` is an attribute of an array: a property attached to the instance of the object. You can access an object attribute using `instance.attribute`, i.e. to know the shape of `a` you can do: 
```python
a.shape
```
You can however also get the shape of `a` by calling the corresponding function of the module, `module.function(instance)`: 
```python
np.shape(a) 
``` 
- An object also has functionality, namely functions that perform an operation on the object and return something. Those functions are formally called **methods**. Again, you can either use the module-level function, `module.function(a, args)`, or call the method with `.` directly on the instance, `instance.method(args)`: 
```python
np.cumsum(a, axis=1)
#or 
a.cumsum(axis=1)  
```
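
The distinction between attributes and methods can be checked directly. The following is a minimal sketch (the array `m` is an illustrative name, not one used in the cells below): an attribute is read without parentheses, while a method is called, and for `cumsum` the module-level function gives the same result as the method.

```python
import numpy as np

m = np.array([[1, 2, 3], [3, 5, 5]])

# Attributes are read without parentheses.
print(m.shape)   # (2, 3)
print(m.ndim)    # 2

# A method is called with parentheses; np.cumsum(m, axis=1) is the
# function form of the same operation (cumulative sum along each row).
by_method = m.cumsum(axis=1)
by_function = np.cumsum(m, axis=1)
print(by_method)
print(np.array_equal(by_method, by_function))   # True
```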

In [13]:
a = np.array([[[1,2,3], [3,5,5], [4,5,6]],
             [[1,2,3], [3,5,5], [4,5,6]]])

In [14]:
a.shape   # The shape is an attribute of an array 

(2, 3, 3)

In [15]:
a.ndim

3

In [16]:
# you can also get the shape of an array with the function np.shape()
np.shape(a)        # this is equivalent 

(2, 3, 3)

In [17]:
empty_array = np.empty(shape=(2,3))
empty_array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [18]:
zero_array = np.zeros(shape=(2,3))
zero_array

array([[0., 0., 0.],
       [0., 0., 0.]])

In [19]:
ones_array = np.ones(shape=(2,2,3))
ones_array  # [1,1,0]

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [20]:
type(zero_array)

numpy.ndarray

In [21]:
zero_array.dtype

dtype('float64')

In [22]:
array_of_string = np.array(['qqqq', 'a', 'f'], dtype=str)
array_of_string

array(['qqqq', 'a', 'f'], dtype='<U4')

In [23]:
for i in range(5):
    print(i)

0
1
2
3
4


In [24]:
np.arange(0., 5., 0.5)

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [25]:
np.linspace(0, 5, 9)

array([0.   , 0.625, 1.25 , 1.875, 2.5  , 3.125, 3.75 , 4.375, 5.   ])
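
The two cells above treat the end value differently: `np.arange` takes a step and excludes the stop value, whereas `np.linspace` takes a number of points and includes the end value by default. A small sketch of the difference (the variable names are illustrative):

```python
import numpy as np

stepped = np.arange(0., 5., 0.5)     # step of 0.5, stop value 5 excluded
spaced = np.linspace(0, 5, 9)        # 9 points, end value 5 included

print(len(stepped), stepped[-1])     # 10 4.5
print(len(spaced), spaced[-1])       # 9 5.0

# With endpoint=False, linspace reproduces the arange grid.
same_grid = np.linspace(0, 5, 10, endpoint=False)
print(np.allclose(stepped, same_grid))   # True
```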

- numpy also has tools to create arrays filled with random elements:

``` python
np.random.random(size=4)  # uniform between 0 and 1
np.random.normal(size=4)  # elements follow a standard normal distribution (mean 0, std 1)

```

In [26]:
np.random.random(size=4)

array([0.80257974, 0.62232242, 0.8101001 , 0.03170536])

In [27]:
np.random.normal(loc=10.0, scale=2.0, size=4)

array([ 7.55204421,  7.60464161,  9.28428993, 12.47906541])
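
The numbers above change on every run. For reproducible draws, one option is a seeded generator created with `np.random.default_rng` (available in recent numpy versions). The seed `42` and the sample size below are arbitrary choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)     # seeded generator -> same numbers on every run

u = rng.random(size=4)                              # uniform in [0, 1)
g = rng.normal(loc=10.0, scale=2.0, size=100_000)   # normal with mean 10, std 2

print(u)
print(g.mean(), g.std())    # close to 10 and 2 for a large sample
```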

- You can explicitly specify which **data-type** you want:

``` python 
c = np.array([1, 2, 3], dtype=float)
c.dtype
    Out: dtype('float64')
```

In [28]:
c = np.array([1, 2, 3], dtype=np.int32)
c.dtype

dtype('int32')

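Functions such as `np.zeros`, `np.ones` and `np.empty` default to floating point, while `np.array` infers the data type from its input when `dtype` is not given. Other possible data types are: 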

* **COMPLEX** numbers: 
``` python
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype
    Out: dtype('complex128')
```

In [29]:
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

dtype('complex128')

* **BOOL**:
``` python
e = np.array([True, False, False, True])
e.dtype
    Out: dtype('bool')
```

In [30]:
e = np.array([True, False, False, True])
e.dtype

dtype('bool')

* **String**:
``` python
f = np.array(['abc', 'eddafg', 'hjk'])
f.dtype
    Out: dtype('<U6')   # <--- unicode string of at most 6 characters (by default, the length of the longest element of the array)
```

In [31]:
f = np.array(['abc', 'eddafg', 'hjk'])
f.dtype

dtype('<U6')

* **Other data types**:  `int32`, `int64`, `uint32`, `uint64`  (uint = unsigned integer => only non-negative integers)
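
As a hedged illustration of the data types listed above: when no `dtype` is given, `np.array` infers it from the input, and `np.iinfo` reports the range an integer type can hold. The variable names are illustrative, and the exact default integer width (`int32` or `int64`) depends on the platform and numpy version.

```python
import numpy as np

ints = np.array([1, 2, 3])          # integers in -> integer dtype
floats = np.array([1., 2., 3.])     # floats in -> float64
mixed = np.array([1, 2.5])          # mixed input is upcast to float64

print(ints.dtype, floats.dtype, mixed.dtype)

# Range of signed vs. unsigned 32-bit integers.
print(np.iinfo(np.int32).min, np.iinfo(np.int32).max)     # -2147483648 2147483647
print(np.iinfo(np.uint32).min, np.iinfo(np.uint32).max)   # 0 4294967295
```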

Note that `type(f)` tells you that `f` is a numpy array, while `f.dtype` gives you the *type of the elements* contained in `f`. `dtype` is an attribute of `np.ndarray` objects. If you try to access the attribute `dtype` of a list, you will get an error message. 

In [32]:
# Difference between type/dtype; application to List/arrays.
f = np.array(['abc', 'eddafg', 'hjk'])
print(type(f))
print(f.dtype)
print('----------')
L = ['abc', 'eddafg', 'hjk']
print(type(L))
print(L.dtype)

<class 'numpy.ndarray'>
<U6
----------
<class 'list'>


AttributeError: 'list' object has no attribute 'dtype'

- Last but not least, `numpy` is also the package that allows you to calculate many common mathematical functions (see also [`ufunc`](#II.4.1)): `np.log10()` (base 10 log), `np.log()` (natural log), `np.exp()`, `np.sin()`, `np.cos()`, etc. See the list of `numpy` mathematical functions [here](https://docs.scipy.org/doc/numpy/reference/routines.math.html).

In [33]:
# create an array of floats and calculate its log / sin / ... 
#x = np.linspace(-2*np.pi, 2*np.pi, 20.)
np.log(2.3)

0.8329091229351039
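
The same functions apply elementwise to whole arrays, which is how they are most often used. A minimal sketch (the array `x` and its 5 sample points are chosen here purely for illustration):

```python
import numpy as np

x = np.linspace(0, np.pi, 5)    # 5 points between 0 and pi (the number of points must be an integer)

sines = np.sin(x)               # sine of every element at once
print(sines)

print(np.exp(np.log(2.3)))      # exp undoes the natural log (2.3 up to rounding)

decades = np.log10(np.array([1., 10., 100.]))   # base-10 log of 1, 10 and 100
print(decades)
```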

**Exercise:**   
For the array:
``` python
a = np.array([[1,2,3,4], [4,5,6,7], [2,3,4,5] ])
```
- What is the output of `a.ndim`, `a.shape`, `len(a)`?     
- How do the above commands relate to the rows, columns and dimensions?       
- How do you access the 2nd item of the first row?   

*Note:* 
Try to do the same with the following array:
``` python
b = np.array([[1,2,3], []])
```

In [34]:
a = np.array([[1,2,3,4],
              [4,5,6,7],
              [2,3,4,5]])
a.ndim
a.shape

(3, 4)

In [35]:
len(a)

3

**Exercises:** Elementwise operations

In the code cell below, try simple arithmetic elementwise operations: 
- add each even number to the odd number that follows it, up to 10 and up to 1000, using 2 different techniques (slicing and list comprehension)
- Time the two solutions using `%timeit`.

In [36]:
# basic solution with numpy (involves 2 arrays)
a = np.arange(0,10,2)
b = np.arange(1,11,2)
a+b

array([ 1,  5,  9, 13, 17])

In [37]:
# more elegant solution with numpy (involves 1 array)
a = np.arange(0,10,1)
a[0::2]+a[1::2]

array([ 1,  5,  9, 13, 17])

In [38]:
%timeit a[0::2]+a[1::2]

756 ns ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [39]:
%timeit np.arange(0,10,2)+np.arange(1,11,2)

1.7 µs ± 4.07 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [40]:
# solution with list comprehension
c = [i+i+1 for i in range(0,10,2)]
c

[1, 5, 9, 13, 17]

In [41]:
%timeit [i+i+1 for i in range(0,10,2)]

457 ns ± 3.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


One can notice that for 10 elements, the list comprehension solution is almost twice as fast as the solution using one numpy array, which is itself about twice as fast as the solution involving two numpy arrays. This is most likely due to the overhead of creating numpy array objects.  

Let's now try the same as above but with 1000 elements:

In [42]:
%timeit np.arange(0,1000,2)+np.arange(1,1001,2)

2.36 µs ± 34.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [43]:
%timeit [i+i+1 for i in range(0,1000,2)]

30.1 µs ± 96.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


We can see that the numpy solution is now over 10x faster. The numpy solution in fact takes barely longer than in the case of 10 elements. The main reason the list solution gets slower is that, for each element of the list that is created, python has to check the types involved (and whether the operation, here a sum, is allowed between these types). This is not the case with numpy arrays: their element type is fixed by `dtype`, so a single check is enough to know whether the operation between the two arrays is allowed.
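
The scaling described above can be reproduced outside a notebook with the standard-library `timeit` module instead of the `%timeit` magic. This is a sketch only: the helper names `pair_sums_list` and `pair_sums_numpy` are made up for illustration, and the absolute timings depend on the machine and numpy version.

```python
import timeit

import numpy as np


def pair_sums_list(n):
    """Sum each even number below n with the following odd number (pure python)."""
    return [i + i + 1 for i in range(0, n, 2)]


def pair_sums_numpy(n):
    """Same computation with two numpy arrays added elementwise."""
    return np.arange(0, n, 2) + np.arange(1, n + 1, 2)


for n in (10, 1000):
    # Both approaches must give the same numbers.
    assert pair_sums_list(n) == pair_sums_numpy(n).tolist()
    t_list = timeit.timeit(lambda: pair_sums_list(n), number=2000)
    t_numpy = timeit.timeit(lambda: pair_sums_numpy(n), number=2000)
    print(f"n={n}: list {t_list / 2000 * 1e6:.2f} µs, numpy {t_numpy / 2000 * 1e6:.2f} µs per call")
```

On most machines the list version should win for `n=10` and lose clearly for `n=1000`, but the exact ratio will differ from the figures quoted above.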

- Generate 2 arrays such that their elements are as follows:    
   `[2^0, 2^1, 2^2, 2^3, 2^4]`    
   `a_i = 2^(3*i) - i `    
   
Expected output: 
``` python
[1 2 4 8 16]    
[  1   7  62 509]    
```

In [44]:
a = np.arange(5)
2**a

array([ 1,  2,  4,  8, 16])

In [45]:
a = np.arange(4)
a = 2**(3*a) - a # note that this overrides the first definition of 'a' above: python lets you rebind a name freely.
a

array([  1,   7,  62, 509])

It is now time for more practice: go to [Python_Exercises_starter.ipynb](../Exercises/Python_Exercises_starter.ipynb)

To take full advantage of numpy arrays, you need to understand in more detail how they work. For this, let's go to the second part of this notebook: [Modules_in__python_numpy_Part2.ipynb](Modules_in__python_numpy_Part2.ipynb)

## II.8 References and supplementary material: <a class="anchor" id="VI"></a>

- Good video introducing numpy (and that inspired part of the numpy section of this notebook) by J. VanderPlas: https://www.youtube.com/watch?v=EEUXKG97YRw

- Numpy quick-start:  [https://numpy.org/doc/stable/user/quickstart.html](https://numpy.org/doc/stable/user/quickstart.html)

- About string formatting: https://docs.python.org/3/tutorial/inputoutput.html