## What are NumPy and NumPy arrays?

### NumPy arrays

**Python** objects  

> -   high-level number objects: integers, floating point
> -   containers: lists (cheap append), dictionaries
>     (fast lookup)

**NumPy** provides  

> -   extension package to Python for multi-dimensional arrays
> -   closer to hardware (efficiency)
> -   designed for scientific computation (convenience)
> -   Also known as *array oriented computing*

In [None]:
import numpy as np
a = np.array([0, 1, 2, 3])
a

For example, an array containing:

-   values of an experiment/simulation at discrete time steps
-   signal recorded by a measurement device, e.g. sound wave
-   pixels of an image, grey-level or colour
-   3-D data measured at different X-Y-Z positions, e.g. MRI scan
-   ...

**Why it is useful:** Memory-efficient container that provides fast
numerical operations.

In [None]:
In [1]: L = range(1000)

In [2]: %timeit [i**2 for i in L]
1000 loops, best of 3: 403 us per loop

In [3]: a = np.arange(1000)

In [4]: %timeit a**2
100000 loops, best of 3: 12.7 us per loop
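
Outside IPython, the same comparison can be sketched with the standard-library `timeit` module. This is only an illustration: the names `values`, `arr`, `t_list` and `t_arr` are made up for the example, and the absolute timings depend on the machine and NumPy build.

```python
import timeit

import numpy as np

n = 1000
values = list(range(n))   # plain Python list
arr = np.arange(n)        # NumPy array with the same contents

squares_list = [i**2 for i in values]   # explicit Python-level loop
squares_arr = arr**2                    # one vectorized operation

# Both approaches compute the same numbers.
print(squares_list == squares_arr.tolist())

# Time 1000 repetitions of each approach.
t_list = timeit.timeit(lambda: [i**2 for i in values], number=1000)
t_arr = timeit.timeit(lambda: arr**2, number=1000)
print(f"list comprehension: {t_list:.4f} s, NumPy: {t_arr:.4f} s")
```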

In [None]:
    In [5]: np.array?
    String Form:<built-in function array>
    Docstring:
    array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0, ...

### Import conventions

The recommended convention to import numpy is:

In [None]:
import numpy as np

## Creating arrays

### Manual construction of arrays

-   **1-D**:

In [None]:
    a = np.array([0, 1, 2, 3])
    a

In [None]:
    a.ndim

In [None]:
    a.shape

In [None]:
    len(a)

-   **2-D, 3-D, ...**:

In [None]:
    b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
    b

In [None]:
    b.ndim

In [None]:
    b.shape

In [None]:
    len(b)     # returns the size of the first dimension

In [None]:
    c = np.array([[[1], [2]], [[3], [4]]])
    c

In [None]:
    c.shape
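
As a small sketch of how these attributes relate to each other (the name `c3` is only used here): `ndim` is the length of `shape`, `len()` is the first entry of `shape`, and `size` is the total number of elements.

```python
import numpy as np

c3 = np.array([[[1], [2]], [[3], [4]]])

print(c3.ndim)    # 3
print(c3.shape)   # (2, 2, 1)
print(len(c3))    # 2, the size of the first dimension
print(c3.size)    # 4, the total number of elements

# The attributes are consistent with each other.
print(c3.ndim == len(c3.shape))   # True
print(len(c3) == c3.shape[0])     # True
```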

### Functions for creating arrays

Tip

In practice, we rarely enter items one by one...

-   Evenly spaced:

In [None]:
    a = np.arange(10) # 0 .. n-1  (!)
    a

In [None]:
    b = np.arange(1, 9, 2) # start, end (exclusive), step
    b

-   or by number of points:

In [None]:
    c = np.linspace(0, 1, 6)   # start, end, num-points
    c

In [None]:
    d = np.linspace(0, 1, 5, endpoint=False)
    d
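
The effect of `endpoint` can be checked with a short sketch. It assumes the `retstep=True` option of `np.linspace`, which also returns the spacing; the variable names are illustrative.

```python
import numpy as np

# 6 points including the end value: the interval is cut into 5 steps.
closed, step_closed = np.linspace(0, 1, 6, retstep=True)

# 5 points excluding the end value: same spacing, but 1.0 is left out.
half_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)

print(closed)      # 0, 0.2, 0.4, 0.6, 0.8, 1
print(half_open)   # 0, 0.2, 0.4, 0.6, 0.8
print(np.isclose(step_closed, step_open))   # True, both are 0.2
```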

-   Common arrays:

In [None]:
    a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
    a

In [None]:
    b = np.zeros((2, 2))
    b

In [None]:
    c = np.eye(3)
    c

In [None]:
    d = np.diag(np.array([1, 2, 3, 4]))
    d

-   `np.random` : random numbers (Mersenne Twister PRNG):

In [None]:
    a = np.random.rand(4)       # uniform in [0, 1)
    a

In [None]:
    b = np.random.randn(4)      # Gaussian
    b

In [None]:
    np.random.seed(1234)        # Setting the random seed
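
A minimal sketch of what seeding buys you: resetting the seed reproduces the same draws. The second half uses the newer `np.random.default_rng` generator API, which is an assumption about your installation (NumPy 1.17 or later) and is not what the rest of this section uses.

```python
import numpy as np

# Legacy global-state API, as used in this tutorial.
np.random.seed(1234)
first_draw = np.random.rand(3)
np.random.seed(1234)             # reset to the same seed
second_draw = np.random.rand(3)
print(np.array_equal(first_draw, second_draw))   # True

# Generator API (assumes NumPy >= 1.17): each generator carries its own state.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
draw_a = rng_a.random(3)
draw_b = rng_b.random(3)
print(np.array_equal(draw_a, draw_b))            # True
```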

## Basic data types

You may have noticed that, in some instances, array elements are
displayed with a trailing dot (e.g. `2.` vs `2`). This is due to a
difference in the data-type used:

In [None]:
a = np.array([1, 2, 3])
a.dtype

In [None]:
b = np.array([1., 2., 3.])
b.dtype

Tip

Different data-types allow us to store data more compactly in memory,
but most of the time we simply work with floating point numbers. Note
that, in the example above, NumPy auto-detects the data-type from the
input.
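
To make the memory argument concrete, here is a small sketch comparing the storage used by the same 1000 values under three data-types. The array names are illustrative; `itemsize` is the number of bytes per element and `nbytes` the total.

```python
import numpy as np

as_float64 = np.ones(1000, dtype=np.float64)
as_float32 = np.ones(1000, dtype=np.float32)
as_uint8 = np.ones(1000, dtype=np.uint8)

print(as_float64.itemsize, as_float64.nbytes)   # 8 8000
print(as_float32.itemsize, as_float32.nbytes)   # 4 4000
print(as_uint8.itemsize, as_uint8.nbytes)       # 1 1000
```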

------------------------------------------------------------------------

You can explicitly specify which data-type you want:

In [None]:
c = np.array([1, 2, 3], dtype=float)
c.dtype

The **default** data type is floating point:

In [None]:
a = np.ones((3, 3))
a.dtype

There are also other types:

Complex  

In [None]:
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

Bool  

In [None]:
e = np.array([True, False, False, True])
e.dtype

Strings  

In [None]:
f = np.array(['Bonjour', 'Hello', 'Hallo'])
f.dtype

Much more  

> -   `int32`
> -   `int64`
> -   `uint32`
> -   `uint64`

## Indexing and slicing

The items of an array can be accessed and assigned to the same way as
other Python sequences (e.g. lists):

In [None]:
a = np.arange(10)
a

In [None]:
a[0], a[2], a[-1]

The usual Python idiom for reversing a sequence is supported:

In [None]:
a[::-1]

For multidimensional arrays, indexes are tuples of integers:

In [None]:
a = np.diag(np.arange(3))
a

In [None]:
a[1, 1]

In [None]:
a[2, 1] = 10 # third row, second column
a

In [None]:
a[1]

Note

-   In 2D, the first dimension corresponds to **rows**, the second to
    **columns**.
-   For a multidimensional `a`, `a[0]` selects all elements along the
    unspecified dimensions.
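
The two points of this note can be checked with a short sketch (the names `m`, `first_row` and `second_col` are only used here):

```python
import numpy as np

m = np.array([[0, 1, 2],
              [3, 4, 5]])    # 2 rows, 3 columns

first_row = m[0]             # same as m[0, :]
second_col = m[:, 1]         # all rows, column 1

print(first_row)                           # [0 1 2]
print(np.array_equal(m[0], m[0, :]))       # True
print(second_col)                          # [1 4]
```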

**Slicing**: Arrays, like other Python sequences, can also be sliced:

In [None]:
a = np.arange(10)
a

In [None]:
a[2:9:3] # [start:end:step]

Note that the end index is not included:

In [None]:
a[:4]

None of the three slice components is required: by default, $start$ is 0,
$end$ is the length of the array and $step$ is 1:

In [None]:
a[1:3]

In [None]:
a[::2]

In [None]:
a[3:]
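
As a sketch of these defaults, each shorthand slice below is compared with its fully spelled-out form. The last lines use Python's built-in `slice` object, which is what the `start:end:step` syntax builds behind the scenes; the name `s` is illustrative.

```python
import numpy as np

s = np.arange(10)

# Omitted components fall back to start=0, end=len(s), step=1.
print(np.array_equal(s[:4], s[0:4:1]))          # True
print(np.array_equal(s[3:], s[3:len(s):1]))     # True
print(np.array_equal(s[::2], s[0:len(s):2]))    # True

# The colon syntax is shorthand for a slice object.
explicit = s[slice(2, 9, 3)]
print(explicit)                                 # [2 5 8]
```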

A small illustrated summary of NumPy indexing and slicing...


<img src="images/numpy_indexing.png" class="align-center" style="width:70.0%" alt="image" />

You can also combine assignment and slicing:

In [None]:
a = np.arange(10)
a[5:] = 10
a

In [None]:
b = np.arange(5)
a[5:] = b[::-1]
a

## Copies and views

A slicing operation creates a **view** on the original array, which is
just a way of accessing array data. Thus the original array is not
copied in memory. You can use `np.may_share_memory()` to check if two
arrays share the same memory block. Note however, that this uses
heuristics and may give you false positives.

**When modifying the view, the original array is modified as well**:

In [None]:
a = np.arange(10)
a

In [None]:
b = a[::2]
b

In [None]:
np.may_share_memory(a, b)

In [None]:
b[0] = 12
b

In [None]:
a   # (!)

In [None]:
a = np.arange(10)
c = a[::2].copy()  # force a copy
c[0] = 12
a

In [None]:
np.may_share_memory(a, c)
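
Two further checks can help when the heuristic answer of `np.may_share_memory()` is not enough. This sketch uses the `.base` attribute, which points to the array that owns the memory of a view, and `np.shares_memory()`, which performs an exact (and potentially slower) check. The names `orig`, `view` and `copied` are illustrative.

```python
import numpy as np

orig = np.arange(10)
view = orig[::2]             # slicing: a view on orig
copied = orig[::2].copy()    # explicit copy: owns its own data

print(view.base is orig)                 # True
print(copied.base is None)               # True
print(np.shares_memory(orig, view))      # True
print(np.shares_memory(orig, copied))    # False

view[0] = 12                 # writes through to orig
print(orig[0])               # 12
```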

## Fancy indexing

NumPy arrays can be indexed with slices, but also with boolean or
integer arrays (**masks**). This method is called *fancy indexing*. It
creates **copies not views**.

### Using boolean masks

In [None]:
np.random.seed(3)
a = np.random.randint(0, 21, 15)
a

In [None]:
(a % 3 == 0)

In [None]:
mask = (a % 3 == 0)
extract_from_a = a[mask] # or,  a[a%3==0]
extract_from_a           # extract a sub-array with the mask

Indexing with a mask can be very useful to assign a new value to a
sub-array:

In [None]:
a[a % 3 == 0] = -1
a
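
The difference between extracting with a mask (which gives a copy) and assigning through a mask (which modifies the array in place) can be sketched as follows. The array `data` and the other names are made up for the example.

```python
import numpy as np

data = np.array([3, 6, 7, 9, 10])
mask = data % 3 == 0

picked = data[mask]      # fancy indexing returns a copy
picked[0] = -1           # changes the copy only
print(picked)            # [-1  6  9]
print(data)              # [ 3  6  7  9 10]

data[mask] = -1          # assignment through the mask changes data itself
print(data)              # [-1 -1  7 -1 10]
```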

### Indexing with an array of integers

In [None]:
a = np.arange(0, 100, 10)
a

Indexing can be done with an array of integers, where the same index is
repeated several times:

In [None]:
a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list

New values can be assigned with this kind of indexing:

In [None]:
a[[9, 7]] = -100
a

Tip

When a new array is created by indexing with an array of integers, the
new array has the same shape as the array of integers:

In [None]:
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
idx.shape

In [None]:
a[idx]

The image below illustrates various fancy indexing applications:

<img src="images/numpy_fancy_indexing.png" class="align-center" style="width:80.0%" alt="image" />


## Numerical operations on arrays

### Basic operations

With scalars:

In [None]:
a = np.array([1, 2, 3, 4])
a + 1

In [None]:
2**a

All arithmetic operates elementwise:

In [None]:
b = np.ones(4) + 1
a - b

In [None]:
a * b

In [None]:
j = np.arange(5)
2**(j + 1) - j

These operations are much faster than the equivalent pure-Python loop:

In [None]:
a = np.arange(10000)
%timeit a + 1

In [None]:
l = range(10000)
%timeit [i+1 for i in l]

Warning

**Array multiplication is not matrix multiplication:**

In [None]:
c = np.ones((3, 3))
c * c                   # NOT matrix multiplication!

Note

**Matrix multiplication:**

In [None]:
c.dot(c)
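
With a matrix whose entries differ, the contrast between the two products is easier to see. This sketch also uses the `@` operator, which assumes Python 3.5 or later and is equivalent to `.dot()` for 2-D arrays. The name `m` is illustrative.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

elementwise = m * m      # each entry squared
matrix_prod = m @ m      # rows times columns

print(elementwise)       # [[ 1  4]
                         #  [ 9 16]]
print(matrix_prod)       # [[ 7 10]
                         #  [15 22]]
print(np.array_equal(matrix_prod, m.dot(m)))   # True
```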

### Other operations

**Comparisons:**

In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
a == b

In [None]:
a > b

Tip

Array-wise comparisons:

In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
c = np.array([1, 2, 3, 4])
np.array_equal(a, b)

In [None]:
np.array_equal(a, c)

**Logical operations:**

In [None]:
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)
np.logical_or(a, b)

In [None]:
np.logical_and(a, b)

**Transcendental functions:**

In [None]:
a = np.arange(5)
np.sin(a)

In [None]:
np.log(a)

In [None]:
np.exp(a)

**Shape mismatches**

In [None]:
a = np.arange(4)
a + np.array([1, 2])  # raises ValueError: shapes (4,) and (2,) do not match

*Broadcasting?* We'll return to that [later](broadcasting.ipynb).
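
Adding arrays of shapes `(4,)` and `(2,)` raises a `ValueError`. A minimal sketch of catching it, with illustrative names:

```python
import numpy as np

left = np.arange(4)          # shape (4,)
right = np.array([1, 2])     # shape (2,)

raised = False
try:
    left + right
except ValueError as exc:
    raised = True
    print("shape mismatch:", exc)

# Arrays of equal shape combine element by element without error.
print(left + np.arange(4))   # [0 2 4 6]
```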

**Transposition:**

In [None]:
a = np.triu(np.ones((3, 3)), 1)   # see help(np.triu)
a

In [None]:
a.T

Warning

**The transposition is a view**

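Because `a.T` shares memory with `a`, the following in-place update reads
from the same buffer it writes to, so it **should not be relied on** to
make a matrix symmetric:

In [None]:
a += a.T

On NumPy versions before 1.13 it works for small arrays (because of
buffering) but fails for large ones, in unpredictable ways. Newer releases
detect the overlap, but `a = a + a.T` is unambiguous on every version.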
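
A sketch of a safe way to symmetrize: build the sum as a new array instead of updating in place, so the result never aliases its inputs. The names `upper` and `sym` are only used here.

```python
import numpy as np

upper = np.triu(np.ones((3, 3)), 1)   # strictly upper-triangular ones

sym = upper + upper.T                 # new array, no overlap with upper
print(sym)
print(np.array_equal(sym, sym.T))     # True: sym is symmetric
print(np.shares_memory(sym, upper))   # False
```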

Note

**Linear algebra**

The sub-module `numpy.linalg` implements basic linear algebra, such as
solving linear systems, singular value decomposition, etc. However, it
is not guaranteed to be compiled using efficient routines, and thus we
recommend the use of `scipy.linalg`, as detailed in section
[scipy\_linalg](scipy_linalg.ipynb).

## Basic reductions

### Computing sums

In [None]:
x = np.array([1, 2, 3, 4])
np.sum(x)

In [None]:
x.sum()

<img src="images/reductions.png" class="align-right" alt="image" />

Sum by rows and by columns:

In [None]:
x = np.array([[1, 1], [2, 2]])
x

In [None]:
x.sum(axis=0)   # columns (first dimension)

In [None]:
x[:, 0].sum(), x[:, 1].sum()

In [None]:
x.sum(axis=1)   # rows (second dimension)

In [None]:
x[0, :].sum(), x[1, :].sum()
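
One way to remember what `axis` means is that the summed axis disappears from the shape. A sketch on a non-square array, where rows and columns cannot be confused (the name `t` is illustrative):

```python
import numpy as np

t = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)

col_sums = t.sum(axis=0)        # collapses the 2 rows
row_sums = t.sum(axis=1)        # collapses the 3 columns

print(col_sums, col_sums.shape)   # [5 7 9] (3,)
print(row_sums, row_sums.shape)   # [ 6 15] (2,)
print(t.sum())                    # 21, sum over all elements
```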

Tip

Same idea in higher dimensions:

In [None]:
x = np.random.rand(2, 2, 2)
x.sum(axis=2)[0, 1]

In [None]:
x[0, 1, :].sum()

### Other reductions

These work the same way (and take `axis=`):

**Extrema:**

In [None]:
x = np.array([1, 3, 2])
x.min()

In [None]:
x.max()

In [None]:
x.argmin()  # index of minimum

In [None]:
x.argmax()  # index of maximum

**Logical operations:**

In [None]:
np.all([True, True, False])

In [None]:
np.any([True, True, False])

Note

Can be used for array comparisons:

In [None]:
a = np.zeros((100, 100))
np.any(a != 0)

In [None]:
np.all(a == a)

In [None]:
a = np.array([1, 2, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])
((a <= b) & (b <= c)).all()

**Statistics:**

In [None]:
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])
x.mean()

In [None]:
np.median(x)

In [None]:
np.median(y, axis=-1) # last axis

In [None]:
x.std()          # full population standard dev.
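
By default `std()` divides by the number of elements N (population standard deviation). As a sketch, passing `ddof=1` divides by N - 1 instead, which gives the sample estimate. The variable names are illustrative.

```python
import numpy as np

samples = np.array([1, 2, 3, 1])

pop_std = samples.std()              # divides by N
sample_std = samples.std(ddof=1)     # divides by N - 1

# Same quantities computed by hand from the squared deviations.
sq_dev = ((samples - samples.mean()) ** 2).sum()
print(np.isclose(pop_std, np.sqrt(sq_dev / 4)))      # True
print(np.isclose(sample_std, np.sqrt(sq_dev / 3)))   # True
```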

... and many more (best to learn as you go).