# The Numpy array object

### Section contents

* What are Numpy and Numpy arrays?
* Reference documentation
* Import conventions
* Creating arrays
* Functions for creating arrays
* Basic data types
* Basic visualization
* Indexing and slicing
* Copies and views
* Fancy indexing


## What are Numpy and Numpy arrays?

**Python** objects


* high-level number objects: integers, floating point

* containers: lists (costless insertion and append), dictionaries (fast
lookup)


**Numpy** provides


* extension package to Python for multi-dimensional arrays

* closer to hardware (efficiency)

* designed for scientific computation (convenience)

* Also known as *array oriented computing*


In [None]:
import numpy as np
a = np.array([0, 1, 2, 3])
a

For example, an array containing:


* values of an experiment/simulation at discrete time steps

* signal recorded by a measurement device, e.g. sound wave

* pixels of an image, grey-level or colour

* 3-D data measured at different X-Y-Z positions, e.g. MRI scan

* ...


**Why it is useful:** Memory-efficient container that provides fast
numerical operations.
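The sketch below makes "memory-efficient" concrete. It compares the raw data buffer of a NumPy array with a rough estimate of the memory held by an equivalent Python list. The estimate counts the list object plus one Python int object per element, measured with `sys.getsizeof`, and is only an approximation: CPython caches small ints, and exact byte counts depend on the platform and Python version.

```python
import sys
import numpy as np

n = 1000
a = np.arange(n)
L = list(range(n))

# Bytes used by the array's data buffer: one fixed-size integer per element.
array_bytes = a.nbytes            # equals a.size * a.itemsize

# Rough estimate for the list: the list object (an array of pointers)
# plus each separate Python int object it points to.
list_bytes = sys.getsizeof(L) + sum(sys.getsizeof(i) for i in L)

print(a.itemsize, array_bytes)
print(list_bytes)
```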


In [2]:
L = list(range(1000))   # a Python list of 1000 integers
print(L[:10])

In [3]:
%timeit [i**2 for i in L]

10000 loops, best of 3: 71 µs per loop


In [4]:
a = np.arange(1000)


In [None]:
%timeit a**2
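`%timeit` is an IPython magic. In a plain Python script, the standard library `timeit` module gives a comparable measurement. This is a minimal sketch. The numbers depend heavily on the machine, so it only prints them rather than asserting which one is faster.

```python
import timeit
import numpy as np

L = list(range(1000))
a = np.arange(1000)
n_runs = 1000

# timeit.timeit runs the callable n_runs times and returns the total seconds.
t_list = timeit.timeit(lambda: [i**2 for i in L], number=n_runs)
t_array = timeit.timeit(lambda: a**2, number=n_runs)

print(f"list comprehension: {t_list / n_runs * 1e6:.1f} µs per loop")
print(f"NumPy array:        {t_array / n_runs * 1e6:.1f} µs per loop")
```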

## Reference documentation

* On the web: [http://docs.scipy.org/](http://docs.scipy.org/)

* Interactive help:


In [None]:
np.array?

* Looking for something:


In [None]:
np.lookfor('create array')   # note: np.lookfor was removed in NumPy 2.0

In [None]:
np.con*?

## Import conventions

The general convention to import numpy is:


In [None]:
import numpy as np

Using this style of import is recommended.


## Creating arrays

* **1-D**:


In [None]:
a = np.array([0, 1, 2, 3])
a

In [None]:
a.ndim

In [None]:
a.shape

In [None]:
len(a)

* **2-D, 3-D, ...**:


In [None]:
b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
b

In [None]:
b.ndim

In [None]:
b.shape

In [None]:
len(b)     # returns the size of the first dimension

In [None]:
c = np.array([[[1], [2]], [[3], [4]]])
c

In [None]:
c.shape
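The attributes above are related. `ndim` is the length of the `shape` tuple, `len()` only reports the first axis, and `size` is the product of all axis lengths. The short sketch below checks these relations on the 3-D array `c`.

```python
import numpy as np

c = np.array([[[1], [2]], [[3], [4]]])   # shape (2, 2, 1)

# ndim is the number of axes, i.e. the length of the shape tuple.
assert c.ndim == len(c.shape)            # 3
# len() only reports the size of the first axis.
assert len(c) == c.shape[0]              # 2
# size is the total number of elements: the product of the shape.
assert c.size == np.prod(c.shape)        # 2 * 2 * 1 = 4
print(c.ndim, c.shape, len(c), c.size)
```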

## Exercise: Simple arrays

* Create simple one and two dimensional arrays. First, redo the examples
from above. And then create your own.

* Use the function `len` and the attributes `shape` and `ndim` on some of
those arrays and observe their output.


## Functions for creating arrays

In practice, we rarely enter items one by one...


* Evenly spaced:


In [None]:
a = np.arange(10) # 0 .. n-1  (!)
a

In [None]:
b = np.arange(1, 9, 2) # start, end (exclusive), step
b

* or by number of points:


In [None]:
c = np.linspace(0, 1, 6)   # start, end, num-points
c

In [None]:
d = np.linspace(0, 1, 5, endpoint=False)
d
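With `endpoint=False`, `linspace(start, stop, num)` spaces the points by `(stop - start) / num`. The sketch below checks this against an `arange`-based construction. It also uses the `retstep` parameter, which makes `linspace` return the step it used. The comparison uses a tolerance because floating point steps can differ in the last bits.

```python
import numpy as np

d = np.linspace(0, 1, 5, endpoint=False)   # step (1 - 0) / 5 = 0.2
e = np.arange(5) / 5                       # 0, 0.2, 0.4, 0.6, 0.8
print(np.allclose(d, e))                   # True

# With the endpoint included, 6 points also give a step of 0.2.
c, step = np.linspace(0, 1, 6, retstep=True)
print(c, step)
```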

* Common arrays:


In [None]:
a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
a

In [None]:
b = np.zeros((2, 2))
b

In [None]:
c = np.eye(3)
c

In [None]:
d = np.diag(np.array([1, 2, 3, 4]))
d
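`np.diag` also accepts an offset `k` and works in both directions. A 1-D input builds a 2-D array, and a 2-D input extracts a diagonal. A small sketch follows.

```python
import numpy as np

v = np.array([1, 2, 3])

# k > 0 places v on a diagonal above the main one; the result is
# (len(v) + |k|) x (len(v) + |k|).
upper = np.diag(v, k=1)
print(upper)

# Given a 2-D array instead, np.diag extracts a diagonal as a 1-D array.
d = np.diag(np.array([1, 2, 3, 4]))
print(np.diag(d))          # main diagonal: [1 2 3 4]
print(np.diag(d, k=-1))    # first sub-diagonal: all zeros here
```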

* `np.random` random numbers (Mersenne Twister PRNG):

In [None]:
a = np.random.rand(4)       # uniform in [0, 1)
a  

In [None]:
b = np.random.randn(4)      # standard normal (Gaussian)
b  

In [None]:
np.random.seed(1234)        # Setting the random seed
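Re-seeding replays the same sequence of random numbers, which makes results reproducible. Since NumPy 1.17 there is also `np.random.default_rng`, which returns a `Generator` object that holds its own state. Note that it uses the PCG64 generator by default, not the Mersenne Twister mentioned above. The sketch shows both styles.

```python
import numpy as np

# Legacy global state: re-seeding replays the same numbers.
np.random.seed(1234)
first = np.random.rand(4)
np.random.seed(1234)
second = np.random.rand(4)
print(np.array_equal(first, second))    # True

# Generator API (NumPy >= 1.17): the state lives in the rng object.
rng = np.random.default_rng(1234)
u = rng.random(4)               # uniform in [0, 1)
g = rng.standard_normal(4)      # standard normal (Gaussian)
print(u, g)
```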

## Exercise: Creating arrays using functions

* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.

* Create different kinds of arrays with random numbers.

* Try setting the seed before creating an array with random values.

* Look at the function `np.empty`. What does it do? When might this be
useful?
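One possible way to explore the `np.empty` question: it allocates memory without initializing it, so the initial contents are arbitrary. This is useful when every element will be overwritten anyway. The fill loop below is only an illustration.

```python
import numpy as np

# np.empty allocates without initializing: the contents are whatever
# happened to be in that memory, so never read before writing.
buf = np.empty((3, 4))
print(buf.shape, buf.dtype)

# A typical use: allocate once, then fill every element explicitly.
for i in range(buf.shape[0]):
    buf[i] = np.arange(4) * i
print(buf)
```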


## Basic data types

You may have noticed that, in some instances, array elements are
displayed with a trailing dot (e.g. `2.` vs `2`). This is due to a
difference in the data-type used:


In [None]:
a = np.array([1, 2, 3])
a.dtype

In [None]:
b = np.array([1., 2., 3.])
b.dtype

## Tip

Different data-types allow us to store data more compactly in memory,
but most of the time we simply work with floating point numbers. Note
that, in the example above, NumPy auto-detects the data-type from the
input.


You can explicitly specify which data-type you want:


In [None]:
c = np.array([1, 2, 3], dtype=float)
c.dtype
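Data types can also be converted after creation with `astype`, which returns a new array. The sketch below shows that float-to-integer conversion truncates toward zero, both with `astype` and when assigning into an integer array. It also shows how `itemsize` reflects the per-element memory cost.

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])

# astype returns a new array; float -> int conversion truncates toward zero.
as_int = x.astype(int)
print(as_int)             # [ 1 -1  2]

# Assigning a float into an integer array truncates silently too.
y = np.zeros(3, dtype=int)
y[0] = 3.9
print(y)                  # [3 0 0]

# itemsize: bytes per element for each dtype.
print(np.dtype(np.float64).itemsize, np.dtype(np.float32).itemsize)   # 8 4
```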

The **default** data type of creation functions such as `np.ones` and `np.zeros` is floating point:


In [None]:
a = np.ones((3, 3))
a.dtype

There are also other types:


Complex


In [None]:
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

Bool


In [None]:
e = np.array([True, False, False, True])
e.dtype

Strings


In [None]:
f = np.array(['Bonjour', 'Hello', 'Hallo',])
f.dtype     # <--- strings containing max. 7 letters

Much more


* `int32`

* `int64`

* `uint32`

* `uint64`
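Each fixed-size integer type has a limited range, which `np.iinfo` reports. Values that leave that range wrap around silently in array arithmetic, as the `uint8` example at the end of this sketch shows.

```python
import numpy as np

# np.iinfo reports the representable range of an integer dtype.
for t in (np.int32, np.int64, np.uint32, np.uint64):
    info = np.iinfo(t)
    print(np.dtype(t).name, info.min, info.max)

# Fixed-size integers wrap around silently in array arithmetic.
u = np.array([255], dtype=np.uint8)
print(u + 1)        # [0]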


## Basic visualization

## Tip

Now that we have our first data arrays, we are going to visualize them.


Start by launching IPython in *pylab* mode from the shell (`ipython --pylab`), or start the notebook.


Alternatively, if IPython has already been started:


In [None]:
%pylab  

Or, from the notebook:


In [None]:
%pylab inline

The `inline` argument is important for the notebook: it makes plots appear
in the notebook rather than in a new window.


*Matplotlib* is a 2D plotting package. We can import its functions as
below:


In [None]:
import matplotlib.pyplot as plt  # the tidy way

And then use (note that you have to use `show` explicitly):


In [None]:
x = np.linspace(0, 3, 20)   # sample data to plot
y = np.linspace(0, 9, 20)
plt.plot(x, y)       # line plot
plt.show()           # <-- shows the plot (not needed with pylab)

Or, if you are using *pylab*:


In [None]:
plot(x, y)       # line plot    

Using `import matplotlib.pyplot as plt` is recommended in scripts,
whereas `pylab` is recommended for interactive exploratory work.


* **1D plotting**:


In [None]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.plot(x, y)       # line plot    

In [None]:
plt.plot(x, y, 'o')  # dot plot    

* **2D arrays** (such as images):


In [None]:
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.hot)    
plt.colorbar()    

More on this in the Matplotlib tutorial this afternoon.


## Exercise: Simple visualizations

* Plot some simple arrays.

* Try to use both the IPython shell and the notebook, if possible.

* Try using the `gray` colormap.


## Indexing and slicing

The items of an array can be accessed and assigned to in the same way as
other Python sequences (e.g. lists):


In [None]:
a = np.arange(10)
a

In [None]:
a[0], a[2], a[-1]

## Warning

Indices begin at 0, like other Python sequences (and C/C++). In
contrast, in Fortran or Matlab, indices begin at 1.


The usual Python idiom for reversing a sequence is supported:


In [None]:
a[::-1]

For multidimensional arrays, indexes are tuples of integers:


In [None]:
a = np.diag(np.arange(3))
a

In [None]:
a[1, 1]

In [None]:
a[2, 1] = 10 # third row, second column
a

In [None]:
a[1]

Note that:


* In 2D, the first dimension corresponds to rows, the second to columns.

* Let us repeat together: the first dimension corresponds to **rows**, the
second to **columns**.

* For multidimensional `a`, `a[0]` is interpreted as taking all elements
in the unspecified dimensions (for 2-D `a`, this is the first row).
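To see rows versus columns on the array built above, the sketch below selects a whole row and a whole column. The `:` means "all elements along this axis". It is covered in more detail with slicing below.

```python
import numpy as np

a = np.diag(np.arange(3))
a[2, 1] = 10

row = a[1]        # same as a[1, :]: the whole second row
col = a[:, 1]     # the whole second column
print(row, col)   # [0 1 0] [ 0  1 10]

# a[1] and a[1, :] select the same elements.
print(np.array_equal(a[1], a[1, :]))   # True
```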


**Slicing**: Arrays, like other Python sequences, can also be sliced:


In [None]:
a = np.arange(10)
a

In [None]:
a[2:9:3] # [start:end:step]

Note that the last index is not included:


In [None]:
a[:4]

None of the three slice components is required: by default, `start` is 0,
`end` is the end of the array and `step` is 1:


In [None]:
a[1:3]

In [None]:
a[::2]

In [None]:
a[3:]
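The defaults can be checked by spelling them out explicitly. Negative indices count from the end of the array, and a negative step walks backwards. The sketch below covers both cases.

```python
import numpy as np

a = np.arange(10)

# Omitted components take their defaults: start=0, end=len(a), step=1.
print(np.array_equal(a[::], a[0:len(a):1]))   # True

# Negative indices count from the end; a negative step walks backwards
# and stops before the end index.
print(a[-3:])       # [7 8 9]
print(a[8:2:-2])    # [8 6 4]
```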

A small illustrated summary of Numpy indexing and slicing...


In [None]:
from IPython.display import Image
Image(filename='images/numpy_indexing.png')

You can also combine assignment and slicing:


In [None]:
a = np.arange(10)
a[5:] = 10
a

In [None]:
b = np.arange(5)
a[5:] = b[::-1]
a
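The same rules apply to 2-D slices. A scalar is broadcast to every element of the slice. An array fits if its shape matches the slice's shape, and a mismatch raises `ValueError`. The arrays below are arbitrary examples.

```python
import numpy as np

big = np.zeros((4, 5), dtype=int)

# A scalar is broadcast to every element of the slice.
big[0] = 1

# A smaller array fits if its shape matches the slice's shape (2, 3).
small = np.arange(6).reshape(2, 3)
big[1:3, 1:4] = small
print(big)

# A shape mismatch raises ValueError (and leaves big unchanged).
try:
    big[1:3, 1:4] = np.arange(4)
except ValueError as err:
    print("ValueError:", err)
```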

## Exercise: Indexing and slicing

* Try the different flavours of slicing, using `start`, `end` and `step`.

* Verify that the slices in the diagram above are indeed correct. You may
use the following expression to create the array:


In [None]:
np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]

* Try assigning a smaller 2D array to a larger 2D array, like in the 1D
example above.

* Use a different step, e.g. `-2`, in the reversal idiom above. What
effect does this have?


## Exercise: Array creation

Create the following arrays (with correct data types):

Par on course: 3 statements for each


*Hint*: Individual array elements can be accessed similarly to a list,
e.g. `a[1]` or `a[1, 2]`.


*Hint*: Examine the docstring for `diag`.


## Exercise: Tiling for array creation

Skim through the documentation for `np.tile`, and use this function to
construct the array:

## Copies and views

A slicing operation creates a **view** on the original array, which is
just a way of accessing array data. Thus the original array is not
copied in memory. You can use `np.may_share_memory()` to check if two
arrays share the same memory block. Note, however, that this uses
heuristics and may give you false positives.


**When modifying the view, the original array is modified as well**:


In [None]:
a = np.arange(10)
a

In [None]:
b = a[::2]
b

In [None]:
np.may_share_memory(a, b)

In [None]:
b[0] = 12
b

In [None]:
a   # (!)

In [None]:
a = np.arange(10)
c = a[::2].copy()  # force a copy
c[0] = 12
a

In [None]:
np.may_share_memory(a, c)

This behavior can be surprising at first sight, but it saves both
memory and time.
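Two more tools help tell views from copies. A view records the array that owns its data in `.base`. `np.shares_memory` gives an exact answer, whereas `np.may_share_memory` only compares memory bounds. The interleaved slices at the end of this sketch show the kind of false positive mentioned above.

```python
import numpy as np

a = np.arange(10)
view = a[::2]
copy = a[::2].copy()

# A view keeps a reference to the array that owns the data in .base.
print(view.base is a)     # True
print(copy.base is None)  # True: the copy owns its own memory

# np.shares_memory is exact (but can be slower than may_share_memory).
print(np.shares_memory(a, view), np.shares_memory(a, copy))   # True False

# Interleaved views: memory bounds overlap, but no element is shared.
evens, odds = a[::2], a[1::2]
print(np.may_share_memory(evens, odds), np.shares_memory(evens, odds))  # True False
```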


## Worked example: Prime number sieve

In [None]:
from IPython.display import Image
Image(filename='images/prime-sieve.png')

Compute the prime numbers from 0 to 99 with a sieve:


* Construct a shape (100,) boolean array `is_prime`, filled with True in
the beginning:


In [None]:
is_prime = np.ones((100,), dtype=bool)

* Cross out 0 and 1 which are not primes:


In [None]:
is_prime[:2] = False

* For each integer `j` starting from 2, cross out its higher multiples:


In [None]:
N_max = int(np.sqrt(len(is_prime)))
for j in range(2, N_max + 1):   # include sqrt(n) itself, e.g. 7 when n = 50
    is_prime[2*j::j] = False

* Skim through `help(np.nonzero)`, and print the prime numbers

* Follow-up:

    * Move the above code into a script file named `prime_sieve.py`

    * Run it to check it works

    * Use the optimization suggested in [the sieve of
Eratosthenes](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes):

      * Skip values of `j` already known not to be prime

      * The first number to cross out is $j^2$
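One possible version of the follow-up is sketched below. It applies both optimizations and uses `np.nonzero` to list the primes. The function name `prime_sieve` is a hypothetical choice for illustration.

```python
import numpy as np

def prime_sieve(n):
    """Return the primes in 0..n-1 (hypothetical helper for illustration)."""
    is_prime = np.ones((n,), dtype=bool)
    is_prime[:2] = False                  # 0 and 1 are not prime
    n_max = int(np.sqrt(n))
    for j in range(2, n_max + 1):
        if not is_prime[j]:               # skip j already crossed out
            continue
        # Multiples below j*j were already crossed out by smaller primes.
        is_prime[j*j::j] = False
    return np.nonzero(is_prime)[0]        # indices where is_prime is True

primes = prime_sieve(100)
print(primes)
```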


## Fancy indexing

## Tip

Numpy arrays can be indexed with slices, but also with boolean arrays
(**masks**) or integer arrays. This method is called *fancy indexing*. It
creates **copies, not views**.


### Using boolean masks

In [None]:
np.random.seed(3)
a = np.random.randint(0, 21, 15)   # 15 integers in [0, 20]; the upper bound is excluded
a

In [None]:
(a % 3 == 0)

In [None]:
mask = (a % 3 == 0)
extract_from_a = a[mask] # or,  a[a%3==0]
extract_from_a           # extract a sub-array with the mask

Indexing with a mask can be very useful to assign a new value to a
sub-array:


In [None]:
a[a % 3 == 0] = -1
a
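Masks can be combined element-wise with `&` (and), `|` (or) and `~` (not). The comparisons need parentheses because these operators bind more tightly than `==` or `>`. The sketch also confirms that the extracted sub-array is a copy.

```python
import numpy as np

a = np.arange(15)

# Combine conditions element-wise; parentheses are required.
both = a[(a % 3 == 0) & (a > 5)]
either = a[(a < 2) | (a > 12)]
print(both, either)       # [ 6  9 12] [ 0  1 13 14]

# The result of boolean indexing is a copy: changing it leaves a intact.
both[0] = -1
print(a[6])               # 6
```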

### Indexing with an array of integers

In [None]:
a = np.arange(0, 100, 10)
a

Indexing can be done with an array of integers, where the same index can
be repeated several times:


In [None]:
a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list

New values can be assigned with this kind of indexing:


In [None]:
a[[9, 7]] = -100
a
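Repeated indices need care with in-place operators. `a[idx] += 1` is evaluated as `a[idx] = a[idx] + 1`, so each position is written once and repeats are not accumulated. The ufunc method `np.add.at` applies the operation once per occurrence instead. The sketch below shows the difference.

```python
import numpy as np

a = np.zeros(5, dtype=int)

# Repeated index 0 is written once: repeats are NOT accumulated.
a[[0, 0, 1]] += 1
print(a)                  # [1 1 0 0 0]

# np.add.at performs the addition unbuffered, once per index occurrence.
b = np.zeros(5, dtype=int)
np.add.at(b, [0, 0, 1], 1)
print(b)                  # [2 1 0 0 0]
```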

### Tip

When a new array is created by indexing with an array of integers, the
new array has the same shape as the array of integers:


In [None]:
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
idx.shape

In [None]:
a[idx]
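The result has the shape of `idx`, and, as with boolean masks, it is a copy rather than a view. The sketch checks both properties.

```python
import numpy as np

a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])

sub = a[idx]              # same shape as idx: (2, 2)
print(sub.shape)

# Fancy indexing returns a copy, so modifying sub leaves a unchanged.
sub[0, 0] = -1
print(a[3])               # 3
print(np.shares_memory(a, sub))   # False
```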

The image below illustrates various applications of fancy indexing:


In [None]:
from IPython.display import Image
Image(filename='images/numpy_fancy_indexing.png')

### Exercise: Fancy indexing

* Again, verify the fancy indexing shown in the diagram above.

* Use fancy indexing on the left and array creation on the right to assign
values from a smaller array to a larger array.
