# The Numpy array object

### Section contents

* What are Numpy and Numpy arrays?
* Reference documentation
* Import conventions
* Creating arrays
* Functions for creating arrays
* Basic data types
* Basic visualization
* Indexing and slicing
* Copies and views
* Fancy indexing


## What are Numpy and Numpy arrays?

**Python** objects


* high-level number objects: integers, floating point

* containers: lists (cheap append), dictionaries (fast
lookup)


**Numpy** provides


* extension package to Python for multi-dimensional arrays

* closer to hardware (efficiency)

* designed for scientific computation (convenience)

* Also known as *array oriented computing*


In [1]:
import numpy as np
a = np.array([0, 1, 2, 3])
a

array([0, 1, 2, 3])

For example, an array containing:


* values of an experiment/simulation at discrete time steps

* signal recorded by a measurement device, e.g. sound wave

* pixels of an image, grey-level or colour

* 3-D data measured at different X-Y-Z positions, e.g. MRI scan

* ...


**Why it is useful:** Memory-efficient container that provides fast
numerical operations.


In [2]:
L = range(1000)

In [3]:
%timeit [i**2 for i in L]

1000 loops, best of 3: 292 µs per loop


In [4]:
a = np.arange(1000)

In [5]:
%timeit a**2

The slowest run took 75.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.27 µs per loop
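The timings above vary from machine to machine. Here is a minimal sketch, not a benchmark, that checks the two timed expressions compute the same values. It also shows that the array holds them in a single typed buffer:

```python
import numpy as np

L = range(1000)
a = np.arange(1000)

squares_list = [i**2 for i in L]  # Python loop: one int object per element
squares_array = a**2              # one vectorized operation over a typed buffer

# Both approaches yield the same values
assert squares_array.tolist() == squares_list

# The array data lives in one contiguous block of size * itemsize bytes
print(squares_array.dtype, squares_array.size, squares_array.nbytes)
```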


## Reference documentation

* On the web: [http://docs.scipy.org/](http://docs.scipy.org/)

* Interactive help:


In [6]:
np.array?

* Looking for something (`np.lookfor` is available in NumPy 1.x; it was removed in NumPy 2.0):


In [84]:
np.lookfor('dim') 

Search results for 'dim'
------------------------
numpy.ndim
    Return the number of dimensions of an array.
numpy.alen
    Return the length of the first dimension of the input array.
numpy.rank
    Return the number of dimensions of an array.
numpy.tile
    Construct an array by repeating A the number of times given by reps.
numpy.rot90
    Rotate an array by 90 degrees in the counter-clockwise direction.
numpy.stack
    Join a sequence of arrays along a new axis.
numpy.delete
    Return a new array with sub-arrays along an axis deleted. For a one
numpy.ma.ndim
    Return the number of dimensions of an array.
numpy.interp
    One-dimensional linear interpolation.
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.poly1d
    A one-dimensional polynomial class.
numpy.ndindex
    An N-dimensional iterator object to index arrays.
numpy.nonzero
    Return the indices of the elements that are non-zero.
numpy.squeeze
    Remove single-dimensional ent

In [8]:
np.con*?

## Import conventions

The general convention to import numpy is:


In [9]:
import numpy as np

Using this style of import is recommended.


## Creating arrays

* **1-D**:


In [51]:
a = np.array([0, 1, 2, 3])
a

array([0, 1, 2, 3])

In [52]:
a.ndim

1

In [53]:
a.shape

(4,)

In [54]:
len(a)

4

* **2-D, 3-D, ...**:


In [14]:
b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
b

array([[0, 1, 2],
       [3, 4, 5]])

In [15]:
b.ndim

2

In [16]:
b.shape

(2, 3)

In [17]:
len(b)     # returns the size of the first dimension

2

In [18]:
c = np.array([[[1], [2]], [[3], [4]]])
c

array([[[1],
        [2]],

       [[3],
        [4]]])

In [19]:
c.shape

(2, 2, 1)
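The quantities `ndim`, `shape`, `len` and `size` are closely related. This short sketch uses the 3-D array `c` from above to show how:

```python
import numpy as np

c = np.array([[[1], [2]], [[3], [4]]])

# ndim is the number of axes, i.e. the length of the shape tuple
assert c.ndim == len(c.shape) == 3
# len() only reports the size of the first axis
assert len(c) == c.shape[0] == 2
# size counts all elements: the product of the shape entries
assert c.size == int(np.prod(c.shape)) == 4
print(c.ndim, c.shape, len(c), c.size)
```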

## Exercise: Simple arrays

* Create simple one and two dimensional arrays. First, redo the examples
from above. And then create your own.

* Use the function `len` and the attributes `shape` and `ndim` on some of
those arrays and observe their output.


## Functions for creating arrays

In practice, we rarely enter items one by one...


* Evenly spaced:


In [20]:
a = np.arange(10) # 0 .. n-1  (!)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
b = np.arange(1, 9, 2) # start, end (exclusive), step
b

array([1, 3, 5, 7])

* or by number of points:


In [22]:
c = np.linspace(0, 1, 6)   # start, end, num-points
c

array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])

In [23]:
d = np.linspace(0, 1, 5, endpoint=False)
d

array([ 0. ,  0.2,  0.4,  0.6,  0.8])
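`arange` is parameterized by a step, while `linspace` takes a number of points. This minimal sketch relates the two for the arrays above. It uses `np.allclose` because floating-point steps need not be exact:

```python
import numpy as np

# linspace with endpoint=False splits [0, 1) into 5 equal steps of 0.2 ...
d = np.linspace(0, 1, 5, endpoint=False)
# ... which matches an integer arange scaled by the step size
assert np.allclose(d, np.arange(5) / 5)

# With endpoint=True (the default), num points give num - 1 intervals
c = np.linspace(0, 1, 6)
step = c[1] - c[0]
assert np.isclose(step, 1 / (6 - 1))

# arange excludes its end value, so the last element here is 9 - 2
b = np.arange(1, 9, 2)
assert b[-1] == 7 and len(b) == 4
```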

* Common arrays:


In [24]:
a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
a

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [25]:
b = np.zeros((2, 2))
b

array([[ 0.,  0.],
       [ 0.,  0.]])

In [26]:
c = np.eye(3)
c

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [27]:
d = np.diag(np.array([1, 2, 3, 4]))
d

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
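`np.diag` works in both directions. Given a 1-D array it builds a diagonal matrix, and given a 2-D array it extracts the diagonal. Both `np.diag` and `np.eye` also take an offset `k` for off-diagonals. A short sketch:

```python
import numpy as np

d = np.diag(np.array([1, 2, 3, 4]))   # 1-D in -> 2-D diagonal matrix out
assert d.shape == (4, 4)

# 2-D in -> 1-D diagonal out
assert np.diag(d).tolist() == [1, 2, 3, 4]

# k=1 places the values one position above the main diagonal
upper = np.diag(np.array([1, 2, 3]), k=1)
assert upper.shape == (4, 4)
assert upper[0, 1] == 1 and upper[2, 3] == 3

# np.eye also takes an offset: here ones on the first subdiagonal
sub = np.eye(3, k=-1)
assert sub.tolist() == [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```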

* `np.random` random numbers (Mersenne Twister PRNG):

In [28]:
a = np.random.rand(4)       # uniform in [0, 1)
a  

array([ 0.85799984,  0.19691867,  0.11838514,  0.44122568])

In [29]:
b = np.random.randn(4)      # Gaussian
b  

array([-0.3497488 ,  0.46277585, -0.10173488, -1.60151519])

In [30]:
np.random.seed(1234)        # Setting the random seed
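Setting the seed makes the random sequence reproducible. The minimal sketch below demonstrates this. It also shows the newer `np.random.default_rng` Generator interface (NumPy 1.17 and later) as an alternative to the global-state functions used above:

```python
import numpy as np

# Legacy global-state API: same seed -> same numbers
np.random.seed(1234)
first = np.random.rand(4)
np.random.seed(1234)
second = np.random.rand(4)
assert np.array_equal(first, second)

# Generator API: the seed belongs to an explicit rng object
rng1 = np.random.default_rng(1234)
rng2 = np.random.default_rng(1234)
u = rng1.random(4)               # uniform in [0, 1)
v = rng2.random(4)
assert np.array_equal(u, v)
normal = rng1.standard_normal(4) # Gaussian, mean 0, standard deviation 1
print(u, normal)
```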

## Exercise: Creating arrays using functions

* Experiment with `arange`, `linspace`, `ones`, `zeros`, `eye` and `diag`.

* Create different kinds of arrays with random numbers.

* Try setting the seed before creating an array with random values.

* Look at the function `np.empty`. What does it do? When might this be
useful?


## Basic data types

You may have noticed that, in some instances, array elements are
displayed with a trailing dot (e.g. `2.` vs `2`). This is due to a
difference in the data-type used:


In [31]:
a = np.array([1, 2, 3])
a.dtype

dtype('int64')

In [32]:
b = np.array([1., 2., 3.])
b.dtype

dtype('float64')

## Tip

Different data-types allow us to store data more compactly in memory,
but most of the time we simply work with floating point numbers. Note
that, in the example above, NumPy auto-detects the data-type from the
input.


You can explicitly specify which data-type you want:


In [33]:
c = np.array([1, 2, 3], dtype=float)
c.dtype

dtype('float64')
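You can also convert an existing array with its `astype` method, which returns a new array. Note that converting floats to integers truncates toward zero rather than rounding. A short sketch:

```python
import numpy as np

x = np.array([1.7, -1.7, 2.5])
as_int = x.astype(int)             # truncation toward zero, not rounding
print(as_int)                      # [ 1 -1  2]

# Round first if rounding is what you want; np.round rounds halves to even
rounded = np.round(x).astype(int)
print(rounded)
```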

The **default** data type is floating point:


In [34]:
a = np.ones((3, 3))
a.dtype

dtype('float64')

There are also other types:


Complex


In [35]:
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

dtype('complex128')

Bool


In [36]:
e = np.array([True, False, False, True])
e.dtype

dtype('bool')

Strings


In [37]:
f = np.array(['Bonjour', 'Hello', 'Hallo',])
f.dtype     # <--- strings containing max. 7 letters

dtype('<U7')

Much more


* `int32`

* `int64`

* `uint32`

* `uint64`
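Each integer type trades memory per element against the range of values it can hold. This sketch uses `np.iinfo` and `itemsize` to show both for the types listed above:

```python
import numpy as np

for t in (np.int32, np.int64, np.uint32, np.uint64):
    info = np.iinfo(t)              # range of representable integers
    size = np.dtype(t).itemsize     # bytes per element
    print(np.dtype(t).name, size, info.min, info.max)
```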


## Basic visualization

## Tip

Now that we have our first data arrays, we are going to visualize them.


Start by launching IPython in *pylab* mode.


Or the notebook:


Alternatively, if IPython has already been started:


In [38]:
%pylab  

Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib



Or, from the notebook:


In [39]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


The `inline` is important for the notebook, so that plots are displayed
in the notebook and not in a new window.


*Matplotlib* is a 2D plotting package. We can import its functions as
below:


In [40]:
import matplotlib.pyplot as plt  # the tidy way

And then use (note that you have to use `show` explicitly):


In [41]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.plot(x, y)       # line plot
plt.show()           # <-- shows the plot (not needed with pylab)

Or, if you are using *pylab*:


In [None]:
plot(x, y)       # line plot    

Using `import matplotlib.pyplot as plt` is recommended for use in
scripts, whereas `pylab` is recommended for interactive exploratory
work.


* **1D plotting**:


In [None]:
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.plot(x, y)       # line plot    

In [None]:
plt.plot(x, y, 'o')  # dot plot    

* **2D arrays** (such as images):


In [None]:
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.hot)    
plt.colorbar()    

More on this in the Matplotlib tutorial this afternoon.


## Exercise: Simple visualizations

* Plot some simple arrays.

* Try to use both the IPython shell and the notebook, if possible.

* Try using the `gray` colormap.


## Indexing and slicing

The items of an array can be accessed and assigned to the same way as
other Python sequences (e.g. lists):


In [None]:
a = np.arange(10)
a

In [None]:
a[0], a[2], a[-1]

## Warning

Indices begin at 0, like other Python sequences (and C/C++). In
contrast, in Fortran or Matlab, indices begin at 1.


The usual Python idiom for reversing a sequence is supported:


In [None]:
a[::-1]

For multidimensional arrays, indexes are tuples of integers:


In [None]:
a = np.diag(np.arange(3))
a

In [None]:
a[1, 1]

In [None]:
a[2, 1] = 10 # third line, second column
a

In [None]:
a[1]

Note that:


* In 2D, the first dimension corresponds to rows, the second to columns.

* Let us repeat together: the first dimension corresponds to **rows**, the
second to **columns**.

* For multidimensional `a`, `a[0]` is interpreted as taking all elements
along the unspecified dimensions.
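The points above can be checked directly. This sketch rebuilds the modified diagonal array from above and compares row and column access. It then shows the same rule for a 3-D array:

```python
import numpy as np

a = np.diag(np.arange(3))
a[2, 1] = 10

# a[1] is shorthand for a[1, :]: all columns of row 1
assert np.array_equal(a[1], a[1, :])
# Selecting a column needs an explicit ':' for the row axis
col = a[:, 1]
assert col.tolist() == [0, 1, 10]

# In 3-D, c[0] keeps both remaining axes, c[0, 1] keeps the last one
c = np.arange(24).reshape(2, 3, 4)
assert c[0].shape == (3, 4)
assert c[0, 1].shape == (4,)
```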


**Slicing**: Arrays, like other Python sequences, can also be sliced:


In [None]:
a = np.arange(10)
a

In [None]:
a[2:9:3] # [start:end:step]

Note that the last index is not included:


In [None]:
a[:4]

None of the three slice components is required: by default, `start` is 0,
`end` is the last and `step` is 1:


In [None]:
a[1:3]

In [None]:
a[::2]

In [None]:
a[3:]
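An excluded end index has a useful consequence: splitting at any index `k` with `a[:k]` and `a[k:]` neither loses nor repeats an element. This sketch illustrates that, the defaults, and negative steps:

```python
import numpy as np

a = np.arange(10)

# Splitting at k loses nothing and repeats nothing
k = 4
assert np.array_equal(np.concatenate([a[:k], a[k:]]), a)
assert len(a[:k]) == k

# Omitted components take their defaults: a[::] is the whole array
assert np.array_equal(a[::], a)
# A negative step walks backwards from the last element
assert a[::-2].tolist() == [9, 7, 5, 3, 1]
# Slice syntax is equivalent to indexing with a slice object
assert np.array_equal(a[2:9:3], a[slice(2, 9, 3)])
```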

A small illustrated summary of Numpy indexing and slicing...


In [None]:
from IPython.display import Image
Image(filename='images/numpy_indexing.png')

You can also combine assignment and slicing:


In [None]:
a = np.arange(10)
a[5:] = 10
a

In [None]:
b = np.arange(5)
a[5:] = b[::-1]
a

## Exercise: Indexing and slicing

* Try the different flavours of slicing, using `start`, `end` and `step`.

* Verify that the slices in the diagram above are indeed correct. You may
use the following expression to create the array:


In [None]:
np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]

* Try assigning a smaller 2D array to a larger 2D array, like in the 1D
example above.

* Use a different step, e.g. `-2`, in the reversal idiom above. What
effect does this have?


## Exercise: Array creation

Create the following arrays (with correct data types):

Par on course: 3 statements for each


*Hint*: Individual array elements can be accessed similarly to a list,
e.g. `a[1]` or `a[1, 2]`.


*Hint*: Examine the docstring for `diag`.


## Exercise: Tiling for array creation

Skim through the documentation for `np.tile`, and use this function to
construct the array:

## Copies and views

A slicing operation creates a **view** on the original array, which is
just a way of accessing array data. Thus the original array is not
copied in memory. You can use `np.may_share_memory()` to check if two
arrays share the same memory block. Note, however, that this function
uses heuristics and may give false positives.


**When modifying the view, the original array is modified as well**:


In [None]:
a = np.arange(10)
a

In [None]:
b = a[::2]
b

In [None]:
np.may_share_memory(a, b)

In [None]:
b[0] = 12
b

In [None]:
a   # (!)

In [None]:
a = np.arange(10)
c = a[::2].copy()  # force a copy
c[0] = 12
a

In [None]:
np.may_share_memory(a, c)

This behavior can be surprising at first sight, but it saves both
memory and time.
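A view keeps a reference to the array that owns its data, and you can inspect that reference through the `base` attribute. The sketch below uses it. It also contrasts `np.may_share_memory` with the exact (but potentially slower) `np.shares_memory`. Two interleaved slices are a case where the heuristic reports a false positive:

```python
import numpy as np

a = np.arange(10)
b = a[::2]          # slicing -> view
c = a[::2].copy()   # explicit copy

assert b.base is a         # the view refers back to a
assert c.base is None      # the copy owns its own data

assert np.shares_memory(a, b)
assert not np.shares_memory(a, c)

# Even and odd elements overlap in address range but share no element:
evens, odds = a[::2], a[1::2]
print(np.may_share_memory(evens, odds))  # bounds-based check
print(np.shares_memory(evens, odds))     # exact check
```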


## Worked example: Prime number sieve

In [None]:
from IPython.display import Image
Image(filename='images/prime-sieve.png')

Compute the prime numbers in 0–99 with a sieve:


* Construct a shape (100,) boolean array `is_prime`, filled with True in
the beginning:


In [None]:
is_prime = np.ones((100,), dtype=bool)

* Cross out 0 and 1 which are not primes:


In [None]:
is_prime[:2] = 0

* For each integer `j` starting from 2, cross out its higher multiples:


In [None]:
# j only needs to go up to sqrt of the largest number (99 here)
N_max = int(np.sqrt(len(is_prime) - 1)) + 1
for j in range(2, N_max):
    is_prime[2*j::j] = False

* Skim through `help(np.nonzero)`, and print the prime numbers

* Follow-up:

    * Move the above code into a script file named `prime_sieve.py`

    * Run it to check it works

    * Use the optimization suggested in [the sieve of
Eratosthenes](http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes):

      * Skip values of `j` that are already known not to be prime

      * The first number to cross out is $j^2$
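Here is one possible `prime_sieve.py` with both optimizations applied. It is a sketch rather than the only solution, and the function name `prime_sieve` and parameter `n` are hypothetical choices for illustration:

```python
import numpy as np


def prime_sieve(n=100):
    """Return the primes in 0..n-1 (n >= 2) using the sieve of Eratosthenes."""
    is_prime = np.ones((n,), dtype=bool)
    is_prime[:2] = False                  # 0 and 1 are not primes
    # Only j <= sqrt(n - 1) can have an uncrossed multiple j*j below n
    N_max = int(np.sqrt(n - 1)) + 1
    for j in range(2, N_max):
        if not is_prime[j]:               # skip j already known not to be prime
            continue
        is_prime[j*j::j] = False          # smaller multiples were crossed out earlier
    return np.nonzero(is_prime)[0]        # indices of the remaining True entries


if __name__ == "__main__":
    print(prime_sieve(100))
```

Starting at `j*j` is safe because every smaller multiple `m*j` with `m < j` has already been crossed out when `m`, or one of its prime factors, was processed.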


## Fancy indexing

## Tip

Numpy arrays can be indexed with slices, but also with boolean arrays
(**masks**) or integer arrays. This method is called *fancy indexing*.
It creates **copies not views**.


### Using boolean masks

In [None]:
np.random.seed(3)
a = np.random.randint(0, 21, 15)  # 15 integers in [0, 20]
a

In [None]:
(a % 3 == 0)

In [None]:
mask = (a % 3 == 0)
extract_from_a = a[mask] # or,  a[a%3==0]
extract_from_a           # extract a sub-array with the mask

Indexing with a mask can be very useful to assign a new value to a
sub-array:


In [None]:
a[a % 3 == 0] = -1
a
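Because fancy indexing creates copies, modifying an extracted sub-array leaves the original untouched. The sketch below checks this and shows `np.where` as a non-in-place alternative to masked assignment:

```python
import numpy as np

np.random.seed(3)
a = np.random.randint(0, 21, 15)   # integers in [0, 20]

extract = a[a % 3 == 0]    # boolean mask -> new array (a copy)
extract[:] = 100           # modifying the copy ...
assert not (a == 100).any()  # ... leaves a unchanged

# np.where builds a new array and keeps a intact
replaced = np.where(a % 3 == 0, -1, a)
print(a)
print(replaced)
```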

### Indexing with an array of integers

In [None]:
a = np.arange(0, 100, 10)
a

Indexing can be done with an array of integers, where the same index is
repeated several times:


In [None]:
a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list

New values can be assigned with this kind of indexing:


In [None]:
a[[9, 7]] = -100
a

### Tip

When a new array is created by indexing with an array of integers, the
new array has the same shape as the array of integers:


In [None]:
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
idx.shape

In [None]:
a[idx]
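On a 2-D array, you can pass one integer array per axis, and the elements are then picked pairwise. The result again takes the shape of the index arrays. This sketch reuses the 6 x 6 array from the slicing exercise, where `a[i, j] == 10*i + j`:

```python
import numpy as np

a = np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]  # 6 x 6, a[i, j] = 10*i + j

# Pairs (0, 1), (1, 2), (2, 3) are selected
picked = a[[0, 1, 2], [1, 2, 3]]
print(picked)

# 2-D index arrays give a 2-D result: here the four corners
rows = np.array([[0, 0], [5, 5]])
cols = np.array([[0, 5], [0, 5]])
corners = a[rows, cols]
print(corners)
```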

The image below illustrates various fancy indexing applications:


In [None]:
from IPython.display import Image
Image(filename='images/numpy_fancy_indexing.png')

### Exercise: Fancy indexing

* Again, verify the fancy indexing shown in the diagram above.

* Use fancy indexing on the left and array creation on the right to assign
values from a smaller array to a larger array.
