Today we'll cover:

1. [Creating numpy arrays](#Creating-numpy-arrays)
2. [Basic operations](#Basic-operations)
3. [Indexing and slicing](#Indexing-and-slicing)

# Creating numpy arrays

In [1]:
import numpy as np  # this is the standard abbreviation of numpy

In [2]:
import random

In [3]:
my_list = [random.random() for i in range(5)]
print(my_list)

[0.8176142538740026, 0.7437416916786432, 0.49347699115133814, 0.70114328152635, 0.38155494666275935]


In [4]:
type(my_list)  # right now we have a list object

list

In [5]:
my_array = np.array(my_list)  # create an ndarray

In [6]:
print(my_array)

[ 0.81761425  0.74374169  0.49347699  0.70114328  0.38155495]


In [7]:
type(my_array)  # now we have an ndarray object

numpy.ndarray

In [8]:
nested_list = [[random.random() for i in range(3)] for j in range(3)]  # a list containing 3 3-element lists

In [9]:
print(nested_list)

[[0.35795635650325064, 0.9675529600512336, 0.45751405093364284], [0.6192066833460406, 0.10184913839031384, 0.036075807233291735], [0.5571095515370658, 0.6798052741439377, 0.17154825265959894]]


In [10]:
my_2d_array = np.array(nested_list)  # create a 2-d array

In [11]:
print(my_2d_array)  # 2-d arrays get printed nicely

[[ 0.35795636  0.96755296  0.45751405]
 [ 0.61920668  0.10184914  0.03607581]
 [ 0.55710955  0.67980527  0.17154825]]


In [12]:
my_2d_array.ndim  # returns number of dimensions in an array

2

In [13]:
my_2d_array.shape  # returns a tuple describing the array shape

(3, 3)

In [14]:
np.zeros((2,4))  # create zero matrix of specified shape

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [15]:
np.ones((3,2))  # create matrix of ones with a specified shape

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

Sometimes we want to create an array of some size with the intention of overwriting all of its elements very soon. In such cases it is better to use `np.empty()` rather than `np.zeros()`, since the former skips initialization and is faster. However, the entries of an array returned by `np.empty()` can be garbage. Never rely on `np.empty()` returning a zero array.

In [16]:
from timeit import Timer

In [17]:
Timer('zero_array = np.zeros((100, 50))', 'import numpy as np').timeit()

1.4560229778289795

In [18]:
Timer('empty_array = np.empty((100, 50))', 'import numpy as np').timeit()

0.4185659885406494

In [19]:
np.empty((2, 3))  # garbage values may be returned

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
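
The intended usage pattern for `np.empty()` can be sketched as follows: allocate first, then write every element before reading any of them. The array name `squares` and the loop are illustrative assumptions, not part of the session above.

```python
import numpy as np

# Allocate without initialising; the contents are arbitrary at this point.
squares = np.empty(5)

# Overwrite every element before reading any of them.
for i in range(5):
    squares[i] = i ** 2

print(squares)  # every entry now holds a value we wrote ourselves
```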

In [20]:
np.eye(2)  # identity matrix

array([[ 1.,  0.],
       [ 0.,  1.]])

In [21]:
np.arange(10)  # returns an array containing 0 through 9

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
np.arange(10, 15)  # returns an array containing 10 through 14

array([10, 11, 12, 13, 14])

In [23]:
np.arange(0, 1, .1)  # returns numbers from 0 (inclusive) to 1 (exclusive) in steps of .1

array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])
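
With a fractional step it is sometimes easier to say how many points are wanted rather than how far apart they are. As a small sketch, `np.linspace()` does this; it is not used elsewhere in this session, and the names `grid` and `grid_open` are illustrative.

```python
import numpy as np

# 11 evenly spaced points from 0 to 1, with both endpoints included.
grid = np.linspace(0, 1, 11)

# Leaving out the endpoint gives the same 10 points as np.arange(0, 1, .1),
# up to floating-point rounding.
grid_open = np.linspace(0, 1, 10, endpoint=False)

print(len(grid), grid[0], grid[-1])
print(len(grid_open))
```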

# Basic operations

In [24]:
from math import cos, sin, pi

In [25]:
def rotation_matrix(angle):
    """Return a rotation matrix that rotates points (anticlockwise) in 2-d by angle (in radians)."""
    # No backslash is needed: lines continue implicitly inside brackets.
    return np.array([[cos(angle), -sin(angle)],
                     [sin(angle),  cos(angle)]])

In [26]:
rotate_30_degrees = rotation_matrix(pi/6)

In [27]:
print(rotate_30_degrees)

[[ 0.8660254 -0.5      ]
 [ 0.5        0.8660254]]


In [28]:
rotate_60_degrees = rotate_30_degrees.dot(rotate_30_degrees)  # use dot() method to multiply matrices

In [29]:
rotate_60_degrees - rotation_matrix(pi/3)  # equal up to floating-point rounding error

array([[  1.11022302e-16,   0.00000000e+00],
       [  0.00000000e+00,   1.11022302e-16]])
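
Because of such rounding differences, comparing floating-point arrays for exact equality is fragile. A minimal sketch of a tolerance-based check with `np.allclose()`, redefining `rotation_matrix` so the block is self-contained (the shorter variable names are illustrative):

```python
from math import cos, sin, pi
import numpy as np

def rotation_matrix(angle):
    """Return the 2-d anticlockwise rotation matrix for angle (in radians)."""
    return np.array([[cos(angle), -sin(angle)],
                     [sin(angle),  cos(angle)]])

rotate_30 = rotation_matrix(pi / 6)
rotate_60 = rotate_30.dot(rotate_30)

# Exact comparison may fail because of rounding in the last bits...
print(np.array_equal(rotate_60, rotation_matrix(pi / 3)))
# ...whereas a comparison with a tolerance treats the two as equal.
print(np.allclose(rotate_60, rotation_matrix(pi / 3)))  # True
```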

One can easily do basic arithmetic operations elementwise.

In [30]:
array1 = np.array([[1, 2], [3, 4]], 'float')  # if we don't provide the 'float' argument, numpy will infer int

In [31]:
array2 = np.array([[5, 6], [7, 8]], 'float')

In [32]:
array2 + array1  # elementwise addition

array([[  6.,   8.],
       [ 10.,  12.]])

In [33]:
array2 - array1  # elementwise subtraction

array([[ 4.,  4.],
       [ 4.,  4.]])

In [34]:
array2 * array1  # elementwise multiplication, not matrix multiplication (MATLAB users beware!)

array([[  5.,  12.],
       [ 21.,  32.]])

In [35]:
array2 / array1  # elementwise division

array([[ 5.        ,  3.        ],
       [ 2.33333333,  2.        ]])

In [36]:
1 / array1  # scalar divided by array, scalar operates elementwise on array

array([[ 1.        ,  0.5       ],
       [ 0.33333333,  0.25      ]])

In [37]:
1 + array1

array([[ 2.,  3.],
       [ 4.,  5.]])

In [38]:
array1 - 1

array([[ 0.,  1.],
       [ 2.,  3.]])

In [39]:
array1 ** 2  # squares each entry

array([[  1.,   4.],
       [  9.,  16.]])
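
To make the contrast between the elementwise `*` and the matrix product explicit, here is a short sketch using the same two matrices as above (renamed `a` and `b` for brevity):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], 'float')
b = np.array([[5, 6], [7, 8]], 'float')

elementwise = a * b        # multiplies matching entries
matrix_product = a.dot(b)  # rows of a times columns of b

print(elementwise)     # entries 5, 12, 21, 32
print(matrix_product)  # entries 19, 22, 43, 50
```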

# Indexing and slicing

Let's start with 1-d arrays. They seem similar to Python lists, but there are differences. The first is that *broadcasting* works for 1-d arrays but not for lists.

In [40]:
list1 = list(range(5))  # list() makes this a real list on Python 3 as well
print(list1)

[0, 1, 2, 3, 4]


In [41]:
array1 = np.array(list1)

In [42]:
list1[2:4]

[2, 3]

In [43]:
array1[2:4]

array([2, 3])

In [44]:
list1[2:4] = 0  # causes an error

TypeError: can only assign an iterable

So one has to resort to the following.

In [45]:
list1[2:4] = [0, 0]
print(list1)

[0, 1, 0, 0, 4]


However, for 1-d arrays, there's no problem. The value 0 is *broadcast* to every element of the slice.

In [46]:
array1[2:4] = 0
print(array1)

[0 1 0 0 4]


Another (big) difference is that slicing a list creates a copy, whereas slicing a 1-d array produces a *view*: changing the view changes the original array that the view was obtained from.

In [47]:
list_part = list1[2:4]  # extract the 2 zeros, returns a copy
print(list_part)

[0, 0]


In [48]:
list_part[:] = [2, 3]
print(list1)  # original list unchanged

[0, 1, 0, 0, 4]


In [49]:
array_part = array1[2:4]
print(array_part)

[0 0]


In [50]:
array_part[:] = [2, 3]
print(array1)  # array1 gets changed!

[0 1 2 3 4]
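
When an independent copy of a slice is what you want, one option is to ask for it explicitly with the `copy()` method. A minimal sketch (the names `original`, `view` and `independent` are illustrative):

```python
import numpy as np

original = np.arange(5)

view = original[2:4]                # basic slice: shares data with original
independent = original[2:4].copy()  # explicit copy: owns its own data

independent[:] = 0  # original is untouched
view[:] = 9         # original changes

print(original)  # [0 1 9 9 4]
```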


Let's look at indexing 2d arrays now.

In [51]:
my_2d_array = np.arange(9).reshape((3, 3))  # reshape changes the shape of an array
print(my_2d_array)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [52]:
my_2d_array[1:]  # rows indexed 1 through the end (note row index 1 is actually row 2)

array([[3, 4, 5],
       [6, 7, 8]])

In [53]:
my_2d_array[0]  # row indexed 0, i.e. first row

array([0, 1, 2])

In [54]:
my_2d_array[0][2]  # 1st row, 3rd column

2

In [55]:
my_2d_array[0, 2]  # shorter, more convenient access to individual elements

2

But what if you wanted the rows indexed 0 and 2 (i.e. the 1st and 3rd rows)?

In [56]:
my_2d_array[[0, 2]]  # here, we used a list to index

array([[0, 1, 2],
       [6, 7, 8]])

When we index with integers or `start:stop:step` slices, numpy calls it *basic indexing* (or *basic slicing*). Basic slicing does *not* copy data but produces views. When we use a list as an index, it is called *advanced indexing* (or *fancy indexing*). Advanced indexing *does* copy data.

In [57]:
rows23 = my_2d_array[1:]
print(rows23)

[[3 4 5]
 [6 7 8]]


In [58]:
rows23[1, 2] = 100
print(my_2d_array)  # original array changes since we used basic slicing

[[  0   1   2]
 [  3   4   5]
 [  6   7 100]]


In [59]:
rows23adv = my_2d_array[[1, 2]]  # a case of advanced indexing
print(rows23adv)

[[  3   4   5]
 [  6   7 100]]


In [60]:
rows23adv[1, 2] = 200
print(my_2d_array)  # original array remains unchanged

[[  0   1   2]
 [  3   4   5]
 [  6   7 100]]
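
Rather than modifying an array to find out whether it is a view, one can ask numpy directly. The following sketch uses `np.shares_memory()` for this; the array name `grid` is illustrative.

```python
import numpy as np

grid = np.arange(9).reshape((3, 3))

basic = grid[1:]         # slice -> view
advanced = grid[[1, 2]]  # list index -> copy

print(np.shares_memory(grid, basic))     # True
print(np.shares_memory(grid, advanced))  # False
```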


Just as in indexing lists, we can use negative indices that count from the end.

In [61]:
my_2d_array[-2:]  # last two rows, basic indexing

array([[  3,   4,   5],
       [  6,   7, 100]])

In [62]:
my_2d_array[[-2,-1]]  # last two rows, advanced indexing

array([[  3,   4,   5],
       [  6,   7, 100]])

How do we select a subarray (all elements in the rectangular region formed by certain rows and columns) of an array? You might expect the following code to extract the 4 corner elements of a 3x3 array, i.e. the elements:

    0,0  0,2
    2,0  2,2

In [63]:
my_2d_array[[0, 2], [0, 2]]  # this actually gives the elements 0,0 and 2,2 in a 1-d array!

array([  0, 100])

One way to get the desired result is to first get the desired rows followed by the desired columns.

In [64]:
desired_rows = my_2d_array[[0, 2]]
print(desired_rows)

[[  0   1   2]
 [  6   7 100]]


In [65]:
desired_rows[:, [0, 2]]

array([[  0,   2],
       [  6, 100]])

Of course, we can combine the two steps in one.

In [66]:
my_2d_array[[0, 2]][:, [0, 2]]

array([[  0,   2],
       [  6, 100]])

Perhaps a more readable way is to use the `np.ix_()` function to create an indexer that selects the rectangular region specified by two lists.

In [67]:
my_2d_array[np.ix_([0, 2], [0, 2])]

array([[  0,   2],
       [  6, 100]])
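
To see why `np.ix_()` behaves differently from two plain lists, it helps to look at what it returns. The sketch below uses a fresh array named `grid` (an illustrative name) and inspects the shapes of the two index arrays.

```python
import numpy as np

grid = np.arange(9).reshape((3, 3))

# Two index lists are paired up elementwise: positions (0, 0) and (2, 2).
paired = grid[[0, 2], [0, 2]]

# np.ix_ reshapes the lists to shapes (2, 1) and (1, 2) so that they
# broadcast against each other into every row/column combination.
row_idx, col_idx = np.ix_([0, 2], [0, 2])
corners = grid[row_idx, col_idx]

print(row_idx.shape, col_idx.shape)  # (2, 1) (1, 2)
print(paired)   # the two diagonal corners: 0 and 8
print(corners)  # all four corners: 0, 2, 6, 8
```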

Finally let us see how *boolean indexing* works. Boolean indexing is a type of advanced indexing and so returns copies, not views.

In [68]:
cities = np.array(['Shanghai', 'Karachi', 'Beijing', 'Lagos', 'Istanbul',
                   'Guangzhou', 'Mumbai', 'Moscow', 'Dhaka', 'Cairo'])  # top 10 cities by population

In [69]:
populations = np.array([24.15, 23.5, 21.15, 17.06, 14.16,
                        12.7, 12.48, 12.11, 12.04, 11.92])  # their populations (in millions)

In [70]:
over_fifteen = populations > 15  # populations over 15 million

In [71]:
print(over_fifteen)  # this is a boolean ndarray

[ True  True  True  True False False False False False False]


In [72]:
print(cities[over_fifteen])  # returns those elements of cities where the boolean array over_fifteen is True

['Shanghai' 'Karachi' 'Beijing' 'Lagos']


In [73]:
print(cities[(10 <= populations) & (populations <= 20)])  # cities with populations between 10 and 20 million

['Lagos' 'Istanbul' 'Guangzhou' 'Mumbai' 'Moscow' 'Dhaka' 'Cairo']
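
Boolean masks are combined with the elementwise operators `&` (and), `|` (or) and `~` (not) rather than Python's `and`, `or` and `not`. Each comparison needs its own parentheses because these operators bind more tightly than `<` and `<=`. A small sketch on made-up data (the arrays `names` and `values` are hypothetical):

```python
import numpy as np

names = np.array(['a', 'b', 'c', 'd'])  # hypothetical labels
values = np.array([5, 12, 18, 25])      # hypothetical measurements

in_range = (10 <= values) & (values <= 20)  # elementwise "and"
outside = (values < 10) | (values > 20)     # elementwise "or"

print(names[in_range])   # labels b and c
print(names[outside])    # labels a and d
print(names[~in_range])  # "~" negates the mask, so a and d again
```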


In [74]:
my_2d_array = np.arange(-4, 5).reshape((3, 3))
print(my_2d_array)

[[-4 -3 -2]
 [-1  0  1]
 [ 2  3  4]]


In [75]:
my_2d_array[my_2d_array < 0] = 0  # set negative entries to zero

In [76]:
print(my_2d_array)

[[0 0 0]
 [0 0 1]
 [2 3 4]]
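
Note that the assignment above modifies the array in place even though boolean indexing returns copies. The difference lies in which side of the `=` the indexing expression appears on, as this sketch tries to show (the names `data`, `negatives` and `min_before` are illustrative):

```python
import numpy as np

data = np.arange(-4, 5).reshape((3, 3))

negatives = data[data < 0]  # indexing used as a value: a copy
negatives[:] = 99           # changes only the copy
min_before = data.min()     # data still contains -4

data[data < 0] = 0          # indexing used as an assignment target: writes into data

print(min_before)  # -4
print(data)
```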
