### Lesson outline

If you're already familiar with NumPy, especially the operations listed below, feel free to skim this lesson.

- #### Create a NumPy array:
  - from a pandas dataframe: [pandas.DataFrame.values](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.values.html)
  - from a Python sequence: [numpy.array](http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html)
  - with constant initial values: [numpy.ones, numpy.zeros](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html)
  - with random values: [numpy.random](http://docs.scipy.org/doc/numpy/reference/routines.random.html)

- #### Access array attributes: [shape](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html), [ndim](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ndim.html), [size](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.size.html), [dtype](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.dtype.html)
- #### Compute statistics: [sum](http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html), [min](http://docs.scipy.org/doc/numpy/reference/generated/numpy.min.html), [max](http://docs.scipy.org/doc/numpy/reference/generated/numpy.max.html), [mean](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html)
- #### Carry out arithmetic operations: [add](http://docs.scipy.org/doc/numpy/reference/generated/numpy.add.html), [subtract](http://docs.scipy.org/doc/numpy/reference/generated/numpy.subtract.html), [multiply](http://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html), [divide](http://docs.scipy.org/doc/numpy/reference/generated/numpy.divide.html)
- #### Measure execution time: [time.time](https://docs.python.org/2/library/time.html#time.time), [profile](https://docs.python.org/2/library/profile.html)
- #### Manipulate array elements: [Using simple indices and slices](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#basic-slicing-and-indexing), [integer arrays](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#integer-array-indexing), [boolean arrays](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean-array-indexing)

In [5]:
'''Creating NumPy arrays.'''

import numpy as np

def test_run():
    # List to 1D array
    print np.array([2, 3, 4])
    print ''
    
    #List of tuples to 2D array
    print np.array([(2, 3, 4), (5, 6, 7)])

if __name__ == '__main__':
    test_run()

[2 3 4]

[[2 3 4]
 [5 6 7]]


In [9]:
'''Arrays with initial values.'''

import numpy as np

def test_run():
    # "Empty" arrays are allocated but not initialized,
    # so they contain whatever values were already in memory
    print np.empty(5)
    print np.empty((5,4))
    
    # Array of 1s
    print np.ones((5,4))

if __name__ == '__main__':
    test_run()

[  0.00000000e+000   6.89941071e-310   6.89943153e-310   3.39387131e-317
   2.37151510e-322]
[[  6.89944150e-310   6.56353958e-317   1.01006264e+261   4.48861446e-120]
 [  1.67709469e+243   3.03574399e-152   8.05084511e+173   6.69433480e+151]
 [  2.44514154e-154   3.63537905e+233   4.83245960e+276   8.82085571e+199]
 [  9.34634029e+218   2.65856669e-260   2.13505411e+161   2.18072748e-153]
 [  4.97745931e+151   6.01346953e-154   1.24039117e+224   6.09109151e-114]]
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
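
The outline also lists `numpy.zeros`, which the cell above does not exercise. A minimal sketch follows; `np.full` is not part of the lesson and is included only as an assumption about what a reader might want next. Each `print` call takes a single argument so the snippet behaves the same under Python 2 and Python 3.

```python
import numpy as np

# Unlike np.empty, np.zeros guarantees that every element starts at 0.0
zeros_2d = np.zeros((5, 4))
print(zeros_2d)

# np.full (not used in the lesson) fills an array with any constant value
sevens = np.full((2, 3), 7.0)
print(sevens)
```

Both calls return `float64` arrays here, matching the default seen with `np.ones`.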


In [13]:
'''Specify the datatype.'''

import numpy as np

def test_run():
    
    # Array of integer 1s
    # np.int_ is NumPy's default integer type (the np.int alias was removed in newer NumPy releases)
    print np.ones((5,4), dtype=np.int_)

if __name__ == '__main__':
    test_run()

[[1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]
 [1 1 1 1]]


In [15]:
'''Generating random numbers.'''

import numpy as np

def test_run():
    
    # Generate an array full of random numbers, uniformly sampled from [0.0, 1.0)
    print np.random.random((5,4)) # Pass in a size tuple
    print ''
    
    # Sample numbers from a Gaussian (normal) distribution
    print 'Standard Normal'
    print np.random.normal(size=(2, 3)) # "Standard normal" (mean = 0, s.d. = 1)
    print ''
    print 'Normal (mean 50, s.d. 10)'
    print np.random.normal(50,10, size=(2, 3)) # Change mean to 50 and s.d. to 10
    print ''
    
    # Random integers
    print 'A single integer'
    print np.random.randint(10) # A single integer in [0, 10)
    print ''
    print 'A single integer'
    print np.random.randint(0, 10) # Same as above, specifying [low, high) explicitly
    print ''
    print '1d-array'
    print np.random.randint(0, 10, size = 5) # 5 random integers as a 1D array
    print ''
    print '2d-array'
    print np.random.randint(0, 10, size = (2, 3)) # 2x3 array of random integers
    
if __name__ == '__main__':
    test_run()

[[ 0.98002498  0.57936353  0.40416187  0.45760029]
 [ 0.66914955  0.0749194   0.55522073  0.04126315]
 [ 0.10187618  0.75788742  0.44318673  0.41842539]
 [ 0.87198883  0.80890351  0.48565408  0.68981486]
 [ 0.38924023  0.83339295  0.19694214  0.84770307]]

Standard Normal
[[ 0.20255373 -0.36719115 -0.20939396]
 [-0.97229616 -0.44531705 -0.1617242 ]]

Normal (mean 50, s.d. 10)
[[ 49.02978119  41.66887027  65.92429881]
 [ 41.58021424  50.39193175  46.07780296]]

A single integer
0

A single integer
9

1d-array
[0 1 0 3 7]

2d-array
[[9 8 5]
 [2 3 6]]
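
Every run of the cell above prints different numbers. If repeatable output is wanted, one option is to seed the generator first. This is a sketch that goes beyond the lesson: the seed value 42 is arbitrary and `np.random.seed` is not used anywhere in the original cells.

```python
import numpy as np

# Seeding the global generator makes the sequence of draws repeatable
np.random.seed(42)
first = np.random.randint(0, 10, size=5)

# Re-seeding with the same value replays the same sequence
np.random.seed(42)
second = np.random.randint(0, 10, size=5)

print(first)
print(second)
```

The two printed arrays are identical because both draws start from the same generator state.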


In [21]:
'''Array attributes.'''

import numpy as np

def test_run():
    
    a = np.random.random((5,4)) # 5x4 array of random numbers
    print a
    print a.shape
    print a.shape[0] # Number of rows
    print a.shape[1] # Number of columns
    print len(a.shape) # Number of dimensions (same as a.ndim)
    print a.size # Total number of elements
    print a.dtype # Data type of the elements
    

if __name__ == '__main__':
    test_run()

[[ 0.76444591  0.42017101  0.78568757  0.47444077]
 [ 0.77287955  0.68360112  0.71139951  0.34694677]
 [ 0.72109319  0.9797951   0.22239037  0.06959021]
 [ 0.09883535  0.17842835  0.79085698  0.885343  ]
 [ 0.46956974  0.84621499  0.58448892  0.65181972]]
(5, 4)
5
4
2
20
float64
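
The outline mentions the `ndim` attribute, while the cell above gets the same number through `len(a.shape)`. The short sketch below, using a hypothetical array of zeros so the values are deterministic, shows how the attributes relate to each other.

```python
import numpy as np

a = np.zeros((5, 4))

# ndim is the number of axes, i.e. the length of the shape tuple
print(a.ndim)

# size is the product of the entries of shape
rows, cols = a.shape
print(a.size == rows * cols)
```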


In [29]:
'''Operations on arrays.'''

import numpy as np

def test_run():
    
    a = np.random.randint(0,10, size = (5,4)) # 5x4 random integers in [0, 10)
    print 'Array:\n', a
    
    #Sum of all elements
    print 'Sum of all elements:', a.sum()
    
    #Iterate over rows, to compute sum of each column
    print 'Sum of each column:', a.sum(axis=0)
    
    #Iterate over columns, to compute sum of each row
    print 'Sum of each row:', a.sum(axis=1)
    
    # Statistics: min, max, mean (across rows, cols, and overall)
    print 'Minimum of each column:\n', a.min(axis=0)
    print 'Maximum of each row:\n', a.max(axis=1)
    print 'Mean of all elements:\n', a.mean() # Leave out axis arg.
    
    
if __name__ == '__main__':
    test_run()

Array:
[[0 9 4 4]
 [6 9 0 8]
 [8 1 4 8]
 [1 8 2 1]
 [2 3 1 5]]
Sum of all elements: 84
Sum of each column: [17 30 11 26]
Sum of each row: [17 23 21 12 11]
Minimum of each column:
[0 1 0 1]
Maximum of each row:
[9 9 8 8 5]
Mean of all elements:
4.2
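
Because the cell above uses random input, the `axis` argument is easier to check against a fixed array. The sketch below hard-codes a 5x4 integer array (the one shown in the printed output) and computes per-column, per-row, and overall statistics. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[0, 9, 4, 4],
              [6, 9, 0, 8],
              [8, 1, 4, 8],
              [1, 8, 2, 1],
              [2, 3, 1, 5]])

# axis=0 collapses the rows, giving one value per column
col_min = a.min(axis=0)

# axis=1 collapses the columns, giving one value per row
row_max = a.max(axis=1)

# With no axis argument the statistic covers every element
overall_mean = a.mean()

print(col_min)
print(row_max)
print(overall_mean)
```

A handy way to remember it: the axis you pass is the one that disappears from the result's shape.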


---
## Quiz: Locate Maximum Value

In [47]:
"""Locate maximum value."""

import numpy as np


def get_max_index(a):
    """Return the index of the maximum value in given 1D array."""
    return np.argmax(a)


def test_run():
    a = np.array([9, 6, 2, 3, 12, 14, 7, 10], dtype=np.int32)  # 32-bit integer array
    print "Array:", a
    
    # Find the maximum and its index in array
    print "Maximum value:", a.max()
    print "Index of max.:", get_max_index(a)


if __name__ == "__main__":
    test_run()


Array: [ 9  6  2  3 12 14  7 10]
Maximum value: 14
Index of max.: 5
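
The quiz covers the 1D case. For a 2D array, `np.argmax` returns an index into the flattened array, which `np.unravel_index` can convert back into a (row, column) pair. This is a sketch that extends the quiz; the `grid` array is made up for illustration.

```python
import numpy as np

grid = np.array([[9, 6, 2],
                 [3, 14, 7]])

# Without an axis argument, argmax indexes the flattened array
flat_index = np.argmax(grid)

# Convert the flat index back to coordinates in the original shape
row, col = np.unravel_index(flat_index, grid.shape)

print(flat_index)
print(grid[row, col])
```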


---

In [48]:
'''Using time function.'''

import numpy as np
import time

def test_run():
    t1 = time.time()
    print 'ML4T'
    t2 = time.time()
    print 'The time taken by print statement is ', t2 - t1,'seconds'
    

if __name__ == '__main__':
    test_run()

ML4T
The time taken by print statement is  9.08374786377e-05 seconds


In [55]:
'''How fast is NumPy.'''

import numpy as np
from time import time

def how_long(func, *args):
    '''Execute function with given arguments, and measure execution time.'''
    t0 = time()
    result = func(*args) # All arguments are passed in as-is
    t1 = time()
    return result, t1 - t0

def manual_mean(arr):
    '''Compute mean (average) of all elements in the given 2D array'''
    total = 0 # Named total to avoid shadowing the built-in sum()
    for i in xrange(0, arr.shape[0]):
        for j in xrange(0, arr.shape[1]):
            total = total + arr[i, j]
    return total / arr.size

def numpy_mean(arr):
    '''Compute mean (average) using NumPy'''
    return arr.mean()

def test_run():
    '''Function called by Test Run.'''
    nd1 = np.random.random((1000, 10000)) # Use a sufficiently large array
    
    #Time the two functions, retrieving results and execution times
    res_manual, t_manual = how_long(manual_mean, nd1)
    res_numpy, t_numpy = how_long(numpy_mean, nd1)
    print 'Manual: {:.6f} ({:.3f} secs.) vs NumPy: {:.6f} ({:.3f} secs.)'.format(res_manual, t_manual, res_numpy, t_numpy) 
    
    # Make sure both give us the same answer (up to some precision)
    assert abs(res_manual - res_numpy) <= 10e-6, "Results aren't equal!"
    
    #Compute speedup
    speedup = t_manual / t_numpy
    print 'NumPy mean is', speedup, 'times faster than manual for loops.'

if __name__ == '__main__':
    test_run()

Manual: 0.500004 (1.491 secs.) vs NumPy: 0.500004 (0.007 secs.)
NumPy mean is 199.771391143 times faster than manual for loops.
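
A single `time()` measurement can be noisy. As an alternative sketch, the standard-library `timeit` module can repeat each measurement and keep the best run. The array is deliberately smaller than in the cell above so the snippet finishes quickly, and `range` replaces `xrange` so it runs under Python 3 as well; both choices are assumptions of this example, not part of the lesson.

```python
import timeit
import numpy as np

arr = np.random.random((200, 200))

def manual_mean(a):
    """Mean of a 2D array computed with explicit Python loops."""
    total = 0.0
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            total += a[i, j]
    return total / a.size

# Run each candidate three times, one call per run, and keep the fastest
t_manual = min(timeit.repeat(lambda: manual_mean(arr), number=1, repeat=3))
t_numpy = min(timeit.repeat(lambda: arr.mean(), number=1, repeat=3))

print('manual: {:.6f} s, numpy: {:.6f} s'.format(t_manual, t_numpy))
```

Taking the minimum over several runs reduces the influence of other processes on the measurement; the exact speedup will vary by machine.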


In [22]:
'''Accessing array elements.'''

import numpy as np

def test_run():
    a = np.random.rand(5, 4)
    print 'Array:\n', a
    print''
    
    #Accessing element at position (3, 2)
    element = a[3, 2]
    print 'Position (3, 2):\n', element
    print''
    
    #Elements in defined range
    print 'Range (0, 1:3):\n', a[0, 1:3]
    print''
    
    #Top-left corner
    print 'Top-left corner :\n', a[0:2, 0:2]
    print''
    
    #Slicing
    # Note: Slice n:m:t specifies a range that starts at n, and stops before m, in steps of size t
    print 'Slicing:', a[:, 0:3:2]

if __name__ == '__main__':
    test_run()

Array:
[[ 0.70471966  0.488766    0.72131719  0.59584046]
 [ 0.11730061  0.85680481  0.49135416  0.95255406]
 [ 0.00229705  0.24127687  0.33643233  0.65719103]
 [ 0.11624821  0.57739567  0.3062893   0.894219  ]
 [ 0.88178216  0.16968218  0.46442419  0.98048617]]

Position (3, 2):
0.306289299939

Range (0, 1:3):
[ 0.488766    0.72131719]

Top-left corner :
[[ 0.70471966  0.488766  ]
 [ 0.11730061  0.85680481]]

Slicing: [[ 0.70471966  0.72131719]
 [ 0.11730061  0.49135416]
 [ 0.00229705  0.33643233]
 [ 0.11624821  0.3062893 ]
 [ 0.88178216  0.46442419]]
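
With random values it is hard to see at a glance which elements a slice selects. The sketch below repeats the same index expressions on `np.arange(20).reshape(5, 4)`, a hypothetical array in which each element's value reveals its position.

```python
import numpy as np

# Row i holds the values 4*i .. 4*i + 3
a = np.arange(20).reshape(5, 4)

element = a[3, 2]        # single element: row 3, column 2
row_range = a[0, 1:3]    # row 0, columns 1 and 2
corner = a[0:2, 0:2]     # first two rows and first two columns
stepped = a[:, 0:3:2]    # every row, columns 0 and 2 (step of 2)

print(element)
print(row_range)
print(corner)
print(stepped)
```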


In [26]:
'''Modifying array elements.'''

import numpy as np

def test_run():
    a = np.random.rand(5, 4)
    print 'Array:\n', a
    print''
    
    # Assigning a value to a particular location
    a[0, 0] = 1
    print '\nModified (replaced one element):\n', a
    print''
    
    # Assigning a single value to an entire row
    a[0, :] = 2
    print '\nModified (replaced a row with a single value):\n', a
    print''
    
    # Assigning a list to a column in an array
    a[:, 3] = [1, 2, 3, 4, 5]
    print '\nModified (replaced a column with a list):\n', a
    print''

if __name__ == '__main__':
    test_run()

Array:
[[ 0.03000799  0.29923522  0.63106626  0.40993497]
 [ 0.14535125  0.07935721  0.02244874  0.75989961]
 [ 0.93776966  0.30107738  0.25096299  0.67044536]
 [ 0.95579093  0.81059579  0.29811526  0.42337705]
 [ 0.68052147  0.59356922  0.77569521  0.58312439]]


Modified (replaced one element):
[[ 1.          0.29923522  0.63106626  0.40993497]
 [ 0.14535125  0.07935721  0.02244874  0.75989961]
 [ 0.93776966  0.30107738  0.25096299  0.67044536]
 [ 0.95579093  0.81059579  0.29811526  0.42337705]
 [ 0.68052147  0.59356922  0.77569521  0.58312439]]


Modified (replaced a row with a single value):
[[ 2.          2.          2.          2.        ]
 [ 0.14535125  0.07935721  0.02244874  0.75989961]
 [ 0.93776966  0.30107738  0.25096299  0.67044536]
 [ 0.95579093  0.81059579  0.29811526  0.42337705]
 [ 0.68052147  0.59356922  0.77569521  0.58312439]]


Modified (replaced a column with a list):
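[[ 2.          2.          2.          1.        ]
 [ 0.14535125  0.07935721  0.02244874  2.        ]
 [ 0.93776966  0.30107738  0.25096299  3.        ]
 [ 0.95579093  0.81059579  0.29811526  4.        ]
 [ 0.68052147  0.59356922  0.77569521  5.        ]]
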

In [28]:
'''Indexing an array with another array.'''

import numpy as np

def test_run():
    a = np.random.rand(5)
    
    #Accessing using list of indices
    indices = np.array([1, 1, 2, 3])
    
    print a
    print a[indices]

if __name__ == '__main__':
    test_run()

[ 0.01286659  0.72581137  0.59752278  0.58280926  0.19408949]
[ 0.72581137  0.72581137  0.59752278  0.58280926]
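
The same idea on fixed values makes the repeated index easier to follow. The sketch below also shows that indexing with an integer array returns a copy, so changing the result leaves the source array untouched. The array contents are made up for illustration.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
indices = np.array([1, 1, 2, 3])

# Index 1 appears twice, so the value 20 is picked twice
picked = a[indices]
print(picked)

# Integer-array indexing returns a copy, not a view of a
picked[0] = -1
print(a)
```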


In [40]:
'''Boolean or "mask" index arrays.'''

import numpy as np

def test_run():
    a = np.array([(20, 25, 10, 23, 26, 32, 10, 5, 0), (0, 2, 50, 20, 0, 1, 28, 5, 0)])
    print 'Array:\n', a
    print ''
    
    #Calculating mean
    mean = a.mean()
    print 'Mean:\n', mean
    print ''
    
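    # Masking: replace every value below the mean with the mean
    # (a is an integer array, so the assigned mean is truncated to 14)
    a[a<mean] = mean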
    print 'Masking:\n', a
    
if __name__ == '__main__':
    test_run()

Array:
[[20 25 10 23 26 32 10  5  0]
 [ 0  2 50 20  0  1 28  5  0]]

Mean:
14.2777777778

Masking:
[[20 25 14 23 26 32 14 14 14]
 [14 14 50 20 14 14 28 14 14]]
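
In the output above the mean 14.2777... is stored as 14 because the array holds integers. If the exact mean should be kept, one option is to convert the array to floats first. The sketch below uses the same values as the cell above; the names `a_int`, `a_float`, and `mask` are illustrative.

```python
import numpy as np

a_int = np.array([(20, 25, 10, 23, 26, 32, 10, 5, 0),
                  (0, 2, 50, 20, 0, 1, 28, 5, 0)])

# Work on a float copy so the assigned mean is not truncated
a_float = a_int.astype(float)
mean = a_float.mean()

# The mask is a boolean array with the same shape as a_float
mask = a_float < mean
a_float[mask] = mean

print(mask.sum())   # number of elements that were replaced
print(a_float)
```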


In [46]:
'''Arithmetic operations.'''

import numpy as np

def test_run():
    a = np.array([(1, 2, 3, 4, 5), (10, 20, 30, 40, 50)])
    print 'Original array a:\n', a
    print ''
    
    b = np.array([(100, 200, 300, 400, 500), (1, 2, 3, 4, 5)])
    print 'Original array b:\n', b
    print ''
    
    # Multiply a by 2
    print 'Multiply a by 2:\n', 2*a
    print ''
    
    # Divide a by 2
    print 'Divide a by 2:\n', a/2.0
    
    # Add the two arrays
    print '\nAdd a + b:\n', a + b
    
    # Multiply a and b (element-wise)
    print '\nMultiply a * b:\n', a * b
    
    # Divide a by b (element-wise; both arrays hold integers, so Python 2 performs integer division)
    print '\nDivide a / b:\n', a / b
    
if __name__ == '__main__':
    test_run()

Original array a:
[[ 1  2  3  4  5]
 [10 20 30 40 50]]

Original array b:
[[100 200 300 400 500]
 [  1   2   3   4   5]]

Multiply a by 2:
[[  2   4   6   8  10]
 [ 20  40  60  80 100]]

Divide a by 2:
[[  0.5   1.    1.5   2.    2.5]
 [  5.   10.   15.   20.   25. ]]

Add a + b:
[[101 202 303 404 505]
 [ 11  22  33  44  55]]

Multiply a * b:
[[ 100  400  900 1600 2500]
 [  10   40   90  160  250]]

Divide a / b:
[[ 0  0  0  0  0]
 [10 10 10 10 10]]
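
The zeros in the last result come from integer division: both arrays hold integers and the cell runs under Python 2. To make the intent explicit regardless of Python version, the sketch below uses `//` for floor division and `np.true_divide` for fractional results. The variable names `floor_q` and `true_q` are illustrative.

```python
import numpy as np

a = np.array([(1, 2, 3, 4, 5), (10, 20, 30, 40, 50)])
b = np.array([(100, 200, 300, 400, 500), (1, 2, 3, 4, 5)])

# Floor division discards the fractional part, element by element
floor_q = a // b

# True division always produces floating-point results
true_q = np.true_divide(a, b)

print(floor_q)
print(true_q)
```

Converting one operand with `astype(float)` before dividing, as `a/2.0` does with a float scalar in the cell above, is another way to get fractional results.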


### Learning more NumPy

Resources from NumPy [User Guide](http://docs.scipy.org/doc/numpy/user/index.html) and [Reference](http://docs.scipy.org/doc/numpy/reference/index.html):

- #### [The N-dimensional array](http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)
- #### [Data types](http://docs.scipy.org/doc/numpy/user/basics.types.html)
- #### [Array creation](http://docs.scipy.org/doc/numpy/user/basics.creation.html) [[more]](http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html)
- #### [Indexing](http://docs.scipy.org/doc/numpy/user/basics.indexing.html) [[more]](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)
- #### [Broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
- #### [Random sampling](http://docs.scipy.org/doc/numpy/reference/routines.random.html)
- #### [Mathematical functions](http://docs.scipy.org/doc/numpy/reference/routines.math.html)
- #### [Linear algebra](http://docs.scipy.org/doc/numpy/reference/routines.linalg.html)
