# Tutorial Brief

NumPy is a powerful library for mathematical operations on arrays of numbers. Its vectorized operations run much faster than equivalent pure-Python loops, and it handles high-dimensional arrays as easily as flat lists.

## Table of Contents

[Importing the library](#Importing-the-library)

[Arrays](#Arrays)

[Creating Arrays](#Creating-arrays)

[Examining ndarray](#Examining-ndarray)

[Statistical Analysis](#Statistical-Analysis)

[Reshaping](#Reshaping)

[Accessing elements](#Acessing-elements)

- [Indexing](#Indexing)

- [Slicing](#Slicing)

- [Stepping](#Steping)

[Array Math](#Array-Math)

[Broadcasting](#Broadcasting)

## Importing the library

Importing NumPy under the short alias `np` saves typing, and it is the near-universal convention in scientific computing.

In [1]:
import numpy as np

## Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the **rank** of the array; the **shape** of an array is a tuple of integers giving the size of the array along each dimension.

In [111]:
a = np.array([1, 2, 3]) # Create a rank 1 array
b = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64) # Create a rank 2 array
print(a)
print(b)

[1 2 3]
[[1 2 3]
 [4 5 6]]
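A short sketch of the two definitions above: `ndim` gives the rank and `shape` the size along each axis. Because every element must share one type, NumPy picks a common dtype when the inputs are mixed. The variable names are illustrative, and the exact integer width can differ by platform, so only the kind of the integer dtype is checked.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])
print(b.ndim, b.shape)           # 2 (2, 3): rank 2, two rows of three columns

ints = np.array([1, 2, 3])       # only Python ints: an integer dtype
mixed = np.array([1, 2.5, 3])    # one float makes every element float64
print(ints.dtype.kind)           # i (signed integer; the width depends on the platform)
print(mixed.dtype)               # float64
```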


## Creating arrays

In [3]:
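a = np.zeros((3,3))                  # 3x3 array of 0.0
b = np.ones((3,3))                   # 3x3 array of 1.0
c = np.full((3,3), 7)                # 3x3 array filled with 7
d = np.eye(3)                        # 3x3 identity matrix
e, f = np.mgrid[1:4, 1:4]            # dense coordinate grids, similar to ndgrid in MATLAB
g = np.diag([1,2,3])                 # 1, 2, 3 on the main diagonal
h = np.diag([1,2,3], k=1)            # same values, offset one place above the main diagonal
i = np.random.rand(3,3)              # uniform samples from [0, 1)
j = np.random.randn(3,3)             # samples from the standard normal distribution
k = np.random.randint(0, 10, size=(3,3))  # integers 0..9; randint(3,3) fails because low must be < high

# Python 3 print function; the random arrays differ on every run
print(a, b, c, d, e, f, g, h, i, j, k, sep='\n\n')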
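The random arrays change on every run, but the deterministic constructors are easy to check. A small sketch confirming what `np.mgrid` and `np.diag` with an offset produce:

```python
import numpy as np

e, f = np.mgrid[1:4, 1:4]    # two 3x3 coordinate arrays
print(e)                     # row coordinate: increases down axis 0
print(f)                     # column coordinate: increases along axis 1

h = np.diag([1, 2, 3], k=1)  # k=1 puts the values one place above the main diagonal
print(h.shape)               # (4, 4): the matrix grows so the offset diagonal fits
print(np.diag(h, k=1))       # passing a 2-D array extracts that diagonal again: [1 2 3]
```

Note that `np.diag` does double duty: given a 1-D input it builds a matrix, given a 2-D input it reads a diagonal back out.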

**np.arange([start,] stop[, step,], dtype=None)**

In [113]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [114]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [115]:
np.arange(1, 10, 0.5)

array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,
        6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])

In [116]:
np.arange(1, 10, 3)

array([1, 4, 7])

In [117]:
np.arange(1, 10, 2, dtype=np.float64)

array([ 1.,  3.,  5.,  7.,  9.])
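`np.arange` works on the half-open interval [start, stop), so `stop` itself is never included and the result has ceil((stop - start) / step) elements. A quick check of that relation against the examples above (the loop and the `lengths` list are just for illustration):

```python
import math
import numpy as np

lengths = []
for start, stop, step in [(0, 10, 1), (1, 10, 0.5), (1, 10, 3)]:
    arr = np.arange(start, stop, step)
    expected = math.ceil((stop - start) / step)  # predicted number of elements
    lengths.append((len(arr), expected))
    print(len(arr), expected, stop in arr)       # stop is never part of the result
```

With non-integer steps, floating-point rounding can make the count unreliable, which is why `np.linspace` is usually the safer choice when the number of points matters.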

**np.linspace(start, stop, num=50, endpoint=True, retstep=False)**

In [118]:
np.linspace(1,5)

array([ 1.        ,  1.08163265,  1.16326531,  1.24489796,  1.32653061,
        1.40816327,  1.48979592,  1.57142857,  1.65306122,  1.73469388,
        1.81632653,  1.89795918,  1.97959184,  2.06122449,  2.14285714,
        2.2244898 ,  2.30612245,  2.3877551 ,  2.46938776,  2.55102041,
        2.63265306,  2.71428571,  2.79591837,  2.87755102,  2.95918367,
        3.04081633,  3.12244898,  3.20408163,  3.28571429,  3.36734694,
        3.44897959,  3.53061224,  3.6122449 ,  3.69387755,  3.7755102 ,
        3.85714286,  3.93877551,  4.02040816,  4.10204082,  4.18367347,
        4.26530612,  4.34693878,  4.42857143,  4.51020408,  4.59183673,
        4.67346939,  4.75510204,  4.83673469,  4.91836735,  5.        ])

In [119]:
np.linspace(0,2,num=4)

array([ 0.        ,  0.66666667,  1.33333333,  2.        ])

In [120]:
np.linspace(0,2,num=4, endpoint=False)

array([ 0. ,  0.5,  1. ,  1.5])
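The `retstep` parameter from the signature above returns the spacing alongside the samples. With `endpoint=True` the spacing is (stop - start) / (num - 1); with `endpoint=False` it becomes (stop - start) / num. A minimal sketch:

```python
import numpy as np

samples, step = np.linspace(0, 2, num=5, retstep=True)
print(samples)   # five points from 0 to 2 inclusive
print(step)      # 0.5 == (2 - 0) / (5 - 1)

samples_open, step_open = np.linspace(0, 2, num=4, endpoint=False, retstep=True)
print(samples_open)  # the last point 2 is dropped
print(step_open)     # 0.5 == (2 - 0) / 4
```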

## Examining ndarray

In [121]:
ds = np.array([[1,2,3],[4,5,6],[7,8,9]])
ds.ndim

2

In [122]:
ds.shape

(3, 3)

In [123]:
ds.size

9

In [124]:
ds.dtype

dtype('int64')

In [125]:
ds.itemsize # number of bytes per element

8

In [126]:
ds.size * ds.itemsize # Memory usage

72
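NumPy also exposes the product above directly as `nbytes`, and the memory footprint depends on the dtype. A small sketch; the dtype is pinned to `int64` here because the default integer width can differ by platform:

```python
import numpy as np

ds = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)
print(ds.nbytes)                     # 72, same as ds.size * ds.itemsize

small = ds.astype(np.int8)           # 1 byte per element (values must fit in -128..127)
print(small.itemsize, small.nbytes)  # 1 9
```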

## Statistical Analysis

In [127]:
data_set = np.random.random((2,3))
data_set

array([[ 0.61432007,  0.61719019,  0.03573928],
       [ 0.79542759,  0.09321297,  0.43294435]])

**np.max(a, axis=None, out=None, keepdims=False)**

In [128]:
np.max(data_set)

0.79542759469062052

In [129]:
np.max(data_set, axis=0)

array([ 0.79542759,  0.61719019,  0.43294435])

In [130]:
np.max(data_set, axis=1)

array([ 0.61719019,  0.79542759])
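Since `data_set` is random, a small fixed matrix makes the `axis` argument easier to see: the reduction collapses the named axis, so `axis=0` gives one value per column and `axis=1` one value per row. `keepdims=True` keeps the collapsed axis with length 1. The matrix `m` is just an illustrative example.

```python
import numpy as np

m = np.array([[1, 5, 2],
              [4, 3, 6]])
col_max = np.max(m, axis=0)              # collapse axis 0: one value per column
row_max = np.max(m, axis=1)              # collapse axis 1: one value per row
kept = np.max(m, axis=1, keepdims=True)  # reduced axis kept with length 1
print(col_max)     # [4 5 6]
print(row_max)     # [5 6]
print(kept.shape)  # (2, 1)
```

The same `axis` and `keepdims` behaviour applies to `np.min`, `np.mean`, `np.sum` and the other reductions below.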

**np.min(a, axis=None, out=None, keepdims=False)**

In [131]:
np.min(data_set)

0.03573927518865283

**np.mean(a, axis=None, dtype=None, out=None, keepdims=False)**

In [132]:
np.mean(data_set)

0.43147240812386234

**np.median(a, axis=None, out=None, overwrite_input=False)**

In [133]:
np.median(data_set)

0.52363221038282881

**np.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)**

In [134]:
np.std(data_set)

0.28030164351135584
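The `ddof` parameter in the signature controls the divisor: the default `ddof=0` divides by N (population standard deviation), while `ddof=1` divides by N - 1 (the usual sample estimate). A sketch on a small fixed array, checked against the formula written out by hand:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
pop_std = np.std(values)              # ddof=0: divide by N
sample_std = np.std(values, ddof=1)   # ddof=1: divide by N - 1

# Sample standard deviation computed manually for comparison
manual = np.sqrt(np.sum((values - values.mean()) ** 2) / (len(values) - 1))
print(pop_std, sample_std, manual)
```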

**np.sum(a, axis=None, dtype=None, out=None, keepdims=False)**

In [135]:
np.sum(data_set)

2.588834448743174

**np.prod(a, axis=None, dtype=None, out=None, keepdims=False)**

In [136]:
np.prod(data_set)

0.00043497923178453931

**np.cumsum(a, axis=None, dtype=None, out=None)**

In [137]:
np.cumsum(data_set)

array([ 0.61432007,  1.23151026,  1.26724953,  2.06267713,  2.1558901 ,
        2.58883445])

**np.cumprod(a, axis=None, dtype=None, out=None)**

In [138]:
np.cumprod(data_set)

array([  6.14320067e-01,   3.79152320e-01,   1.35506291e-02,
         1.07785443e-02,   1.00470009e-03,   4.34979232e-04])
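As the outputs above show, with the default `axis=None` both functions flatten the array before accumulating. Passing an axis keeps the shape and accumulates along that axis instead. A sketch with a small fixed matrix:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.cumsum(m))           # flattened: [ 1  3  6 10 15 21]
print(np.cumsum(m, axis=0))   # running total down each column
print(np.cumsum(m, axis=1))   # running total along each row
print(np.cumprod(m, axis=1))  # running product along each row
```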

## Reshaping

**np.reshape(a, newshape, order='C')**

In [139]:
np.reshape(data_set, (3,2))

array([[ 0.61432007,  0.61719019],
       [ 0.03573928,  0.79542759],
       [ 0.09321297,  0.43294435]])

In [140]:
np.reshape(data_set, (6,1))

array([[ 0.61432007],
       [ 0.61719019],
       [ 0.03573928],
       [ 0.79542759],
       [ 0.09321297],
       [ 0.43294435]])

In [141]:
np.reshape(data_set, (6))

array([ 0.61432007,  0.61719019,  0.03573928,  0.79542759,  0.09321297,
        0.43294435])
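The new shape must hold exactly as many elements as the original (6 in the examples above). One dimension may be given as `-1`, and NumPy infers it from the others; a size mismatch raises `ValueError`. A minimal sketch:

```python
import numpy as np

flat = np.arange(6)
print(flat.reshape(2, -1).shape)  # (2, 3): -1 is inferred as 6 // 2
print(flat.reshape(-1, 1).shape)  # (6, 1): a column vector

failed = False
try:
    flat.reshape(4, 2)            # asks for 8 elements, only 6 exist
except ValueError as err:
    failed = True
    print("ValueError:", err)
```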

**np.ravel(a, order='C')**

In [142]:
np.ravel(data_set)

array([ 0.61432007,  0.61719019,  0.03573928,  0.79542759,  0.09321297,
        0.43294435])
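`np.ravel` and the array method `flatten` give the same values, but `ravel` returns a view whenever the data is already contiguous, while `flatten` always copies. A sketch that checks this with `np.shares_memory`:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
r = np.ravel(m)     # a view here, because m is C-contiguous
fl = m.flatten()    # always a new copy
print(np.shares_memory(m, r), np.shares_memory(m, fl))  # True False

r[0] = 99           # writing through the view changes m as well
print(m[0, 0])      # 99
print(fl[0])        # 0: the copy is unaffected
```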

<a id="Acessing-elements"></a>

## Accessing elements

### Indexing

In [143]:
data_set = np.random.random((5,10))
data_set

array([[ 0.03187512,  0.12752271,  0.25063379,  0.94554735,  0.44207334,
         0.44410079,  0.82455503,  0.58782105,  0.71200143,  0.58822202],
       [ 0.34508709,  0.55596429,  0.90256791,  0.55994242,  0.79408938,
         0.71954079,  0.52214751,  0.86999099,  0.26261972,  0.08448081],
       [ 0.81115699,  0.74206144,  0.14184639,  0.12022273,  0.44491771,
         0.93778794,  0.17722433,  0.49981591,  0.1410999 ,  0.96685005],
       [ 0.42135815,  0.57426511,  0.23643609,  0.46349003,  0.02976759,
         0.08478938,  0.46221483,  0.83345666,  0.13337179,  0.54992466],
       [ 0.03034135,  0.20239425,  0.36740003,  0.17138623,  0.40327059,
         0.2116867 ,  0.61496291,  0.94980147,  0.50153656,  0.88357325]])

In [144]:
data_set[1] # second row

array([ 0.34508709,  0.55596429,  0.90256791,  0.55994242,  0.79408938,
        0.71954079,  0.52214751,  0.86999099,  0.26261972,  0.08448081])

In [145]:
data_set[1][0] # select row 1, then element 0 of that row

0.34508708657955289

In [146]:
data_set[1,0] # same element with a single (row, column) index

0.34508708657955289

#### Integer array indexing

When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:

In [147]:
a = np.array([[1,2], [3,4], [5,6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [148]:
print(a[[0,1,2],[0,1,0]])

[1 4 5]


In [149]:
print(np.array([a[0,0], a[1,1], a[2,0]])) # equivalent to above

[1 4 5]
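A common use of integer array indexing is picking one element from each row. The row and column index arrays are paired element by element, and the same expression can be used on the left of an assignment. The arrays `cols` and `rows` below are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
cols = np.array([0, 2, 0, 1])   # one column index per row
rows = np.arange(len(a))        # [0 1 2 3]
picked = a[rows, cols]          # pairs (0,0), (1,2), (2,0), (3,1)
print(picked)                   # [ 1  6  7 11]

a[rows, cols] += 10             # add 10 to exactly those four elements
print(a)
```

Because integer array indexing returns a copy, `picked` keeps its original values after `a` is modified.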


#### Boolean array indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [150]:
a = np.array([[1,2], [3,4], [5,6]])
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [151]:
bool_idx = (a > 2)
print(bool_idx)

[[False False]
 [ True  True]
 [ True  True]]


In [152]:
print(a[bool_idx])

[3 4 5 6]


In [153]:
print(a[a>2])

[3 4 5 6]
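Conditions can be combined with the elementwise operators `&` (and) and `|` (or); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. A boolean mask also works on the left of an assignment. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
between = a[(a > 2) & (a < 6)]  # elements greater than 2 and less than 6
print(between)                  # [3 4 5]

a[a > 4] = 0                    # set every element greater than 4 to zero
print(a)
```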


### Slicing

In [154]:
data_set[2:4] # 3rd and 4th rows

array([[ 0.81115699,  0.74206144,  0.14184639,  0.12022273,  0.44491771,
         0.93778794,  0.17722433,  0.49981591,  0.1410999 ,  0.96685005],
       [ 0.42135815,  0.57426511,  0.23643609,  0.46349003,  0.02976759,
         0.08478938,  0.46221483,  0.83345666,  0.13337179,  0.54992466]])

In [155]:
data_set[2:4,0] # first column of the 3rd and 4th rows

array([ 0.81115699,  0.42135815])

In [156]:
data_set[2:4,0:2] # first two columns of the 3rd and 4th rows

array([[ 0.81115699,  0.74206144],
       [ 0.42135815,  0.57426511]])

In [157]:
data_set[:,0] # first column of every row

array([ 0.03187512,  0.34508709,  0.81115699,  0.42135815,  0.03034135])
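As mentioned under integer array indexing, a slice is a view into the original array, so writing to it changes the original. Call `.copy()` when an independent array is needed. A sketch on a small fixed array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
sub = a[:2, 1:3]                 # view of rows 0-1, columns 1-2
sub[0, 0] = 77                   # also changes a[0, 1]
print(a[0, 1])                   # 77

independent = a[:2, 1:3].copy()  # .copy() detaches the data
independent[0, 0] = -1
print(a[0, 1])                   # still 77
```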

<a id="Steping"></a>

### Stepping

In [158]:
data_set[:,0:10:2] # 1st, 3rd, 5th, 7th, and 9th cols for all rows

array([[ 0.03187512,  0.25063379,  0.44207334,  0.82455503,  0.71200143],
       [ 0.34508709,  0.90256791,  0.79408938,  0.52214751,  0.26261972],
       [ 0.81115699,  0.14184639,  0.44491771,  0.17722433,  0.1410999 ],
       [ 0.42135815,  0.23643609,  0.02976759,  0.46221483,  0.13337179],
       [ 0.03034135,  0.36740003,  0.40327059,  0.61496291,  0.50153656]])

In [159]:
data_set[::] # start, stop and step all omitted: every row

array([[ 0.03187512,  0.12752271,  0.25063379,  0.94554735,  0.44207334,
         0.44410079,  0.82455503,  0.58782105,  0.71200143,  0.58822202],
       [ 0.34508709,  0.55596429,  0.90256791,  0.55994242,  0.79408938,
         0.71954079,  0.52214751,  0.86999099,  0.26261972,  0.08448081],
       [ 0.81115699,  0.74206144,  0.14184639,  0.12022273,  0.44491771,
         0.93778794,  0.17722433,  0.49981591,  0.1410999 ,  0.96685005],
       [ 0.42135815,  0.57426511,  0.23643609,  0.46349003,  0.02976759,
         0.08478938,  0.46221483,  0.83345666,  0.13337179,  0.54992466],
       [ 0.03034135,  0.20239425,  0.36740003,  0.17138623,  0.40327059,
         0.2116867 ,  0.61496291,  0.94980147,  0.50153656,  0.88357325]])

In [160]:
data_set[::2] # 1st, 3rd and 5th rows, all cols

array([[ 0.03187512,  0.12752271,  0.25063379,  0.94554735,  0.44207334,
         0.44410079,  0.82455503,  0.58782105,  0.71200143,  0.58822202],
       [ 0.81115699,  0.74206144,  0.14184639,  0.12022273,  0.44491771,
         0.93778794,  0.17722433,  0.49981591,  0.1410999 ,  0.96685005],
       [ 0.03034135,  0.20239425,  0.36740003,  0.17138623,  0.40327059,
         0.2116867 ,  0.61496291,  0.94980147,  0.50153656,  0.88357325]])
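The step follows the same `start:stop:step` rules as Python lists: a start and stop can be combined with any step, and a negative step walks backwards. A sketch on small fixed arrays, since `data_set` is random:

```python
import numpy as np

arr = np.arange(10)
print(arr[1:8:3])        # from 1, stop before 8, every 3rd: [1 4 7]
print(arr[::-1])         # reversed: [9 8 7 6 5 4 3 2 1 0]

grid = np.arange(12).reshape(3, 4)
print(grid[::-1, ::2])   # rows in reverse order, every other column
```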

## Array Math

In [161]:
x = np.array([[1,2], [3,4]])
y = np.array([[5,6], [7,8]])

print(x+y)
print(np.add(x,y))

[[ 6  8]
 [10 12]]
[[ 6  8]
 [10 12]]


In [162]:
print(x-y)
print(np.subtract(x,y))

[[-4 -4]
 [-4 -4]]
[[-4 -4]
 [-4 -4]]


In [163]:
print(x*y)
print(np.multiply(x,y))

[[ 5 12]
 [21 32]]
[[ 5 12]
 [21 32]]


In [164]:
print(x/y)
print(np.divide(x,y))

[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]
[[ 0.2         0.33333333]
 [ 0.42857143  0.5       ]]


In [165]:
print(np.sqrt(x))

[[ 1.          1.41421356]
 [ 1.73205081  2.        ]]


Note that, unlike MATLAB, `*` is elementwise multiplication, not matrix multiplication. Instead, use the `dot` function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply two matrices. `dot` is available both as a function in the numpy module and as an instance method of array objects:

In [166]:
v = np.array([9,10])
w = np.array([11,12])

# Inner product
print(v.dot(w))
print(np.dot(v,w))

219
219


In [167]:
# Matrix/vector product
print(x.dot(v))
print(np.dot(x,v))

[29 67]
[29 67]


In [168]:
# Matrix/matrix product
print(x.dot(y))
print(np.dot(x,y))

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]


In [169]:
# Transpose
print(x.T)

[[1 3]
 [2 4]]
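Transposing only swaps existing axes, so `.T` on a rank 1 array returns it unchanged. To get a true column or row vector, give the array a second axis first. A minimal sketch:

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)      # (3,): transposing a rank 1 array changes nothing

col = v.reshape(-1, 1)
print(col.shape)      # (3, 1): an explicit column vector
print(col.T.shape)    # (1, 3): now the transpose is a row vector
```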


The full list of mathematical functions is in the NumPy <a href="http://docs.scipy.org/doc/numpy/reference/routines.math.html">documentation</a>.

## Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:

In [170]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])
v = np.array([1,0,1])
print(x)
print(v)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[1 0 1]


In [171]:
vv = np.tile(v, (4,1)) # Stack 4 copies of v on top of each other
print(vv)

[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]


In [172]:
y = x + vv
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:

In [173]:
y = np.add(x,v) # v is broadcast across every row of x; same as x + v
print(y)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


Broadcasting two arrays together follows these rules:

- If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.
- The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.
- The arrays can be broadcast together if they are compatible in all dimensions.
- After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.
- In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.
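The rules above can be turned into a few lines of Python that compute the result shape of a broadcast. This is a sketch for illustration only: `broadcast_shape` is a hypothetical helper written here, not a NumPy function, and its answers are compared against what NumPy actually produces.

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Rule 1: prepend 1s to the shorter shape until both have the same length
    n = max(len(s1), len(s2))
    s1 = (1,) * (n - len(s1)) + tuple(s1)
    s2 = (1,) * (n - len(s2)) + tuple(s2)
    result = []
    for a, b in zip(s1, s2):
        # Rules 2-3: sizes must be equal, or one of them must be 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"incompatible sizes {a} and {b}")
        result.append(max(a, b))  # Rule 4: elementwise maximum
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))          # (4, 3): the x + v example above
print(broadcast_shape((3, 1), (2,)))          # (3, 2): an outer-product-style result
print((np.ones((3, 1)) + np.ones(2)).shape)   # NumPy agrees: (3, 2)
```

Shapes `(4, 3)` and `(4,)` are not compatible: after padding, the trailing sizes 3 and 4 differ and neither is 1, so both this helper and NumPy raise `ValueError`.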