# NumPy - Numerical Python

## What is it?
- It is the fundamental package for scientific computing in Python.
- It provides multidimensional array objects.
- Arrays are densely packed and of **homogeneous type**, so they benefit from locality of reference.
- Many operations on NumPy arrays are **implemented in C**, which avoids the general cost of loops in Python.
- The speedup over pure Python can reach a few orders of magnitude for some operations.
- sklearn, pandas, scipy, etc. all use NumPy as a building block.
- Check out this post for more details: [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).
- Arrays have a fixed length. They cannot be extended in place: "extending" an array creates a new array and discards the old one.
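
The fixed-length and homogeneous-type points above can be checked directly. This is a minimal sketch; the variable names `arr`, `extended` and `mixed` are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append does not grow `arr` in place; it allocates and returns a new array.
extended = np.append(arr, 4)
print(arr.shape, extended.shape)   # (3,) (4,)

# All elements share one dtype: mixing ints and a float upcasts everything to float.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                 # float64
```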

## How to create a NumPy array

In [7]:
import numpy as np

### 1-dimensional array

In [4]:
myList = [1,2,3,4]
myList

[1, 2, 3, 4]

In [5]:
np.array(myList)

array([1, 2, 3, 4])

### 2-dimensional array (Matrix)

In [10]:
# list of lists
myMat = [[1,2,3],[4,5,6]]
myMat

[[1, 2, 3], [4, 5, 6]]

In [9]:
np.array(myMat)

array([[1, 2, 3],
       [4, 5, 6]])

### Built-In Methods

In [11]:
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
# range returns a lazy range object, not a list (differs from Python 2)
list(range(0,10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [20]:
list(range(0,10,2))

[0, 2, 4, 6, 8]

In [25]:
# range does not accept a non-integer step size (raises a TypeError)
#range(0,10,0.5)

In [29]:
# np.arange accepts a floating-point step size
np.arange(0,10,0.5)

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
        5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])

In [28]:
# create array of length 3 with all elements zero
np.zeros(3)

array([ 0.,  0.,  0.])

In [34]:
# takes a tuple giving the shape of the array as its argument
np.zeros((2,2))

array([[ 0.,  0.],
       [ 0.,  0.]])

In [38]:
# creates an array of 9 evenly spaced values from 0 to 2 (both endpoints included)
np.linspace(0,2,9)

array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
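
To see how `np.linspace` relates to `np.arange`, it may help to look at the spacing it picks. The sketch below uses the `retstep` option of `np.linspace`; the names `values` and `step` are illustrative.

```python
import numpy as np

# retstep=True additionally returns the spacing between consecutive samples.
values, step = np.linspace(0, 2, 9, retstep=True)
print(step)   # (2 - 0) / (9 - 1) = 0.25

# np.arange excludes the stop value, np.linspace includes it by default.
last_arange = np.arange(0, 2, 0.25)[-1]
last_linspace = values[-1]
print(last_arange, last_linspace)
```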

In [39]:
# identity matrix
np.eye(4)

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

### random

#### uniformly distributed

In [44]:
np.random.rand(5)

array([ 0.32500228,  0.57522857,  0.06786317,  0.21465654,  0.59846832])

In [45]:
np.random.rand(5,5)

array([[ 0.44158456,  0.16352295,  0.63666695,  0.70296788,  0.33978722],
       [ 0.83317187,  0.90850827,  0.29869523,  0.68684881,  0.02858418],
       [ 0.4821356 ,  0.19803163,  0.78104571,  0.40859967,  0.54422952],
       [ 0.89485177,  0.4961735 ,  0.30999902,  0.54726678,  0.06542967],
       [ 0.00523696,  0.33850881,  0.85550465,  0.43157827,  0.24174502]])

#### normally distributed
samples from the standard normal distribution: $\mu=0, \sigma=1$

In [46]:
np.random.randn(5)

array([-0.58051503,  0.41985119, -0.50242887,  0.89899488,  1.08071087])

There are many more built-in distributions like gamma, poisson, binomial, etc.
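
As a rough illustration of those other distributions, here is a sketch using NumPy's newer `Generator` API (`np.random.default_rng`) rather than the legacy `np.random.*` functions used above. The seed and all distribution parameters are arbitrary choices for this example.

```python
import numpy as np

# Assumption: NumPy >= 1.17, which provides default_rng; the seed 0 is arbitrary.
rng = np.random.default_rng(0)

gamma_samples = rng.gamma(shape=2.0, scale=1.0, size=5)     # continuous, positive
poisson_samples = rng.poisson(lam=3.0, size=5)              # non-negative integers
binomial_samples = rng.binomial(n=10, p=0.5, size=5)        # integers in [0, 10]

print(gamma_samples)
print(poisson_samples)
print(binomial_samples)
```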

## Inspect your array

In [49]:
myArr = np.random.rand(2,4)
myArr

array([[ 0.31143535,  0.42312665,  0.32745027,  0.50898665],
       [ 0.23133176,  0.68460498,  0.88095821,  0.74177099]])

In [50]:
myArr.shape

(2, 4)

In [51]:
myArr.ndim

2

In [52]:
myArr.size

8

In [57]:
myArr.dtype

dtype('float64')

In [54]:
myArr.astype(int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0]])
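
The zeros above appear because `astype(int)` drops the fractional part instead of rounding. A small sketch with made-up values shows the difference from rounding first:

```python
import numpy as np

values = np.array([0.9, 1.7, -1.7])

# astype(int) truncates toward zero and returns a new array.
truncated = values.astype(int)
# np.rint rounds to the nearest integer before the conversion.
rounded = np.rint(values).astype(int)

print(truncated)   # [ 0  1 -1]
print(rounded)     # [ 1  2 -2]
```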

### Useful mathematical functions

In [59]:
a = np.random.rand(2,3)
a

array([[ 0.21659611,  0.79690572,  0.33964258],
       [ 0.817059  ,  0.24172437,  0.30365601]])

In [61]:
b = np.random.rand(2,3)
b

array([[ 0.87320455,  0.94781186,  0.77873384],
       [ 0.23927622,  0.42592952,  0.51555454]])

In [62]:
a + b

array([[ 1.08980066,  1.74471758,  1.11837642],
       [ 1.05633522,  0.66765389,  0.81921055]])

In [65]:
# concatenates the two lists -> basic arithmetic works differently on lists
a.tolist() + b.tolist()

[[0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.8732045454926779, 0.9478118601015023, 0.7787338413965581],
 [0.2392762218506147, 0.4259295226976141, 0.5155545415713951]]

In [67]:
a * 10

array([[ 2.16596112,  7.96905716,  3.3964258 ],
       [ 8.17058997,  2.41724367,  3.03656013]])

In [68]:
# repeats the list 10 times instead of multiplying its elements
a.tolist() * 10

[[0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985],
 [0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436

In [69]:
a / 10

array([[ 0.02165961,  0.07969057,  0.03396426],
       [ 0.0817059 ,  0.02417244,  0.0303656 ]])

In [70]:
a / b

array([[ 0.24804739,  0.84078471,  0.43614719],
       [ 3.41471038,  0.56752198,  0.58898912]])

In [71]:
np.divide(a,b)

array([[ 0.24804739,  0.84078471,  0.43614719],
       [ 3.41471038,  0.56752198,  0.58898912]])

In [74]:
c = np.random.rand(3,2)
c

array([[ 0.70376346,  0.33563841],
       [ 0.03007559,  0.09628492],
       [ 0.95985764,  0.330878  ]])

In [76]:
# matrix product: for 2-D arrays np.dot performs matrix multiplication
np.dot(a,c)

array([[ 0.50240837,  0.26180823],
       [ 0.87375281,  0.39798389]])
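
A sketch of how `np.dot` behaves for different dimensions, using small integer arrays so the result can be checked by hand. The names `left` and `right` are illustrative, not from the notebook.

```python
import numpy as np

left = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]]
right = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

# 2-D inputs: (2, 3) times (3, 2) gives a (2, 2) matrix product.
product = np.dot(left, right)
# The @ operator computes the same matrix product.
same_product = left @ right
print(product)

# 1-D inputs: np.dot is the ordinary inner product of two vectors.
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))
print(inner)   # 1*4 + 2*5 + 3*6 = 32
```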

In [87]:
# outer product
x = np.array([2,4,4])
y = np.array([1,2,3])

In [89]:
# A_ij = x_i * y_j
mat = np.outer(x,y)
mat

array([[ 2,  4,  6],
       [ 4,  8, 12],
       [ 4,  8, 12]])

### aggregations

In [90]:
mat.sum()

60

In [94]:
# column-wise
mat.sum(axis=0)

array([10, 20, 30])

In [95]:
# row-wise
mat.sum(axis=1)

array([12, 24, 24])

Summing over axis 0 means summing over the first index while keeping the other indices fixed. For a matrix $A$ this gives a vector $s$ with

$$s_j = \sum_i A_{ij}$$

In [98]:
tensor = np.random.rand(2,3,4)
tensor

array([[[ 0.46049735,  0.28459644,  0.25116909,  0.3359915 ],
        [ 0.28228591,  0.62887908,  0.40399372,  0.11138678],
        [ 0.98106258,  0.83669902,  0.00897513,  0.26268782]],

       [[ 0.66239688,  0.15276772,  0.61808073,  0.29216972],
        [ 0.16930657,  0.53448272,  0.58553405,  0.01580855],
        [ 0.60728688,  0.91084975,  0.26826993,  0.2631071 ]]])

In [101]:
tensor.sum(axis=0)
# shape (3,4)

array([[ 1.12289422,  0.43736416,  0.86924982,  0.62816123],
       [ 0.45159248,  1.1633618 ,  0.98952777,  0.12719534],
       [ 1.58834945,  1.74754878,  0.27724506,  0.52579492]])

In [102]:
tensor.sum(axis=2)
# shape(2,3)

array([[ 1.33225438,  1.42654549,  2.08942455],
       [ 1.72541505,  1.3051319 ,  2.04951366]])
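
One way to remember the axis rule: the axis that is summed over disappears from the shape. The sketch below checks this on the outer-product matrix from above and on an array of ones with shape `(2, 3, 4)`; `keepdims=True` keeps the summed axis with length 1.

```python
import numpy as np

A = np.array([[2, 4, 6],
              [4, 8, 12],
              [4, 8, 12]])

# s_j = sum_i A_ij, written once with NumPy and once as an explicit loop.
col_sums = A.sum(axis=0)
explicit = [sum(A[i, j] for i in range(3)) for j in range(3)]
print(col_sums, explicit)

# The summed axis is removed from the shape unless keepdims=True is given.
ones = np.ones((2, 3, 4))
print(ones.sum(axis=0).shape)                 # (3, 4)
print(ones.sum(axis=1).shape)                 # (2, 4)
print(ones.sum(axis=2).shape)                 # (2, 3)
print(ones.sum(axis=2, keepdims=True).shape)  # (2, 3, 1)
```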

## Indexing

In [106]:
listA = a.tolist()
listA

[[0.21659611191303763, 0.7969057164586218, 0.33964258025316585],
 [0.8170589973923345, 0.24172436737646108, 0.3036560132846985]]

In [108]:
listA[0][2]

0.33964258025316585

In [109]:
a[0][2]

0.33964258025316585

In [110]:
# better: a single indexing operation with both indices
a[0,2]

0.33964258025316585

### slicing

In [113]:
myMat = np.arange(1,10).reshape(3,3)
myMat

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [116]:
myMat[0,0:2]

array([1, 2])

In [117]:
myMat[0,:2]

array([1, 2])

In [120]:
myMat[0,:-1]

array([1, 2])

In [122]:
myMat[:2,:2]

array([[1, 2],
       [4, 5]])

### Reference vs. Copy
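Almost everything in Python is an object. Objects are in general passed by reference (loosely comparable to pointers in C), not copied. Slicing a NumPy array likewise returns a view on the original data instead of a copy.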

Let me show you an example:

In [123]:
myMat

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [129]:
mySlice = myMat[:2,:2]
mySlice

array([[1, 2],
       [4, 5]])

In [131]:
mySlice[:] = 1
mySlice

array([[1, 1],
       [1, 1]])

In [132]:
myMat

array([[1, 1, 3],
       [1, 1, 6],
       [7, 8, 9]])

In [133]:
mySlice = myMat.copy()

In [134]:
mySlice[:] = 2

In [135]:
mySlice

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [136]:
myMat

array([[1, 1, 3],
       [1, 1, 6],
       [7, 8, 9]])
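
Whether two arrays share data can also be tested directly. A minimal sketch using `np.shares_memory`; the names `mat`, `view` and `copied` are illustrative.

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)

view = mat[:2, :2]            # basic slicing returns a view on mat's data
copied = mat[:2, :2].copy()   # .copy() allocates independent data

print(np.shares_memory(mat, view))     # True
print(np.shares_memory(mat, copied))   # False

view[0, 0] = 100     # visible through mat
copied[0, 1] = -1    # not visible through mat
print(mat[0, 0], mat[0, 1])            # 100 2
```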

Do not copy large objects more often than necessary. Every copy allocates new memory, so keep an eye on your memory usage.

## Conditional Indexing

In [139]:
myMat > 1

array([[False, False,  True],
       [False, False,  True],
       [ True,  True,  True]], dtype=bool)

In [140]:
myMat[myMat > 1]

array([3, 6, 7, 8, 9])

In [141]:
myMat[(myMat > 1) & (myMat < 8)]

array([3, 6, 7])

We will use conditional indexing very often!
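
Boolean masks can be used for assignment and counting as well as selection. A short sketch on the same matrix values; `mask`, `cleaned` and `replaced` are names made up for this example.

```python
import numpy as np

mat = np.array([[1, 1, 3],
                [1, 1, 6],
                [7, 8, 9]])

mask = (mat > 1) & (mat < 8)

# Counting: True counts as 1 when summed.
n_selected = mask.sum()

# Assignment through a mask changes only the selected elements.
cleaned = mat.copy()
cleaned[mask] = 0

# np.where picks element-wise between two alternatives.
replaced = np.where(mat > 1, mat, -1)

print(n_selected)
print(cleaned)
print(replaced)
```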

### additional information

In [143]:
# reshape is often used to turn a 1-D array into a 2-D array (e.g. a single row or column)
vec = np.arange(1,10)
vec

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [144]:
vec.shape

(9,)

In [146]:
vec.reshape(9,1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

In [148]:
vec.reshape(1,9)

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
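
Two common shortcuts for the same reshapes, shown as a sketch: `-1` lets NumPy infer one dimension from the array size, and `np.newaxis` inserts an axis of length 1 while indexing.

```python
import numpy as np

vec = np.arange(1, 10)

column = vec.reshape(-1, 1)    # -1 is inferred as 9 -> shape (9, 1)
row = vec[np.newaxis, :]       # new leading axis -> shape (1, 9)

print(column.shape, row.shape)   # (9, 1) (1, 9)
```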

In [151]:
myMat.argmax()

8

In [152]:
myMat.argmin()

0
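
Note that `argmax` and `argmin` return an index into the flattened array. A sketch of converting it back into a (row, column) position with `np.unravel_index`, using the same matrix values:

```python
import numpy as np

mat = np.array([[1, 1, 3],
                [1, 1, 6],
                [7, 8, 9]])

flat_index = mat.argmax()                            # position in the flattened array
row, col = np.unravel_index(flat_index, mat.shape)   # back to 2-D coordinates
print(flat_index, row, col)                          # 8 2 2

# With an axis argument, argmax works per column (axis=0) or per row (axis=1).
per_column = mat.argmax(axis=0)
print(per_column)                                    # [2 2 2]
```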

### Speed

In [12]:
%%time
sum(list(range(10**6)))

CPU times: user 16 ms, sys: 12 ms, total: 28 ms
Wall time: 29.6 ms


499999500000

In [13]:
%%time
np.sum(np.arange(10**6))

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 1.68 ms


499999500000
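
Outside a notebook, the same comparison can be sketched with the standard-library `timeit` module. The number of repetitions is an arbitrary choice, and the measured times depend on the machine, so none are shown here.

```python
import timeit
import numpy as np

n = 10**6

# Total time for 5 repetitions of each variant, in seconds.
python_time = timeit.timeit(lambda: sum(range(n)), number=5)
# dtype=np.int64 is set explicitly so the sum cannot overflow a 32-bit default integer.
numpy_time = timeit.timeit(lambda: np.arange(n, dtype=np.int64).sum(), number=5)

print(f"pure Python: {python_time:.4f} s, NumPy: {numpy_time:.4f} s")

# Both variants compute the same value.
python_result = sum(range(n))
numpy_result = int(np.arange(n, dtype=np.int64).sum())
```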