# Short Numpy Tutorial

In [2]:
import numpy as np

## What is an ndarray?

The `ndarray` is the biggest contribution of numpy. An ndarray is

- a regular grid of N dimensions,
- homogeneous by default (all the elements have the same type),
- stored as a contiguous block of memory, with element types corresponding to machine types (8-bit ints, 32-bit floats, 64-bit longs, ...).
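The properties listed above can be checked directly on any array. The following is a small sketch; the array `grid` and its shape are made up for illustration and are not part of the tutorial's data.

```python
import numpy as np

# Hypothetical example array: 2 rows, 3 columns of 32-bit floats.
grid = np.zeros((2, 3), dtype=np.float32)

print(grid.dtype)                  # one element type shared by every cell
print(grid.itemsize)               # bytes per element (4 for float32)
print(grid.nbytes)                 # total bytes: 2 * 3 * 4
print(grid.flags["C_CONTIGUOUS"])  # a freshly created array is contiguous
```

Because every element has the same fixed size, numpy can locate any element by simple arithmetic on the memory address, which is a large part of why array operations are fast.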

## Building an array (inline)

We can build an array from Python lists:

In [3]:
arr = np.array([
    [1.2, 2.3, 4.0],
    [1.2, 3.4, 5.2],
    [0.0, 1.0, 1.3],
    [0.0, 1.0, 2e-1]])
print(arr)

[[1.2 2.3 4. ]
 [1.2 3.4 5.2]
 [0.  1.  1.3]
 [0.  1.  0.2]]


### Inspecting array properties

In [4]:
print(arr.dtype)
print(arr.ndim)  # number of dimensions: 2 (rows and columns)
print(arr.shape)

float64
2
(4, 3)


This array has dtype `float64` (at least on my computer, probably on yours too), it has 2 dimensions, and its shape is 4 rows by 3 columns.

When constructing an array, we can explicitly specify the type:

In [5]:
iarr = np.array([1, 2, 3], dtype=np.uint8)

Arithmetic operations on the array **respect the type and can include rounding and overflow**!

In [5]:
arr *= 2.5
iarr *= 3
print(arr)
print(iarr)

[[  3.     5.75  10.  ]
 [  3.     8.5   13.  ]
 [  0.     2.5    3.25]
 [  0.     2.5    0.5 ]]
[3 6 9]
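The values in the cell above are too small to overflow, so here is a hedged sketch that does trigger it. The arrays `small` and `truncated` are invented for this illustration.

```python
import numpy as np

# uint8 holds 0..255; larger results wrap around modulo 256.
small = np.array([100, 200, 250], dtype=np.uint8)
small *= 3                      # in-place, so the result stays uint8
print(small)                    # 300, 600, 750 reduced modulo 256

# Converting floats to an integer type drops the fractional part.
truncated = np.array([1.9, -1.9]).astype(np.int32)
print(truncated)
```

Integer array arithmetic wraps silently, so it is worth choosing a dtype wide enough for the largest value you expect.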


### Boolean operations

An important subset of operations with numpy arrays concerns using logical operators to build boolean arrays. For example:


In [6]:
is_greater_one = (arr >= 1.)
print(is_greater_one)

[[ True  True  True]
 [ True  True  True]
 [False  True  True]
 [False  True False]]
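Comparisons can be combined element-wise, and the resulting boolean array can be used to select elements. A minimal sketch, using a made-up array `data` rather than the tutorial's `arr`:

```python
import numpy as np

data = np.array([[3.0, 5.75, 10.0],
                 [0.0, 2.5, 0.5]])

# Use & (and), | (or), ~ (not) for element-wise logic. The parentheses
# are required because these operators bind more tightly than comparisons.
in_range = (data >= 1.0) & (data < 6.0)
print(in_range)

# Indexing with a boolean array returns a 1-D array of the selected elements.
print(data[in_range])
```

Python's `and` / `or` keywords do not work element-wise on arrays, which is why the bitwise operators are used here.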


### Slicing & Dicing

We can use Python's `[]` operator to slice and dice the array:

In [7]:
print(arr[0,0]) # First row, first column
print(arr[1]) # The whole second row
print(arr[:,2]) # The third column

3.0
[  3.    8.5  13. ]
[ 10.    13.     3.25   0.5 ]


### Slices are views

Slices share memory with the original array!

In [8]:
print(arr)
print("Before: {}".format(arr[1,0]))
view = arr[1]
view[0] += 100
print("After: {}".format(arr[1]))

[[  3.     5.75  10.  ]
 [  3.     8.5   13.  ]
 [  0.     2.5    3.25]
 [  0.     2.5    0.5 ]]
Before: 3.0
After: [ 103.     8.5   13. ]
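When an independent array is needed, an explicit `.copy()` breaks the link. The sketch below uses a hypothetical array `base` to contrast the two cases; `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

base = np.arange(6.0).reshape(2, 3)

row_view = base[0]          # basic slicing: a view onto base
row_copy = base[0].copy()   # explicit copy: independent memory

print(np.shares_memory(base, row_view))
print(np.shares_memory(base, row_copy))

row_copy[0] = -1.0          # does not touch base
row_view[1] = 99.0          # writes through to base
print(base)
```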


#### Visual illustration of slicing

In [9]:
a = np.array([
       [ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

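row_sums = a.sum(axis=1)  # named row_sums to avoid shadowing the built-in sum()
# row_sums has shape (6,), so broadcasting aligns it with the last axis:
# column j is divided by row_sums[j]. For per-row percentages, divide by
# a.sum(axis=1, keepdims=True) instead.
percentage = 100 * a / row_sums
percentage
#a[0,3:5]
#a[4:,4:]
#a[2::2,2::2]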

array([[   0.        ,    1.33333333,    1.48148148,    1.53846154,
           1.56862745,    1.58730159],
       [  66.66666667,   14.66666667,    8.88888889,    6.66666667,
           5.49019608,    4.76190476],
       [ 133.33333333,   28.        ,   16.2962963 ,   11.79487179,
           9.41176471,    7.93650794],
       [ 200.        ,   41.33333333,   23.7037037 ,   16.92307692,
          13.33333333,   11.11111111],
       [ 266.66666667,   54.66666667,   31.11111111,   22.05128205,
          17.25490196,   14.28571429],
       [ 333.33333333,   68.        ,   38.51851852,   27.17948718,
          21.17647059,   17.46031746]])
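The commented-out slices in the cell above can be tried on the same 6x6 grid. This sketch rebuilds the grid with a list comprehension (an illustrative shortcut, not how the cell above defines it) and evaluates each slice.

```python
import numpy as np

# Same 6x6 grid as above: entry (i, j) holds 10*i + j.
a = np.array([[10 * i + j for j in range(6)] for i in range(6)])

print(a[0, 3:5])       # row 0, columns 3 and 4
print(a[4:, 4:])       # bottom-right 2x2 corner
print(a[2::2, 2::2])   # every second row and column, starting at index 2
```

Because each value encodes its own row and column, it is easy to see which elements a slice picked out.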

## Basic functions on arrays

In [10]:
print(arr)
arr.mean()

[[   3.      5.75   10.  ]
 [ 103.      8.5    13.  ]
 [   0.      2.5     3.25]
 [   0.      2.5     0.5 ]]


12.666666666666666

Also available: `max`, `min`, `sum`, `ptp` (peak-to-peak, i.e., the difference between the maximum and minimum values).

These functions can also work *axis-wise*:

In [11]:
arr.mean(axis=0)

array([ 26.5   ,   4.8125,   6.6875])
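To make the axis convention concrete, here is a small sketch on a made-up 2x3 array `m`: the axis you pass is the one that gets collapsed. The function form `np.ptp` is used because, as far as I know, recent NumPy releases no longer provide it as an array method.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))      # collapse the rows: one value per column
print(m.sum(axis=1))      # collapse the columns: one value per row
print(m.max(axis=1))      # largest value in each row
print(np.ptp(m, axis=0))  # max - min within each column
```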

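In [12]:
arr.mean(axis=1)

array([  6.25      ,  41.5       ,   1.91666667,   1.        ])

An important trick is to combine logical operations with aggregation functions. For example, the mean of a boolean array is the fraction of its elements that are `True`: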

In [13]:
is_greater_one = (arr > 1)
print(is_greater_one.mean())

0.75
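The same idea extends to other reductions, since `True` counts as 1 and `False` as 0. A minimal sketch with an invented array `values`:

```python
import numpy as np

values = np.array([0.5, 1.5, 2.5, 0.0])
mask = values > 1

print(mask.sum())              # number of elements greater than 1
print(mask.mean())             # fraction of elements greater than 1
print(mask.any(), mask.all())  # does at least one match? do all match?
print(values[mask].mean())     # mean of only the matching values
```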


### Broadcasting

<img src="./media/02/Broadcasting.jpg" alt="Broadcasting example" title="Broadcasting example" style="width: 500px;"/>

You can often perform operations between arrays of different shapes; numpy *broadcasts* the smaller array across the larger one:

In [14]:
print(arr)
print("Now adding [1,1,0] to *every row*")
print()
arr += np.array([1,1,0])
print(arr)

[[   3.      5.75   10.  ]
 [ 103.      8.5    13.  ]
 [   0.      2.5     3.25]
 [   0.      2.5     0.5 ]]
Now adding [1,1,0] to *every row*

[[   4.      6.75   10.  ]
 [ 104.      9.5    13.  ]
 [   1.      3.5     3.25]
 [   1.      3.5     0.5 ]]
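A rough way to think about the rule: shapes are compared from the last axis backwards, and two axes are compatible when they are equal or one of them has length 1. The sketch below uses made-up arrays (`table`, `per_column`, `per_row`) to show a case per axis and one that fails.

```python
import numpy as np

table = np.zeros((4, 3))
per_column = np.array([1.0, 1.0, 0.0])                # shape (3,)
per_row = np.array([[10.0], [20.0], [30.0], [40.0]])  # shape (4, 1)

print((table + per_column).shape)   # (3,) lines up with the last axis
print((table + per_row).shape)      # (4, 1) stretches across the columns
combined = table + per_column + per_row
print(combined)

# Shapes that cannot be aligned raise an error instead of guessing.
try:
    table + np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,) vs last axis of 3
except ValueError as err:
    print("cannot broadcast:", err)
```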


The exact [rules of how broadcasting works](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) are a bit complex to explain, but it generally works as expected. For example, if each row of your data is a sample and each column is a different type of measurement, you can easily remove the per-column mean like this:

In [15]:
print(arr.mean(0))
arr -= arr.mean(0)
print(arr.mean(0))

[ 27.5      5.8125   6.6875]
[ 0.  0.  0.]
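Subtracting column means works directly because a shape `(3,)` array lines up with the last axis. For per-row operations, the reduced result needs shape `(n, 1)` instead, which `keepdims=True` provides. A sketch with a hypothetical `counts` array, converting each row to percentages of its row total:

```python
import numpy as np

counts = np.array([[1.0, 3.0],
                   [2.0, 2.0],
                   [0.0, 5.0]])

# keepdims=True keeps the reduced axis with length 1, giving shape (3, 1),
# which broadcasts across the columns of each row.
row_totals = counts.sum(axis=1, keepdims=True)
row_percent = 100 * counts / row_totals

print(row_totals.shape)
print(row_percent)
print(row_percent.sum(axis=1))   # each row adds up to 100
```

Without `keepdims=True`, the totals would have shape `(3,)` and the division here would fail to broadcast against the two columns; on a square array it would silently divide along the wrong axis.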
