# Data Computing with Python

### Quick start to NumPy

#### 1. Creating Arrays
With NumPy, we can create multi-dimensional arrays:

In [164]:
import numpy as np

# Create a 1-D Array
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

In [165]:
# Create a 2-D Array
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [166]:
# Create a 3-D Array
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
arr

array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])
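To confirm how many dimensions each of these arrays has, the `ndim` attribute can be checked. The following is a minimal sketch; the names `one_d`, `two_d` and `three_d` are illustrative and do not appear in the cells above.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5])
two_d = np.array([[1, 2, 3], [4, 5, 6]])
three_d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

# ndim reports the number of dimensions (axes) of each array
dims = (one_d.ndim, two_d.ndim, three_d.ndim)
print(dims)  # (1, 2, 3)
```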

#### 2. Array Indexing
Indexing a NumPy array works much like indexing a Python list, with indices starting at 0:

In [167]:
arr = np.array([1, 2, 3, 4])

# Get the first element from the array
arr[0]

1

In [168]:
# Get the third and fourth elements from the array and sum them
arr[2] + arr[3]

7
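As with Python lists, negative indices count from the end of the array. A short sketch under that assumption; the names `last` and `second_last` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Negative indices count backwards from the end of the array
last = arr[-1]         # 4
second_last = arr[-2]  # 3
print(last + second_last)  # 7
```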

#### 3. Array Slicing
We can get a slice of an array with the syntax [start:end]. The element at index start is included, and the element at index end is not.

In [169]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

# Slice elements from index 1 up to, but not including, index 5
arr[1:5]

array([2, 3, 4, 5])
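Slices also accept an optional step, [start:end:step], and any of the three parts can be omitted. One difference from Python lists is worth knowing: a basic slice of a NumPy array is a view on the same data, not a copy. The sketch below illustrates both points; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

every_other = arr[::2]  # start and end omitted, step of 2
last_three = arr[-3:]   # from the third-last element to the end

# A basic slice is a view: writing to it also changes the original array
view = arr[1:5]
view[0] = 99
print(arr[1])  # 99
```

If an independent copy is needed, `arr[1:5].copy()` can be used instead.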

#### 4. Array Shape
The shape of an array is the number of elements in each dimension. We can get the current shape of an array like this:

In [170]:
# Create a 2-D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Get the shape
arr.shape
# (2, 4) means the array has two dimensions: the first has 2 elements and the second has 4.

(2, 4)
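The shape is related to two other attributes: `ndim` is the length of the shape tuple, and `size` is the product of its entries. A minimal sketch, with `rows` and `cols` as illustrative names:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# The shape tuple can be unpacked like any other tuple
rows, cols = arr.shape

print(arr.ndim)  # 2, the number of entries in arr.shape
print(arr.size)  # 8, the total number of elements (rows * cols)
```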

#### 5. Reshaping arrays
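Reshaping changes the shape of an array without changing its data, so the new shape must hold the same number of elements. For example, reshape an array from 1-D to 2-D: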

In [171]:
# Create a 1-D array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [172]:
# Convert the 1-D array with 12 elements into a 2-D array. 
# The outermost dimension will have 4 arrays, each with 3 elements
new_arr = arr.reshape(4, 3)
new_arr

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
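One dimension can be given as -1, in which case NumPy infers it from the array's size. If the requested shape cannot hold exactly the same number of elements, `reshape` raises a `ValueError`. A small sketch of both cases; the names `inferred` and `error_message` are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)  # 12 elements: 1 through 12

# -1 asks NumPy to work out that dimension: 12 elements / 3 columns = 4 rows
inferred = arr.reshape(-1, 3)
print(inferred.shape)  # (4, 3)

# 5 * 3 = 15 does not match the 12 available elements, so this fails
error_message = None
try:
    arr.reshape(5, 3)
except ValueError as exc:
    error_message = str(exc)
    print("reshape failed:", error_message)
```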

### Strategies for speeding up code with NumPy
#### 1. Ufuncs
Ufuncs, short for universal functions, are functions that operate element by element on entire arrays.

For example, suppose we want to add the corresponding elements of two lists, a and b.

In [173]:
# Without NumPy, we can use the built-in zip() function:
a = list(range(1001))             # [0, 1, 2, 3, 4..., 999, 1000]
b = list(reversed(range(1001)))   # [1000, 999, 998..., 1, 0]

c = [i+j for i, j in zip(a, b)]  # returns [1000, 1000..., 1000, 1000]

In [174]:
# Instead of looping over the lists, we can use NumPy's add() ufunc:
a = np.arange(1001)           # array([0, 1, 2, ..., 1000])
b = np.arange(1000, -1, -1)   # array([1000, 999, ..., 0])

c = np.add(a, b)
c

array([1000, 1000, 1000, ..., 1000, 1000, 1000])
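The arithmetic operators on NumPy arrays are shorthand for the corresponding ufuncs, so `a + b` gives the same result as `np.add(a, b)`. The sketch below checks this and shows one more ufunc, `np.multiply`; the result names are illustrative.

```python
import numpy as np

a = np.arange(1001)  # 0, 1, ..., 1000
b = a[::-1]          # 1000, 999, ..., 0

c_func = np.add(a, b)  # explicit ufunc call
c_op = a + b           # the + operator dispatches to the same ufunc

print(np.array_equal(c_func, c_op))  # True

# Other ufuncs follow the same element-by-element pattern
squares = np.multiply(a, a)
```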

Let’s compare the execution time:

In [175]:
a = list(range(1000))            
b = list(reversed(range(1000)))  
%timeit [i+j for i, j in zip(a, b)]

52.4 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [176]:
a = np.array(list(range(1000)))
b = np.array(list(reversed(range(1000))))

%timeit np.add(a, b)

832 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


That makes the NumPy version about 63x faster (52.4 µs vs. 832 ns)!
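`%timeit` is an IPython magic and is only available in a notebook or IPython session. To repeat the comparison in a plain Python script, the standard-library `timeit` module can be used instead. This is a rough sketch; the variable names and the choice of 1,000 repetitions are assumptions, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

a_list = list(range(1000))
b_list = list(reversed(range(1000)))
a_arr = np.array(a_list)
b_arr = np.array(b_list)

n = 1000  # number of repetitions per measurement (arbitrary choice)

# timeit.timeit returns the total time in seconds for n calls
t_list = timeit.timeit(lambda: [i + j for i, j in zip(a_list, b_list)], number=n)
t_numpy = timeit.timeit(lambda: np.add(a_arr, b_arr), number=n)

print(f"list comprehension: {t_list / n * 1e6:.2f} µs per call")
print(f"np.add:             {t_numpy / n * 1e6:.2f} µs per call")
```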

#### 2. Aggregations
Aggregations are functions that summarize the values in an array (e.g. min, max, sum, mean).

Let’s see how long Python's built-in min() takes to find the minimum value of a list of 100k random values.

In [177]:
# With Python built-in min()
from random import random
ls = [random() for i in range(100000)]
%timeit min(ls)

1.3 ms ± 73.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [178]:
# With NumPy's aggregate method min()
arr = np.array(ls)
%timeit arr.min()

35.9 µs ± 686 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


NumPy gives about a 36x speedup (1.3 ms vs. 35.9 µs) without writing a single loop!
In addition, aggregations also work on multi-dimensional arrays:

In [179]:
# Create a 3x5 array with random numbers
arr = np.random.randint(0, 10, (3, 5))
arr

array([[8, 6, 7, 2, 5],
       [7, 4, 3, 1, 9],
       [1, 8, 9, 0, 0]])

In [180]:
# Get the sum of the entire array
arr.sum()

70

In [181]:
# Sum each column (axis=0 collapses the rows, leaving one value per column)
arr.sum(axis=0)

array([16, 18, 19,  3, 14])

In [182]:
# Sum each row (axis=1 collapses the columns, leaving one value per row)
arr.sum(axis=1)

array([28, 24, 18])
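The `axis` argument works the same way for the other aggregations. The sketch below reuses the values printed above (hard-coded here so the result is reproducible, since the original array was random); the result names are illustrative.

```python
import numpy as np

# Same values as the random 3x5 array shown above
arr = np.array([[8, 6, 7, 2, 5],
                [7, 4, 3, 1, 9],
                [1, 8, 9, 0, 0]])

col_min = arr.min(axis=0)    # smallest value in each column
row_max = arr.max(axis=1)    # largest value in each row
row_mean = arr.mean(axis=1)  # mean of each row

print(col_min)  # [1 4 3 0 0]
print(row_max)  # [8 9 9]
```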

#### 3. Slicing, Masking and Fancy indexing
##### Masking
With masks, we can index an array with another array of boolean values.

For example, when we index the array arr with a boolean mask, only the elements that line up with True in the mask are returned:

In [183]:
# Create an array
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

In [184]:
# Create a boolean mask
mask = np.array([False, False, True, True, True])
arr[mask]

array([3, 4, 5])

Masks are often constructed with comparison and logical operators, for example:

In [185]:
# Create an array
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

In [188]:
# Create a mask
mask = (arr % 2 == 0) | (arr > 4)
arr[mask]

array([2, 4, 5])

Only the values that meet the criteria are returned.
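A mask is itself an ordinary boolean array, so it can be counted, inverted with `~`, or used on the left-hand side of an assignment. A minimal sketch; the threshold of 3 and the names `n_matches`, `rest` and `zeroed` are illustrative choices.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mask = arr > 3  # array([False, False, False,  True,  True])

# True counts as 1, so summing the mask counts the matching elements
n_matches = mask.sum()

# ~ inverts the mask, selecting the elements that do not match
rest = arr[~mask]

# Assigning through a mask changes only the matching elements
zeroed = arr.copy()
zeroed[mask] = 0

print(n_matches)  # 2
```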
##### Fancy indexing
The idea of fancy indexing is simple: it lets us access multiple array elements at once by passing a list or array of indices.

For example, we can get the elements at indices 0 and 1 of the array arr by passing a list of those indices:

In [189]:
arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

In [190]:
index_to_select = [0, 1]
arr[index_to_select]

array([1, 2])
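The indices do not have to be in order or unique, and unlike a basic slice, the result of fancy indexing is a copy rather than a view. The following sketch illustrates this; the name `picked` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Indices may appear in any order and may repeat
picked = arr[[4, 0, 0]]
print(picked.tolist())  # [5, 1, 1]

# The result is a copy, so changing it leaves the original array untouched
picked[0] = 100
print(arr.tolist())  # [1, 2, 3, 4, 5]
```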

Let’s combine these techniques, starting with indexing a multi-dimensional array:

In [191]:
arr = np.arange(6).reshape(2, 3)
arr

array([[0, 1, 2],
       [3, 4, 5]])

In [192]:
# Ask for row 0 and column 1
arr[0, 1]

1

In [193]:
# Mixing slices and indices
arr[:, 1]

array([1, 4])
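A slice on one axis can also be mixed with a list of indices on the other. A short sketch on the same 2x3 array; the names `first_row` and `outer_cols` are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# An index on the rows with a full slice on the columns gives one row
first_row = arr[0, :]

# A full slice on the rows with a list of column indices gives those columns
outer_cols = arr[:, [0, 2]]

print(first_row.tolist())   # [0, 1, 2]
print(outer_cols.tolist())  # [[0, 2], [3, 5]]
```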

Masking multi-dimensional arrays:

In [194]:
arr = np.arange(6).reshape(2, 3)
arr

array([[0, 1, 2],
       [3, 4, 5]])

In [195]:
mask = abs(arr - 3) < 2
arr[mask]

array([2, 3, 4])
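Note that indexing a 2-D array with a 2-D mask returns a flat 1-D array of the matching values. If the original shape should be kept, one option is `np.where`, which picks from one of two sources depending on the mask. This is a sketch of that alternative; the fill value of -1 is an arbitrary assumption.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
mask = abs(arr - 3) < 2           # same shape as arr

selected = arr[mask]  # 1-D array holding only the matching values

# Keep the 2-D shape: take arr where the mask is True, -1 elsewhere
filled = np.where(mask, arr, -1)

print(selected.shape)   # (3,)
print(filled.tolist())  # [[-1, -1, 2], [3, 4, -1]]
```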

Mix masking and slicing:

In [196]:
arr = np.arange(6).reshape(2, 3)
arr

array([[0, 1, 2],
       [3, 4, 5]])

In [197]:
mask = arr.sum(axis=1) > 4
arr[mask, 1:]

array([[4, 5]])
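The last expression does several things at once, so it can help to unpack it step by step. The sketch below reproduces the same result with intermediate variables; `row_sums` and `selected_rows` are illustrative names.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Step 1: sum each row
row_sums = arr.sum(axis=1)        # [3, 12]

# Step 2: build a mask with one boolean per row
mask = row_sums > 4               # [False, True]

# Step 3: the mask keeps only the rows whose sum exceeds 4
selected_rows = arr[mask]         # [[3, 4, 5]]

# Step 4: the slice 1: drops the first column of those rows
result = selected_rows[:, 1:]

print(result.tolist())  # [[4, 5]]
```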