<img src="../imgs/CampQMIND_banner.png">

# Numpy 

Numpy is a scientific computing package for Python. It introduces an n-dimensional array type (the ndarray) that speeds up array computations by running the loops over elements in compiled C code rather than in the Python interpreter.


<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Numpy" data-toc-modified-id="Numpy-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Numpy</a></span></li><li><span><a href="#Creating-Arrays" data-toc-modified-id="Creating-Arrays-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Creating Arrays</a></span></li><li><span><a href="#Indexing-Arrays" data-toc-modified-id="Indexing-Arrays-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Indexing Arrays</a></span><ul class="toc-item"><li><span><a href="#One-dimensional-arrays" data-toc-modified-id="One-dimensional-arrays-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>One dimensional arrays</a></span></li><li><span><a href="#Multi-dimensional-arrays" data-toc-modified-id="Multi-dimensional-arrays-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Multi dimensional arrays</a></span></li></ul></li><li><span><a href="#Numpy-math" data-toc-modified-id="Numpy-math-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Numpy math</a></span></li><li><span><a href="#Broadcasting" data-toc-modified-id="Broadcasting-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Broadcasting</a></span></li><li><span><a href="#Numpy-Random" data-toc-modified-id="Numpy-Random-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Numpy Random</a></span></li><li><span><a href="#Resources" data-toc-modified-id="Resources-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Resources</a></span></li></ul></div>

In [1]:
import numpy as np

In [2]:
from IPython.display import display

# Creating Arrays

The ```np.array()``` function takes a list (or another sequence) as input and returns an ndarray, which has many useful methods defined on it.

Along with the list, you can optionally provide a dtype argument:
- dtype stands for data type. Common values include:
    - "int8" to "int64"
    - "float16" to "float64"
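
As a small illustration of what the dtype argument changes, the sketch below stores the same five values with different dtypes and compares their memory use. The variable names are made up for this example.

```python
import numpy as np

# The same five values stored with different dtypes.
small = np.array([1, 2, 3, 4, 5], dtype="int8")
large = np.array([1, 2, 3, 4, 5], dtype="int64")
as_float = np.array([1, 2, 3, 4, 5], dtype="float32")

print(small.dtype, small.itemsize, small.nbytes)  # int8: 1 byte per element, 5 bytes in total
print(large.dtype, large.itemsize, large.nbytes)  # int64: 8 bytes per element, 40 bytes in total
print(as_float)                                   # the integers are converted to floats
```

Smaller dtypes save memory but also hold a smaller range of values, so int8 is only safe for values between -128 and 127.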



In [3]:
display(np.array([1,2,3,4,5]))

display(np.array(range(1,6)))

display(np.arange(1,6))

array([1, 2, 3, 4, 5])

array([1, 2, 3, 4, 5])

array([1, 2, 3, 4, 5])

In [4]:
# Shape is (rows, columns)
shape = (3,3) # for more than one dimension, the shape is given as a tuple
display(np.ones(shape, dtype=np.int32)) # supply dtype to change the datatype
# the default dtype is np.float64
# you can also use a smaller dtype to save memory
display(np.ones(shape, dtype=np.int8))

# Likewise we can create an array of zeroes.
display(np.zeros(shape))

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int32)

array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]], dtype=int8)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [5]:
# evenly spaced numbers
np.linspace( 0, 2, 9 )

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

In [6]:
# Reshaping
display(np.ones(shape))
display("To")
display(np.ones(shape).reshape(1, -1)) # -1 tells numpy to infer that dimension from the number of elements
display(np.ones(shape).reshape(-1, 9)) # same result: 9 elements / 9 columns = 1 row

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

'To'

array([[1., 1., 1., 1., 1., 1., 1., 1., 1.]])

array([[1., 1., 1., 1., 1., 1., 1., 1., 1.]])
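
To make the role of -1 in reshape more concrete, here is a minimal sketch using a hypothetical 12-element array. Only one dimension may be given as -1, and numpy works it out from the total number of elements.

```python
import numpy as np

values = np.arange(12)                # 12 elements: 0..11

rows_of_four = values.reshape(-1, 4)  # numpy infers 3 rows
three_rows = values.reshape(3, -1)    # numpy infers 4 columns
flat = rows_of_four.reshape(-1)       # back to one dimension

print(rows_of_four.shape)  # (3, 4)
print(three_rows.shape)    # (3, 4)
print(flat.shape)          # (12,)
```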

# Indexing Arrays

## One dimensional arrays

One dimensional arrays behave much like Python lists, with many additional useful methods.

In [7]:
my_1d = np.array([1,2,3,4,5, 6])
display(my_1d)

# slicing works the same way as regular python lists for 1d arrays

display(my_1d[1:3]) # Second and third element

display(my_1d[::2]) # Every second element

display(my_1d[::-1]) # Reversing


array([1, 2, 3, 4, 5, 6])

array([2, 3])

array([1, 3, 5])

array([6, 5, 4, 3, 2, 1])

In [8]:
display("Original array",my_1d)

display("Boolean masks",my_1d[(my_1d> 2)])

display("The mask", (my_1d> 2))

display("We can make it more complicated by combining logic", my_1d[(my_1d> 2) & (my_1d < 5)])
# Note: use & where you would write Python's `and`, and | where you would write `or`.
# The difference is that `and`/`or` expect a single truth value, whereas & and | are evaluated element by element over the whole array.

'Original array'

array([1, 2, 3, 4, 5, 6])

'Boolean masks'

array([3, 4, 5, 6])

'The mask'

array([False, False,  True,  True,  True,  True])

'We can make it more complicated by combining logic'

array([3, 4])
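
The difference between `and`/`or` and `&`/`|` can be seen in a short sketch. The array below is a stand-in for any one dimensional array; the names are only for illustration.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5, 6])

# Elementwise AND / OR: each comparison needs its own parentheses,
# because & and | bind more tightly than > and <.
between = numbers[(numbers > 2) & (numbers < 5)]
outside = numbers[(numbers <= 2) | (numbers >= 5)]
print(between)  # [3 4]
print(outside)  # [1 2 5 6]

# Python's `and` asks for a single truth value of a whole array, which is ambiguous.
try:
    numbers[(numbers > 2) and (numbers < 5)]
    and_failed = False
except ValueError as error:
    and_failed = True
    print("ValueError:", error)
```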

In [9]:
# We can also invert a mask using ~

display("Original array",my_1d)

display("Boolean masks",my_1d[~(my_1d> 2)])

display("The mask", ~(my_1d> 2))

display("We can make it more complicated by combining logic", my_1d[~((my_1d> 2) & (my_1d < 5))])
# Note: ~ must be applied to the parenthesized comparison; ~my_1d > 2 would negate the integers bitwise before comparing.

'Original array'

array([1, 2, 3, 4, 5, 6])

'Boolean masks'

array([1, 2])

'The mask'

array([ True,  True, False, False, False, False])

'We can make it more complicated by combining logic'

array([1, 2, 5, 6])

## Multi dimensional arrays

In [10]:
my_2d = my_1d.reshape(2,-1)
display(my_2d)

# We can index through rows like so
display(my_2d[0])

# We can index through columns like so
display(my_2d[:,0])

# To get the element in the first row, second column
display(my_2d[0,1])

array([[1, 2, 3],
       [4, 5, 6]])

array([1, 2, 3])

array([1, 4])

2

In [11]:
# tensors in numpy
# This tensor is 3x3x3 (3 3x3 matrices)
tensor = np.array([
  [[1,2,3],    [4,5,6],    [7,8,9]],
  [[11,12,13], [14,15,16], [17,18,19]],
  [[21,22,23], [24,25,26], [27,28,29]],
  ])
display("Original tensor",tensor)
# To index through matrices
display("First matrix",tensor[0])

# To index through 0th row of each matrix
display("0th row of each matrix",tensor[:,0])

# To index through 0th column of each matrix
display("0th column of each matrix",tensor[:,:,0])

'Original tensor'

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[11, 12, 13],
        [14, 15, 16],
        [17, 18, 19]],

       [[21, 22, 23],
        [24, 25, 26],
        [27, 28, 29]]])

'First matrix'

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

'0th row of each matrix'

array([[ 1,  2,  3],
       [11, 12, 13],
       [21, 22, 23]])

'0th column of each matrix'

array([[ 1,  4,  7],
       [11, 14, 17],
       [21, 24, 27]])
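
One way to keep track of multi dimensional indexing is to watch the shape of each result: an integer index removes an axis, while a slice keeps it. The sketch below uses a hypothetical 3x3x3 array filled with 0..26 rather than the tensor above.

```python
import numpy as np

cube = np.arange(27).reshape(3, 3, 3)  # hypothetical 3x3x3 array holding 0..26

print(cube[0].shape)        # (3, 3): one matrix
print(cube[:, 0].shape)     # (3, 3): row 0 of every matrix
print(cube[:, :, 0].shape)  # (3, 3): column 0 of every matrix
print(cube[1, 2].shape)     # (3,): row 2 of matrix 1
print(cube[1, 2, 0])        # 15, a single element (1*9 + 2*3 + 0)
print(cube[0:1].shape)      # (1, 3, 3): a slice keeps the axis
```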

# Numpy math

In [12]:
# Similar to the math module, numpy provides common mathematical operations.
# The main reason to use numpy is that it is significantly faster, because its loops run in compiled C code (see c-bindings).
import math

%timeit [math.pow(i, 2) for i in range(1,10000)]

%timeit np.power(np.arange(1,10000),2)

# That's a lot faster.
# Also notice how we did not need to put np.power in a list comprehension (a for loop);
# this is because numpy applies its functions elementwise.

2.16 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
30.6 µs ± 692 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
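
Elementwise application is not specific to np.power. As a brief sketch with a made-up three-element array, other numpy functions and the ordinary arithmetic operators behave the same way:

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

print(np.sqrt(values))            # [1. 2. 3.]
print(values ** 2)                # [ 1. 16. 81.]
print(values + values)            # [ 2.  8. 18.]
print(np.power(values, 2).sum())  # 98.0
```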


In [13]:
# We can write a vectorized function as follows
# (np.vectorize is a convenience wrapper that still loops in Python, so it is slower than native numpy operations)
import random
def greater_than_5(a) -> int:
    """
    Returns 1 if the value is greater than 5, 0 otherwise
    """
    if a > 5:
        return 1
    else:
        return 0

vec_greater_than_5 = np.vectorize(greater_than_5)
random_nums = np.array([random.uniform(1,10) for i in range(1,1000)]) # 999 random floats between 1 and 10

In [14]:
# Vectorized
%timeit vec_greater_than_5(random_nums)

163 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [15]:
%%timeit
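# Plain Python for loop
# Note: this loop overwrites random_nums in place, so afterwards the array only holds 0s and 1s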

for i,j in enumerate(random_nums):
    if j > 5:
        random_nums[i] = 1
    else:
        random_nums[i] = 0

446 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [16]:
# With regular numpy boolean indexing (this selects the matching elements rather than returning 0/1 flags)
%timeit random_nums[random_nums>5]

2.09 µs ± 29.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
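
Boolean indexing returns the selected elements, so it is not a like-for-like replacement for greater_than_5, which returns a 1 or 0 for every element. A closer native equivalent might look like the sketch below; the sample data is hypothetical and no timings are claimed for it.

```python
import numpy as np

sample = np.array([0.5, 7.2, 5.0, 9.9, 3.1])  # hypothetical sample data

flags_where = np.where(sample > 5, 1, 0)  # 1 where the condition holds, 0 elsewhere
flags_cast = (sample > 5).astype(int)     # same result by casting the boolean mask

print(flags_where)  # [0 1 0 1 0]
print(flags_cast)   # [0 1 0 1 0]
```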


In [17]:
# Numpy arrays make information about the array readily available
display("Original array", my_2d)

display("Shape of array", my_2d.shape) # row x column

display("Maximum and minimum", my_2d.max(), my_2d.min())

display("Mean along columns", my_2d.mean(0)) # axis 0: one mean per column

display("Mean along rows", my_2d.mean(1)) # axis 1: one mean per row

display("Array as a list", my_2d.tolist())

# and many more


'Original array'

array([[1, 2, 3],
       [4, 5, 6]])

'Shape of array'

(2, 3)

'Maximum and minimum'

6

1

'Mean along columns'

array([2.5, 3.5, 4.5])

'Mean along rows'

array([2., 5.])

'Array as a list'

[[1, 2, 3], [4, 5, 6]]
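
The axis argument is easy to mix up, so here is a short sketch on the same 2x3 values. The axis you pass is the one that gets collapsed; keepdims is an optional argument that keeps the collapsed axis with length 1.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

col_means = grid.mean(axis=0)               # collapses the rows: one value per column
row_means = grid.mean(axis=1)               # collapses the columns: one value per row
row_sums = grid.sum(axis=1, keepdims=True)  # keeps the collapsed axis with length 1

print(col_means)       # [2.5 3.5 4.5]
print(row_means)       # [2. 5.]
print(row_sums.shape)  # (2, 1)
```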

# Broadcasting

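Numpy can combine arrays of different shapes by "broadcasting" the smaller array across the larger one, so that the two behave as if they had the same shape.

This results in two things:
- The operations are faster than regular Python because the looping occurs in C and not in Python.
- Arrays of different sizes can be used together if their shapes are compatible: compared from the last dimension backwards, each pair of dimensions must be equal or one of them must be 1.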
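
A minimal sketch of these rules, using small made-up arrays: a row of shape (3,) and a column of shape (2, 1) can each be combined with a (2, 3) matrix, while an array of shape (2,) cannot.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
column = np.array([[100], [200]])    # shape (2, 1)

print(matrix + row)           # the row is reused for each of the 2 rows
print(matrix + column)        # the column is reused for each of the 3 columns
print((row + column).shape)   # (2, 3)

# Shapes (2, 3) and (2,) are not compatible: the last dimensions, 3 and 2, differ.
try:
    matrix + np.array([1, 2])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)        # True
```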

In [18]:
display("Original Array",my_1d)

# Behind the scenes numpy broadcasts the scalar 1 to match the shape of the array.
display("Original Array + 1",my_1d + 1)

# [1, 2, 3, 4, 5, 6] + 1 # Doesn't work with lists: raises a TypeError

'Original Array'

array([1, 2, 3, 4, 5, 6])

'Original Array + 1'

array([2, 3, 4, 5, 6, 7])

In [19]:
display("Original Array",my_2d)

# Behind the scenes numpy broadcasts the scalar 1 to match the shape of the array.
display("Original Array + 1",my_2d + 1)

'Original Array'

array([[1, 2, 3],
       [4, 5, 6]])

'Original Array + 1'

array([[2, 3, 4],
       [5, 6, 7]])

# Numpy Random

In [20]:
# The usage of numpy.random is very similar to Python's built-in random module
np.random.seed(0) # setting the seed for reproducible results.

# one key difference is that numpy optionally takes size as an argument
display(np.random.randint(0, 5, size = (1,5)))

display(np.random.normal(size = (1,5)))

a = [1, 2, 3, 4, 5]

# np.random.shuffle(a) # shuffles the elements of a in place and returns None

display(np.random.permutation(a)) # returns a shuffled copy

array([[4, 0, 3, 3, 3]])

array([[ 0.37025538,  1.04053075, -1.51698273, -0.86627621, -0.05503512]])

array([5, 1, 4, 3, 2])
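
Newer numpy versions also offer a Generator object, created with np.random.default_rng, as an alternative to the global np.random functions used above. The sketch below mirrors the same three calls; it is an optional alternative rather than what this notebook uses, and its random values will differ from the outputs shown above even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded Generator for reproducible results

ints = rng.integers(0, 5, size=(1, 5))  # integers in [0, 5), like np.random.randint
normals = rng.normal(size=(1, 5))       # samples from a standard normal distribution
items = [1, 2, 3, 4, 5]
shuffled_copy = rng.permutation(items)  # returns a shuffled copy; items is unchanged

print(ints.shape, normals.shape)
print(sorted(shuffled_copy.tolist()) == items)  # True
```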

# Resources

https://www.labri.fr/perso/nrougier/from-python-to-numpy/ For an in-depth exploration of numpy

https://www.machinelearningplus.com/python/101-numpy-exercises-python/ To challenge yourself 