
# Numpy Hello World

* numpy offers optimized data structures for numerical data
* many ML libraries expect their input as numpy arrays and return their results as numpy arrays
* a large part of machine learning consists of massaging data into the format you need
* numpy is a low-level tool for doing that

https://numpy.org/
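
As a minimal sketch of what "optimized data structures" means in practice, the example below applies arithmetic to a whole array at once instead of looping over a Python list. The variable names are made up for illustration.

```python
import numpy as np

# a plain Python list needs an explicit loop (or comprehension)
prices = [1.0, 2.0, 3.0]
doubled_list = [p * 2 for p in prices]

# a numpy array applies the operation to every element at once
prices_array = np.array(prices)
doubled_array = prices_array * 2

print(doubled_list)
print(doubled_array)
```

Both variants produce the same values; the array version simply avoids the Python-level loop.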

### Things to note

* numpy creates multidimensional arrays (`ndarray`): https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html
* indices always start with 0, not 1


In [0]:
# np is the conventional alias for numpy
import numpy as np

In [13]:
# create a simple array
simple_array = np.array([1, 2, 3])
simple_array

array([1, 2, 3])

In [14]:
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html

type(simple_array)

numpy.ndarray

In [2]:
# create an array with more dimensions; all elements share the same type

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
array

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [3]:
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html
# multi dimensional array
type(array)

numpy.ndarray
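
The "same type" remark can be checked through the `dtype` attribute. A minimal sketch follows; the variable names are illustrative, and the exact integer width depends on the platform, so it is not stated here.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float is enough to upcast all elements

print(ints.dtype)   # some integer type, width depends on the platform
print(mixed.dtype)  # float64
```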

In [4]:
# you can always ask for the shape of a numpy array

array.shape

(4, 3)
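
Besides `shape`, a few related attributes are often handy. The sketch below rebuilds the same array so that it runs on its own.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print(array.ndim)   # number of dimensions: 2
print(array.size)   # total number of elements: 12
print(len(array))   # length of the first dimension: 4
```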

In [10]:
# accessing a single element
# be careful, indices always start with 0, not 1

array[0, 0]

1

In [0]:
# accessing a single row

array[0]

array([1, 2, 3])

In [0]:
# accessing a single column

array[:, 0]

array([ 1,  4,  7, 10])

In [0]:
# accessing a range

array[1:3, 0:2]

array([[4, 5],
       [7, 8]])

In [0]:
# accessing selected rows (here rows 1 and 3) and a range of columns

array[[1, 3], 0:2]

array([[ 4,  5],
       [10, 11]])
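
One difference between the last two ways of indexing is worth knowing: slicing with ranges gives a view on the original data, while indexing with a list of positions gives a copy. A small sketch, with illustrative variable names:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

sliced = array[1:3, 0:2]     # range slicing: a view on the same data
picked = array[[1, 3], 0:2]  # list of row indices: an independent copy

print(np.shares_memory(array, sliced))  # True
print(np.shares_memory(array, picked))  # False

# writing through the view changes the original array
sliced[0, 0] = -4
print(array[1, 0])  # -4
```

If you need an independent slice, calling `.copy()` on it avoids accidental changes to the original.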

In [0]:
# transposition

array.T

array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])
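
Transposing swaps the two axes, so the shape flips and element `(i, j)` ends up at `(j, i)`. A short sketch to verify this:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
transposed = array.T

print(array.shape)       # (4, 3)
print(transposed.shape)  # (3, 4)

# element (i, j) of the original is element (j, i) of the transpose
print(array[3, 0], transposed[0, 3])  # 10 10
```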

In [6]:
# give it a new shape

array.reshape(2, 6)

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [0]:
# use -1 for one unknown dimension
# (numpy infers it from the other dimensions and the total number of elements)

array.reshape(-1, 1)

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

In [0]:
# a single row; the number of columns is inferred
array.reshape(1, -1)

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])
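
The inference only works when the total number of elements divides evenly by the dimensions you do specify. A minimal sketch of both cases:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# 12 elements, 3 rows requested: numpy infers 4 columns
print(array.reshape(3, -1).shape)  # (3, 4)

# 12 elements cannot be split evenly into 5 rows
try:
    array.reshape(5, -1)
except ValueError as e:
    print("reshape failed:", e)
```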

In [0]:
# flatten
# array.reshape(-1) will do the same

array.ravel()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
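
`flatten()` is a closely related method. The sketch below illustrates the main difference: `ravel()` returns a view where possible, while `flatten()` always returns a copy.

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

raveled = array.ravel()      # a view here, no data is copied
flattened = array.flatten()  # always a copy

print(np.shares_memory(array, raveled))    # True
print(np.shares_memory(array, flattened))  # False
```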

In [0]:
# combining two arrays

In [7]:
a = np.array([[1,2,3], [5,6,7]])
a

array([[1, 2, 3],
       [5, 6, 7]])

In [8]:
a.shape

(2, 3)

In [9]:
b = np.array([4, 8])
b

array([4, 8])

In [0]:
b.shape

(2,)

In [0]:
# does not work: both arrays need to have the same number of dimensions
try:
  np.append(a, b, axis=1)
except ValueError as e:
  print(e)

all the input arrays must have same number of dimensions


In [0]:
# we can make that fit
b = b.reshape(-1, 1)
b.shape

(2, 1)

In [0]:
np.append(a, b, axis=1)

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
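
`np.append` is not the only way to combine arrays. As a sketch of two alternatives, `np.column_stack` accepts the one-dimensional `b` directly, and `np.concatenate` works once `b` has been given a second axis.

```python
import numpy as np

a = np.array([[1, 2, 3], [5, 6, 7]])
b = np.array([4, 8])  # one-dimensional, shape (2,)

# column_stack treats a 1-D array as a column
stacked = np.column_stack((a, b))

# concatenate needs matching dimensions, so add an axis to b first
concatenated = np.concatenate((a, b[:, np.newaxis]), axis=1)

print(stacked)
print(np.array_equal(stacked, concatenated))  # True
```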

In [0]:
# max of all elements

array.max()

12

In [0]:
array

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [0]:
# max for each row

array.max(axis=1)

array([ 3,  6,  9, 12])

In [0]:
# max for each column

array.max(axis=0)

array([10, 11, 12])
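
The `axis` argument works the same way for other reductions. A brief sketch using `sum` and `mean`:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

print(array.sum(axis=0))  # sum of each column: [22 26 30]
print(array.sum(axis=1))  # sum of each row: [ 6 15 24 33]
print(array.mean())       # mean of all elements: 6.5
```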

In [0]:
# index of the max element, counted over the flattened array

array.argmax()

11

In [0]:
# column index of the max element in each row
array.argmax(axis=1)

array([2, 2, 2, 2])

In [0]:
# row index of the max element in each column
array.argmax(axis=0)

array([3, 3, 3])
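
Without an `axis`, `argmax` returns a position in the flattened array. One way to turn it back into a row and column is `np.unravel_index`, sketched here:

```python
import numpy as np

array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

flat_index = array.argmax()  # 11, counted over the flattened array
row, column = np.unravel_index(flat_index, array.shape)

print(row, column)         # 3 2
print(array[row, column])  # 12
```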

In [0]:
# convenience functions to create arrays

# start, stop (included), number of samples
np.linspace(0, 10, 6)

array([ 0.,  2.,  4.,  6.,  8., 10.])

In [0]:
# six zeros, floats by default
np.zeros(6)

array([0., 0., 0., 0., 0., 0.])

In [0]:
# start, stop (excluded), step
np.arange(1, 6, 1)

array([1, 2, 3, 4, 5])
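
The creation functions differ in how they treat the stop value, and some of them also accept a shape tuple. A small sketch under those assumptions:

```python
import numpy as np

# arange excludes the stop value ...
print(np.arange(0, 10, 2))    # [0 2 4 6 8]
# ... while linspace includes it by default
print(np.linspace(0, 10, 6))  # [ 0.  2.  4.  6.  8. 10.]

# zeros and ones accept a shape tuple for more than one dimension
print(np.zeros((2, 3)).shape)  # (2, 3)
print(np.ones((2, 3)).sum())   # 6.0
```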

In [0]:
# get documentation to find out what np.arange does
np.arange?

## Exercise

1. execute this notebook
    * you can execute each cell by hitting Shift+Enter
1. create some numpy code to transform `start_array` to `end_array`
    * you will need some of the documentation, which you get by putting a question mark after the API call
    * e.g.: `np.arange?`

In [0]:
start_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
start_array

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [0]:
end_array = np.array([[2, 5], [8, 11]])
end_array

array([[ 2,  5],
       [ 8, 11]])

## STOP HERE

The cells below show one possible solution to the exercise.

In [0]:
# two rows; the number of columns (6) is inferred
reshaped = start_array.reshape(2, -1)
reshaped

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [0]:
# keep all rows, pick columns 1 and 4
reshaped[:, [1, 4]]

array([[ 2,  5],
       [ 8, 11]])
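
There is more than one way to get there. As a sketch of an alternative, you can select the middle column first and reshape afterwards:

```python
import numpy as np

start_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
end_array = np.array([[2, 5], [8, 11]])

# take the middle column [2, 5, 8, 11], then fold it into 2 x 2
alternative = start_array[:, 1].reshape(2, 2)

print(np.array_equal(alternative, end_array))  # True
```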