# Python beginners course - Level 1 - NumPy
(inspired by work by Numan Yilmaz and exercises by Nicolas P. Rougier)

This tutorial consists of the following parts:

 - What is NumPy?
 - How to create NumPy arrays
 - Indexing, Fancy Indexing
 - Slicing
 - Universal Functions (Ufuncs)
 - Broadcasting
 - Masking, Sorting and Comparison

## 1. What is NumPy?

NumPy can be seen as the foundation of mathematical calculations in Python. It provides a user-friendly way to represent numerical data as lists or matrices and to do calculations on these objects. For example, if you have a list of numbers ```[1, 2, 3]``` and want to calculate its mean, NumPy provides a simple syntax to do so.
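
As a small sketch of the mean example just mentioned (the variable names `numbers` and `average` are chosen here for illustration only; the `import` line is explained a little further down):

```python
import numpy as np

# an ordinary Python list of numbers
numbers = [1, 2, 3]

# np.mean accepts the list directly and returns the arithmetic mean
average = np.mean(numbers)
print(average)  # 2.0
```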

Because of its ease of use and high performance, NumPy has become the basis for many of the most widely used data science packages. In this notebook, we will demonstrate some of the most important functionality that NumPy has to offer, and provide you with some exercises to help you learn how to use it.

Throughout this notebook, you will find exercises that are similar to the examples provided. If you are really lost or want to check your answer, you can find the answers [here](https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.md).

Before we get started, let's check the version of NumPy and Python. The cell below loads the NumPy library so that we can use it in later cells. If you reload this page, you also need to run the cell below again.

In [None]:
# import numpy
import numpy as np

# sys is imported to check the Python version
import sys

# check the version of Python and NumPy
print('NumPy version:', np.__version__)
print('Python version:', sys.version)

## 2. How to create NumPy arrays
One of the most basic objects that NumPy uses is called a ```NumPy array```. You can think of a NumPy array as an ordered list of numbers. NumPy arrays can be used to represent large amounts of data.

For example, you can use NumPy arrays to represent:
- the height of 10 family members
- the temperature every second of the last month
- the EUR-SGD Exchange rates of the past 20 years
- the first 5 million decimals of $\pi$

There are many ways to create NumPy arrays. We will take a look at a few of them here.

In [None]:
# Creates a numpy array where we specify the values
np.array([1, 2, 3])

In [None]:
# Creates a numpy array of specified length (3) where all values are 0
np.zeros(3)

In [None]:
# Creates a numpy array of specified length (3) where all values are 1
np.ones(3)

In [None]:
# Creates a numpy array with the values 3 up to (but not including) 8
np.arange(3,8)

In [None]:
# Creates a numpy array of specified length (3) where all values are random integers between 1 and 10
np.random.randint(1, 10, 3)

### Exercise 1:
Run the cell above a couple of times. Do you ever see 10 appear? (spoiler: you won't). Adjust the code above such that 10 can also appear.

In [None]:
# Creates a numpy array of specified length (5) where all values are evenly spaced between 0 and 10
np.linspace(0, 10, 5)

All of the NumPy arrays created above are 1-dimensional. However, most of the data we use on a day-to-day basis is 2-dimensional, for example tabular data (Excel!). Luckily, NumPy can also handle 2-dimensional data (called a matrix) in the form of 2-dimensional NumPy arrays.

In [None]:
# Creates a 2-D numpy array with 3 columns and 4 rows
np.array([[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9],
          [10, 11, 12]])

In [None]:
# Creates a 2-D numpy array with 3 rows and 5 columns where the values are random numbers between 0 and 1
np.random.random((3,5))

If we store the data under a name (such as 'my_data'), then we can retrieve numbers at a specific location.

NOTE: in Python, we always start from 0, i.e., the top left number is at location (0, 0)!

In [None]:
# Stores the data in a variable called 'my_data'
my_data = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# Retrieve the number at location (1, 1). Note the use of the square brackets instead of ( and ).
my_data[1,1]
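
In the example above both indexes are the same, so it does not show which one is the row and which one is the column. A minimal sketch (using an illustrative variable called `grid`, not part of the exercises) suggests that the first index selects the row and the second selects the column:

```python
import numpy as np

# 'grid' is an illustrative name for a 3 by 3 matrix
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# row 0, column 2: the top right number
top_right = grid[0, 2]
print(top_right)    # 3

# row 2, column 0: the bottom left number
bottom_left = grid[2, 0]
print(bottom_left)  # 7
```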

### Exercise 2: Create a matrix of 3 by 3 and retrieve the number at location (2, 1)
Replace the ```___``` in the following cell to complete the exercise.

In [None]:
#store the data in a variable called 'my_data'
my_data = ___

#retrieve the number at location (2, 1)
my_data[___]

## 3. Slightly more advanced functionality
We now have seen how to create basic 1- and 2-dimensional NumPy arrays. However, we have not yet done anything with the data. Let's see what we can do with NumPy arrays once we have created them.

Below we will again create a 1-dimensional array and a 2-dimensional array and store them in the computer memory so that we can manipulate the data.

We will store the 1-dimensional array under the name ```a``` and store the 2-dimensional array under the name ```b```. 

In [None]:
# Creates a 1-D array with predefined values and stores it under the name 'a'
a = np.array([1, 2, 3])

# Creates a 2-D array of size (3,4) with random numbers between 0 and 10 and stores it under the name 'b'
b = np.random.randint(0, 10, (3, 4))

# In order to see the data, we can print it to the screen:
print("array a:")
print(a)

print("array b:")
print(b)

In Python, we refer to such names as **variables**, because the data they store can vary over time. The variables ```a``` and ```b``` are now stored in the computer's memory. Let's modify our variables: let's add a number to our 1-dimensional array.

In [None]:
print("a =", a)

# Adding the number 4 to our existing array 'a'
a = np.append(a, 4)
print("a =", a)

# Adding a different array to the array 'a'
a = np.append(a, np.array([5,6,7]))
print("a =", a)
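
Note that the cell above writes `a = np.append(...)`. A short sketch of why the assignment matters (the names `original` and `extended` are illustrative): `np.append` returns a new array and leaves the array you pass in unchanged.

```python
import numpy as np

original = np.array([1, 2, 3])

# np.append builds and returns a new array
extended = np.append(original, 4)

print(original)  # still contains 1, 2, 3
print(extended)  # contains 1, 2, 3, 4
```

So if you forget to store the result under a name, the appended value is lost.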

Sometimes it is also convenient to know the shape of your array (number of rows, number of columns) or how many dimensions it has, without having to count yourself. NumPy provides functions for this as well.

In [None]:
# print the shape of the arrays 'a' and 'b'
print("Shape of a:", np.shape(a))
print("Shape of b:", np.shape(b))

# print the dimension of the arrays 'a' and 'b'
print('Dimension of a:', np.ndim(a))
print('Dimension of b:', np.ndim(b))

Another property that a programmer is often interested in is how many elements (numbers) the array contains. You could derive this yourself from the shape mentioned previously, but NumPy provides a function that does the calculation for you.

In [None]:
#  Calculate the number of elements in arrays 'a' and 'b'
print('Number of elements in a:', np.size(a))
print('Number of elements in b:', np.size(b))
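
To illustrate how the number of elements follows from the shape, here is a small sketch with an illustrative array called `table` (not one of the arrays above): for a 2-dimensional array, the size is the number of rows multiplied by the number of columns.

```python
import numpy as np

# 'table' is an illustrative 2-D array with 3 rows and 4 columns
table = np.zeros((3, 4))

# the shape of a 2-D array is (rows, columns)
rows, columns = np.shape(table)

print(rows * columns)  # 12
print(np.size(table))  # 12
```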

### Exercise 3: Create an array with values ranging from 10 up to and including 49
(**hint**: np.arange)

## 4. Indexing
As mentioned earlier, you can consider a NumPy array as an ordered list of numbers. In an ordered list, you expect the order of the numbers to be meaningful in some way, and you expect to be able to extract or modify a specific number if you know its place (index) in the list.

In a NumPy array this is all possible. Contents of a NumPy array object can be accessed and modified through ```indexing```. In Python, an index denotes the location of a certain element in (for instance) a NumPy array. 

Two types of indexing methods are available:
- indexing: accessing a single item of an array
- slicing: accessing a subset (multiple items) of an array

We will illustrate both methods below.

In [None]:
# Creates a numpy array with the values 1 up to and including 10
X = np.arange(1,11)
print(X)

It is important to note that the **indexing in Python starts at 0**. This means that if you want to access the 1st element then you must look at index 0. We will illustrate that with an example:

In [None]:
# get the first element in the array by using the index 0
first_element = X[0]
print("The first element =", first_element)

# get the fourth element in the array by using the index 3
fourth_element = X[3]
print("The fourth element =", fourth_element)

### Exercise 4: obtain the 7th element of the array
Replace the ```___``` in the following cell to complete the exercise.

In [None]:
# replace the ___ with the appropriate code
seventh_element = X[___]
print("The seventh element =", seventh_element)

Instead of just accessing a particular element, it is also possible to modify an indexed element to a different number. Let's change the number 7 into a 77.

In [None]:
# access the element at index 6 (the number 7) and set it to 77
X[6] = 77
X

Often, you do not want to access just a single element of an array. Instead, you want to access or modify a subset (multiple elements) of an array. This can be achieved using _slicing_. The syntax is quite similar to indexing, as we will see.

In [None]:
# get a 'slice' of the array by defining the start (2) and end (5) index
X[2:5]

In the slicing example above, we are getting the elements at indexes 2, 3 and 4. The same syntax works for slicing on 2-dimensional arrays.
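
Before moving on to two dimensions, here is a small sketch of the 1-dimensional case (the name `values` is illustrative). It shows that the start index is included and the end index is excluded, and that leaving out the start or the end means "from the beginning" or "until the end".

```python
import numpy as np

values = np.arange(1, 11)  # the numbers 1 up to and including 10

# indexes 2, 3 and 4: the end index 5 is not included
middle = values[2:5]
print(middle)        # the numbers 3, 4, 5

# no start index: begin at index 0
first_three = values[:3]
print(first_three)   # the numbers 1, 2, 3

# no end index: continue until the last element
from_eighth = values[7:]
print(from_eighth)   # the numbers 8, 9, 10
```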

In [None]:
# create a 2-dimensional array of size (4,4)
Y = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]])
Y

In [None]:
# use slicing to get the first two rows
Y[0:2, :]

In [None]:
# use slicing to get the last two columns
Y[:, 2:4]

In [None]:
# combine both previous slices to obtain the last 2 columns of the first 2 rows
Y[0:2, 2:4]

### Exercise 5: use slicing to get the first two elements of the last two rows
Replace the ```___``` in the following cell to complete the exercise.

In [None]:
# replace the ___ with the appropriate code
Y[___, ____]

### Exercise 6: use slicing to replace the first two elements of the last two rows with 0's
Replace the ```___``` in the following cell to complete the exercise.

In [None]:
# replace the ___ with the appropriate code
Y[__,__] = np.zeros(___)

### Exercise 7:  Create an array of zeros of size 10 but the fifth value is equal to 1
(**hint**: np.zeros and indexing)

## 5. Universal Functions

So far, we have only shown a small part of the functionality that is present in NumPy. To make it easy to browse through everything that NumPy has to offer, there is a convenient way to list it.

To see all defined functions, you can type ```np.``` and press ```TAB``` and the following drop-down menu will appear:
![](../assets/numpy.png)
This menu contains a list of all functions that are defined within NumPy; for example ```abs``` which calculates the absolute value of the input.
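
As a quick sketch of the ```abs``` function mentioned above (the array `values` is made up for illustration), it is applied to every element of the array at once:

```python
import numpy as np

# an illustrative array with negative and positive numbers
values = np.array([-3, -1, 0, 2])

# np.abs returns a new array with the absolute value of each element
absolute = np.abs(values)
print(absolute)  # the numbers 3, 1, 0, 2
```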

Let's have a look at some of the functions that NumPy has to offer.

In [None]:
# create an array with values from 1 to 10
X = np.arange(1, 11)
X

In [None]:
# find the maximum element of X
np.max(X)

### Exercise 8: use the drop-down menu to find a function that calculates the minimum of an array
After hitting ```tab``` it can take a couple of seconds for the drop-down menu to appear.

Replace the ```___``` in the following cell to complete the exercise.

In [None]:
# replace ___ with the appropriate code
np.___

In [None]:
# create an array with values from 1 to 10
X = np.arange(1, 11)
X

In [None]:
# find the mean of the elements in the array X
np.mean(X)

In [None]:
# raise every element of the array to the power of 4
np.power(X, 4)

In [None]:
# raise every element of the array to the power of 2 (squared)
np.square(X)

In [None]:
# calculate the square root of every element of the array
np.sqrt(X)

In [None]:
# calculate the sine of each of the elements of the array
np.sin(X)

In [None]:
# calculate the tangent of each of the elements of the array
np.tan(X)

In [None]:
# raise every element of X to the power of 3, raise every element of X to the power of 2 and add the values
np.power(X, 3) + np.square(X)

Now let's try some of these functions on a 2-dimensional array.

In [None]:
Y = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]])
Y

In [None]:
# multiply all elements of Y by 2
np.multiply(Y, 2)

In [None]:
# raise every element of Y to the power of 3, raise every element of Y to the power of 2 and add the values
np.power(Y, 3) + np.square(Y)

### Exercise 9: find the median of values in X
Replace the ```___``` in the following cell to complete the exercise.

In [None]:
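# create an array of 15 random integers from 0 to 9 (without a size, randint returns a single number)
X = np.random.randint(0, 10, 15)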

# replace ___ with the appropriate code
np.___

### Exercise 10:  Create a 10x10 array with random values and find the minimum and maximum values
Replace the ```___``` in the following cell to complete the exercise.

(**hint**: see above how to create random arrays)

In [None]:
X = ____

# replace ___ with the appropriate code
X = np.___

### Exercise 11: create an array of 30 random values and find the mean value
(**hint**: mean)

### Challenge: create an array of size (5,5) with values 1,2,3,4 just below the diagonal and 0's elsewhere
(**hint**: np.diag)

## 6. Sorting
A NumPy array is also called an ordered list because its elements have a certain order. There are many ways to change that order. The most common one is the *sort* function. Let's see how that works for one dimension:

In [None]:
# create an array of 15 random integers from 1 to 9 (the upper bound 10 is excluded)
X = np.random.randint(1, 10, 15)
X

In [None]:
# sort elements in array X
np.sort(X)

Pretty easy, right? Now, for two-dimensional data we need to specify whether we want to sort horizontally or vertically. We use the *axis* argument to determine this.

In [None]:
# create a (3,3) matrix of random integers from 1 to 4 (the upper bound 5 is excluded)
Y = np.random.randint(1,5, (3,3))
Y

In [None]:
# sort vertically (columns / top to bottom)
np.sort(Y, axis=0)

In [None]:
# sort horizontally (rows / left to right)
np.sort(Y, axis=1)
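
Because `Y` is random, the effect of the *axis* argument can be hard to see. The sketch below uses a fixed, made-up matrix called `grid` so that the two results can be compared by hand:

```python
import numpy as np

# an illustrative matrix with fixed values
grid = np.array([[3, 1, 2],
                 [9, 7, 8],
                 [6, 4, 5]])

# axis=0: every column is sorted from top to bottom
by_column = np.sort(grid, axis=0)
print(by_column)
# rows are now: 3 1 2 / 6 4 5 / 9 7 8

# axis=1: every row is sorted from left to right
by_row = np.sort(grid, axis=1)
print(by_row)
# rows are now: 1 2 3 / 7 8 9 / 4 5 6
```

In both cases `np.sort` returns a new array; `grid` itself is not changed.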

### Challenge 2: sort array Y from left to right AND top to bottom.
In other words, the lowest number is in the top left corner and the highest number in the bottom right corner.

## 7. Filtering
You might want to be able to filter the numbers in an array. NumPy lets you define statements that can be either True or False. 

Some symbols used to perform a true/false comparison test on every element include:

  - **==**, which means 'equal to'
  - **!=**, which means 'not equal to'
  - **<, >**, which mean 'smaller than' and 'greater than'
  - **<=, >=**, which mean 'smaller than or equal to' and 'greater than or equal to'


In [None]:
# create an array of 15 random integers from 1 to 9 (the upper bound 10 is excluded)
X = np.random.randint(1, 10, 15)
X

In [None]:
# get all elements greater than 3
X[X > 3]
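
To see what happens inside the square brackets, the sketch below splits the filter into two steps, using a fixed, made-up array called `values`. The comparison first produces an array of True/False values (often called a mask), and indexing with that mask keeps only the elements where it is True.

```python
import numpy as np

# an illustrative array with fixed values
values = np.array([1, 5, 2, 8, 3])

# step 1: the comparison is evaluated for every element
mask = values > 3
print(mask)      # False, True, False, True, False

# step 2: keep only the elements where the mask is True
selected = values[mask]
print(selected)  # the numbers 5 and 8
```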

You can also construct more complex conditions by using:
  
  - *&*, which means 'and'
  - *|*, which means 'or'

In [None]:
# all numbers greater than 1 and lower than 8
X[(X > 1) & (X < 8)]

In [None]:
# all numbers lower than 4 or greater than 8
X[(X < 4) | (X > 8)]

### Exercise 12: filter to show only numbers that end with 5, 6 or 7.
Replace the ```___``` in the following cell to complete the exercise.

In [None]:
X = np.random.randint(1, 20, 50)

X[_______]