# Exercise 01.1 Intro to ML - NumPy Recap - Solution

## Pedagogy

This notebook contains both theoretical explanations and executable code cells.

When you see the <span style="color:red">**[TBC]**</span> (To Be Completed) sign, it means that you need to perform an action besides executing the existing code cells. These actions can be:
- Complete the code with proper comments
- Respond to a question
- Write an analysis
- etc.

## Part 1. NumPy ultra-quick tutorial

[NumPy](https://numpy.org/doc/stable/index.html) is a Python library for creating and manipulating matrices, the main data structure used by ML algorithms. [Matrices](https://en.wikipedia.org/wiki/Matrix_(mathematics)) are mathematical objects used to store values in rows and columns.

In plain Python, the closest built-in equivalent of a matrix is a (nested) *list*; NumPy calls its matrix-like objects *arrays*.

This notebook is not an exhaustive tutorial on NumPy. Rather, its purpose is to let you review what you learned about NumPy in the Python Bootcamp course, and to set up Python, Anaconda, Jupyter Notebook, and other tools for subsequent courses.

### Import NumPy library

Run the following code cell to import the NumPy library.

In [1]:
# import the NumPy library
import numpy as np

### Populate arrays with specific numbers

Call `np.array` to create a NumPy array with your own hand-picked values. For example, the following call to `np.array` creates an 8-element array.

In [2]:
# create an 8-element array
one_dimensional_array = np.array([1.2, 2.4, 3.5, 4.7, 6.1, 7.2, 8.3, 9.5])
print(one_dimensional_array)

[1.2 2.4 3.5 4.7 6.1 7.2 8.3 9.5]


You can also use `np.array` to create a two-dimensional array. To do that, specify an extra layer of square brackets. For example, the following call creates a 3$\times$2 array:

In [3]:
# create a two-dimensional array
two_dimensional_array = np.array([[6, 5], [11, 7], [4, 8]])
print(two_dimensional_array)

[[ 6  5]
 [11  7]
 [ 4  8]]
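As a quick check of the "3$\times$2" claim, the sketch below rebuilds the same array and inspects its `shape` and `ndim` attributes, then reads a single element by row and column index. It is only an illustration and is not part of the original exercise.

```python
import numpy as np

# Same values as the two-dimensional array created above
two_dimensional_array = np.array([[6, 5], [11, 7], [4, 8]])

print(two_dimensional_array.shape)  # (3, 2): 3 rows, 2 columns
print(two_dimensional_array.ndim)   # 2
print(two_dimensional_array[1, 0])  # 11 (second row, first column)
```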


To populate an array with all zeros, call `np.zeros`.

In [4]:
# create an array with all zeros
all_zero_array = np.zeros(8)
print(all_zero_array)

[0. 0. 0. 0. 0. 0. 0. 0.]


To populate an array with all ones, call `np.ones`.

In [5]:
# create an array with all ones
all_one_array = np.ones((3,2))
print(all_one_array)

[[1. 1.]
 [1. 1.]
 [1. 1.]]
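Both `np.zeros` and `np.ones` accept either an integer (one-dimensional array) or a tuple (multi-dimensional array) as the shape, and they produce floating-point values by default, which is why the outputs above print as `0.` and `1.`. A minimal sketch, with variable names chosen here purely for illustration:

```python
import numpy as np

# A tuple gives a multi-dimensional shape: 2 rows, 3 columns
zeros_2d = np.zeros((2, 3))

# The optional dtype argument overrides the default float64
ones_int = np.ones(4, dtype=int)

print(zeros_2d.shape)  # (2, 3)
print(zeros_2d.dtype)  # float64
print(ones_int)        # [1 1 1 1]
```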


### Populate arrays with random numbers

NumPy provides various functions to populate arrays with random numbers across certain ranges. For example, `np.random.randint` generates random integers between a low and high value. The following call populates a 6-element array with random integers between 50 and 100.

In [6]:
# create a 6-element array with random integers between 50 and 100
random_integers_between_50_and_100 = np.random.randint(low = 50, high = 101, size = (6))
print(random_integers_between_50_and_100)

[86 76 60 60 77 98]


Note that the `high` argument is exclusive: the highest integer that can be generated is `high - 1`. This is why the call above passes `high = 101` in order to include 100. More explanations about `np.random.randint` can be found in the official documentation: https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html#numpy.random.randint
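One way to convince yourself of the bounds is to draw a large number of samples and look at the extremes. The sketch below does this; the sample size of 10,000 is an arbitrary choice for illustration.

```python
import numpy as np

# Draw many samples so that both ends of the range are very likely to appear
samples = np.random.randint(low=50, high=101, size=10_000)

print(samples.min())  # never below 50
print(samples.max())  # never above 100, because high=101 is excluded
```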

To create random floating-point values between 0.0 and 1.0, call `np.random.random`. For example:

In [7]:
# create an array with random floating-point values between 0.0 and 1.0
random_floats_between_0_and_1 = np.random.random([6])
print(random_floats_between_0_and_1)

[0.86939924 0.15701301 0.43605626 0.68470944 0.69043385 0.03902001]
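Because these values are random, your output will differ from the one shown above each time the cell runs. If you need reproducible results, one option is to fix the seed with `np.random.seed` before sampling. A minimal sketch, where the seed value 42 is an arbitrary assumption:

```python
import numpy as np

np.random.seed(42)                # 42 is an arbitrary seed chosen for illustration
first_run = np.random.random(6)

np.random.seed(42)                # resetting the seed restarts the same sequence
second_run = np.random.random(6)

print(np.array_equal(first_run, second_run))  # True
```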


### Mathematical operations on NumPy operands

In linear algebra, adding or subtracting two arrays requires that both operands have the same dimensions, and multiplying two arrays imposes strict rules on the dimensional compatibility of the operands. Fortunately, NumPy uses a trick called [**broadcasting**](https://developers.google.com/machine-learning/glossary/#broadcasting) to virtually expand the smaller operand so that its dimensions are compatible with those of the larger one. For example, the following operation uses broadcasting to add 2.0 to the value of every item in the array created in the previous code cell:

In [8]:
# add 2.0 to every item in the array
random_floats_between_2_and_3 = 2.0 + random_floats_between_0_and_1
print(random_floats_between_2_and_3)

[2.86939924 2.15701301 2.43605626 2.68470944 2.69043385 2.03902001]


The following operation also relies on broadcasting to multiply each item in an array by 3:

In [9]:
# multiply each item in the array by 3
random_integers_between_150_and_300 = 3 * random_integers_between_50_and_100
print(random_integers_between_150_and_300)

[258 228 180 180 231 294]
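Broadcasting is not limited to scalars. The sketch below adds a one-dimensional array to each row of a 3$\times$2 array, and also shows that NumPy raises a `ValueError` when the shapes cannot be matched. The names `matrix`, `row`, and `bad_row` are hypothetical and used only for this illustration.

```python
import numpy as np

matrix = np.array([[6, 5], [11, 7], [4, 8]])  # shape (3, 2)
row = np.array([10, 100])                     # shape (2,)

# The row is virtually repeated for each of the 3 rows of the matrix
result = matrix + row
print(result)
# [[ 16 105]
#  [ 21 107]
#  [ 14 108]]

# Shapes (3, 2) and (3,) are not compatible: the trailing dimensions 2 and 3 differ
bad_row = np.array([1, 2, 3])
try:
    matrix + bad_row
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```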


## Part 2. Hands-on exercises

Please complete the following two exercises.

### Task 1. Create a linear dataset

<span style="color:red">**[TBC]**</span>: Your goal is to create a simple dataset consisting of a single feature and a label as follows:
1. Assign a sequence of integers from 6 to 20 (inclusive) to a NumPy array named `feature`
2. Assign 15 values to a NumPy array named `label` such that: label = 3 $\times$ feature + 4
3. Print the created `feature` and `label`

For example, the first value for `label` should be:

3 $\times$ 6 + 4 = 22

In [10]:
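# [TBC] complete your code here with proper comments
# np.arange excludes the stop value, so use 21 to include 20
feature = np.arange(6, 21)
print("The created feature array is:")
print(feature, '\n')
# compute label = 3 * feature + 4 element-wise
label = (feature * 3) + 4
print("The created label array is:")
print(label)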

The created feature array is:
[ 6  7  8  9 10 11 12 13 14 15 16 17 18 19 20] 

The created label array is:
[22 25 28 31 34 37 40 43 46 49 52 55 58 61 64]
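A few quick checks can confirm that the dataset matches the task statement: 15 values, starting at 6 and ending at 20, with the first label equal to 22. The sketch below rebuilds the arrays so that it can run on its own.

```python
import numpy as np

feature = np.arange(6, 21)   # the stop value 21 is excluded
label = 3 * feature + 4

print(feature.size)            # 15
print(feature[0], feature[-1]) # 6 20
print(label[0], label[-1])     # 22 64
```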


### Task 2. Add some noise to the dataset

<span style="color:red">**[TBC]**</span>: To make your dataset a little more realistic, insert a little random noise into each element of the `label` array you already created. To be more precise, modify each value assigned to `label` by adding a different random floating-point value between -2 and +2.

In [11]:
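# [TBC] complete your code here with proper comments
# scale random floats from [0, 1) to [0, 4), then shift them to [-2, 2)
noise = (np.random.random([15]) * 4) - 2
print("The created noise array is:")
print(noise, '\n')
# add a different noise value to each element of label
label = label + noise
print("The label array after adding noise is:")
print(label)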

The created noise array is:
[ 0.0079185  -0.78944557 -1.09373335  1.2477377  -1.44391073 -0.26442623
 -0.8373811  -0.54750983  0.66253868  0.80586887  0.36892374 -0.44284636
 -0.19395797 -1.63782728  1.59350139] 

The label array after adding noise is:
[22.0079185  24.21055443 26.90626665 32.2477377  32.55608927 36.73557377
 39.1626189  42.45249017 46.66253868 49.80586887 52.36892374 54.55715364
 57.80604203 59.36217272 65.59350139]
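As an alternative to scaling and shifting `np.random.random`, the same kind of noise can be drawn directly with `np.random.uniform`, which samples from the interval between `low` and `high`. This is a sketch of one possible variant, not the solution used above; `noisy_label` is a name introduced here for illustration.

```python
import numpy as np

feature = np.arange(6, 21)
label = 3 * feature + 4

# Draw one noise value per label element from the range -2 to +2
noise = np.random.uniform(low=-2.0, high=2.0, size=label.shape)
noisy_label = label + noise

print(noise.shape)        # (15,)
print(noisy_label.dtype)  # float64: integer labels plus float noise give floats
```

Keeping the noisy values in a separate array leaves the clean `label` array available for comparison.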
