# Applied Machine Learning (2020), exercises


## General instructions for all exercises

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Follow the instructions and fill in your solution under the line marked by the tag

> YOUR CODE HERE
  
Do not otherwise change the notebook, because it can disturb the autograding!

Having written the answer, execute the code cell by pressing the `Shift-Enter` key combination. The code runs and may print some information under the code cell. The focus automatically moves to the next cell, and you can execute that cell by pressing `Shift-Enter` again, until you reach the code cell that tests your solution. Execute it and follow the feedback. Usually it either says that the solution seems acceptable or reports some errors. You can go back to your solution, modify it, and repeat everything until you are satisfied. Then proceed to the next task.
   
Repeat the process for all tasks.

The notebook may also contain manually graded answers. Write your manually graded answer under the line marked by the tag:

> YOUR ANSWER HERE

Manually graded tasks may be text, pseudocode, or mathematical formulas. You can write formulas in $\LaTeX$ syntax by enclosing the formula in dollar signs (`$`); for example, `$f(x)=2 \pi / \alpha$` will produce $f(x)=2 \pi / \alpha$.

When you have passed the tests in the notebook and are ready to submit your solutions, download the whole notebook using the menu `File -> Download as -> Notebook (.ipynb)`. Save the file on your hard disk, and submit it in [Moodle](https://moodle.uwasa.fi) under the corresponding exercise.

Your solution should be executable Python code. Use the existing code as an example of Python programming, and read more from the numerous Python programming materials on the Internet if necessary.


In [None]:
NAME = ""
Student_number = ""

---

# ICAT3190, Module 1

## Task 1

Implement a function which calculates the Root Mean Square value of the input vector, according to the following formula:

$$x_{rms}=\sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2} $$

Name your function `myRMS`. It should take one vector (`x`) as input and return a single RMS value as output. Code your function in the cell below. When you think it is ready, execute it by pressing `Shift-Enter`. Then you can run the test cases in the next cell by pressing `Shift-Enter` again.

- Remember to import necessary libraries, like numpy.
- You may assume that the input vector is a numpy array, for example `x=np.array([1,2,3,4,5])`, or `x=np.linspace(0,1,500)`.
- The power operator in Python is `**`, while in many other programming languages it is `^`.
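As a small illustration of the last point, the sketch below applies both operators to the example array from the list above. The variable names `squared` and `xor_result` are chosen only for this illustration. In Python, `^` is the bitwise XOR operator, so it runs without an error but does not compute a power.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Element-wise power: every element is squared
squared = x ** 2
print(squared)      # [ 1  4  9 16 25]

# Bitwise XOR with 2, which is NOT a power operation
xor_result = x ^ 2
print(xor_result)   # [3 0 1 6 7]
```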

Read the basics of Python programming in [Dive into Python](https://diveinto.org/python3/table-of-contents.html) and use the [NumPy](https://numpy.org/) documentation.

Check also the execution time of your program.
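One possible way to measure execution time outside a notebook cell is the standard library module `timeit`. The sketch below compares a plain Python loop with a vectorized NumPy call on a summation, which is deliberately not the RMS function of this task. The helper `sum_loop` and the array size are assumptions made only for this illustration.

```python
import timeit

import numpy as np

values = np.random.normal(5, 2, size=100_000)


def sum_loop(data):
    """Sum the elements one by one in a plain Python loop."""
    total = 0.0
    for value in data:
        total += value
    return total


# Run each variant three times and report the total elapsed time in seconds
loop_time = timeit.timeit(lambda: sum_loop(values), number=3)
vectorized_time = timeit.timeit(lambda: np.sum(values), number=3)

print(f"loop:       {loop_time:.4f} s")
print(f"vectorized: {vectorized_time:.4f} s")
```

On most machines the vectorized call is considerably faster, but the exact numbers depend on the hardware and the array size.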

In [None]:
%matplotlib inline

from numpy import sqrt, square, mean

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
### Tests for your code, Do not change!
### There can be also additional hidden tests.
### -----------------------------------
from scipy.linalg import norm
import numpy as np
import sklearn

from sklearn.metrics import mean_squared_error

# Make a reference function for calculating RMS using mean_squared_error
trueRMS = lambda x: sqrt(mean_squared_error(x,x*0))

x1=np.array([2,2,2,2,2])
assert(myRMS(x1)==2)

x2=np.array([3,4,5.2,8,-2,0,1.2])
assert(myRMS(x2)==trueRMS(x2))


# Time it
x4=np.random.normal(5,2,size=1000000)
%time myRMS(x4)


## Task 2

Create a polynomial signal, $y$, and add normal noise $\mathcal{N}(\mu, \sigma)$ to it, where $\mu$ is the mean of the noise and $\sigma$ is its standard deviation. Calculate the values of the signal at 100 points in the interval $x \in [0, 3]$. Create the signal according to the following formula:

$$y=0.1 x^2 + 1.5 x + \mathcal{N}(0,0.5) $$

Then make a scatter plot of $y$ against $x$. You can use the standard plot function, but do not draw a line plot; plot only a marker for each value, using the syntax `plt.plot(x,y,'*')`. Remember to import the plotting module with `import matplotlib.pyplot as plt`.

See how normal noise was created in the test cell of the previous task, using `np.random.normal()`. Use the `np.linspace()` function to create a linear x-axis, as was also shown in the lecture notes.

See examples in the [Matplotlib tutorials](https://matplotlib.org/3.1.1/tutorials/index.html).
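To show how `np.linspace()` and `np.random.normal()` work together, here is a minimal sketch that uses a different signal from the one in this task: a sine wave with normal noise. The names `t`, `clean`, `noise` and `noisy`, the number of points, the noise level and the fixed seed are all assumptions made for this illustration.

```python
import numpy as np

# Fixed seed so the sketch is repeatable (an assumption, not required by the task)
np.random.seed(0)

# 50 evenly spaced points on [0, 2*pi]; both end points are included
t = np.linspace(0, 2 * np.pi, 50)

# Noise-free signal plus normal noise with mean 0 and standard deviation 0.2
clean = np.sin(t)
noise = np.random.normal(0, 0.2, size=t.shape)
noisy = clean + noise

print(t[0], t[-1], len(t))
print(noisy[:5])
```

The arrays `t` and `noisy` could then be passed to `plt.plot(t, noisy, '*')` to draw the markers.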

In [3]:
# YOUR CODE HERE
import numpy as np
np.linspace


In [None]:
# Look visually, if the result looks like it should

import matplotlib.pyplot as plt
plt.plot(x,y, '*')

In [None]:
### Tests for your code.
### -----------------------------------
assert(len(x)==100)
assert((y.mean()>2.4) and (y.mean()<2.7))
assert((y.min()>-1) and (y.max()<6.4))
assert((y.std()>1.2) and (y.std()<2))

## Task 3
Fit an ordinary linear regression model to the data you just created, using the statsmodels library. Read the instructions in the [statsmodels documentation](https://www.statsmodels.org/stable/index.html). Use the second example, which works with numpy arrays.

In Python data modelling, the typical workflow is that you first import the model class from a library, then instantiate the model, and then fit the model to the data. The fitted model contains the model parameters and also some statistics describing the goodness of fit.
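The same import, instantiate, fit pattern appears in other libraries as well. The sketch below illustrates it with scikit-learn's `LinearRegression` on hypothetical noise-free data, so it is not the statsmodels solution asked for in this task. The names `u`, `v`, `U`, `reg` and `fitted` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # 1. import the model class

# Hypothetical noise-free data on the line v = 3*u + 1
u = np.linspace(0, 10, 20)
v = 3 * u + 1

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features)
U = u.reshape(-1, 1)

reg = LinearRegression()   # 2. instantiate the model
fitted = reg.fit(U, v)     # 3. fit it to the data; fit() returns the estimator

print(fitted.coef_[0], fitted.intercept_)  # slope and intercept
print(fitted.score(U, v))                  # R-squared on the training data
```

Note that the details differ between libraries: in scikit-learn the data is passed to `fit()`, whereas statsmodels' `OLS` takes the data when the model is instantiated.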

Now do the following:
 1. Import the `statsmodels.api` module
 1. Instantiate an OLS model and store it as variable `model`
 1. Fit your model to the data (`y`, `x`) and store the fitted model as variable `fitted_model`
 1. Print the summary of your model according to the instructions in the documentation
 1. Study especially three values in the summary
   1. R-squared
   1. The coef-parameter of x1
   1. The P-value of x1
   
- Does the model fit the data? How much of the variance does it explain?
- The model is linear, so it tries to fit a line to the data. What is the slope of the fitted line?
- What does the P-value say about the possibility that the true slope is zero and the fitted slope differs from zero only by chance?
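
As a reminder for the first question, the usual definition of R-squared for a model with an intercept is given below. Here $\hat{y}_i$ denotes the value predicted by the fitted model and $\bar{y}$ the mean of the observed values; these symbols are introduced only for this note. A value close to 1 means that the model explains most of the variance in $y$.

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$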

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# List the parameters of the fitted model
print(fitted_model.summary())

The end :)