# Practice problems for Module 3

In Module 3, we are refreshing our Python programming skills.

In [1]:
import numpy as np
import time

## Problem 1

In data science, a common metric is mean squared error:

\begin{equation*}
\operatorname{MSE}=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{Y_i})^2,
\end{equation*}

where $Y_i$ is the observed value and $\hat{Y_i}$ is the predicted value, for observation $i$ of $n$ observations.

Write a function that returns the mean squared error between numpy arrays <code>y</code> and <code>y_hat</code>.

In [2]:
def mse(y, y_hat):
    # Mean of the element-wise squared differences
    return np.mean((y - y_hat)**2)

### Problem 1 test

In [3]:
y = np.array([1, 2, 3])
y_hat = np.array([2, 4, 6])

print(mse(y, y_hat))
# result should be 4.666666666666667

4.666666666666667
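One pitfall with the vectorized expression is broadcasting: if `y` has shape `(3,)` and `y_hat` has shape `(3, 1)`, the subtraction silently produces a `(3, 3)` array and the mean is taken over nine values. The sketch below adds an explicit shape check. The name `mse_checked` is hypothetical and not part of the assignment.

```python
import numpy as np

def mse_checked(y, y_hat):
    """Mean squared error with an explicit shape check (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Refuse inputs that would otherwise be broadcast against each other
    if y.shape != y_hat.shape:
        raise ValueError(f"shape mismatch: {y.shape} vs {y_hat.shape}")
    return np.mean((y - y_hat) ** 2)

y = np.array([1, 2, 3])
y_hat = np.array([2, 4, 6])
result = mse_checked(y, y_hat)  # (1 + 4 + 9) / 3
print(result)
```

Calling `mse_checked(y, y_hat.reshape(3, 1))` raises a `ValueError` instead of returning a misleading number.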


## Problem 2

Write a function that takes a list <code>l</code> and returns a new list that contains the sorted elements of the first list minus all the duplicates.

In [8]:
def unique(l):
    # np.unique returns the sorted distinct values;
    # .tolist() converts them back to plain Python objects
    return np.unique(np.array(l)).tolist()

### Problem 2 tests

In [9]:
l = [1, 2, 3, 2]
print(unique(l))
# result should be [1, 2, 3]

[1, 2, 3]


In [10]:
l = [2, 8, 4, 6, 4, 8, 2]
print(unique(l))
# result should be [2, 4, 6, 8]

[2, 4, 6, 8]
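The same result can be obtained without NumPy. This is a minimal sketch, assuming the elements are hashable and mutually comparable; `unique_sorted` is a hypothetical name chosen for illustration.

```python
def unique_sorted(items):
    """Return the distinct elements of items in ascending order (hypothetical name)."""
    # set() drops duplicates; sorted() returns a new list in ascending order
    return sorted(set(items))

print(unique_sorted([1, 2, 3, 2]))           # [1, 2, 3]
print(unique_sorted([2, 8, 4, 6, 4, 8, 2]))  # [2, 4, 6, 8]
```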


## Problem 3

See below the array <code>a</code> and two different methods for calculating its mean:

In [11]:
a = np.random.random((10000,10000))

t0 = time.perf_counter()  # time.clock() was removed in Python 3.8
mean1 = a.mean()
t1 = time.perf_counter()
print("a.mean()={:g}, completed in {:.2g}s".format(mean1, t1-t0))


def mean(a):
    # Accumulate the sum one element at a time in the Python interpreter
    s = 0.0
    for v in np.nditer(a):
        s = s+v
    return s / a.size

t0 = time.perf_counter()
mean2 = mean(a)
t1 = time.perf_counter()
print("mean(a)={:g}, completed in {:.2g}s".format(mean2, t1-t0))

a.mean()=0.500016, completed in 0.12s
mean(a)=0.500016, completed in 31s


**Question**: Why is the amount of time required to calculate the mean so different between the two methods?

**Answer**: `a.mean()` runs the whole reduction inside NumPy's compiled C code, which makes a single tight pass over the array in memory and can benefit from low-level optimizations such as SIMD instructions. The `for` loop instead handles one element at a time in the Python interpreter: every iteration creates new Python objects, checks types at run time, and dispatches the addition dynamically, so the per-element overhead dominates the run time.

Python is a dynamically typed, interpreted language: the interpreter executes the code step by step and resolves types at run time, so it has little opportunity to optimize the loop as a whole. C is a statically typed, compiled language: the compiler knows the types and sees the whole loop ahead of time, so it can emit optimized machine code.
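The gap can be reproduced at a smaller scale. The sketch below is not part of the assignment: the array size and the seed are arbitrary choices, and the timings depend on the machine, so no expected timings are shown.

```python
import time
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, an arbitrary choice
small = rng.random(200_000)      # far smaller than the 10000 x 10000 array

t0 = time.perf_counter()
vectorized = small.mean()        # one call into NumPy's compiled code
t_vec = time.perf_counter() - t0

t0 = time.perf_counter()
total = 0.0
for v in small:                  # one interpreter iteration per element
    total += v
looped = total / small.size
t_loop = time.perf_counter() - t0

print("vectorized: {:g} in {:.2g}s".format(vectorized, t_vec))
print("loop:       {:g} in {:.2g}s".format(looped, t_loop))
```

Both approaches agree on the mean up to floating-point rounding; only the time spent differs.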

## Problem 4

There are many metrics that can be useful for describing the relationship between two arrays:

\begin{equation*}
\text{L0 norm} = \operatorname{count} \left(Y_i-\hat{Y_i}  \neq 0 \right),
\end{equation*}

\begin{equation*}
\text{L1 norm}=\frac{1}{n}\sum_{i=1}^n | Y_i-\hat{Y_i} |,
\end{equation*}

\begin{equation*}
\text{L2 norm or MSE}=\frac{1}{n}\sum_{i=1}^n(Y_i-\hat{Y_i})^2,
\end{equation*}

and

\begin{equation*}
\text{L}\infty\text{ norm}=\operatorname{argmax}_i | Y_i-\hat{Y_i} |,
\end{equation*}

for example.

Create a Python Class that can be initialized with two arrays and has methods to compute the L0, L1, L2, and L∞ norms of the differences between the arrays.

In [12]:
class Normy():
    def __init__(self, y, y_hat):
        self.y = y
        self.y_hat = y_hat

    def l0(self):
        # Number of positions where the two arrays differ
        return np.count_nonzero(self.y - self.y_hat)

    def l1(self):
        # Mean absolute difference
        return np.mean(np.abs(self.y - self.y_hat))

    def l2(self):
        # Mean squared difference (MSE)
        return np.mean((self.y - self.y_hat)**2)

    def linf(self):
        # Index of the largest absolute difference (argmax, per the problem statement)
        return np.argmax(np.abs(self.y - self.y_hat))

### Problem 4 tests

In [13]:
a = np.array([1, 2, 3])
b = np.array([1, 4, 6])

n = Normy(a, b)
print("L0:", n.l0())
# Should say L0: 2
print("L1:", n.l1())
# Should say L1: 1.6666666666666667
print("L2:", n.l2())
# Should say L2: 4.333333333333333
print("L inf:", n.linf())
# Should say L inf: 2

L0: 2
L1: 1.6666666666666667
L2: 4.333333333333333
L inf: 2
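For comparison, the definitions in this problem differ from the conventional vector norms: the conventional L1 norm is a sum rather than a mean, the L2 norm is the square root of the sum of squares rather than the MSE, and the L∞ norm is the largest absolute difference rather than its index. The sketch below computes the conventional versions on the same test arrays with `np.linalg.norm`; it is offered as a cross-check, not as the assignment's intended solution.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 4, 6])
diff = a - b                              # [0, -2, -3]

l0 = np.linalg.norm(diff, ord=0)          # number of nonzero entries
l1 = np.linalg.norm(diff, ord=1)          # sum of absolute values (not the mean)
l2 = np.linalg.norm(diff, ord=2)          # square root of the sum of squares
linf = np.linalg.norm(diff, ord=np.inf)   # largest absolute value

print(l0, l1, l2, linf)
```

On this data the largest absolute difference is 3 and it occurs at index 2, which is why `linf()` above prints 2.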
