# Grading process

This notebook will be autovalidated with `papermill`. The exact command is the following:

```bash
papermill assignment.ipynb <student>-assignment.ipynb -p TEST True
```

This injects a new cell after each cell tagged as `parameters` (see `View > Cell toolbar > Tags`). The notebook is executed from top to bottom in linear order. `solutions.py` contains the correct implementations used to validate your solutions.

Different problems are worth different numbers of points: each problem in the basic section is worth 1 point, and each problem in the intermediate section is worth 2 points.

Please fill in the `STUDENT` variable with your name so that we can collect the results automatically. Each problem contains specific validation details. We will do our best to review your assignments, but please keep in mind that for this assignment the automatic grade (between $0$ and $1$) is the primary source of ground truth.

In [1]:
import numpy as np
from numpy import pi

In [2]:
STUDENT = "Adam Cohn"
ASSIGNMENT = 1
TEST = False

In [3]:
if TEST:
    import solutions
    total_grade = 0
    MAX_POINTS = 12

# Basic arrays

### 1. Calculate $\sin(x)$ for $0\leq x < 2\pi$ with a step of $0.1$.

Implement a function that calculates the required array. The result must be **1-dimensional** and **will be tested against precomputed values**.

Note that `numpy` provides [constants](https://docs.scipy.org/doc/numpy-1.15.0/reference/constants.html); you can take $\pi$ from there.
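
As a quick sanity check on the grid itself (a sketch, not part of the grader): `np.arange` excludes the stop value, so a step of $0.1$ over $[0, 2\pi)$ yields 63 points. The name `x_grid` is purely illustrative.

```python
import numpy as np

# Half-open grid 0, 0.1, ..., 6.2; the stop value 2*pi (~6.283) is excluded.
x_grid = np.arange(0, 2 * np.pi, 0.1)
y = np.sin(x_grid)

print(x_grid.shape)             # (63,)
print(x_grid[-1] < 2 * np.pi)   # True
print(y.ndim)                   # 1
```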

In [4]:
def sin_basic():
    # your code goes here
    return np.sin(np.arange(0, 2*pi, 0.1))

In [5]:
PROBLEM_ID = 1

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, sin_basic)

### 2. Create a function, which calculates $n$ values of $\sin(x)$ for $0\leq x \leq 2\pi$.

Both $0$ and $2\pi$ must be included, and the $x$ values must be equidistant. The result must be **1-dimensional** and **will be tested against three random values for $10 \leq n < 100 $**.

For example, $n=3$ means that the function must calculate the values $\sin(0)$, $\sin(\pi)$ and $\sin(2\pi)$.
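
The $n=3$ case can be checked directly. This is a minimal sketch that relies on `np.linspace` including the endpoint by default; the variable names are illustrative.

```python
import numpy as np

n = 3
x = np.linspace(0, 2 * np.pi, n)   # endpoint=True by default, so 2*pi is included

print(x.shape)                                  # (3,)
print(np.allclose(x, [0, np.pi, 2 * np.pi]))    # True
print(np.allclose(np.sin(x), 0))                # True, up to floating-point error
```

Because of floating-point rounding, $\sin(\pi)$ and $\sin(2\pi)$ come out as tiny nonzero numbers rather than exact zeros, which is why the comparison uses `np.allclose`.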


In [6]:
def sin_enumerated(n):
    # your code goes here
    return np.sin(np.linspace(0, 2*pi, n))

In [7]:
PROBLEM_ID = 2

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, sin_enumerated)

### 3. Create a function, which calculates $n$ values of truncated $\sin(x)$ for $0\leq x \leq 2\pi$.

Truncated $\sin(x)$ is defined as follows:

$$
\sin_{trunc}(x) =
\left\{
\begin{array}{l}
\sin(x), \sin(x)\geq 0, \\
0, \sin(x) < 0.
\end{array}\right.
$$

Otherwise, the requirements are the same as in Problem 2.
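
One possible way to express the definition above is an element-wise maximum with zero. This is only a sketch; `sin_truncated_sketch` is a hypothetical name, not the function the grader calls.

```python
import numpy as np

def sin_truncated_sketch(n):
    """Truncated sine on n equidistant points in [0, 2*pi] (illustrative only)."""
    x = np.linspace(0, 2 * np.pi, n)
    # np.maximum compares element-wise: negative sine values become 0.
    return np.maximum(np.sin(x), 0.0)

values = sin_truncated_sketch(9)   # grid step is pi/4
print(values.shape)                # (9,)
print((values >= 0).all())         # True
```

With $n=9$ the grid contains $\pi/2$ (where the value stays close to $1$) and $3\pi/2$ (where $-1$ is replaced by $0$).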

In [8]:
def sin_truncated(n):
    # your code goes here
    result = sin_enumerated(n)  # fresh array from Problem 2, safe to modify in place
    result[result < 0] = 0      # zero out the negative half-wave
    return result

In [9]:
PROBLEM_ID = 3

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, sin_truncated)

### 4. Statistics on multi-dimensional arrays.

Given a 3-dimensional array, calculate the mean and standard deviation along axes $(1,2)$. The result must be **2-dimensional** and **will be tested against three random combinations of input array dimensions ($10 \leq n < 100 $)**. Array values will be drawn from the standard normal distribution (`np.random.randn`).

For example, for a $(10, 5, 5)$ array the result must be of shape $(10,2)$, with column 0 holding the mean of each "row" of the original array and column 1 holding the standard deviation.
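
To see what reducing over axes $(1,2)$ means, the sketch below flattens the last two axes and compares the result with a direct reduction. It assumes the population standard deviation (NumPy's default, `ddof=0`); whether the grader uses the same convention is an assumption.

```python
import numpy as np

np.random.seed(0)                  # fixed seed, only for reproducibility
arr = np.random.randn(10, 5, 5)

# Collapse axes 1 and 2 so each of the 10 "rows" becomes a vector of 25 values.
flat = arr.reshape(arr.shape[0], -1)
stats = np.column_stack([flat.mean(axis=1), flat.std(axis=1)])

print(stats.shape)                                       # (10, 2)
print(np.allclose(stats[:, 0], arr.mean(axis=(1, 2))))   # True
print(np.allclose(stats[:, 1], arr.std(axis=(1, 2))))    # True
```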

In [10]:
def array_stats(arr):
    # your code goes here
    mean = np.mean(arr, axis=(1,2)) 
    std = np.std(arr,axis=(1,2))
    return np.column_stack([mean, std])

In [11]:
PROBLEM_ID = 4

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, array_stats)

### 5. Softmax activation function.

Given a 2-dimensional array, calculate its $\texttt{softmax}$ for each row. The $\texttt{softmax}$ activation for a vector $x$ is defined as follows:

$$
\texttt{softmax} (x_i) = \frac{e^{x_i}}{\sum_k e^{x_k}}.
$$

Correspondingly, for the entire array the expression is the following:

$$
\texttt{softmax} (x_{ij}) = \frac{e^{x_{ij}}}{\sum_k e^{x_{ik}}}.
$$

For example, for a $(10, 5)$ array the result must have the same shape $(10, 5)$. The result must be **2-dimensional** and **will be tested against three random combinations of input array dimensions ($10 \leq n < 100 $)**. Array elements are drawn from the standard normal distribution.
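
For standard normal inputs the formula can be evaluated directly. For large inputs, `np.exp` overflows, and a common remedy is to subtract the row maximum first, which leaves the result unchanged mathematically. The sketch below shows that variant; `softmax_stable` is a hypothetical name and is not required by the grader.

```python
import numpy as np

def softmax_stable(arr):
    """Row-wise softmax with the row maximum subtracted (illustrative only)."""
    # Subtracting a per-row constant cancels in the ratio but keeps exp() bounded.
    shifted = arr - arr.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

probs = softmax_stable(np.array([[1.0, 2.0, 3.0],
                                 [1000.0, 1000.0, 1000.0]]))
print(probs.shape)                          # (2, 3)
print(np.allclose(probs.sum(axis=1), 1))    # True
```

The second row would produce `nan` with a naive `np.exp(arr)`, since `np.exp(1000.0)` overflows to infinity.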

In [12]:
def softmax(arr):
    # your code goes here
    exponent_arr = np.exp(arr)
    # keepdims=True keeps the row sums as a column, so they broadcast across each row
    return exponent_arr / np.sum(exponent_arr, axis=1, keepdims=True)

In [13]:
PROBLEM_ID = 5

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, softmax)

### 6. Class prediction.

$\texttt{softmax}$ is used to represent probabilities. The result of Problem 5 may be treated as the predictions of some classification model. For example, a $(10, 3)$ array returned by the function in Problem 5 may be the probabilistic prediction of a 3-class classification model for 10 examples. Note that $\texttt{softmax}$ normalizes the input such that each row sums to $1$.

In this problem, you need to calculate the exact class, i.e. determine which probability is the highest for each example. For example, for the following array

$$
\left(
\begin{array}{ccc}
0.3 & 0.6 & 0.1 \\
0.8 & 0.05 & 0.15
\end{array}
\right)
$$

the result must be

$$
\left(
\begin{array}{c}
1 \\
0
\end{array}
\right)
$$

Note that the result must be **2-dimensional**, such that an input array of shape $(N, M)$ is transformed into an output array of shape $(N,1)$. Input arrays are generated in the same way as in Problem 5, with $\texttt{softmax}$ applied on top.
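
The worked example above can be reproduced as follows. This is a sketch: the highest probability is searched along each row (`axis=1`), and the result is reshaped into a column.

```python
import numpy as np

probs = np.array([[0.3, 0.6, 0.1],
                  [0.8, 0.05, 0.15]])

# argmax along axis=1 picks the best class per example; reshape gives shape (N, 1).
classes = np.argmax(probs, axis=1).reshape(-1, 1)

print(classes.shape)      # (2, 1)
print(classes.tolist())   # [[1], [0]]
```

Taking the maximum along `axis=0` instead would compare examples with each other for every class, yielding $M$ values rather than $N$.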

In [14]:
def predict(arr):
    # your code goes here
    # axis=1 selects the most probable class within each row (example)
    return np.argmax(arr, axis=1).reshape(arr.shape[0], 1)

In [15]:
PROBLEM_ID = 6

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, predict)

# Intermediate arrays

### 7. One-hot encoding.

Given a 1-dimensional array of class labels, construct its one-hot encoded transformation. The one-hot encoding of an array of shape $(N,)$ is defined as an array of shape $(N,L)$ such that $e_{ij}$ is $1$ if the $i$-th example belongs to class $j$ and $0$ otherwise, where $L$ is the number of classes.

For example, array $(1,0,2,1,1,2,0)$ is transformed to

$$
\left(
\begin{array}{ccc}
0 & 1 & 0\\
1 & 0 & 0\\
0 & 0 & 1\\
0 & 1 & 0\\
0 & 1 & 0\\
0 & 0 & 1\\
1 & 0 & 0
\end{array}
\right)
$$

This function will be tested against three input arrays of random shape $(n,)$ ($10 \leq n < 100 $) filled with random integers.
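
One compact way to build such a matrix is to index the rows of an identity matrix with the labels. The sketch below reproduces the example above and assumes that the labels are integers in $0, \dots, L-1$, so that $L$ can be taken as the largest label plus one.

```python
import numpy as np

labels = np.array([1, 0, 2, 1, 1, 2, 0])

# Assumption: classes are numbered 0..L-1, so L = max label + 1.
num_classes = int(labels.max()) + 1

# Row j of the identity matrix is the one-hot vector for class j.
encoded = np.eye(num_classes)[labels]

print(encoded.shape)           # (7, 3)
print(encoded[:2].tolist())    # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```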

In [16]:
def onehot(labels):
    # your code goes here
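    # Use max label + 1 as the class count, so indexing works even if a class is absent
    num_classes = int(labels.max()) + 1
    n = labels.shape[0]
    one_hot = np.zeros((n, num_classes))
    one_hot[np.arange(n), labels] = 1  # set exactly one entry per row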
    return one_hot

In [17]:
PROBLEM_ID = 7

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, onehot)

### 8. Fixing missing values.

Given an array that contains some NaNs (not-a-number, represented as `np.nan`) and positive and negative infinities (represented as `np.inf` and `-np.inf`), construct a "repaired" version of that array. All missing or broken values must be replaced by the average of the valid elements of the array.

For example, the array `(0., np.nan, 2., np.inf)` must be transformed to `(0., 1., 2., 1.)`. Input arrays will be drawn from the standard normal distribution, with a small fraction of values replaced by `np.nan`, `np.inf` or `-np.inf`.
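
The example above can be reproduced with a boolean mask. This sketch uses `np.isfinite`, which is `False` for NaN and for both infinities, together with `np.where`. It also works on NumPy versions older than 1.17, where `np.nan_to_num` does not yet accept the `nan`, `posinf` and `neginf` arguments.

```python
import numpy as np

arr = np.array([0., np.nan, 2., np.inf])

valid = np.isfinite(arr)       # True only for ordinary numbers
average = arr[valid].mean()    # mean of the valid entries: (0 + 2) / 2

# Keep valid entries, substitute the average everywhere else.
repaired = np.where(valid, arr, average)

print(repaired.tolist())       # [0.0, 1.0, 2.0, 1.0]
```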

In [41]:
def fix(arr):
    # your code goes here
    avg = np.mean(arr[np.isfinite(arr)])
    return np.nan_to_num(arr, nan=avg, posinf=avg, neginf=avg)

In [42]:
PROBLEM_ID = 8

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, fix)

### 9. Calculate class distribution.

Given a 1-dimensional array of class labels, calculate the relative frequency of each class.

For example, array $(1,0,2,1,1,2,0)$ is transformed to $(2/7, 3/7, 2/7)$. Note the ordering and consider using one-hot representation to calculate class counts.
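
The one-hot hint can be sketched as follows: each column of the one-hot matrix marks the examples of one class, so the column means are the class frequencies. As before, the labels are assumed to be integers in $0, \dots, L-1$.

```python
import numpy as np

labels = np.array([1, 0, 2, 1, 1, 2, 0])

# One-hot matrix via identity-row lookup (assumes classes are numbered 0..L-1).
one_hot = np.eye(int(labels.max()) + 1)[labels]

# Column sums are class counts, so column means are relative frequencies.
freq = one_hot.mean(axis=0)

print(np.allclose(freq, [2 / 7, 3 / 7, 2 / 7]))   # True
print(np.isclose(freq.sum(), 1.0))                # True
```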

In [44]:
labels = np.array([1,0,2,1,1,2,0])


array([0.28571429, 0.42857143, 0.28571429])

In [45]:
def class_freq(labels):
    # your code goes here
    return np.bincount(labels)/len(labels)

In [46]:
PROBLEM_ID = 9

if TEST:
    total_grade += solutions.check(STUDENT, PROBLEM_ID, class_freq)

# Your grade

In [None]:
if TEST:
    print(f"{STUDENT}: {int(100 * total_grade / MAX_POINTS)}")