# Assignment 5 {-}
## Due May 23 at 12:00 {-}

Please note: 

- Read the instructions in the exercise PDF and in this notebook carefully.
- Add your solutions *only* at `YOUR CODE HERE`/`YOUR ANSWER HERE` and remove the corresponding `raise NotImplementedError()`.
- Do not change the provided code and text unless stated otherwise.
- Do not *add* or *delete* cells.
- Do not `import` additional functionality. 
- Before submitting: Please make sure that your notebook can be executed from top to bottom via `Menu -> Kernel -> Restart & Run all`.

# Exercise 1 (Lagrange multipliers, 2 points)

\begin{align*}
    &\max x+y \\
    &\text{s.t. } x^2+2y^2 \leq 5 \\
    L(x, y, \alpha) &= (x+y) - \alpha (x^2+2y^2-5) \\
    \frac{\partial L}{\partial x} &= 1  - 2\alpha x \overset{!}{=} 0 \\
    \alpha &= \frac{1}{2x} \\
    \frac{\partial L}{\partial y} &= 1  - 4\alpha y \overset{!}{=} 0 \\
    \alpha &= \frac{1}{4y} \\
    \frac{1}{2x} &= \frac{1}{4y} \\
    x &= 2y \\
    L(y,\alpha) &= (2y+y) - \alpha ((2y)^2+2y^2-5) \\
    \frac{\partial L}{\partial \alpha} &= (2y)^2+2y^2-5 \overset{!}{=} 0\\
    4y^2 + 2y^2 &= 5 \\
    y^2 &= \frac{5}{6} \\
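    y &= \sqrt{\frac{5}{6}} \quad \text{(positive root, since } \alpha = \tfrac{1}{4y} \geq 0 \text{)} \\
    x &= 2\sqrt{\frac{5}{6}} = \sqrt{\frac{10}{3}} \\
    \alpha &= \frac{1}{2x} = \frac{1}{2}\sqrt{\frac{3}{10}} > 0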
\end{align*}
The constraint is active. Geometrically, the unconstrained problem $\max x+y$ is unbounded, because the objective grows without limit in the direction $(1,1)$. The maximum over the feasible set is therefore attained at a point on the ellipse $x^2+2y^2=5$, not in its interior.
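
As a quick numerical sanity check of the derivation above, the boundary of the feasible set can be parametrised and scanned for the largest value of $x+y$. This is only a sketch: the parametrisation and the grid resolution `n_grid` are assumptions made for illustration and are not part of the exercise.

```python
import numpy as np

# Analytical stationary point from the derivation above.
y_star = np.sqrt(5 / 6)
x_star = 2 * y_star
alpha_star = 1 / (2 * x_star)

# Parametrise the ellipse x^2 + 2y^2 = 5 (assumed parametrisation).
n_grid = 200_001  # hypothetical grid resolution
t = np.linspace(0, 2 * np.pi, n_grid)
x = np.sqrt(5) * np.cos(t)
y = np.sqrt(5 / 2) * np.sin(t)

# Grid point on the boundary with the largest objective value.
best = np.argmax(x + y)
print("grid maximiser:", x[best], y[best], "value:", x[best] + y[best])
print("analytical    :", x_star, y_star, "value:", x_star + y_star)
print("alpha         :", alpha_star)
```

Both lines should agree up to the grid resolution, and `alpha_star` is strictly positive, which is consistent with an active constraint.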


# Exercise 2 (Linear and quadratic programs, 3+1+1=5 points)
## a)
<!---
\begin{align*}
    &\min_{x\in \mathbb{R}^n} c^T x \\
    \text{s.t. }
    &Ax \leq b \\
    &x \geq 0 \\
    L(\alpha, \beta, x) &= c^Tx - \alpha^T(Ax - b) + \beta^T x \\
    &= c^Tx - \alpha^T(Ax - b) + \beta^T x \\
    \frac{\partial L}{\partial x} &= c^T - \alpha^T A + \beta^T \overset{!}{=} 0 \\
    \frac{\partial L}{\partial \alpha} &= Ax-b \overset{!}{=} 0 \\
    \frac{\partial L}{\partial \beta} &= x \overset{!}{=} 0 \\
    \alpha^T &= (\beta^T + c^T)A^{-1} \\
    g(\alpha, \beta) &= c^Tx - \left((\beta^T + c^T)A^{-1} \right)Ax + \beta^Tx \\
    \text{Dual:} \\
    g(\alpha, \beta) &= g
\end{align*}
--->

## b) 
\begin{align*}
    E &= \frac{1}{2}x^TQx + c^Tx \\
    \frac{\partial E}{\partial x} &= Qx + c \quad \text{(for symmetric } Q\text{)} \\
    \frac{\partial^2 E}{\partial x^2} &= Q
\end{align*}
Differentiating the objective twice with respect to $x$ shows that $Q$ is its Hessian matrix. Therefore we require $Q$ to be positive semi-definite for the problem to be convex.
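
Positive semi-definiteness can be checked numerically through the eigenvalues of the symmetric part of $Q$. The helper name `is_psd`, the tolerance `tol`, and the two example matrices below are assumptions chosen for illustration.

```python
import numpy as np

def is_psd(Q, tol=1e-10):
    """Return True if the symmetric part of Q has no eigenvalue below -tol."""
    Q_sym = (Q + Q.T) / 2  # only the symmetric part contributes to x^T Q x
    return bool(np.all(np.linalg.eigvalsh(Q_sym) >= -tol))

Q_convex = np.array([[2.0, 0.0], [0.0, 1.0]])      # eigenvalues 2 and 1
Q_nonconvex = np.array([[1.0, 0.0], [0.0, -1.0]])  # eigenvalues 1 and -1

print(is_psd(Q_convex))     # True
print(is_psd(Q_nonconvex))  # False
```

A small tolerance is used because eigenvalues that are zero in exact arithmetic may come out slightly negative in floating point.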

## c)

This means that, since the optimal values of the primal and the dual problem are equal, it does not matter whether we solve the primal or the dual problem.

# Exercise 3 (Primal hard margin SVM problem, 3 points)

Any solution of this problem is subject to:  
$$Y_i(w^TX_i+b)\geq1 \;\forall i=1,\dots,n$$  
If the hyperplane is not in canonical representation, then
$$\min_{i} |w^TX_i+b| \neq 1 $$  
Because $Y_i \in \{-1,1\}$ and the constraints hold, $|w^TX_i+b| = Y_i(w^TX_i+b) \geq 1$, and thus:  
$$m := \min_i Y_i(w^TX_i+b)>1$$  
Define $\hat w = w/m$ and $\hat b = b/m$. Then
$$\frac{1}{2}\|\hat w\|^2 = \frac{1}{2m^2}\| w\|^2 < \frac{1}{2}\| w\|^2 $$  
(note that $w \neq 0$ whenever both classes occur in the data), while
$$Y_i(\hat w^TX_i+\hat b)\geq1 \;\forall i=1,\dots,n$$
and $$\exists i\; Y_i(\hat w^TX_i+\hat b)=1 $$  
In other words, $(\hat w, \hat b)$ is feasible and has a smaller objective value, so $(w, b)$ does not minimize the objective and is not a solution. Moreover,
$$\min_{i} |\hat{w}^TX_i+\hat b| = 1$$
so $(\hat w, \hat b)$ is a canonical hyperplane. Hence every solution must be in canonical representation.
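
The rescaling argument can be illustrated numerically. The toy points `X`, the labels `Y`, and the feasible but non-canonical pair `(w, b)` are made up for this sketch.

```python
import numpy as np

# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-3.0, -2.0]])
Y = np.array([1, 1, -1, -1])

# A feasible hyperplane that is not canonical: all margins exceed 1.
w = np.array([2.0, 2.0])
b = 0.0
margins = Y * (X @ w + b)
m = margins.min()

# Rescale both w and b by the smallest margin.
w_hat, b_hat = w / m, b / m
margins_hat = Y * (X @ w_hat + b_hat)

print("min margin before:", m)
print("min margin after :", margins_hat.min())
print("objective before :", 0.5 * w @ w)
print("objective after  :", 0.5 * w_hat @ w_hat)
```

After rescaling, the smallest margin is 1 up to rounding and the objective value has dropped by the factor $1/m^2$.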

In [1]:
import time 

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from numpy.testing import assert_equal, assert_almost_equal

from sklearn import preprocessing
from sklearn.datasets import load_breast_cancer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

#  Hide warnings of LinearSVC, LogisticRegression

import warnings
from sklearn.exceptions import ConvergenceWarning
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=ConvergenceWarning)

np.random.seed(42)

## a) Know the Dataset


- **Features:**  
    Characteristics of cell nuclei in an image of a possible breast cancer sample, such as size, texture, compactness, ...  
    For each of these, the mean, standard error, and worst (largest) value are given.
- **Labels:**  
    Diagnosis: whether the sampled tumor is malignant (cancerous) or benign (non-cancerous).
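
The description above can be cross-checked against the dataset object itself. The following sketch only prints metadata that scikit-learn ships with the dataset. In scikit-learn's encoding, label 0 stands for malignant and label 1 for benign.

```python
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

print(dataset.data.shape)          # (n_samples, n_features)
print(list(dataset.target_names))  # class name for label 0, then label 1
print(dataset.feature_names[:3])   # first few feature names
```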

In [37]:
dataset = load_breast_cancer()

xs = dataset.data
ys = dataset.target

Split `xs, ys` into the training set `xs_train, ys_train` (size $70\%$) and the test set `xs_test, ys_test` (size $30\%$).
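
A 70/30 split can also be drawn after shuffling, which avoids depending on the order in which the samples are stored. The following is a minimal sketch on placeholder arrays. The names `xs_demo` and `ys_demo` and the seed value are assumptions, and this is not the split used in the cells of this notebook.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumed seed, for reproducibility

# Placeholder data standing in for xs, ys.
xs_demo = np.arange(20).reshape(10, 2)
ys_demo = np.arange(10) % 2

perm = rng.permutation(len(xs_demo))  # shuffled sample indices
n_train = int(len(xs_demo) * 0.7)     # 70 % of the samples for training

train_idx, test_idx = perm[:n_train], perm[n_train:]
xs_tr, xs_te = xs_demo[train_idx], xs_demo[test_idx]
ys_tr, ys_te = ys_demo[train_idx], ys_demo[test_idx]

print(xs_tr.shape, xs_te.shape)
```

Indexing features and labels with the same index arrays keeps each sample paired with its label.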

## b)/c) Optimize the Hyperparameters: Linear SVM vs Logistic Regression


In [38]:
# center and normalize

scaler = preprocessing.StandardScaler().fit(xs)
xs = scaler.transform(xs)

# split into training and test set
n_train = int(len(xs) * .7)

xs_train, xs_test = xs[:n_train], xs[n_train:]
ys_train, ys_test = ys[:n_train], ys[n_train:]

# evaluate standard SVM
svm_estimator = LinearSVC().fit(xs_train,ys_train)
print('We can do better than ', np.mean(svm_estimator.predict(xs_test) != ys_test))

We can do better than  0.03508771929824561


Find good hyperparameters *without* the test set.
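
One way to tune hyperparameters without touching the test set is k-fold cross-validation on the training data. The index bookkeeping can be sketched with NumPy alone. `kfold_indices` is a hypothetical helper written for this illustration, and the sample count, fold count, and seed are arbitrary.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, seed):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut the permutation into n_folds nearly equal parts.
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in kfold_indices(n_samples=10, n_folds=5, seed=42):
    print(len(train_idx), len(val_idx))
```

Each sample ends up in exactly one validation fold, so averaging the per-fold validation errors uses every training sample once for evaluation.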

In [106]:
# DO NOT USE xs_test, ys_test here!
# Please make your optimization reproducible (e.g. set random_state, seed, …)
# LinearSVC and LogisticRegression
seed = 42

no_CV_parts = 5

best_params_svm, best_params_lr = {}, {}
cv_errors_svm, cv_errors_lr = [],[]

Cs =  [0.001, 0.0035, 0.01, 0.4037, 0.1, 1, 10, 100, 1000]

np.random.seed(seed)
indices = np.array(range(len(xs_train)))
np.random.shuffle(indices)
xss_train = xs_train[indices]
yss_train = ys_train[indices]

splits = [int(len(xs_train) * 1/no_CV_parts * i) for i in range(no_CV_parts+1)]

xs_parts = [xss_train[splits[i]: splits[i+1]] for i in range(no_CV_parts)]
ys_parts = [yss_train[splits[i]: splits[i+1]] for i in range(no_CV_parts)]

for C in Cs:
    cv_errors_svm_all = []
    for i in range(no_CV_parts):
        xp_train = np.concatenate([xs_parts[j] for j in range(no_CV_parts) if j!=i])
        xp_test = xs_parts[i]
        
        yp_train = np.concatenate([ys_parts[j] for j in range(no_CV_parts) if j!=i])
        yp_test = ys_parts[i]
        
        svm_estimator = LinearSVC(C=C, random_state=seed).fit(xp_train, yp_train)
        cv_errors_svm_all.append(1-svm_estimator.score(xp_test, yp_test))
    cv_errors_svm.append(np.mean(cv_errors_svm_all))
    
    cv_errors_lr_all = []
    for i in range(no_CV_parts):
        xp_train = np.concatenate([xs_parts[j] for j in range(no_CV_parts) if j!=i])
        xp_test = xs_parts[i]
        
        yp_train = np.concatenate([ys_parts[j] for j in range(no_CV_parts) if j!=i])
        yp_test = ys_parts[i]
        
        lr = LogisticRegression(C=C, random_state=seed).fit(xp_train, yp_train)
        cv_errors_lr_all.append(1-lr.score(xp_test, yp_test))
    cv_errors_lr.append(np.mean(cv_errors_lr_all))
    

svm_i = np.argmin(cv_errors_svm)
best_params_svm = {'C': Cs[svm_i]}


lr_i = np.argmin(cv_errors_lr)
best_params_lr = {'C': Cs[lr_i]}

print(f"Cross validation errors of best model:\nLinear SVM {cv_errors_svm[svm_i]:.4f}, with param C={best_params_svm['C']}\nLogistic Regression {cv_errors_lr[lr_i]:.4f}, with param C={best_params_lr['C']}")

best_estimator_svm = LinearSVC(**best_params_svm, random_state=seed).fit(xs_train, ys_train)
best_estimator_lr = LogisticRegression(**best_params_lr, random_state=seed).fit(xs_train, ys_train)

Cross validation errors of best model:
Linear SVM 0.0226, with param C=0.1
Logistic Regression 0.0251, with param C=0.4037


In [124]:
# if you did not solve the cross validation, use this:
svm_estimator = LinearSVC(C=0.0035, random_state=42).fit(xs_train, ys_train)
lr_estimator = LogisticRegression(C=0.4037,random_state=42).fit(xs_train, ys_train)
# else, use this:

#svm_estimator = best_estimator_svm
#lr_estimator = best_estimator_lr

svm_score = svm_estimator.score(xs_test, ys_test)
lr_score = lr_estimator.score(xs_test, ys_test)

print(f"Test errors of best model:\nLinear SVM:{1-svm_score:.4f} \nLogistic Regression: {1-lr_score:.4f}")

Test errors of best model:
Linear SVM:0.0117 
Logistic Regression: 0.0292


What do you observe and why?

Answer: YOUR ANSWER HERE

## d) State concerns

1. **Ethical:** 
    YOUR ANSWER HERE

2. **Technical/Statistical:**
    YOUR ANSWER HERE

3. **Any:**
    YOUR ANSWER HERE