# QUESTION 1

Solving the given equation, $\mathbb{E}[E_{in}] = \sigma^2\left(1 - \frac{d+1}{N}\right)$, for $N$, we get

\begin{equation}
N = \frac{(d+1)\sigma^2}{\sigma^2-E_{in}}
\end{equation}

To find the $N$ that gives $E_{in} = 0.008$ with $\sigma = 0.1$ and $d=8$, we substitute these values, arriving at:

\begin{equation}
\boxed{N = \frac{9 \times 0.01}{0.01 - 0.008} = 45}
\end{equation}

Analysing the equation given by the problem, we can see that the expected value of $E_{in}$ increases with $N$. The question asks: "which among the following choices is the smallest number of examples N that will result in an expected Ein greater than 0.008?"
The answer is therefore the first alternative with $N > 45$, so the correct alternative is **alternative c ($N = 100$).**
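
The substitution can be checked numerically. The snippet below is a minimal sketch; the helper names `expected_ein` and `min_examples` are made up for illustration and only encode the expression for $N$ above and the expected-$E_{in}$ formula it was solved from.

```python
def expected_ein(N, sigma, d):
    """Expected in-sample error: sigma^2 * (1 - (d + 1) / N)."""
    return sigma**2 * (1 - (d + 1) / N)


def min_examples(ein, sigma, d):
    """Value of N at which the expected E_in equals `ein`."""
    return (d + 1) * sigma**2 / (sigma**2 - ein)


# Threshold value of N for the numbers used in Question 1
threshold = min_examples(0.008, sigma=0.1, d=8)
print(round(threshold, 6))

# Check whether N = 100 pushes the expected E_in above the target
print(expected_ein(100, sigma=0.1, d=8) > 0.008)
```

Any choice of $N$ above the threshold gives an expected $E_{in}$ above 0.008, because the expression grows monotonically with $N$.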



# QUESTION 2

The hypothesis in the transformed space will be of form:

\begin{equation}
h(s) = \text{sign}(\tilde{w}_0 + \tilde{w}_1 x_1^2 + \tilde{w}_2 x_2^2) 
\end{equation}

From the picture given in the problem, we know that the origin $x_1 = x_2 = 0$ is classified as $+1$, so that $h(s) = \text{sign}(\tilde{w}_0) = +1$ at that point. From this, it follows that $\tilde{w}_0 > 0$; since multiplying all weights by a positive constant does not change the sign, we can take $\tilde{w}_0 = +1$ without loss of generality. Next, we can look at points along the horizontal axis ($x_2 = 0$) where $x_1$ is large enough in either direction that we end up in the negative regions of the classification. In this situation, the hypothesis simplifies to:

\begin{equation}
h(s) = \text{sign}(\tilde{w}_0 + \tilde{w}_1 x_1^2) = -1
\end{equation}

Since $\tilde{w}_0 = +1$ and $x_1^2 > 0$, we must have $\tilde{w}_1 < 0$ for this equation to hold. Now consider a different point with the same $x_1$ coordinate, but a value of $x_2$ large enough that the point once again falls into the region classified as $+1$. In this case, we once again work with the complete hypothesis:

\begin{equation}
h(s) = \text{sign}(\underbrace{\tilde{w}_0}_{=+1} + \underbrace{\tilde{w}_1 x_1^2}_{< -1} + \tilde{w}_2 x_2^2) = +1
\end{equation}

From the previous discussion, we know that $\tilde{w}_0 = +1$ and not only that $\tilde{w}_1 < 0$, but more specifically that $\tilde{w}_1 x_1^2 < -1$, so that the point $(x_1,0)$ falls into the negative region. If we now take a point $(x_1,x_2)$ with $x_2$ large enough to fall into the positive region, it follows that we must have $\tilde{w}_2 > 0$ for the above equation to hold.

Therefore, it has been found that $\tilde{w}_1 < 0$ and $\tilde{w}_2 > 0$. The alternative that correctly describes this scenario is **alternative d**.
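
To make the sign argument concrete, the sketch below evaluates the hypothesis at the three kinds of points used above. The weights $(1, -1, 1)$ are a hypothetical choice that satisfies $\tilde{w}_1 < 0$ and $\tilde{w}_2 > 0$; they are not taken from the problem.

```python
def h(x1, x2, w=(1.0, -1.0, 1.0)):
    """Sign of w0 + w1*x1^2 + w2*x2^2 (the default weights are illustrative only)."""
    s = w[0] + w[1] * x1**2 + w[2] * x2**2
    return 1 if s > 0 else -1


print(h(0, 0))    # the origin
print(h(2, 0))    # far along the horizontal axis
print(h(-2, 0))   # same, in the other direction
print(h(2, 3))    # same x1 as before, but a larger x2
```

With these weights the origin and the point with large $x_2$ land on the positive side, while the two points on the horizontal axis land on the negative side, matching the pattern described in the argument.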

# QUESTION 3

From the $\Phi$ function given, we know the transformed space has $14$ coordinates (not counting the constant coordinate 1), so a linear model in that space has a VC dimension $d_{VC} = 14+1 = 15$. Therefore, the smallest alternative that is not smaller than $d_{VC}$ is **alternative c (15).**
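
The count can be reproduced programmatically. The sketch below assumes that $\Phi$ is the full polynomial transform of degree 4 in two inputs; that assumption is consistent with the count of 14 used above but is not stated in this write-up, and `count_monomials` is an illustrative helper.

```python
def count_monomials(degree):
    """Count monomials x1^i * x2^j with 0 <= i + j <= degree."""
    return sum(1 for i in range(degree + 1) for j in range(degree + 1 - i))


n_terms = count_monomials(4)   # includes the constant term (i = j = 0)
print(n_terms - 1, n_terms)    # non-constant terms, total number of weights
```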

# QUESTION 4

Applying the chain rule, the correct alternative is **alternative e**.
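
For reference, the chain-rule step can be written out as follows, assuming the error surface is $E(u,v) = (ue^v - 2ve^{-u})^2$, which is the function implemented in the code for Questions 5 and 6:

\begin{equation}
\frac{\partial E}{\partial u} = 2\left(ue^v - 2ve^{-u}\right)\frac{\partial}{\partial u}\left(ue^v - 2ve^{-u}\right) = 2\left(ue^v - 2ve^{-u}\right)\left(e^v + 2ve^{-u}\right)
\end{equation}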

# QUESTIONS 5 AND 6 (CODE)

In [14]:
from math import e

#Implement functions for the error function and its two partial derivatives

def E(u,v):
    return (u*e**v - 2*v*e**-u)**2

def dEdu(u,v):
    return 2*(u*e**v - 2*v*e**-u)*(e**v + 2*v*e**-u)

def dEdv(u,v):
    return 2*(u*e**v - 2*v*e**-u)*(u*e**v - 2*e**-u)

def gradE(u,v):
    #The gradient is the vector of both partial derivatives, not their sum
    return (dEdu(u,v), dEdv(u,v))

#Initialize u and v
u=1
v=1

#Evaluate initial error
error=E(u,v)

#Iterate until error is smaller than a tolerance and count iterations
tol=10**-14
it=0

#Learning rate
eta=0.1

 
while True:
    #Increment iteration counter
    it += 1

    #Take one gradient descent step; both partial derivatives are
    #evaluated at the old (u,v) before either coordinate is updated
    u, v = u - eta*dEdu(u,v), v - eta*dEdv(u,v)

    #Calculate new error value
    error = E(u,v)

    #Stop once the error is below the tolerance (or after a safety cap)
    if error < tol or it > 1000:
        break

print(it)
print(f'(u,v) = ({round(u,3)},{round(v,3)})')


10
(u,v) = (0.045,0.024)


# QUESTIONS 5 AND 6 (ANSWERS)

### Question 5
From the above code, we can see the correct alternative is **alternative d (10 iterations)**.

### Question 6
From the above code, we can see the correct alternative is **alternative e ((u,v) = (0.045,0.024))**.
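
Since both answers depend on the hand-derived partial derivatives, it can be worth checking them against a numerical estimate. The following is a minimal sketch using central finite differences; `numeric_partials` and the step size `h` are illustrative choices, not part of the original notebook.

```python
from math import exp


def E(u, v):
    return (u * exp(v) - 2 * v * exp(-u)) ** 2


def dEdu(u, v):
    return 2 * (u * exp(v) - 2 * v * exp(-u)) * (exp(v) + 2 * v * exp(-u))


def dEdv(u, v):
    return 2 * (u * exp(v) - 2 * v * exp(-u)) * (u * exp(v) - 2 * exp(-u))


def numeric_partials(u, v, h=1e-6):
    """Central finite-difference estimates of dE/du and dE/dv."""
    du = (E(u + h, v) - E(u - h, v)) / (2 * h)
    dv = (E(u, v + h) - E(u, v - h)) / (2 * h)
    return du, dv


u0, v0 = 1.0, 1.0  # starting point used in the notebook
num_du, num_dv = numeric_partials(u0, v0)

# Both differences should be tiny if the analytic derivatives are right
print(abs(num_du - dEdu(u0, v0)), abs(num_dv - dEdv(u0, v0)))
```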

# QUESTION 7 (CODE)

In [15]:
from math import e

#Implement functions for the error function and its two partial derivatives

def E(u,v):
    return (u*e**v - 2*v*e**-u)**2

def dEdu(u,v):
    return 2*(u*e**v - 2*v*e**-u)*(e**v + 2*v*e**-u)

def dEdv(u,v):
    return 2*(u*e**v - 2*v*e**-u)*(u*e**v - 2*e**-u)

def gradE(u,v):
    #The gradient is the vector of both partial derivatives, not their sum
    return (dEdu(u,v), dEdv(u,v))

#Initialize u and v
u=1
v=1

#Evaluate initial error
error=E(u,v)

#Repeat for 'it' iterations
it = 15

#Learning rate
eta=0.1

 
for i in range(it):
    #Coordinate descent, step 1: move along u only
    u -= eta*dEdu(u,v)

    #Step 2: move along v only, using the already-updated u
    v -= eta*dEdv(u,v)

    #Calculate new error value
    error = E(u,v)


print(error)
print(f'(u,v) = ({round(u,3)},{round(v,3)})')


0.13981379199615324
(u,v) = (6.297,-2.852)


# QUESTION 7

From the above code, we can see that the error after 15 iterations is of order $10^{-1}$. Therefore, the correct alternative is **alternative a.**
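
As a cross-check, the sketch below runs both update schemes for the same number of iterations from $(u,v) = (1,1)$ with $\eta = 0.1$. The function `descend` and its `coordinate` flag are illustrative names, not part of the original notebook.

```python
from math import exp


def E(u, v):
    return (u * exp(v) - 2 * v * exp(-u)) ** 2


def dEdu(u, v):
    return 2 * (u * exp(v) - 2 * v * exp(-u)) * (exp(v) + 2 * v * exp(-u))


def dEdv(u, v):
    return 2 * (u * exp(v) - 2 * v * exp(-u)) * (u * exp(v) - 2 * exp(-u))


def descend(n_iter, eta=0.1, coordinate=False, u=1.0, v=1.0):
    """Run n_iter steps and return the final error E(u, v)."""
    for _ in range(n_iter):
        if coordinate:
            u -= eta * dEdu(u, v)   # move along u first
            v -= eta * dEdv(u, v)   # then along v, using the new u
        else:
            # Simultaneous update: both partials use the old (u, v)
            u, v = u - eta * dEdu(u, v), v - eta * dEdv(u, v)
    return E(u, v)


print(descend(15, coordinate=True))    # coordinate descent
print(descend(15, coordinate=False))   # plain gradient descent
```

With these settings the coordinate-descent error should stay near $10^{-1}$, while the gradient-descent error should fall below the $10^{-14}$ tolerance used in Questions 5 and 6.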

# QUESTION 8 (CODE)

In [7]:
import random

a=[[1,2],[3,4],[5,6]]
b=random.choice(a)
b

[3, 4]

In [9]:
a[a.index([3,4])]

[3, 4]

In [23]:
import numpy as np

a=[1,2,3]
print(np.linalg.norm(a,ord=2,axis=0))

3.7416573867739413
