# QUESTION 1

Since $H'$ is a subset of $H$, it has fewer hypotheses available, so its best approximation to the target can only stay the same or get worse. The deterministic noise will therefore, in general, increase, and the correct alternative is **alternative b.**
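
As a numerical illustration of this argument, here is a minimal sketch. The target $f(x) = \sin(\pi x)$ on $[-1, 1]$ and the polynomial hypothesis sets are illustrative assumptions, not part of the homework. Deterministic noise is estimated as the mean squared gap between the target and the best least-squares fit within each set.

```python
import numpy as np

# Illustrative assumptions: target sin(pi*x), inputs on a dense grid in [-1, 1].
x = np.linspace(-1.0, 1.0, 2001)
f = np.sin(np.pi * x)

def deterministic_noise(degree):
    """Mean squared gap between f and its best polynomial fit of this degree."""
    coeffs = np.polynomial.polynomial.polyfit(x, f, degree)
    best_h = np.polynomial.polynomial.polyval(x, coeffs)
    return float(np.mean((best_h - f) ** 2))

noise_H = deterministic_noise(5)        # H : polynomials up to degree 5
noise_H_prime = deterministic_noise(3)  # H': a subset of H (up to degree 3)

print(f"deterministic noise for H : {noise_H:.6f}")
print(f"deterministic noise for H': {noise_H_prime:.6f}")
```

Because every hypothesis in $H'$ is also in $H$, the best fit within $H'$ can never beat the best fit within $H$ on the same grid.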

# SETUP CODE FOR QUESTIONS 2 - 6

Run this before running the code for any of the questions below.

In [12]:
from hw6_func import *

#Initialize a data set of x (inputs) and y (target values)
x,y = read_hw6_datasets('in.dta')

#Transform x inputs
x_tilde = hw6_transform(x)

#Now create a test dataset
x_test,y_test = read_hw6_datasets('out.dta')

#Transform x_test inputs
x_test_tilde = hw6_transform(x_test)

def questions2to6(k=None,error_in=True,error_out=True):
    """
    Runs the required Linear Regression code for questions 2 through 6
    Input is the k value used to set lambda = 10**k in questions 3 and beyond.
    If k is None, then lambda = 0
    error_in = True -> evaluate in-sample error
    error_out = True -> evaluate out of sample error
    """

    #Set lamb value
    if k is None:
        lamb = 0
    else:
        lamb = 10**k

    #Run linear regression
    #Initialize a LinearRegression object with the x and y lists
    linreg = LinearRegression(x_tilde,y)

    #Calculate the linear regression
    linreg.learn(lamb=lamb)

    if error_in and not error_out:
        #Test learning in sample
        e_in = linreg.test_learning(x_tilde,y)
        return e_in

    elif error_out and not error_in:
        #Test learning out of sample
        e_out = linreg.test_learning(x_test_tilde,y_test)
        return e_out

    else:
        #Test learning in sample
        e_in = linreg.test_learning(x_tilde,y)

        #Test learning out of sample
        e_out = linreg.test_learning(x_test_tilde,y_test)
        return (e_in,e_out)
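
The internals of `hw6_func` are not shown here, so the following is only a sketch of what `LinearRegression.learn` and `test_learning` are assumed to compute: the closed-form weight-decay solution $w_{reg} = (Z^T Z + \lambda I)^{-1} Z^T y$ and the fraction of misclassified points. The function names and the synthetic data are hypothetical stand-ins, not the actual module or the `in.dta` / `out.dta` files.

```python
import numpy as np

def weight_decay_fit(Z, y, lamb=0.0):
    """Closed-form regularized least squares: w = (Z^T Z + lamb*I)^-1 Z^T y."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    A = Z.T @ Z + lamb * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y)

def classification_error(Z, y, w):
    """Fraction of points where sign(Z @ w) disagrees with the +/-1 label."""
    return float(np.mean(np.sign(Z @ w) != y))

# Synthetic stand-in data (an assumption; NOT the homework data files).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
Z = np.column_stack([np.ones(len(X)), X])  # simple (1, x1, x2) features
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
y[y == 0] = 1.0

w_plain = weight_decay_fit(Z, y, lamb=0.0)       # no regularization
w_decay = weight_decay_fit(Z, y, lamb=10.0**3)   # heavy regularization (k = 3)

print("E_in, lambda = 0   :", classification_error(Z, y, w_plain))
print("E_in, lambda = 1000:", classification_error(Z, y, w_decay))
```

With $\lambda = 0$ this reduces to ordinary least squares, while a large $\lambda$ shrinks the weight vector toward zero.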
    

# QUESTION 2 (CODE)

In [13]:
#Run the Linear Regression code with k=None (leads to lambda = 0)
e_in,e_out = questions2to6(k=None)

# Print results
print(f'Ein  = {round(e_in,2)}')
print(f'Eout = {round(e_out,2)}')


Ein  = 0.03
Eout = 0.08


# QUESTION 2 (ANSWER)

From the above code, we can see that the correct alternative is **alternative a.**

# QUESTION 3 (CODE)

In [14]:
#Run the Linear Regression code with k=-3
e_in,e_out = questions2to6(k=-3)

# Print results
print(f'Ein  = {round(e_in,2)}')
print(f'Eout = {round(e_out,2)}')

Ein  = 0.03
Eout = 0.08


# QUESTION 3 (ANSWER)

From the above code, we can see that the correct alternative is **alternative d.**

# QUESTION 4 (CODE)

In [15]:
#Run the Linear Regression code with k=3
e_in,e_out = questions2to6(k=3)

# Print results
print(f'Ein  = {round(e_in,1)}')
print(f'Eout = {round(e_out,1)}')

Ein  = 0.4
Eout = 0.4


# QUESTION 4 (ANSWER)

From the above code, we can see that the correct alternative is **alternative e.**

# QUESTION 5 (CODE)

In [17]:
#Initialize a list of the k values in the alternatives
k_list = [2, 1, 0, -1, -2]

#Initialize e_out as an empty list
e_out = []

# Run the linear regression for all k values
for k in k_list:
    e_out.append(questions2to6(k=k,error_in=False))

#Find minimum error and corresponding k
e_out_min = min(e_out)
k_min = k_list[e_out.index(e_out_min)]

# Print results
print(f'Minimum Eout  = {round(e_out_min,2)}')
print(f'Corresponding k = {k_min}')

Minimum Eout  = 0.06
Corresponding k = -1


# QUESTION 5 (ANSWER)

From the above code, we can see that the correct alternative is **alternative d.**
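
Note that the list of candidates includes $k = 0$, which corresponds to $\lambda = 10^0 = 1$ and must not be confused with "no regularization". A minimal sketch of a mapping that keeps the two cases apart (the helper name `k_to_lambda` is hypothetical):

```python
def k_to_lambda(k=None):
    """Map an exponent k to lambda = 10**k; k=None means no regularization."""
    return 0.0 if k is None else 10.0 ** k

# k = 0 is a legitimate exponent (lambda = 1). A truthiness test such as
# `if not k` would treat it like None and silently use lambda = 0 instead.
for k in [2, 1, 0, -1, -2, None]:
    print(f"k = {k!s:>4} -> lambda = {k_to_lambda(k)}")
```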

# QUESTION 6 (CODE)

In [28]:
# Brute force: check a wide spectrum of k values to find the minimum
k_list = range(-20,20)

#Initialize e_out as an empty list
e_out = []

# Run the linear regression for all k values
for k in k_list:
    e_out.append(questions2to6(k=k,error_in=False))

#Find minimum error and corresponding k
e_out_min = min(e_out)
k_min = k_list[e_out.index(e_out_min)]

# Print results
print(f'Minimum Eout  = {round(e_out_min,2)}')
print(f'Corresponding k = {k_min}')

Minimum Eout  = 0.06
Corresponding k = -1


# QUESTION 6 (ANSWER)

From the above code, we can see that the correct alternative is **alternative b.**

# QUESTION 7

Comparing the two given equations, we can immediately eliminate **alternatives b and d**: they have $C = 1$, which pins the coefficients $w_q$ to 1 for $q \geq Q_0$, so the second equation can never match the first, where the absent higher-order terms correspond to coefficients of 0. We can also readily eliminate **alternative a**: since $H(10,0,3) \subset H(10,0,4)$, the union of the two sets is just $H(10,0,4) = H_3$, which is not the set that alternative claims.

Analysing alternative c, we have that:

\begin{align}
H(10,0,3) &= \Big\lbrace h \mid h(x) = \sum_{q=0}^{2} w_q L_q(x) \Big\rbrace \\ \\
H(10,0,4) &= \Big\lbrace h \mid h(x) = \sum_{q=0}^{3} w_q L_q(x) \Big\rbrace \\ \\
H(10,0,3) \cap H(10,0,4) &= \Big\lbrace h \mid h(x) = \sum_{q=0}^{2} w_q L_q(x) \Big\rbrace = H_2
\end{align}

Therefore, **alternative c is correct!**
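
The intersection can also be checked mechanically. The sketch below assumes the definition $H(Q, C, Q_0) = \lbrace h \in H_Q : w_q = C \text{ for } q \geq Q_0 \rbrace$ and represents each set (for $C = 0$) by the indices of its unconstrained weights. The helper name `free_indices` is hypothetical.

```python
def free_indices(Q, C, Q0):
    """Indices q whose weight w_q is left unconstrained in H(Q, C, Q0)."""
    return set(range(0, min(Q0, Q + 1)))

# With C = 0, each set forces w_q = 0 for q >= Q0. A hypothesis in the
# intersection must satisfy both constraints, so only the weights that are
# free in BOTH sets may be non-zero.
common = free_indices(10, 0, 3) & free_indices(10, 0, 4)
highest_order = max(common)

print(sorted(common))   # indices of the free weights in the intersection
print(highest_order)    # order k of the resulting set H_k
```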


# QUESTION 8

For forward propagation, we have 22 total steps: one step per weight, and there are 22 weights in total. Of these, 18 weights connect layers $l=0$ and $l=1$ ($(d_0 + 1) \times d_1 = 18$; the artificial node in layer 1 has no incoming weights) and 4 weights connect layers $l=1$ and $l=2$ ($(d_1 + 1) \times d_2 = 4$; no artificial node appears in the output layer).

For back propagation, there are only 3 steps. We only compute $\delta$ for the single hidden layer $l=1$, and there is no $\delta$ associated with the artificial node.

Finally, for updating the weights, we once again take 22 total steps, one per weight.

Therefore, the total number of steps is $N = 22 + 3 + 22 = 47$. **The correct alternative is alternative d.**
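
The count can be reproduced with a short sketch. The architecture $d = [5, 3, 1]$ is an assumption inferred from the 18 and 4 weights quoted above, and the function name is hypothetical.

```python
def count_backprop_operations(d):
    """d = [d0, d1, ..., dL]: units per layer, excluding the artificial nodes."""
    # Forward pass: one w*x product per weight (bias weights included).
    forward = sum((d[l - 1] + 1) * d[l] for l in range(1, len(d)))
    # Backward pass: one w*delta product for each weight leaving a
    # non-artificial hidden unit; artificial nodes have no delta.
    backward = sum(d[l] * d[l + 1] for l in range(1, len(d) - 1))
    # Weight update: one x*delta product per weight.
    update = forward
    return forward, backward, update

forward, backward, update = count_backprop_operations([5, 3, 1])
print(forward, backward, update, forward + backward + update)
```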

# QUESTION 9

The total number of weights of a neural network with layers $l = 0, \dots, L$ can be written, in more general form, as:

\begin{equation}
N_W = \sum_{l=0}^{L-2} n^{(l)} (n^{(l+1)}-1) + n^{(L-1)}n^{(L)}
\end{equation}

where $n^{(l)}$ is the number of nodes in layer $l$, including the artificial node for $l < L$. For the present problem, this is subject to the constraints:

\begin{align}
n^{(0)} &= 10 \\
n^{(L)} &= 1 \\
\sum_{l=1}^{L-1} n^{(l)} &= 36 \\
n^{(l)} &> 1 \text{ for } l < L
\end{align}

The minimum number of connections (weights) occurs if all hidden layers have just two units each, resulting in 18 hidden layers in total ($l = 1, \dots, 18$) and an output layer at $L = 19$. In this scenario, the first equation reduces to:

\begin{equation}
N_W = \sum_{l=0}^{17} n^{(l)} (n^{(l+1)}-1)  + n^{(18)}n^{(19)}
\end{equation}

Subject to:

\begin{align}
L &= 19 \\
n^{(0)} &= 10 \\
n^{(19)} &= 1 \\
n^{(l)} &= 2 \text{ for } 1 \leq l \leq 18
\end{align}

Therefore,
\begin{equation}
N_W = \sum_{l=0}^{17} n^{(l)} (n^{(l+1)}-1) + n^{(18)}n^{(19)} = (10 \times 1) + 17(2 \times 1) + (2 \times 1) = 46
\end{equation}
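
As a sanity check, the general formula can be evaluated directly. This is a minimal sketch in which `count_weights` is a hypothetical helper and the layer sizes include the artificial node in every layer except the output.

```python
def count_weights(layers):
    """layers = [n0, ..., nL]: nodes per layer, artificial nodes included
    in every layer except the output layer."""
    total = 0
    for l in range(len(layers) - 1):
        is_output = (l + 1 == len(layers) - 1)
        # The artificial node of a hidden layer receives no incoming weights.
        receiving = layers[l + 1] if is_output else layers[l + 1] - 1
        total += layers[l] * receiving
    return total

deep_narrow = [10] + [2] * 18 + [1]   # 18 hidden layers with 2 nodes each
print(count_weights(deep_narrow))
```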

# QUESTION 10

Here, we use the same equation for the total number of weights derived for Question 9 above, but consider only 1 or 2 hidden layers: every additional hidden layer introduces another artificial node with no incoming weights, which reduces the total number of weights. For 1 hidden layer, the total number of weights is:

\begin{equation}
N_W = 10 (36 - 1) + (36 \times 1) = 386
\end{equation}

For 2 hidden layers, many possible options are available. The code snippet below explores all possible options for 2 hidden layers:

In [35]:
def unit_sum(l0=10,l1=2,l2=34,l3=1):

    return l0*(l1-1) + l1*(l2-1) + l2*l3

#Number of units in first and last layers
l0 = 10
l3 = 1

#Hidden layers
hidden = 2

# Explore all combinations with two hidden layers
nodes = 36

# Both layers need at least 2 nodes, leaving us with 32 leftover nodes
leftover = nodes - 2*hidden

# Start max unit at 0 and iterate
max_unit = 0
max_l1 = 0
max_l2 = 0

# Explore all combinations
for i in range(leftover+1):
    l1 = 2+i #First hidden layer will have this many units
    l2 = 2+(leftover-i)

    units = unit_sum(l0=l0,l1=l1,l2=l2,l3=l3)

    if units > max_unit:
        max_unit = units
        max_l1 = l1
        max_l2 = l2

print(max_unit)
print(max_l1)
print(max_l2)

510
22
14


Therefore, we find that the maximum number of weights possible is **$N_W = 510$, alternative e.** This maximum is reached with 22 units in the first hidden layer and 14 units in the second; the order matters, since swapping the two layers gives fewer weights.
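
To double-check the restriction to 1 or 2 hidden layers, the search can be extended by brute force. This is a sketch under the same counting convention as above; the helper names are hypothetical, and only networks with up to 3 hidden layers are examined.

```python
from itertools import product

def count_weights(layers):
    """Weights of a network whose non-output layers include an artificial node."""
    hidden_terms = sum(a * (b - 1) for a, b in zip(layers[:-2], layers[1:-1]))
    return hidden_terms + layers[-2] * layers[-1]

def best_split(num_hidden, total_hidden=36, n_in=10, n_out=1):
    """Try every way of splitting the hidden units over num_hidden layers."""
    best_weights, best_sizes = 0, None
    for sizes in product(range(2, total_hidden + 1), repeat=num_hidden):
        if sum(sizes) != total_hidden:
            continue  # keep only splits that use exactly 36 hidden units
        weights = count_weights([n_in, *sizes, n_out])
        if weights > best_weights:
            best_weights, best_sizes = weights, sizes
    return best_weights, best_sizes

for num_hidden in (1, 2, 3):
    print(num_hidden, best_split(num_hidden))
```

In this sketch the best three-layer split stays below 510, which is consistent with the argument that extra hidden layers only cost weights.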