 # Gradient descent - continuation
 
 ##  Multiple inputs

Imagine we have one review, that had a *positive* label and the occurrences for words *beatifully, masterpiece, movie, dull,* and *disappointing* were 1 (yes) , 0 (no),  1,    0 and   0, respectively. That means:

Input values: $\mathbf{x} = (1,  0, 1, 0, 0)'$

True label: $y = 1$

- Consider weights $A=(-0.46, -0.25,  0.14,  0.82, -0.59)$. What is the corresponding error? 
- Update the weights with step $\alpha = 0.1$. What are the final weights?
- If you change the starting weights, do you get the same final weights?

In [3]:
import numpy as np
input_data = np.array([[1], [0], [1], [0], [0]])
goal = 1
weights = np.array([-0.46, -0.25,  0.14,  0.82, -0.59])
print(input_data.shape)
print (weights.shape)

prediction = np.matmul(weights, input_data)    # <-- Prediction #weights @ input_data #weights.dot(input_data)
gap =  prediction - goal  # <-- Difference between the prediction and the goal to predict
error = gap**2
print('Weight =', weights, '; gap = ', gap, '; error = ', error)

# To update the weights
alpha = 0.1
Ni =  200  #<-- Number of iterations
for iteration in range(200):
    weights = weights - alpha*gap*input_data.T # 1 x 5 shape
    # update the prediction, gap, error
    prediction = np.matmul(weights, input_data)
    gap = prediction - goal
    error = gap**2
print('Weight =', weights, '; gap = ', gap, '; error = ', error)

(5, 1)
(5,)
Weight = [-0.46 -0.25  0.14  0.82 -0.59] ; gap =  [-1.32] ; error =  [1.7424]
Weight = [[ 0.2  -0.25  0.8   0.82 -0.59]] ; gap =  [[-1.11022302e-16]] ; error =  [[1.23259516e-32]]


Extend this example for 4 the following instances:
$$
   \begin{array}{l|cccccc}
   & \mbox{beatifully}& \mbox{masterpiece} & \mbox{movie} & \mbox{dull} &\mbox{disappointing} & \mbox{ Output}\\
r1     &     1      &     0 &    1 &   0    &         0 & 1\\
r2     &     0   &        1  &   1 &   0   &          0 & 1\\
r3     &     0     &      0  &   1 &   1   &          0 & 0\\
r4     &     1    &       0  &   1 &   1   &          1 & 0\\
\end{array}
$$

Start with weights 𝐴=(−0.46,−0.25,0.14,0.82,−0.59) and step $\alpha = 0.1$. What is the total error at the end of 200 iterations?

In [1]:
import numpy as np
X = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]])
Y = np.array([[1],[1], [0], [0]])
weights = np.array([-0.46, -0.25,  0.14,  0.82, -0.59])


alpha = 0.1
Ni =  200  #<-- Number of iterations
for iteration in range(200):
    error_for_all = 0
    for ind in range(4):
        input_data = X[:, [ind]]
        goal = Y[ind]
        prediction = np.matmul(weights, input_data)
        gap = prediction - goal
        error = gap**2
        weights = weights - alpha*gap*input_data.T # 1 x 5 shape
        prediction = np.matmul(weights, input_data)
        
print('Weight =', weights, '; gap = ', gap, '; error = ', error_for_all)
print('Predictions =\n', np.matmul(weights, X))

Weight = [[ 0.51200102  0.51199212  0.48800449 -0.48799866 -0.51201001]] ; gap =  [[-5.26551771e-06]] ; error =  0
Predictions =
 [[ 1.00000551e+00  9.99996615e-01  5.82833817e-06 -3.15931063e-06]]


 ##  Multiple inputs and multiple outputs


Input values: $\mathbf{x} = (1.5, 0.65, 1.3)'$

True label: $\mathbf{y} = (1.2, 1.4)'$

Starting weights $A = \left(\begin{array}{ccc}
1 & 1 &1.1\\
1 & 3 & 2\\
\end{array}\right)$

- Calculate the first prediction. What is the dimension? Calculate the corresponding error.
- Iterate 200 times the gradient descent with $\alpha= 0.1$. What are the final weights and corresponding error?

In [5]:
import numpy as np

input_data = np.array([[1.5], [0.65], [1.3]])
goal = np.array([[1.2], [1.4]])
weights = np.array([[1, 1, 1.1], [1, 3, 2]])
#print (goal.shape)

prediction =  np.matmul(weights, input_data)# <-- Calculate the prediction
print('Prediction: ', prediction, '\n')

gap = prediction - goal # <-- Difference between prediction anf goal to predict
print('Gap:\n', gap, '\n')
print('Gap**2:\n ', gap**2, '\n')
print('Sum Gap**2: ', sum(gap**2), '\n')

error = sum(gap**2) # <--  Calculate the error
print('Weight =', weights, '\n gap = ', gap, '\n error = ', error, '\n')

Ni = 200
# update the weight
for iteration in range(Ni):
    prediction =  np.matmul(weights, input_data)
    gap = prediction - goal
    error = sum(gap**2)
    weights = weights - alpha*np.matmul(gap, input_data.T)
    
print('Weight =', weights, '\n gap = ', gap, '\n error = ', error, '\n')

(2, 1)
Prediction:  [[3.58]
 [6.05]] 

Gap:
 [[2.38]
 [4.65]] 

Gap**2:
  [[ 5.6644]
 [21.6225]] 

Sum Gap**2:  [27.2869] 

Weight = [[1.  1.  1.1]
 [1.  3.  2. ]] 
 gap =  [[2.38]
 [4.65]] 
 error =  [27.2869] 

Weight = [[ 0.18166189  0.64538682  0.39077364]
 [-0.59885387  2.30716332  0.61432665]] 
 gap =  [[-2.22044605e-16]
 [ 0.00000000e+00]] 
 error =  [4.93038066e-32] 



 ## Adding a second layer

Input values: $\mathbf{x} = (1, 0, 1)'$

True label: $y = 1$

We will build an intermediate layer of dimension 4, $F(\mathbf{x}) = B(ReLU(A\mathbf{x}))$.

Starting weights $A = \left(\begin{array}{ccc}
0.17 & 0.71 & -0.20\\
0.44 & 0.82 & 0.08\\
1.00 & 0.63 & -0.16\\
-0.40 & 0.31 & 0.37\\
\end{array}\right)$ and $B= (-0.59 \; 0.76 \; 0.95  \; 0.34)$

- Compute the first prediction, layer by layer. What is the gap? And the error?

In [16]:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([[1], [0], [1]]) # Input data
y = 1 # <-- Goal to predict
# Starting weights
A = np.array([[0.17, 0.71, -0.2], [0.44, 0.82 , 0.08], [1.00, 0.63, -0.16], [-0.40, 0.31 , 0.37]] ) # 4 x 3
B = np.array([[-0.59, 0.76, 0.95, 0.34]])

#print(A) matrix
#print(B) vector

def ReLU(x):
    return (x >= 0)*x  # <-- ReLU function
def derivReLU(x):
    return 1*(x > 0) # <-- Derivative of ReLU

#print("A", A.shape)
#A=(4,3) (4rows, 3col)
#print("X", x.shape) 
#X=(3,1)
#print("B", B.shape)
#B=(1,4) (1 row, 4 col)

input_data = x
layer1 = ReLU(np.matmul(A, input_data))#(4,3)*(3,1)    # <-- # Prediction for first layer
print('First layer: ', layer1 )
#print("First layer sh", layer1.shape) #layer1=(4,1)

pred =  np.matmul(B, layer1)   # <--  Prediction for the second layer
print('Prediction Last layer: ', pred)
#print("Sec layer sh", pred.shape) #pred=(1,1)

goal = y
#y = 1
gap =  pred - goal   # <--  Gap
error =  gap**2  # <-- error
print('Gap= ', gap, 'Error = ', error)

First layer:  [[-0.  ]
 [ 0.52]
 [ 0.84]
 [-0.  ]]
Prediction Last layer:  [[1.1932]]


AttributeError: 'int' object has no attribute 'shape'

Update the weights: first, the weights of the second layer, then those of the first one. (Use step $\alpha = 0.1$ and 200 iterations)

In [34]:
alpha = 0.1 # <-- step of the function
Ni = 200 #<-- Number of iterations

def ReLU(x):
    return (x >= 0)*x
def derivReLU(x):
    return 1*(x > 0)

input_data = x
goal = y
for iteration in range(Ni):
    pre_layer1 =  np.matmul(A, input_data)# <-- Ax product, pre-layer 1 
    layer1 =  ReLU(pre_layer1)   # <-- At layer 1
    pred =  np.matmul(B, layer1)    # <-- Prediction at layer 2
    gap = pred - goal
    error = gap**2
    # Update second layer's weights
    B = B - alpha * gap*layer1.T
    # Update first layer's weights in two steps
    aux = np.multiply(B.T, derivReLU(pre_layer1)) # <-- Multiply arguments element-wise, B and derivReLU evaluated at Ax
    A =  A - alpha * gap* np.matmul(aux, input_data.T)# <-- update A
print('Final weights:', A, B)



Final weights: [[ 0.17        0.71       -0.2       ]
 [ 0.40114708  0.82        0.04114708]
 [ 0.9517511   0.63       -0.2082489 ]
 [-0.4         0.31        0.37      ]] [[-0.59        0.73442204  0.90809338  0.34      ]]


Repeat the procedure for several units of observations:
Input values: $X = \left(\begin{array}{ccc}
1 & 0  &1 \\
0 &1 &1\\
0 & 0 & 1\\
1 & 1 & 1\\
\end{array}\right)$

True label: $\mathbf{y} = (1, 1, 0, 0)'$

Use the same structure and the same starting weights with step $\alpha = 0.1$ and 200 iterations.
Starting weights $A = \left(\begin{array}{ccc}
0.17 & 0.71 & -0.20\\
0.44 & 0.82 & 0.08\\
1.00 & 0.63 & -0.16\\
-0.40 & 0.31 & 0.37\\
\end{array}\right)$ and $B= (-0.59 \; 0.76 \; 0.95  \; 0.34)$


What is the error? What are the predictions you get?

In [27]:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1]]) # <-- Input data
y = np.array([[1], [1], [0], [0]]) # <-- Goal to predict

A = np.array([[0.17, 0.71, -0.2], [0.44, 0.82 , 0.08], [1.00, 0.63, -0.16], [-0.40, 0.31 , 0.37]] ) # 4 x 3
B = np.array([[-0.59, 0.76, 0.95, 0.34]]) #1x4

def ReLU(x):
    return (x >= 0)*x
def derivReLU(x):
    return 1*(x > 0)

#print("X", x.shape) #(3,4)
#print("Y", y.shape) #(4,1)
#print("A", A.shape) #(4,3)
#print("B", B.shape) #(1,4) vector

alpha = 0.1 # <- step of the function
Ni = 200

for iteration in range(Ni):
    for ind in range(4):
        input_data = x[:, [ind]] #(3,1[each 1 out of 4]))
        goal = y[ind] #(4,1)
        #print("goal", goal.shape)
        pre_layer1 =  np.matmul(A, input_data)#(4,3)*(3,1)[*4 times]
        layer1 =  ReLU(pre_layer1)   # (4,1)*[4 times]
        pred =  np.matmul(B, layer1)    #(1,4)*(4,1)=(1,1)*[4 times]
        #print(pred.shape) 
        #print(pred)
        gap = pred - goal #(1,1)-(4,1)
        #print("gap:",gap)
        #print(gap.shape) #(1,1) many times?
        error = gap**2
        # Update second layer's weights
        B = B - alpha * gap*layer1.T
        # Update first layer's weights in two steps
        aux = np.multiply(B.T, derivReLU(pre_layer1)) # <-- Multiply arguments element-wise, B and derivReLU evaluated at Ax
        A =  A - alpha * gap* np.matmul(aux, input_data.T)# <-- update A
print('Final weights:', A, B)


pred = np.matmul(B, ReLU(np.matmul(A, x)))    # <-- Final predictions
gap2 = (pred- y.T)**2
total_error = np.sum(gap2)
print('Total error=', total_error)
print('Predictions:\n', pred)

Final weights: [[ 8.57771315e-01  1.13745339e+00 -8.57745896e-01]
 [ 5.05788348e-01  7.46415148e-01  5.03677531e-02]
 [ 1.13535948e+00  5.10178812e-01 -6.28361315e-04]
 [-8.03932850e-01  8.04785876e-01 -9.64607733e-04]] [[-1.42121403  0.34084107  0.7127118   0.94780922]]
Total error= 0.0002986637716262265
Predictions:
 [[ 9.98260981e-01  9.99083950e-01  1.71673990e-02 -2.84334271e-04]]


**Optional.** Repeat the above procedure for different values of $\alpha$ and different number of iterations. Choose the best one. What are the predictions?

In [54]:
import numpy as np
import matplotlib.pyplot as plt
A = np.array([[0.17, 0.71, -0.2], [0.44, 0.82 , 0.08], [1.00, 0.63, -0.16], [-0.40, 0.31 , 0.37]] ) # 4 x 3
B = np.array([[-0.59, 0.76, 0.95, 0.34]])
x = np.array([[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1]])  # <-- Input values
y = np.array([[1], [1], [0], [0]]) # <-- Goal to predict

def ReLU(x):
    return (x >= 0)*x
def derivReLU(x):
    return 1*(x > 0) #sarrera bektorearen dimentsioak mantentzen ditu

alpha_values = np.array([0.01, 0.1, 0.2]) # <- step of the function
Ni_values = np.array([100, 200, 1000]) # <-- Number of iteration
error = np.zeros(shape=(3,3))
for a in range(len(alpha_values)):
    alpha = alpha_values[a]
    for ni in range(len(Ni_values)):
        Ni = Ni_values[ni]
        for iteration in range(Ni):
            error[[a], [ni]] = 0
            for row_index in range(4):
                input_data = x[:, [row_index]]
                pre_layer1 = np.matmul(A, input_data)
                layer1 = ReLU(pre_layer1)
                pred = np.matmul(B, layer1)
                goal = y[row_index]
                gap = pred - goal
                error[[a], [ni]] =  error[[a], [ni]] +  gap*gap
                B = B - alpha * gap * layer1.T # 1 x 3 matrix
                aux = np.multiply(B.T, derivReLU(np.matmul(A, input_data)))
                A = A - alpha * gap * np.matmul(aux, input_data.T)
print(error)


[[6.37250766e-01 1.78290610e-01 5.17867136e-04]
 [1.01656591e-05 2.93375503e-09 6.23380209e-27]
 [5.84476428e-29 2.55170747e-24 2.60761676e-02]]
