CLASS 1
* Solve the XOR problem with 3 neurons:

1. Implement SGD to solve XOR.
* > Done (the code below updates on all four examples at once, i.e. full-batch gradient descent)
2. How many unknown parameters does the model have?
* > The model has 9 (nine) unknown parameters: three per neuron (two weights and one bias), i.e. o*(m+1) with o=3 neurons and m=2 inputs per neuron. Note that the code below sets hidden_dim = 3, which gives 3*(2+1) + 1*(3+1) = 13 parameters.
3. What are the model's hyperparameters? Explain what happens when the
learning rate is varied.
* > **Hyperparameters**
    *    Architecture: number of layers and number of neurons per layer.
    *    Gradient descent: lr (learning rate), which sets the step size of each update. If it is too small, convergence is slow; if it is too large, the cost can oscillate or diverge.
    *    Number of iterations

4. Once the model is trained, make predictions to verify that it
works.
* > Done
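
The effect of the learning rate in question 3 can be illustrated on a much simpler problem than XOR. The following is a minimal sketch, assuming the toy objective f(w) = w**2; the function name `gradient_descent` and the three learning rates are illustrative choices, not part of the exercise.

```python
def gradient_descent(learning_rate, num_steps=20, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(num_steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # same update rule as in the notebook
    return w

# Small, moderate and too-large learning rates (illustrative values)
for lr in (0.01, 0.1, 1.1):
    print('lr={:.2f}: final w={:.4f}'.format(lr, gradient_descent(lr)))
```

With the small rate, w is still far from the minimum at 0 after 20 steps; with the moderate rate it gets close; with the large rate each step overshoots and |w| grows.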

In [31]:
import numpy as np

def sigmoid(z):
  return 1/(1 + np.exp(-z))

In [32]:
np.random.seed(2)

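# Training inputs: one column per example, shape (2, 4)
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])

# Training labels: XOR of each column, shape (1, 4)
Y = np.array([[0, 1, 1, 0]])
m = X.shape[1]  # number of training examples
# Hyperparameters
input_dim = 2
hidden_dim = 3
output_dim = 1
num_of_iters = 1000
learning_rate = 0.1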

In [33]:
def initialize_parameters(input_dim, hidden_dim, output_dim):
  # Random weights scaled by 1/sqrt(fan-in); biases start at zero
  W1 = np.random.randn(hidden_dim, input_dim)/np.sqrt(input_dim)
  b1 = np.zeros((hidden_dim, 1))
  W2 = np.random.randn(output_dim, hidden_dim)/np.sqrt(hidden_dim)
  b2 = np.zeros((output_dim, 1))

  parameters = {
    "W1": W1,
    "b1": b1,
    "W2": W2,
    "b2": b2
  }
  return parameters
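
As a check on the parameter count from question 2, the number of weights and biases can be computed from the layer sizes. This is a minimal sketch; `count_parameters` is a hypothetical helper that does not appear elsewhere in the notebook, and it assumes a fully connected network with one bias per neuron.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the units per layer, input first, e.g. [2, 2, 1].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_out * (n_in + 1)   # n_in weights plus one bias per neuron
    return total

print(count_parameters([2, 2, 1]))  # three neurons (2 hidden + 1 output): 9
print(count_parameters([2, 3, 1]))  # hidden_dim = 3, as set above: 13
```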

In [34]:
def forward_prop(X, parameters):
  W1 = parameters["W1"]
  b1 = parameters["b1"]
  W2 = parameters["W2"]
  b2 = parameters["b2"]

  # Hidden layer: tanh activation
  Z1 = np.matmul(W1, X) + b1
  A1 = np.tanh(Z1)
  # Output layer: sigmoid activation
  Z2 = np.matmul(W2, A1) + b2
  A2 = sigmoid(Z2)

  # Keep the activations needed by backward_prop
  cache = {"A1": A1,
           "A2": A2}
  return A2, cache

In [35]:
def calculate_cost(A2, Y):
  # Half of the mean squared error between predictions and labels
  return np.mean((A2-Y)**2)/2

In [36]:
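def backward_prop(X, Y, cache, parameters):
  m = X.shape[1]  # number of examples, taken from X instead of a global
  A1 = cache["A1"]
  A2 = cache["A2"]

  W2 = parameters["W2"]

  # Note: A2 - Y is the gradient with respect to Z2 of binary cross-entropy
  # with a sigmoid output; for the half-MSE in calculate_cost it would be
  # (A2 - Y) * A2 * (1 - A2). Kept as written so the logged run still applies.
  dZ2 = A2 - Y
  dW2 = np.matmul(dZ2, A1.T)/m
  db2 = np.sum(dZ2, axis=1, keepdims=True)/m
  # Backpropagate through tanh: d/dz tanh(z) = 1 - tanh(z)**2
  dZ1 = np.multiply(np.matmul(W2.T, dZ2), 1-np.power(A1, 2))
  dW1 = np.matmul(dZ1, X.T)/m
  db1 = np.sum(dZ1, axis=1, keepdims=True)/m

  grads = {
    "dW1": dW1,
    "db1": db1,
    "dW2": dW2,
    "db2": db2
  }

  return grads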
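
The output-layer term `dZ2 = A2 - Y` can be compared against finite differences to see which cost it actually differentiates. The following is a minimal sketch under stated assumptions: `half_mse`, `cross_entropy` and `numerical_grad` are hypothetical helpers, and the values of `z` and `y` are arbitrary test inputs, not data from the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def half_mse(z, y):
    # Same cost as calculate_cost, written as a function of the pre-activation z
    return np.mean((sigmoid(z) - y) ** 2) / 2

def cross_entropy(z, y):
    a = sigmoid(z)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def numerical_grad(f, z, y, eps=1e-6):
    # Central finite differences, one entry of z at a time
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step.flat[i] = eps
        grad.flat[i] = (f(z + step, y) - f(z - step, y)) / (2 * eps)
    return grad

z = np.array([[0.3, -1.2, 0.8, 2.0]])   # arbitrary pre-activations
y = np.array([[0.0, 1.0, 1.0, 0.0]])    # arbitrary binary labels
a = sigmoid(z)
m = z.shape[1]

ce_analytic = (a - y) / m                   # gradient of cross-entropy w.r.t. z
mse_analytic = (a - y) * a * (1 - a) / m    # gradient of half-MSE w.r.t. z

print(np.allclose(numerical_grad(cross_entropy, z, y), ce_analytic, atol=1e-6))
print(np.allclose(numerical_grad(half_mse, z, y), mse_analytic, atol=1e-6))
```

Both checks should agree with the analytic expressions, which suggests that `A2 - Y` matches the cross-entropy cost, while the half-MSE cost carries the extra factor `A2 * (1 - A2)`.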

In [37]:
def update_parameters(parameters, grads, learning_rate):
  W1 = parameters["W1"]
  b1 = parameters["b1"]
  W2 = parameters["W2"]
  b2 = parameters["b2"]

  dW1 = grads["dW1"]
  db1 = grads["db1"]
  dW2 = grads["dW2"]
  db2 = grads["db2"]

  W1 = W1 - learning_rate*dW1
  b1 = b1 - learning_rate*db1
  W2 = W2 - learning_rate*dW2
  b2 = b2 - learning_rate*db2
  
  new_parameters = {
    "W1": W1,
    "W2": W2,
    "b1" : b1,
    "b2" : b2
  }

  return new_parameters

In [46]:
def model(X, Y, input_dim, hidden_dim, output_dim, num_of_iters, learning_rate):

  parameters = initialize_parameters(input_dim, hidden_dim, output_dim)
  # range(0, num_of_iters+1) so that the final iteration is also logged
  for i in range(0, num_of_iters+1):
    a2, cache = forward_prop(X, parameters)
    cost = calculate_cost(a2, Y)
    grads = backward_prop(X, Y, cache, parameters)
    parameters = update_parameters(parameters, grads, learning_rate)
    if i % 100 == 0:
      print('Epoch # {:d}: cost={:f}'.format(i, cost))
  return parameters

In [47]:
def predict(X, parameters):
  # Classify a single example (X has shape (input_dim, 1)) with a 0.5 threshold
  a2, cache = forward_prop(X, parameters)
  yhat = np.squeeze(a2)
  if yhat >= 0.5:
    y_predict = 1
  else:
    y_predict = 0

  return y_predict

In [48]:
trained_parameters = model(X, Y, input_dim, hidden_dim, output_dim, num_of_iters, learning_rate)

Epoch # 0: cost=0.126839
Epoch # 100: cost=0.125226
Epoch # 200: cost=0.124821
Epoch # 300: cost=0.124138
Epoch # 400: cost=0.122411
Epoch # 500: cost=0.118199
Epoch # 600: cost=0.109758
Epoch # 700: cost=0.097807
Epoch # 800: cost=0.086302
Epoch # 900: cost=0.075290
Epoch # 1000: cost=0.057216


In [42]:
# Test
X_test = np.array([[1], [1]])
y_predict = predict(X_test, trained_parameters)
print('Prediction for ({:d}, {:d}) is {:d}'.format(
    X_test[0][0], X_test[1][0], y_predict))

Prediction for (1, 1) is 0
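
The test above checks a single input. To verify all four XOR inputs at once, the threshold can be applied to a whole batch. This is a minimal sketch: `predict_batch`, `hand_set_parameters` and `X_all` are hypothetical names, and the weights are chosen by hand for a network with two hidden units so that the block runs on its own. They are not the trained parameters from this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_batch(X, parameters, threshold=0.5):
    """Return a 0/1 prediction for every column of X."""
    A1 = np.tanh(parameters["W1"] @ X + parameters["b1"])    # hidden layer
    A2 = sigmoid(parameters["W2"] @ A1 + parameters["b2"])   # output layer
    return (A2 >= threshold).astype(int).ravel()

# Hand-set weights (assumption): the first hidden unit acts like OR, the
# second like AND, and the output fires when only the first one is active.
hand_set_parameters = {
    "W1": np.array([[10.0, 10.0], [10.0, 10.0]]),
    "b1": np.array([[-5.0], [-15.0]]),
    "W2": np.array([[5.0, -5.0]]),
    "b2": np.array([[-5.0]]),
}

X_all = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])
print(predict_batch(X_all, hand_set_parameters))  # prints [0 1 1 0]
```

Passing a parameter dictionary produced by training in place of `hand_set_parameters` would check the trained model on the same four inputs, assuming it uses the same keys and layer layout.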
