CLASE 1
* Resolver el problema de la XOR con 3 neuronas:

1. Implementar SGD para resolver la XOR.
* > Resuelto 
2. ¿Cuántos parámetros desconocidos tiene el modelo?
* > El modelo tiene 9 (nueve) parametros desconocidos. Tres por cada neurona. o*(m+1), con o=3 y m=2
3. ¿Cuáles son los hiper-parámetros del modelo? Explicar que pasa cuando se
varía el learning rate.
* > **Hiperparametros** 
    *    Del lado de arquitectura: cantidad de layers y cantidad de neuronas por layer. 
    *    Del lado de gradiente descendiente: lr (learning rate)   , lr varia el paso de avance de 
    *   Del lado de optimizadores L1, L2 es el parametro gamma 
4. Una vez entrenado el modelo, realizar predicciones para verificar el
funcionamiento
* > Resuelto

a) **Implementar SGD para resolver la XOR.**

In [216]:
# Importamos las librerias 
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

In [242]:
#Se define el dataset para XOR
X_train = np.array([[0,0],[0,1],[1,0],[1,1]])
y_train = np.array([[0],[1],[1],[0]])




In [304]:
#se define la función de activación sigmoide
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

#se define la derivada de la función de activación sigmoide
def d_sigmoid(z):
    return (1 - sigmoid(z)) * sigmoid(z)

#se define el optimizador
def softmax(z):
    exp_zi = np.exp(z)
    return exp_zi / np.sum(exp_zi, axis=1, keepdims=True)



    

In [311]:
class Layer:
    # Se inicializan los pesos W1, W2 y los sesgos b1, b2
    def __init__(self, input_dim, hidden_dim, output_dim):
        # Guardamos las dimensiones 
        self.inp_dim = input_dim
        self.hid_dim = hidden_dim
        self.out_dim = output_dim
        # Creamos la primera capa con valores aleatorios
        self.W1 = np.random.rand(self.inp_dim, self.hid_dim) / np.sqrt(self.inp_dim)
        self.b1 = np.zeros((1, self.hid_dim))
        # Creamos la capa de salida
        self.W2 = np.random.rand(self.hid_dim, self.out_dim) / np.sqrt(self.hid_dim)
        self.b2 = np.zeros((1, self.out_dim))
        #Se inicializa el cache para guardar los valores
        self.cache = None

    # Se define paso forward, se aplica la sigmoide en primer capa y softmax en la salida
    def forward(self, x):
        z1 = np.matmul(x,self.W1) + self.b1 #función lineal
        a1 = sigmoid(z1) #función de transferencia
        z2 = np.matmul(a1, self.W2) + self.b2 #función lineal
        y_hat = softmax(z2) #optimizador
        #La selida de cada forward se actualiza en cache
        self.cache = {'x0': x,'z1': z1,'a1': a1,'z2': z2,'y_hat': y_hat} #no está actualizando y_hat
        #print(y_hat)
        return y_hat


    def backward(self, y):
        #Calculamos dL/dy_hat * dy_hat/dz2
        delta3=(self.cache['z2']-y)*d_sigmoid(self.cache['z2'])  
        dW2 = np.matmul(self.cache['a1'].T, delta3)
        db2 = np.sum(delta3, axis=0)  # alternativamente: np.dot(np.ones((1, len(y))), delta3)

       
        delta2=np.dot(delta3,self.W2.T)*d_sigmoid(self.cache["z1"])
        dW1 = np.matmul(self.cache['x0'].T, delta2)
        db1 = np.sum(delta2, axis=0) 

        grad_dict = {'dW2': dW2,'db2': db2,'dW1': dW1,'db1': db1}
        #print(grad_dict)
        return grad_dict


    def mse(self,probs,y_train):
        return np.mean((probs-y_train)**2)/2
        

    def train(self, x, y, iters, lr=0.01):
        k=0
        lo=[]
        acurr=[]
        for epoch in range(1, iters+1):
            # Correr el 'forward pass'
            probs = self.forward(x) ##calculamos y_hat
            grad_dict = self.backward(probs) ##hacemos backwards usando y_hat e y_obs
            #print(probs) no se actualiza probs--siempre es un vector de unos
            # Actualizar los parametros con el gradiente descendiente (NOTA: es un pequeño paso negativo)
            self.W1 += -lr * grad_dict['dW1']
            self.b1 += -lr * grad_dict['db1']
            self.W2 += -lr * grad_dict['dW2']
            self.b2 += -lr * grad_dict['db2']

            # Obtener el loss
            loss = self.mse(probs, y)
            acc = accuracy_score(y, probs)
            lo.append(loss)
            acurr.append(acc)
            k+=1
            #print(f"Epoch {epoch} - Loss {np.mean(acurr):.5f}")#, Accuracy: {np.mean(epoch_acc):.5f}")
           

  
    def predict(self, x):
        probs = self.forward(x)
        preds = np.argmax(probs, axis=1) 
        return preds





In [312]:
model = Layer(input_dim=2, hidden_dim=3, output_dim=1)
model.train(X_train, y_train, 20000, 0.1)
"""
plt.figure() 
plt.scatter(k,acc) 
plt.xlabel("EPOCHS") 
plt.ylabel("Loss value") 
plt.show() """


[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]
[[1.]
 [1.]
 [1.]
 [1.]]


KeyboardInterrupt: 