The code below contains the following key elements:

**Neural network:** Two hidden dense layers of 24 neurons each, with ReLU activation. The output layer is linear and predicts a Q-value for each possible action.
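To make "predicts a Q-value for each action" concrete, here is a minimal NumPy sketch of the forward pass such a network computes for one CartPole state (4 observations, 2 actions). The weights are random placeholders rather than trained values, and the helper name `q_forward` is hypothetical. A Keras `Dense` layer computes the same `activation(x @ W + b)` operation.

```python
import numpy as np

rng = np.random.default_rng(0)
state_size, action_size, hidden = 4, 2, 24  # CartPole-v1: 4 observations, 2 actions

# Random placeholder weights (a trained model would have learned these)
W1, b1 = rng.normal(size=(state_size, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
W3, b3 = rng.normal(size=(hidden, action_size)), np.zeros(action_size)

def relu(x):
    return np.maximum(x, 0.0)

def q_forward(state):
    """Hypothetical helper: two ReLU hidden layers, then a linear output layer."""
    h1 = relu(state @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3  # one Q-value per action, no activation (linear)

# Shape (1, state_size), the same shape the notebook builds with np.reshape
state = np.array([[0.02, -0.01, 0.03, 0.04]])
q_values = q_forward(state)
print(q_values.shape)                # (1, 2)
print(int(np.argmax(q_values[0])))   # index of the greedy action
```

The greedy action is simply the index of the largest output, which is what `np.argmax(q_values[0])` selects in the notebook.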

**Replay memory:** A memory buffer stores transitions (state, action, reward, next state, done). Training on randomly sampled minibatches from this buffer, instead of on consecutive steps, reuses past experience and breaks the correlation between successive transitions, which makes training more efficient.
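The buffer behaviour can be seen in isolation with the same `deque(maxlen=1000)` and `random.sample` calls the notebook uses. This is a small sketch; the transitions are made-up placeholders, not real CartPole data.

```python
import random
from collections import deque

import numpy as np

memory = deque(maxlen=1000)  # same capacity as in the notebook

# Fill with placeholder transitions: (state, action, reward, next_state, done)
for t in range(1200):
    state = np.full((1, 4), t, dtype=float)
    next_state = np.full((1, 4), t + 1, dtype=float)
    memory.append((state, t % 2, 1.0, next_state, False))

print(len(memory))         # 1000: the 200 oldest transitions were discarded
print(memory[0][0][0, 0])  # 200.0: state of the oldest surviving transition

minibatch = random.sample(memory, 32)  # uniform sampling without replacement
states = np.vstack([s for s, _, _, _, _ in minibatch])
print(states.shape)        # (32, 4)
```

Once the buffer is full, each new transition silently pushes out the oldest one, so the agent always learns from its most recent 1000 steps.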

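**Epsilon-greedy:** With probability epsilon the policy takes a random action (exploration); otherwise it takes the action with the highest predicted Q-value (exploitation). Epsilon starts at 1.0 and is multiplied by 0.995 after every replay step, down to a minimum of 0.01, so exploration decreases gradually.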
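A quick calculation with the notebook's values shows how slowly this schedule decays. The sketch below reimplements the selection rule as a standalone function (`epsilon_greedy` is a hypothetical name; it uses `<` instead of the notebook's `<=` so that epsilon = 0 never explores) and counts the decay steps needed to reach epsilon_min.

```python
import math
import random

import numpy as np

epsilon, epsilon_decay, epsilon_min = 1.0, 0.995, 0.01  # values from the notebook

def epsilon_greedy(q_values, epsilon, rng=random):
    """Random action with probability epsilon, otherwise the argmax action.
    rng.random() is in [0, 1), so epsilon=1 always explores and epsilon=0 never does."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))

# Decay steps needed before epsilon drops to epsilon_min
steps_to_min = math.ceil(math.log(epsilon_min / epsilon) / math.log(epsilon_decay))
print(steps_to_min)  # 919

# replay() decays epsilon at most once per episode, so after 100 episodes:
print(round(epsilon * epsilon_decay ** 100, 3))  # 0.606
```

Because replay() runs once per episode, 100 episodes allow at most 100 decays, so epsilon stays above about 0.6 for the whole run. The agent is still choosing most of its actions at random, which is consistent with the low rewards printed below.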

**Discount factor (gamma):** Makes long-term rewards count less than immediate ones: a reward received k steps in the future is weighted by gamma^k (gamma = 0.95 here).
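A worked example of the discounting with gamma = 0.95. CartPole gives a reward of +1 per step, so the last line, 1 / (1 - gamma), is the value an endless run would converge to. This is a rough "effective horizon" of about 20 steps. The helper name `discounted_return` is illustrative.

```python
gamma = 0.95  # discount factor from the notebook

def discounted_return(rewards, gamma):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(round(gamma ** 10, 4))                          # 0.5987: weight of a reward 10 steps ahead
print(round(discounted_return([1.0] * 3, gamma), 4))  # 2.8525 = 1 + 0.95 + 0.9025
print(round(1 / (1 - gamma), 1))                      # 20.0: limit for an endless stream of +1 rewards
```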


In [1]:
import random
from collections import deque

import gym
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

env = gym.make('CartPole-v1')

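# Hyperparameters
state_size = env.observation_space.shape[0]  # 4 observations for CartPole
action_size = env.action_space.n             # 2 actions: push left / push right
batch_size = 32        # transitions sampled per replay step
n_episodes = 100
gamma = 0.95           # discount factor
epsilon = 1.0          # initial exploration rate
epsilon_decay = 0.995  # multiplicative decay applied after each replay step
epsilon_min = 0.01     # lower bound for epsilon
learning_rate = 0.001  # Adam learning rate
memory = deque(maxlen=1000)  # replay memory; the oldest transitions are dropped first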

# Build the neural network (Q-network)
def build_model():
  model = Sequential()
  model.add(Dense(24, input_dim=state_size, activation='relu'))  # input: state vector
  model.add(Dense(24, activation='relu'))
  model.add(Dense(action_size, activation='linear'))  # one Q-value per action
  model.compile(loss='mse', optimizer=Adam(learning_rate=learning_rate))
  return model

model = build_model()

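# Epsilon-greedy action selection: random action with probability epsilon,
# otherwise the action with the highest predicted Q-value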
def choose_action(state):
  if np.random.rand() <= epsilon:
    return random.randrange(action_size)
  q_values = model.predict(state, verbose=0)
  return np.argmax(q_values[0])

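# Train the model on a random minibatch drawn from replay memory (experience replay)
def replay():
  global epsilon
  if len(memory) < batch_size:
    return  # not enough transitions stored yet
  minibatch = random.sample(memory, batch_size)
  for state, action, reward, next_state, done in minibatch:
    target = reward  # terminal transition: no future reward
    if not done:
      # Bellman target: immediate reward + discounted best Q-value of the next state
      target = reward + gamma * np.amax(model.predict(next_state, verbose=0)[0])
    target_f = model.predict(state, verbose=0)
    target_f[0][action] = target  # only the Q-value of the action taken is changed
    model.fit(state, target_f, epochs=1, verbose=0)
  # Reduce exploration after each replay step
  if epsilon > epsilon_min:
    epsilon *= epsilon_decay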

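# Training loop over episodes
for episode in range(n_episodes):
    state = env.reset()
    if isinstance(state, tuple):  # gym >= 0.26 returns (observation, info)
        state = state[0]
    state = np.reshape(state, [1, state_size])
    total_reward = 0
    done = False

    while not done:
        action = choose_action(state)
        result = env.step(action)
        if len(result) == 5:  # gym >= 0.26: (obs, reward, terminated, truncated, info)
            next_state, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:                 # older gym: (obs, reward, done, info)
            next_state, reward, done, _ = result
        total_reward += reward
        next_state = np.reshape(next_state, [1, state_size])
        memory.append((state, action, reward, next_state, done))
        state = next_state

        if done:
            print(f"Episode {episode + 1}/{n_episodes}, Total Reward: {total_reward}")

    replay()  # one replay call (minibatch update) per episode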

env.close()

Episode 1/100, Total Reward: 29.0
Episode 2/100, Total Reward: 24.0
Episode 3/100, Total Reward: 20.0
Episode 4/100, Total Reward: 13.0
Episode 5/100, Total Reward: 34.0
Episode 6/100, Total Reward: 19.0
Episode 7/100, Total Reward: 13.0
Episode 8/100, Total Reward: 17.0
Episode 9/100, Total Reward: 9.0
Episode 10/100, Total Reward: 32.0
Episode 11/100, Total Reward: 24.0
Episode 12/100, Total Reward: 40.0
Episode 13/100, Total Reward: 30.0
Episode 14/100, Total Reward: 26.0
Episode 15/100, Total Reward: 16.0
Episode 16/100, Total Reward: 16.0
Episode 17/100, Total Reward: 14.0
Episode 18/100, Total Reward: 13.0
Episode 19/100, Total Reward: 18.0
Episode 20/100, Total Reward: 9.0
Episode 21/100, Total Reward: 15.0
Episode 22/100, Total Reward: 30.0
Episode 23/100, Total Reward: 19.0
Episode 24/100, Total Reward: 34.0
Episode 25/100, Total Reward: 10.0
Episode 26/100, Total Reward: 20.0
Episode 27/100, Total Reward: 18.0
Episode 28/100, Total Reward: 17.0
Episode 29/100, Total Reward: 1

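During training, the network's weights are adjusted by gradient descent (Adam optimizer, MSE loss) so that its predicted Q-values move toward the targets computed in replay(). Choosing the action with the highest predicted Q-value then steers the agent toward the highest expected cumulative (discounted) reward.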
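The target used in replay() is r + gamma · max Q(s', a') for the action taken, or just r when the episode ended. All other outputs are left at their current predictions, so they contribute no error. The NumPy sketch below computes those targets for a whole minibatch at once. `q_current` and `q_next` stand in for `model.predict(states)` and `model.predict(next_states)` and hold made-up numbers; `bellman_targets` is a hypothetical helper name.

```python
import numpy as np

gamma = 0.95

def bellman_targets(q_current, q_next, actions, rewards, dones, gamma):
    """Copy of q_current where only the taken action's entry becomes
    r + gamma * max_a' Q(s', a'), or just r for terminal transitions."""
    targets = q_current.copy()
    best_next = q_next.max(axis=1)
    td_target = rewards + gamma * best_next * (1.0 - dones)
    targets[np.arange(len(actions)), actions] = td_target
    return targets

# Made-up predictions for a minibatch of 3 transitions, 2 actions each
q_current = np.array([[1.0, 2.0], [0.5, 0.1], [3.0, 3.0]])
q_next = np.array([[2.0, 4.0], [1.0, 0.0], [5.0, 6.0]])
actions = np.array([0, 1, 1])
rewards = np.array([1.0, 1.0, 1.0])
dones = np.array([0.0, 0.0, 1.0])  # the third transition ended the episode

# expected values: [[4.8, 2.0], [0.5, 1.95], [3.0, 1.0]]
print(bellman_targets(q_current, q_next, actions, rewards, dones, gamma))
```

Computing targets this way would let replay() issue one `predict` per array and a single `fit` on the stacked minibatch instead of 32 separate calls. This is an optional speed-up, and it is not strictly equivalent: it performs one gradient step on the batch rather than 32 sequential single-sample steps.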