
cartpole learning(discrete) #16

Closed · CUN-bjy opened this issue Nov 28, 2020 · 3 comments
CUN-bjy commented Nov 28, 2020

20.12.29

  • Poor performance, even on CartPole.
  • Reverting to the original implementation; something is wrong.
@CUN-bjy CUN-bjy created this issue from a note in gym-ddpg 1.0 (In progress) Nov 28, 2020
@CUN-bjy CUN-bjy self-assigned this Nov 28, 2020

CUN-bjy commented Dec 30, 2020

20.12.31
Finally, did it! -> https://github.com/CUN-bjy/gym-ddpg-keras/tree/cartpole-v1

Changes made to solve the problem:

  • output scaling calculation in the actor (a sketch of how this maps to CartPole's discrete actions follows below):
    scalar = self.act_range * np.ones(self.act_dim)
    out = Lambda(lambda i: i * scalar)(output)
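
For context, DDPG is a continuous-control algorithm, so driving CartPole-v1's Discrete(2) action space requires discretizing the scaled actor output. A minimal sketch, assuming a simple sign threshold (to_discrete is a hypothetical helper; the cartpole-v1 branch may use a different mapping):

    import numpy as np

    def to_discrete(cont_action):
        # Map a continuous action in [-act_range, act_range] to
        # CartPole's Discrete(2) space: 0 = push left, 1 = push right.
        # Hypothetical mapping; the branch may discretize differently.
        return int(cont_action[0] > 0.0)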

Experiment specifications

Hyperparameters:

  • buffer_size = 20000, batch_size = 64
  • prioritized buffer: False
  • learning rate: 1e-3 (actor), 1e-2 (critic)
  • tau (target update rate): 1e-2 for both actor and critic (see the soft-update sketch after the network definitions)
  • network
    • actor:
        # input layer (observations)
        input_ = Input(shape=self.obs_dim)

        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal())(input_)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal())(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (actions, scaled to the action range)
        output_ = Dense(self.act_dim, kernel_initializer=GlorotNormal())(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('tanh')(output_b)
        scalar = self.act_range * np.ones(self.act_dim)
        out = Lambda(lambda i: i * scalar)(output)
    • critic:
        # input layer (observations and actions)
        input_obs = Input(shape=self.obs_dim)
        input_act = Input(shape=(self.act_dim,))
        inputs = [input_obs, input_act]
        concat = Concatenate(axis=-1)(inputs)

        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (Q-value)
        output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('linear')(output_b)
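
For reference, tau above is the soft-update (Polyak averaging) coefficient for the target networks. A minimal sketch of such an update for Keras models (illustrative, not copied from the repo):

    def soft_update(source_model, target_model, tau=1e-2):
        # Polyak averaging: target <- tau * source + (1 - tau) * target
        src = source_model.get_weights()
        tgt = target_model.get_weights()
        target_model.set_weights([tau * s + (1.0 - tau) * t
                                  for s, t in zip(src, tgt)])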

cartpole

performance (total_reward):
[plot: total reward, 2020-12-31 02-22-55]

critic_loss:
[plot: critic loss, 2020-12-31 02-22-52]

@CUN-bjy CUN-bjy changed the title from "cartpole learning & comparisons" to "cartpole learning(discrete)" Dec 30, 2020

CUN-bjy commented Dec 30, 2020

Exploration while the agent is still a "baby" (i.e., early in training) is the most important thing.
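
For context, DDPG implementations commonly add temporally correlated Ornstein-Uhlenbeck noise to the actions for early exploration. A minimal sketch (parameter values are illustrative; the thread does not state the exact noise schedule used here):

    import numpy as np

    class OUNoise:
        # Ornstein-Uhlenbeck process: mean-reverting, temporally
        # correlated noise often used for DDPG exploration.
        def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
            self.mu, self.theta, self.sigma = mu, theta, sigma
            self.state = np.ones(dim) * mu

        def sample(self):
            dx = self.theta * (self.mu - self.state) \
                 + self.sigma * np.random.randn(len(self.state))
            self.state = self.state + dx
            return self.state

Annealing sigma toward zero over episodes is a common way to taper exploration as the agent matures.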


CUN-bjy commented Dec 31, 2020

Training time: about 1 hour on an Intel i7 CPU.

@CUN-bjy CUN-bjy closed this as completed Dec 31, 2020
gym-ddpg 1.0 automation moved this from In progress to Done Dec 31, 2020