
cartpole learning(discrete) #16

Closed · CUN-bjy opened this issue Nov 28, 2020 · 3 comments
CUN-bjy commented Nov 28, 2020

20.12.29

  • Poor performance, even on CartPole.
  • Reverting to the original implementation; something is wrong.
@CUN-bjy CUN-bjy created this issue from a note in gym-ddpg 1.0 (In progress) Nov 28, 2020
@CUN-bjy CUN-bjy self-assigned this Nov 28, 2020

CUN-bjy commented Dec 30, 2020

20.12.31
Finally, did it! -> https://github.com/CUN-bjy/gym-ddpg-keras/tree/cartpole-v1

Changes made to solve the problem:

  • output scaling calculation in the actor (a sketch of how this maps to CartPole's discrete actions follows below):
    scalar = self.act_range * np.ones(self.act_dim)
    out = Lambda(lambda i: i * scalar)(output)
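
For context, DDPG is a continuous-control algorithm, so driving CartPole-v1's Discrete(2) action space requires discretizing the scaled actor output. A minimal sketch, assuming a simple sign threshold (to_discrete is a hypothetical helper; the cartpole-v1 branch may use a different mapping):

    import numpy as np

    def to_discrete(cont_action):
        # Map a continuous action in [-act_range, act_range] to
        # CartPole's Discrete(2) space: 0 = push left, 1 = push right.
        # Hypothetical mapping; the branch may discretize differently.
        return int(cont_action[0] > 0.0)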

Experiment specifications

Hyperparameters:

  • buffer_size = 20000, batch_size = 64
  • prioritized buffer: False
  • learning rate: 1e-3 (actor), 1e-2 (critic)
  • tau (target update rate): 1e-2 for both actor and critic (see the soft-update sketch after the network definitions)
  • network
    • actor:
        # input layer (observations)
        input_ = Input(shape=self.obs_dim)

        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal())(input_)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal())(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (actions, scaled to the action range)
        output_ = Dense(self.act_dim, kernel_initializer=GlorotNormal())(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('tanh')(output_b)
        scalar = self.act_range * np.ones(self.act_dim)
        out = Lambda(lambda i: i * scalar)(output)
    • critic:
        # input layer (observations and actions)
        input_obs = Input(shape=self.obs_dim)
        input_act = Input(shape=(self.act_dim,))
        inputs = [input_obs, input_act]
        concat = Concatenate(axis=-1)(inputs)

        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (Q-value)
        output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('linear')(output_b)
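
For reference, tau above is the soft-update (Polyak averaging) coefficient for the target networks. A minimal sketch of such an update for Keras models (illustrative, not copied from the repo):

    def soft_update(source_model, target_model, tau=1e-2):
        # Polyak averaging: target <- tau * source + (1 - tau) * target
        src = source_model.get_weights()
        tgt = target_model.get_weights()
        target_model.set_weights([tau * s + (1.0 - tau) * t
                                  for s, t in zip(src, tgt)])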

cartpole

performance (total_reward):
[plot: total reward, 2020-12-31 02-22-55]

critic_loss:
[plot: critic loss, 2020-12-31 02-22-52]

@CUN-bjy CUN-bjy changed the title from "cartpole learning & comparisons" to "cartpole learning(discrete)" Dec 30, 2020

CUN-bjy commented Dec 30, 2020

Exploration while the agent is still a "baby" (i.e., early in training) is the most important thing.
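
For context, DDPG implementations commonly add temporally correlated Ornstein-Uhlenbeck noise to the actions for early exploration. A minimal sketch (parameter values are illustrative; the thread does not state the exact noise schedule used here):

    import numpy as np

    class OUNoise:
        # Ornstein-Uhlenbeck process: mean-reverting, temporally
        # correlated noise often used for DDPG exploration.
        def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
            self.mu, self.theta, self.sigma = mu, theta, sigma
            self.state = np.ones(dim) * mu

        def sample(self):
            dx = self.theta * (self.mu - self.state) \
                 + self.sigma * np.random.randn(len(self.state))
            self.state = self.state + dx
            return self.state

Annealing sigma toward zero over episodes is a common way to taper exploration as the agent matures.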


CUN-bjy commented Dec 31, 2020

Training time: about 1 hour on an Intel i7 CPU.

@CUN-bjy CUN-bjy closed this as completed Dec 31, 2020
gym-ddpg 1.0 automation moved this from In progress to Done Dec 31, 2020