cartpole learning(continuous) : RoboschoolInvertedPendulum-v1 #23

Closed
CUN-bjy opened this issue Dec 31, 2020 · 2 comments

CUN-bjy commented Dec 31, 2020

Reinforcement learning (continuous cart-pole) in the RoboschoolInvertedPendulum-v1 environment.
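
A minimal environment-setup sketch, assuming the roboschool package is installed alongside gym (importing it registers the Roboschool environments); the variable names are illustrative only:

    # Sketch: create the environment and read the dimensions used by the networks below.
    import roboschool  # noqa: F401 -- registers RoboschoolInvertedPendulum-v1 with gym
    import gym

    env = gym.make("RoboschoolInvertedPendulum-v1")
    obs_dim = env.observation_space.shape     # observation shape fed to the actor/critic
    act_dim = env.action_space.shape[0]       # number of continuous action components
    act_range = env.action_space.high[0]      # bound used to scale the actor's tanh output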

CUN-bjy changed the title from "cartpole learning(continuous)" to "cartpole learning(continuous) : RoboschoolInvertedPendulum-v1" on Dec 31, 2020
CUN-bjy added this to To do in gym-ddpg 1.0 via automation on Dec 31, 2020

CUN-bjy commented Dec 31, 2020

Experiment specifications

Hyperparameters:

  • buffer_size = 20000, batch_size = 64
  • prioritized buffer: False
  • learning_rate: 1e-4 (actor), 1e-3 (critic)
  • tau (target update rate): 1e-3 (actor), 1e-3 (critic)
  • network (actor and critic snippets below; see the assembly sketch after this list)
    • actor:
        # input layer(observations)
        input_ = Input(shape=self.obs_dim)
      
        # hidden layer 1
        h1_ = Dense(24,kernel_initializer=GlorotNormal())(input_)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)
      
        # hidden_layer 2
        h2_ = Dense(16,kernel_initializer=GlorotNormal())(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)
      
        # output layer(actions)
        output_ = Dense(self.act_dim,kernel_initializer=GlorotNormal())(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('tanh')(output_b)
        scalar = self.act_range * np.ones(self.act_dim)
        out = Lambda(lambda i: i * scalar)(output)
    • critic:
        # input layer(observations and actions)
        input_obs = Input(shape=self.obs_dim)
        input_act = Input(shape=(self.act_dim,))
        inputs = [input_obs,input_act]
        concat = Concatenate(axis=-1)(inputs)
      
        # hidden layer 1
        h1_ = Dense(24, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)
      
        # hidden_layer 2
        h2_ = Dense(16, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)
      
        # output layer (Q-value)
        output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('linear')(output_b)
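
A rough sketch of how these snippets might be assembled into trainable models with the listed learning rates and tau. The use of tf.keras, the Adam optimizers, and the soft_update helper are assumptions for illustration, not code taken from the repo:

    # Sketch only: wrap the layer graphs above into Model objects and add a
    # Polyak soft-update helper for the target networks (tau from the list above).
    from tensorflow.keras import Model
    from tensorflow.keras.optimizers import Adam

    actor = Model(input_, out)                      # out: tanh output scaled by act_range
    critic = Model([input_obs, input_act], output)  # output: linear Q-value estimate
    critic.compile(optimizer=Adam(1e-3), loss='mse')
    actor_optimizer = Adam(1e-4)                    # applied to the actor via the DDPG policy gradient

    def soft_update(target_model, source_model, tau=1e-3):
        """Polyak-average source weights into the target network."""
        mixed = [tau * w + (1.0 - tau) * tw
                 for w, tw in zip(source_model.get_weights(), target_model.get_weights())]
        target_model.set_weights(mixed)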

Performance

[attached plot: reward 2020-12-31 14-15-58]

Critic Loss

[attached plot: critic_loss 2020-12-31 14-10-51]

Results

[attached clips: continuous_cartpole 2020-12-31 14-13, continuous_cartpole2 2020-12-31 14-14]

CUN-bjy moved this from To do to In progress in gym-ddpg 1.0 on Dec 31, 2020

CUN-bjy commented Dec 31, 2020

Training time: about 5 hours (2500 episodes) on an Intel i7 CPU.

CUN-bjy self-assigned this on Dec 31, 2020
CUN-bjy closed this as completed on Dec 31, 2020
gym-ddpg 1.0 automation moved this from In progress to Done Dec 31, 2020