
td3_implementation analysis #10

Open
CUN-bjy opened this issue Jan 28, 2021 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@CUN-bjy
Owner

CUN-bjy commented Jan 28, 2021

The first TD3 implementation does not work well..

So I have to analyze each part that differs from ddpg.

@CUN-bjy CUN-bjy created this issue from a note in gym-td3-keras (In progress) Jan 28, 2021
@CUN-bjy CUN-bjy self-assigned this Jan 28, 2021
@CUN-bjy CUN-bjy added the bug Something isn't working label Jan 28, 2021
@CUN-bjy
Owner Author

CUN-bjy commented Jan 28, 2021

[TEST 1]

  • rebase to ddpg
  • add target policy smoothing term (see the sketch after this test).
  • test on RoboschoolInvertedPendulum-v1.
  • batch_size -> 64, hidden_layers -> 24, 16 (for both the actor and the critic)
  • lr -> 1e-4, 1e-3, tau -> 1e-3, 1e-3 (for the actor and the critic, respectively)
  • it works. (because of an exploration problem, I tested 5 more times; the success rate is under 30%.)
    ✔️
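
For reference, here is a minimal NumPy sketch of the target policy smoothing term as I understand it from the TD3 paper; sigma, noise_clip and action_bound are illustrative values, not necessarily the ones used in this repo:

	import numpy as np

	def smoothed_target_action(target_actor, next_obs, action_bound=1.0,
	                           sigma=0.2, noise_clip=0.5):
	    """Target policy smoothing: add clipped Gaussian noise to the
	    target actor's action before feeding it into the target critic(s)."""
	    a_next = target_actor(next_obs)  # mu'(s')
	    noise = np.clip(np.random.normal(0.0, sigma, size=np.shape(a_next)),
	                    -noise_clip, noise_clip)
	    return np.clip(a_next + noise, -action_bound, action_bound)

	# usage with any callable target actor, e.g. a Keras model:
	# a_tilde = smoothed_target_action(lambda s: actor_target.predict(s), next_obs_batch)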

[TEST 2]

  • basically tested on InvertedPendulum, with almost the same parameters as above, plus the target policy smoothing term.
  • changed parameters: lr -> 3e-4, 3e-4, tau -> 5e-3, 5e-3 (same as the original td3 code)
  • it works. (it also has the exploration problem; the success rate is not high.)
    ✔️

[TEST 3]

  • add delayed policy update term (see the sketch below).
  • delayed update period -> 2
  • it works. (it has the exploration problem, the success rate is too low, and learning a good policy feels delayed)
    ✔️
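
A rough sketch of how the delayed policy update could sit inside the training loop; update_interval, train_critic, train_actor and soft_update are hypothetical names used only for illustration:

	update_interval = 2  # the delayed update period used in TEST 3

	def td3_update(step, batch, train_critic, train_actor, soft_update):
	    # the critic(s) are trained at every step
	    train_critic(batch)
	    # the actor and the target networks are updated only every `update_interval` steps
	    if step % update_interval == 0:
	        train_actor(batch)
	        soft_update()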

@CUN-bjy
Owner Author

CUN-bjy commented Jan 28, 2021

[TEST 4]

  • rebase to ddpg
  • add target policy smoothing term.
  • test on RoboschoolInvertedPendulum-v1.
  • batch_size -> 64, hidden_layers -> 24, 16 (for both the actor and the critic)
  • lr -> 3e-4, 3e-4, tau -> 5e-3, 5e-3 (for the actor and the critic, respectively)
  • (reset policy update interval to 1)
  • add double clipped Q update term
  • but only use Q1 for the target update
  • it works.
    ✔️

[TEST 5]

  • same as the above experiment.
  • use both Q1 and Q2 for the target update
  • it doesn't work well..

(Screenshots from 2021-01-28 19-19-59 and 19-19-07.)

[TEST 6]

  • test on RoboschoolInvertedPendulum-v1.
  • full TD3 set:
  • add target policy smoothing term.
  • add delayed policy update term. update_interval -> 2
  • add double clipped Q update term
  • batch_size -> 64, hidden_layers -> 24, 16 (for both the actor and the critic)
  • lr -> 3e-4, 3e-4, tau -> 5e-3, 5e-3 (for the actor and the critic, respectively)
  • it also doesn't work well.. (it takes a long time to learn the policy, and then it gets worse)

👀 I think my implementation of the double clipped Q update has some problems (see the sketch below for reference).
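
For comparison, this is the clipped double Q target from the TD3 paper written as a NumPy sketch; critic1_target and critic2_target stand in for the two target critics and are not the repo's actual function names:

	import numpy as np

	def clipped_double_q_target(rewards, dones, next_obs, a_tilde,
	                            critic1_target, critic2_target, gamma=0.99):
	    """y = r + gamma * (1 - done) * min(Q1'(s', a~), Q2'(s', a~))"""
	    q1 = critic1_target(next_obs, a_tilde)
	    q2 = critic2_target(next_obs, a_tilde)
	    q_min = np.minimum(q1, q2)  # take the smaller estimate to fight overestimation
	    return rewards + gamma * (1.0 - dones) * q_min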

@CUN-bjy
Owner Author

CUN-bjy commented Jan 30, 2021

[TEST 7]

  • test on RoboschoolInvertedPendulum-v1.
  • add double clipped Q update term (only)
  • batch_size -> 64, hidden_layers -> 24, 16 (for both the actor and the critic)
  • lr -> 3e-4, 3e-4, tau -> 5e-3, 5e-3 (for the actor and the critic, respectively)
  • yes.. it doesn't work.. A catastrophic forgetting problem happens, even within a single task.

[TEST 8]

  • same conditions as above.
  • changed some code so that each critic has its own independent optimizer (see the sketch below)
    (Screenshots from 2021-02-06 21-47-36 and 21-47-41.)
  • I think it is working, but it is very slow compared to simple ddpg.
  • the double clipped Q update fixes the overestimation problem, but it makes the policy update very slow.
  • I should try the integrated system and then change some parameters..
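
A sketch of what building each critic with its own optimizer could look like in Keras (tf.keras names assumed here; the 24/16 hidden sizes just mirror the settings above and are not taken from the actual code):

	import tensorflow as tf
	from tensorflow.keras import layers, Model, optimizers

	def build_critic(obs_dim, act_dim, lr=3e-4):
	    """One Q-network with its own Adam optimizer (24/16 hidden units)."""
	    obs_in = layers.Input(shape=(obs_dim,))
	    act_in = layers.Input(shape=(act_dim,))
	    x = layers.Concatenate()([obs_in, act_in])
	    x = layers.Dense(24, activation='relu')(x)
	    x = layers.Dense(16, activation='relu')(x)
	    q = layers.Dense(1)(x)
	    model = Model([obs_in, act_in], q)
	    # each critic gets its own optimizer instance, so Q1 and Q2 share no optimizer state
	    model.compile(optimizer=optimizers.Adam(learning_rate=lr), loss='mse')
	    return model

	# critic_1 = build_critic(obs_dim, act_dim)
	# critic_2 = build_critic(obs_dim, act_dim)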

@CUN-bjy
Owner Author

CUN-bjy commented Feb 7, 2021

[TEST 9]

  • integrated all of it (see the combined sketch below).. -> doesn't work..
    (Plots: reward and critic_loss curves.)
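
For reference, a rough sketch of how the three components could fit together in one training step. It reuses the hypothetical helpers from the sketches above (smoothed_target_action, clipped_double_q_target, build_critic) and the other names are also assumptions, not the repo's actual train loop:

	def td3_train_step(step, batch, actor_target,
	                   critic_1, critic_2, critic1_target, critic2_target,
	                   train_actor, soft_update_targets,
	                   gamma=0.99, update_interval=2):
	    obs, actions, rewards, dones, next_obs = batch

	    # 1) target policy smoothing (sketch under TEST 1)
	    a_tilde = smoothed_target_action(actor_target, next_obs)

	    # 2) clipped double Q target, shared by both critics (sketch under TEST 6)
	    y = clipped_double_q_target(rewards, dones, next_obs, a_tilde,
	                                critic1_target, critic2_target, gamma)
	    critic_1.train_on_batch([obs, actions], y)
	    critic_2.train_on_batch([obs, actions], y)

	    # 3) delayed policy and target updates (sketch under TEST 3)
	    if step % update_interval == 0:
	        train_actor(obs)         # e.g. gradient ascent on Q1(s, actor(s))
	        soft_update_targets()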

@CUN-bjy
Owner Author

CUN-bjy commented Feb 7, 2021

[TEST 10]

  • added an initial random policy for exploration (see the sketch at the end of this comment)
  • and there was a mistake in the code..

before

		a = agent.make_action(obs,t)
		action = np.argmax(a) if is_discrete else a

		# do step on gym at t-step
		new_obs, reward, done, info = env.step(action)

		# store the results to buffer
		agent.memorize(obs, a, reward, done, new_obs)
		# should've memorized the action w/ noise!!

after

		a = agent.make_action(obs,t)
		action = np.argmax(a) if is_discrete else a

		# do step on gym at t-step
		new_obs, reward, done, info = env.step(action) 

		# store the results to buffer	
		agent.memorize(obs, action, reward, done, new_obs)

but, consequently, it doesn't work..
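
A sketch of the initial-random-policy idea from the first bullet: take uniformly random actions for the first warmup_steps environment steps before switching to the agent's policy. warmup_steps is an illustrative value and select_action is a hypothetical helper, not the repo's API:

	warmup_steps = 1000  # illustrative value

	def select_action(agent, env, obs, t):
	    """Uniform random actions during warm-up, then the agent's (noisy) policy."""
	    if t < warmup_steps:
	        return env.action_space.sample()  # pure exploration at the start
	    return agent.make_action(obs, t)      # the agent's own exploration afterwards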

@CUN-bjy
Owner Author

CUN-bjy commented Feb 24, 2021

[TEST 11]

  • use an OU noise process as the off-policy exploration strategy (see the sketch below)
    (Plots: reward and critic_loss curves.)

It also doesn't work.
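
For reference, a minimal Ornstein-Uhlenbeck noise process in NumPy; mu, theta, sigma and dt are common default values, not necessarily the ones used in this repo:

	import numpy as np

	class OUNoise:
	    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""
	    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
	        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
	        self.x = np.full(dim, mu, dtype=np.float64)

	    def sample(self):
	        dx = (self.theta * (self.mu - self.x) * self.dt
	              + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.x.shape))
	        self.x = self.x + dx
	        return self.x

	# noise = OUNoise(dim=action_dim)
	# noisy_action = np.clip(actor_output + noise.sample(), -1.0, 1.0)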

@CUN-bjy
Owner Author

CUN-bjy commented Feb 25, 2021

[TEST 12]

  • use an OU noise process as the off-policy exploration strategy
  • without BatchNormalization, weight regularization, or the Glorot initializer on the actor and critic (see the sketch below).
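
A sketch of what such a stripped-down actor might look like in Keras (tf.keras names assumed; the 24/16 layer sizes just mirror the earlier tests): no BatchNormalization layers, no kernel regularizers, and no explicitly specified initializers.

	import tensorflow as tf
	from tensorflow.keras import layers, Model

	def build_plain_actor(obs_dim, act_dim, action_bound=1.0):
	    """Plain actor: no BatchNormalization, no kernel_regularizer,
	    and no explicit kernel_initializer arguments."""
	    obs_in = layers.Input(shape=(obs_dim,))
	    x = layers.Dense(24, activation='relu')(obs_in)
	    x = layers.Dense(16, activation='relu')(x)
	    raw = layers.Dense(act_dim, activation='tanh')(x)
	    scaled = layers.Lambda(lambda a: a * action_bound)(raw)
	    return Model(obs_in, scaled)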
