Implementation: Logical error #9

KornbergFresnel · 2018-01-26T03:20:36Z

s_batch, a_batch, r_batch, d_batch, s2_batch = replayMemory.miniBatch(int(args['minibatch_size']))
a = []
for j in range(env.n):
    state_batch_j = np.asarray([x for x in s_batch[:, j]])
    a.append(actors[j].predict_target(state_batch_j))

a_temp = np.transpose(np.asarray(a), (1, 0, 2))
a_for_critic = np.asarray([x.flatten() for x in a_temp])
s2_batch_i = np.asarray([x for x in s2_batch[:, i]])
targetQ = critic.predict_target(s2_batch_i, a_for_critic)  # maybe a bug

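targetQ should be computed from s2_batch and a_next, where a_next is what the target actors predict from s2_batch. Your code instead feeds s_batch to the target actors to build a_for_critic, and then uses that to compute targetQ, so the target critic sees next states paired with actions for the current states. I think this is a logic error in your implementation.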
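To make the suggestion concrete, here is a minimal sketch of how I would expect the joint next action to be built. It is not taken from the repository: `StubActor` and `joint_next_actions` are hypothetical names, and I am assuming `s2_batch` is a dense array of shape `(batch, n_agents, state_dim)`, which may differ from what `replayMemory.miniBatch` actually returns.

```python
import numpy as np


class StubActor:
    """Hypothetical stand-in for an actor exposing predict_target."""

    def __init__(self, scale):
        self.scale = scale

    def predict_target(self, states):
        # Toy target policy: a 2-d action derived from the first two state features.
        return self.scale * states[:, :2]


def joint_next_actions(actors, s2_batch):
    """Concatenate each agent's target action, predicted from its NEXT state.

    Assumes s2_batch has shape (batch, n_agents, state_dim).
    """
    a_next = [actor.predict_target(s2_batch[:, j]) for j, actor in enumerate(actors)]
    # (n_agents, batch, action_dim) -> (batch, n_agents, action_dim)
    a_next = np.transpose(np.asarray(a_next), (1, 0, 2))
    # Flatten per-agent actions into one joint action vector per sample.
    return a_next.reshape(a_next.shape[0], -1)
```

With this, the call would become something like `critic.predict_target(s2_batch_i, joint_next_actions(actors, s2_batch))`, so that the states and the actions passed to the target critic both come from the same (next) time step.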
