Skip to content
This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

Categorical DQN - dimension error #484

Open
Colin1998 opened this issue May 12, 2021 · 0 comments
Open

Categorical DQN - dimension error #484

Colin1998 opened this issue May 12, 2021 · 0 comments

Comments

@Colin1998
Copy link

Hi,

I don't post issues very often, so I hope my problem is clear enough the way I present it below. When trying to train a Categorical DQN (for Batch RL, no interaction with environment), I run into the following error:

_Traceback (most recent call last):

File "", line 129, in
graph_manager.improve()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\batch_rl_graph_manager.py", line 234, in improve
self.train()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train
[manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in
[manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in train
[agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in
[agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\agent.py", line 741, in train
total_loss, losses, unclipped_grads = self.learn_from_batch(batch)

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 113, in learn_from_batch
self.q_values.add_sample(self.distribution_prediction_to_q_values(TD_targets))

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 82, in distribution_prediction_to_q_values
return np.dot(prediction, self.z_values)

File "<array_function internals>", line 6, in dot

ValueError: shapes (128,2) and (51,) not aligned: 2 (dim 1) != 51 (dim 0)_

The 2 (dim 1) is the number of actions in my ActionSpace, and the 51 (dim 0) corresponds to the number of atoms set in the agent's parameters. So the error suggests that these should be of equal length, which seems strange to me. Is this indeed true? Should these be of the same length? When setting the numbers of atoms to 2 (to get rid of this error) I got the following error:

_Traceback (most recent call last):

File "", line 129, in
graph_manager.improve()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\batch_rl_graph_manager.py", line 234, in improve
self.train()

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train
[manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in
[manager.train() for manager in self.level_managers]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in train
[agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in
[agent.train() for agent in self.agents.values()]

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\agent.py", line 741, in train
total_loss, losses, unclipped_grads = self.learn_from_batch(batch)

File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 116, in learn_from_batch
target_actions = np.argmax(self.distribution_prediction_to_q_values(distributional_q_st_plus_1), axis=1)

File "<array_function internals>", line 6, in argmax

File "C:\Users\colin.conda\envs\py36\lib\site-packages\numpy\core\fromnumeric.py", line 1188, in argmax
return _wrapfunc(a, 'argmax', axis=axis, out=out)

File "C:\Users\colin.conda\envs\py36\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)

AxisError: axis 1 is out of bounds for array of dimension 1_

I tried setting the axis to zero, but this results in more complex errors, so I assumed this is not the way to go. Does anyone have a clue how I can fix this error? Any suggestions would be of great help, thanks in advance!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant