
Chapter 7 (DQN algorithm): training raises ValueError: expected sequence of length 4 at dim 2 (got 0) #30

Closed
horacehht opened this issue Feb 11, 2023 · 8 comments


@horacehht

When running the 5th code cell, the error `ValueError: expected sequence of length 4 at dim 2 (got 0)` is raised. Full traceback:

```
ValueError                                Traceback (most recent call last)
f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 5 in <cell line: 26>()
     31 done = False
     32 while not done:
---> 33     action = agent.take_action(state)  # choose an action from the current state
     34     next_state, reward, done, _ = env.step(action)  # interact with the environment
     35     replay_buffer.add(state, action, reward, next_state, done)  # push (state, action, reward, next_state, done) into the replay buffer

f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 5 in DQN.take_action(self, state)
     22 action = np.random.randint(self.action_dim)
     23 else:  # exploit
---> 24 state = torch.tensor([state], dtype=torch.float).to(self.device)
     25 action = self.q_net(state).argmax().item()  # pick the highest-scoring action; item() converts it to a built-in Python number
     26 return action

ValueError: expected sequence of length 4 at dim 2 (got 0)
```

I only added comments and did not modify any code. For a textbook, this kind of error should not occur unless it is caused by a library version change.

@horacehht
Author

The gym version I am using is 0.26.2.

@horacehht
Author

horacehht commented Feb 11, 2023

After consulting the official documentation and debugging, I confirmed that the error is caused by a gym version change.
Cause: the return values of the `reset` and `step` methods of gym's `env` object were changed.
Fix:
In the 5th code cell, change `state = env.reset()` to `state = env.reset()[0]`, and change `next_state, reward, done, _ = env.step(action)` to `next_state, reward, done, _, __ = env.step(action)`.
With these two edits, training runs.
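The two edits above can be sketched as follows, using a stub environment that mimics the gym >= 0.26 return signatures (so the snippet runs without gym installed; in the notebook the real environment comes from `gym.make`, and the stub's observation values are made up for illustration):

```python
# Stub that mimics the gym >= 0.26 API changes described above.
class StubEnv:
    def reset(self):
        # gym >= 0.26: reset() returns a (observation, info) tuple
        return [0.0, 0.0, 0.0, 0.0], {}

    def step(self, action):
        # gym >= 0.26: step() returns (obs, reward, terminated, truncated, info)
        return [0.1, 0.0, 0.0, 0.0], 1.0, True, False, {}

env = StubEnv()

# old: state = env.reset()        -> now a (obs, info) tuple, not the observation
state = env.reset()[0]            # keep only the observation

# old: next_state, reward, done, _ = env.step(action)  -> now 5 return values
next_state, reward, done, _, __ = env.step(0)
```

Feeding the raw `(obs, info)` tuple into `torch.tensor([state], ...)` is what produced the original "expected sequence of length 4 at dim 2 (got 0)" error, since the second tuple element is an empty dict.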

How I tracked it down:
At `state = env.reset()` I found that `state` is a tuple whose first element is an array and whose second is a dict, e.g. `(array([-0.0344371 , -0.01493822, -0.01339062, -0.02076969], dtype=float32), {})`.
Changing `state = env.reset()` to `state = env.reset()[0]` therefore fixes the input-format error above.

After that change, however, a new error appears: `ValueError: too many values to unpack (expected 4)`. Full traceback:

```
ValueError                                Traceback (most recent call last)
f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 6 in <cell line: 26>()
     32 while not done:
     33     action = agent.take_action(state)  # choose an action from the current state
---> 34     next_state, reward, done, _ = env.step(action)  # interact with the environment
     35     replay_buffer.add(state, action, reward, next_state, done)  # push (state, action, reward, next_state, done) into the replay buffer
     36     state = next_state

ValueError: too many values to unpack (expected 4)
```

Inspecting the output of `env.step(action)` showed that it now returns 5 values while the left-hand side unpacks only 4, hence the error. Changing `next_state, reward, done, _ = env.step(action)` to `next_state, reward, done, _, __ = env.step(action)` fixes it.
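One caveat about the `_, __` unpacking above: in gym >= 0.26 the old `done` flag was split into `terminated` and `truncated`, so the fourth value being discarded is the time-limit truncation flag. A sketch of a more faithful port that combines both flags (stub environment stands in for gym; the values are made up to simulate an episode cut off by the time limit):

```python
# Stub whose step() ends the episode by truncation, not termination.
class StubEnv:
    def step(self, action):
        obs, reward = [0.0, 0.0, 0.0, 0.0], 1.0
        terminated, truncated = False, True  # cut off by the time limit
        return obs, reward, terminated, truncated, {}

env = StubEnv()
next_state, reward, terminated, truncated, info = env.step(0)
done = terminated or truncated  # old-style done: episode ends in either case
```

With `done = terminated` alone, an episode stopped by the time limit would never exit the `while not done:` loop.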

@Shengqi-Kong

Great! This solved my problem too!

@ZhouXiinlei

Thanks, this helped a lot.

@zhyantao


Changing `state = env.reset()` to `state, info = env.reset()` also fixes this error.
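This variant is equivalent to indexing with `[0]`, just written as explicit tuple unpacking (a stub function mimics gym's `reset` here; the helper name is hypothetical):

```python
def reset_like_gym_026():
    # mimics gym >= 0.26 env.reset(): returns (observation, info)
    return [0.0, 0.0, 0.0, 0.0], {}

state, info = reset_like_gym_026()  # state gets the observation, info the dict
```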

@jennie1124

Thanks, very helpful!

@lovewinner

@horacehht Thanks!!

@ialwayshungry

thanks!!!
