
Chapter 7 (DQN algorithm): training raises ValueError: expected sequence of length 4 at dim 2 (got 0) #30

Closed
horacehht opened this issue Feb 11, 2023 · 8 comments


@horacehht

When running the 5th code cell, the error `ValueError: expected sequence of length 4 at dim 2 (got 0)` is raised. Full traceback:

```
ValueError                                Traceback (most recent call last)
f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 5 in <cell line: 26>()
     31 done = False
     32 while not done:
---> 33     action = agent.take_action(state)  # choose an action from the current state
     34     next_state, reward, done, _ = env.step(action)  # interact with the environment
     35     replay_buffer.add(state, action, reward, next_state, done)  # push (state, action, reward, next_state, done) into the replay buffer

f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 5 in DQN.take_action(self, state)
     22 action = np.random.randint(self.action_dim)
     23 else:  # exploit
---> 24 state = torch.tensor([state], dtype=torch.float).to(self.device)
     25 action = self.q_net(state).argmax().item()  # pick the highest-scoring action; item() converts it to a built-in Python number
     26 return action

ValueError: expected sequence of length 4 at dim 2 (got 0)
```

I only added comments and did not modify any code. For a textbook, this kind of error should not occur unless it is caused by a library version change.

@horacehht
Author

The gym version I am using is 0.26.2.

@horacehht
Author

horacehht commented Feb 11, 2023

After consulting the official documentation and debugging, I confirmed that the error is caused by a gym version change.
Cause: the return values of the `reset` and `step` methods of gym's `env` object were changed.
Fix:
In the 5th code cell, change `state = env.reset()` to `state = env.reset()[0]`, and change `next_state, reward, done, _ = env.step(action)` to `next_state, reward, done, _, __ = env.step(action)`.
With these two edits, training runs.
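The two edits above can be sketched as follows, using a stub environment that mimics the gym >= 0.26 return signatures (so the snippet runs without gym installed; in the notebook the real environment comes from `gym.make`, and the stub's observation values are made up for illustration):

```python
# Stub that mimics the gym >= 0.26 API changes described above.
class StubEnv:
    def reset(self):
        # gym >= 0.26: reset() returns a (observation, info) tuple
        return [0.0, 0.0, 0.0, 0.0], {}

    def step(self, action):
        # gym >= 0.26: step() returns (obs, reward, terminated, truncated, info)
        return [0.1, 0.0, 0.0, 0.0], 1.0, True, False, {}

env = StubEnv()

# old: state = env.reset()        -> now a (obs, info) tuple, not the observation
state = env.reset()[0]            # keep only the observation

# old: next_state, reward, done, _ = env.step(action)  -> now 5 return values
next_state, reward, done, _, __ = env.step(0)
```

Feeding the raw `(obs, info)` tuple into `torch.tensor([state], ...)` is what produced the original "expected sequence of length 4 at dim 2 (got 0)" error, since the second tuple element is an empty dict.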

How I tracked it down:
At `state = env.reset()` I found that `state` is a tuple whose first element is an array and whose second is a dict, e.g. `(array([-0.0344371 , -0.01493822, -0.01339062, -0.02076969], dtype=float32), {})`.
Changing `state = env.reset()` to `state = env.reset()[0]` therefore fixes the input-format error above.

After that change, however, a new error appears: `ValueError: too many values to unpack (expected 4)`. Full traceback:

```
ValueError                                Traceback (most recent call last)
f:\Codefield\jupyter_workspace\deeplearn\rl\Hands-on-RL\第7章-DQN算法.ipynb Cell 6 in <cell line: 26>()
     32 while not done:
     33     action = agent.take_action(state)  # choose an action from the current state
---> 34     next_state, reward, done, _ = env.step(action)  # interact with the environment
     35     replay_buffer.add(state, action, reward, next_state, done)  # push (state, action, reward, next_state, done) into the replay buffer
     36     state = next_state

ValueError: too many values to unpack (expected 4)
```

Inspecting the output of `env.step(action)` showed that it now returns 5 values while the left-hand side unpacks only 4, hence the error. Changing `next_state, reward, done, _ = env.step(action)` to `next_state, reward, done, _, __ = env.step(action)` fixes it.
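One caveat about the `_, __` unpacking above: in gym >= 0.26 the old `done` flag was split into `terminated` and `truncated`, so the fourth value being discarded is the time-limit truncation flag. A sketch of a more faithful port that combines both flags (stub environment stands in for gym; the values are made up to simulate an episode cut off by the time limit):

```python
# Stub whose step() ends the episode by truncation, not termination.
class StubEnv:
    def step(self, action):
        obs, reward = [0.0, 0.0, 0.0, 0.0], 1.0
        terminated, truncated = False, True  # cut off by the time limit
        return obs, reward, terminated, truncated, {}

env = StubEnv()
next_state, reward, terminated, truncated, info = env.step(0)
done = terminated or truncated  # old-style done: episode ends in either case
```

With `done = terminated` alone, an episode stopped by the time limit would never exit the `while not done:` loop.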

@Shengqi-Kong

Great! This solved my problem too!

@ZhouXiinlei

Thanks, this helped a lot.

@zhyantao


Changing `state = env.reset()` to `state, info = env.reset()` also fixes this error.
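This variant is equivalent to indexing with `[0]`, just written as explicit tuple unpacking (a stub function mimics gym's `reset` here; the helper name is hypothetical):

```python
def reset_like_gym_026():
    # mimics gym >= 0.26 env.reset(): returns (observation, info)
    return [0.0, 0.0, 0.0, 0.0], {}

state, info = reset_like_gym_026()  # state gets the observation, info the dict
```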

@jennie1124

Thanks, very helpful!

@lovewinner

@horacehht Thanks!!

@ialwayshungry

thanks!!!
