DataLoader fails when reading custom data #31217

Closed
yeyupiaoling opened this issue Feb 25, 2021 · 9 comments
@yeyupiaoling
Contributor

  • PaddlePaddle 2.0.0
  • python 3.7
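import ast

from paddle.io import Dataset

# load_audio and audio_to_stft are the author's own helpers; their import is not shown in the issue.


class PPASRDataset(Dataset):
    def __init__(self, data_list, dict_path):
        super(PPASRDataset, self).__init__()
        # Load the data list: each line is "<wav path>,<transcript>"
        with open(data_list) as f:
            idx = f.readlines()
        self.idx = [x.strip().split(",", 1) for x in idx]
        # Load the vocabulary; literal_eval parses the list without running arbitrary code
        with open(dict_path) as f:
            labels = ast.literal_eval(f.read())
        # Map each character to its integer index
        self.labels = {label: i for i, label in enumerate(labels)}

    def __getitem__(self, idx):
        # Split the audio path and the label
        wav_path, transcript = self.idx[idx]
        # Read the audio and compute its short-time Fourier transform
        wav = load_audio(wav_path)
        stft = audio_to_stft(wav)
        # Convert character labels to ints, skipping unknown characters.
        # (filter(None, ...) would also drop the valid index 0.)
        transcript = [self.labels[x] for x in transcript if x in self.labels]
        return stft, transcript

    def __len__(self):
        return len(self.idx)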
from paddle.io import DataLoader
from utils.data import PPASRDataset

dataset = PPASRDataset("dataset/manifest.train", "dataset/zh_vocab.json")
loader = DataLoader(dataset, batch_size=8, collate_fn=None, num_workers=1, shuffle=False)

# Iterating over the dataset directly works fine
for p, l in dataset:
    print(p.shape)

# But once it is wrapped in a DataLoader, an error is raised here
for i, (x, y, x_lens, y_lens) in enumerate(loader()):
    print(x.shape)
    print(y.shape)
    print(x_lens.shape)
    print(y_lens.shape)

Error message:

ERROR:root:DataLoader reader thread raised an exception!
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    for i, (x, y, x_lens, y_lens) in enumerate(loader()):
  File "/home/PaddlePaddle2/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 788, in __next__
    data = self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:158)

@paddle-bot-old

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please make sure you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also check the API documentation, FAQ, GitHub Issues, and the AI community for an answer. Have a nice day!

@wangchaochaohu
Contributor

I suggest taking a look at PaddlePaddle/PaddleOCR#227 and #25569.

@heavengate
Contributor

What error do you get if you set num_workers to 0?

@yeyupiaoling
Contributor Author

@heavengate Still the same error.

WARNING:root:DataLoader reader thread raised an exception.
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    for i, (x, y, x_lens, y_lens) in enumerate(loader()):
  File "/home/envs/PaddlePaddle2/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 365, in __next__
    return self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
  [Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:158)

@yeyupiaoling
Contributor Author

@wangchaochaohu First, a file-reading error can be ruled out, because the data could be read normally before I wrapped the dataset in a DataLoader.
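
Iterating over the dataset one sample at a time does not exercise the batching step, so it can succeed even when the DataLoader fails. The following is a minimal sketch of checking that step by hand, assuming NumPy is available and assuming the default batching stacks the same field of every sample into one array. The helper name `try_stack_batch` and the toy shapes are hypothetical and not taken from the issue.

```python
import numpy as np


def try_stack_batch(samples):
    """Stack the first field of each sample the way naive batching would.

    Returns (True, stacked_array) on success, or (False, shapes) with the
    per-sample shapes when the arrays cannot be stacked.
    """
    features = [np.asarray(s[0]) for s in samples]
    try:
        return True, np.stack(features)
    except ValueError:
        # np.stack refuses arrays whose shapes differ
        return False, [f.shape for f in features]


# Toy stand-ins for two spectrograms with different numbers of frames
ragged = [
    (np.zeros((161, 588), dtype="float32"), [1, 2, 3]),
    (np.zeros((161, 608), dtype="float32"), [4, 5]),
]
ok, info = try_stack_batch(ragged)
print(ok, info)
```

If the shapes differ, as they would for audio clips of different durations, the stacking fails even though every individual sample reads correctly.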

@yeyupiaoling
Contributor Author

@heavengate Does the DataLoader wrapper have any special requirements? My samples have different lengths.
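
For samples of different lengths, one common approach is a custom `collate_fn` that pads every sample to the longest one in the batch and returns the true lengths alongside. The sketch below is an illustration under stated assumptions, not the fix used in this thread: it assumes each sample is `(stft, transcript)` with `stft` a 2-D array of shape `(freq_bins, frames)`, it pads with zeros (which may collide with a real label id 0), and the name `pad_collate` is hypothetical.

```python
import numpy as np


def pad_collate(batch):
    """Pad a list of (stft, transcript) samples to the longest item in the batch."""
    batch_size = len(batch)
    freq_bins = batch[0][0].shape[0]
    max_frames = max(stft.shape[1] for stft, _ in batch)
    max_labels = max(len(t) for _, t in batch)

    # Preallocate zero-filled buffers (the zero pad value is an assumption)
    x = np.zeros((batch_size, freq_bins, max_frames), dtype="float32")
    y = np.zeros((batch_size, max_labels), dtype="int64")
    x_lens = np.zeros((batch_size,), dtype="int64")
    y_lens = np.zeros((batch_size,), dtype="int64")

    for i, (stft, transcript) in enumerate(batch):
        frames = stft.shape[1]
        # A single indexing expression writes directly into the buffer
        x[i, :, :frames] = stft
        y[i, :len(transcript)] = transcript
        x_lens[i] = frames
        y_lens[i] = len(transcript)
    return x, y, x_lens, y_lens


# Toy batch with two different frame counts and label lengths
batch = [
    (np.ones((161, 588), dtype="float32"), [3, 1, 4]),
    (np.ones((161, 608), dtype="float32"), [1, 5]),
]
x, y, x_lens, y_lens = pad_collate(batch)
print(x.shape, y.shape, x_lens.tolist(), y_lens.tolist())
```

Passed as `DataLoader(dataset, batch_size=8, collate_fn=pad_collate)`, a function like this would yield the four values per batch that the loop in the original post unpacks.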

@yeyupiaoling
Contributor Author

@heavengate Slice assignment works normally on its own, but the same assignment does not take effect inside the DataLoader's collate_fn function.

inputs = paddle.zeros((2, 3), dtype='float32')
tensor = paddle.ones((2, 2), dtype='float32')

inputs[:, :2] = tensor[:, :]
print(inputs)
inputs[x][:, :seq_length] = tensor[:, :]

The output is shown below. The leading part of the second Tensor should contain the values of the first Tensor, but it does not.

Tensor(shape=[161, 588], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [[-0.41342652,  1.27590382,  2.37738037, ...,  3.25610328,  5.00926352,  4.99169397],
        [-0.34888825,  0.71066976,  2.02860641, ...,  3.51251411,  4.55912304,  3.41741419],
        [ 0.23867491,  0.30086622,  1.16742933, ...,  2.12231445,  2.66486526,  3.17759562],
        ...,
        [-0.79125428, -0.82575482, -0.82424164, ..., -0.79962283, -0.81947351, -0.82025945],
        [-0.82957345, -0.81708139, -0.84568805, ..., -0.82432973, -0.81169850, -0.78284645],
        [-0.84403110, -0.83519691, -0.82905120, ..., -0.77519333, -0.81731087, -0.76279652]])
Tensor(shape=[161, 608], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]])

@yeyupiaoling
Contributor Author

It turned out that I had processed the tensor incorrectly in an earlier step.
