Question about a "File read failed" error #223

Open
snu-njc opened this issue Aug 14, 2023 · 1 comment

snu-njc commented Aug 14, 2023

While reading data with the same dataloader as the baby varnet code,
I found that reading /Data/train/kspace/brain_acc8_63.h5 raises OSError: [Errno 5] Can't read data.
The error occurs on the same file even after recreating the workspace.

I am currently using two GPU nodes, iabeng61 and iabeng27.
The same code runs without error on the former, so I suspect the problem is specific to the iabeng27 node.
I would appreciate your help in resolving this.
The full traceback from the failing run is below.

Traceback (most recent call last):
  File "train.py", line 55, in <module>
    train(args)
  File "/root/FastMRI_challenge/utils/learning/train_part.py", line 122, in train
    train_loss, train_time = train_epoch(args, epoch, train_step_fn, train_loader, state)
  File "/root/FastMRI_challenge/utils/learning/train_part.py", line 36, in train_epoch
    for iter, data in enumerate(data_loader):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 673, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/FastMRI_challenge/utils/data/load_data.py", line 52, in __getitem__
    input = hf[self.input_key][dataslice]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.8/dist-packages/h5py/_hl/dataset.py", line 841, in __getitem__
    self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 243, in h5py.h5d.DatasetID.read
  File "h5py/_proxy.pyx", line 112, in h5py._proxy.dset_rw
OSError: [Errno 5] Can't read data (file read failed: time = Mon Aug 14 14:56:34 2023
, filename = '/Data/train/kspace/brain_acc8_63.h5', file descriptor = 25, errno = 5, error message = 'Input/output error', buf = 0x7fd7d6ccb1e0, total read size = 37363248, bytes this sub-read = 37363248, bytes actually read = 18446744073709551615, offset = 0)
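
Errno 5 (`EIO`, "Input/output error") is reported by the operating system rather than by the HDF5 format layer, so the failure can usually be reproduced without h5py by reading the raw bytes of each file. The following is a minimal sketch, not part of the challenge code: the function name `find_unreadable_files` and the chunk size are assumptions chosen for illustration, and the default directory is simply the one that appears in the traceback above.

```python
import errno
import sys
from pathlib import Path

CHUNK_SIZE = 8 * 1024 * 1024  # read in 8 MiB pieces to keep memory use low


def find_unreadable_files(directory, pattern="*.h5"):
    """Return (path, error) pairs for files that cannot be read end to end.

    Hypothetical helper: it only checks that the bytes can be read from
    storage, not that the file is a valid HDF5 file.
    """
    failures = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            with open(path, "rb") as f:
                # Read until EOF; an I/O fault surfaces as OSError.
                while f.read(CHUNK_SIZE):
                    pass
        except OSError as exc:
            failures.append((path, exc))
    return failures


if __name__ == "__main__":
    # Default path taken from the traceback; pass another directory as argv[1].
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/Data/train/kspace"
    for path, exc in find_unreadable_files(data_dir):
        name = errno.errorcode.get(exc.errno, "unknown")
        print(f"{path}: errno={exc.errno} ({name}): {exc.strerror}")
```

Running this on both nodes and comparing the lists would show whether the unreadable files are limited to one node, which is what the report above suggests.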
@wogur110
Collaborator

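Hello. This is Bae Jaehyeok (배재혁), a TA for the 2023 SNU FastMRI Challenge.
We suspect that the kspace dataset on that node was corrupted while it was being distributed.
The data is being redistributed now (as of 1:15 am on August 15), so please check again any time after the morning of August 15.

Thank you.
Bae Jaehyeok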
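
Once the data has been redistributed, one way to confirm that a node's copy matches a known-good copy (for example the one on iabeng61, where the error did not occur) is to compare checksums of the same file on both nodes. This is a minimal sketch under that assumption; `sha256_of_file` is a hypothetical helper name, not something provided by the challenge repository.

```python
import hashlib
import sys


def sha256_of_file(path, chunk_size=8 * 1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print "digest  path" for every file given on the command line.
    for path in sys.argv[1:]:
        print(sha256_of_file(path), path)
```

If the digests printed on the two nodes differ for the same file, or if reading raises `OSError` on one node only, that node's copy is likely still damaged.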
