
Could not get file from fileserver: /grp/g_biapol/robert/blobs.tif #4

Open
haesleinhuepf opened this issue Sep 8, 2022 · 7 comments


@haesleinhuepf
Member

Follow-up issue to #3.

Code:

image = pft.imread("blobs.tif")

Error:

/app/env/lib/python3.10/site-packages/taurus_datamover/_datamover.py:316: UserWarning: process: /sw/taurus/tools/slurmtools/default/bin/dtcp -r /grp/g_biapol/robert/blobs.tif /scratch/ws/1/roha044c-cache/qbxwalal/blobs.tif --blocking
exited with error: srun: job 28299021 queued and waiting for resources
srun: job 28299021 has been allocated resources
srun: error: ioctl(TIOCGWINSZ): Inappropriate ioctl for device
srun: error: Not using a pseudo-terminal, disregarding --pty option
cp: cannot create regular file '/scratch/ws/1/roha044c-cache/qbxwalal/blobs.tif': No such file or directory
srun: error: taurusexport4: task 0: Exited with exit code 1
and output: 
  warnings.warn(warning_message)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In [8], line 1
----> 1 image = pft.imread("blobs.tif")

File /app/env/lib/python3.10/site-packages/biapol_taurus/_project_file_transfer.py:76, in ProjectFileTransfer.imread(self, filename, *args, **kw)
     59 """
     60 Load an image from a file.
     61 
   (...)
     73 
     74 """
     75 from skimage.io import imread
---> 76 full_path = self.get_file(filename)
     77 return imread(str(full_path), *args, **kw)

File /app/env/lib/python3.10/site-packages/biapol_taurus/_project_file_transfer.py:558, in ProjectFileTransfer.get_file(self, filename, timeout_in_s, wait_for_finish)
    556 exit_code = waitfor(process, quiet=self.quiet)
    557 if exit_code > 0:
--> 558     raise IOError(
    559         'Could not get file from fileserver: {}'.format(
    560             str(source_file)))
    561 return target_file

OSError: Could not get file from fileserver: /grp/g_biapol/robert/blobs.tif
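The `cp` line in the log shows the underlying cause. The target's parent directory (the per-session folder under `roha044c-cache`) did not exist when the copy ran, and `get_file` then reports only the generic `Could not get file from fileserver` error. Below is a minimal local reproduction of that failure mode. It uses only the Python standard library, not the datamover itself, and `copy_or_report` is a hypothetical helper:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def copy_or_report(source: Path, target: Path) -> Optional[str]:
    """Copy source to target; return the OS error message on failure, else None."""
    try:
        shutil.copy(source, target)
    except FileNotFoundError as err:
        return err.strerror
    return None


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "blobs.tif"
    source.write_bytes(b"placeholder image data")  # stand-in for the real image

    # Mirrors a per-session cache directory that was never created.
    target = Path(tmp) / "cache" / "qbxwalal" / "blobs.tif"
    print(copy_or_report(source, target))  # missing parent makes the copy fail
```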
@haesleinhuepf
Member Author

Possibly, executing ws_allocate cache helps. Afterwards, restarting the kernel was necessary.

@thawn

thawn commented Oct 17, 2022

This error was caused by #3, after which the temporary directory was not created.

Fixing #3 should have fixed this issue as well.
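One defensive option is to create the target's parent directory immediately before each transfer, rather than relying on it having been set up earlier. This sketch assumes the cache directory lives on a filesystem visible to both the notebook and the node that performs the copy. `copy_into_cache` is a hypothetical helper, not part of biapol_taurus or taurus-datamover:

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_cache(source: Path, target: Path) -> Path:
    """Copy `source` to `target`, creating the target's parent directory first.

    Hypothetical helper: the copy no longer depends on the cache directory
    having been created earlier (e.g. when the transfer object was set up).
    """
    target.parent.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    shutil.copy(source, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "blobs.tif"
    source.write_bytes(b"placeholder image data")
    result = copy_into_cache(source, Path(tmp) / "cache" / "session" / "blobs.tif")
    print(result.exists())
```

`exist_ok=True` makes the directory creation idempotent, so repeated transfers into the same cache folder do not fail.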

@thawn thawn closed this as completed Oct 17, 2022
@thawn

thawn commented Oct 17, 2022

I spoke too soon :-(

@thawn thawn reopened this Oct 17, 2022
@thawn

thawn commented Oct 17, 2022

It works if the file is in the project space, but not for files on the fileserver. Something is fishy with the temporary directory...

@thawn

thawn commented Oct 17, 2022

Also, this issue only occurs in the container.

@thawn

thawn commented Oct 17, 2022

... and it is gone after restarting the kernel.
I suspect it is a race condition that only occurs if the scratch drive has not been used for a while.
I'll investigate further in taurus-datamover issue 1.
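Suppose the suspected race condition is real, meaning the scratch directory becomes visible only some time after it is requested. In that case, one possible mitigation is to poll for the directory with a timeout before starting the copy. This is a hypothetical sketch (`wait_for_directory` and its parameters are illustrative), not what the project implemented:

```python
import tempfile
import threading
import time
from pathlib import Path


def wait_for_directory(path: Path, timeout_s: float = 10.0, poll_s: float = 0.1) -> bool:
    """Poll until `path` is an existing directory or `timeout_s` elapses.

    Returns True if the directory appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s  # monotonic: immune to clock changes
    while True:
        if path.is_dir():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    # Simulate a directory that only becomes visible after a short delay.
    threading.Timer(0.2, cache.mkdir).start()
    print(wait_for_directory(cache, timeout_s=2.0))
```

Returning a boolean leaves the caller free to raise a descriptive error (naming the directory and the timeout) instead of letting the copy fail later with a generic message.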

@thawn

thawn commented Oct 18, 2022

This is a tough nut to crack. I cannot reproduce it anymore, even on a fresh test account.

For now, I added a couple of assertions to ensure that we at least get a verbose error message when the ProjectFileTransfer object is first created and not only later when the temp directory is being used.
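The fail-early idea described above can be sketched with a hypothetical `CacheCheckedTransfer` class, not the real `ProjectFileTransfer`. The cache directory is validated in `__init__`, so a missing or unwritable directory is reported when the object is created. This sketch raises explicit exceptions instead of using `assert`, because assertions are skipped when Python runs with `-O`:

```python
import os
import tempfile
from pathlib import Path


class CacheCheckedTransfer:
    """Hypothetical transfer object that validates its cache directory up front."""

    def __init__(self, temp_dir: Path):
        self.temp_dir = Path(temp_dir)
        # Fail at construction time if the directory was never created ...
        if not self.temp_dir.is_dir():
            raise FileNotFoundError(
                f"Temporary directory {self.temp_dir} does not exist. "
                "It must exist before files can be fetched."
            )
        # ... or if fetched files could not be written into it.
        if not os.access(self.temp_dir, os.W_OK):
            raise PermissionError(
                f"Temporary directory {self.temp_dir} is not writable."
            )


with tempfile.TemporaryDirectory() as tmp:
    CacheCheckedTransfer(Path(tmp))  # existing, writable directory: no error
    try:
        CacheCheckedTransfer(Path(tmp) / "missing")
    except FileNotFoundError as err:
        print(err)  # verbose message naming the missing directory
```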
