Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNAnexus module within stor #2

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Conversation

anujkumar93
Copy link
Owner

@anujkumar93 anujkumar93 commented Sep 5, 2018

Implement apis for DNAnexus inside stor. Also implement unit and integration testing for DNAnexus paths inside stor. Setup DNAnexus credentials for testing.

Motivation

As part of the epic to support DNAnexus paths within stor.

Implementation

  • Setup DX canonical, virtual and super classes for stor
  • Setup vcrpy and cassettes for testing, and login token in tox
  • The following api's are implemented as part of this PR, along with their tests:
  1. abspath
  2. normcase
  3. normpath
  4. realpath
  5. expanduser
  6. expandvars
  7. dirname
  8. basename
  9. expand
  10. join
  11. split
  12. splitext
  13. list
  14. listdir
  15. exists
  16. isabs
  17. isdir
  18. isfile
  19. islink
  20. ismount
  21. getsize
  22. walkfiles
  23. stat
  24. temp_url

@anujkumar93 anujkumar93 changed the title DNAnexus testing module within stor DNAnexus module within stor Sep 5, 2018
@anujkumar93 anujkumar93 self-assigned this Sep 6, 2018
Copy link

@jtratner jtratner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bunch of comments

NotFoundError = stor_exceptions.NotFoundError


def _parse_dx_error(exc, **kwargs):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks unused now - should you remove it? It's also missing a second else clause when none of the cases match that would lead to a confusing error message

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, outdated and removed.

wrapper.__doc__ = 'No-op for %r' % attr_name
return wrapper

abspath = _noop('abspath')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this going to lead to bugs around dx://Proj:/parent/subdir/../anotherfile.txt - do we care about that behavior?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abspath is not used for DXPaths in code base, and seeing that OBS paths in general have abspath as a noop, it is upto the user to use abspath cautiously with DXPaths.

project=self.canonical_project
)[0]
else:
raise DXError('DX Projects cannot have a temporary download url')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty close to the right way to do this; however a couple points:

  1. since the if clause always raise an error, you can check it first, raise the exception and dedent here, i.e.:
if not self.resource:
    raise DXError('Can only generate a download URL for individual files')
try:
    return file_handler.get_download_url(...)
except: pass
  1. I think you want to check resource and not canonical resource before raising an error, no reason to go hit an API to canonicalize a project before raising this exception.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both points, and changed.

return file_handler.get_download_url(
duration=lifetime,
preauthenticated=True,
filename=filename,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're already describing this, should filename just be what's returned from describe() if not present? what happens if filename is None here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If filename is None here, the get_download_url defaults to the name of the source file, which is the behavior we want.

else:
raise DXError('DX Projects cannot have a temporary download url')
except ValueError:
raise DXError('DXPaths ending in folders cannot have a temporary download url')
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you hit this exception? wouldn't you know this because you can't generate a canonical resource for this method?

I'm thinking it'd be clearer to write something like:

if not self.resource: error
canonical = self.canonicalize()  # causes error if it's a folder already??

in the second case I think it's fine to just have an error that's just about data objects, etc

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed after discussion. Changed the method to reflect this.

if canonicalize:
yield DXCanonicalPath('dx://{}:/{}'.format(obj['project'], obj['id']))
else:
dx_p = DXVirtualPath(self.drive + proj_name + ':' + obj['describe']['folder'])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this'd be clearer written as:

'{drive}{proj_name}:{folder}/{name}'.format(drive=drive, proj_name=proj_name, folder=obj['describe']['folder'], name=obj['describe']['name'])
``` which I believe does the same thing but avoids a level of indirection

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, changed here and other places where it would be more intuitive to write like this.

pass
# otherwise we could be a directory, so try to grab first
# file/subfolder
if self._is_folder():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you help me understand what the _is_folder check adds here? why not just try to list it directly and return False on the NotFoundError?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you ensure there's an integration test that shows the following is True?

# empty dirs should be seen as existing
folder_path = dx_p / 'myfolder'
folder_path.makedirs_p()
assert stor.exists(folder_path)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a documented diff between swift vs. s3 vs. posix

Swift & S3 - "empty" folder is exists = False
Posix & DX - "empty" folder is exists = True

Copy link
Owner Author

@anujkumar93 anujkumar93 Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed _is_folder

There is a unit test TestExists.test_true_empty_dir to check that empty dirs are seen as existing. Should I add an integration test as well?

"""
if not self.resource:
return None
if self._is_folder():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when I look here, it feels like this should just raise a NotFoundError that specifies no data object could be found for the given path.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed and changed.

'folder': self.resource.parent,
'project': self.canonical_project
}]
results = dxpy.resolve_data_objects(objects=objects)[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside, should you pass in limit=2 here, just cuz we only care 0 or 1

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately resolve_data_objects doesn't have a limit argument, hence cannot pass it.

.format(self.project))
return proj_dict['id']
except DXError as e:
raise DuplicateProjectError('Duplicate projects were found with given name ({})'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a dangerous check, because it's possible you'd swallow up a different error (e.g., about authentication or something else). can you list projects with a limit of 2 and then handle 0/1/2 projects with appropriate exceptions?

Copy link
Owner Author

@anujkumar93 anujkumar93 Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed DXError to DXSearchError which is only raised when there are multiple projects.
Since find_one_project does not have a limit argument, the only other way is to pass zero_ok=False, in which case DXSearchError is raised for 0/2 projects and there's no way to differentiate the two scenarios.


Args:
canonicalize (boolean): whether to return canonicalized paths
only (str): "objects" for only objects, "folders" for only folder,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to keep because using it below

"""List the path as a dir, returning top-level directories and files.

Args:
canonicalize (boolean): whether to return canonicalized paths
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canonicalize (bool, default False): if True, return canonical paths

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here and other places. Will include in next commit.

@anujkumar93
Copy link
Owner Author

want to add a max limit on number of iterations? Maybe wait for 10s at most or something?

Yes, added a maximum limit of 15s, after which the test case fails

@anujkumar93
Copy link
Owner Author

with above, seems ripe for generate_closed_file(<path>) method

Agreed. Made a 'setup_file' method that waits for the files to close before returning the handler.

@pkaleta pkaleta removed their request for review December 16, 2022 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants