Skip to content

[BEAM-1585] Filesystems should be picked using the provided scheme#2807

Closed
sb2nov wants to merge 1 commit intoapache:masterfrom
sb2nov:BEAM-1585-add-filesystem-scheme
Closed

[BEAM-1585] Filesystems should be picked using the provided scheme#2807
sb2nov wants to merge 1 commit intoapache:masterfrom
sb2nov:BEAM-1585-add-filesystem-scheme

Conversation

@sb2nov
Copy link
Contributor

@sb2nov sb2nov commented May 1, 2017

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

Filesystems should be picked using subclass method instead of only using the ones that are registered and are part of the if-else condition. All filesystem URI must expose a scheme method that provides the structure for the URI related to that FS.

R: @chamikaramj PTAL

@coveralls
Copy link

Coverage Status

Coverage remained the same at 70.386% when pulling 13f027c on sb2nov:BEAM-1585-add-filesystem-scheme into c2a5557 on apache:master.

@chamikaramj
Copy link
Contributor

Could you add more details about this change to the PR and commit descriptions ?

@sb2nov
Copy link
Contributor Author

sb2nov commented May 1, 2017

Done

@sb2nov sb2nov force-pushed the BEAM-1585-add-filesystem-scheme branch from 13f027c to 53fb089 Compare May 2, 2017 18:12
@coveralls
Copy link

Coverage Status

Coverage remained the same at 69.856% when pulling 53fb089 on sb2nov:BEAM-1585-add-filesystem-scheme into 4c3174b on apache:master.

Copy link
Contributor

@chamikaramj chamikaramj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

all_subclasses = []
for subclass in cls.__subclasses__():
all_subclasses.append(subclass)
all_subclasses.extend(subclass.get_all_subclasses())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How performant is this (say, for looking up GCSFileSystem in current code) ? Should we cache results ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took 10-e5 seconds without caching so don't think it is worth doing much here.

self.exception_details = exception_details


class abstractclassmethod(classmethod):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

from apache_beam.io.filesystem import FileSystem

# pylint: disable=wrong-import-position, unused-import
from apache_beam.io.localfilesystem import LocalFileSystem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move these FileSystem imports to a method and call that from FileSystem.init() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand that completely but we want to know about the FS without the user process actually instantiating anything so I think do need the imports


@staticmethod
def get_scheme(path):
match_result = FileSystems.URI_SCHEMA_PATTERN.match(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path.strip() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

try:
return get_filesystem(path)
path_scheme = FileSystems.get_scheme(path)
for fs in FileSystem.get_all_subclasses():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if two different file systems define the same scheme ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm it is possible, should we throw an exception if that happens?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should raise an exception to prevent surprises in case somebody add a file-system with an existing prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL againn

return fs()
except Exception as e:
raise BeamIOError('Enable to get the Filesystem', {path: e})
raise ValueError('Enable to get the Filesystem for path %s' % path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sb2nov sb2nov force-pushed the BEAM-1585-add-filesystem-scheme branch from 53fb089 to ea6d480 Compare May 4, 2017 20:27
@chamikaramj
Copy link
Contributor

LGTM other than the above comment.

@sb2nov sb2nov force-pushed the BEAM-1585-add-filesystem-scheme branch from ea6d480 to 251b9e9 Compare May 4, 2017 22:28
@chamikaramj
Copy link
Contributor

LGTM. Thanks.

@chamikaramj
Copy link
Contributor

Seems like there is a Jenkins failure for Python SDK.
test_using_slow_impl (apache_beam.coders.slow_coders_test.SlowCoders) ... FAIL

FAIL: test_using_slow_impl (apache_beam.coders.slow_coders_test.SlowCoders)

Traceback (most recent call last):
File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_MavenInstall/sdks/python/apache_beam/coders/slow_coders_test.py", line 41, in test_using_slow_impl
import apache_beam.coders.stream
AssertionError: ImportError not raised

@sb2nov sb2nov force-pushed the BEAM-1585-add-filesystem-scheme branch from 251b9e9 to 648bb98 Compare May 5, 2017 00:58
@sb2nov
Copy link
Contributor Author

sb2nov commented May 5, 2017

Retest this please

@sb2nov sb2nov force-pushed the BEAM-1585-add-filesystem-scheme branch from 648bb98 to 3e700f9 Compare May 5, 2017 18:00
@coveralls
Copy link

Coverage Status

Coverage increased (+0.008%) to 70.51% when pulling 3e700f9 on sb2nov:BEAM-1585-add-filesystem-scheme into 69846f5 on apache:master.

@asfgit asfgit closed this in 1ecb111 May 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants