[BEAM-1585] Filesystems should be picked using the provided scheme #2807
sb2nov wants to merge 1 commit into apache:master
Could you add more details about this change to the PR and commit descriptions?
Done.
13f027c to 53fb089
all_subclasses = []
for subclass in cls.__subclasses__():
  all_subclasses.append(subclass)
  all_subclasses.extend(subclass.get_all_subclasses())
How performant is this (say, for looking up GCSFileSystem in the current code)? Should we cache the results?
It took 10e-5 seconds without caching, so I don't think it is worth doing much here.
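The recursive lookup discussed above can be tried in isolation. The following is a minimal sketch, not the code from the PR: the `FileSystem` class here is a stand-in rather than the real `apache_beam.io.filesystem.FileSystem`, and `CachedGCSFileSystem` is a hypothetical class added only to show that indirect subclasses are found too. The timing it prints depends on the machine and is not meant to reproduce the figure quoted in the review.

```python
import timeit


class FileSystem(object):
  """Stand-in base class; not the real apache_beam FileSystem."""

  @classmethod
  def get_all_subclasses(cls):
    """Returns direct and indirect subclasses of cls."""
    all_subclasses = []
    for subclass in cls.__subclasses__():
      all_subclasses.append(subclass)
      # Recurse so that subclasses of subclasses are included as well.
      all_subclasses.extend(subclass.get_all_subclasses())
    return all_subclasses


class LocalFileSystem(FileSystem):
  pass


class GCSFileSystem(FileSystem):
  pass


class CachedGCSFileSystem(GCSFileSystem):  # hypothetical nested subclass
  pass


found = FileSystem.get_all_subclasses()
runs = 10000
per_call = timeit.timeit(FileSystem.get_all_subclasses, number=runs) / runs
print([c.__name__ for c in found])
print('seconds per lookup: %.2e' % per_call)
```

With only a handful of filesystem classes the walk touches very few objects per call, which is consistent with the reviewer's conclusion that caching is probably not needed.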
self.exception_details = exception_details


class abstractclassmethod(classmethod):
from apache_beam.io.filesystem import FileSystem

# pylint: disable=wrong-import-position, unused-import
from apache_beam.io.localfilesystem import LocalFileSystem
Can we move these FileSystem imports to a method and call that from FileSystem.__init__()?
I'm not sure I understand that completely, but we want to know about the filesystems without the user process actually instantiating anything, so I think we do need the imports.
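The reply above relies on a property of Python classes: a subclass becomes visible through `__subclasses__()` as soon as its `class` statement runs, which for a module-level class happens at import time, and no instance has to be created. A small sketch of that behaviour, using stand-in class names rather than the real Beam modules:

```python
class FileSystem(object):
  """Stand-in base class for illustration only."""


# Before any subclass has been defined (that is, before its module has
# been imported), the base class has nothing to discover.
before = [c.__name__ for c in FileSystem.__subclasses__()]


class LocalFileSystem(FileSystem):  # runs when the defining module is imported
  pass


# The class statement alone registers the subclass; nothing was instantiated.
after = [c.__name__ for c in FileSystem.__subclasses__()]
print(before, after)
```

This is why the seemingly unused imports matter: without them the subclass definitions never execute, and a lookup based on subclasses cannot see those filesystems.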
@staticmethod
def get_scheme(path):
  match_result = FileSystems.URI_SCHEMA_PATTERN.match(path)
try:
  return get_filesystem(path)
  path_scheme = FileSystems.get_scheme(path)
  for fs in FileSystem.get_all_subclasses():
What if two different file systems define the same scheme?
Hmm, it is possible. Should we throw an exception if that happens?
I think we should raise an exception to prevent surprises in case somebody adds a file system with an existing prefix.
      return fs()
except Exception as e:
  raise BeamIOError('Unable to get the Filesystem', {path: e})
  raise ValueError('Unable to get the Filesystem for path %s' % path)
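One way to combine the scheme lookup with the duplicate check requested in the review is sketched below. This is an illustration under assumptions, not the merged implementation: the regular expression is a guess, since the PR's `URI_SCHEMA_PATTERN` is not shown in this excerpt; the classes are stand-ins; returning `None` as the scheme for local paths is a choice made for this sketch; and the wording of the duplicate-scheme error is invented here.

```python
import re

# Assumed pattern: a URI scheme followed by '://'. The real
# URI_SCHEMA_PATTERN from the PR is not visible in the excerpt.
URI_SCHEME_PATTERN = re.compile(r'(?P<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*')


class FileSystem(object):
  """Stand-in base class; not the real apache_beam FileSystem."""

  @classmethod
  def scheme(cls):
    raise NotImplementedError

  @classmethod
  def get_all_subclasses(cls):
    all_subclasses = []
    for subclass in cls.__subclasses__():
      all_subclasses.append(subclass)
      all_subclasses.extend(subclass.get_all_subclasses())
    return all_subclasses


class LocalFileSystem(FileSystem):
  @classmethod
  def scheme(cls):
    return None  # assumption: paths without a scheme are local


class GCSFileSystem(FileSystem):
  @classmethod
  def scheme(cls):
    return 'gs'


def get_scheme(path):
  """Returns the URI scheme of path, or None if it has none."""
  match_result = URI_SCHEME_PATTERN.match(path)
  return match_result.group('scheme') if match_result else None


def get_filesystem(path):
  """Instantiates the single filesystem whose scheme matches path."""
  path_scheme = get_scheme(path)
  systems = [fs for fs in FileSystem.get_all_subclasses()
             if fs.scheme() == path_scheme]
  if not systems:
    raise ValueError('Unable to get the Filesystem for path %s' % path)
  if len(systems) > 1:
    # Fail loudly instead of silently picking one of the candidates.
    raise ValueError('Found more than one filesystem for path %s' % path)
  return systems[0]()


print(type(get_filesystem('gs://bucket/object')).__name__)
print(type(get_filesystem('/tmp/file.txt')).__name__)
```

Collecting all matches before returning, rather than returning the first hit inside the loop, is what makes the duplicate check possible.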
53fb089 to ea6d480
LGTM other than the above comment.
ea6d480 to 251b9e9
LGTM. Thanks.
Seems like there is a Jenkins failure for Python SDK.
FAIL: test_using_slow_impl (apache_beam.coders.slow_coders_test.SlowCoders)
Traceback (most recent call last):
251b9e9 to 648bb98
Retest this please
648bb98 to 3e700f9
Filesystems should be picked by looking up the FileSystem subclasses, instead of only using the ones that are registered as part of the if-else condition. Every filesystem must expose a scheme method that provides the URI scheme for that FS.
R: @chamikaramj PTAL