Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for viewfs:// URLs. #665

Merged
merged 9 commits into from Feb 19, 2022
Merged

Conversation

ChandanChainani
Copy link
Contributor

@ChandanChainani ChandanChainani commented Oct 21, 2021

As stated by @zhezh that hdfs support viewfs
we just need to convert viewfs to hdfs uri internally
and it will work.

Fixes #645

As stated by @zhezh that hdfs support viewfs
we just need to convert viewfs to hdfs uri internally
and it will work.
@ChandanChainani
Copy link
Contributor Author

@mpenkov could you take a look?


uri_path = split_uri.netloc + split_uri.path
uri_path = "/" + uri_path.lstrip("/")
if not uri_path:
raise RuntimeError("invalid HDFS URI: %r" % uri_as_string)

return dict(scheme=SCHEME, uri_path=uri_path)
return dict(scheme="hdfs", uri_path=uri_path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit hacky. It's OK to reuse the hdfs backend for viewfs, but here you're actually modifying the URL that the user passed. On top of violating the principle of least surprise, it's only marginally better than:

url = 'viewfs://foo/bar/baz'
with smart_open.open(url.replace('viewfs://', 'hdfs://'):
    ...

It'd be better to do

Suggested change
return dict(scheme="hdfs", uri_path=uri_path)
return dict(scheme=split_uri.scheme, uri_path=uri_path)

and then handle viewfs and hdfs as identical in remaining code (but keep them distinct). This would be similar to how we handle e.g. http and https.

Copy link
Collaborator

@mpenkov mpenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. It's a good start, but needs more work.

Please add some tests to verify that your contribution works and prevent future regressions. See smart_open/tests/test_hdfs.py and integration-tests/test_hdfs.py for the existing HDFS tests.

Left you a comment - please have a look and let me know when you're ready for another review.

@mpenkov
Copy link
Collaborator

mpenkov commented Feb 18, 2022

@ChandanChainani Are you able to finish this PR?

@ChandanChainani
Copy link
Contributor Author

ChandanChainani commented Feb 18, 2022

@mpenkov I didn't got the chance to complete the test changes.

Please can you guide me how can i write test cases for viewfs.

I am thinking since we are using hdfs internally hdfs test case should work for viewfs.

Sorry if I put the words in wrong way I am not good in english.

@piskvorky piskvorky added this to the 6.0.0 milestone Feb 18, 2022
@ChandanChainani
Copy link
Contributor Author

@mpenkov any suggestion on above comment.

@mpenkov mpenkov changed the title Added support for viewfs:// URLs. #645 Added support for viewfs:// URLs. Feb 19, 2022
Getting this error: [WinError 6] The handle is invalid
@mpenkov
Copy link
Collaborator

mpenkov commented Feb 19, 2022

I added some tests. Thank you for your work!

@mpenkov mpenkov merged commit eef642a into piskvorky:develop Feb 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request Support for viewfs://
3 participants