Validates that input and output GCS paths specify a bucket #2602
gs://bucket seems like a valid output prefix. We should recognize that it's a directory, append the /, and then create files under it. Am I missing something fundamental? |
If a user specifies gs://something, it could be either that 1) they forgot to specify the bucket, or 2) they really want to write to the bucket gs://something. [assumption X:] I think it's unlikely that they really want files named like "-0000-of-0001.txt", so in case 2 I'd assume that they forgot to specify the basename. With the current PR's approach, they'll get an error "please specify a bucket" and:
With your suggested approach, they'll get no error and:
Assumption X is the critical one; if it's valid, then my approach seems preferable; if it's invalid, then yours. |
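To make assumption X concrete, here is a minimal Python sketch of sharded output naming. The `shard_names` helper, its default suffix, and its four-digit padding are hypothetical, chosen only to mirror the file names quoted in this thread; this is not the SDK's actual naming code.

```python
def shard_names(prefix, num_shards, suffix=".txt"):
    """Return one output name per shard for a given output prefix.

    Hypothetical helper: the "-SSSS-of-NNNN" template mirrors the names
    quoted in this thread, not the SDK's real implementation.
    """
    return [
        f"{prefix}-{shard:04d}-of-{num_shards:04d}{suffix}"
        for shard in range(num_shards)
    ]


# A bucket-only prefix (with the trailing "/" appended) yields an object
# name that starts with "-", which is the case assumption X calls unlikely.
print(shard_names("gs://something/", 1))

# A prefix with a basename yields the kind of name users usually expect.
print(shard_names("gs://something/out", 1))
```

With a bucket-only prefix, the object name is just the shard template, which is why treating that input as a likely mistake seems reasonable.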
retest this please |
Dan is swamped with stuff. R: @lukecwik instead. |
Looking at GcsUtil.expand, we do not support gs://some-bucket as "read everything in this bucket"; we expect the object to be specified, like gs://some-bucket/. For writing, a user could technically say they want gs://some-bucket and, as Dan pointed out, we could write files to gs://some-bucket/-0001-of-0004.txt. Looking at GcsPath, it seems that if we parse gs://some-bucket and then turn it back into a string, we get gs://some-bucket/, so I'm thinking that the user error posted on SO should not have happened, as it should have been specified as gs://some-bucket/-0001-of-0004.txt. Looking at gsutil: I'm with @jkff on what he has proposed, namely that users likely always want a non-empty object for input (for glob expansion) and for output (to protect people from strange output names). |
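The "non-empty object" rule discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in this PR: `parse_gcs_path` and `validate_gcs_path` are hypothetical names, the error messages are made up, and the parsing is a plain string split rather than the SDK's GcsPath logic.

```python
GCS_SCHEME = "gs://"


def parse_gcs_path(path):
    """Split "gs://bucket/object" into (bucket, object).

    Hypothetical helper; the object part may be empty.
    """
    if not path.startswith(GCS_SCHEME):
        raise ValueError(f"Not a GCS path: {path!r}")
    bucket, _, obj = path[len(GCS_SCHEME):].partition("/")
    return bucket, obj


def validate_gcs_path(path):
    """Reject paths that lack a bucket or a non-empty object."""
    bucket, obj = parse_gcs_path(path)
    if not bucket:
        raise ValueError(f"Missing bucket in path: {path!r}")
    if not obj:
        # Covers both "gs://some-bucket" and "gs://some-bucket/".
        raise ValueError(
            f"Missing object in path: {path!r}; "
            "did you forget the bucket or the basename?"
        )
    return path


validate_gcs_path("gs://some-bucket/output/result")  # accepted
try:
    validate_gcs_path("gs://some-bucket")  # rejected: no object
except ValueError as err:
    print(err)
```

Failing early on a bucket-only path applies the same check to inputs (where a glob needs an object to expand) and to outputs (where the shard names would otherwise have no basename).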
LGTM |
Context: http://stackoverflow.com/questions/43505776/google-dataflow-workflow-error
To be backported into Dataflow SDK as well.
R: @dhalperi