[BEAM-895] Allow empty GCP credential for pipelines that only access to public data. #1280
R: @davorbonaci
@@ -0,0 +1,53 @@
/*
For cases where a user without credentials tries to run a pipeline, how does the error message change for that unauthorized user? Before, we would fail when creating the client, but now we might first try to stage files or contact BigQuery to check whether datasets exist. Is the explanation of why it failed the same, better, or worse for the various GCP service use cases?
- Users will see the same error message: NULL_CREDENTIAL_REASON.
- The only difference is where the exception is thrown, usually a few lines later than before.
We typically do:
someClient = Transport.newSomeClient(); // previously threw here
// a few lines in between
someClient.list(); // (or insert()) now throws here
I think these changes should apply to the GCP data services (BigQuery, PubSub, Storage) but not to CloudResourceManager. Updated accordingly.
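A minimal self-contained model of the behavior change described above. It assumes hypothetical `FakeRequest`, `NullCredentialSketch`, and `SomeClient` types (not the real google-http-client or Beam classes), treats "access denied" as HTTP 401 or 403, and uses a placeholder string in place of the real NULL_CREDENTIAL_REASON. Creating the client succeeds without credentials; the failure only surfaces when a request is executed.

```java
import java.io.IOException;

// Hypothetical stand-in for an HTTP request (not google-http-client's HttpRequest).
final class FakeRequest {
  final int simulatedStatus; // status code the simulated server returns
  boolean failOnAccessDenied = false;

  FakeRequest(int simulatedStatus) {
    this.simulatedStatus = simulatedStatus;
  }

  int execute() throws IOException {
    // Assumption: "access denied" means HTTP 401 or 403.
    if (failOnAccessDenied && (simulatedStatus == 401 || simulatedStatus == 403)) {
      throw new IOException(NullCredentialSketch.NULL_CREDENTIAL_REASON);
    }
    return simulatedStatus;
  }
}

// Hypothetical model of a null-credential initializer: it attaches no auth
// header and only arranges for access-denied responses to fail with a clear reason.
final class NullCredentialSketch {
  // Placeholder text; the real NULL_CREDENTIAL_REASON message differs.
  static final String NULL_CREDENTIAL_REASON =
      "Unable to get application default credentials (placeholder message).";

  void initialize(FakeRequest request) {
    request.failOnAccessDenied = true;
  }
}

// Hypothetical client: creating it no longer needs a credential, so an
// unauthorized user first sees the error when a request is executed.
final class SomeClient {
  private final NullCredentialSketch initializer = new NullCredentialSketch();

  int list(int simulatedStatus) throws IOException {
    FakeRequest request = new FakeRequest(simulatedStatus);
    initializer.initialize(request);
    return request.execute();
  }
}
```

Reading public data succeeds; a denied request throws with the same reason message, just later than client construction.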
+ "for details on how to specify credentials. This version of the SDK is "
+ "dependent on the gcloud core component version 2015.02.05 or newer to "
+ "be able to get credentials from the currently authorized user via gcloud auth.", e);
// Ignore the exception
Does it make sense to leave the contract of this method the same but migrate GcpOptions to ignore failures in getting the credential?
I think we want to fail immediately if users provide a bad key file (line 112). Ignoring the exception here, rather than in GcpOptions, lets us distinguish the two cases.
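A sketch of the contract being discussed, under stated assumptions: `CredentialLookupSketch`, `getCredential(Path, DefaultCredentialSource)`, and `DefaultCredentialSource` are hypothetical names, and a credential is modeled as a plain `String`. An explicitly configured key file that cannot be read fails immediately, while missing default credentials yield `null`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical source of default credentials (e.g. a gcloud or environment lookup).
interface DefaultCredentialSource {
  String load() throws IOException;
}

// Hypothetical sketch of the contract; not Beam's actual code.
final class CredentialLookupSketch {
  /**
   * Returns a credential, or {@code null} if none is configured and no default
   * credentials are available, so pipelines that only access public data can run.
   *
   * @throws IOException if an explicitly configured key file cannot be read
   */
  static String getCredential(Path keyFile, DefaultCredentialSource defaults) throws IOException {
    if (keyFile != null) {
      // The user asked for this key file explicitly: fail immediately if it is bad.
      return new String(Files.readAllBytes(keyFile), StandardCharsets.UTF_8);
    }
    try {
      return defaults.load();
    } catch (IOException e) {
      // Ignore the exception: fall back to running without credentials.
      return null;
    }
  }
}
```

Keeping the "ignore" inside the lookup, instead of in the caller, is what lets a bad explicit setting still fail fast.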
+ "be able to get credentials from the currently authorized user via gcloud auth.", e);
// Ignore the exception
// Pipelines that only access to public data should be able to run without credentials.
return null;
Can you update the Javadoc of this method to document this new contract, and possibly the Javadoc of any calling method that has changed as a consequence?
 *
 * <p>When the access is denied, it throws {@link IOException} with a detailed error message.
 */
public class NullCredential implements HttpRequestInitializer {
Better name? SpecificExceptionAfterNonauthorizedResponse? Something better?
I prefer to keep NullCredential.
SpecificExceptionAfterNonauthorizedResponse describes the implementation; if a "NullCredential" ever needs to do more, we would have to rename it. NullCredential also follows the existing naming pattern and reads better in Transport, as follows:
if (credential == null) {
return new ChainingHttpRequestInitializer(new NullCredential(), httpRequestInitializer);
} else {
return new ChainingHttpRequestInitializer(credential, httpRequestInitializer);
}
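A simplified model of the chaining shown above, assuming hypothetical `SketchRequest`, `SketchInitializer`, `SketchChainingInitializer`, and `SketchTransport` types in place of google-http-client's `HttpRequestInitializer` and Beam's `ChainingHttpRequestInitializer`, `NullCredential`, and `Transport`. In both branches the credential slot runs first and the regular request initializer runs after it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request: records what each initializer did to it.
final class SketchRequest {
  final List<String> steps = new ArrayList<>();
}

// Hypothetical simplified stand-in for HttpRequestInitializer.
interface SketchInitializer {
  void initialize(SketchRequest request);
}

// Applies each initializer in order, in the spirit of ChainingHttpRequestInitializer.
final class SketchChainingInitializer implements SketchInitializer {
  private final SketchInitializer[] chain;

  SketchChainingInitializer(SketchInitializer... chain) {
    this.chain = chain;
  }

  @Override
  public void initialize(SketchRequest request) {
    for (SketchInitializer initializer : chain) {
      initializer.initialize(request);
    }
  }
}

final class SketchTransport {
  // Stand-in for NullCredential: adds no auth header, only installs error handling.
  static final SketchInitializer NULL_CREDENTIAL =
      request -> request.steps.add("install access-denied handler");

  static SketchInitializer chain(SketchInitializer credential, SketchInitializer httpRequestInitializer) {
    if (credential == null) {
      return new SketchChainingInitializer(NULL_CREDENTIAL, httpRequestInitializer);
    } else {
      return new SketchChainingInitializer(credential, httpRequestInitializer);
    }
  }
}
```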
NullCredentialInitializer?
Also, per Luke's comment, we should manually inspect whether any callers could be negatively impacted.
Branch updated from de23f50 to 356ae15.
Updated and merged with the GCP credentials change. @lukecwik, could you take another look?
@peihe, that's correct. Users who want custom credential locations can call GoogleCredentials.fromStream(InputStream) to load credentials from any location. The credentials library supports many more ways to load or create credentials than the few that Dataflow supported.
One issue I found is that GCS writes can be buffered in the client, so the exception may be deferred until the file is closed. As a result, TextIO.Write and BigQueryIO.Write raise one exception per bundle. I have verified the following cases with the DirectRunner:
Verified that TextIO.Write works with public buckets.
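A minimal model of why the failure is deferred, assuming a hypothetical `BufferedUploadSketch` that buffers bytes locally and only simulates the upload in `close()`; it is not the real GCS channel. Writes never reach the server, so they cannot observe an authorization failure, and the error surfaces when the file is closed, which for a sink means once per bundle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical model of a client-side buffered upload; not the real GCS channel.
final class BufferedUploadSketch extends OutputStream {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final boolean authorized; // whether the simulated upload is accepted
  private boolean closed = false;

  BufferedUploadSketch(boolean authorized) {
    this.authorized = authorized;
  }

  @Override
  public void write(int b) throws IOException {
    // Bytes only go to the local buffer, so write() cannot see an auth failure.
    buffer.write(b);
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    // Simulated upload: the buffered bytes would be sent here, so an
    // access-denied error only surfaces on close().
    if (!authorized) {
      throw new IOException("403 Forbidden while uploading " + buffer.size() + " bytes");
    }
  }
}
```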
Where does this PR stand? Can we move forward?
@davorbonaci My understanding is that you're the primary reviewer. Has that changed?
LGTM from a review standpoint. I know @peihe was investigating a few corner cases, but that seems to be done now, so I think we are good to go.
Branch updated from 1e0104e to a9392d0.
PTAL, rename done.
Branch updated from e83217c to 7f6686a.
I don't think you need NoopCredentialFactory anymore: the default is to return null, which means service calls are made without credentials, and that is effectively what NoopCredentials already does.
Creating a DataflowClient with a null credential is treated as an error, since nothing is public in Dataflow requests. DataflowRunnerTest uses NoopCredentialFactory to create a non-null fake Credentials object.
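A sketch of the distinction above, assuming hypothetical `CredentialFactorySketch`, `DefaultFactorySketch`, `NoopFactorySketch`, and `DataflowClientSketch` types (not Beam's `CredentialFactory`, `NoopCredentialFactory`, or the Dataflow client), with credentials modeled as `Object`. The default factory may return `null`, which the Dataflow client rejects up front; the no-op factory gives tests a non-null placeholder.

```java
import java.util.Objects;

// Hypothetical stand-in for a credential factory.
interface CredentialFactorySketch {
  Object getCredential();
}

// Default behavior: may return null when no credentials are found.
final class DefaultFactorySketch implements CredentialFactorySketch {
  @Override
  public Object getCredential() {
    return null;
  }
}

// Test-only factory: returns a non-null placeholder credential that grants nothing.
final class NoopFactorySketch implements CredentialFactorySketch {
  static final Object NOOP_CREDENTIAL = new Object();

  @Override
  public Object getCredential() {
    return NOOP_CREDENTIAL;
  }
}

final class DataflowClientSketch {
  final Object credential;

  DataflowClientSketch(CredentialFactorySketch factory) {
    // Nothing in Dataflow requests is public, so a null credential is rejected immediately.
    this.credential = Objects.requireNonNull(factory.getCredential(), "Dataflow requires credentials");
  }
}
```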