Skip to content

[BEAM-4291] Propagates artifact retrieval token in Flink runner and to the Java harness#5676

Merged
jkff merged 1 commit into
apache:masterfrom
jkff:retrieval-token
Jun 20, 2018
Merged

[BEAM-4291] Propagates artifact retrieval token in Flink runner and to the Java harness#5676
jkff merged 1 commit into
apache:masterfrom
jkff:retrieval-token

Conversation

@jkff
Copy link
Copy Markdown
Contributor

@jkff jkff commented Jun 18, 2018

Also changes Go artifact-related testing code to use the staging and retrieval tokens.

The bulk of this change is the changes in generated Go proto code.

R: @herohde for Go code, @angoenka for Java code

@jkff jkff requested a review from herohde June 18, 2018 21:52
@jkff jkff changed the title Propagates artifact retrieval token in Flink runner and to the Java harness [BEAM-4291] Propagates artifact retrieval token in Flink runner and to the Java harness Jun 19, 2018
@jkff jkff force-pushed the retrieval-token branch from 090986c to 3b09401 Compare June 19, 2018 00:03
@jkff jkff force-pushed the retrieval-token branch from 3b09401 to 721357e Compare June 19, 2018 00:05
@jkff
Copy link
Copy Markdown
Contributor Author

jkff commented Jun 19, 2018

OK, this is fully ready for review I think.


// (required) The artifact retrieval token produced by
// ArtifactStagingService.CommitManifestResponse.
string retrieval_token = 6;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level comment: What will we be using this token for, concretely? We already have the client ID to tell SDK harnesses apart. I don't recall it being discussed in the design doc.

It also seems somewhat out of place to include a retrieval_token here, but not the ApiServiceDescriptor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This design was discussed some time ago (see also #5582) - we're coming from the assumption that all services are (or at least may be) globally distributed and stateless, i.e. we're not relying on the assumption that there's 1 ArtifactRetrievalService per worker or per harness. Without that assumption, we need the ArtifactRetrievalService calls to be somehow linked to which job we're talking about. Likewise, ArtifactStagingService also needs to know which job we're talking about.

We decided to do this by propagating tokens:

  • PrepareJob returns a token used for ArtifactStagingService calls
  • ArtifactStagingService.CommitManifest returns a token used for ArtifactRetrievalService calls

This token is an opaque string containing the information necessary for the service to do its job. In practice, with the "distributed file system" based implementations of both services, we're using (basically) a base path as the token.

Alternatively we could explicitly include the job ID in the RPCs, but that would require the services to do some sort of global lookup of artifact placement parameters based on job ID, it seems easier to include the necessary parameters explicitly in the token.

Now, since a retrieval token is needed for the harness to talk to the retrieval service, it seems reasonable to include it in provision info. It's not part of the service descriptor because it does not identify the service, it only gives a necessary argument for its RPCs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original design has the same assumptions. The provisioning and control service will still have to differentiate harnesses based on the client id (using internal lookups and whatnot). Artifact service is now different.

For the artifact ApiServiceDescriptor, I meant that if we must call provisioning first for the token then it might as well vendor the service descriptor, too (and perhaps the control service as well towards a smaller container contract). That would be a larger change, though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, we haven't thought about applying similar changes to provision and control services but it makes sense too - might do it later if we run into issues stemming from that.

// Runners may -- but are not required to -- enforce any limits provided.
Resources resource_limits = 4;

// (required) The artifact retrieval token produced by
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this token should not have to be identical to the commit side.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(addressed by previous comment: this token is, by definition, a way to pass information from the Commit call to the retrieval service - so it has to be identical to the commit side)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a requirement that the same underlying storage and necessary metadata is used for staging and retrieval (which might use a caching intermediate, say), so this seems like an unnecessary restriction. The user/sdk-side code may run in completely different environments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a caching intermediate would not need its own token format - it could just pass the given token to the "real" retrieval service?..

Copy link
Copy Markdown
Contributor

@herohde herohde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, but I don't want to block this PR if we're already going down this way.

@jkff jkff merged commit 3ae2a78 into apache:master Jun 20, 2018
@jkff jkff deleted the retrieval-token branch June 20, 2018 01:09
@jkff
Copy link
Copy Markdown
Contributor Author

jkff commented Jun 20, 2018

Thanks, merged, we can discuss the other issues in parallel if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants