Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GAWB-2942] Leo pets per project: Make ServiceAccountProvider interface and update Leo code to use it #89

Merged
merged 21 commits into from
Dec 11, 2017

Conversation

rtitle
Copy link
Collaborator

@rtitle rtitle commented Dec 7, 2017

https://broadinstitute.atlassian.net/browse/GAWB-2927

Adds a ServiceAccountProvider interface based on this Tech Doc.

Updates Leo code to use this interface to obtain service accounts. NOTE: this PR does not change current Leo behavior; switching to a pets-per-project implementation will be done in a follow-up (much smaller) PR.

A note on naming. I used the following terms (in code and in comments):

  • leoServiceAccount == the service account Leo executes gcloud commands as
  • clusterServiceAccount == the service account passed to the 'gcloud dataproc create cluster --serviceAccount' command
  • notebookServiceAccount == the service account whose key we localize onto the cluster

If people have other suggestions re naming, please let me know.

Includes a database migration to track both types of SAs in the database.

Also beefs up swat tests slightly to check notebook credentials from multiple sources (oauth2, fiss, hadoop).

Have you read CONTRIBUTING.md lately? If not, do that first.

I, the developer opening this PR, do solemnly pinky swear that:

  • I've documented my API changes in Swagger

In all cases:

  • Get two thumbsworth of review and PO signoff if necessary
  • Verify all tests go green
  • Squash and merge; you can delete your branch after this
  • Test this change deployed correctly and works on dev environment after deployment

@rtitle
Copy link
Collaborator Author

rtitle commented Dec 7, 2017

jenkins retest

1 similar comment
@rtitle
Copy link
Collaborator Author

rtitle commented Dec 7, 2017

jenkins retest

</changeSet>

<changeSet logicalFilePath="leonardo" author="rtitle" id="cluster_migrate_service_accounts">
<sql dbms="mysql">UPDATE CLUSTER SET notebookServiceAccount = googleServiceAccount</sql>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing service accounts in the DB are really notebook service accounts.

*
* @return service account email
*/
def getLeoServiceAccount: WorkbenchEmail
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK for this to not return a Future? I'm expecting this to be a constant value, and not an asynchronous operation. It's easier to work with in Leo code if it's not a future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, shouldn't this one be specified in config, so configurers can pass in the json keyblob?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's a good point. Leo config currently has a path to the serviceAccountPemFile. Should we remove this interface method and expect people running Leo to just configure the Leo SA+key via config?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe this interface method can still exist but have a default implementation that reads the Leo SA+key from config?

@rtitle rtitle changed the title [NOTYET] [GAWB-2942] Leo pets per project: Make ServiceAccountProvider interface and update Leo code to use it [GAWB-2942] Leo pets per project: Make ServiceAccountProvider interface and update Leo code to use it Dec 7, 2017
@rtitle rtitle requested a review from helgridly December 7, 2017 19:39
@rtitle
Copy link
Collaborator Author

rtitle commented Dec 7, 2017

Should be ready for review. Unit+swat tests should be passing (can't actually check since the github hooks aren't appearing).

@@ -283,4 +280,18 @@ class ClusterMonitorActor(val cluster: Cluster,
case Right(_) => throw ClusterNotReadyException(cluster.googleProject, cluster.clusterName)
}
}

private def removeCredentialsFromMetadata: Future[Unit] = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's that stub method Hussein

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rtitle
Copy link
Collaborator Author

rtitle commented Dec 7, 2017

jenkins retest

3 similar comments
@rtitle
Copy link
Collaborator Author

rtitle commented Dec 7, 2017

jenkins retest

@rtitle
Copy link
Collaborator Author

rtitle commented Dec 7, 2017

jenkins retest

@rtitle
Copy link
Collaborator Author

rtitle commented Dec 8, 2017

jenkins retest

Copy link
Contributor

@helgridly helgridly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments and questions but 👍

@@ -81,11 +85,11 @@ object Boot extends App with LazyLogging {
}
}

private def constructAuthProvider(authProviderClass: String, authConfig: Config): LeoAuthProvider = {
Class.forName(authProviderClass)
private def constructProvider[T](className: String, config: Config): T = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

*
* @return service account email
*/
def getLeoServiceAccount: WorkbenchEmail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

*
* @param config any necessary configuration information.
*/
abstract class ServiceAccountProvider(config: Config) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tedious question: the auth provider is called LeoAuthProvider. should this be called LeoServiceAccountProvider?

(we seem to be inconsistent in general whether we should call classes X or LeoX or LeonardoX)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we're inconsistent in naming. :( I'm not a huge fan of LeoServiceAccountProvider though because it's misleading. It's providing service accounts, not Leo service accounts. LeoAuthProvider is a better because it's providing authorization specific to Leo operations.

*
* @return service account email
*/
def getLeoServiceAccount: WorkbenchEmail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, shouldn't this one be specified in config, so configurers can pass in the json keyblob?

@@ -283,4 +280,18 @@ class ClusterMonitorActor(val cluster: Cluster,
case Right(_) => throw ClusterNotReadyException(cluster.googleProject, cluster.clusterName)
}
}

private def removeCredentialsFromMetadata: Future[Unit] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

_ <- OptionT.liftF(googleIamDAO.removeServiceAccountKey(dataprocConfig.leoGoogleProject, serviceAccountEmail, key))
} yield ()

tea.value.void
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suck it, baby shoes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this reference but I appreciate the comment!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a better short story than "for sale, baby shoes, never worn"

val tea = for {
key <- OptionT(dbRef.inTransaction { _.clusterQuery.getServiceAccountKeyId(googleProject, clusterName) })
serviceAccountEmail <- OptionT.fromOption[Future](serviceAccountOpt)
_ <- OptionT.liftF(googleIamDAO.removeServiceAccountKey(dataprocConfig.leoGoogleProject, serviceAccountEmail, key))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might return a Googly exception and nix the " internalDeleteCluster must march on regardless of failures" condition

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though on further thought i'm conflicted about this. we don't monitor the result of the internalDeleteCluster future in some cases... do we need delete-retries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(possibly a problem for another PR)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch. I'm not sure what the right thing to do is. I see the value in having delete march on regardless of intermediate failures. But I also don't like swallowing errors if there is a problem.

Note the call to googleIamDAO.removeServiceAccountKey itself retries on errors (and handles 404s): https://github.com/broadinstitute/workbench-libs/blob/develop/google/src/main/scala/org/broadinstitute/dsde/workbench/google/HttpGoogleIamDAO.scala#L168

Is that a reasonable compromise? Or should I jam a recover on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we jam a recover on it. We're going to swallow the error anyway; might as well not bail on cleaning up everything else too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this in the internalDeleteCluster method

object ServiceAccountInfo {
// TODO: something less hacky, please!
// If we're going to use Jackson we should use it the right way, with annotations in our model.
// Otherwise we should rip out LeonardoModelCopy + ClusterKluge and just use Leo model objects + spray json (my prefrence).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 publish leo-model and use it in swat. both leo-swat and our jupyter-docker are both in need of some serious love once we're AoU-ready.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@@ -104,7 +105,7 @@ class GoogleDataprocDAO(protected val dataprocConfig: DataprocConfig,
new GoogleCredential.Builder()
.setTransport(httpTransport)
.setJsonFactory(jsonFactory)
.setServiceAccountId(dataprocConfig.serviceAccountEmail.value)
.setServiceAccountId(serviceAccountProvider.getLeoServiceAccount.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is the only place where we use serviceAccountProvider can we just pass in the leo SA in the ctor instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -35,7 +27,20 @@ case class ClusterRecord(id: Long,
destroyedDate: Timestamp,
jupyterExtensionUri: Option[String],
initBucket: String,
serviceAccountKeyId: Option[String])
machineConfig: MachineConfigRecord,
serviceAccountInfo: ServiceAccountInfoRecord)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need these in the Record classes? i can see why we'd need them in the Cluster class, since we're json-ifying that, but here too?

Copy link
Collaborator Author

@rtitle rtitle Dec 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added them here because slick doesn't like the ClusterRecord case class with >22 members.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

22 is rapidly becoming my least favorite number.

@rtitle rtitle merged commit b2ec3ec into develop Dec 11, 2017
@rtitle rtitle deleted the rt-gawb-2942 branch December 11, 2017 23:32
akarukappadath pushed a commit that referenced this pull request Jul 11, 2018
…ce and update Leo code to use it (#89)

* Initial refactoring. Main compiles, tests don't.

* Tests compile, don't pass

* Add DB migration, tests pass now

* Added Sam ServiceAccountProvider implementation, clean up some code

* Oops, fix name

* Beef up pet swat a little bit

* Rebase from develop

* Minor fixup

* post rebase fix

* Fix LeonardoModelCopy + ClusterKluge

* Add stub method to remove creds from instance metadata, if an override service account is used.

* Fix label swat test

* s/overrideServiceAccount/notebookServiceAccount/g

* Fix comment

* Fix substring check

* Dummy commit to test github

* PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file

* Sleep a little more

* Jam a recover on removeServiceAccountKey so the delete doesn't fail

* Fix swat test flakiness, fix TODOs

* retrieving service account keys from Google is SUPER inconsistent and flakey
akarukappadath pushed a commit that referenced this pull request Oct 22, 2018
…ce and update Leo code to use it (#89)

* Initial refactoring. Main compiles, tests don't.

* Tests compile, don't pass

* Add DB migration, tests pass now

* Added Sam ServiceAccountProvider implementation, clean up some code

* Oops, fix name

* Beef up pet swat a little bit

* Rebase from develop

* Minor fixup

* post rebase fix

* Fix LeonardoModelCopy + ClusterKluge

* Add stub method to remove creds from instance metadata, if an override service account is used.

* Fix label swat test

* s/overrideServiceAccount/notebookServiceAccount/g

* Fix comment

* Fix substring check

* Dummy commit to test github

* PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file

* Sleep a little more

* Jam a recover on removeServiceAccountKey so the delete doesn't fail

* Fix swat test flakiness, fix TODOs

* retrieving service account keys from Google is SUPER inconsistent and flakey
Qi77Qi pushed a commit that referenced this pull request Jun 25, 2019
…ce and update Leo code to use it (#89)

* Initial refactoring. Main compiles, tests don't.

* Tests compile, don't pass

* Add DB migration, tests pass now

* Added Sam ServiceAccountProvider implementation, clean up some code

* Oops, fix name

* Beef up pet swat a little bit

* Rebase from develop

* Minor fixup

* post rebase fix

* Fix LeonardoModelCopy + ClusterKluge

* Add stub method to remove creds from instance metadata, if an override service account is used.

* Fix label swat test

* s/overrideServiceAccount/notebookServiceAccount/g

* Fix comment

* Fix substring check

* Dummy commit to test github

* PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file

* Sleep a little more

* Jam a recover on removeServiceAccountKey so the delete doesn't fail

* Fix swat test flakiness, fix TODOs

* retrieving service account keys from Google is SUPER inconsistent and flakey
jchanbroad pushed a commit that referenced this pull request Jun 27, 2019
…ce and update Leo code to use it (#89)

* Initial refactoring. Main compiles, tests don't.

* Tests compile, don't pass

* Add DB migration, tests pass now

* Added Sam ServiceAccountProvider implementation, clean up some code

* Oops, fix name

* Beef up pet swat a little bit

* Rebase from develop

* Minor fixup

* post rebase fix

* Fix LeonardoModelCopy + ClusterKluge

* Add stub method to remove creds from instance metadata, if an override service account is used.

* Fix label swat test

* s/overrideServiceAccount/notebookServiceAccount/g

* Fix comment

* Fix substring check

* Dummy commit to test github

* PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file

* Sleep a little more

* Jam a recover on removeServiceAccountKey so the delete doesn't fail

* Fix swat test flakiness, fix TODOs

* retrieving service account keys from Google is SUPER inconsistent and flakey
jdcanas pushed a commit that referenced this pull request Jul 10, 2019
…ce and update Leo code to use it (#89)

* Initial refactoring. Main compiles, tests don't.

* Tests compile, don't pass

* Add DB migration, tests pass now

* Added Sam ServiceAccountProvider implementation, clean up some code

* Oops, fix name

* Beef up pet swat a little bit

* Rebase from develop

* Minor fixup

* post rebase fix

* Fix LeonardoModelCopy + ClusterKluge

* Add stub method to remove creds from instance metadata, if an override service account is used.

* Fix label swat test

* s/overrideServiceAccount/notebookServiceAccount/g

* Fix comment

* Fix substring check

* Dummy commit to test github

* PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file

* Sleep a little more

* Jam a recover on removeServiceAccountKey so the delete doesn't fail

* Fix swat test flakiness, fix TODOs

* retrieving service account keys from Google is SUPER inconsistent and flakey
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants