-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GAWB-2942] Leo pets per project: Make ServiceAccountProvider interface and update Leo code to use it #89
Conversation
jenkins retest |
1 similar comment
jenkins retest |
</changeSet> | ||
|
||
<changeSet logicalFilePath="leonardo" author="rtitle" id="cluster_migrate_service_accounts"> | ||
<sql dbms="mysql">UPDATE CLUSTER SET notebookServiceAccount = googleServiceAccount</sql> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All existing service accounts in the DB are really notebook service accounts.
* | ||
* @return service account email | ||
*/ | ||
def getLeoServiceAccount: WorkbenchEmail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it OK for this to not return a Future? I'm expecting this to be a constant value, and not an asynchronous operation. It's easier to work with in Leo code if it's not a future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, shouldn't this one be specified in config, so configurers can pass in the json keyblob?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's a good point. Leo config currently has a path to the serviceAccountPemFile. Should we remove this interface method and expect people running Leo to just configure the Leo SA+key via config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe this interface method can still exist but have a default implementation that reads the Leo SA+key from config?
Should be ready for review. Unit+swat tests should be passing (can't actually check since the github hooks aren't appearing). |
@@ -283,4 +280,18 @@ class ClusterMonitorActor(val cluster: Cluster, | |||
case Right(_) => throw ClusterNotReadyException(cluster.googleProject, cluster.clusterName) | |||
} | |||
} | |||
|
|||
private def removeCredentialsFromMetadata: Future[Unit] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's that stub method Hussein
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
jenkins retest |
3 similar comments
jenkins retest |
jenkins retest |
jenkins retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments and questions but 👍
@@ -81,11 +85,11 @@ object Boot extends App with LazyLogging { | |||
} | |||
} | |||
|
|||
private def constructAuthProvider(authProviderClass: String, authConfig: Config): LeoAuthProvider = { | |||
Class.forName(authProviderClass) | |||
private def constructProvider[T](className: String, config: Config): T = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* | ||
* @return service account email | ||
*/ | ||
def getLeoServiceAccount: WorkbenchEmail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
* | ||
* @param config any necessary configuration information. | ||
*/ | ||
abstract class ServiceAccountProvider(config: Config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tedious question: the auth provider is called LeoAuthProvider
. should this be called LeoServiceAccountProvider
?
(we seem to be inconsistent in general whether we should call classes X or LeoX or LeonardoX)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we're inconsistent in naming. :( I'm not a huge fan of LeoServiceAccountProvider
though because it's misleading. It's providing service accounts, not Leo service accounts. LeoAuthProvider
is a better because it's providing authorization specific to Leo operations.
* | ||
* @return service account email | ||
*/ | ||
def getLeoServiceAccount: WorkbenchEmail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, shouldn't this one be specified in config, so configurers can pass in the json keyblob?
@@ -283,4 +280,18 @@ class ClusterMonitorActor(val cluster: Cluster, | |||
case Right(_) => throw ClusterNotReadyException(cluster.googleProject, cluster.clusterName) | |||
} | |||
} | |||
|
|||
private def removeCredentialsFromMetadata: Future[Unit] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
_ <- OptionT.liftF(googleIamDAO.removeServiceAccountKey(dataprocConfig.leoGoogleProject, serviceAccountEmail, key)) | ||
} yield () | ||
|
||
tea.value.void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suck it, baby shoes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get this reference but I appreciate the comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a better short story than "for sale, baby shoes, never worn"
val tea = for { | ||
key <- OptionT(dbRef.inTransaction { _.clusterQuery.getServiceAccountKeyId(googleProject, clusterName) }) | ||
serviceAccountEmail <- OptionT.fromOption[Future](serviceAccountOpt) | ||
_ <- OptionT.liftF(googleIamDAO.removeServiceAccountKey(dataprocConfig.leoGoogleProject, serviceAccountEmail, key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might return a Googly exception and nix the " internalDeleteCluster
must march on regardless of failures" condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though on further thought i'm conflicted about this. we don't monitor the result of the internalDeleteCluster
future in some cases... do we need delete-retries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(possibly a problem for another PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch. I'm not sure what the right thing to do is. I see the value in having delete march on regardless of intermediate failures. But I also don't like swallowing errors if there is a problem.
Note the call to googleIamDAO.removeServiceAccountKey
itself retries on errors (and handles 404s): https://github.com/broadinstitute/workbench-libs/blob/develop/google/src/main/scala/org/broadinstitute/dsde/workbench/google/HttpGoogleIamDAO.scala#L168
Is that a reasonable compromise? Or should I jam a recover on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we jam a recover on it. We're going to swallow the error anyway; might as well not bail on cleaning up everything else too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this in the internalDeleteCluster
method
object ServiceAccountInfo { | ||
// TODO: something less hacky, please! | ||
// If we're going to use Jackson we should use it the right way, with annotations in our model. | ||
// Otherwise we should rip out LeonardoModelCopy + ClusterKluge and just use Leo model objects + spray json (my prefrence). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 publish leo-model and use it in swat. both leo-swat and our jupyter-docker are both in need of some serious love once we're AoU-ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
@@ -104,7 +105,7 @@ class GoogleDataprocDAO(protected val dataprocConfig: DataprocConfig, | |||
new GoogleCredential.Builder() | |||
.setTransport(httpTransport) | |||
.setJsonFactory(jsonFactory) | |||
.setServiceAccountId(dataprocConfig.serviceAccountEmail.value) | |||
.setServiceAccountId(serviceAccountProvider.getLeoServiceAccount.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the only place where we use serviceAccountProvider
can we just pass in the leo SA in the ctor instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -35,7 +27,20 @@ case class ClusterRecord(id: Long, | |||
destroyedDate: Timestamp, | |||
jupyterExtensionUri: Option[String], | |||
initBucket: String, | |||
serviceAccountKeyId: Option[String]) | |||
machineConfig: MachineConfigRecord, | |||
serviceAccountInfo: ServiceAccountInfoRecord) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need these in the Record
classes? i can see why we'd need them in the Cluster
class, since we're json-ifying that, but here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added them here because slick doesn't like the ClusterRecord
case class with >22 members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
22 is rapidly becoming my least favorite number.
…e service account is used.
…ce and update Leo code to use it (#89) * Initial refactoring. Main compiles, tests don't. * Tests compile, don't pass * Add DB migration, tests pass now * Added Sam ServiceAccountProvider implementation, clean up some code * Oops, fix name * Beef up pet swat a little bit * Rebase from develop * Minor fixup * post rebase fix * Fix LeonardoModelCopy + ClusterKluge * Add stub method to remove creds from instance metadata, if an override service account is used. * Fix label swat test * s/overrideServiceAccount/notebookServiceAccount/g * Fix comment * Fix substring check * Dummy commit to test github * PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file * Sleep a little more * Jam a recover on removeServiceAccountKey so the delete doesn't fail * Fix swat test flakiness, fix TODOs * retrieving service account keys from Google is SUPER inconsistent and flakey
…ce and update Leo code to use it (#89) * Initial refactoring. Main compiles, tests don't. * Tests compile, don't pass * Add DB migration, tests pass now * Added Sam ServiceAccountProvider implementation, clean up some code * Oops, fix name * Beef up pet swat a little bit * Rebase from develop * Minor fixup * post rebase fix * Fix LeonardoModelCopy + ClusterKluge * Add stub method to remove creds from instance metadata, if an override service account is used. * Fix label swat test * s/overrideServiceAccount/notebookServiceAccount/g * Fix comment * Fix substring check * Dummy commit to test github * PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file * Sleep a little more * Jam a recover on removeServiceAccountKey so the delete doesn't fail * Fix swat test flakiness, fix TODOs * retrieving service account keys from Google is SUPER inconsistent and flakey
…ce and update Leo code to use it (#89) * Initial refactoring. Main compiles, tests don't. * Tests compile, don't pass * Add DB migration, tests pass now * Added Sam ServiceAccountProvider implementation, clean up some code * Oops, fix name * Beef up pet swat a little bit * Rebase from develop * Minor fixup * post rebase fix * Fix LeonardoModelCopy + ClusterKluge * Add stub method to remove creds from instance metadata, if an override service account is used. * Fix label swat test * s/overrideServiceAccount/notebookServiceAccount/g * Fix comment * Fix substring check * Dummy commit to test github * PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file * Sleep a little more * Jam a recover on removeServiceAccountKey so the delete doesn't fail * Fix swat test flakiness, fix TODOs * retrieving service account keys from Google is SUPER inconsistent and flakey
…ce and update Leo code to use it (#89) * Initial refactoring. Main compiles, tests don't. * Tests compile, don't pass * Add DB migration, tests pass now * Added Sam ServiceAccountProvider implementation, clean up some code * Oops, fix name * Beef up pet swat a little bit * Rebase from develop * Minor fixup * post rebase fix * Fix LeonardoModelCopy + ClusterKluge * Add stub method to remove creds from instance metadata, if an override service account is used. * Fix label swat test * s/overrideServiceAccount/notebookServiceAccount/g * Fix comment * Fix substring check * Dummy commit to test github * PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file * Sleep a little more * Jam a recover on removeServiceAccountKey so the delete doesn't fail * Fix swat test flakiness, fix TODOs * retrieving service account keys from Google is SUPER inconsistent and flakey
…ce and update Leo code to use it (#89) * Initial refactoring. Main compiles, tests don't. * Tests compile, don't pass * Add DB migration, tests pass now * Added Sam ServiceAccountProvider implementation, clean up some code * Oops, fix name * Beef up pet swat a little bit * Rebase from develop * Minor fixup * post rebase fix * Fix LeonardoModelCopy + ClusterKluge * Add stub method to remove creds from instance metadata, if an override service account is used. * Fix label swat test * s/overrideServiceAccount/notebookServiceAccount/g * Fix comment * Fix substring check * Dummy commit to test github * PR feedback: made ServiceAccountProvider return both the Leo SA and the pem file * Sleep a little more * Jam a recover on removeServiceAccountKey so the delete doesn't fail * Fix swat test flakiness, fix TODOs * retrieving service account keys from Google is SUPER inconsistent and flakey
https://broadinstitute.atlassian.net/browse/GAWB-2927
Adds a ServiceAccountProvider interface based on this Tech Doc.
Updates Leo code to use this interface to obtain service accounts. NOTE: this PR does not change current Leo behavior; switching to a pets-per-project implementation will be done in a follow-up (much smaller) PR.
A note on naming. I used the following terms (in code and in comments):
leoServiceAccount
== the service account Leo executes gcloud commands asclusterServiceAccount
== the service account passed to the 'gcloud dataproc create cluster --serviceAccount' commandnotebookServiceAccount
== the service account whose key we localize onto the clusterIf people have other suggestions re naming, please let me know.
Includes a database migration to track both types of SAs in the database.
Also beefs up swat tests slightly to check notebook credentials from multiple sources (oauth2, fiss, hadoop).
Have you read CONTRIBUTING.md lately? If not, do that first.
I, the developer opening this PR, do solemnly pinky swear that:
In all cases: