-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: Fix pyfunc ensembler undeployment bug #331
Bugfix: Fix pyfunc ensembler undeployment bug #331
Conversation
Labels: map[string]string{ | ||
"env": envType, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should no longer be here the actual clusterServiceBuilder
does not even use an argument envType
to set the labels.
Labels: map[string]string{ | ||
"env": envType, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed these from the expected responses since we were just 'expecting' a mocked response though in reality the actual method doesn't even actually set the envType
as the env
label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tracking this down, @deadlycoconuts ! Left some questions.
Endpoint: PyFuncEnsemblerServiceEndpoint, | ||
Port: PyFuncEnsemblerServicePort, | ||
Env: pyfuncConfig.Env, | ||
ServiceAccount: "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If this is going to be an empty value, we can just leave this field unset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out, I've just removed that :)
ae645a6
to
8a71e04
Compare
4404564
to
973bce2
Compare
239fdac
to
dbfdac5
Compare
dbfdac5
to
64c2cbf
Compare
@@ -184,7 +184,6 @@ var _ = DeployedRouterContext("testdata/create_router_nop_logger_proprietary_exp | |||
|
|||
Eventually(func(g Gomega) { | |||
router = api.GetRouter(apiE, routerCtx.ProjectID, routerCtx.ID) | |||
// TODO: Why is it pending? - g.Expect(router.Value("status").Raw()).To(Equal(api.Status.Pending)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this as it is no longer needed (I agree that we shouldn't be expecting the pending
state).
), | ||
endpoint="anything", | ||
endpoint="ensemble", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made these changes to the SDK tests so these configs resemble the test configs here: https://github.com/caraml-dev/turing/blob/main/api/e2e/test/testdata/create_router_nop_logger_proprietary_exp.json.tmpl.
# check that the status of the router is pending | ||
try: | ||
router.wait_for_status(RouterStatus.PENDING) | ||
except TimeoutError: | ||
raise Exception( | ||
f"Turing API is taking too long for router {router.id} to start getting deployed." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this block here in order to verify that the router enters a pending
state before being deployed
. This is similar to the API e2e test here: https://github.com/caraml-dev/turing/blob/main/api/e2e/test/router_e2e_test.go#L236.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for sorting this out and all the test case analysis, @deadlycoconuts !
Context
After the Gorm library has been updated to v2 in #261, there is a bug discovered in how we're using the Gorm methods to save router versions to the DB whereby nested fields (such as the ensembler configs) do not get updated in their respective tables. This has caused deployed pyfunc ensemblers to not have their corresponding docker configs updated in the
ensembler_configs
table. Since these are left asNULL
, what then happens downstream is that the cluster controller fails to build KnativeService objects that correspond to the deployed ensembler instances, causing it to be unable to deploy ensemblers that have been deployed (either as part of a router update operation or an undeployment operation).This PR introduces a config to the Gorm DB layer which adds a new fieldFullSaveAssociations: true
to the existing Gorm DB configs, which would ensure that all tables get updated whenever Gorm is used to update a router version (or any other object for that matter of fact). See go-gorm/gorm#6162 for more details.This PR introduces additional steps to update the ensembler and enricher configs of a router version (these are stored in their respective tables) and were previously not getting updated whenever a router version is updated:
Totally Unrelated Changes - Unused Arguments Clean Up
The linter's raising a lot of warnings about unused arguments so I've decided to remove them completely wherever possible, which is the case for the methods like
NewRouterEndpoint
,NewEnricherService
andNewEnsemblerService
since the existing implementations of the theclusterSvcBuilder
don't even use them. And since these methods are concrete implementations of the interfaceClusterServiceBuilder
's methods, I've also removed the arguments from them too, which then requires some minor refactoring of the methods ofmockClusterServiceBuilder
(which also implements the interface) and the tests where it is used.Most notably, I've removed all instances of the following:
that's specified in the expected and mocked responses (in
api/turing/service/router_deployment_service_test.go
) since in no way is the actual implementation of theclusterSvcBuilder
ever using theenvType
argument to set labels in theKnativeService
struct returned. It's weird to continue mocking responses that set those labels and then configure the expected responses to include those labels.In all other cases (the majority of them), I've simply renamed the remaining unused arguments to
_
so those should be pretty easy to look through.