translate tempo config to pydantic models #5
Conversation
It's great that the config is ops independent. Let's try to keep it that way.
Re the pydantic model itself (and how strict it is when loading relation data): what would an upgrade sequence look like when the new version has a schema change? Could we get validation-error tracebacks between coordinator and worker?
After discussion with @sed-i
Relation data for the tempo config is sent as a dictionary; only the construction of the config in the coordinator goes through pydantic models. The dumb workers use this dictionary as-is to run the workload.
So, in the event of a breaking change to the schema of the tempo configuration, workers would fail with an error when running the workload (or something similar).
A suggestion was to validate the received dict against the model on the workload side and, if that fails, put the worker in Blocked state instead of crashing with an error.
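That worker-side guard could look roughly like this (a sketch: `TempoConfig` and its fields are hypothetical stand-ins for the real coordinator model, and the returned strings stand in for the Juju unit statuses a charm would actually set):

```python
from pydantic import BaseModel, ValidationError


class TempoConfig(BaseModel):
    """Hypothetical stand-in for the real tempo config model."""

    target: str
    storage: dict


def check_config(raw: dict) -> str:
    """Validate relation data before handing it to the workload.

    Returns the status the worker would set: 'active' if the config
    parses, a 'blocked' message (instead of a workload crash) if not.
    """
    try:
        TempoConfig.model_validate(raw)
    except ValidationError:
        return "blocked: invalid tempo config received from coordinator"
    return "active"
```

In a real charm the `blocked` branch would set `BlockedStatus` rather than return a string; the point is that the failure surfaces as a status, not a traceback in the workload.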
@PietroPasotti @mmkay what are your thoughts on this matter?
I think we don't even need to validate against the pydantic model.
Tempo itself can tell us whether the passed config file is correct.
So my thought here is we should try to fail early and loudly.
Option A: validate once.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, puts it in the workload, workload breaks because the config is not valid.
Now we have to figure out that the workload is broken and why.
Option B: validate twice.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, attempts to validate it but it fails. Worker immediately sets blocked status and goes belly up, saying 'config invalid'.
Regardless of how we do the validation in the worker (whether with the built-in CLI tool offered by tempo or with a copy of the same pydantic model hosted in the controller, that's up to us), I think we should do it.
Option C: validate thrice.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, attempts to validate it with (the same) pydantic model and it succeeds.
It attempts to validate it with the tempo cli tool and it fails.
This means that the pydantic model and the tempo image are out of sync: our models encode a config that's not supported in the tempo version we're running.
Option C feels a bit like overkill: we can guard against this scenario by integration-testing the worker, verifying that the config generated by the pydantic models always passes the built-in config validator. So I'd say let's go for B.
As to how we validate, I think pydantic models offer us the most flexibility, because in case we want to be backwards-compatible or support consecutive tempo versions, the worker could implement logic like so:
```python
try:
    # pydantic validation raises on failure instead of returning a bool
    v1 = ConfigModelV1.model_validate(config_from_reldata)
except ValidationError:
    conf = ConfigModelV2.model_validate(config_from_reldata)
else:
    unit.status = ActiveStatus("Please upgrade the tempo coordinator to v2 (rev > 1412)")
    conf = ConfigModelV2.from_v1(v1)
```
and the upgrade path would be:
- upgrade the workers
- upgrade the coordinator
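A self-contained version of that branch could look like this. The schemas here are invented for illustration (a flat `retention_hours` field in v1 moving under a nested `retention` block in v2); only the try/except shape mirrors the suggestion above:

```python
from pydantic import BaseModel, ValidationError


class ConfigModelV1(BaseModel):
    """Hypothetical old schema: flat retention field."""

    retention_hours: int


class ConfigModelV2(BaseModel):
    """Hypothetical new schema: retention moved under a nested block."""

    retention: dict

    @classmethod
    def from_v1(cls, old: ConfigModelV1) -> "ConfigModelV2":
        # Translate the v1 shape into the v2 shape.
        return cls(retention={"hours": old.retention_hours})


def load_config(config_from_reldata: dict) -> ConfigModelV2:
    """Accept either schema version, upgrading v1 payloads on the fly."""
    try:
        old = ConfigModelV1.model_validate(config_from_reldata)
    except ValidationError:
        # Not v1: assume the coordinator already publishes v2 data.
        return ConfigModelV2.model_validate(config_from_reldata)
    # An old coordinator is still publishing v1 data; upgrade it.
    return ConfigModelV2.from_v1(old)
```

With this shape, upgrading workers first is safe: a v2 worker keeps running against a v1 coordinator until the coordinator is upgraded.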
Impressive work!
I have a few suggestions about the API (class names, etc.) that are non-blocking.
As for the commented-out modules, let's either add an explanation of what's going on or remove them.
a few possible typing issues with the models
once the last couple of issues with the enum are fixed, I'm fine with merging :)
Thanks!
Issue
Tempo configurations are currently handled as raw dicts.
Solution
Translate all currently used tempo config to pydantic models.
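For a sense of the approach, a minimal sketch (the field names below are a small invented subset of tempo's config, not the actual models in this PR): the coordinator builds and validates a typed model, then serializes it to the plain dict that goes into relation data.

```python
from pydantic import BaseModel


class S3Storage(BaseModel):
    """Illustrative subset of tempo's storage options."""

    endpoint: str
    bucket: str


class Tempo(BaseModel):
    """Top-level tempo config: built as a model, shipped as a dict."""

    target: str = "all"
    storage: S3Storage


config = Tempo(storage=S3Storage(endpoint="s3.example.com", bucket="tempo"))
# The coordinator would serialize the validated model into relation data:
reldata = config.model_dump()
```

Construction fails loudly in the coordinator if a field is missing or mistyped, instead of the error surfacing later in the workload.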
Testing instructions
For the coordinator to work and push the tempo configuration to workers, it needs an s3 integration. To work around this and avoid needing an actual s3 instance, we can use the Facade charm to mock the s3 interface.
- Deploy the tempo k8s operator (used for validation)
- Deploy the tempo coordinator (cwd to tempo-coordinator-k8s-operator)
- Deploy Facade (cwd to tempo-coordinator-k8s-operator)
- Deploy the tempo worker (cwd to tempo-worker-k8s-operator)
- Integrate the cluster
Validation
1. Coordinator and worker should both be active/idle.
2. `kubectl exec -it tempo-worker-0 -n -c tempo -- cat /etc/tempo/tempo.yaml`
3. `kubectl exec -it tempo-0 -n -c tempo -- cat /etc/tempo/tempo.yaml`
4. Both config files should match apart from the storage options, as the coordinator uses s3 instead of local storage.
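The comparison in the last step can be done mechanically with a small helper that drops the storage block before comparing (the `storage` key name follows tempo's config layout; the helper itself is just an illustration, run on the two parsed YAML files):

```python
def configs_match(worker_cfg: dict, coordinator_cfg: dict) -> bool:
    """Compare two parsed tempo configs, ignoring the storage block."""

    def strip_storage(cfg: dict) -> dict:
        # Storage options legitimately differ (s3 vs local), so drop them.
        return {k: v for k, v in cfg.items() if k != "storage"}

    return strip_storage(worker_cfg) == strip_storage(coordinator_cfg)
```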
TLS Validation
Repeat the above steps after: `juju integrate ssc tempo-coordinator-k8s`