translate tempo config to pydantic models #5
Conversation
It's great that the config is ops independent. Let's try to keep it that way.
Re the pydantic model itself (and how strict it is when loading relation data): what would an upgrade sequence look like when the new version has a schema change? Could we get validation-error tracebacks between coordinator and worker?
After discussion with @sed-i
Relation data for the tempo config is sent as a dictionary; only the construction of the config in the coordinator goes through pydantic models. The dumb workers use this dictionary as-is to run the workload.
So, in the event of a breaking change to the schema of the tempo configuration, workers would fail with an error when running the workload (or something similar).
A suggestion was to validate the received dict against the model on the workload side and, if that fails, put the worker in Blocked state instead of crashing with an error.
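That worker-side guard could look roughly like this (a sketch: `TempoConfig` and its fields are hypothetical stand-ins for the real coordinator model, and the returned strings stand in for the Juju unit statuses a charm would actually set):

```python
from pydantic import BaseModel, ValidationError


class TempoConfig(BaseModel):
    """Hypothetical stand-in for the real tempo config model."""

    target: str
    storage: dict


def check_config(raw: dict) -> str:
    """Validate relation data before handing it to the workload.

    Returns the status the worker would set: 'active' if the config
    parses, a 'blocked' message (instead of a workload crash) if not.
    """
    try:
        TempoConfig.model_validate(raw)
    except ValidationError:
        return "blocked: invalid tempo config received from coordinator"
    return "active"
```

In a real charm the `blocked` branch would set `BlockedStatus` rather than return a string; the point is that the failure surfaces as a status, not a traceback in the workload.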
@PietroPasotti @mmkay what are your thoughts on this matter?
I think we don't even need to validate against the pydantic model.
Tempo itself can tell us whether the passed config file is correct.
So my thought here is we should try to fail early and loudly.
Option A: validate once.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, puts it in the workload, workload breaks because the config is not valid.
Now we have to figure out that the workload is broken and why.
Option B: validate twice.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, attempts to validate it but it fails. Worker immediately sets blocked status and goes belly up, saying 'config invalid'.
Regardless of how we do the validation in the worker (whether with the built-in CLI tool offered by tempo or with a copy of the same pydantic model hosted in the controller, that's up to us), I think we should do it.
Option C: validate thrice.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, attempts to validate it with (the same) pydantic model and it succeeds.
It attempts to validate it with the tempo cli tool and it fails.
This means that the pydantic model and the tempo image are out of sync: our models encode a config that's not supported in the tempo version we're running.
Option C feels a bit like overkill: we can guard against this scenario by integration-testing the worker, verifying that the config generated by the pydantic models always passes the built-in config validator. So I'd say let's go for B.
As to how we validate, I think pydantic models offer us the most flexibility, because in case we want to be backwards-compatible or support consecutive tempo versions, the worker could implement logic like so:
```python
try:
    # pydantic validation raises on failure instead of returning a bool
    v1 = ConfigModelV1.model_validate(config_from_reldata)
except ValidationError:
    conf = ConfigModelV2.model_validate(config_from_reldata)
else:
    unit.status = ActiveStatus("Please upgrade the tempo coordinator to v2 (rev > 1412)")
    conf = ConfigModelV2.from_v1(v1)
```
and the upgrade path would be:
- upgrade the workers
- upgrade the coordinator
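A self-contained version of that branch could look like this. The schemas here are invented for illustration (a flat `retention_hours` field in v1 moving under a nested `retention` block in v2); only the try/except shape mirrors the suggestion above:

```python
from pydantic import BaseModel, ValidationError


class ConfigModelV1(BaseModel):
    """Hypothetical old schema: flat retention field."""

    retention_hours: int


class ConfigModelV2(BaseModel):
    """Hypothetical new schema: retention moved under a nested block."""

    retention: dict

    @classmethod
    def from_v1(cls, old: ConfigModelV1) -> "ConfigModelV2":
        # Translate the v1 shape into the v2 shape.
        return cls(retention={"hours": old.retention_hours})


def load_config(config_from_reldata: dict) -> ConfigModelV2:
    """Accept either schema version, upgrading v1 payloads on the fly."""
    try:
        old = ConfigModelV1.model_validate(config_from_reldata)
    except ValidationError:
        # Not v1: assume the coordinator already publishes v2 data.
        return ConfigModelV2.model_validate(config_from_reldata)
    # An old coordinator is still publishing v1 data; upgrade it.
    return ConfigModelV2.from_v1(old)
```

With this shape, upgrading workers first is safe: a v2 worker keeps running against a v1 coordinator until the coordinator is upgraded.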
Impressive work!
I have a few suggestions about the API (class names, etc.) that are non-blocking.
As for the commented-out modules, let's either add an explanation of what's going on or remove them.
a few possible typing issues with the models
once the last couple of issues with the enum are fixed, I'm fine with merging :)
Thanks!
Issue
Tempo configurations are currently handled as raw dicts.
Solution
Translate all currently used tempo config to pydantic models.
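For a sense of the approach, a minimal sketch (the field names below are a small invented subset of tempo's config, not the actual models in this PR): the coordinator builds and validates a typed model, then serializes it to the plain dict that goes into relation data.

```python
from pydantic import BaseModel


class S3Storage(BaseModel):
    """Illustrative subset of tempo's storage options."""

    endpoint: str
    bucket: str


class Tempo(BaseModel):
    """Top-level tempo config: built as a model, shipped as a dict."""

    target: str = "all"
    storage: S3Storage


config = Tempo(storage=S3Storage(endpoint="s3.example.com", bucket="tempo"))
# The coordinator would serialize the validated model into relation data:
reldata = config.model_dump()
```

Construction fails loudly in the coordinator if a field is missing or mistyped, instead of the error surfacing later in the workload.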
Testing instructions
For the coordinator to work and push the tempo configuration to workers, it needs an s3 integration. To work around this and avoid needing an actual s3 instance, we can use the Facade charm to mock the s3 interface.
- Deploy the tempo k8s operator (used for validation)
- Deploy the tempo coordinator (cwd to tempo-coordinator-k8s-operator)
- Deploy Facade (cwd to tempo-coordinator-k8s-operator)
- Deploy the tempo worker (cwd to tempo-worker-k8s-operator)
- Integrate the cluster
Validation
1. Coordinator and worker should both be active/idle.
2. `kubectl exec -it tempo-worker-0 -n -c tempo -- cat /etc/tempo/tempo.yaml`
3. `kubectl exec -it tempo-0 -n -c tempo -- cat /etc/tempo/tempo.yaml`
4. Both config files should match apart from the storage options, as the coordinator uses s3 instead of local storage.
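The comparison in the last step can be done mechanically with a small helper that drops the storage block before comparing (the `storage` key name follows tempo's config layout; the helper itself is just an illustration, run on the two parsed YAML files):

```python
def configs_match(worker_cfg: dict, coordinator_cfg: dict) -> bool:
    """Compare two parsed tempo configs, ignoring the storage block."""

    def strip_storage(cfg: dict) -> dict:
        # Storage options legitimately differ (s3 vs local), so drop them.
        return {k: v for k, v in cfg.items() if k != "storage"}

    return strip_storage(worker_cfg) == strip_storage(coordinator_cfg)
```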
TLS Validation
Repeat the above steps after: `juju integrate ssc tempo-coordinator-k8s`