
translate tempo config to pydantic models #5

Merged: 3 commits into main on Jul 1, 2024

Conversation

@michaeldmitry (Contributor) commented Jun 21, 2024

Issue

Tempo configurations are currently built and passed around as raw dicts.

Solution

Translate all currently used tempo config to pydantic models.
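
For illustration, a minimal sketch of what translating a raw config dict into typed models could look like; the class and field names below are hypothetical and are not the models added by this PR:

# hypothetical sketch only: replace raw nested dicts with pydantic models that are
# validated on construction and serialized back to a plain dict for tempo.yaml
from pydantic import BaseModel


class S3Config(BaseModel):
    endpoint: str
    bucket: str
    access_key: str
    secret_key: str


class StorageConfig(BaseModel):
    backend: str = "s3"
    s3: S3Config


class TempoConfig(BaseModel):
    storage: StorageConfig

    def as_dict(self) -> dict:
        # serialize back to the plain dict that ends up in tempo.yaml
        # (pydantic v1 spelling; v2 would use model_dump())
        return self.dict()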

Testing instructions

For the coordinator to work and push Tempo configuration to the workers, it needs an s3 integration. To work around this without needing an actual s3 instance, we can use the facade charm to mock the s3 interface.

Deploy tempo k8s operator (used for validation)

juju deploy tempo-k8s tempo --resource tempo-image=grafana/tempo:2.4.0 --channel=latest/edge

Deploy tempo coordinator

cwd to tempo-coordinator-k8s-operator

charmcraft pack
juju deploy ./tempo-coordinator-k8s_ubuntu-22.04-amd64.charm 

Deploy Facade

cwd to tempo-coordinator-k8s-operator

juju deploy facade s3-facade --channel=latest/edge
juju integrate s3-facade:provide-s3 tempo-coordinator-k8s:s3
juju run s3-facade/0 update --params ./tests/manual/facades/s3.yaml

Deploy tempo worker

cwd to tempo-worker-k8s-operator

git fetch
git checkout coordinator-integration
charmcraft pack
juju deploy ./tempo-worker-k8s_ubuntu-22.04-amd64.charm  tempo-worker --resource tempo-image=docker.io/ubuntu/tempo:2-22.04

Integrate cluster

 juju integrate tempo-coordinator-k8s tempo-worker

Validation

1. Coordinator and worker should both be in active/idle.
2. kubectl exec -it tempo-worker-0 -n <model-name> -c tempo -- cat /etc/tempo/tempo.yaml
3. kubectl exec -it tempo-0 -n <model-name> -c tempo -- cat /etc/tempo/tempo.yaml
4. Both config files should match apart from the storage options, since the coordinator uses s3 instead of local storage.

TLS Validation

Repeat the above steps after running juju integrate ssc tempo-coordinator-k8s

@michaeldmitry michaeldmitry requested a review from a team as a code owner June 21, 2024 14:18
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

Review threads (resolved): src/charm.py, src/tempo.py, src/tempo_cluster.py
Contributor:

It's great that the config is ops independent. Let's try to keep it that way.

Re the pydantic model itself (and how strict it is in loading rel data): what would an upgrade sequence look like when the new version has a schema change? Could we get tracebacks from validation errors between coordinator and worker?

@michaeldmitry (Contributor, Author):

After discussion with @sed-i:
Relation data for the Tempo config is sent as a dictionary; only the construction of the config in the coordinator is done through pydantic models, and the workers use this dictionary as-is to run the workload.
So, in the event of a breaking change to the Tempo configuration schema, workers would fail with an error when running the workload (or something similar).
A suggestion was to validate the received dict against the model on the worker side and, if that fails, put the worker in Blocked status instead of crashing with an error.
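
As a rough sketch of that suggestion (assumed names only: TempoConfig, reconcile_config and push_tempo_config are hypothetical, not what this PR implements):

from ops.model import ActiveStatus, BlockedStatus
from pydantic import BaseModel, ValidationError


class TempoConfig(BaseModel):
    """Hypothetical stand-in for the shared pydantic config model (fields omitted)."""


def reconcile_config(charm, config_from_reldata: dict) -> None:
    # validate the dict received over relation data before handing it to the workload
    try:
        config = TempoConfig(**config_from_reldata)
    except ValidationError:
        # fail loudly but gracefully: block the unit instead of crashing the hook
        charm.unit.status = BlockedStatus("received invalid tempo config")
        return
    # hypothetical helper that renders the validated config into /etc/tempo/tempo.yaml
    charm.push_tempo_config(config)
    charm.unit.status = ActiveStatus()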

@PietroPasotti @mmkay what are your thoughts on this matter?

@michaeldmitry (Contributor, Author) commented Jun 25, 2024

I think we don't even need to validate against the pydantic model.
Tempo itself can help us verify if the passed config file is correct or not.

Collaborator:

So my thought here is we should try to fail early and loudly.
Option A: validate once.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, puts it in the workload, workload breaks because the config is not valid.
Now we have to figure out that the workload is broken and why.

Option B: validate twice.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, attempts to validate it but it fails. Worker immediately sets blocked status and goes belly up, saying 'config invalid'.

Regardless of how we do the validation in the worker (whether with the built-in CLI tool offered by Tempo or with a copy of the same pydantic model hosted in the controller, that's up to us), I think we should do it.

Option C: validate thrice.
The coordinator generates a config, the pydantic model validates that it's correct. We put it in relation data.
Worker picks it up, attempts to validate it with (the same) pydantic model and it succeeds.
It attempts to validate it with the tempo cli tool and it fails.
This means that the pydantic model and the tempo image are out of sync: our models encode a config that's not supported in the tempo version we're running.

Option C feels a bit like overkill; we can guard against this scenario by integration-testing the worker and verifying that the config generated by the pydantic models always passes the built-in config validator. So I'd say let's go for B.
As for how we validate, I think pydantic models offer us the most flexibility, because in case we want to be backwards-compatible or support consecutive Tempo versions, the worker could implement logic like so:

if ConfigModelV1.validate(config_from_reldata):
    # the coordinator is still sending the old (v1) schema: keep running, but ask the admin to upgrade it
    unit.status = ActiveStatus("Please upgrade the tempo coordinator to v2 (rev > 1412)")
    conf = ConfigModelV2.from_v1(config_from_reldata)
else:
    conf = ConfigModelV2(config_from_reldata)

and the upgrade path would be:

  • upgrade the workers
  • upgrade the coordinator

Review thread (resolved): tests/integration/test_ingressed_tls.py
@PietroPasotti (Collaborator) left a comment

Impressive work!
I have a few suggestions about the API (class names, etc.) that are non-blocking.
As for the commented-out modules, let's add an explanation of what's going on, or remove them.

Review threads (resolved): src/tempo_cluster.py, src/tempo.py, tests/integration/test_integration.py, tests/integration/test_scaling_monolithic.py
@PietroPasotti (Collaborator) left a comment

a few possible typing issues with the models

Review threads (resolved): src/tempo_config.py
@PietroPasotti (Collaborator) left a comment

Once the last couple of issues with the enum are fixed, I'm fine with merging :)
Thanks!

Review threads (resolved): src/tempo_config.py
@michaeldmitry merged commit 71847e4 into main on Jul 1, 2024. 12 checks passed.