multiple metrics support for paasta #3837

Merged
merged 3 commits into master from drmorr/autoscaling-params-dict-v2 on May 6, 2024

Conversation

@drmorr0 (Contributor) commented Apr 23, 2024

This change introduces the ability to add multiple metrics configs to an HPA via paasta. We transform the existing AutoscalingParamsDict into a new version that looks like this:

metrics_providers:
  - type: cpu
    setpoint: 0.8
  - type: active-requests
    desired_requests_per_replica: 10
scaledown_policies: ...
max_instances_alert_threshold: ...

In this PR, we support both the old and the new ways of doing things, and we transform the old autoscaling params dict into the new version when we read the YAML files. After this PR is shipped, we will do a mass-refactor of yelpsoa into the new format and remove the "transformation" code so that only the new format is supported.
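As a rough illustration of that read-time transformation, here is a minimal sketch of converting a legacy flat autoscaling dict into the new metrics_providers list form. The helper name, the set of keys kept at the top level, and the legacy metrics_provider/setpoint key names are assumptions for illustration only; this is not the code merged in this PR.

```python
from typing import Any, Dict


def convert_legacy_autoscaling_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a legacy single-provider autoscaling dict into the new format.

    Hypothetical sketch: assumes the legacy dict uses a flat ``metrics_provider``
    key plus provider-specific fields, and that ``scaledown_policies`` and
    ``max_instances_alert_threshold`` stay at the top level in the new format.
    """
    if "metrics_providers" in params:
        # Already in the new format; nothing to transform.
        return params

    top_level_keys = {"scaledown_policies", "max_instances_alert_threshold"}

    # Everything that is not a top-level key becomes a field on the single
    # provider entry (setpoint, desired_requests_per_replica, ...).
    provider: Dict[str, Any] = {"type": params.get("metrics_provider", "cpu")}
    for key, value in params.items():
        if key != "metrics_provider" and key not in top_level_keys:
            provider[key] = value

    converted = {k: v for k, v in params.items() if k in top_level_keys}
    converted["metrics_providers"] = [provider]
    return converted


# Example: a legacy config such as {"metrics_provider": "cpu", "setpoint": 0.8}
# would become {"metrics_providers": [{"type": "cpu", "setpoint": 0.8}]}.
```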

The first commit contains the bulk of the changes; the second commit adds the extra validation code and the tests that exercise it.

Testing done

  • make test and tox -e mypy pass (with new tests added)
  • paasta validate passes on all existing yelpsoa instances
  • paasta validate passes on yelpsoa instances using the new format (tested locally only; none of these exist in production yet)
  • setup_prometheus_adapter_config diffs in pnw-prod and nova-prod are empty
  • confirmed that the before/after instance configs and the prometheus adapter configs for all of yelpsoa are identical (part of the testing in https://github.yelpcorp.com/sysgit/yelpsoa-configs/pull/43847)

Open questions

  • What to do if someone's using active-requests in check_autoscaler_max_instances? (link1, link2)

cc @sclg-yelp

@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 39bdd21 to 4040c65 on April 23, 2024 17:00
@nemacysts (Member) left a comment:

approach seems sensible to me - i mostly just have a couple questions and some non-blocking/ignorable suggestions/comments

Member:

i'm 75% certain that this file is dead code from the mesos days and we can ignore it completely

drmorr0 (Contributor, Author):

I am also 75% certain that none of this is needed. But if I don't make these changes then it will fail mypy checks. I am 50% tempted to just submit a PR to delete this file entirely but it seems like one of those things that is maybe still called somewhere in some arcane way that's hard to suss out.

Resolved review threads (outdated): paasta_tools/cli/cmds/validate.py ×4

def get_autoscaling_max_instances_alert_threshold(self) -> float:
    autoscaling_params = self.get_autoscaling_params()
    return autoscaling_params.get(
        "max_instances_alert_threshold", autoscaling_params["setpoint"]
        # TODO: this default doesn't make sense for metrics providers that don't use setpoint
    )
Member:

@EvanKrall what are your thoughts here?

i can see us resolving this in two ways:

  1. look at the autoscaling params and use the desired_active_requests_per_replica if present
  2. give up on desired_active_requests_per_replica and rename it to setpoint so that the code is neater (at the cost of people sometimes getting confused as to what setpoint means :p)

Member:

It is kind of odd that for active-requests we have some positive number from 0 to infinity, whereas for other providers setpoint is typically a number between 0 and 1, expressed as a ratio of some maximum possible resource utilization (100% worker util, 100% cpu, etc.). We could maybe change active-requests to have two parameters: max_active_requests_per_replica (which would be some larger number than desired is currently) and setpoint. But that's probably just adding complexity where it isn't necessary.

With my branch we could have setpoint be a large number for active requests and 0-1 for other providers, and then max_instances_alert_threshold for each of those would also follow the same rules.
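One way to picture the options discussed above is a fallback chain for the alert threshold. The sketch below follows option 1: fall back to desired_active_requests_per_replica when a provider has no setpoint. The field names come from this discussion, but the function and its defaulting behavior are an illustrative assumption, not the code merged in this PR.

```python
from typing import Any, Dict


def max_instances_alert_threshold(provider: Dict[str, Any]) -> float:
    """Hypothetical fallback chain (option 1 above) for one metrics provider.

    Prefer an explicit threshold, then setpoint, then the active-requests
    target for providers that do not use setpoint at all.
    """
    if "max_instances_alert_threshold" in provider:
        return provider["max_instances_alert_threshold"]
    if "setpoint" in provider:
        return provider["setpoint"]
    # active-requests providers carry a large absolute number, not a 0-1 ratio.
    return provider["desired_active_requests_per_replica"]
```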

Resolved review threads: paasta_tools/long_running_service_tools.py ×2, paasta_tools/kubernetes_tools.py
@@ -921,7 +933,7 @@ def get_autoscaling_metric_spec(
     hpa = V2beta2HorizontalPodAutoscaler(
         kind="HorizontalPodAutoscaler",
         metadata=V1ObjectMeta(
-            name=name, namespace=namespace, annotations=annotations, labels=labels
+            name=name, namespace=namespace, annotations=dict(), labels=labels
Member:

comment: heh, i was confused by this diff until i read the previous code and realized (as you did) that we never wrote to annotations 🤣

@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch 2 times, most recently from 5a2cc07 to b70b0b5 on April 23, 2024 20:13
@drmorr0 drmorr0 force-pushed the drmorr/update-validation branch 3 times, most recently from c8d79cc to c55635f on April 24, 2024 19:09
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch 3 times, most recently from c72053d to 9f09517 on April 25, 2024 04:43
@drmorr0 drmorr0 force-pushed the drmorr/update-validation branch 2 times, most recently from 4abc69d to ce7b716 on April 29, 2024 19:48
@drmorr0 drmorr0 force-pushed the drmorr/update-validation branch 2 times, most recently from 1588e74 to 0c76f8f on April 29, 2024 20:08
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 9f09517 to 41135e6 on April 29, 2024 22:27
@drmorr0 drmorr0 changed the base branch from drmorr/update-validation to master on April 29, 2024 22:27
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch 2 times, most recently from 2cfd2d9 to cb955b9 on April 30, 2024 02:26
@drmorr0 drmorr0 changed the title from "[WIP] multiple metrics support for paasta" to "multiple metrics support for paasta" on Apr 30, 2024
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from cb955b9 to 5e368dc on April 30, 2024 22:50
"metrics_providers"
]

# TODO: this doesn't work for metrics_providers that don't use setpoint (e.g. active-requests)
Member:

@Krall should we add a conditional that exits early if the metrics_provider doesn't use setpoint or should we add some special-casing here to treat the desired_active_requests field as setpoint?

Resolved review threads (outdated): paasta_tools/cli/cmds/validate.py ×2, paasta_tools/long_running_service_tools.py
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 5e368dc to 8000f7f on May 3, 2024 16:05
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 8000f7f to 75452ee on May 3, 2024 16:14
@drmorr0 drmorr0 merged commit e70b0ba into master May 6, 2024
9 checks passed