multiple metrics support for paasta #3837

Merged
merged 3 commits into master from drmorr/autoscaling-params-dict-v2 on May 6, 2024

Conversation

@drmorr0 (Contributor) commented Apr 23, 2024

This change introduces the ability to add multiple metrics configs to an HPA via paasta. We transform the existing AutoscalingParamsDict into a new version that looks like this:

metrics_providers:
  - type: cpu
    setpoint: 0.8
  - type: active-requests
    desired_requests_per_replica: 10
scaledown_policies: ...
max_instances_alert_threshold: ...

In this PR, we support both the old and the new ways of doing things, and we transform the old autoscaling params dict into the new version when we read the YAML files. After this PR is shipped, we will do a mass-refactor of yelpsoa into the new format and remove the "transformation" code so that only the new format is supported.
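As a rough illustration of that read-time transformation, here is a minimal sketch of converting a legacy flat autoscaling dict into the new metrics_providers list form. The helper name, the set of keys kept at the top level, and the legacy metrics_provider/setpoint key names are assumptions for illustration only; this is not the code merged in this PR.

```python
from typing import Any, Dict


def convert_legacy_autoscaling_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a legacy single-provider autoscaling dict into the new format.

    Hypothetical sketch: assumes the legacy dict uses a flat ``metrics_provider``
    key plus provider-specific fields, and that ``scaledown_policies`` and
    ``max_instances_alert_threshold`` stay at the top level in the new format.
    """
    if "metrics_providers" in params:
        # Already in the new format; nothing to transform.
        return params

    top_level_keys = {"scaledown_policies", "max_instances_alert_threshold"}

    # Everything that is not a top-level key becomes a field on the single
    # provider entry (setpoint, desired_requests_per_replica, ...).
    provider: Dict[str, Any] = {"type": params.get("metrics_provider", "cpu")}
    for key, value in params.items():
        if key != "metrics_provider" and key not in top_level_keys:
            provider[key] = value

    converted = {k: v for k, v in params.items() if k in top_level_keys}
    converted["metrics_providers"] = [provider]
    return converted


# Example: a legacy config such as {"metrics_provider": "cpu", "setpoint": 0.8}
# would become {"metrics_providers": [{"type": "cpu", "setpoint": 0.8}]}.
```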

The first commit contains the bulk of the changes; the second commit adds the extra validation code and the tests that exercise it.

Testing done

  • make test and tox -e mypy pass (with new tests added)
  • paasta validate passes on all existing yelpsoa instances
  • paasta validate passes on yelpsoa instances using the new format (tested locally only; none of these exist in production yet)
  • setup_prometheus_adapter_config diffs in pnw-prod and nova-prod are empty
  • confirmed that the before/after instance configs and the prometheus adapter configs for all of yelpsoa are identical (part of the testing in https://github.yelpcorp.com/sysgit/yelpsoa-configs/pull/43847)

Open questions

  • What to do if someone's using active-requests in check_autoscaler_max_instances? (link1, link2)

cc @sclg-yelp

@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 39bdd21 to 4040c65 on April 23, 2024 17:00
@nemacysts (Member) left a comment:

approach seems sensible to me - i mostly just have a couple questions and some non-blocking/ignorable suggestions/comments

Member:

i'm 75% certain that this file is dead code from the mesos days and we can ignore it completely

drmorr0 (Contributor, Author):

I am also 75% certain that none of this is needed. But if I don't make these changes then it will fail mypy checks. I am 50% tempted to just submit a PR to delete this file entirely but it seems like one of those things that is maybe still called somewhere in some arcane way that's hard to suss out.

Resolved review threads (outdated): paasta_tools/cli/cmds/validate.py ×4

def get_autoscaling_max_instances_alert_threshold(self) -> float:
    autoscaling_params = self.get_autoscaling_params()
    return autoscaling_params.get(
        "max_instances_alert_threshold", autoscaling_params["setpoint"]
        # TODO: this default doesn't make sense for metrics providers that don't use setpoint
    )
Member:

@EvanKrall what are your thoughts here?

i can see us resolving this in two ways:

  1. look at the autoscaling params and use the desired_active_requests_per_replica if present
  2. give up on desired_active_requests_per_replica and rename it to setpoint so that the code is neater (at the cost of people sometimes getting confused as to what setpoint means :p)

Member:

It is kind of odd that for active-requests we have some positive number from 0 to infinity, whereas for other providers setpoint is typically a number between 0 and 1, expressed as a ratio of some maximum possible resource utilization (100% worker util, 100% cpu, etc.). We could maybe change active-requests to have two parameters: max_active_requests_per_replica (which would be some larger number than desired is currently) and setpoint. But that's probably just adding complexity where it isn't necessary.

With my branch we could have setpoint be a large number for active requests and 0-1 for other providers, and then max_instances_alert_threshold for each of those would also follow the same rules.
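One way to picture the options discussed above is a fallback chain for the alert threshold. The sketch below follows option 1: fall back to desired_active_requests_per_replica when a provider has no setpoint. The field names come from this discussion, but the function and its defaulting behavior are an illustrative assumption, not the code merged in this PR.

```python
from typing import Any, Dict


def max_instances_alert_threshold(provider: Dict[str, Any]) -> float:
    """Hypothetical fallback chain (option 1 above) for one metrics provider.

    Prefer an explicit threshold, then setpoint, then the active-requests
    target for providers that do not use setpoint at all.
    """
    if "max_instances_alert_threshold" in provider:
        return provider["max_instances_alert_threshold"]
    if "setpoint" in provider:
        return provider["setpoint"]
    # active-requests providers carry a large absolute number, not a 0-1 ratio.
    return provider["desired_active_requests_per_replica"]
```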

Resolved review threads: paasta_tools/long_running_service_tools.py ×2, paasta_tools/kubernetes_tools.py
@@ -921,7 +933,7 @@ def get_autoscaling_metric_spec(
     hpa = V2beta2HorizontalPodAutoscaler(
         kind="HorizontalPodAutoscaler",
         metadata=V1ObjectMeta(
-            name=name, namespace=namespace, annotations=annotations, labels=labels
+            name=name, namespace=namespace, annotations=dict(), labels=labels
Member:

comment: heh, i was confused by this diff until i read the previous code and realized (as you did) that we never wrote to annotations 🤣

@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch 2 times, most recently from 5a2cc07 to b70b0b5 on April 23, 2024 20:13
@drmorr0 drmorr0 force-pushed the drmorr/update-validation branch 3 times, most recently from c8d79cc to c55635f on April 24, 2024 19:09
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch 3 times, most recently from c72053d to 9f09517 on April 25, 2024 04:43
@drmorr0 drmorr0 force-pushed the drmorr/update-validation branch 2 times, most recently from 4abc69d to ce7b716 on April 29, 2024 19:48
@drmorr0 drmorr0 force-pushed the drmorr/update-validation branch 2 times, most recently from 1588e74 to 0c76f8f on April 29, 2024 20:08
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 9f09517 to 41135e6 on April 29, 2024 22:27
@drmorr0 drmorr0 changed the base branch from drmorr/update-validation to master on April 29, 2024 22:27
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch 2 times, most recently from 2cfd2d9 to cb955b9 on April 30, 2024 02:26
@drmorr0 drmorr0 changed the title from "[WIP] multiple metrics support for paasta" to "multiple metrics support for paasta" on Apr 30, 2024
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from cb955b9 to 5e368dc on April 30, 2024 22:50
"metrics_providers"
]

# TODO: this doesn't work for metrics_providers that don't use setpoint (e.g. active-requests)
Member:

@Krall should we add a conditional that exits early if the metrics_provider doesn't use setpoint or should we add some special-casing here to treat the desired_active_requests field as setpoint?

Resolved review threads (outdated): paasta_tools/cli/cmds/validate.py ×2, paasta_tools/long_running_service_tools.py
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 5e368dc to 8000f7f on May 3, 2024 16:05
@drmorr0 drmorr0 force-pushed the drmorr/autoscaling-params-dict-v2 branch from 8000f7f to 75452ee on May 3, 2024 16:14
@drmorr0 drmorr0 merged commit e70b0ba into master May 6, 2024
9 checks passed