Add HorizontalPodAutoscaler for fluentd #339

Merged: 25 commits merged into master on Dec 18, 2019

Conversation

@vsinghal13 (Contributor) commented Dec 12, 2019

Description

This PR adds the capability to autoscale fluentd based on CPU usage. Metrics Server is used to expose the metrics API, and the HorizontalPodAutoscaler (HPA) reads the metrics from that API to decide when to scale.

Below is a graph showing the autoscaler's behavior.

Two test scenarios were run:

  1. Gradual increase in load: The load was increased in increments of 2 from 0 to 6 and then dropped to 2 and eventually 0.
  2. Immediate spike: Load goes from 0 to 10 in a short span of time.

In the first scenario, the autoscaler scales the number of replicas up by roughly doubling at each step, i.e. 1, 2, 4, 8.
When the load is dropped back down to 0, the autoscaler scales down with a much smaller step size and stabilizes at 3 replicas.

In the second scenario, the autoscaler again scales from 3 to 6 and then 10.
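
For context, the Kubernetes HPA scaling rule is roughly:

    desiredReplicas = ceil(currentReplicas * currentCPUUtilization / targetCPUUtilization)

which explains both the multiplicative ramp-up while load keeps rising and the smaller steps on the way back down.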

[Screenshot: autoscaler scaling behavior graph, Dec 12, 2019]

The default value is set to "true".

Testing performed
  • ci/build.sh
  • Redeploy fluentd and fluentd-events pods
  • Confirm events, logs, and metrics are coming in
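
Purely as a sketch of what such a template renders to (the API version, target kind and name, and thresholds below are assumptions, not the exact manifest added by this PR):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: fluentd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment          # assumed target kind
    name: fluentd             # assumed target name
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # assumed CPU target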

@@ -0,0 +1,17 @@
{{- if and .Values.sumologic.fluentd.autoscaling.enabled}}
Contributor: what's the reason for `and` here? We're not checking two conditions.

Contributor Author: Good catch! Will fix it.
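
A sketch of the corrected guard, simply dropping the `and` (the body of the template is elided here):

{{- if .Values.sumologic.fluentd.autoscaling.enabled }}
# ... HPA resource ...
{{- end }}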

## Option to turn autoscaling on for fluentd and specify metrics for HPA.
autoscaling:
  enabled: true
  minReplicas: 1
Contributor: do we want to set minReplicas to 3 to match our current 3 replicas?

Contributor Author (@vsinghal13, Dec 13, 2019): I was wondering if that's necessary. I was thinking we keep the minimum at 1 and let the autoscaler do its job.

Contributor: Yeah, it's worth a discussion. We should get @frankreno's opinion as well. Would 1 replica no longer be considered high availability (HA), since there could be data loss until the next replica is brought up?

Contributor: I would suggest a minimum of at least 2 or 3 to prevent any loss of data while a new replica is spinning up, and so that if it did scale down to 1 and the node died there would be a fallback. This can be our default, and we just expose it in values.yaml so customers can change it if needed.

Contributor Author: OK, will change it to 3 then.
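
Based on this thread, the defaults in values.yaml would presumably end up looking something like this (the keys other than enabled and minReplicas are assumptions about the chart's naming, not taken from the diff):

autoscaling:
  enabled: false                         # off by default, per the discussion further down
  minReplicas: 3                         # match the current 3 fluentd replicas
  maxReplicas: 10                        # assumed upper bound
  targetCPUUtilizationPercentage: 50     # assumed CPU target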


## Configure metrics-server
## ref: https://github.com/helm/charts/blob/master/stable/metrics-server/values.yaml
metrics-server:
Contributor: do we need to teach our CI to generate metrics-server-overrides.yaml for the non-Helm installation?

Contributor Author: I guess it will be required if we want to add the arguments.
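
Purely as an illustration (the real file would be generated by CI, so these contents are an assumption), a metrics-server-overrides.yaml carrying the extra arguments might be as small as:

metrics-server:
  enabled: true
  args:
    - --kubelet-insecure-tls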

metrics-server:
  enabled: true
  args:
    - --kubelet-insecure-tls
Contributor: I could see customers having an issue with using this in an insecure way. Is there any other way to run this?

Contributor Author: Without that flag the metrics server is unable to get metrics. Will look into whether we can avoid it. See kubernetes-sigs/metrics-server#131.
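
For context, one workaround often suggested for kubelet TLS verification problems (not verified against this chart, and it only helps when the failure is an address/SAN mismatch rather than an untrusted certificate) is to prefer the node's InternalIP instead of disabling verification:

metrics-server:
  enabled: true
  args:
    - --kubelet-preferred-address-types=InternalIP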

@samjsong mentioned this pull request on Dec 13, 2019.
@samjsong (Contributor) left a comment: LGTM, but this morning Gourav mentioned that, in his experience, Fluentd could sometimes drop records when it scaled down. Did you notice anything like that during your testing of the autoscaler?

@@ -164,6 +164,12 @@
sumologic:
  fluentd:
    ## Option to specify the Fluentd buffer as file/memory.
    buffer: "memory"
    ## Option to turn autoscaling on for fluentd and specify metrics for HPA.
    autoscaling:
      enabled: false

@vsinghal13 will autoscaling be OFF by default?

Contributor Author: Yes, by default it will be off as of now.
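
For reference, enabling it later is then just a values override, using the keys shown in the diff above:

sumologic:
  fluentd:
    autoscaling:
      enabled: true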

@vsinghal13 merged commit 7fcb065 into master on Dec 18, 2019.
@vsinghal13 deleted the vsinghal-fluentd-autoscaler branch on December 18, 2019 at 17:40.