feat(metrics/collector): adjust resources and autoscaling
This is based on internal dogfooding. The collector is, in general,
heavily memory-bound, much like Prometheus.
swiatekm-sumo committed Aug 21, 2023
1 parent 692d97c commit 817fe7e
Showing 15 changed files with 66 additions and 21 deletions.
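Since HPA utilization targets are evaluated against a pod's requests rather than its limits, the new defaults work out roughly as follows (a sketch based on the request values set in this commit):

# Memory request per replica: 768Mi
#   scale-out above 70% of 768Mi ≈ 537Mi average usage
# CPU request per replica: 100m
#   scale-out above 70% of 100m = 70m average usage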
1 change: 1 addition & 0 deletions .changelog/3219.changed.txt
@@ -0,0 +1 @@
+feat(metrics/collector): adjust resources and autoscaling
1 change: 1 addition & 0 deletions .changelog/3221.added.txt
@@ -0,0 +1 @@
+feat(metrics/collector): support remote write proxy
6 changes: 3 additions & 3 deletions deploy/helm/sumologic/README.md
@@ -117,13 +117,13 @@ The following table lists the configurable parameters of the Sumo Logic chart an
 | `sumologic.metrics.serviceMonitors` | Configuration of Sumo Logic Kubernetes Collection components serviceMonitors | See [values.yaml] |
 | `sumologic.metrics.collector.otelcol.enabled` | Enable experimental otelcol metrics collector | See [values.yaml] |
 | `sumologic.metrics.collector.otelcol.scrapeInterval` | The default scrape interval for the collector. | `30s` |
-| `sumologic.metrics.collector.otelcol.replicaCount` | Replica count for the experimental otelcol metrics collector | `3` |
+| `sumologic.metrics.collector.otelcol.replicaCount` | Replica count for the experimental otelcol metrics collector | `1` |
 | `sumologic.metrics.collector.otelcol.resources` | Resource requests and limits for the experimental otelcol metrics collector | See [values.yaml] |
 | `sumologic.metrics.collector.otelcol.autoscaling.enabled` | Option to turn autoscaling on for the experimental otelcol metrics collector and specify params for HPA. Autoscaling needs metrics-server to access CPU metrics. | `false` |
 | `sumologic.metrics.collector.otelcol.autoscaling.maxReplicas` | Default max replicas for autoscaling the collector | `10` |
 | `sumologic.metrics.collector.otelcol.autoscaling.minReplicas` | Default min replicas for autoscaling the collector | `3` |
-| `sumologic.metrics.collector.otelcol.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `100` |
-| `sumologic.metrics.collector.otelcol.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `50` |
+| `sumologic.metrics.collector.otelcol.autoscaling.targetCPUUtilizationPercentage` | The desired target CPU utilization for autoscaling. | `70` |
+| `sumologic.metrics.collector.otelcol.autoscaling.targetMemoryUtilizationPercentage` | The desired target memory utilization for autoscaling. | `70` |
 | `sumologic.metrics.collector.otelcol.serviceMonitorSelector` | Selector for ServiceMonitors used for target discovery. By default, we select ServiceMonitors created by the Chart. See: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr | `Nil` |
 | `sumologic.metrics.collector.otelcol.podMonitorSelector` | Selector for PodMonitors used for target discovery. By default, we select PodMonitors created by the Chart. See: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr | `Nil` |
 | `sumologic.metrics.collector.otelcol.nodeSelector` | Node selector for the experimental otelcol metrics. [See docs/best-practices.md for more information.](/docs/best-practices.md). | `{}` |
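Because the collector is memory-bound, the memory target will usually be the one that drives scaling. An illustrative override enabling autoscaling with the new defaults (parameter paths as in the table above; values are the chart defaults after this change):

sumologic:
  metrics:
    collector:
      otelcol:
        autoscaling:
          enabled: true
          minReplicas: 3
          maxReplicas: 10
          targetCPUUtilizationPercentage: 70
          targetMemoryUtilizationPercentage: 70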
@@ -1,14 +1,29 @@
-upstream remote {
+upstream remote_prometheus {
   server {{ template "sumologic.metadata.name.metrics.service" . }}:9888;
 }
 
+upstream remote_otel {
+  server {{ template "sumologic.metadata.name.metrics.service" . }}:4318;
+}
+
 server {
   listen {{ .Values.sumologic.metrics.remoteWriteProxy.config.port }} default_server;
   {{- if not .Values.sumologic.metrics.remoteWriteProxy.config.enableAccessLogs }}
   access_log off;
   {{- end }}
   location / {
     client_body_buffer_size {{ .Values.sumologic.metrics.remoteWriteProxy.config.clientBodyBufferSize }};
-    proxy_pass http://remote;
+    proxy_pass http://remote_prometheus;
   }
 }
+
+server {
+  listen 4318 default_server;
+  {{- if not .Values.sumologic.metrics.remoteWriteProxy.config.enableAccessLogs }}
+  access_log off;
+  {{- end }}
+  location / {
+    client_body_buffer_size {{ .Values.sumologic.metrics.remoteWriteProxy.config.clientBodyBufferSize }};
+    proxy_pass http://remote_otel;
+  }
+}
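The template now exposes two upstreams: the existing Prometheus remote write path and an OTLP/HTTP path on 4318. A minimal sketch of an OpenTelemetry Collector exporter pointed at the proxy — the Service hostname is illustrative and depends on the release name:

exporters:
  otlphttp:
    # hostname is an assumption; substitute the rendered remote write proxy Service
    endpoint: http://RELEASE-NAME-sumologic-remote-write-proxy:4318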
@@ -59,7 +59,10 @@ spec:
 {{- end }}
   env:
     - name: METADATA_METRICS_SVC
-      value: {{ template "sumologic.metadata.name.metrics.service" . }} # no need for remote write proxy here
+      valueFrom:
+        configMapKeyRef:
+          name: sumologic-configmap
+          key: metadataMetrics
     - name: NAMESPACE
       valueFrom:
         fieldRef:
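The endpoint is now read from a ConfigMap instead of being templated inline, so it can be repointed (for example, at the remote write proxy) without re-rendering the workload. A sketch of the ConfigMap shape this reference assumes — the value shown mirrors the old inline default and is illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: sumologic-configmap
data:
  # illustrative value; previously templated inline as the metadata metrics Service
  metadataMetrics: RELEASE-NAME-sumologic-metadata-metrics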
@@ -59,6 +59,7 @@ spec:
         imagePullPolicy: {{ .Values.sumologic.metrics.remoteWriteProxy.image.pullPolicy }}
         ports:
           - containerPort: {{ .Values.sumologic.metrics.remoteWriteProxy.config.port }}
+          - containerPort: 4318
         resources:
 {{- toYaml .Values.sumologic.metrics.remoteWriteProxy.resources | nindent 10 }}
         livenessProbe:
@@ -9,9 +9,12 @@ metadata:
 {{- include "sumologic.labels.common" . | nindent 4 }}
 spec:
   ports:
-    - name: http
+    - name: prometheus
       port: 9888
       targetPort: {{ .Values.sumologic.metrics.remoteWriteProxy.config.port }}
+    - name: otel
+      port: 4318
+      targetPort: 4318
   selector:
     app: {{ template "sumologic.labels.app.remoteWriteProxy.pod" . }}
 {{- end }}
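With the default proxy port (80), the rendered Service exposes both protocols side by side — a sketch assuming default values:

apiVersion: v1
kind: Service
metadata:
  name: RELEASE-NAME-sumologic-remote-write-proxy  # name is illustrative
spec:
  ports:
    - name: prometheus
      port: 9888
      targetPort: 80
    - name: otel
      port: 4318
      targetPort: 4318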
10 changes: 5 additions & 5 deletions deploy/helm/sumologic/values.yaml
@@ -474,8 +474,8 @@ sumologic:
         enabled: false
         minReplicas: 3
         maxReplicas: 10
-        targetCPUUtilizationPercentage: 100
-        # targetMemoryUtilizationPercentage: 50
+        targetCPUUtilizationPercentage: 70
+        targetMemoryUtilizationPercentage: 70
 
       nodeSelector: {}
 
@@ -487,15 +487,15 @@ sumologic:
       ## Option to define priorityClassName to assign a priority class to pods.
       priorityClassName:
 
-      replicaCount: 3
+      replicaCount: 1
 
       resources:
         limits:
-          memory: 1Gi
+          memory: 2Gi
           cpu: 1000m
         requests:
           memory: 768Mi
-          cpu: 500m
+          cpu: 100m
 
       ## Selector for ServiceMonitors used for target discovery. By default, this selects resources created by this Chart.
       ## See https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#opentelemetrycollectorspectargetallocatorprometheuscr
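Users who prefer the previous static sizing can pin the old values back in their own overrides; an illustrative sketch restoring the pre-change defaults:

sumologic:
  metrics:
    collector:
      otelcol:
        replicaCount: 3
        resources:
          limits:
            memory: 1Gi
            cpu: 1000m
          requests:
            memory: 768Mi
            cpu: 500m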
@@ -14,7 +14,7 @@ metadata:
     sumologic.com/scrape: "true"
 spec:
   mode: statefulset
-  replicas: 3
+  replicas: 1
   serviceAccount: RELEASE-NAME-sumologic-metrics
   targetAllocator:
     serviceAccount: RELEASE-NAME-sumologic-metrics-targetallocator
@@ -29,7 +29,10 @@ spec:
       release: RELEASE-NAME
   env:
     - name: METADATA_METRICS_SVC
-      value: RELEASE-NAME-sumologic-metadata-metrics # no need for remote write proxy here
+      valueFrom:
+        configMapKeyRef:
+          name: sumologic-configmap
+          key: metadataMetrics
     - name: NAMESPACE
       valueFrom:
         fieldRef:
@@ -45,9 +48,9 @@ spec:
   resources:
     limits:
       cpu: 1000m
-      memory: 1Gi
+      memory: 2Gi
     requests:
-      cpu: 500m
+      cpu: 100m
       memory: 768Mi
   volumes:
     - name: tmp
@@ -18,7 +18,7 @@ metadata:
     podKey: podValue
 spec:
   mode: statefulset
-  replicas: 3
+  replicas: 1
   serviceAccount: RELEASE-NAME-sumologic-metrics
   targetAllocator:
     serviceAccount: RELEASE-NAME-sumologic-metrics-targetallocator
@@ -45,7 +45,10 @@ spec:
     targetMemoryUtilization: 90
   env:
     - name: METADATA_METRICS_SVC
-      value: RELEASE-NAME-sumologic-metadata-metrics # no need for remote write proxy here
+      valueFrom:
+        configMapKeyRef:
+          name: sumologic-configmap
+          key: metadataMetrics
     - name: NAMESPACE
       valueFrom:
         fieldRef:
@@ -32,6 +32,7 @@ spec:
         imagePullPolicy: IfNotPresent
         ports:
           - containerPort: 8080
+          - containerPort: 4318
         resources:
           limits:
             cpu: 1000m
@@ -65,6 +65,7 @@ spec:
         imagePullPolicy: Always
         ports:
           - containerPort: 80
+          - containerPort: 4318
         resources:
           limits:
             cpu: 400m
@@ -12,14 +12,26 @@ metadata:
     heritage: "Helm"
 data:
   remote-write-proxy.conf: |
-    upstream remote {
+    upstream remote_prometheus {
       server RELEASE-NAME-sumologic-metadata-metrics:9888;
     }
+    upstream remote_otel {
+      server RELEASE-NAME-sumologic-metadata-metrics:4318;
+    }
     server {
       listen 80 default_server;
       location / {
         client_body_buffer_size 32k;
-        proxy_pass http://remote;
+        proxy_pass http://remote_prometheus;
       }
     }
+    server {
+      listen 4318 default_server;
+      location / {
+        client_body_buffer_size 32k;
+        proxy_pass http://remote_otel;
+      }
+    }
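For completeness, a hedged sketch of a Prometheus remote_write block aimed at the rendered proxy above. The Service name and URL path are assumptions — nginx proxies any path under /, and the path the metadata service expects is not shown in this diff:

remote_write:
  - url: http://RELEASE-NAME-sumologic-remote-write-proxy:9888/  # host and path are illustrative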
1 change: 1 addition & 0 deletions tests/integration/internal/constants.go
@@ -364,6 +364,7 @@ var (
 	"otelcol_otelsvc_k8s_ip_lookup_miss",
 	"otelcol_otelsvc_k8s_other_deleted",
 	"kube_pod_container_status_waiting_reason",
+	"kube_pod_container_status_terminated_reason",
 	// TODO: check different metrics depending on K8s version
 	// scheduler_scheduling_duration_seconds is present for K8s <1.23
 	// scheduler_scheduling_attempt_duration_seconds is present for K8s >=1.23
2 changes: 1 addition & 1 deletion tests/integration/values/values_helm_ot_metrics.yaml
@@ -8,7 +8,7 @@ sumologic:
   metrics:
     enabled: true
     remoteWriteProxy:
-      enabled: false
+      enabled: true
     collector:
       otelcol:
         enabled: true
