Re-introduce DPCR Loki logging for GCP and Azure clusters (openshift#39064)

* Revert "Revert "Enable DPCR Loki for specific set of jobs (openshift#38914)""

This reverts commit 2b7a44f.

* Stop sending audit logs to loki

Initial tests show about 1.9 million log lines sent to Loki for a
single job run. About 1 million of them were audit logs, so this will
eliminate roughly half our logging load by itself.

Remove unused mounts for the audit logs

* Set resource requests on new promtail prod-bearer-token container

Will fail a test without this:

[sig-arch] Managed cluster should set requests but not limits [Suite:openshift/conformance/parallel]
Run #0: Failed (6s)
{  fail [github.com/openshift/origin/test/extended/operators/resources.go:196]: May  5 09:10:34.626: Pods in platform namespaces are not following resource request/limit rules or do not have an exception granted:
  apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token does not have a cpu request (rule: "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token/request[cpu]")
  apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token does not have a memory request (rule: "apps/v1/DaemonSet/openshift-e2e-loki/loki-promtail/container/prod-bearer-token/request[memory]")

* Enable loki logging for all azure jobs
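
The job filter this commit reinstates skips every job that is neither on a short allow-list nor has `gcp` or `azure` in its name. A minimal sketch of the same check as a standalone function, for illustration only; the name `is_loki_job` is hypothetical and does not appear in the step script:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): returns 0 when a job's logs
# should be sent to the new Loki instance, 1 otherwise.
is_loki_job() {
  local job="$1"
  case "$job" in
    # Any job with "gcp" or "azure" anywhere in its name.
    *gcp*|*azure*) return 0 ;;
    # Explicitly listed jobs, copied from the conditions in the script.
    periodic-ci-openshift-multiarch-master-nightly-4.13-ocp-e2e-aws-ovn-arm64-single-node) return 0 ;;
    periodic-ci-openshift-release-master-nightly-4.13-e2e-vsphere-8-ovn) return 0 ;;
    periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-aws-ovn-arm64-single-node) return 0 ;;
    periodic-ci-openshift-multiarch-master-nightly-4.13-ocp-e2e-ovn-remote-libvirt-s390x) return 0 ;;
    periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-remote-libvirt-s390x) return 0 ;;
  esac
  return 1
}
```

Called as `is_loki_job "$JOB_NAME" || exit 0`, this would behave like the chained `[[ ... ]]` tests in the script, with the allow-list kept in one place.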
dgoodwin authored and ascerra committed May 8, 2023
1 parent e4d56ae commit 0763bbe
Showing 2 changed files with 59 additions and 65 deletions.
@@ -16,13 +16,27 @@ then
exit 0
fi

# Temporarily limit the number of jobs we're going to ingest into loki while we test and scale the new instance:
if [[ $JOB_NAME != "periodic-ci-openshift-multiarch-master-nightly-4.13-ocp-e2e-aws-ovn-arm64-single-node" ]] \
&& [[ $JOB_NAME != "periodic-ci-openshift-release-master-nightly-4.13-e2e-vsphere-8-ovn" ]] \
&& [[ ! "$JOB_NAME" =~ .*gcp.* ]] \
&& [[ ! "$JOB_NAME" =~ .*azure.* ]] \
&& [[ $JOB_NAME != "periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-aws-ovn-arm64-single-node" ]] \
&& [[ $JOB_NAME != "periodic-ci-openshift-multiarch-master-nightly-4.13-ocp-e2e-ovn-remote-libvirt-s390x" ]] \
&& [[ $JOB_NAME != "periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-remote-libvirt-s390x" ]]; then
echo "This job is not on the list of supported jobs we're testing for the new loki, skipping..."
exit 0
fi

export PROMTAIL_IMAGE="quay.io/openshift-cr/promtail"
export PROMTAIL_VERSION="v2.4.1"
export LOKI_ENDPOINT=https://observatorium-mst.api.stage.openshift.com/api/logs/v1/dptp/loki/api/v1
# openshift-trt taken from the tenants list in the LokiStack CR on DPCR:
export LOKI_ENDPOINT=https://logging-loki-openshift-operators-redhat.apps.cr.j7t7.p1.openshiftapps.com/api/logs/v1/openshift-trt/loki/api/v1

# TODO: may be deprecated, moved to: https://github.com/resmoio/kubernetes-event-exporter
export KUBERNETES_EVENT_EXPORTER_IMAGE="ghcr.io/opsgenie/kubernetes-event-exporter"
export KUBERNETES_EVENT_EXPORTER_VERSION="v0.11"

GRAFANACLOUND_USERNAME=$(cat /var/run/loki-grafanacloud-secret/client-id)
export OPENSHIFT_INSTALL_INVOKER="openshift-internal-ci/${JOB_NAME}/${BUILD_ID}"

cat >> "${SHARED_DIR}/manifest_01_ns.yml" << EOF
@@ -98,11 +112,9 @@ data:
min_period: 1s
batchsize: 102400
batchwait: 10s
basic_auth:
username: ${GRAFANACLOUND_USERNAME}
password_file: /etc/promtail-grafanacom-secrets/password
bearer_token_file: /tmp/shared/prod_bearer_token
timeout: 10s
url: https://logs-prod3.grafana.net/api/prom/push
url: ${LOKI_ENDPOINT}/push
positions:
filename: "/run/promtail/positions.yaml"
scrape_configs:
@@ -237,27 +249,6 @@ data:
relabel_configs:
- action: labelmap
regex: __journal__(.+)
- job_name: kubeapi-audit
static_configs:
- targets:
- localhost
labels:
audit: kube-apiserver
__path__: /var/log/kube-apiserver/audit.log
- job_name: openshift-apiserver
static_configs:
- targets:
- localhost
labels:
audit: openshift-apiserver
__path__: /var/log/openshift-apiserver/audit.log
- job_name: oauth-apiserver-audit
static_configs:
- targets:
- localhost
labels:
audit: oauth-apiserver
__path__: /var/log/oauth-apiserver/audit.log
- job_name: events
kubernetes_sd_configs:
- role: pod
@@ -290,24 +281,17 @@ data:
target_config:
sync_period: 10s
EOF

cat >> "${SHARED_DIR}/manifest_creds.yml" << EOF
apiVersion: v1
kind: Secret
metadata:
name: promtail-creds
name: promtail-prod-creds
namespace: openshift-e2e-loki
data:
client-id: "$(cat /var/run/loki-secret/client-id | base64 -w 0)"
client-secret: "$(cat /var/run/loki-secret/client-secret | base64 -w 0)"
EOF
cat >> "${SHARED_DIR}/manifest_grafanacom_creds.yml" << EOF
apiVersion: v1
kind: Secret
metadata:
name: promtail-grafanacom-creds
namespace: openshift-e2e-loki
data:
password: "$(cat /var/run/loki-grafanacloud-secret/client-secret | base64 -w 0)"
audience: "$(cat /var/run/loki-secret/audience | base64 -w 0)"
EOF
cat >> "${SHARED_DIR}/manifest_ds.yml" << EOF
apiVersion: apps/v1
@@ -381,8 +365,6 @@ spec:
volumeMounts:
- mountPath: "/etc/promtail"
name: config
- mountPath: "/etc/promtail-grafanacom-secrets"
name: grafanacom-secrets
- mountPath: "/run/promtail"
name: run
- mountPath: "/var/lib/docker/containers"
@@ -391,18 +373,11 @@ spec:
- mountPath: "/var/log/pods"
name: pods
readOnly: true
- mountPath: "/var/log/kube-apiserver"
name: auditlogs-kube-apiserver
readOnly: true
- mountPath: "/var/log/openshift-apiserver"
name: auditlogs-openshift-apiserver
readOnly: true
- mountPath: "/var/log/oauth-apiserver"
name: auditlogs-oauth-apiserver
readOnly: true
- mountPath: "/var/log/journal"
name: journal
readOnly: true
- mountPath: "/tmp/shared"
name: shared-data
- args:
- --https-address=:9001
- --provider=openshift
@@ -431,6 +406,38 @@ spec:
name: proxy-tls
- mountPath: /etc/tls/cookie-secret
name: cookie-secret
- name: prod-bearer-token
resources:
requests:
cpu: 10m
memory: 20Mi
args:
- --oidc.audience=\$(AUDIENCE)
- --oidc.client-id=\$(CLIENT_ID)
- --oidc.client-secret=\$(CLIENT_SECRET)
- --oidc.issuer-url=https://sso.redhat.com/auth/realms/redhat-external
- --margin=10m
- --file=/tmp/shared/prod_bearer_token
env:
- name: CLIENT_ID
valueFrom:
secretKeyRef:
name: promtail-prod-creds
key: client-id
- name: CLIENT_SECRET
valueFrom:
secretKeyRef:
name: promtail-prod-creds
key: client-secret
- name: AUDIENCE
valueFrom:
secretKeyRef:
name: promtail-prod-creds
key: audience
volumeMounts:
- mountPath: "/tmp/shared"
name: shared-data
image: quay.io/observatorium/token-refresher
serviceAccountName: loki-promtail
terminationGracePeriodSeconds: 180
tolerations:
@@ -440,9 +447,6 @@ spec:
- configMap:
name: loki-promtail
name: config
- secret:
secretName: promtail-grafanacom-creds
name: grafanacom-secrets
- hostPath:
path: "/run/promtail"
name: run
@@ -455,15 +459,6 @@ spec:
- hostPath:
path: "/var/log/journal"
name: journal
- hostPath:
path: "/var/log/kube-apiserver"
name: auditlogs-kube-apiserver
- hostPath:
path: "/var/log/openshift-apiserver"
name: auditlogs-openshift-apiserver
- hostPath:
path: "/var/log/oauth-apiserver"
name: auditlogs-oauth-apiserver
- name: proxy-tls
secret:
defaultMode: 420
@@ -472,6 +467,8 @@ spec:
secret:
defaultMode: 420
secretName: cookie-secret
- name: shared-data
emptyDir: {}
updateStrategy:
type: RollingUpdate
rollingUpdate:
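
The conformance failure quoted in the commit message concerns the `prod-bearer-token` container lacking cpu and memory requests. As a rough local pre-check (not the origin test itself), a container fragment can be scanned for both keys. `has_requests` is a hypothetical helper, and it only looks at the lines directly following a `requests:` line rather than parsing YAML properly:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): reads a YAML fragment on stdin
# and succeeds only if both cpu and memory appear right after "requests:".
has_requests() {
  awk '
    # Entering a requests block.
    /^[ \t]*requests:[ \t]*$/ { in_req = 1; next }
    # Record the keys seen while inside the block.
    in_req && /^[ \t]*cpu:/    { cpu = 1; next }
    in_req && /^[ \t]*memory:/ { mem = 1; next }
    # Any other line ends the block.
    in_req { in_req = 0 }
    END { exit !(cpu && mem) }
  '
}
```

For example, piping the `resources:` stanza of the new container into `has_requests` should succeed, while a stanza with only `cpu:` should fail.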
@@ -8,11 +8,8 @@ ref:
memory: 100Mi
credentials:
- namespace: test-credentials
name: loki-stage-collector-test-secret
name: loki-prod-collector-test-secret
mount_path: /var/run/loki-secret
- namespace: test-credentials
name: loki-grafanacloud-secret
mount_path: /var/run/loki-grafanacloud-secret
env:
- name: LOKI_USE_SERVICEMONITOR
default: 'true'
@@ -20,4 +20,4 @@ ref:
documentation: |-
Configure the cluster for use with Loki.
documentation: |-
The IPI install loki step deploys logging solution for collecting container logs and sending those to Loki hosted on Grafana Cloud.
The IPI install loki step deploys logging solution for collecting container logs and sending those to Loki hosted on the DPCR cluster maintained by the Continuous Release Tooling and Technical Release teams.
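
In the script diff above, promtail reads `bearer_token_file: /tmp/shared/prod_bearer_token`, and the `prod-bearer-token` sidecar writes that file through the shared `emptyDir` volume. When debugging this pairing by hand, it can help to wait until the token file exists and is non-empty. The sketch below is an assumption for illustration; `wait_for_token` is a hypothetical name and nothing like it is part of this commit:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): waits until the given file
# exists and is non-empty, polling once per second.
# Usage: wait_for_token <file> [timeout_seconds]   (default timeout: 60)
wait_for_token() {
  local file="$1" timeout="${2:-60}" waited=0
  until [ -s "$file" ]; do
    # Give up once the timeout has elapsed.
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A call such as `wait_for_token /tmp/shared/prod_bearer_token 120` returns 0 as soon as the file has content and 1 if the timeout passes first.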
