This repository has been archived by the owner on Feb 5, 2021. It is now read-only.

Sourcecategories in Sumo appending numbers #115

Open
jeffwroblewski opened this issue Feb 13, 2019 · 12 comments

@jeffwroblewski

For 2.1 and beyond, a user is seeing an issue where source categories are populating like this:

app_prod/some_app_name
app_prod/some_app_name_62
app_prod/some_app_name 63...

I can provide more info offline as needed.

Thanks!
Jeff W.
TAM, Sumo

@frankreno
Contributor

cc @bendrucker: I'll look into this as soon as I can, but it seems likely related to the fix for #78.

@bendrucker
Contributor

What version of Kubernetes are you running?

@andrews32

We're running OpenShift v3.3, which includes Kubernetes v1.3.

@bendrucker
Contributor

Gotcha, seems like there's probably no test coverage for that pod name format anymore. I can look into it in a few.

@andrews32

andrews32 commented Feb 13, 2019

We are in the middle of upgrading to OpenShift v3.9, which includes Kubernetes v1.9, but the symbolic links that I believe the code retrieves the pod name from are in the same format as in K8s v1.3.

For example, the first one is docker-registry-2-mqe0f, where docker-registry is the pod_name, 2 is the deployment config counter, and mqe0f is the hash.

The problem is that it's inconsistent in what it retrieves as the _sourceCategory. Sometimes it'll be "docker-registry", sometimes it'll be "docker-registry-2".

```
[root@infra01-devtest-vxbyr ~]# ls /var/log/containers/
docker-registry-2-mqe0f_default_POD-9171d6915e911a532fb6048191e9713ed36a14ccd1a9057624ece298f08b350a.log
docker-registry-2-mqe0f_default_registry-6c077b90f9592770d1e63b0444551d331167551e035f3d25b5922d0b4ec05325.log
hawkular-cassandra-1-bd5m3_openshift-infra_hawkular-cassandra-1-86f7ee9fddbff12f935a88b0b54e7b46e82a73657000ff41b209076ed7fcc657.log
hawkular-cassandra-1-bd5m3_openshift-infra_POD-a6d27b85d71c4e0c56e600b8d3666e39da8d360515c75d20282318b56f50be47.log
hawkular-metrics-5ovsm_openshift-infra_hawkular-metrics-e70c33b6df41717ad12ccfc1d55b462603ac6a79adbae45c7cad0d363bfccd74.log
hawkular-metrics-5ovsm_openshift-infra_POD-02596538d07c076a7c2447bd364981296ed2584aea29e1158eca643d78359953.log
registry-console-1-6iu26_default_POD-7ce336ef1bb0442e3d93c642c7b63c523e23adf924e4ed1f4f26bd7db6e17c64.log
registry-console-1-6iu26_default_registry-console-a3b0ad6a98b33001ef205bd1bb83d027d19d67013f701d66ec05183314a04e3c.log
router-25-2gpi9_default_POD-ec02bb74c544cc201fea714ce52b3f7bef9adb7d6c47b03e7b63df4cb8df6819.log
router-25-2gpi9_default_router-6cea4aef80d33705c275d19392de0608accec18243e7ef2a7a773103735ee510.log
```

@bendrucker
Contributor

bendrucker commented Feb 13, 2019

So the actual pod name is docker-registry-2-mqe0f, if I'm reading that right? It would be a huge help if you could post an entire pod (kubectl get pod <name> -o yaml) for confirmation.

@bendrucker
Contributor

I suspect the regression from #78 that's affecting you has to do with the pod template hash. Rather than hardcode error-prone patterns based on string formatting (i.e. strip this part if it's numbers), we switched to actually detecting the pod template hash and deterministically stripping the dynamic parts. I'm trying to get a 1.3 cluster up on minikube, but in case that isn't viable, a full pod from your cluster would be helpful.
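In other words, roughly (a minimal Python sketch of the idea, not the plugin's actual code; the helper name and example values are made up):

```python
# Sketch of the template-hash approach: strip the pod-template-hash segment by
# exact match against the pod's label instead of guessing from string patterns.
def strip_pod_template_hash(pod_name: str, labels: dict) -> str:
    template_hash = labels.get("pod-template-hash")
    if not template_hash:
        # No ReplicaSet template hash (e.g. an OpenShift DeploymentConfig pod),
        # so nothing is stripped.
        return pod_name
    # Remove only the segment that exactly equals the hash; any other numeric
    # segments in the name are left alone.
    segments = [s for s in pod_name.split("-") if s != template_hash]
    return "-".join(segments)

# Hypothetical Deployment pod:
# strip_pod_template_hash("nginx-3789574748-xk2d9", {"pod-template-hash": "3789574748"})
# -> "nginx-xk2d9"
```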

@andrews32

```
[svc-vxby-ose@master01-devtest-vxbyr ~]$ oc get po docker-registry-3-9n3bx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"docker-registry-3","uid":"3bf3e47f-fad7-11e8-8df7-005056848c95","apiVersion":"v1","resourceVersion":"1205030190"}}
    openshift.io/deployment-config.latest-version: "3"
    openshift.io/deployment-config.name: docker-registry
    openshift.io/deployment.name: docker-registry-3
    openshift.io/scc: restricted
  creationTimestamp: 2018-12-08T10:52:07Z
  generateName: docker-registry-3-
  labels:
    deployment: docker-registry-3
    deploymentconfig: docker-registry
    docker-registry: default
  name: docker-registry-3-9n3bx
  namespace: default
  resourceVersion: "1205031022"
  selfLink: /api/v1/namespaces/default/pods/docker-registry-3-9n3bx
  uid: 4e5362ec-fad7-11e8-8df7-005056848c95
spec:
  containers:
  - env:
    - name: REGISTRY_HTTP_ADDR
      value: :5000
    - name: REGISTRY_HTTP_NET
      value: tcp
    - name: REGISTRY_HTTP_SECRET
      value: vN4FVfWmHghp7shKhjZadrA6HLg+9FAPqEORak7+VFQ=
    - name: REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA
      value: "false"
    - name: REGISTRY_HTTP_TLS_CERTIFICATE
      value: /etc/secrets/registry.crt
    - name: REGISTRY_HTTP_TLS_KEY
      value: /etc/secrets/registry.key
    image: openshift3/ose-docker-registry:v3.3.1.46.45
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry
    ports:
    - containerPort: 5000
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 5000
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
        - SYS_CHROOT
      privileged: false
      runAsUser: 1000000000
      seLinuxOptions:
        level: s0:c1,c0
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /registry
      name: registry-storage
    - mountPath: /etc/secrets
      name: volume-zx4oi
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: registry-token-sbvmg
      readOnly: true
  dnsPolicy: ClusterFirst
  host: infra04-devtest-vxbyr.xxxx.com
  imagePullSecrets:
  - name: registry-dockercfg-8xfow
  nodeName: infra04-devtest-vxbyr.xxxx.com
  nodeSelector:
    region: infra
  restartPolicy: Always
  securityContext:
    fsGroup: 1000000000
    seLinuxOptions:
      level: s0:c1,c0
  serviceAccount: registry
  serviceAccountName: registry
  terminationGracePeriodSeconds: 30
  volumes:
  - emptyDir: {}
    name: registry-storage
  - name: volume-zx4oi
    secret:
      secretName: registry-certificates
  - name: registry-token-sbvmg
    secret:
      secretName: registry-token-sbvmg
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-12-08T10:52:07Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-12-08T10:52:37Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-12-08T10:52:07Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://a1ce41ffcfcebd69e7f8887493db8e0e7636467862e62a63f9c6823f996fef2a
    image: openshift3/ose-docker-registry:v3.3.1.46.45
    imageID: docker-pullable://registry.access.redhat.com/openshift3/ose-docker-registry@sha256:7b429aa43daf2a2d63c968f685a1c42481055fb14dd68678467f8d0de94d89eb
    lastState: {}
    name: registry
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2018-12-08T10:52:28Z
  hostIP: 10.224.210.31
  phase: Running
  podIP: 10.221.76.3
  startTime: 2018-12-08T10:52:07Z
```

@bendrucker
Contributor

From what you posted, the deployment name is docker-registry-3. The sanitization in this repo was meant to remove the random sections that Deployments/ReplicaSets append, not arbitrary numeric IDs. It seems like it was a bug that it matched/deleted part of your deployment name from the pod_name. You could consider using the OpenShift labels directly for your source categories.
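For example (a sketch only; whether a per-pod annotation override is available in the version you're running, and the exact annotation key, are assumptions to verify against the README), you could set the category on the DeploymentConfig's pod template rather than relying on pod-name parsing:

```yaml
# Hedged sketch: the annotation key is an assumption, check the plugin README.
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: docker-registry
spec:
  template:
    metadata:
      annotations:
        sumologic.com/sourceCategory: app_prod/docker-registry
```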

@andrews32

I'm guessing the deployment name wasn't always where the _sourceCategory got its values from. This is new behavior.

Also, as Frank mentioned above, #78 was only fixed/closed in December 2018, which matches the first reports of this new behavior.

What changed in #78, and why? How do we undo it without manually using an old version that will not be maintained?

@bendrucker
Contributor

> I'm guessing the deployment name wasn't always where the _sourceCategory got its values from.

I don't see any reason to assume that.

#78 was closed by #100. #78 identified bugs in the original naive implementation of replica pod sanitization. The original implementation would remove the second-to-last segment of the pod name if it was a number. That approach is unnecessarily naive.

That numeric value was the pod template hash, which is included as a label on the pods. In later versions of k8s, that value was mapped to an alphanumeric encoding, breaking the naive name sanitization. #100 takes the template hash, looks for the numeric or alphanumeric version in the pod name, and removes that segment by exact match. Anything else you put in your pod names, including numbers, is left alone.
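To make that concrete, here's a rough Python rendering of the old pre-#100 rule (an illustration only, not the plugin's actual code; the function name is made up):

```python
# Illustration of the old rule described above, not the plugin's actual code:
# drop the second-to-last dash-separated segment of the pod name if it is numeric.
def old_naive_strip(pod_name: str) -> str:
    segments = pod_name.split("-")
    if len(segments) >= 2 and segments[-2].isdigit():
        del segments[-2]
    return "-".join(segments)

# On an OpenShift DeploymentConfig pod this also eats the deployment counter:
# old_naive_strip("docker-registry-2-mqe0f")  ->  "docker-registry-mqe0f"
```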

This feature was meant to target Kubernetes ReplicaSets and this plugin was stripping bits of your pod name due to a bug. It sucks, but sometimes bug fixes are breaking changes if you were depending on buggy behavior.

I made some suggestions above on how to provide a specific metadata template with labels—that would let you define conventions that match your stack. I don't think it would be a good idea to re-introduce behavior that parses pod name conventions outside of what's present in k8s core.

@andrews32

I was looking for something else and came across this ticket and noticed it was still open. I think we can close this now. Thanks for explaining it @bendrucker.
