Testing and Troubleshooting


How to test the Deckhouse version

When you commit changes to Git, a new Deckhouse Docker image is built.

The CI pipeline builds an image for each branch. The Deckhouse image for your pull request is available at dev-registry.deckhouse.io/sys/deckhouse-oss:<prNUM>. To test Deckhouse from your PR, all you need to do is change the image in the d8-system/deckhouse Deployment.
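For example (pr1234 below is a hypothetical PR number, substitute your own <prNUM>):

# pr1234 is a hypothetical example; use the number of your PR
kubectl -n d8-system set image deploy/deckhouse deckhouse=dev-registry.deckhouse.io/sys/deckhouse-oss:pr1234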

A copy of Deckhouse running in the cluster regularly checks whether a new image is available in the Docker registry (that is, whether there is a new digest for the same image tag). If the digest for the tag in the registry does not match the digest of the image in the cluster, Deckhouse modifies its own Deployment manifest and shuts down. The new image is pulled from the registry when the new Deckhouse Pod is created. So once you have set the image from your PR in the d8-system/deckhouse Deployment, after each new commit you only need to wait for the build to finish, and Deckhouse restarts on its own.
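You can watch the restart happen, for example:

# Wait for the new Pod to be rolled out, then check its status
kubectl -n d8-system rollout status deploy/deckhouse
kubectl -n d8-system get pods -l app=deckhouse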

Debugging

Addon-operator provides specialized commands to facilitate the troubleshooting process.

Run the following command to learn more about them:

kubectl -n d8-system exec deploy/deckhouse -- deckhouse-controller help

(or read the docs).
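For example, one of these subcommands (also used in the debug script below) shows the Deckhouse working queue:

kubectl -n d8-system exec deploy/deckhouse -- deckhouse-controller queue list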

A script for getting all the necessary debugging information

Run the following script on a master node:

#!/bin/bash

# Prepare deckhouse info for debug
deckhouse_pod=$(kubectl -n d8-system get pod -l app=deckhouse -o name)
deckhouse_address=$(kubectl -n d8-system get pod -l app=deckhouse -o json | jq '.items[] | .status.podIP' -r)
deckhouse_debug_dir=$(mktemp -d)
debug_date=$(date +%s)

# Get deckhouse version
kubectl -n d8-system exec ${deckhouse_pod} -- deckhouse-controller version > ${deckhouse_debug_dir}/version
# Get go trace (takes 60 seconds)
curl -s "${deckhouse_address}:9650/debug/pprof/trace?seconds=60" > ${deckhouse_debug_dir}/trace
# Get goroutine dump
curl -s "${deckhouse_address}:9650/debug/pprof/goroutine" > ${deckhouse_debug_dir}/goroutine
# Get go heap profile
curl -s "${deckhouse_address}:9650/debug/pprof/heap" > ${deckhouse_debug_dir}/heap
# Get CPU profile (takes 60 seconds)
curl -s "${deckhouse_address}:9650/debug/pprof/profile?seconds=60" > ${deckhouse_debug_dir}/profile
# Get process list
kubectl -n d8-system exec ${deckhouse_pod} -- ps auxfww > ${deckhouse_debug_dir}/ps_aux
# Get deckhouse log
kubectl -n d8-system logs ${deckhouse_pod} > ${deckhouse_debug_dir}/log
# Get deckhouse metrics
curl -s ${deckhouse_address}:9650/metrics > ${deckhouse_debug_dir}/metrics
# Get deckhouse queue
kubectl -n d8-system exec ${deckhouse_pod} -- deckhouse-controller queue list > ${deckhouse_debug_dir}/queue_list
# Get modules values
mkdir ${deckhouse_debug_dir}/values
for module in $(kubectl -n d8-system exec ${deckhouse_pod} -- helm list | grep -v NAME | awk '{print $1}'); do
  kubectl -n d8-system exec ${deckhouse_pod} -- deckhouse-controller module values ${module} -o json > ${deckhouse_debug_dir}/values/${module}
done

# tar debug files
tar -czf /tmp/deckhouse_debug_${debug_date}.tar.gz ${deckhouse_debug_dir}
ls -lah /tmp/deckhouse_debug_${debug_date}.tar.gz

# Clear debug directory
rm -rf ${deckhouse_debug_dir}

The script takes about 2.5 minutes to run (it captures two 60-second pprof profiles) and produces a .tar.gz archive that you need to send to the Deckhouse developers.

Prometheus metrics

You can find a description and a list of available metrics here.
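For a quick look at the raw metrics, you can query the Deckhouse Pod directly, reusing the address and port from the debug script above:

# Get the Pod IP and scrape the metrics endpoint
deckhouse_address=$(kubectl -n d8-system get pod -l app=deckhouse -o json | jq '.items[] | .status.podIP' -r)
curl -s ${deckhouse_address}:9650/metrics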

Browsing Deckhouse logs

Currently, all Deckhouse logs are emitted in JSON format. Use jq to convert them into a readable form (this tool is excellent at transforming strings within a stream).

Examples

How to output logs for each module...
  • Colored:

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r 'select(.module != null) | .color |= (if .level == "error" then 1 else 4 end) | "\(.time) \u001B[1;3\(.color)m[\(.level)]\u001B[0m\u001B[1;35m[\(.module)]\u001B[0m - \u001B[1;33m\(.msg)\u001B[0m"'
  • Monochrome version:

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r 'select(.module != null) | "\(.time) [\(.level)][\(.module)] - \(.msg)"'
  • The specific module:

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r --arg mod cloud-instance-manager 'select(.module == $mod) | "\(.time) [\(.level)][\(.module)] - \(.binding) - \(.msg)"'
How to output logs for each hook...
  • Colored:

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r 'select(.hook != null) | .color |= (if .level == "error" then 1 else 4 end) | "\(.time) \u001B[1;3\(.color)m[\(.level)]\u001B[0m\u001B[1;35m[\(.hook)]\u001B[0m - \(.binding) - \u001B[1;33m\(.msg)\u001B[0m"'
  • Monochrome version:

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r 'select(.hook != null) | "\(.time) [\(.level)][\(.hook)] - \(.binding) - \(.msg)"'
  • The specific hook:

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r --arg hook 402-ingress-nginx/hooks/ensure_crds 'select(.hook == $hook) | "\(.time) [\(.level)][\(.hook)] - \(.binding) - \(.msg)"'
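  • Only error-level records (a sketch using the same log fields as above):

    kubectl -n d8-system logs deploy/deckhouse -f | jq -r 'select(.hook != null and .level == "error") | "\(.time) [\(.level)][\(.hook)] - \(.binding) - \(.msg)"'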

Debugging hooks

  • You can insert debug::breakpoint 127.0.0.1 4284 at any place in any hook to make it wait until a connection to the specified port is established.
  • Use telnet to connect to this port (telnet 127.0.0.1 4284). Any command you enter is evaluated in the context in which the debug::breakpoint was set, and you get its output.
  • It is best to start the debugging session with set +e so that the hook does not exit on the first error.
  • Wrap the debug::breakpoint in an if expression if you need to debug a specific situation (see the sketch after this list).
  • For local development, it is recommended to use address 0.0.0.0 and port 4284; then you can run telnet directly on the local machine and do not need to exec into the container.
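A minimal sketch of a guarded breakpoint inside a shell hook (some_condition is a hypothetical variable standing in for the situation you want to catch):

set +e  # keep the hook running past the first error during the session

if [[ "${some_condition}" == "true" ]]; then
  # The hook blocks here until you connect with: telnet 127.0.0.1 4284
  debug::breakpoint 127.0.0.1 4284
fi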

How to reset Deckhouse to use the deckhouse ConfigMap

When updating to Deckhouse 1.41, the migration to ModuleConfig resources may fail. To run this process again, you need to reset Deckhouse to use cm/deckhouse.

Script

# 1. Scale Deployment/deckhouse replicas to 0.
kubectl -n d8-system scale deploy/deckhouse --replicas=0

# 2. Edit Deployment/deckhouse: set the ADDON_OPERATOR_CONFIGMAP_NAME env variable to "deckhouse".
kubectl -n d8-system set env deploy/deckhouse ADDON_OPERATOR_CONFIGMAP_NAME=deckhouse

# 3. Disable validating webhook to allow removing created resources.
kubectl delete validatingwebhookconfiguration/deckhouse-config-webhook

# 4. Remove generated ConfigMap.
kubectl -n d8-system delete cm deckhouse-generated-config-do-not-edit

# 5. Remove all ModuleConfig resources.
kubectl delete moduleconfigs --all

# 6. (Optional) Change image name to run another release or PR.
kubectl -n d8-system set image deploy/deckhouse deckhouse=dev-registry.deckhouse.io/sys/deckhouse-oss:prXXXX

# 7. Scale Deployment/deckhouse replicas to 1 to run migration again.
kubectl -n d8-system scale deploy/deckhouse --replicas=1
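
To check that Deckhouse came back up and the migration has run again, you can, for example:

kubectl -n d8-system get pods -l app=deckhouse
kubectl get moduleconfigs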