CI Cluster Test failed #89

Open
kimlaberinto opened this issue Feb 1, 2022 · 2 comments

Comments

@kimlaberinto
Member

Not sure why this CI cluster test failed:

[ Info: Waiting for test-multi-addprocs job. This could take up to 4 minutes...
Error from server (NotFound): pods "test-multi-addprocs-st7tc" not found
test-multi-addprocs: Error During Test at /home/runner/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:279
  Test threw exception
  Expression: pod_phase(manager_pod) == "Succeeded"

https://github.com/beacon-biosignals/K8sClusterManagers.jl/runs/5027583871?check_suite_focus=true#step:9:140

@omus
Member

omus commented Feb 2, 2022

Copying relevant logs here as GHA logs don't persist:

[ Info: Waiting for test-multi-addprocs job. This could take up to 4 minutes...
Error from server (NotFound): pods "test-multi-addprocs-st7tc" not found
test-multi-addprocs: Error During Test at /home/runner/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:279
  Test threw exception
  Expression: pod_phase(manager_pod) == "Succeeded"
  failed process: Process(setenv(`/home/runner/.julia/artifacts/e549ab3a763d3b31e726aa6336c6dbb75ee90a05/bin/kubectl get pod/test-multi-addprocs-st7tc -o 'jsonpath={.status.phase}'`,["PATH=/home/runner/.julia/artifacts/e549ab3a763d3b31e726aa6336c6dbb75ee90a05/bin:/home/runner/work/_temp:/opt/hostedtoolcache/julia/1.7.1/x64/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", "DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1", "GITHUB_RUN_NUMBER=208", "GITHUB_REF_NAME=88/merge", "RUNNER_ARCH=X64", "PERFLOG_LOCATION_SETTING=RUNNER_PERFLOG", "LD_LIBRARY_PATH=/opt/hostedtoolcache/julia/1.7.1/x64/bin/../lib/julia:/opt/hostedtoolcache/julia/1.7.1/x64/bin/../lib", "K8S_CLUSTER_TESTS=true", "ACCEPT_EULA=Y", "ANT_HOME=/usr/share/ant", "RUNNER_USER=runner", "LEIN_HOME=/usr/local/lib/lein", "GITHUB_ACTOR=kimlaberinto", "ANDROID_NDK_LATEST_HOME=/usr/local/lib/android/sdk/ndk/23.1.7779620", "USER=runner", "CONDA=/usr/share/miniconda", "GITHUB_REF_PROTECTED=false", "GITHUB_SHA=b39d201f2b4c3780982f770e755e1c6c91503709", "JAVA_HOME=/usr/lib/jvm/temurin-11-jdk-amd64", "GITHUB_API_URL=https://api.github.com", "GITHUB_RUN_ATTEMPT=1", "GITHUB_ACTIONS=true", "VCPKG_INSTALLATION_ROOT=/usr/local/share/vcpkg", "MINIKUBE_HOME=/home/runner/work/_temp", "ANDROID_SDK_ROOT=/usr/local/lib/android/sdk", "SWIFT_PATH=/usr/share/swift/usr/bin", "GOROOT_1_17_X64=/opt/hostedtoolcache/go/1.17.6/x64", "GITHUB_ENV=/home/runner/work/_temp/_runner_file_commands/set_env_52151825-7529-4f10-9231-f2029174696c", "JAVA_HOME_17_X64=/usr/lib/jvm/temurin-17-jdk-amd64", "GITHUB_ACTION_PATH=/home/runner/work/_actions/julia-actions/julia-runtest/v1", "RUNNER_PERFLOG=/home/runner/perflog", "RUNNER_NAME=GitHub Actions 9", "GITHUB_RUN_ID=1780539670", 
"HOMEBREW_CELLAR=/home/linuxbrew/.linuxbrew/Cellar", "ImageOS=ubuntu20", "NVM_DIR=/home/runner/.nvm", "GITHUB_HEAD_REF=kpl/update-codecov", "GITHUB_RETENTION_DAYS=90", "GITHUB_SERVER_URL=https://github.com", "GITHUB_JOB=cluster-test", "DEBIAN_FRONTEND=noninteractive", "RUNNER_TRACKING_ID=github_ee352480-6154-44ca-8750-7f7c692fd5f1", "RUNNER_TOOL_CACHE=/opt/hostedtoolcache", "HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS=3650", "AZURE_EXTENSION_DIR=/opt/az/azcliextensions", "HOMEBREW_NO_AUTO_UPDATE=1", "CHROMEWEBDRIVER=/usr/local/share/chrome_driver", "GITHUB_ACTION_REPOSITORY=", "GITHUB_WORKFLOW=CI", "GITHUB_ACTION=__julia-actions_julia-runtest", "HOME=/home/runner", "JAVA_HOME_8_X64=/usr/lib/jvm/temurin-8-jdk-amd64", "GITHUB_EVENT_PATH=/home/runner/work/_temp/_github_workflow/event.json", "K8S_CLUSTER_MANAGERS_TEST_IMAGE=k8s-cluster-managers:b39d201", "HOMEBREW_PREFIX=/home/linuxbrew/.linuxbrew", "SGX_AESM_ADDR=1", "GITHUB_REF=refs/pull/88/merge", "GITHUB_REPOSITORY=beacon-biosignals/K8sClusterManagers.jl", "INVOCATION_ID=3990f835d3004e2b87571c73a406a265", "ImageVersion=20220123.1", "LANG=C.UTF-8", "GITHUB_GRAPHQL_URL=https://api.github.com/graphql", "SHLVL=1", "DOTNET_MULTILEVEL_LOOKUP=0", "RUNNER_WORKSPACE=/home/runner/work/K8sClusterManagers.jl", "GITHUB_BASE_REF=main", "STATS_KEEPALIVE=false", "_=/opt/hostedtoolcache/julia/1.7.1/x64/bin/julia", "HOMEBREW_REPOSITORY=/home/linuxbrew/.linuxbrew/Homebrew", "GRADLE_HOME=/usr/share/gradle-7.3.3", "GITHUB_ACTION_REF=", "DEPLOYMENT_BASEPATH=/opt/runner", "PIPX_HOME=/opt/pipx", "ANDROID_NDK_ROOT=/usr/local/lib/android/sdk/ndk-bundle", "***", "GITHUB_WORKSPACE=/home/runner/work/K8sClusterManagers.jl/K8sClusterManagers.jl", "GRAALVM_11_ROOT=/usr/local/graalvm/graalvm-ce-java11-21.3.0", "XDG_CONFIG_HOME=/home/runner/.config", "ANDROID_HOME=/usr/local/lib/android/sdk", "CHROME_BIN=/usr/bin/google-chrome", "CI=true", "POWERSHELL_DISTRIBUTION_CHANNEL=GitHub-Actions-ubuntu20", "GECKOWEBDRIVER=/usr/local/share/gecko_driver", 
"GITHUB_PATH=/home/runner/work/_temp/_runner_file_commands/add_path_52151825-7529-4f10-9231-f2029174696c", "RUNNER_OS=Linux", "JOURNAL_STREAM=8:20833", "GITHUB_REF_TYPE=branch", "LEIN_JAR=/usr/local/lib/lein/self-installs/leiningen-2.9.8-standalone.jar", "JULIA_LOAD_PATH=@:/tmp/jl_RrxcF6", "BOOTSTRAP_HASKELL_NONINTERACTIVE=1", "PIPX_BIN_DIR=/opt/pipx_bin", "SELENIUM_JAR_PATH=/usr/share/java/selenium-server.jar", "JAVA_HOME_11_X64=/usr/lib/jvm/temurin-11-jdk-amd64", "RUNNER_TEMP=/home/runner/work/_temp", "GOROOT_1_16_X64=/opt/hostedtoolcache/go/1.16.13/x64", "GITHUB_REPOSITORY_OWNER=beacon-biosignals", "GITHUB_EVENT_NAME=pull_request", "DOTNET_NOLOGO=1", "GOROOT_1_15_X64=/opt/hostedtoolcache/go/1.15.15/x64", "OPENBLAS_MAIN_FREE=1", "ANDROID_NDK_HOME=/usr/local/lib/android/sdk/ndk-bundle", "AGENT_TOOLSDIRECTORY=/opt/hostedtoolcache"]), ProcessExited(1)) [1]
  
  Stacktrace:
   [1] pipeline_error
     @ ./process.jl:531 [inlined]
   [2] read(cmd::Cmd)
     @ Base ./process.jl:418
   [3] read(cmd::Cmd, #unused#::Type{String})
     @ Base ./process.jl:427
   [4] pod_phase(pod_name::SubString{String})
     @ Main ~/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/utils.jl:36
   [5] macro expansion
     @ /opt/hostedtoolcache/julia/1.7.1/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:445 [inlined]
   [6] macro expansion
     @ ~/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:271 [inlined]
   [7] macro expansion
     @ /opt/hostedtoolcache/julia/1.7.1/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
   [8] top-level scope
     @ ~/work/K8sClusterManagers.jl/K8sClusterManagers.jl/test/cluster.jl:235
Error from server (NotFound): jobs.batch "test-multi-addprocs" not found
[ Info: Describe job:
┌ Info: List pods for job test-multi-addprocs:
│ NAME                                     READY   STATUS      RESTARTS   AGE   JOB-NAME=TEST-MULTI-ADDPROCS
│ test-multi-addprocs-st7tc-worker-fnzvf   0/1     Completed   0          43s   
│ test-multi-addprocs-st7tc-worker-jwksk   0/1     Completed   0          28s   
└ test-success-slxr5-worker-jqvm7          0/1     Completed   0          95s   
[ Info: Manager pod "test-multi-addprocs-st7tc" not found
┌ Info: Describe worker 1/2 pod:
│ Name:         test-multi-addprocs-st7tc-worker-fnzvf
│ Namespace:    default
│ Priority:     0
│ Node:         minikube-m02/192.168.49.3
│ Start Time:   Tue, 01 Feb 2022 20:33:00 +0000
│ Labels:       manager=test-multi-addprocs-st7tc
│               worker-id=2
│ Annotations:  <none>
│ Status:       Succeeded
│ IP:           10.244.1.8
│ IPs:
│   IP:  10.244.1.8
│ Containers:
│   worker:
│     Container ID:  docker://25369b89560204270b609c7129b3c36111f5e124b09c10609d946879ce9c52c7
│     Image:         k8s-cluster-managers:b39d201
│     Image ID:      docker://sha256:a3f7dfa9c373b41e28bf6527e7b8801720aa125fa31db5b9cacb7d069eada486
│     Port:          <none>
│     Host Port:     <none>
│     Command:
│       /usr/local/julia/bin/julia
│       --worker=RL03XtNp463y3yuY
│     State:          Terminated
│       Reason:       Completed
│       Exit Code:    0
│       Started:      Tue, 01 Feb 2022 20:33:00 +0000
│       Finished:     Tue, 01 Feb 2022 20:33:29 +0000
│     Ready:          False
│     Restart Count:  0
│     Limits:
│       cpu:     500m
│       memory:  300Mi
│     Requests:
│       cpu:        500m
│       memory:     300Mi
│     Environment:  <none>
│     Mounts:
│       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-w6rvq (ro)
│ Conditions:
│   Type              Status
│   Initialized       True 
│   Ready             False 
│   ContainersReady   False 
│   PodScheduled      True 
│ Volumes:
│   kube-api-access-w6rvq:
│     Type:                    Projected (a volume that contains injected data from multiple sources)
│     TokenExpirationSeconds:  3607
│     ConfigMapName:           kube-root-ca.crt
│     ConfigMapOptional:       <nil>
│     DownwardAPI:             true
│ QoS Class:                   Guaranteed
│ Node-Selectors:              <none>
│ Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
│                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
│ Events:
│   Type    Reason     Age   From               Message
│   ----    ------     ----  ----               -------
│   Normal  Scheduled  43s   default-scheduler  Successfully assigned default/test-multi-addprocs-st7tc-worker-fnzvf to minikube-m02
│   Normal  Pulled     43s   kubelet            Container image "k8s-cluster-managers:b39d201" already present on machine
│   Normal  Created    43s   kubelet            Created container worker
└   Normal  Started    43s   kubelet            Started container worker
┌ Info: Describe worker 2/2 pod:
│ Name:         test-multi-addprocs-st7tc-worker-jwksk
│ Namespace:    default
│ Priority:     0
│ Node:         minikube-m02/192.168.49.3
│ Start Time:   Tue, 01 Feb 2022 20:33:15 +0000
│ Labels:       manager=test-multi-addprocs-st7tc
│               worker-id=3
│ Annotations:  <none>
│ Status:       Succeeded
│ IP:           10.244.1.9
│ IPs:
│   IP:  10.244.1.9
│ Containers:
│   worker:
│     Container ID:  docker://ddbfb6efe6eaf5aac7fe6a2885e49d145248f02be166e3c83f78ce15936a72e5
│     Image:         k8s-cluster-managers:b39d201
│     Image ID:      docker://sha256:a3f7dfa9c373b41e28bf6527e7b8801720aa125fa31db5b9cacb7d069eada486
│     Port:          <none>
│     Host Port:     <none>
│     Command:
│       /usr/local/julia/bin/julia
│       --worker=RL03XtNp463y3yuY
│     State:          Terminated
│       Reason:       Completed
│       Exit Code:    0
│       Started:      Tue, 01 Feb 2022 20:33:16 +0000
│       Finished:     Tue, 01 Feb 2022 20:33:29 +0000
│     Ready:          False
│     Restart Count:  0
│     Limits:
│       cpu:     500m
│       memory:  300Mi
│     Requests:
│       cpu:        500m
│       memory:     300Mi
│     Environment:  <none>
│     Mounts:
│       /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nlpwr (ro)
│ Conditions:
│   Type              Status
│   Initialized       True 
│   Ready             False 
│   ContainersReady   False 
│   PodScheduled      True 
│ Volumes:
│   kube-api-access-nlpwr:
│     Type:                    Projected (a volume that contains injected data from multiple sources)
│     TokenExpirationSeconds:  3607
│     ConfigMapName:           kube-root-ca.crt
│     ConfigMapOptional:       <nil>
│     DownwardAPI:             true
│ QoS Class:                   Guaranteed
│ Node-Selectors:              <none>
│ Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
│                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
│ Events:
│   Type    Reason     Age   From               Message
│   ----    ------     ----  ----               -------
│   Normal  Scheduled  28s   default-scheduler  Successfully assigned default/test-multi-addprocs-st7tc-worker-jwksk to minikube-m02
│   Normal  Pulled     27s   kubelet            Container image "k8s-cluster-managers:b39d201" already present on machine
│   Normal  Created    27s   kubelet            Created container worker
└   Normal  Started    27s   kubelet            Started container worker
[ Info: No logs for manager (test-multi-addprocs-st7tc)
┌ Info: Logs for worker 1/2 (test-multi-addprocs-st7tc-worker-fnzvf):
└ julia_worker:9001#10.244.1.8
┌ Info: Logs for worker 2/2 (test-multi-addprocs-st7tc-worker-jwksk):
└ julia_worker:9001#10.244.1.9
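
The stack trace above shows the test helper throwing because `kubectl get pod/... -o 'jsonpath={.status.phase}'` exited with status 1 once the pod was gone. As a rough sketch (in Python rather than the project's Julia, and not the repository's actual `pod_phase`), a wrapper could treat the `NotFound` response as its own outcome rather than an unexpected process failure. The function below and its `run` parameter are hypothetical; only the `kubectl` invocation is taken from the log.

```python
import subprocess
from typing import Callable, Optional


# Hypothetical helper, not part of K8sClusterManagers.jl. It mirrors the
# `kubectl get pod/<name> -o jsonpath={.status.phase}` call seen in the log.
def pod_phase(
    pod_name: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Return the pod's phase, or None if the API server reports NotFound."""
    cmd = ["kubectl", "get", f"pod/{pod_name}", "-o", "jsonpath={.status.phase}"]
    result = run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return result.stdout.strip()
    if "(NotFound)" in result.stderr:
        # The pod no longer exists, e.g. it was cleaned up along with its job.
        return None
    # Any other failure is unexpected, so surface it to the caller.
    raise RuntimeError(
        f"kubectl failed ({result.returncode}): {result.stderr.strip()}"
    )
```

With something like this, the test could report "manager pod was deleted before its phase could be read" instead of a raw process failure, which would make this kind of flake easier to recognise.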

@omus
Member

omus commented Feb 2, 2022

It appears the manager job was terminated and removed before the debugging information could be rendered. We probably want to adjust some TTL settings so that failures like this can be debugged further.
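
For illustration, one TTL setting that controls how long a finished Job (and the pods it owns) stays around is the `spec.ttlSecondsAfterFinished` field of a `batch/v1` Job. Whether the test manifests here set that field is an assumption on my part. The sketch below, in Python, patches a manifest held as a dictionary; the helper name, the example manifest, and the 300 second value are all hypothetical.

```python
import copy
from typing import Any, Dict


def with_ttl(job_manifest: Dict[str, Any], ttl_seconds: int) -> Dict[str, Any]:
    """Return a copy of a Job manifest with spec.ttlSecondsAfterFinished set."""
    if job_manifest.get("kind") != "Job":
        raise ValueError("expected a Job manifest")
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    patched = copy.deepcopy(job_manifest)  # leave the caller's manifest untouched
    patched.setdefault("spec", {})["ttlSecondsAfterFinished"] = ttl_seconds
    return patched


# Hypothetical manifest; only the job name comes from the log above.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "test-multi-addprocs"},
    "spec": {"template": {"spec": {"restartPolicy": "Never"}}},
}

# Keep the finished job for 5 minutes (illustrative value) so that
# `kubectl describe` and `kubectl logs` still work after a failure.
patched_job = with_ttl(job, 300)
```

A longer TTL trades slower cleanup of the test cluster for a wider window in which the manager pod can still be described and its logs collected.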
