When deploying on air gapped environment gets stuck installing #72

Closed
gustavosr98 opened this issue Feb 27, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@gustavosr98

Bug Description

Trying to deploy in an air-gapped environment, I am seeing a couple of issues with multiple charms.

AFAIU, there is at least one dependency chain like this:
kfp-metadata-writer -> mlmd -> envoy

envoy is stuck at "installing charm software", and I could not find any useful logs.

To Reproduce

# [..] Deploy other charms offline as per kubeflow-bundle repo

juju deploy --trust --debug ./envoy envoy --resource oci-image=10.10.11.39:32000/gcr.io/ml-pipeline/metadata-envoy:2.0.2

# [..] Add relations as per kubeflow-bundle repo
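
One thing that may be worth ruling out in an air-gapped setup is whether the image passed via `--resource oci-image=...` is actually present in the local registry. The sketch below only builds the tag-listing URL for an image reference. It assumes the registry at `10.10.11.39:32000` speaks the Docker Registry HTTP API v2 over plain HTTP, which is not confirmed anywhere in this report, and `registry_tags_url` is a hypothetical helper name.

```shell
# Hypothetical helper: turn "host:port/repo/path:tag" into the
# Docker Registry HTTP API v2 tag-listing URL for that repository.
# Assumption: the registry is reachable over plain HTTP.
registry_tags_url() {
  ref=$1
  host=${ref%%/*}   # everything before the first "/"
  rest=${ref#*/}    # repository path plus optional ":tag"
  repo=${rest%:*}   # drop the trailing ":tag" if present
  printf 'http://%s/v2/%s/tags/list\n' "$host" "$repo"
}
```

It could be used as `curl -s "$(registry_tags_url 10.10.11.39:32000/gcr.io/ml-pipeline/metadata-envoy:2.0.2)"` from a node in the cluster. If the registry answers, the response lists the tags stored for that repository.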

Environment

Kubeflow 1.8/stable
Microk8s 1.28-strict/stable
Juju 3.1.7/stable

Air Gapped

Relevant Log Output

# juju status | grep -v active
Model     Controller  Cloud/Region              Version  SLA          Timestamp
kubeflow  lxd-mgmt    microk8s-train/localhost  3.1.7    unsupported  06:04:11Z

App                        Version                         Status       Scale  Charm                    Channel  Rev  Address         Exposed  Message
envoy                                                      maintenance      1  envoy                               0                  no       installing charm software
istio-pilot                                                waiting          1  istio-pilot                         2  10.152.183.244  no       installing agent
kfp-metadata-writer                                        waiting          1  kfp-metadata-writer                 0  10.152.183.26   no       installing agent
kfp-profile-controller                                     waiting          1  kfp-profile-controller              0  10.152.183.27   no       installing agent
kfp-ui                                                     waiting          1  kfp-ui                              0  10.152.183.127  no       installing agent
mlmd                       .../tfx-oss-public/ml_metad...  waiting          1  mlmd                                0  10.152.183.212  no       List of <ops.model.Relation grpc:25> versions not found for apps: envoy
oidc-gatekeeper                                            waiting          1  oidc-gatekeeper                     0  10.152.183.84   no       installing agent

Unit                          Workload     Agent      Address       Ports          Message
envoy/0*                      maintenance  executing                               (leader-elected) installing charm software
istio-pilot/0*                waiting      idle       10.1.195.250                 Execution handled 1 errors.  See logs for details.
kfp-metadata-writer/0*        blocked      idle       10.1.195.245                 [relation:grpc] Expected data from exactly 1 related applications - got 0.
kfp-profile-controller/0*     maintenance  idle       10.1.195.197                 Reconciling charm: executing component container:kfp-profile-controller
kfp-ui/0*                     waiting      idle       10.1.195.231                 [container:ml-pipeline-ui] Waiting for Pebble services (ml-pipeline-ui).  If this persists, it could be a blocking co...
mlmd/0*                       waiting      idle       10.1.195.212  8080/TCP       List of <ops.model.Relation grpc:25> versions not found for apps: envoy
oidc-gatekeeper/0*            blocked      idle       10.1.195.225                 Failed to replan

# kk logs envoy-operator-0
Defaulted container "juju-operator" out of: juju-operator, juju-init (init)
2024-02-27 04:53:19 INFO juju.cmd supercommand.go:56 running jujud [3.1.7 0cd207d999fef1fc8b965c410e9f58fafe7ee335 gc go1.21.5]
2024-02-27 04:53:19 DEBUG juju.cmd supercommand.go:57   args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=envoy", "--debug"}
2024-02-27 04:53:19 DEBUG juju.agent agent.go:593 read agent config, format "2.0"
2024-02-27 04:53:19 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.7 have already been run.
2024-02-27 04:53:19 INFO juju.cmd.jujud caasoperator.go:205 caas operator application-envoy start (3.1.7 [gc])
2024-02-27 04:53:19 DEBUG juju.cmd.jujud runner.go:402 start "api"
2024-02-27 04:53:19 INFO juju.cmd.jujud runner.go:578 start "api"
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "caas-units-manager" manifold worker started at 2024-02-27 04:53:19.415579542 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "clock" manifold worker started at 2024-02-27 04:53:19.416484295 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-gate" manifold worker started at 2024-02-27 04:53:19.416651233 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "agent" manifold worker started at 2024-02-27 04:53:19.417316281 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "caas-units-manager" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.introspection worker.go:135 introspection worker listening on "@jujud-application-envoy"
2024-02-27 04:53:19 DEBUG juju.cmd.jujud runner.go:410 "api" started
2024-02-27 04:53:19 DEBUG juju.worker.introspection worker.go:161 stats worker now serving
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-flag" manifold worker started at 2024-02-27 04:53:19.426088114 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "caas-units-manager" manifold worker started at 2024-02-27 04:53:19.426203073 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.apicaller connect.go:129 connecting with old password
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "api-config-watcher" manifold worker started at 2024-02-27 04:53:19.428977713 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "migration-fortress" manifold worker started at 2024-02-27 04:53:19.436578166 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.api apiclient.go:1172 successfully dialed "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.api apiclient.go:707 connection established to "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.worker.apicaller connect.go:163 [c41734] "application-envoy" successfully connected to "10.10.11.54:17070"
2024-02-27 04:53:19 DEBUG juju.api monitor.go:35 RPC connection died
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "api-caller" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.apicaller connect.go:129 connecting with old password
2024-02-27 04:53:19 DEBUG juju.api apiclient.go:1172 successfully dialed "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.api apiclient.go:707 connection established to "wss://10.10.11.54:17070/model/c41734e0-aa2c-4028-8105-ccefc9d4111e/api"
2024-02-27 04:53:19 INFO juju.worker.apicaller connect.go:163 [c41734] "application-envoy" successfully connected to "10.10.11.54:17070"
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "api-caller" manifold worker started at 2024-02-27 04:53:19.496843869 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "caas-units-manager" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "caas-units-manager" manifold worker started at 2024-02-27 04:53:19.50550414 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "migration-minion" manifold worker started at 2024-02-27 04:53:19.507115444 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrader" manifold worker started at 2024-02-27 04:53:19.507267948 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "log-sender" manifold worker started at 2024-02-27 04:53:19.507510505 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "upgrade-steps-runner" manifold worker started at 2024-02-27 04:53:19.509256653 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:603 "upgrade-steps-runner" manifold worker completed successfully
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "migration-inactive-flag" manifold worker started at 2024-02-27 04:53:19.512624978 +0000 UTC
2024-02-27 04:53:19 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2024-02-27 04:53:19 DEBUG juju.worker.caasupgrader upgrader.go:128 current agent binary version: 3.1.7
2024-02-27 04:53:19 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2024-02-27 04:53:19 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "charm-dir" manifold worker started at 2024-02-27 04:53:19.522822714 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:618 "operator" manifold worker stopped: fortress operation aborted
stack trace:
github.com/juju/juju/worker/fortress.init:43: fortress operation aborted
github.com/juju/juju/worker/fortress.Occupy:60:
github.com/juju/juju/cmd/jujud/agent/engine.Housing.Decorate.occupyStart.func1:93:
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "secret-drain-worker" manifold worker started at 2024-02-27 04:53:19.523128737 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "api-address-updater" manifold worker started at 2024-02-27 04:53:19.523210402 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.logger logger.go:65 initial log config: "<root>=DEBUG"
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "logging-config-updater" manifold worker started at 2024-02-27 04:53:19.523368699 +0000 UTC
2024-02-27 04:53:19 INFO juju.worker.logger logger.go:120 logger worker started
2024-02-27 04:53:19 DEBUG juju.worker.dependency engine.go:580 "proxy-config-updater" manifold worker started at 2024-02-27 04:53:19.524860739 +0000 UTC
2024-02-27 04:53:19 DEBUG juju.worker.logger logger.go:93 reconfiguring logging from "<root>=DEBUG" to "<root>=INFO"
2024-02-27 04:53:19 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2024-02-27 04:53:19 INFO juju.worker.caasoperator.charm bundles.go:81 downloading local:focal/envoy-0 from API server
2024-02-27 04:53:19 INFO juju.downloader download.go:109 downloading from local:focal/envoy-0
2024-02-27 04:53:19 INFO juju.downloader download.go:92 download complete ("local:focal/envoy-0")
2024-02-27 04:53:19 INFO juju.downloader download.go:172 download verified ("local:focal/envoy-0")
2024-02-27 04:53:23 INFO juju.worker.caasoperator caasoperator.go:430 operator "envoy" started
2024-02-27 04:53:23 INFO juju.worker.caasoperator.runner runner.go:578 start "envoy/0"
2024-02-27 04:53:23 INFO juju.worker.leadership tracker.go:194 envoy/0 promoted to leadership of envoy
2024-02-27 04:53:23 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-envoy-0
2024-02-27 04:53:23 INFO juju.worker.caasoperator.uniter.envoy/0 uniter.go:363 unit "envoy/0" started
2024-02-27 04:53:23 INFO juju.worker.caasoperator.uniter.envoy/0 uniter.go:689 resuming charm install
2024-02-27 04:53:23 INFO juju.worker.caasoperator.uniter.envoy/0.charm bundles.go:81 downloading local:focal/envoy-0 from API server
2024-02-27 04:53:23 INFO juju.downloader download.go:109 downloading from local:focal/envoy-0
2024-02-27 04:53:23 INFO juju.downloader download.go:92 download complete ("local:focal/envoy-0")
2024-02-27 04:53:24 INFO juju.downloader download.go:172 download verified ("local:focal/envoy-0")
2024-02-27 04:53:27 INFO juju.worker.caasoperator.uniter.envoy/0 uniter.go:389 hooks are retried true
2024-02-27 04:53:27 INFO juju.worker.caasoperator.uniter.envoy/0 resolver.go:165 found queued "install" hook
2024-02-27 04:53:28 INFO juju-log Running legacy hooks/install.
2024-02-27 04:53:29 WARNING juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
2024-02-27 04:53:30 INFO juju.worker.caasoperator.uniter.envoy/0.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch)
2024-02-27 04:53:30 INFO juju.worker.caasoperator.uniter.envoy/0 resolver.go:165 found queued "leader-elected" hook
2024-02-27 04:53:31 WARNING juju-log 0 containers are present in metadata.yaml and refresh_event was not specified. Defaulting to update_status. Metrics IP may not be set in a timely fashion.
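
The `juju status | grep -v active` filter used above also keeps headers and application rows. As a rough alternative, the sketch below prints only the unit rows whose workload status is not `active`. It assumes the default tabular `juju status` layout shown in this report (a `Unit` header row, one unit per line, a blank line ending the section), and `non_active_units` is a hypothetical name.

```shell
# Hypothetical helper: read tabular `juju status` output on stdin and
# print "<unit> <workload-status>" for every unit that is not active.
non_active_units() {
  awk '
    $1 == "Unit"                 { in_units = 1; next }  # start of the Unit table
    in_units && NF == 0          { in_units = 0 }        # a blank line ends the table
    in_units && $2 != "active"   { print $1, $2 }
  '
}
```

Usage would be `juju status | non_active_units`.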

Additional Context

No response

@gustavosr98 gustavosr98 added the bug Something isn't working label Feb 27, 2024

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5391.

This message was autogenerated

@kimwnasptd
Contributor

@gustavosr98 this is also discussed in canonical/bundle-kubeflow#818.

Currently, 1.8 does not yet work in air-gapped environments, mainly because upstream KFP does not yet support air-gapped deployments. We are waiting for the upstream patch release that enables this before working on it.

@NohaIhab
Contributor

This is now resolved after addressing canonical/kfp-operators#452 and #98.
A script for deploying CKF 1.8 in air-gapped environments will be added shortly as part of canonical/bundle-kubeflow#818.
