katib-db-manager: waiting for relational-db data #961

Open
ACodingfreak opened this issue Jul 1, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@ACodingfreak

ACodingfreak commented Jul 1, 2024

Bug Description

As shown in the logs below, katib-db-manager is continuously waiting for the relational-db data from MySQL (katib-db), which is itself still stuck at "installing agent".

$ juju status
Model     Controller  Cloud/Region      Version  SLA          Timestamp
kubeflow  uk8sx       my-k8s/localhost  3.4.4    unsupported  11:04:24-07:00

App                        Version                  Status   Scale  Charm                    Channel          Rev  Address         Exposed  Message
admission-webhook                                   active       1  admission-webhook        1.8/stable       301  10.152.183.232  no
argo-controller                                     active       1  argo-controller          3.3.10/stable    424  10.152.183.54   no
dex-auth                                            active       1  dex-auth                 2.36/stable      422  10.152.183.205  no
envoy                      res:oci-image@cc06b3e    active       1  envoy                    2.0/stable       194  10.152.183.32   no
istio-ingressgateway                                active       1  istio-gateway            1.17/stable     1000  10.152.183.49   no
istio-pilot                                         active       1  istio-pilot              1.17/stable     1011  10.152.183.106  no
jupyter-controller                                  active       1  jupyter-controller       1.8/stable       849  10.152.183.233  no
jupyter-ui                                          active       1  jupyter-ui               1.8/stable       858  10.152.183.20   no
katib-controller           res:oci-image@31ccd70    active       1  katib-controller         0.16/stable      576  10.152.183.152  no
katib-db                                            waiting      1  mysql-k8s                8.0/stable       153  10.152.183.57   no       installing agent
katib-db-manager                                    waiting      1  katib-db-manager         0.16/stable      539  10.152.183.107  no       installing agent
katib-ui                                            active       1  katib-ui                 0.16/stable      422  10.152.183.183  no
kfp-api                                             active       1  kfp-api                  2.0/stable      1283  10.152.183.141  no
kfp-db                     8.0.36-0ubuntu0.22.04.1  active       1  mysql-k8s                8.0/stable       153  10.152.183.39   no
kfp-metadata-writer                                 active       1  kfp-metadata-writer      2.0/stable       334  10.152.183.100  no
kfp-persistence                                     active       1  kfp-persistence          2.0/stable      1291  10.152.183.179  no
kfp-profile-controller                              waiting      1  kfp-profile-controller   2.0/stable      1315  10.152.183.46   no       installing agent
kfp-schedwf                                         active       1  kfp-schedwf              2.0/stable      1302  10.152.183.242  no
kfp-ui                                              active       1  kfp-ui                   2.0/stable      1285  10.152.183.91   no
kfp-viewer                                          active       1  kfp-viewer               2.0/stable      1317  10.152.183.70   no
kfp-viz                                             active       1  kfp-viz                  2.0/stable      1235  10.152.183.137  no
knative-eventing                                    active       1  knative-eventing         1.10/stable      353  10.152.183.184  no
knative-operator                                    active       1  knative-operator         1.10/stable      328  10.152.183.206  no
knative-serving                                     active       1  knative-serving          1.10/stable      409  10.152.183.249  no
kserve-controller                                   active       1  kserve-controller        0.11/stable      523  10.152.183.81   no
kubeflow-dashboard                                  active       1  kubeflow-dashboard       1.8/stable       582  10.152.183.83   no
kubeflow-profiles                                   active       1  kubeflow-profiles        1.8/stable       355  10.152.183.222  no
kubeflow-roles                                      active       1  kubeflow-roles           1.8/stable       187  10.152.183.30   no
kubeflow-volumes           res:oci-image@2261827    active       1  kubeflow-volumes         1.8/stable       260  10.152.183.193  no
metacontroller-operator                             active       1  metacontroller-operator  3.0/stable       252  10.152.183.151  no
minio                      res:oci-image@1755999    active       1  minio                    ckf-1.8/stable   278  10.152.183.155  no
mlmd                       res:oci-image@44abc5d    active       1  mlmd                     1.14/stable      127  10.152.183.95   no
oidc-gatekeeper                                     active       1  oidc-gatekeeper          ckf-1.8/stable   350  10.152.183.234  no
pvcviewer-operator                                  active       1  pvcviewer-operator       1.8/stable        30  10.152.183.38   no
seldon-controller-manager                           active       1  seldon-core              1.17/stable      664  10.152.183.25   no
tensorboard-controller                              active       1  tensorboard-controller   1.8/stable       257  10.152.183.238  no
tensorboards-web-app                                active       1  tensorboards-web-app     1.8/stable       245  10.152.183.171  no
training-operator                                   active       1  training-operator        1.7/stable       347  10.152.183.108  no

Unit                          Workload     Agent  Address       Ports          Message
admission-webhook/0*          active       idle   10.1.210.231
argo-controller/0*            active       idle   10.1.69.189
dex-auth/0*                   active       idle   10.1.69.191
envoy/0*                      active       idle   10.1.69.156   9090,9901/TCP
istio-ingressgateway/0*       active       idle   10.1.69.190
istio-pilot/0*                active       idle   10.1.69.133
jupyter-controller/0*         active       idle   10.1.210.233
jupyter-ui/0*                 active       idle   10.1.236.251
katib-controller/0*           active       idle   10.1.69.134   443,8080/TCP
katib-db-manager/0*           waiting      idle   10.1.210.234                 Waiting for relational-db data
katib-db/0*                   unknown      idle   10.1.69.130
katib-ui/0*                   active       idle   10.1.210.235
kfp-api/0*                    active       idle   10.1.236.253
kfp-db/0*                     active       idle   10.1.236.255                 Primary
kfp-metadata-writer/0*        active       idle   10.1.210.236
kfp-persistence/0*            active       idle   10.1.210.237
kfp-profile-controller/0*     maintenance  idle   10.1.210.238                 Reconciling charm: executing component container:kfp-profile-controller
kfp-schedwf/0*                active       idle   10.1.69.131
kfp-ui/0*                     active       idle   10.1.210.242
kfp-viewer/0*                 active       idle   10.1.210.241
kfp-viz/0*                    active       idle   10.1.69.135
knative-eventing/0*           active       idle   10.1.210.239
knative-operator/0*           active       idle   10.1.69.139
knative-serving/0*            active       idle   10.1.210.240
kserve-controller/0*          active       idle   10.1.210.245
kubeflow-dashboard/0*         active       idle   10.1.210.243
kubeflow-profiles/0*          active       idle   10.1.69.141
kubeflow-roles/0*             active       idle   10.1.69.129
kubeflow-volumes/0*           active       idle   10.1.69.136   5000/TCP
metacontroller-operator/0*    active       idle   10.1.69.137
minio/0*                      active       idle   10.1.236.204  9000-9001/TCP
mlmd/0*                       active       idle   10.1.69.138   8080/TCP
oidc-gatekeeper/0*            active       idle   10.1.236.198
pvcviewer-operator/0*         active       idle   10.1.69.140
seldon-controller-manager/0*  active       idle   10.1.236.199
tensorboard-controller/0*     active       idle   10.1.210.246
tensorboards-web-app/0*       active       idle   10.1.236.201
training-operator/0*          active       idle   10.1.210.247

To Reproduce

sudo snap install microk8s --channel=1.29/stable --classic
sudo snap install juju --classic --channel=3.4/stable
microk8s config | juju add-k8s my-k8s --client
juju bootstrap my-k8s uk8sx
juju add-model kubeflow
juju deploy kubeflow --trust --channel=1.8/stable

Environment

Microk8s: 1.29/stable
Juju: 3.4/stable
Kubeflow: 1.8/stable

Relevant Log Output

$ kubectl logs katib-db-manager-0 -n kubeflow

2024-07-01T17:53:26.548Z [container-agent] 2024-07-01 17:53:26 INFO juju-log HTTP Request: PATCH https://10.152.183.1/apis/rbac.authorization.k8s.io/v1/clusterroles/katib-db-manager?force=true&fieldManager=lightkube "HTTP/1.1 200 OK"
2024-07-01T17:53:26.760Z [container-agent] 2024-07-01 17:53:26 INFO juju-log HTTP Request: PATCH https://10.152.183.1/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/katib-db-manager?force=true&fieldManager=lightkube "HTTP/1.1 200 OK"
2024-07-01T17:53:26.835Z [container-agent] 2024-07-01 17:53:26 INFO juju-log Reconcile completed successfully
2024-07-01T17:53:27.004Z [container-agent] 2024-07-01 17:53:27 INFO juju-log Found empty relation data for relational-db relation.
2024-07-01T17:53:27.052Z [container-agent] 2024-07-01 17:53:27 ERROR juju-log Failed to handle <UpdateStatusEvent via KatibDBManagerOperator/on/update_status[181]> with error: Waiting for relational-db data
2024-07-01T17:53:27.260Z [container-agent] 2024-07-01 17:53:27 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/katib-db-manager "HTTP/1.1 200 OK"
2024-07-01T17:53:27.423Z [container-agent] 2024-07-01 17:53:27 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/katib-db-manager "HTTP/1.1 200 OK"
2024-07-01T17:53:27.496Z [container-agent] 2024-07-01 17:53:27 INFO juju-log Kubernetes service 'katib-db-manager' patched successfully
2024-07-01T17:53:27.825Z [container-agent] 2024-07-01 17:53:27 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2024-07-01T17:58:35.227Z [container-agent] 2024-07-01 17:58:35 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apiextensions.k8s.io/v1/customresourcedefinitions "HTTP/1.1 200 OK"
2024-07-01T17:58:35.632Z [container-agent] 2024-07-01 17:58:35 INFO juju-log Rendering manifests
2024-07-01T17:58:35.871Z [container-agent] 2024-07-01 17:58:35 INFO juju-log HTTP Request: PATCH https://10.152.183.1/apis/rbac.authorization.k8s.io/v1/clusterroles/katib-db-manager?force=true&fieldManager=lightkube "HTTP/1.1 200 OK"
2024-07-01T17:58:36.038Z [container-agent] 2024-07-01 17:58:36 INFO juju-log HTTP Request: PATCH https://10.152.183.1/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/katib-db-manager?force=true&fieldManager=lightkube "HTTP/1.1 200 OK"
2024-07-01T17:58:36.110Z [container-agent] 2024-07-01 17:58:36 INFO juju-log Reconcile completed successfully
2024-07-01T17:58:36.278Z [container-agent] 2024-07-01 17:58:36 INFO juju-log Found empty relation data for relational-db relation.
2024-07-01T17:58:36.322Z [container-agent] 2024-07-01 17:58:36 ERROR juju-log Failed to handle <UpdateStatusEvent via KatibDBManagerOperator/on/update_status[186]> with error: Waiting for relational-db data
2024-07-01T17:58:36.517Z [container-agent] 2024-07-01 17:58:36 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/katib-db-manager "HTTP/1.1 200 OK"
2024-07-01T17:58:36.680Z [container-agent] 2024-07-01 17:58:36 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/katib-db-manager "HTTP/1.1 200 OK"
2024-07-01T17:58:36.750Z [container-agent] 2024-07-01 17:58:36 INFO juju-log Kubernetes service 'katib-db-manager' patched successfully
2024-07-01T17:58:37.078Z [container-agent] 2024-07-01 17:58:37 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)

Logs from mysql katib-db container

mysql.zip

Additional Context

No response

@ACodingfreak added the "bug" (Something isn't working) label on Jul 1, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5950.

This message was autogenerated

@DnPlas
Contributor

DnPlas commented Jul 1, 2024

Hi @ACodingfreak thanks for reporting this!

Please note there is a katib-db-manager and a katib-db charm (which underneath is just the mysql-k8s charm). The katib-db-manager depends on the katib-db charm to be active, idle and serving; otherwise it will just go into a waiting status.

From the juju status output you have provided I can see:

katib-db                                            waiting      1  mysql-k8s                8.0/stable       153  10.152.183.57   no       installing agent

...
katib-db/0*                   unknown      idle   10.1.69.130

As you mentioned, it is stuck in waiting status with the "installing agent" message. Unfortunately, the logs you have provided come from the katib-db-manager charm, which doesn't seem to be the one causing issues.

Pinging @paulomach, @shayancanonical: have you folks run into this? What other logs could be useful for debugging this issue?
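
To spot at a glance which applications and units are holding things up, the status can be filtered programmatically. The following is a minimal Python sketch, not part of any charm: the function name `unsettled` and the `SAMPLE` dictionary are hypothetical, and the key names (`applications`, `application-status`, `units`, `workload-status`) are assumed to match what `juju status --format=json` emits. The sample values are copied from the status output in this issue.

```python
def unsettled(status: dict) -> list[tuple[str, str, str]]:
    """Return (name, status, message) for each application or unit that is not 'active'."""
    rows = []
    for app_name, app in status.get("applications", {}).items():
        app_status = app.get("application-status", {})
        if app_status.get("current") != "active":
            rows.append((app_name,
                         app_status.get("current", "unknown"),
                         app_status.get("message", "")))
        # "units" may be missing or null for some applications, hence the `or {}`.
        for unit_name, unit in (app.get("units") or {}).items():
            workload = unit.get("workload-status", {})
            if workload.get("current") != "active":
                rows.append((unit_name,
                             workload.get("current", "unknown"),
                             workload.get("message", "")))
    return rows


# Hypothetical, heavily trimmed sample; the shape is an assumption about the JSON output.
SAMPLE = {
    "applications": {
        "katib-db": {
            "application-status": {"current": "waiting", "message": "installing agent"},
            "units": {"katib-db/0": {"workload-status": {"current": "unknown"}}},
        },
        "katib-db-manager": {
            "application-status": {"current": "waiting", "message": "installing agent"},
            "units": {
                "katib-db-manager/0": {
                    "workload-status": {
                        "current": "waiting",
                        "message": "Waiting for relational-db data",
                    }
                }
            },
        },
        "katib-ui": {
            "application-status": {"current": "active"},
            "units": {"katib-ui/0": {"workload-status": {"current": "active"}}},
        },
    }
}

if __name__ == "__main__":
    for name, current, message in unsettled(SAMPLE):
        print(f"{name}: {current} {message}".rstrip())
```

In practice the dictionary would come from `json.loads` applied to the output of `juju status --format=json`. With the sample above, both katib-db and katib-db-manager are reported, and katib-db/0 is the unit with the unknown workload status.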

@shayancanonical
Contributor

My guess is that there is some sort of pebble issue causing mysqld to not start up, followed by errors connecting to the mysqld service. Would you have the katib-db container still running? If so, would you be able to provide the output of pebble services as well as the contents of /var/log/mysql/error.log and any logs from /var/log/mysql/archive_error/*.log?

Relevant error traces from the mysql katib-db container logs are below.

The following error occurs numerous times first:

2024-07-01T16:44:14.625Z [container-agent] 2024-07-01 16:44:14 ERROR juju-log Uncaught exception while in charm code:
2024-07-01T16:44:14.625Z [container-agent] Traceback (most recent call last):
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 839, in <module>
2024-07-01T16:44:14.625Z [container-agent]     main(MySQLOperatorCharm)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 548, in main
2024-07-01T16:44:14.625Z [container-agent]     manager.run()
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 527, in run
2024-07-01T16:44:14.625Z [container-agent]     self._emit()
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 516, in _emit
2024-07-01T16:44:14.625Z [container-agent]     _emit_charm_event(self.charm, self.dispatcher.event_name)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
2024-07-01T16:44:14.625Z [container-agent]     event_to_emit.emit(*args, **kwargs)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 348, in emit
2024-07-01T16:44:14.625Z [container-agent]     framework._emit(event)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 860, in _emit
2024-07-01T16:44:14.625Z [container-agent]     self._reemit(event_path)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 950, in _reemit
2024-07-01T16:44:14.625Z [container-agent]     custom_handler(event)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:44:14.625Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 648, in _on_mysql_pebble_ready
2024-07-01T16:44:14.625Z [container-agent]     self._configure_instance(container)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:44:14.625Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 574, in _configure_instance
2024-07-01T16:44:14.625Z [container-agent]     container.restart(MYSQLD_SAFE_SERVICE)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/model.py", line 2226, in restart
2024-07-01T16:44:14.625Z [container-agent]     self._pebble.restart_services(service_names)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 2065, in restart_services
2024-07-01T16:44:14.625Z [container-agent]     return self._services_action('restart', services, timeout, delay)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 2085, in _services_action
2024-07-01T16:44:14.625Z [container-agent]     resp = self._request('POST', '/v1/services', body=body)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1859, in _request
2024-07-01T16:44:14.625Z [container-agent]     response = self._request_raw(method, path, query, headers, data)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1898, in _request_raw
2024-07-01T16:44:14.625Z [container-agent]     response = self.opener.open(request, timeout=self.timeout)
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/urllib/request.py", line 519, in open
2024-07-01T16:44:14.625Z [container-agent]     response = self._open(req, data)
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
2024-07-01T16:44:14.625Z [container-agent]     result = self._call_chain(self.handle_open, protocol, protocol +
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
2024-07-01T16:44:14.625Z [container-agent]     result = func(*args)
2024-07-01T16:44:14.625Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 373, in http_open
2024-07-01T16:44:14.625Z [container-agent]     return self.do_open(
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/urllib/request.py", line 1352, in do_open
2024-07-01T16:44:14.625Z [container-agent]     r = h.getresponse()
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/http/client.py", line 1375, in getresponse
2024-07-01T16:44:14.625Z [container-agent]     response.begin()
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/http/client.py", line 318, in begin
2024-07-01T16:44:14.625Z [container-agent]     version, status, reason = self._read_status()
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
2024-07-01T16:44:14.625Z [container-agent]     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
2024-07-01T16:44:14.625Z [container-agent]   File "/usr/lib/python3.10/socket.py", line 705, in readinto
2024-07-01T16:44:14.625Z [container-agent]     return self._sock.recv_into(b)
2024-07-01T16:44:14.625Z [container-agent] TimeoutError: timed out

After a while, the following error repeats in the logs:

2024-07-01T16:51:12.515Z [container-agent] 2024-07-01 16:51:12 INFO juju-log Adding pebble layer
2024-07-01T16:51:13.639Z [container-agent] 2024-07-01 16:51:13 ERROR juju-log Failed to connect to MySQL with mysqlsh
2024-07-01T16:51:13.639Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:13.639Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 602, in _run_mysqlsh_script
2024-07-01T16:51:13.639Z [container-agent]     stdout, _ = process.wait_output()
2024-07-01T16:51:13.639Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1635, in wait_output
2024-07-01T16:51:13.639Z [container-agent]     raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
2024-07-01T16:51:13.639Z [container-agent] ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2024-07-01T16:51:13Z: Loading startup files...\nverbose: 2024-07-01T16:51:13Z: Loading plugins...\nverbose: 2024-07-01T16:51:13Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\nTraceback (most recent call last):\n  File "<string>", line 1, in <module>\nmysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user \'serverconfig\'@\'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\' (using password: YES)\n'
2024-07-01T16:51:13.639Z [container-agent] 
2024-07-01T16:51:13.639Z [container-agent] During handling of the above exception, another exception occurred:
2024-07-01T16:51:13.639Z [container-agent] 
2024-07-01T16:51:13.639Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:13.639Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/mysql/v0/mysql.py", line 3108, in check_mysqlsh_connection
2024-07-01T16:51:13.639Z [container-agent]     self._run_mysqlsh_script("\n".join(connect_commands))
2024-07-01T16:51:13.639Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:13.639Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-07-01T16:51:13.639Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 605, in _run_mysqlsh_script
2024-07-01T16:51:13.639Z [container-agent]     raise MySQLClientError(e.stderr)
2024-07-01T16:51:13.639Z [container-agent] charms.mysql.v0.mysql.MySQLClientError: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
2024-07-01T16:51:13.639Z [container-agent] verbose: 2024-07-01T16:51:13Z: Loading startup files...
2024-07-01T16:51:13.639Z [container-agent] verbose: 2024-07-01T16:51:13Z: Loading plugins...
2024-07-01T16:51:13.639Z [container-agent] verbose: 2024-07-01T16:51:13Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local
2024-07-01T16:51:13.639Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:13.639Z [container-agent]   File "<string>", line 1, in <module>
2024-07-01T16:51:13.639Z [container-agent] mysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user 'serverconfig'@'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local' (using password: YES)

2024-07-01T16:51:46.858Z [container-agent] 
2024-07-01T16:51:46.916Z [container-agent] 2024-07-01 16:51:46 ERROR juju-log Uncaught exception while in charm code:
2024-07-01T16:51:46.916Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 839, in <module>
2024-07-01T16:51:46.916Z [container-agent]     main(MySQLOperatorCharm)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 548, in main
2024-07-01T16:51:46.916Z [container-agent]     manager.run()
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 527, in run
2024-07-01T16:51:46.916Z [container-agent]     self._emit()
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 516, in _emit
2024-07-01T16:51:46.916Z [container-agent]     _emit_charm_event(self.charm, self.dispatcher.event_name)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
2024-07-01T16:51:46.916Z [container-agent]     event_to_emit.emit(*args, **kwargs)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 348, in emit
2024-07-01T16:51:46.916Z [container-agent]     framework._emit(event)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 860, in _emit
2024-07-01T16:51:46.916Z [container-agent]     self._reemit(event_path)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 950, in _reemit
2024-07-01T16:51:46.916Z [container-agent]     custom_handler(event)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:46.916Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 642, in _on_mysql_pebble_ready
2024-07-01T16:51:46.916Z [container-agent]     self._reconcile_pebble_layer(container)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:46.916Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 395, in _reconcile_pebble_layer
2024-07-01T16:51:46.916Z [container-agent]     self._mysql.wait_until_mysql_connection()
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:46.916Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 330, in wrapped_f
2024-07-01T16:51:46.916Z [container-agent]     return self(f, *args, **kw)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 467, in __call__
2024-07-01T16:51:46.916Z [container-agent]     do = self.iter(retry_state=retry_state)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 368, in iter
2024-07-01T16:51:46.916Z [container-agent]     result = action(retry_state)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 410, in exc_check
2024-07-01T16:51:46.916Z [container-agent]     raise retry_exc.reraise()
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 183, in reraise
2024-07-01T16:51:46.916Z [container-agent]     raise self.last_attempt.result()
2024-07-01T16:51:46.916Z [container-agent]   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
2024-07-01T16:51:46.916Z [container-agent]     return self.__get_result()
2024-07-01T16:51:46.916Z [container-agent]   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2024-07-01T16:51:46.916Z [container-agent]     raise self._exception
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 470, in __call__
2024-07-01T16:51:46.916Z [container-agent]     result = fn(*args, **kwargs)
2024-07-01T16:51:46.916Z [container-agent]   File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 232, in wait_until_mysql_connection
2024-07-01T16:51:46.916Z [container-agent]     raise MySQLServiceNotRunningError("Connection with mysqlsh not possible")
2024-07-01T16:51:46.916Z [container-agent] charms.mysql.v0.mysql.MySQLServiceNotRunningError: Connection with mysqlsh not possible
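
With tracebacks this long, it can help to reduce the container log to a count of the exceptions that end each traceback. The following is a rough Python sketch under assumptions: every line carries the `<timestamp> [container-agent]` prefix seen in these logs, and an exception line starts with a dotted name ending in `Error` or `Exception` followed by a colon. The names `count_exceptions`, `PREFIX`, `EXC` and `SAMPLE` are made up for illustration; the sample lines are taken from the traces in this comment.

```python
import re
from collections import Counter

# Strips "<timestamp> [container-agent] " from the start of a log line.
PREFIX = re.compile(r"^\S+ \[container-agent\] ?")
# Matches an unindented "package.SomeError: ..." line that ends a traceback.
EXC = re.compile(r"^([A-Za-z_][\w.]*(?:Error|Exception)): ")


def count_exceptions(lines):
    """Count exception names found at the start of de-prefixed log lines."""
    counts = Counter()
    for line in lines:
        message = PREFIX.sub("", line.rstrip("\n"))
        match = EXC.match(message)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = """\
2024-07-01T16:44:14.625Z [container-agent]     return self._sock.recv_into(b)
2024-07-01T16:44:14.625Z [container-agent] TimeoutError: timed out
2024-07-01T16:51:13.639Z [container-agent] mysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user 'serverconfig'@'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local' (using password: YES)
2024-07-01T16:51:46.916Z [container-agent] charms.mysql.v0.mysql.MySQLServiceNotRunningError: Connection with mysqlsh not possible
""".splitlines()

if __name__ == "__main__":
    for name, count in count_exceptions(SAMPLE).most_common():
        print(count, name)
```

Run against the full log file, a summary like this would show whether the pebble `TimeoutError` or the `Access denied` error dominates, which is only a starting point for reading the full traces.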

@ACodingfreak
Author

ACodingfreak commented Jul 1, 2024

Hi All,

I did attach the logs from the katib-db container to this bug as mysql.zip. Is this good enough?

https://github.com/user-attachments/files/16057534/mysql.zip

Sorry to say, but because I need to bring up CKF quickly, I ended up downgrading the setup to microk8s 1.24, juju 2.9 and kubeflow 1.7.
Even there I am facing a different katib-manager issue, as shown below.

#963

Similar to microk8s inspect, is there a juju command that dumps all the logs needed for troubleshooting?

@ACodingfreak changed the title from "katib-db-manager - waiting for relational-db data" to "katib-db-manager: waiting for relational-db data" on Jul 2, 2024
@shayancanonical
Contributor

Unfortunately, we don't yet have a tool similar to microk8s inspect that dumps all the logs required for troubleshooting the mysql charm, but I believe we have something similar in our backlog.

Were you able to get CKF running? If not, would you be able to provide us with the environment details where you're deploying CKF so we can reproduce the issue?
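
Until such a tool exists, the usual commands can be bundled by hand. The sketch below is a hypothetical helper, not an official Juju or charm tool: `build_commands` and `collect` are invented names, the output file names are arbitrary, and it assumes the Juju model name is also the Kubernetes namespace (as with the `kubeflow` model in this issue) and that pods are named like `katib-db-0`. It relies only on `juju status`, `juju debug-log --replay --no-tail` and `kubectl logs`.

```python
import subprocess
from pathlib import Path


def build_commands(model: str, pods: list[str]) -> dict[str, list[str]]:
    """Map an output file name to the command whose output it should hold."""
    commands = {
        "juju-status.yaml": ["juju", "status", "-m", model, "--format=yaml"],
        "juju-debug-log.txt": ["juju", "debug-log", "-m", model, "--replay", "--no-tail"],
    }
    for pod in pods:
        # Assumption: the Kubernetes namespace has the same name as the Juju model.
        commands[f"{pod}.log"] = [
            "kubectl", "logs", pod, "-n", model, "--all-containers", "--timestamps",
        ]
    return commands


def collect(out_dir: str, model: str = "kubeflow",
            pods: tuple[str, ...] = ("katib-db-0", "katib-db-manager-0")) -> None:
    """Run every command and save stdout plus stderr under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, argv in build_commands(model, list(pods)).items():
        # check=False: keep going even if one command fails, and record its stderr.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        (out / filename).write_text(result.stdout + result.stderr)


if __name__ == "__main__":
    # Only print what would be run; call collect("some-directory") to actually gather logs.
    for filename, argv in build_commands("kubeflow", ["katib-db-0", "katib-db-manager-0"]).items():
        print(filename, "<-", " ".join(argv))
```

`collect` needs both `juju` and `kubectl` on the PATH. The files requested earlier in this thread, such as /var/log/mysql/error.log inside the katib-db container, are not covered and would still have to be fetched separately.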

@DnPlas
Contributor

DnPlas commented Jul 2, 2024

@shayancanonical I think they did:

sudo snap install microk8s --channel=1.29/stable --classic
sudo snap install juju --classic --channel=3.4/stable
microk8s config | juju add-k8s my-k8s --client
juju bootstrap my-k8s uk8sx
juju add-model kubeflow
juju deploy kubeflow --trust --channel=1.8/stable
