Microceph not starting after upgrade to reef/stable #342

Open
usma0118 opened this issue Apr 22, 2024 · 12 comments
Labels
APPARMOR Denials · bug (Something isn't working) · workaround-available (Label for issues/bugs which have a workaround available.)

Comments

@usma0118

usma0118 commented Apr 22, 2024

Issue report

After upgrading from snap revision 707 to 975, I am getting the error "Failed to initialize trust store".

I looked into the issue reported in #336 and copied the values into the trust store as described in #269, but it still fails.

What version of MicroCeph are you using ?

18.2.0+snap450240f5dd (Single node)

Use this section to describe the channel/revision which produces the unexpected behavior.
This information can be fetched from the installed: section of sudo snap info microceph output.

What are the steps to reproduce this issue ?

Upgraded from snap version 707 to 975

What happens (observed behaviour) ?

Mar 01 20:27:02 antaresinc-cluster microceph.daemon[2178]: Error: Unable to start daemon: Daemon failed to start: Failed to initialize trust store: Failed to parse local record "". Found empty certificate
Mar 01 20:27:02 antaresinc-cluster microceph.daemon[2359]: time="2024-03-01T20:27:02+01:00" level=info msg="Daemon stopped"
Mar 01 20:27:02 antaresinc-cluster microceph.daemon[2359]: Error: Unable to start daemon: Daemon failed to start: Failed to initialize trust store: Failed to parse local record "". Found empty certificate
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Main process exited, code=exited, status=1/FAILURE
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Mar 01 20:27:02 antaresinc-cluster microceph.mds[1408]: starting mds.antaresinc-cluster at
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Scheduled restart job, restart counter is at 5.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: Stopped snap.microceph.daemon.service - Service for snap application microceph.daemon.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Start request repeated too quickly.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: snap.microceph.daemon.service: Failed with result 'exit-code'.
Mar 01 20:27:02 antaresinc-cluster systemd[1]: Failed to start snap.microceph.daemon.service - Service for snap application microceph.daemon.

What were you expecting to happen ?

Relevant logs, error output, etc.

Mar 03 01:01:59 antaresinc-cluster audit[2422]: AVC apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.osd" pid=2422 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.298:190): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.osd" pid=2422 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:01:59 antaresinc-cluster audit[3590529]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster audit[3590529]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.438:191): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.438:192): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster audit[3590542]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.642:193): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.642:194): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:01:59 antaresinc-cluster audit[3590542]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590542 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[1410]: AVC apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.mon" pid=1410 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.126:195): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.mon" pid=1410 comm="admin_socket" capability=24  capname="sys_resource"
Mar 03 01:02:00 antaresinc-cluster audit[3590559]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[3590559]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.198:196): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.198:197): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590559 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[3590561]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster audit[3590561]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.374:198): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 01:02:00 antaresinc-cluster kernel: audit: type=1400 audit(1709424120.374:199): apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.mon" name="/usr/bin/sudo" pid=3590561 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
Mar 03 05:01:38 antaresinc-cluster microceph.mgr[1409]: 2024-03-03T05:01:38.591+0100 7f5e60947640 -1 mgr handle_mgr_map I was active but no longer am
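
Each denial above shows up several times (once from `audit[...]` and again from `kernel: audit:`), which makes the log hard to scan. A minimal sketch for collapsing such records into counts per profile, operation, and target follows. The helper names `parse_denial` and `summarize` are hypothetical and not part of MicroCeph or AppArmor tooling; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches key="value" and key=value pairs inside an audit record.
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_denial(line):
    """Return the key/value fields of an AppArmor DENIED record, or None."""
    if 'apparmor="DENIED"' not in line:
        return None
    return {key: quoted or bare for key, quoted, bare in PAIR_RE.findall(line)}

def summarize(lines):
    """Count denials per (profile, operation, target)."""
    counts = Counter()
    for line in lines:
        fields = parse_denial(line)
        if fields is None:
            continue  # not a denial record
        # File denials carry name=..., capability denials carry capname=...
        target = fields.get("name") or fields.get("capname") or "?"
        key = (fields.get("profile", "?"), fields.get("operation", "?"), target)
        counts[key] += 1
    return counts

SAMPLE = [
    'Mar 03 01:01:59 antaresinc-cluster audit[2422]: AVC apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.osd" pid=2422 comm="admin_socket" capability=24  capname="sys_resource"',
    'Mar 03 01:01:59 antaresinc-cluster kernel: audit: type=1400 audit(1709424119.298:190): apparmor="DENIED" operation="capable" class="cap" profile="snap.microceph.osd" pid=2422 comm="admin_socket" capability=24  capname="sys_resource"',
    'Mar 03 01:01:59 antaresinc-cluster audit[3590529]: AVC apparmor="DENIED" operation="exec" class="file" profile="snap.microceph.osd" name="/usr/bin/sudo" pid=3590529 comm="admin_socket" requested_mask="x" denied_mask="x" fsuid=0 ouid=0',
    'Mar 03 05:01:38 antaresinc-cluster microceph.mgr[1409]: 2024-03-03T05:01:38.591+0100 7f5e60947640 -1 mgr handle_mgr_map I was active but no longer am',
]

for (profile, operation, target), count in summarize(SAMPLE).most_common():
    print(f"{count:3d}  {profile}  {operation}  {target}")
```

Feeding the full log through `summarize` reduces it to a handful of distinct denials (`sys_resource` and exec of `/usr/bin/sudo` for the `osd` and `mon` profiles), which is easier to compare across revisions.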

Additional comments.

@usma0118 usma0118 changed the title from "Microceph after upgrade" to "Microceph not starting after upgrade to reef/stable" Apr 22, 2024
@usma0118
Author

Related #219

@usma0118
Author

@UtkarshBhatthere any help?

@UtkarshBhatthere
Contributor

Thanks for sharing this issue @usma0118. We will take a look at it. I also do not think this is related to #219. That was a simpler command timeout that happened if the bootstrap process was slow.

@UtkarshBhatthere UtkarshBhatthere added the bug and APPARMOR Denials labels Apr 23, 2024
@usma0118
Author

AppArmor settings:

x profiles are in enforce mode.

   snap.microceph.ceph
   snap.microceph.daemon
   snap.microceph.hook.install
   snap.microceph.hook.post-refresh
   snap.microceph.mds
   snap.microceph.mgr
   snap.microceph.microceph
   snap.microceph.mon
   snap.microceph.osd
   snap.microceph.rados
   snap.microceph.radosgw-admin
   snap.microceph.rbd
   snap.microceph.rgw

Processes are in enforce mode.

   /snap/microceph/975/bin/ceph-mds (14118) snap.microceph.mds
   /snap/microceph/975/bin/ceph-mgr (14119) snap.microceph.mgr
   /usr/bin/dash (14191) snap.microceph.osd
   /snap/microceph/975/bin/ceph-osd (14222) snap.microceph.osd

@usma0118
Author

usma0118 commented Apr 24, 2024

Another observation: the truststore was empty after the upgrade, so I had to manually create cluster.yaml.
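
For anyone checking whether they are in the same state, here is a rough sketch that lists truststore records which are empty or contain no certificate. It assumes the truststore directory mentioned later in this thread and assumes that each record embeds a PEM certificate; neither is confirmed here, and `find_suspect_records` is a hypothetical helper, not a MicroCeph command.

```python
from pathlib import Path

# Assumed location, taken from the path quoted later in this thread.
TRUSTSTORE_DIR = Path("/var/snap/microceph/common/state/truststore")

# Assumption: a valid record embeds a PEM-encoded certificate.
PEM_MARKER = "-----BEGIN CERTIFICATE-----"

def find_suspect_records(truststore_dir):
    """Return names of truststore files that are empty or lack a PEM certificate."""
    directory = Path(truststore_dir)
    suspects = []
    if not directory.is_dir():
        return suspects  # nothing to inspect
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        if not text.strip() or PEM_MARKER not in text:
            suspects.append(path.name)
    return suspects
```

Calling `find_suspect_records(TRUSTSTORE_DIR)` as root should return an empty list on a healthy node under these assumptions; any file it names is a candidate for the "Found empty certificate" error.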

@UtkarshBhatthere
Contributor

Yes, the empty truststore was an old issue (which you probably hit because you upgraded from an older revision). You should not see this going forward, since the fix has been merged in microcluster and we have refreshed our dependencies.

@usma0118
Author

After manually fixing the truststore, I can see the microceph services running.

But ceph status times out. Any ideas?

@UtkarshBhatthere
Contributor

Are the required ceph config files present under the /var/snap/microceph path?
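
A quick way to answer this could look like the sketch below. The directory and the two file names are assumptions based on the paths mentioned later in this thread, and `missing_config_files` is a hypothetical helper, so adjust them to your installation.

```python
from pathlib import Path

# Assumed location and file names, based on paths quoted later in this thread.
CONF_DIR = Path("/var/snap/microceph/current/conf")
EXPECTED = ("ceph.conf", "ceph.client.admin.keyring")

def missing_config_files(conf_dir=CONF_DIR, expected=EXPECTED):
    """Return the expected file names that are absent or empty in conf_dir."""
    conf_dir = Path(conf_dir)
    missing = []
    for name in expected:
        path = conf_dir / name
        try:
            present = path.is_file() and path.stat().st_size > 0
        except OSError:
            present = False  # unreadable counts as missing for this check
        if not present:
            missing.append(name)
    return missing
```

An empty result from `missing_config_files()` only means the files exist and are non-empty; it does not validate their contents.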

@UtkarshBhatthere
Contributor

Also, @usma0118, please feel free to discuss this directly in our Matrix room if some back-and-forth is required.

@usma0118
Author

usma0118 commented May 7, 2024

I had to manually create the config files. After that, the microceph services started.

@UtkarshBhatthere
Contributor

It would be awesome if you could share a bit about which config files you had to create to get it working.

@usma0118
Author

Difficult to remember, but based on command history:

/var/snap/microceph/common/state/truststore/ (cluster file)

Symlink /var/snap/microceph//current/conf/ceph.client.admin.keyring and /var/snap/microceph//current/conf/ceph.conf to the required places (can't recall which).
