[Bug] bookie-2 is not able to recover after losing the filesystem #23047

Open
3 tasks done
cccdemon opened this issue Jul 17, 2024 · 1 comment
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments


cccdemon commented Jul 17, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Kubernetes 1.29
Pulsar 3.2.3

Minimal reproduce step

Scale down to 1 bookie.
Delete the related PVC/PV for bookie-2.

What did you expect to see?

A full recovery by 1 bookie.

What did you see instead?

2024-07-17T10:10:23,047+0000 [main] ERROR org.apache.bookkeeper.bookie.LegacyCookieValidation - There are directories without a cookie, and this is neither a new environment, nor is storage expansion enabled. Empty directories are [/pulsar/data/bookkeeper/journal/current, /pulsar/data/bookkeeper/ledgers/current]
2024-07-17T10:10:23,048+0000 [main] ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException:
    at org.apache.bookkeeper.bookie.LegacyCookieValidation.checkCookies(LegacyCookieValidation.java:113) ~[org.apache.bookkeeper-bookkeeper-server-4.16.5.jar:4.16.5]
    at org.apache.bookkeeper.server.EmbeddedServer$Builder.build(EmbeddedServer.java:408) ~[org.apache.bookkeeper-bookkeeper-server-4.16.5.jar:4.16.5]
    at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:277) ~[org.apache.bookkeeper-bookkeeper-server-4.16.5.jar:4.16.5]
    at org.apache.bookkeeper.server.Main.doMain(Main.java:216) ~[org.apache.bookkeeper-bookkeeper-server-4.16.5.jar:4.16.5]
    at org.apache.bookkeeper.server.Main.main(Main.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.5.jar:4.16.5]

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
cccdemon added the type/bug label on Jul 17, 2024

vonsch commented Sep 13, 2024

Hello, we also encountered this issue in our environment. We have an HA setup with three bookies across three availability zones, and we lost the bookie storage/disk in one of the availability zones (the Kubernetes PVC and PV were deleted). When a new bookie was started and a new blank Kubernetes PV was auto-provisioned, the bookie failed to start.

We were able to recover it without destroying the whole Pulsar deployment by manually removing the broken bookie from the cluster and then restarting the bookie pod:

# Connect to any functional bookie pod
kubectl -n pulsar exec -it pulsar-bookie-0 -- /bin/bash
# Get the proper BookieID from the output
./bin/bookkeeper shell listbookies -a
# Decommission the broken bookie (the BookieID below is an example)
./bin/bookkeeper shell decommissionbookie -bookieid pulsar-bookie-1.pulsar-bookie.pulsar.svc.cluster.local:3181
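
For anyone scripting this, here is a minimal Python sketch that builds a BookieID in the same `host:port` shape as the example above and prints the matching decommission command. The helper names `bookie_id` and `decommission_command` are hypothetical, and the defaults (StatefulSet and headless service both named `pulsar-bookie`, namespace `pulsar`, port 3181) are assumptions taken from that one example. Your cluster may register bookies under different IDs, so always confirm against the `listbookies -a` output before decommissioning anything.

```python
import shlex


def bookie_id(ordinal: int,
              statefulset: str = "pulsar-bookie",
              namespace: str = "pulsar",
              port: int = 3181) -> str:
    """Build a BookieID in the host:port form used in the example above.

    Assumption: the headless service has the same name as the StatefulSet,
    which matches the example ID but may not hold for every deployment.
    """
    if ordinal < 0:
        raise ValueError("ordinal must be non-negative")
    host = f"{statefulset}-{ordinal}.{statefulset}.{namespace}.svc.cluster.local"
    return f"{host}:{port}"


def decommission_command(target_id: str) -> str:
    """Return the decommission command line for the given BookieID."""
    args = ["./bin/bookkeeper", "shell", "decommissionbookie",
            "-bookieid", target_id]
    # Quote each argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Prints the command only; it does not run anything against the cluster.
    print(decommission_command(bookie_id(1)))
```

The script only prints the command, so you can review it and run it by hand from inside a healthy bookie pod.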
