CURRENT file pointing to missing MANIFEST file. [JIRA: RIAK-1789] #153

Open
angrycub opened this issue Apr 30, 2015 · 5 comments

Comments

@angrycub

Re: Zendesk Ticket #10730

After a restart of a node, the Riak vnodes wouldn't start because the CURRENT file was pointing to a nonexistent version of the MANIFEST. How did these get out of sync?

@Basho-JIRA changed the title from "CURRENT file pointing to missing MANIFEST file." to "CURRENT file pointing to missing MANIFEST file. [JIRA: RIAK-1789]" on Apr 30, 2015
@engelsanchez

It is possible that we are seeing the problem described in the "All File Systems Are Not Created Equal" paper. Basically, syncing the data files and doing atomic renames is not enough to ensure consistency of the set of files in the directory. On Linux systems the directory itself may also need an fsync to ensure that the file operations are persisted in the expected order and the new file modifications survive a crash.
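
To make the sequence concrete, here is a minimal Python sketch of the pattern described above: write a temporary file, fsync it, rename it over CURRENT, then fsync the containing directory. This is not the LevelDB or eleveldb implementation; the function name `set_current` and the temporary file name `CURRENT.tmp` are assumptions made for illustration, and the directory fsync as written applies to POSIX systems only.

```python
import os


def set_current(db_dir: str, manifest_name: str) -> None:
    """Point CURRENT at manifest_name, syncing the file and its directory.

    Hypothetical helper for illustration; not part of LevelDB or eleveldb.
    """
    tmp_path = os.path.join(db_dir, "CURRENT.tmp")  # assumed temp file name
    final_path = os.path.join(db_dir, "CURRENT")

    # 1. Write the new contents to a temporary file and flush it to disk.
    with open(tmp_path, "w") as f:
        f.write(manifest_name + "\n")
        f.flush()
        os.fsync(f.fileno())

    # 2. Atomically replace CURRENT with the temporary file.
    os.replace(tmp_path, final_path)

    # 3. fsync the directory so the rename itself is persisted.
    #    Without this step the rename may not survive a crash on
    #    some file systems, even though the file data was synced.
    dir_fd = os.open(db_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The same reasoning would apply to the creation of the MANIFEST file itself: if its directory entry is not persisted before CURRENT is updated, CURRENT can end up naming a file that does not exist after a crash.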

@matthewvon
Contributor

@angrycub ... define "restart of a node" ... riak restart, linux restart, machine died and was restarted, etc.

@angrycub
Author

angrycub commented Jun 1, 2015

From the ticket narrative:

"At the same time this node was discovered down, two other nodes were being reported by ring-status as being down, but their beam processes were still running. Since the cluster already believed them unavailable, I killed the beam process on each of these nodes and restarted riak successfully. Currently all but one of the cluster members are up."

So I'm going to say, killed after being in an "indeterminate" state and then Riak was restarted. No indication that the physical nodes were rebooted; however, I can contact the user in question for more details if you'd like.

@matthewvon
Contributor

Let me ponder this ... however, a vnode repair would fix it. Did you already do that?

@angrycub
Author

angrycub commented Jun 1, 2015

That was how we addressed that particular ticket. Dumb shell script to find the partitions where the filename in the CURRENT file did not exist in that partition's folder and echo out the partition IDs. Ran eleveldb:repair on all of them and all was well.
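
The original shell script is not included in this thread. As a rough Python sketch of the same check, assuming each partition lives in its own subdirectory under a single data root and that CURRENT holds the MANIFEST file name on one line (the function name `find_broken_partitions` is hypothetical):

```python
import os


def find_broken_partitions(data_root: str) -> list[str]:
    """Return partition directory names whose CURRENT names a missing file.

    Assumes one subdirectory per partition under data_root; directories
    without a CURRENT file are skipped rather than reported.
    """
    broken = []
    for name in sorted(os.listdir(data_root)):
        part_dir = os.path.join(data_root, name)
        current_path = os.path.join(part_dir, "CURRENT")
        if not os.path.isfile(current_path):
            continue  # not a LevelDB partition directory, or no CURRENT at all

        # CURRENT contains the MANIFEST file name followed by a newline.
        with open(current_path) as f:
            manifest_name = f.read().strip()

        manifest_path = os.path.join(part_dir, manifest_name)
        if not manifest_name or not os.path.isfile(manifest_path):
            broken.append(name)
    return broken
```

Each returned name is a partition ID that could then be handed to the repair step described above.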
