Checksums #502

Open
wants to merge 37 commits into
base: 1.8

@314eter 314eter commented Aug 19, 2014

No description provided.


Problem
=======
If a node crashed, and failed to write some tlog entries to disk, this is not detected by Arakoon. The node announces it's in sync up to the last entry in the tlogs, even if other nodes diverged while the node was offline.

I think we should mention here that this should normally not be possible. "Normally" means: fsync is set to true (which is the default) and none of the layers below (file system, hardware) lies about fsync behaviour.
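
To illustrate the problem quoted above, here is a minimal sketch of why comparing only the last index cannot reveal a diverged history, while a checksum that covers the whole history can. Everything in it is an assumption made for illustration: the entry strings, the names `next_checksum` and `checksum_of_log`, and the use of the standard library's `Digest` (MD5) are not taken from this pull request, and the scheme Arakoon actually uses may differ.

```ocaml
(* Hypothetical sketch, not Arakoon's actual checksum scheme. *)

(* Checksum of the empty history. *)
let initial = Digest.string ""

(* Fold the previous checksum into the next one, so that each checksum
   covers every entry up to and including the current one. *)
let next_checksum prev entry = Digest.string (prev ^ entry)

let checksum_of_log entries = List.fold_left next_checksum initial entries

let () =
  (* Two nodes with the same number of entries but a different last entry. *)
  let node_a = ["set a 1"; "set b 2"; "set c 3"] in
  let node_b = ["set a 1"; "set b 2"; "set c 9"] in
  Printf.printf "same last index: %b\n"
    (List.length node_a = List.length node_b);
  Printf.printf "same history: %b\n"
    (Digest.equal (checksum_of_log node_a) (checksum_of_log node_b))
```

Because each checksum depends on the previous one, a single stored value per index is enough to compare two nodes' histories up to that index.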


let f_entry (i, value) =
let validate = validate () in
tlog_coll # log_value_explicit i value ~validate:validate false None >>= fun _ ->

You can write this as tlog_coll # log_value_explicit i value ~validate false None >>= fun () ->
Note the ~validate and the >>= fun ().
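
The two points in this suggestion can be seen in isolation in the small sketch below. `log_value`, `write_entry` and the toy `>>=` are hypothetical stand-ins (the real code uses Lwt and `tlog_coll # log_value_explicit`); only the label punning and the `fun () ->` pattern are the point.

```ocaml
(* Hypothetical stand-in for a function with a labelled argument. *)
let log_value i value ~validate =
  if validate then Printf.sprintf "%d:%s (validated)" i value
  else Printf.sprintf "%d:%s" i value

(* Toy bind standing in for Lwt's [>>=]; an assumption for illustration. *)
let ( >>= ) x f = f x

(* Hypothetical operation whose result is unit. *)
let write_entry (_ : string) : unit = ()

let () =
  let validate = true in
  (* Explicit form: label and variable are both spelled out. *)
  let a = log_value 1 "v" ~validate:validate in
  (* Punned form: [~validate] is shorthand for [~validate:validate]. *)
  let b = log_value 1 "v" ~validate in
  assert (a = b);
  (* [fun () ->] only type-checks when the result really is unit,
     whereas [fun _ ->] would silently discard any value. *)
  write_entry "x" >>= fun () -> print_endline "done"
```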

@@ -882,6 +901,8 @@ struct
>>= fun () ->
_insert_updates store updates' kt >>= fun (urs:update_result list) ->
_with_transaction store kt (fun tx -> _incr_i store tx) >>= fun () ->
let cs = Value.checksum_of value in
_with_transaction store kt (fun tx -> _set_checksum store cs tx) >>= fun () ->

I know you're already changing this, but as a reminder to myself: set_checksum should happen in the same transaction as the one that bumps the i just above
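
A minimal sketch of the point made in this comment, using a toy in-memory store: when the counter bump and the checksum update share one transaction, a failure part-way through cannot leave one updated without the other. The `store` type and the functions `with_transaction`, `incr_i` and `set_checksum` below are hypothetical and much simpler than the `_with_transaction`, `_incr_i` and `_set_checksum` in the diff.

```ocaml
(* Toy in-memory store; all names are hypothetical, not Arakoon's API. *)
type store = { mutable i : int; mutable checksum : string option }

(* Run [f] against a scratch copy and commit only if it returns normally,
   so either every change made by [f] becomes visible or none does. *)
let with_transaction store f =
  let tx = { i = store.i; checksum = store.checksum } in
  let result = f tx in
  store.i <- tx.i;
  store.checksum <- tx.checksum;
  result

let incr_i tx = tx.i <- tx.i + 1
let set_checksum tx cs = tx.checksum <- Some cs

let () =
  let store = { i = 0; checksum = None } in
  (* One transaction: the counter and its checksum move together. *)
  with_transaction store (fun tx -> incr_i tx; set_checksum tx "abc");
  assert (store.i = 1 && store.checksum = Some "abc");
  (* A failure between the two steps leaves the store untouched. *)
  (try with_transaction store (fun tx -> incr_i tx; failwith "crash")
   with Failure _ -> ());
  assert (store.i = 1 && store.checksum = Some "abc")
```

With two separate transactions, as in the quoted hunk, a crash between them could leave the counter pointing at an entry whose checksum was never stored.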

@domsj domsj commented Aug 20, 2014

Should we have a test for the upgrade scenario (as discussed earlier)?
Ideally, a cluster is started with all 1.7 nodes, which are then upgraded to 1.8 one by one.
After each node upgrade, it should be possible to perform (e.g.) a test_and_set operation.

cp -r pylabs/test/* /opt/qbase3/var/tests/arakoon_system_tests/
mkdir -p /opt/qbase3/lib/python/site-packages/arakoon/
cp src/client/python/*.py /opt/qbase3/lib/python/site-packages/arakoon/
sudo mkdir -p /opt/qbase3/apps/arakoon/bin/

These shouldn't be required.


4 participants