Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix undefined behavior on simultaneous write in daemon state file #326

Merged
merged 42 commits into from
Jun 6, 2024

Conversation

Buckram123
Copy link
Contributor

@Buckram123 Buckram123 commented Feb 12, 2024

Simultaneous daemon state file writes currently are not handled. This PR aims to handle such cases.
So there are 2 scenarios when simultaneous write can happen:

  1. Another program that uses daemon state in the process of writing in this file
  2. Another daemon of this program tries to write into a file

How this PR tries to fix it:
For the 1.

  • We lock a file with file-lock, which ensures another program won't be able to write in this file (We don't lock read-only daemonstates)

For the 2.

  • We create global state of locked files by this program with once_cell::sync::Lazy.
  • We disallow other daemons from any thread to access this lock, unless it's cloned directly from the daemon, who originally locked it.
  • As soon as every daemon dropped state we unlock file

Beyond safety this PR adds few optimizations:

  • Write json to a file only when all of the state clones done with using this state file
  • Read from a shared state instead of file, when it's in a locked state

Closes ORC-75, Closes ORC-84 Closes ORC-140

Copy link

cloudflare-workers-and-pages bot commented Feb 12, 2024

Deploying cw-orchestrator with  Cloudflare Pages  Cloudflare Pages

Latest commit: 7eae16a
Status:⚡️  Build in progress...

View logs

@Buckram123
Copy link
Contributor Author

It also can corrupt state, in such cases:

{
  "juno": {
    "juno-1": {
      "code_ids": {},
      "test": {
        "test0": "test-0",      }  "te}
  :
"test-3"
      }
    }
  }
}

@Buckram123 Buckram123 changed the title MVE bug: Undefined behavior on simultaneous read or write MRE bug: Undefined behavior on simultaneous read or write Feb 12, 2024
Copy link

codecov bot commented Feb 12, 2024

Codecov Report

Attention: Patch coverage is 80.15267% with 52 lines in your changes missing coverage. Please review.

Project coverage is 56.6%. Comparing base (9c43514) to head (a795bbe).

Additional details and impacted files
Files Coverage Δ
cw-orch-daemon/src/channel.rs 83.7% <100.0%> (+0.7%) ⬆️
cw-orch-daemon/src/error.rs 57.1% <ø> (ø)
cw-orch-daemon/src/queriers/cosmwasm.rs 51.5% <100.0%> (ø)
cw-orch-daemon/src/sync/core.rs 78.1% <100.0%> (+3.1%) ⬆️
cw-orch-daemon/src/tx_builder.rs 71.2% <100.0%> (ø)
packages/clone-testing/src/state.rs 100.0% <100.0%> (ø)
cw-orch-daemon/src/sync/builder.rs 88.6% <63.6%> (-5.6%) ⬇️
cw-orch-daemon/src/json_lock.rs 92.1% <92.1%> (ø)
cw-orch-daemon/src/builder.rs 69.4% <78.5%> (+1.6%) ⬆️
cw-orch-daemon/src/core.rs 58.6% <33.3%> (+0.9%) ⬆️
... and 3 more

... and 1 file with indirect coverage changes

@Kayanski
Copy link
Contributor

This is particularly crucial when using daemons that interact with different chains !

@Buckram123
Copy link
Contributor Author

Buckram123 commented Feb 12, 2024

This is particularly crucial when using daemons that interact with different chains !

This is also needed when interacting with 1 chain

@Buckram123
Copy link
Contributor Author

It also can corrupt state, in such cases

Update: seems like it was because of mid-way aborted thread.

@Buckram123
Copy link
Contributor Author

Buckram123 commented Feb 13, 2024

@Kayanski wrote a quick proof of concept implementation, that don't break any existing test as well as succeeds the new multi-threaded ones

@CyberHoward
Copy link
Contributor

I'm a bit confused by the global Mutex. It seems like you're using a locking library (lock-file) but then added your own locking solution on top. Why did you do that? Could you explain your implementation (preferably in the description of the PR)?

@Buckram123 Buckram123 changed the title MRE bug: Undefined behavior on simultaneous read or write Fix undefined behavior on simultaneous read or write Feb 14, 2024
@Buckram123 Buckram123 changed the title Fix undefined behavior on simultaneous read or write Fix undefined behavior on simultaneous write Feb 14, 2024
@Buckram123
Copy link
Contributor Author

I'm a bit confused by the global Mutex. It seems like you're using a locking library (lock-file) but then added your own locking solution on top. Why did you do that? Could you explain your implementation (preferably in the description of the PR)?

Updated PR description! This PR originally had only failing tests with missleading error "EOF while parsing a value", which you can find in cw-orch-daemon/tests/daemon_state.rs, to discuss how we want to handle it.

@Buckram123 Buckram123 changed the title Fix undefined behavior on simultaneous write Fix undefined behavior on simultaneous write in daemon state file Feb 14, 2024
@CyberHoward
Copy link
Contributor

See clippy err

@Kayanski Kayanski mentioned this pull request Apr 9, 2024
@Buckram123 Buckram123 marked this pull request as ready for review April 19, 2024 14:32
Copy link
Contributor

@CyberHoward CyberHoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understand this PR correctly the JSON state file will be locked when the Daemon is constructed? Any other threads that attempt to construct a Daemon will be blocked until this original Daemon is dropped?

Copy link
Contributor

@CyberHoward CyberHoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure about moving the channel into the Sender.

IMO it would make more sense to rename Sender -> Signer and then have the Daemon do all the transmitting. That way we can also more easily update the signing logic (like adding multisig or MPC support)

Buckram123 and others added 2 commits May 15, 2024 12:05
Co-authored-by: CyberHoward <88450409+CyberHoward@users.noreply.github.com>
Co-authored-by: CyberHoward <88450409+CyberHoward@users.noreply.github.com>
Copy link
Contributor

@CyberHoward CyberHoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
(This is a breaking change)

@Buckram123
Copy link
Contributor Author

@Kayanski @CyberHoward On poisoned Mutex state should we allow saving state in case user called manually force_write? Alternatively we can add parameter ignore_poison to give more control

@Kayanski
Copy link
Contributor

Kayanski commented Jun 4, 2024

@Kayanski @CyberHoward On poisoned Mutex state should we allow saving state in case user called manually force_write? Alternatively we can add parameter ignore_poison to give more control

Let's fail if there's a poison, we can develop the feature if someone asks later

Copy link
Contributor

@Kayanski Kayanski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes to clarify things and nitify others

cw-orch-daemon/src/builder.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/builder.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/builder.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/builder.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/sync/core.rs Show resolved Hide resolved
cw-orch-daemon/src/tx_builder.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/state.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/state.rs Outdated Show resolved Hide resolved
cw-orch-daemon/src/state.rs Show resolved Hide resolved
cw-orch-daemon/src/state.rs Show resolved Hide resolved
cw-orch-daemon/src/state.rs Show resolved Hide resolved
@Kayanski Kayanski merged commit 0a2f299 into main Jun 6, 2024
12 of 15 checks passed
@Kayanski Kayanski deleted the daemon-state-ub branch June 6, 2024 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants