Integrate snapshotting into core #1344

Merged: 17 commits, Jun 11, 2024

Conversation

@gefjon (Contributor) commented Jun 6, 2024

Description of Changes

This PR integrates snapshotting into RelationalDB.

In this initial version, we take a snapshot every 1 million txes written to the commitlog.

Part of this involves making the committed_state module pub(crate) where previously it was private.
In order to preserve the previous visibility of its members, anything that was previously pub is now pub(super).

Notable TODOs, some or all of which may be left as future work:

  • Mark snapshots newer than the durable_tx_offset as invalid when restoring.
  • Write a test that exercises snapshot capture and restore. This is challenging because of the large SNAPSHOT_INTERVAL; unless/until we make that configurable, such a test would have to commit 1 million txes.
  • Make SNAPSHOT_INTERVAL configurable.
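
For illustration only, a minimal sketch of the interval check this implies; `SNAPSHOT_INTERVAL` is the constant introduced by this PR, but the function and parameter names below are invented for the example rather than taken from the actual code:

```rust
// Illustrative only: SNAPSHOT_INTERVAL is from this PR, but this free
// function and its parameters are invented for the example.
const SNAPSHOT_INTERVAL: u64 = 1_000_000;

/// Returns true if at least SNAPSHOT_INTERVAL transactions have been
/// committed since the last captured snapshot (or since offset 0).
fn should_snapshot(committed_tx_offset: u64, last_snapshot_offset: Option<u64>) -> bool {
    committed_tx_offset.saturating_sub(last_snapshot_offset.unwrap_or(0)) >= SNAPSHOT_INTERVAL
}
```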

API and ABI breaking changes

Nothing broken, but we expose a new API and on-disk format which we will have to maintain in the future.

Expected complexity level and risk

4 - bootstrapping and replaying a database is complex and error-prone, and this PR adds significantly more breakable surface area to that codepath.

Testing


  • Manual testing in Phoebe/snapshot/naive capture #1279.
  • Edit the definition of SNAPSHOT_INTERVAL to a small number (e.g. 1000), publish your favorite module, run it long enough to capture several snapshots, then restart, observe the host logs about the snapshot being restored, and verify that the state is correct.
    • I did this, but you should too.
  • Run a BitCraft bot test long enough to generate snapshots.

@gefjon mentioned this pull request Jun 6, 2024
Durable DBs have a new dependency on Tokio in order to run the snapshot worker.
This was causing tests to fail
because the std test harness does not install a Tokio runtime.

This commit fixes the test failures by installing a Tokio runtime
around every test which constructs a durable DB.
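
As a generic illustration of the pattern described above (not the actual test helper from this PR), installing a Tokio runtime around a synchronous test body looks roughly like this:

```rust
// Generic illustration only: build a Tokio runtime and keep its context
// entered while the test body (which constructs a durable DB) runs.
fn with_tokio_runtime<T>(test_body: impl FnOnce() -> T) -> T {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime");
    // `enter` makes this runtime the current context, so code that calls
    // tokio::spawn can find an executor even though the test itself is sync.
    let _guard = rt.enter();
    test_body()
}
```
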
@gefjon (Contributor, Author) commented Jun 6, 2024

I just ran a test with my counter-module, which is:

```rust
use spacetimedb::{schedule, spacetimedb};

// Singleton counter row, keyed by id 0.
#[spacetimedb(table)]
pub struct Counter {
    #[primarykey]
    id: u64,
    count: u64,
}

// Seed the singleton counter when the module is initialized.
#[spacetimedb(init)]
pub fn init() {
    Counter::insert(Counter { id: 0, count: 0 }).unwrap();
}

// Increment the counter; optionally reschedule itself to run again immediately.
#[spacetimedb(reducer)]
pub fn count(repeat: bool) {
    let mut current = Counter::filter_by_id(&0).unwrap();
    current.count += 1;
    log::info!("Counted to {}", current.count);
    Counter::update_by_id(&0, current);
    if repeat {
        schedule!("0ms", count(true));
    }
}
```

The test was:

  • Build StDB with `SNAPSHOT_FREQUENCY = 1000`.
  • Run: `for i in {0..3000}; do spacetime call counter count false; done`
  • Interrupt the above when I got bored, which turned out to be at 1989.
  • Run `spacetime sql counter 'SELECT * FROM Counter'` and see 1989.
  • Restart StDB and run the above query again, see the same value.

This worked and the host logged appropriately about restoring from a snapshot (which took ~500 us to capture and < 1 ms to read and restore, since it contained only 8 pages).

Curiously, significantly more TXes happened than I was expecting. I made 1989 calls to the reducer. Including connect/disconnect calls, this amounts to ~6000 TXes. But the tx offset of my snapshot was 7000, and the durable_tx_offset at restart time was 7956. The astute reader will notice that 7956 / 1989 = 4, meaning each reducer call via the CLI resulted in 4 mutable TXes, not the 3 I was expecting.

I'm not worried about this, but I am confused and intrigued.

@kim (Contributor) commented Jun 6, 2024

> each reducer call via the CLI resulted in 4 mutable TXes, not the 3 I was expecting.

This could be the extra disconnect tx introduced in #1288

@gefjon (Contributor, Author) commented Jun 6, 2024

> > each reducer call via the CLI resulted in 4 mutable TXes, not the 3 I was expecting.
>
> This could be the extra disconnect tx introduced in #1288

Oh yeah, that'd do it. Thanks Kim!

My previous attempt to fix the tests failed to notice that
we already have a Tokio runtime in all the `TestDB::durable` tests.
This commit uses that instead of spawning a second one.
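
For context, reusing an ambient runtime rather than spawning a second one is typically done with Tokio's `Handle::try_current`; whether the tests here use exactly this call is an assumption, so treat the following as a generic sketch:

```rust
// Generic illustration of reusing an ambient Tokio runtime instead of
// creating a new one. Handle::try_current is a real Tokio API; its use in
// this PR's tests is an assumption.
use std::future::Future;
use tokio::runtime::Handle;

fn spawn_on_ambient_runtime<F>(fut: F) -> tokio::task::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    // Fails if we are not inside a Tokio runtime, which the earlier commit
    // worked around by installing a second runtime.
    Handle::try_current()
        .expect("expected to run inside a Tokio runtime")
        .spawn(fut)
}
```
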
@@ -60,6 +65,7 @@ pub struct RelationalDB {
     // TODO(cloutiertyler): This should not be public
     pub(crate) inner: Locking,
     durability: Option<Arc<dyn Durability<TxData = Txdata>>>,
+    snapshots: Option<Arc<SnapshotWorker>>,
Reviewer comment (Contributor):

Suggested change:

```diff
-    snapshots: Option<Arc<SnapshotWorker>>,
+    snapshotter: Option<Arc<SnapshotWorker>>,
```

perhaps?

I wonder if this should be owned by the RelationalDB or if it would just be better to have this be something held by the Host or DatabaseInstanceContext and to put maybe_do_snapshot in there as well. Although I haven't thought through exactly who should call maybe_do_snapshot.

Author reply (@gefjon):

I think it makes the most sense to store in the RelationalDB because:

  • It acts a lot like the Durability, which is also owned by the RelationalDB.
  • Opportunities to call maybe_do_snapshot become increasingly tenuous as you get more abstracted than this.
  • I would prefer not to leak the CommittedState, or even the Locking, any further than this.

Author reply (@gefjon):

I've renamed all the snapshots vars and members to either snapshot_worker or snapshot_repo as appropriate.

Resolved review threads: crates/core/src/host/host_controller.rs (outdated), crates/core/src/db/relational_db.rs
Some((durability.clone(), disk_size_fn)),
Some(snapshot_repo),
)?
.apply(durability);
Reviewer comment (Contributor):

I think we will need to determine the ConnectedClients entirely from st_clients, as it appears right after this point.

Consider: the database crashed right after a snapshot, so the history suffix is empty -- but there might be connected clients in st_clients. Before we had snapshotting, an empty history meant an empty database.

Author reply (@gefjon):

I believe this is the case now that I've "rebased" onto latest master, including st_clients. Please verify that I'm doing this correctly.

@kim (Contributor) commented Jun 10, 2024:

apply_history returns the unmatched __connect__s during replay -- but what I meant was, what if the snapshot's tx offset is the last offset in the commitlog (hence the history is empty)?

Arguably an edge case, so fine by me to merge and fix in a later patch (I can submit one if you want).

Author reply (@gefjon):

Oh, I assumed it would read the connected clients out of the table. I thought we had done away with groveling the logs for connected clients. I think a follow-up would be good, and if you're willing to do so, that would be great!

Resolved review threads: crates/table/src/table.rs, crates/snapshot/src/lib.rs, crates/core/src/db/datastore/system_tables.rs (outdated), crates/core/src/db/relational_db.rs
..
} = *committed_state;

if let Err(e) = snapshot_repo.create_snapshot(tables.values_mut(), blob_store, tx_offset) {
Reviewer comment (Contributor):

(Just an observation, nothing to do here: incidentally, the order of the tables will be deterministic due to IntMap<TableId, _>, but we do not rely on this determinism, and .reconstruct_tables(..) will yield a BTreeMap<TableId, _>, which would have resolved any non-determinism due to the stored order.)
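
For anyone unfamiliar with the point above, a tiny generic Rust example, unrelated to the PR's actual types, of why resolving into a `BTreeMap` erases any dependence on stored order:

```rust
use std::collections::BTreeMap;

fn main() {
    // Insertion order deliberately differs from key order.
    let mut tables = BTreeMap::new();
    tables.insert(3u32, "c");
    tables.insert(1, "a");
    tables.insert(2, "b");
    // BTreeMap always iterates in key order, so any non-determinism in the
    // order the entries were produced disappears here.
    assert_eq!(tables.keys().copied().collect::<Vec<_>>(), vec![1, 2, 3]);
}
```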

Resolved review threads: crates/core/src/db/relational_db.rs (one current, two outdated)
@gefjon (Contributor, Author) commented Jun 10, 2024

Rebased and performed manual testing. Appears to still work.

A Tokio runtime apparently gets installed in our `thread_spawn_handler` for Rayon.

Possibly this happened recently, and when I wrote this change it didn't happen?
@Centril (Contributor) commented Jun 10, 2024

Nothing new to add wrt. latest changes :)

@@ -64,6 +69,7 @@ pub struct RelationalDB {

     inner: Locking,
     durability: Option<Arc<dyn Durability<TxData = Txdata>>>,
+    snapshot_worker: Option<Arc<SnapshotWorker>>,
Reviewer comment (Contributor):

I wonder if it would be best to move this outside of the RelationalDB (DatabaseEngine being its future name), to something slightly higher level (e.g. DatabaseInstanceContext). That way we maintain that RelationalDB remains runtime-independent. I don't feel strongly enough about it to block the review, but it makes me uncomfortable to have the DatabaseEngine/RelationalDB depend on tokio.

Reviewer reply (Contributor):

To be clear, I think the ability to snapshot itself would remain inside of RelationalDB; I'm specifically just referring to the scheduling/triggering system.

Author reply (@gefjon):

The Durability, at least as implemented by Local, is also dependent on Tokio. As with the durability, we can avoid depending on Tokio by not supplying a SnapshotRepository when constructing the RelationalDB.
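
To make the scheduling/triggering discussion concrete, here is a generic, hypothetical sketch of a snapshot worker as a background Tokio task that watches the durable tx offset; apart from `SNAPSHOT_INTERVAL`, none of these names come from the PR, and the real `SnapshotWorker` may be structured quite differently:

```rust
// Hypothetical sketch only: a background task that watches the durable
// transaction offset and triggers a capture every SNAPSHOT_INTERVAL txes.
use tokio::sync::watch;

const SNAPSHOT_INTERVAL: u64 = 1_000_000;

fn spawn_snapshot_worker(
    mut durable_tx_offset: watch::Receiver<u64>,
    mut capture: impl FnMut(u64) + Send + 'static,
) -> tokio::task::JoinHandle<()> {
    tokio::spawn(async move {
        let mut last_snapshot = 0u64;
        // `changed` resolves whenever the sender publishes a new offset and
        // returns Err once the sender (the durability layer) is dropped.
        while durable_tx_offset.changed().await.is_ok() {
            let offset = *durable_tx_offset.borrow_and_update();
            if offset - last_snapshot >= SNAPSHOT_INTERVAL {
                capture(offset);
                last_snapshot = offset;
            }
        }
    })
}
```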

}
}

fn take_snapshot(committed_state: &RwLock<CommittedState>, snapshot_repo: &SnapshotRepository) {
Reviewer comment (Contributor):

In reference to the scheduling comment above, I would imagine that this function would be added to the RelationalDB type.

.into());
}
let start = std::time::Instant::now();
let res = Locking::restore_from_snapshot(snapshot);
Reviewer comment (Contributor):

This will eventually need to be a datastore trait function.
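
As a rough idea of what a datastore trait function could look like, a hedged sketch follows; `restore_from_snapshot` is the method name used above, but the trait name, associated types, and signature are assumptions:

```rust
// Hypothetical shape only: the trait name, associated types, and exact
// signature are assumptions; only restore_from_snapshot comes from this PR.
trait Datastore: Sized {
    type Snapshot;
    type Error;

    /// Reconstruct a datastore from an on-disk snapshot instead of
    /// replaying the commitlog from offset 0.
    fn restore_from_snapshot(snapshot: Self::Snapshot) -> Result<Self, Self::Error>;
}
```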

@bfops added the release-any (To be landed in any release window) label Jun 10, 2024
@cloutiertyler (Contributor) left a comment:

This now looks good to me. 🙏

Resolved review threads (outdated): crates/table/src/table.rs (×2)
bfops pushed a commit that referenced this pull request Jun 10, 2024
bfops pushed a commit that referenced this pull request Jun 10, 2024
@joshua-spacetime added the release-0.10 label and removed the release-any (To be landed in any release window) label Jun 10, 2024
@Shubham8287 (Contributor) left a comment:

Checked table_size, row_count related logic. Looks good.

kim added a commit that referenced this pull request Jun 11, 2024
When opening a `RelationalDB`, determine the set of dangling clients
(`ConnectedClients`) by scanning the `st_clients` table instead of
tracking unmatched `__identity_connected__` calls during history replay.

We left the replay tracking in place in #1288, treating the commitlog as
the sole source of truth. With #1344 (snapshotting), this is no longer
correct: the snapshot may contain rows in `st_clients`, but leave no
history suffix for replay.
@gefjon added this pull request to the merge queue Jun 11, 2024
Merged via the queue into master with commit 6c45e76 Jun 11, 2024
9 checks passed
Centril pushed a commit that referenced this pull request Jun 11, 2024
bfops pushed a commit that referenced this pull request Jun 12, 2024