Improve mem3 #5552

nickva · 2025-05-29T20:19:39Z

Improve cluster startup behavior as three separate commits:

Sometimes if mem3 is not initialized yet but rexi could start calling us, and so we have to handle mem3_shards ets tables not being created yet. Previously, we handled the case of missing ets tables by catching error:badarg in places like for_db/2, but then forgot to handle it the lower level maybe_spawn_shard_writer/3 function, so call was still crashing and making a mess in the logs. This is more apparent on startup with nodes being upgraded in a busy cluster.
Improve mem3 supervisor. There was a question there left from years back if order is important, it turns out it is. So fix the order. See the reasoning in the commit message. Also since order is important we want to use a rest_for_one supervisor not a one_for_one.
Cleanup node startup logging a bit. We were emitting duplicate logs and also at level that's a bit higher than needed.

Previously, we handled the case of missing ets tables by catching `error:badarg` in places like `for_db/2`, but then forgot to handle it the lower level `maybe_spawn_shard_writer/3` function, so call was still crashing and making a mess in the logs. This is more apparent on startup with nodes being upgraded in a busy cluster.

rnewson

very nice.

davisp

+1

src/mem3/src/mem3_sup.erl

The answer to the question if order is important in the comments is: "yes" for most of the workers. So fix their order and use the `rest_for_one` [1] strategy. * `mem3_events` gen_event should be started before all the others * `mem3_nodes` gen_server is needed so everyone can query `mem3:nodes()` from it * `mem3_sync_nodes` needs to run before `mem3_sync` and `mem3_sync_event` so they can both can call `mem3_sync_nodes:add/1` * `mem3_distribution` force connects nodes from `mem3:nodes()`, so start it before `mem3_sync` since `mem3_sync:initial_sync/0` expects the connected nodes to be there when calling `mem3_sync_nodes:add(nodes())` * `mem3_sync_event_listener` has to start after `mem3_sync` so it can all `mem3_sync:push/2` * `mem3_seeds` and `mem3_reshard_sup` can wait till the end as they will spawn background work that can go for a while: seeding system dbs from other nodes or running resharding jobs. [1] https://www.erlang.org/doc/system/sup_princ.html#rest_for_one

Previously, we logged these events as notice but let's lower them to info. On startup or node upgrades they'll always be logged and they are not that interesting just add noise. Also, the `cluster_unstable/1` callback in `rexi_server_mon` log was duplicated so remove one of them.

rnewson approved these changes May 29, 2025

View reviewed changes

davisp approved these changes May 29, 2025

View reviewed changes

jaydoane approved these changes May 29, 2025

View reviewed changes

src/mem3/src/mem3_sup.erl Show resolved Hide resolved

nickva added 2 commits May 29, 2025 17:22

nickva force-pushed the improve-mem3 branch from d699cf2 to 6ed8fe7 Compare May 29, 2025 21:22

nickva merged commit 5754bcf into main May 29, 2025
23 checks passed

nickva deleted the improve-mem3 branch May 29, 2025 22:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve mem3 #5552

Improve mem3 #5552

Uh oh!

nickva commented May 29, 2025

Uh oh!

rnewson left a comment

Uh oh!

davisp left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Improve mem3 #5552

Improve mem3 #5552

Uh oh!

Conversation

nickva commented May 29, 2025

Uh oh!

rnewson left a comment

Choose a reason for hiding this comment

Uh oh!

davisp left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants