
feat: sync latest gens in-memory for instant invalidation #676

Merged · 10 commits merged into master from mem-store on Jun 29, 2022

Conversation

sborrazas (Contributor):

This is done using the new MemStore, which holds the in-memory database changes in the GbTree structure.
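
For illustration only, here is a minimal sketch of such a store, assuming one ordered :gb_trees tree per logical table; the module and function names are hypothetical, not the PR's actual MemStore API:

```
# Hypothetical sketch: one :gb_trees tree per table, held in a plain map.
defmodule MemStoreSketch do
  def new, do: %{}

  # Insert or overwrite a key in the given table's tree.
  def put(store, table, key, value) do
    tree = Map.get(store, table, :gb_trees.empty())
    Map.put(store, table, :gb_trees.enter(key, value, tree))
  end

  def get(store, table, key) do
    with {:ok, tree} <- Map.fetch(store, table),
         {:value, value} <- :gb_trees.lookup(key, tree) do
      {:ok, value}
    else
      _ -> :not_found
    end
  end

  # Ordered key traversal; reversing the ordered list gives the backwards pass.
  def keys(store, table, direction \\ :forward) do
    keys = store |> Map.get(table, :gb_trees.empty()) |> :gb_trees.keys()
    if direction == :backward, do: Enum.reverse(keys), else: keys
  end
end
```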

@sborrazas sborrazas requested review from thepiwo and jyeshe May 12, 2022 11:41
@sborrazas sborrazas self-assigned this May 12, 2022
Base automatically changed from db-store to master May 12, 2022 16:26
jyeshe (Member) left a comment:

Looks good, and the amount of unit tests helps with understanding the kinds of state transitions that are expected.

Looking from a different angle, there were some refactoring opportunities, and I would add one PoC test: I have never used persistent_term to write big chunks of data, so depending on the number and size of blocks that the MemStore will keep, it would be worth investigating whether there is any limit on persistent_term.put size/frequency.

With IEx I tried these two amounts of data:

1)

iex(1)> list = for i <- 1..100_000_000, do: i; nil
nil
iex(2)> :timer.tc(fn -> :persistent_term.put(State, %{list: list}) end)
literal_alloc: Cannot allocate 1600000088 bytes of memory (of type "literal").

2)

iex(1)> list = for i <- 1..50_000_000, do: i; nil
nil
iex(2)> :timer.tc(fn -> :persistent_term.put(State, %{list: list}) end)
{411536, :ok}

(1) is maybe fixable with VM args if needed; (2) took about half a second for this single write, so if the data per write/put is bigger than that, or parallel processing affects persistent_term performance, it could become a bottleneck, with writes taking more time than producing the data.

Comment on lines +5 to +6
Uses a GbTree implementation for fast access to the keys, plus, being able to
iterate over them both forwards and backwards.
jyeshe (Member) commented on May 16, 2022:

Without thinking about Hyperchains, it might be a good choice, since the data is being written with :persistent_term, which has the advantage over ETS of reads without memory copies. It also gives better control over when the data is available for reads (with ETS this would require a GenServer).
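
As an illustration of the trade-off (the key and table names below are made up, not the PR's code): :persistent_term reads return the term without copying it onto the reading process's heap, while every ETS lookup copies the stored term:

```
# Illustrative only; key and table names are invented.
:persistent_term.put({:demo, :state}, %{height: 123_456})
# Read without copying the term into the caller's heap:
%{height: height} = :persistent_term.get({:demo, :state})

# The ETS equivalent copies the stored term on every lookup:
table = :ets.new(:demo_state, [:set, :public, read_concurrency: true])
true = :ets.insert(table, {:state, %{height: 123_456}})
[{:state, %{height: ^height}}] = :ets.lookup(table, :state)
```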

lib/ae_mdw/db/mem_store.ex (resolved comment thread)
E.g.
```
handler = EventHandler.init()
{:ok, handler2} = EventHandler.process_event({:new_height, 123456}, handler)
```
jyeshe (Member):

A nitpick naming issue: stateful event handlers are state machines. This leads to thinking about them in those terms, so perhaps it's an opportunity to represent/model this with a GenStateMachine. That could help separate the state of the machine from db_state and mem_state, making it explicit which part of the whole system state matters for the sync event handling (for the transitions of the state machine). It looks like it would be feasible if the spawning is extracted out of the EventHandler so that, instead of receiving a custom spawner, the state machine signals that there is something to be spawned (with the potential benefit of spawning without a spawner abstraction, or at least of using the spawner only where it's instantiated).
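
A minimal sketch of what that modeling could look like with the gen_state_machine package (state names, events, and data fields here are hypothetical, not the module that ended up in the PR):

```
# Hypothetical sketch of the suggested GenStateMachine modeling.
defmodule SyncMachineSketch do
  use GenStateMachine, callback_mode: :handle_event_function

  def start_link(opts \\ []) do
    GenStateMachine.start_link(__MODULE__, :ok, opts)
  end

  # state_data holds only what the transitions need, not db/mem state.
  def init(:ok), do: {:ok, :idle, %{chain_height: nil}}

  def handle_event(:cast, {:new_height, height}, :idle, data) do
    {:next_state, :syncing, %{data | chain_height: height}}
  end

  def handle_event(:cast, :sync_done, :syncing, data) do
    {:next_state, :idle, data}
  end
end
```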

sborrazas (Contributor, Author):

@jyeshe I've now added the state machine implementation using GenStateMachine on 44115da

@sborrazas sborrazas requested a review from jyeshe May 24, 2022 22:05
jyeshe (Member) left a comment:

Code readability and maintainability were improved. Some refactorings and fixes are suggested regarding encapsulation and process monitoring.

@@ -1,181 +1,353 @@
defmodule AeMdw.Sync.Server do
jyeshe (Member):

This PR is an opportunity for a more meaningful name, since the behavior is being changed. When it's hard to find a name that describes the purpose of a module, it's commonly a sign that the module has non-cohesive responsibilities.

{:new_height, height()}
| {:db_done, gens_per_min()}
| {:mem_done, gens_per_min()}
| {:fork, height()}
jyeshe (Member):

Could you clean up or review the purpose of the :fork event? It doesn't make any transition described by the diagram above, and besides that it only reads and writes fork_height, which is not used in any other transition.

) do
gens_per_min = calculate_gens_per_min(gens_per_min, new_gens_per_min)
# CHECK FORK
jyeshe (Member):

Could you document why the fork notification from the Node is not sufficient and no longer used?


new_state
end)
end

defp spawn_with_monitor(fun) do
jyeshe (Member):

The monitor part seems to be missing... To make the code more structured in the Elixir way (in the sense that other devs already know what to expect), a Task.Supervisor.async_nolink could be used: it already monitors the task and, while the processing is very similar, it provides more conveniences than spawn in terms of introspection and debugging. It also documents what to do next when the spawned process finishes.
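
A quick sketch of the suggestion (the supervisor name and task body are illustrative): async_nolink starts the task under a Task.Supervisor and monitors it for the caller, so both the result and a crash arrive as regular messages:

```
# Assumes a Task.Supervisor started elsewhere under this (made-up) name.
task =
  Task.Supervisor.async_nolink(AeMdw.TaskSupervisor, fn ->
    :sync_work_result
  end)

# In the owning GenServer/:gen_statem these arrive via handle_info:
#   {ref, result} when ref == task.ref
#     -> success; then Process.demonitor(ref, [:flush])
#   {:DOWN, ref, :process, _pid, reason}
#     -> the task crashed with `reason`
```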

sborrazas (Contributor, Author):

Good idea! Done


def handle_info({:DOWN, ref, :process, _, reason}, %__MODULE__{sync_mon_ref: ref} = s) do
notify_operators(s.operators, reason)
jyeshe (Member):

I wonder if we could improve the notification feature rather than remove it. It might become helpful to be notified of unexpected behaviors with debugging info.

sborrazas (Contributor, Author):

I agree, but right now this functionality is of no use at all, writing files and sending emails with the mail command. I think we should work on an actually useful feature, in a module called ErrorNotifier or some other, more descriptive name.

jyeshe (Member):

okay

%__MODULE__{restarts: restarts} = state_data
)
when restarts < @max_restarts do
Log.info("Mem Sync.Server error: #{inspect(reason)}")
jyeshe (Member):

After monitoring, it's better to use symmetrical calls, including the demonitor.

sborrazas (Contributor, Author):

Done, implemented using Task.Supervisor as suggested

@spec handle_event(:internal, internal_event(), state(), state_data()) ::
:gen_statem.event_handler_result(state())
def handle_event(:cast, {:new_height, chain_height}, :initialized, state_data) do
actions = [{:next_event, :internal, :check_sync}]
jyeshe (Member):

A constant (module attribute) would be helpful for seeing which actions are passed in :next_state.

sborrazas (Contributor, Author):

Not sure what you mean by this; there's a typespec for all possible states and events. Can you elaborate, please?

jyeshe (Member):

It's a small detail and a matter of personal taste: for bigger values that are repeated, I usually do something like:
@check_sync_next_event [{:next_event, :internal, :check_sync}]

Comment on lines 106 to 107
db_state: db_state,
mem_state: mem_state,
jyeshe (Member):

Maybe there is a way to avoid exposing the whole database and in-memory state to the state machine, keeping in state_data only the state needed for the transitions.

sborrazas (Contributor, Author):

Same here

jyeshe (Member):

The goal would be to separate the state needed to control the sync (the state machine's state_data) from the state needed to execute the sync. But with the last refactor the uses of db_state and mem_state are better encapsulated 👍

I would only add a comment to the state machine notes explaining that it only needs to compare the heights of db_state and mem_state with chain_height in order to decide the next transition (to the syncing_db or syncing_mem state), roughly as sketched below.
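
For example, the decision could look like this hypothetical helper (the in-memory window size @mem_gens and the height semantics are assumptions, not the PR's code):

```
# Hypothetical illustration of the transition decision; not the PR's code.
# @mem_gens is the assumed number of generations kept in memory.
defp next_sync_state(db_height, mem_height, chain_height) do
  cond do
    db_height < chain_height - @mem_gens -> :syncing_db
    mem_height < chain_height -> :syncing_mem
    true -> :idle
  end
end
```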

@sborrazas sborrazas marked this pull request as ready for review June 10, 2022 13:11
@sborrazas sborrazas requested a review from jyeshe June 11, 2022 11:37
jyeshe (Member) left a comment:

Some thoughts:

  • Similarly to the controllers that were updated to read from in-memory state using the recent StatePlug, the async tasks would also need some way to be enqueued (written to and read from) in-memory state.
  • Fork handling is missing on Watcher (subscriber) and Server.
  • Since Server.gens_per_min() is called after each status controller request, it would be better to move gens_per_min out of the syncing server state in order to avoid :gens_per_min messages competing with :new_height in the Server mailbox.

sborrazas (Contributor, Author):

Some thoughts:

  • Similarly to the controllers that were updated to read from in-memory state using the recent StatePlug, the async tasks would also need some way to be enqueued (written to and read from) in-memory state.

Async tasks are not persisted in-memory because the time it takes to process them is unknown, and by the time they are done, the memory contents might have completely changed. I think this is why we made them async tasks in the first place.

  • Fork handling is missing on Watcher (subscriber) and Server.

For every new key block that appears in the chain (whether added as a new key block or by a fork), we re-process all of the latest generations, regardless of whether the change was caused by a fork. This is why there isn't really a need to determine whether it's a fork; we just handle both cases the same way.

  • Since Server.gens_per_min() is called after each status controller request, it would be better to move gens_per_min out of the syncing server state in order to avoid :gens_per_min messages competing with :new_height in the Server mailbox.

As per OTP guidelines, it's not recommended to have messages that take long to process, which is why all messages that arrive at Sync.Server are processed instantly, including :new_height and :gens_per_min.

@sborrazas sborrazas requested a review from jyeshe June 13, 2022 18:40
jyeshe (Member) commented on Jun 13, 2022:

Some thoughts:

  • Similarly to the controllers that were updated to read from in-memory state using the recent StatePlug, the async tasks would also need some way to be enqueued (written to and read from) in-memory state.

Async tasks are not persisted in-memory because the time it takes to process them is unknown, and by the time they are done, the memory contents might have completely changed. I think this is why we made them async tasks in the first place.

The purpose of the async tasks is to run them in parallel with the sync. After 32c2575, for 1.7.3, the async tasks are enqueued only after 8 heights. I was waiting for the in-memory handling to fix this delay in order to restore the previous behaviour of 1.7.2. Opening an issue, though.

  • Fork handling is missing on Watcher (subscriber) and Server.

For every new key block that appears in the chain (whether added as a new key block or by a fork), we re-process all of the latest generations, regardless of whether the change was caused by a fork. This is why there isn't really a need to determine whether it's a fork; we just handle both cases the same way.

What if the fork happens during the backwards processing of the first in-memory height? Shouldn't the data that was just committed really be invalidated? (: I am giving the example of one height of difference, but the fork might even be detected with a higher delay (after more heights).

  • Since Server.gens_per_min() is called after each status controller request, it would be better to move gens_per_min out of the syncing server state in order to avoid :gens_per_min messages competing with :new_height in the Server mailbox.

As per OTP guidelines, it's not recommended to have messages that take long to process, which is why all messages that arrive at Sync.Server are processed instantly, including :new_height and :gens_per_min.

It would be to avoid concurrency between multiple /status requests and any syncing process message, as GenServers can process only one message at a time while ETS allows multiple concurrent reads. The idea would be to reduce a low risk of bottleneck (considering a change or a condition that makes any message slower) to no risk.

jyeshe (Member) left a comment:

Note on required invalidation and optional :gens_per_minute improvement

jyeshe (Member) commented on Jun 13, 2022:

Note on required invalidation and optional :gens_per_minute improvement

An example/history of a sync and fork that can happen:

  1. in-memory sync of heights 80, 81, 82, 83, 84, 85, 86, 87
  2. heights 80 to 87 are committed (persisted to the database), and during the processing of 88 the chain detects that 87 doesn't belong to the main/rightful chain.

sborrazas (Contributor, Author):

@jyeshe I've now updated gens_per_min to be stored in a global persistent_term, readable from anywhere, as you requested.

Regarding the fork handling, as I mentioned previously, the entire in-memory storage is re-processed for every change in the chain, including forks and new blocks, so that's handled already.
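
A minimal sketch of the gens_per_min change (the key name and default are assumptions, not the actual code):

```
# Hedged sketch: a globally readable gens-per-minute figure, so /status
# requests don't go through the sync server's mailbox. Key name is made up.
defmodule GensPerMinSketch do
  @key {:ae_mdw, :gens_per_min}

  def put(gens_per_min), do: :persistent_term.put(@key, gens_per_min)

  # Defaults to 0 before the first write.
  def get, do: :persistent_term.get(@key, 0)
end
```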

@sborrazas sborrazas requested a review from jyeshe June 14, 2022 13:39
jyeshe (Member) left a comment:

Thanks for the refactor and for clarifying that a fork cannot happen on the Aeternity blockchain after 10 heights from the top. I was still thinking of:
https://cointelegraph.com.br/news/ethereum-beacon-chain-experiences-7-block-reorg-what-s-going-on

thepiwo (Collaborator) left a comment:

looks really good from what I can see

jyeshe (Member) left a comment:

Wait for resolution of:
#723 (comment)

Or an explicit requirement that allows this delay.

This is done using the new `MemStore` which updates the in-memory
database changes using the GbTree structure.
This allows more explicitness in switching states
jyeshe (Member) left a comment:

Pending points discussed are documented here: "run in-memory aex9 tasks asynchronously" #749

@sborrazas sborrazas merged commit af95379 into master Jun 29, 2022
@sborrazas sborrazas deleted the mem-store branch June 29, 2022 20:44