
Pt 159406473 sc continuous chain monitoring 4 #2431

Merged

uwiger merged 33 commits into master from PT-159406473-sc-continuous-chain-monitoring-4 on Jun 5, 2019

@uwiger (Contributor) commented May 15, 2019

See PT #159406302 and PT #159406473

Supersedes #2407, after extracting functionality into separate PRs and rebasing on top of them.

@lucafavatella (Contributor) left a comment

Not planned for this release, so blocking.

@uwiger force-pushed the PT-159406473-sc-continuous-chain-monitoring-4 branch from bbb8685 to 238b661 May 24, 2019

%% This would indicate a fork switch. TODO: detect channel status
%% and respond accordingly. For now, panic and terminate.
report(info, zombie_channel, D),
close(zombie_channel, log(rcv, msg_type(Msg), Msg, D));

@velzevur (Contributor) commented May 24, 2019

This is already tracked as "Handle zombie state channels". Ok.

@uwiger force-pushed the PT-159406473-sc-continuous-chain-monitoring-4 branch from 944be07 to bcb18d0 May 28, 2019

@lucafavatella (Contributor) commented May 28, 2019

From test_minerva CI job:

%%% aesc_fsm_SUITE ==> all_tests.transactions.channel_detects_close_solo: FAILED
%%% aesc_fsm_SUITE ==> {timeout,[{aesc_fsm_SUITE,receive_from_fsm,6,
                          [{file,"/home/builder/aeternity/apps/aechannel/test/aesc_fsm_SUITE.erl"},
                           {line,1613}]},
          {aesc_fsm_SUITE,receive_from_fsm,5,
                          [{file,"/home/builder/aeternity/apps/aechannel/test/aesc_fsm_SUITE.erl"},
                           {line,1601}]},
          {aesc_fsm_SUITE,settle_,5,
                          [{file,"/home/builder/aeternity/apps/aechannel/test/aesc_fsm_SUITE.erl"},
                           {line,1271}]},
          {aesc_fsm_SUITE,channel_detects_close_solo,1,
                          [{file,"/home/builder/aeternity/apps/aechannel/test/aesc_fsm_SUITE.erl"},
                           {line,670}]},
          {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1546}]},
          {test_server,run_test_case_eval1,6,
                       [{file,"test_server.erl"},{line,1062}]},
          {test_server,run_test_case_eval,9,
                       [{file,"test_server.erl"},{line,994}]}]}

EDIT: test_mnesia_leveled has the same failure (BTW it is equivalent to the job for lima).

@@ -22,6 +22,7 @@

 init(Req, _Opts) ->
     lager:debug("init(~p, ~p)", [Req, _Opts]),
+    process_flag(trap_exit, true),

@lucafavatella (Contributor) commented May 28, 2019

Based on the cowboy doc, this is not needed for terminate to be called. Could you please elaborate on what this addresses?

My concern is an unintended behaviour change (in the calling process), and hence different behaviour in error cases, in both tests and user experience.

@uwiger (Author, Contributor) commented May 28, 2019

This is to ensure that the ws handler is killed immediately through cascading exit before being able to process the termination info message sent by the fsm. That is, the fsm sends an info msg before terminating, but the exit may propagate to the linked ws handler, killing it, before it has time to receive and forward that message. Trapping exits ensures that it will be able to process all messages sent by the fsm before it gets to the 'DOWN' message.
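
For illustration, a minimal standalone sketch of that ordering guarantee (module, function and message names are invented, not the actual fsm/ws handler code). Erlang delivers signals from one process to another in send order, so once the handler traps exits, the fsm's exit arrives as an ordinary {'EXIT', ...} message queued after the info message:

    -module(trap_exit_order).
    -export([run/0]).

    run() ->
        Main = self(),
        Handler = spawn(fun() ->
                                process_flag(trap_exit, true),  %% trap before linking
                                Main ! ready,
                                handler_loop()
                        end),
        receive ready -> ok end,
        spawn(fun() ->
                      link(Handler),
                      Handler ! {fsm_info, channel_closed},  %% info sent before exit
                      exit(shutdown)                         %% propagates over the link
              end),
        ok.

    handler_loop() ->
        receive
            {fsm_info, Info} ->
                io:format("forwarding ~p to the client~n", [Info]),
                handler_loop();
            {'EXIT', _FsmPid, Reason} ->
                io:format("fsm terminated: ~p~n", [Reason])
        end.

Without the process_flag(trap_exit, true) line, the exit(shutdown) would kill the handler directly, possibly before {fsm_info, ...} is processed.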


CloseSolo =
    fun(CloserConn, CloserPrivKey, OtherConn, OtherPrivKey) ->
            ws_send_tagged(CloserConn, <<"channels.close_solo">>, #{}, Config),

@lucafavatella (Contributor) commented May 28, 2019

Where are these methods meant to be documented? In the intended usage of the channels API? It seems the documentation needs integrating.

@uwiger (Author, Contributor) commented May 28, 2019

Working on the documentation additions.

<<"channels.close_solo_tx">> -> close_solo_tx;
<<"channels.close_solo_sign">> -> close_solo_tx;
<<"channels.slash_tx">> -> slash_tx;
<<"channels.shash_sign">> -> slash_tx;

@lucafavatella (Contributor) commented May 28, 2019
Suggested change
-<<"channels.shash_sign">> -> slash_tx;
+<<"channels.slash_sign">> -> slash_tx;

?

@lucafavatella (Contributor) commented May 28, 2019

I admit I do not yet understand where these different binaries are hooked up (I see similar binaries, e.g. <<"channels.slash_tx">>, <<"channels.shash_sign">>, <<"channels.slash">>) and which test is meant to fail if any of them do not match.

@uwiger (Author, Contributor) commented May 28, 2019

:)

@@ -39,7 +45,13 @@
 <<"channels.update_ack">>        -> update_ack;
 <<"channels.shutdown_sign">>     -> shutdown;
 <<"channels.shutdown_sign_ack">> -> shutdown_ack;
-<<"channels.leave">>             -> leave
+<<"channels.leave">>             -> leave;

@lucafavatella (Contributor) commented May 28, 2019

I see this <<"channels.leave">> is the only element in METHOD_TAG that is not in METHOD_SIGNED. Is this intended?

@uwiger (Author, Contributor) commented May 28, 2019

The leave action doesn't require signing.

@uwiger force-pushed the PT-159406473-sc-continuous-chain-monitoring-4 branch from a47987c to 5fdeeb2 May 29, 2019

-check_closing_event_(Ch, #data{state = St}) ->
-    Height = cur_height(),
+check_closing_event(Info, D) ->
+    aec_db:dirty(fun() ->

@lucafavatella (Contributor) commented May 29, 2019

I see the only DB operation inside this dirty context is aec_chain:get_header, which anyway adds a transactional context. If an optimized DB read here is really necessary, I wonder: why not remove the need to handle a transaction inside a dirty context by cutting one layer of abstraction and performing the read in the dirty activity directly? Something like:

check_closing_event(Info, D) ->
    {value, Hdr} = aec_db:find_header(async_dirty, maps:get(block_hash, Info)), %% TODO `aec_db:find_header/2`.
    try check_closing_event_(maps:get(tx, Info), maps:get(channel, Info), Hdr, D)
    catch
        error:E ->
            ?LOG_CAUGHT(E),
            error(E)
    end.

If need be, the read can be made conditional on aesc_channels:is_solo_closing(Ch).

@lucafavatella (Contributor) commented May 29, 2019

I see that calls to aec_db:ensure_activity(async_dirty, ...) are out of scope for this ^^^ comment, as those follow another code path that does not attempt to nest a transaction inside a dirty activity.

EDIT: There are transactions nested inside aec_db:ensure_activity(async_dirty, ...). It looks like the intention is to keep transactions in library DB functions while not creating a larger transactional context.

@uwiger (Author, Contributor) commented May 30, 2019

> I see the only DB operation inside this dirty context is aec_chain:get_header, which anyway adds a transactional context.

Actually, this didn't use to be true, until PR #2477 changed the ?t(Expr) macro in aec_db.erl. I believe this was a mistake. The aec_db read operations shouldn't add a transaction context if there is already a transaction or non-transaction context.
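
As an aside, the principle reads roughly like this sketch (not the actual aec_db code; `headers` is an invented table name): a read helper only opens its own activity when the caller has not already established a transaction.

    read_header(Hash) ->
        case mnesia:is_transaction() of
            true ->
                %% Caller already runs in a transaction: reuse its context.
                mnesia:read(headers, Hash);
            false ->
                %% No surrounding transaction: a cheap dirty read suffices.
                mnesia:activity(async_dirty,
                                fun() -> mnesia:read(headers, Hash) end)
        end.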

@lucafavatella (Contributor) commented May 31, 2019

As this PR is not based on the latest master, when merged this PR will overwrite the ?t macro definition in master.

@lucafavatella (Contributor) commented Jun 3, 2019

The ?t macro is now aligned to master in 6ca8c38.

-spec check_requests_([req()], #st{}, cache(), [req()]) -> #st{}.
check_requests_([#{check_at_height := CheckAt} = R|Reqs], St, C, Acc) ->
    lager:debug("CheckAt: R = ~p", [R]),
    {TopHeight, C1} = top_height(C),

@lucafavatella (Contributor) commented May 29, 2019

Is the new cache C1 unused here on purpose?

@uwiger (Author, Contributor) commented May 30, 2019

No, it's a bug.

@lucafavatella (Contributor) commented Jun 4, 2019

I see it has been addressed.

@@ -193,7 +217,7 @@ handle_info({'DOWN', Ref, process, Pid, _Reason},
             #st{parent_mon_ref=Ref, parent=Pid, econn=EConn}=St) ->
     lager:debug("Got DOWN from parent (~p)", [Pid]),
     close_econn(EConn),
-    {stop, normal, St};
+    {stop, shutdown, St};

@lucafavatella (Contributor) commented May 29, 2019

I expect this change to alter the terminate reason, though without any functional or logging change. Ok.

end.

sleep(T, Ref) ->
    receive
        {'$gen_call', From, close} = Msg ->

@lucafavatella (Contributor) commented May 29, 2019

I see that init calls proc_lib:init_ack, then the function that calls this (delayed initialization), then gen_server:enter_loop. So: while the process is retrying to establish the TCP connection, after each attempt it inspects its mailbox to check whether it should give up, i.e. either it was instructed to close (someone called gen_server:call(..., close); note this code assumes the internal message format generated by the gen_server OTP module) or the parent died (and this process was not killed outright because it traps exits).

There may be room for simplification, though ok.
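
The pattern reads roughly like this sketch (module name, host, port and retry values invented; the real code also watches for the parent's death):

    -module(delayed_init_sketch).
    -behaviour(gen_server).
    -export([start_link/0]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

    start_link() ->
        proc_lib:start_link(?MODULE, init, [self()]).

    init(Parent) ->
        proc_lib:init_ack(Parent, {ok, self()}),  %% unblock the caller early
        Sock = connect_loop(40),                  %% delayed initialization
        gen_server:enter_loop(?MODULE, [], #{sock => Sock}).

    connect_loop(0) ->
        exit(could_not_connect);
    connect_loop(Retries) ->
        case gen_tcp:connect("localhost", 3013, [binary], 3000) of
            {ok, Sock} ->
                Sock;
            {error, _} ->
                %% Between attempts, peek at the mailbox for a reason to give up.
                receive
                    {'$gen_call', From, close} ->  %% gen_server:call(Pid, close)
                        gen_server:reply(From, ok),
                        exit(normal)
                after 0 ->
                        connect_loop(Retries - 1)
                end
        end.

    handle_call(_Req, _From, St) -> {reply, ok, St}.
    handle_cast(_Msg, St)        -> {noreply, St}.
    handle_info(_Msg, St)        -> {noreply, St}.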

 get_reconnect_params() ->
     %% {ConnectTimeout, Retries}
-    {10000, 30}.
+    {3000, 40}. %% total 2 minutes

@lucafavatella (Contributor) commented May 29, 2019

The timeout and number of retries could be made user-configurable, though this seems ok for now.
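
If made configurable, it could look like this sketch (the config keys are hypothetical):

    get_reconnect_params() ->
        %% {ConnectTimeout, Retries}; defaults match the hard-coded values.
        Timeout = application:get_env(aechannel, sc_ws_connect_timeout, 3000),
        Retries = application:get_env(aechannel, sc_ws_connect_retries, 40),
        {Timeout, Retries}.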

case current_depth(TxHash, C) of
    {undefined, C1} ->
        {R, C1};
    {Depth, C1} when Depth >= MinDepth ->
        lager:debug("min_depth achieved", []),

@lucafavatella (Contributor) commented May 29, 2019

It feels like the computation of the current depth does not enforce the expectation that the block in which the tx hash was found is an ancestor of the top block. So in case of a fork the depth computation may be misleading (a shorter fork may have more work on it). Though it seems good enough. Ok.

@uwiger (Author, Contributor) commented May 30, 2019

Hmm, when there is a top_changed after a fork switch, if TxHash was in a block that's no longer on the chain, it will be thrown back into the mempool, and current_depth/2 will return {undefined, C1}. Also, the function does check whether its enclosing block is in the main chain.

@lucafavatella (Contributor) commented Jun 3, 2019

> Hmm, when there is a top_changed after a fork switch, if TxHash was in a block that's no longer on the chain, it will be thrown back into the mempool, and current_depth/2 will return {undefined, C1}.

The tx may be included in the new fork too.

> Also, the function does check whether its enclosing block is in the main chain.

Correct! I had missed that. (I had probably confused the block hash with the tx hash.)

Good then.
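
Summarizing the agreed behaviour as a self-contained sketch (the data shapes are invented for illustration; the real current_depth/2 consults the chain via its cache):

    %% Chain maps BlockHash => {Height, OnMainChain}; TxIndex maps TxHash
    %% to its enclosing block hash, or the atom mempool.
    current_depth(TxHash, TopHeight, TxIndex, Chain) ->
        case maps:get(TxHash, TxIndex, mempool) of
            mempool ->
                undefined;  %% e.g. thrown back to the mempool after a fork switch
            BlockHash ->
                case maps:get(BlockHash, Chain) of
                    {Height, true}   -> TopHeight - Height;  %% on the main chain
                    {_Height, false} -> undefined            %% orphaned fork block
                end
        end.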

@@ -194,7 +207,7 @@ patterns() ->
         not lists:member(F, [module_info, patterns])].

 record_fields(data ) -> record_info(fields, data);
-record_fields(w    ) -> record_info(fields, w);
+record_fields(w    ) -> aesc_window:record_fields(w);
 record_fields(Other) -> aesc_offchain_state:record_fields(Other).
 %% ==================================================================

@lucafavatella (Contributor) commented May 30, 2019

Is the state machine at line 222 up-to-date? Should the copy of the FSM in the protocol repo be made canonical, this one deleted, and the FSM in the protocol repo enhanced?

uwiger and others added some commits May 15, 2019

Apply suggestions from code review
Co-Authored-By: Luca Favatella <lucafavatella@users.noreply.github.com>
Co-Authored-By: Dimitar Ivanov <dimitar.p.ivanov@gmail.com>
Pt 159406473 sc continuous chain monitoring 4 -- review (#2478)
* Hook heap property-based test to CI (#2468)

* Add aec_db:ensure_activity/1 & support dirty/transaction mixes (#2477)

* Delete unimplemented disconnected channel client FSM state

* Uncomment type spec for channels depth watcher cache

* Delete reference to non-existent state channel_changed

* Make channels client FSM more readable commenting from gen_statem

* Reduce channel FSM state deleting duplicate role field

* Enhance type spec for channels client FSM

* Reduce channel client FSM data splitting reestablish args

* Apply suggestions from code review

Add line breaks + indentation to keep within ca 80 chars width.

Co-Authored-By: Ulf Wiger <ulf@wiger.net>

* Reset `?t` DB transaction macro to before ff6acb8

Addresses #2478 (comment)

See also #2431 (comment)
Pt 159406473 sc continuous chain monitoring 4 -- review (#2486)
* Delete fetching unused option `watcher` from `data.opts`

* Delete conditional definition of requests in watcher - as unused

* Delete opts in watcher - as unused

* Clarify that in channel watcher the scenario is mandatory
Pt 159406473 sc continuous chain monitoring 4 -- review (#2487)
* Make channel watcher stricter - for readability

* Refactor channel watcher: reuse window entry key variable

* Delete invalid default for channel client FSM field log

Based on the `init` function of the process, that default is
unused.
Pt 159406473 sc continuous chain monitoring 4 -- review (#2488)
* Document entries in aesc_window used in channels code

* Prefer record to tuple in internal watcher logs so to avoid magic indexes

* Fix type of scenario `has_tx` in channel watcher

Also make checks on the scenario more strict.

Using records could make code easier to read and maintain though would
require more invasive changes.

* code changes due to stricter types (#2493)

* take pains to avoid non-existing txs in the txlog

@uwiger force-pushed the PT-159406473-sc-continuous-chain-monitoring-4 branch from c39748a to 2cf66d6 Jun 5, 2019

       , aesc_channels:is_active()
       , aesc_channels:locked_until()
       , aesc_channels:state_hash() }.
-type ch_status() :: undefined

@lucafavatella (Contributor) commented Jun 5, 2019

I do not see this undefined used.

@uwiger (Author, Contributor) commented Jun 5, 2019

It would be the return value of get_basic_ch_status_/3 before the channel object has been created. Ideally, this should never happen, since status isn't checked until the channel is active (I think).

@lucafavatella (Contributor) left a comment

What was the intention of including a watch request in the init of the watcher, considering that this request is discarded?

     , tx_hash   => TxHash
     , min_depth => MinDepth
     , info      => I},
#{ mode => watch

@lucafavatella (Contributor) commented Jun 5, 2019

I see the init function sends itself a check_status message and places these Reqs in #st.requests. Ok.
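
In other words, the init behaviour reads roughly like this sketch (argument shape and the #st{} record taken from the quoted code, otherwise hypothetical):

    init(#{chan_id := ChId, requests := Reqs}) ->
        self() ! check_status,  %% triggers the first check_status/2 run
        {ok, #st{chan_id = ChId, requests = Reqs}}.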

 handle_info({'DOWN', _, process, Parent, _}, #st{parent = Parent} = St) ->
     {stop, normal, St};
 handle_info(check_status, St) ->
-    {noreply, check_status(St)};
+    {noreply, check_status(top, St)};

@lucafavatella (Contributor) commented Jun 5, 2019

I see that on reception of a check_status message, the function check_status/2 is called with top as the first argument (later called the "scenario"). Ok.

    end;
check_status_(#{mode := close, parent := Parent} = R, #st{chan_id = ChId}, C) ->
    lager:debug("check_status(type = close, parent = ~p)", [Parent]),
check_requests_([Req|Reqs], St, Cache, Acc) ->

@lucafavatella (Contributor) commented Jun 5, 2019

I see requests are processed sequentially. So, coming from init, the tx_hash request is handled first, then watch. Ok.

        end;
    fork_switch ->
        watch_for_channel_change(R, St, C);
    _ ->

@lucafavatella (Contributor) commented Jun 5, 2019

I see that in the top scenario this watch request is discarded.

@uwiger (Author, Contributor) commented Jun 5, 2019

Well, the watch request will remain in play, but will not perform a check in a top scenario, which triggers only when a new request object is introduced.

@lucafavatella (Contributor) commented Jun 5, 2019

Ah! The watch request is not discarded but placed in the requests field of the state. Thanks.
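
So the dispatch behaves roughly like this sketch (a hypothetical condensation of the quoted clauses):

    %% A watch request only acts on a fork_switch scenario; on a plain
    %% top scenario it stays in #st.requests, unchecked.
    check_request(#{mode := watch} = R, Scenario, St, C) ->
        case Scenario of
            fork_switch -> watch_for_channel_change(R, St, C);
            top         -> {keep, R, C}
        end.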

@lucafavatella (Contributor) left a comment

Where have the release notes gone? I remember having seen them 🤔 Maybe they were lost in the rebase.

Could you please add notes directly in https://github.com/aeternity/aeternity/blame/fa823c4e354776babba1f88aef05906bf1d35475/docs/release-notes/RELEASE-NOTES-3.1.0.md#L3-L10 ?

(No other changes left for this release.)

@uwiger merged commit 1900d4a into master Jun 5, 2019

24 checks passed:

- build_test_deploy (workflow)
- ci/circleci: aevm_tests
- ci/circleci: build
- ci/circleci: build_otp21
- ci/circleci: docker_smoke_tests
- ci/circleci: docker_system_smoke_tests
- ci/circleci: eunit_fortuna
- ci/circleci: eunit_lima
- ci/circleci: eunit_minerva
- ci/circleci: eunit_roma
- ci/circleci: linux_package
- ci/circleci: rebar_lock_check
- ci/circleci: static_analysis
- ci/circleci: static_analysis_otp21
- ci/circleci: test_fortuna
- ci/circleci: test_lima
- ci/circleci: test_minerva
- ci/circleci: test_mnesia_leveled
- ci/circleci: test_mnesia_rocksdb
- ci/circleci: test_roma
- ci/circleci: uat_tests
- ci/circleci: upload_build_artifacts
- continuous-integration/appveyor/branch
- continuous-integration/appveyor/pr

@uwiger deleted the PT-159406473-sc-continuous-chain-monitoring-4 branch Jun 5, 2019
