Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New onion_message support 3/3: Make fetchinvoice/offers send/receive both #4800

Merged
merged 8 commits into from
Oct 4, 2021

Conversation

rustyrussell
Copy link
Contributor

(Depends on #4799)

@rustyrussell rustyrussell added this to the v0.10.2 milestone Sep 21, 2021
Copy link
Member

@cdecker cdecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a typo. I'll merge the entire stack of 3 PRs once they all pass CI, so if you want to address the nits on any you have some time :-)

ACK 96f4d98

"blinding_in" field, the former never will.
These three hooks are almost identical, in that they are called when
an onion message is received. The `onion_message` hook is only used
for obsolete unblinded messages, and can be ignored for modern usage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obsolete sounds very harsh, especially in the docs, but I can't come up with a better word.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will get removed next version anyway, so that will resolve the wording debate :)

&payload->reply_first_node,
&payload->reply_path,
&submsg)) {
log_broken(ld->log, "bad got_onionmsg_tous: %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: got_onionmsg_tous -> got_onionmsg_to_us

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this was actually in 2/3 and I already merged that. Oh well, this code is obsolete anyway....

@cdecker
Copy link
Member

cdecker commented Sep 21, 2021

Seems like test_sendinvoice_obsolete needs a skipIf guard for DEVELOPER=0, and test_dev_rawrequest looks like it has a use-after-free, though not sure if it is caused by this PR:

Valgrind error file: valgrind-errors.78570
==78570== Invalid write of size 8
==78570==    at 0x174B55: list_del_ (list.h:328)
==78570==    by 0x174FCC: plugin_hook_killed (plugin_hook.c:135)
==78570==    by 0x21DC3F: notify (tal.c:240)
==78570==    by 0x21E156: del_tree (tal.c:402)
==78570==    by 0x21E1A8: del_tree (tal.c:412)
==78570==    by 0x21E4F2: tal_free (tal.c:486)
==78570==    by 0x16EBD1: plugin_kill (plugin.c:345)
==78570==    by 0x16F9C4: plugin_conn_finish (plugin.c:724)
==78570==    by 0x20F1A5: destroy_conn (poll.c:244)
==78570==    by 0x20F1C9: destroy_conn_close_fd (poll.c:250)
==78570==    by 0x21DC3F: notify (tal.c:240)
==78570==    by 0x21E156: del_tree (tal.c:402)
==78570==  Address 0x6aee688 is 40 bytes inside a block of size 72 free'd
==78570==    at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==78570==    by 0x21E224: del_tree (tal.c:421)
==78570==    by 0x21E1A8: del_tree (tal.c:412)
==78570==    by 0x21E4F2: tal_free (tal.c:486)
==78570==    by 0x16EBD1: plugin_kill (plugin.c:345)
==78570==    by 0x16F9C4: plugin_conn_finish (plugin.c:724)
==78570==    by 0x20F1A5: destroy_conn (poll.c:244)
==78570==    by 0x20F1C9: destroy_conn_close_fd (poll.c:250)
==78570==    by 0x21DC3F: notify (tal.c:240)
==78570==    by 0x21E156: del_tree (tal.c:402)
==78570==    by 0x21E4F2: tal_free (tal.c:486)
==78570==    by 0x20D7B6: io_close (io.c:450)
==78570==  Block was alloc'd at
==78570==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==78570==    by 0x21DCAD: allocate (tal.c:250)
==78570==    by 0x21E26E: tal_alloc_ (tal.c:428)
==78570==    by 0x175599: plugin_hook_call_ (plugin_hook.c:259)
==78570==    by 0x13616F: plugin_hook_call_onion_message_blinded (onion_message.c:126)
==78570==    by 0x13643B: handle_obs_onionmsg_to_us (onion_message.c:187)
==78570==    by 0x138BBD: gossip_msg (gossip_control.c:140)
==78570==    by 0x178AEC: sd_msg_read (subd.c:495)
==78570==    by 0x20CA00: next_plan (io.c:59)
==78570==    by 0x20D608: do_plan (io.c:407)
==78570==    by 0x20D64A: io_ready (io.c:417)
==78570==    by 0x20F8F1: io_loop (poll.c:445)
==78570==

@rustyrussell
Copy link
Contributor Author

Found it: actually a longstanding issue. See commit message for first commit...

When we are calling hooks, we track them via a linked list.  As they
execute, we pop them off the list in plugin_hook_killed().

When we kill a plugin, we have a destructor which remove its entry from the linked list: plugin_hook_killed.

If it's at the head of the list, that means the plugin died while
processing the hook, so instead of just deleting it, we call
plugin_hook_killed() which behaves as if it said "result: continue".

But plugin_hook_killed() just returns if we're shutting down; this
leaves the link (then freed) on the list, and the *next* plugin tries
to unlink from the list, accessing the previous free entry.

The fix is simple: unlink from the list in plugin_hook_killed() even
if we're shutting down.

```
Valgrind error file: valgrind-errors.78570
==78570== Invalid write of size 8
==78570==    at 0x174B55: list_del_ (list.h:328)
==78570==    by 0x174FCC: plugin_hook_killed (plugin_hook.c:135)
==78570==    by 0x21DC3F: notify (tal.c:240)
==78570==    by 0x21E156: del_tree (tal.c:402)
==78570==    by 0x21E1A8: del_tree (tal.c:412)
==78570==    by 0x21E4F2: tal_free (tal.c:486)
==78570==    by 0x16EBD1: plugin_kill (plugin.c:345)
==78570==    by 0x16F9C4: plugin_conn_finish (plugin.c:724)
==78570==    by 0x20F1A5: destroy_conn (poll.c:244)
==78570==    by 0x20F1C9: destroy_conn_close_fd (poll.c:250)
==78570==    by 0x21DC3F: notify (tal.c:240)
==78570==    by 0x21E156: del_tree (tal.c:402)
==78570==  Address 0x6aee688 is 40 bytes inside a block of size 72 free'd
==78570==    at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==78570==    by 0x21E224: del_tree (tal.c:421)
==78570==    by 0x21E1A8: del_tree (tal.c:412)
==78570==    by 0x21E4F2: tal_free (tal.c:486)
==78570==    by 0x16EBD1: plugin_kill (plugin.c:345)
==78570==    by 0x16F9C4: plugin_conn_finish (plugin.c:724)
==78570==    by 0x20F1A5: destroy_conn (poll.c:244)
==78570==    by 0x20F1C9: destroy_conn_close_fd (poll.c:250)
==78570==    by 0x21DC3F: notify (tal.c:240)
==78570==    by 0x21E156: del_tree (tal.c:402)
==78570==    by 0x21E4F2: tal_free (tal.c:486)
==78570==    by 0x20D7B6: io_close (io.c:450)
==78570==  Block was alloc'd at
==78570==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==78570==    by 0x21DCAD: allocate (tal.c:250)
==78570==    by 0x21E26E: tal_alloc_ (tal.c:428)
==78570==    by 0x175599: plugin_hook_call_ (plugin_hook.c:259)
==78570==    by 0x13616F: plugin_hook_call_onion_message_blinded (onion_message.c:126)
==78570==    by 0x13643B: handle_obs_onionmsg_to_us (onion_message.c:187)
==78570==    by 0x138BBD: gossip_msg (gossip_control.c:140)
==78570==    by 0x178AEC: sd_msg_read (subd.c:495)
==78570==    by 0x20CA00: next_plan (io.c:59)
==78570==    by 0x20D608: do_plan (io.c:407)
==78570==    by 0x20D64A: io_ready (io.c:417)
==78570==    by 0x20F8F1: io_loop (poll.c:445)
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This adds a new hook: onion_message_ourpath for when we know a message
came in via a blinded path we created.  The onion_message_blinded hook
is now called for all other messages, since all messages are now
blinded.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This comes in via the onion_message_ourpath hook, and we identify the
path by checking the node alias it came to (vs the obsolete version
which used the blinding).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We carefully copied the buffer, but the tok is inside an array.  We get away
with it for now, but with coming changes it gets freed.  We need to copy
the token and all the tokens within it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Experimental: Protocol: Updated onion_message support to match updated draft specification (with backwards compat for old version)
This lets us test that both work, as expected.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker
Copy link
Member

cdecker commented Oct 4, 2021

ACK c46211d

@cdecker cdecker merged commit 09c2fef into ElementsProject:master Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants