
Improve UX of experimental options/features and other small fixes #5047

Merged

Conversation

@SimonVrouwe (Collaborator) commented Feb 21, 2022

So I was trying to connect the dots to understand dual-funding and leases, and then give it a shot.
I did my first dual-fund on mainnet a couple of days ago and then a lease; it all worked out, very cool.

I needed to run custom pytests and look up BOLT pull requests to get some confidence in the meaning of various options and parameters, and I came across a few issues during my journey.

This PR proposes fixes for them:

  • a single location to see our feature bits would be handy (proposed in issue #4310); edit: I just found out the plugin init configuration also has these, so a plugin can be used to show them (see the sketch after this list)
  • docs missing details and some confusing comments
  • other small fixes, typos, comments, etc.
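For illustration, here is a minimal sketch of such a plugin using pyln-client. It assumes the init configuration carries a feature_set entry (as mentioned above); field names may differ between versions, so treat this as a sketch rather than the exact interface.

#!/usr/bin/env python3
# Minimal sketch (assumption: the `configuration` passed at init contains a
# `feature_set` entry with the hex feature strings; verify on your version).
from pyln.client import Plugin

plugin = Plugin()


@plugin.init()
def init(options, configuration, plugin):
    plugin.log("our feature_set: {}".format(configuration.get('feature_set')))


plugin.run()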

@SimonVrouwe (Collaborator, Author)

That's odd, test_plugin_shutdown failed while waiting for the lines:

        l1.daemon.wait_for_logs(['misc_notifications.py: via lightningd shutdown, datastore failed',
>                                'test_libplugin: failed to self-terminate in time, killing.'])

Which are both there:

lightningd-1: 2022-02-21T14:24:48.264Z INFO    plugin-misc_notifications.py: via lightningd shutdown, datastore failed
...
lightningd-1: 2022-02-21T14:25:18.189Z DEBUG   lightningd: test_libplugin: failed to self-terminate in time, killing.
lightningd-1: 2022-02-21T14:25:18.194Z DEBUG   plugin-test_libplugin: Killing plugin: exited during normal operation
lightningd-1: 2022-02-21T14:25:18.196Z DEBUG   lightningd: Command returned result after jcon close

The CI test failure:

                  ValueError: Process died while waiting for logs

Which is here in wait_for_logs:

    if not self.running:
        raise ValueError('Process died while waiting for logs')

These lines are at the end of the log. Maybe the process died quicker than its (log) tail?

@rustyrussell (Contributor) left a comment:

Great series! Only minor changes needed, tbh...

Review threads (resolved): tests/test_plugin.py, plugins/funder.c
@SimonVrouwe (Collaborator, Author)

Maybe I got lucky, but the wait_for_logs fix attempt seems-to-work ™️

Will push another fixup for another test round, if that also passes will squash.

@niftynei (Collaborator) left a comment:

utACK, nice cleanup

@rustyrussell (Contributor)

> Maybe I got lucky, but the wait_for_logs fix attempt seems-to-work tm
>
> Will push another fixup for another test round, if that also passes will squash.

This fix doesn't do anything. If there is more in the logs, we don't check if it's dead; only once we've exhausted the logs do we see if it's dead.

The real problem is that wait_for_log only reads forward. Here's what happens when I run locally (with valgrind, it's too fast to trigger otherwise):

2022-02-22T23:33:24.306Z INFO    plugin-misc_notifications.py: via lightningd shutdown, datastore failed
2022-02-22T23:33:24.332Z DEBUG   plugin-keysend: Killing plugin: exited during normal operation
2022-02-22T23:33:24.333Z DEBUG   plugin-topology: Killing plugin: exited during normal operation
2022-02-22T23:33:24.335Z DEBUG   plugin-autoclean: Killing plugin: exited during normal operation
2022-02-22T23:33:24.336Z DEBUG   plugin-bcli: Killing plugin: exited during normal operation
2022-02-22T23:33:24.340Z DEBUG   plugin-test_libplugin: shutdown called
2022-02-22T23:33:24.343Z DEBUG   plugin-offers: Killing plugin: exited during normal operation
2022-02-22T23:33:24.360Z DEBUG   plugin-misc_notifications.py: Killing plugin: exited during normal operation
2022-02-22T23:33:24.362Z DEBUG   plugin-funder: Killing plugin: exited during normal operation
2022-02-22T23:33:24.366Z DEBUG   plugin-txprepare: Killing plugin: exited during normal operation
2022-02-22T23:33:24.368Z DEBUG   plugin-spenderp: Killing plugin: exited during normal operation
2022-02-22T23:33:24.384Z DEBUG   plugin-fetchinvoice: Killing plugin: exited during normal operation
2022-02-22T23:33:24.386Z DEBUG   plugin-pay: Killing plugin: exited during normal operation
2022-02-22T23:33:54.299Z DEBUG   lightningd: test_libplugin: failed to self-terminate in time, killing.

Now, we call l1.daemon.wait_for_log(r'test_libplugin: shutdown called'), and that skips past the plugin-misc_notifications.py: via lightningd shutdown, datastore failed line, so waiting for it afterwards fails. Which is why we used to manually move the search needle back.
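To make the failure mode concrete, here is a simplified, hypothetical sketch of a forward-only matcher; the real pyln-testing code also tails the running process, but the search position only ever advances in the same way:

import re


class ForwardOnlyLog:
    # Simplified stand-in for the test framework's log matching: the search
    # position only ever advances, so a line that has been passed over can
    # never be matched by a later wait.
    def __init__(self, lines):
        self.lines = lines
        self.pos = 0

    def wait_for_log(self, regex):
        while self.pos < len(self.lines):
            line = self.lines[self.pos]
            self.pos += 1
            if re.search(regex, line):
                return line
        raise ValueError('Process died while waiting for logs')


log = ForwardOnlyLog([
    'plugin-misc_notifications.py: via lightningd shutdown, datastore failed',
    'plugin-test_libplugin: shutdown called',
    'lightningd: test_libplugin: failed to self-terminate in time, killing.',
])
log.wait_for_log(r'test_libplugin: shutdown called')
# The needle is now past the misc_notifications.py line, so a subsequent
# log.wait_for_log(r'datastore failed') would exhaust the log and raise.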

Switching to a single wait_for_logs() on all three fixes this:

diff --git a/tests/test_plugin.py b/tests/test_plugin.py
index 5963a437f..ad69ead61 100644
--- a/tests/test_plugin.py
+++ b/tests/test_plugin.py
@@ -2513,6 +2513,6 @@ def test_plugin_shutdown(node_factory):
     # Now, should also shutdown or timeout on finish, RPC calls then fail with error code -5
     l1.rpc.plugin_start(p, dont_shutdown=True)
     l1.rpc.stop()
-    l1.daemon.wait_for_log(r'test_libplugin: shutdown called')
-    l1.daemon.wait_for_logs(['misc_notifications.py: via lightningd shutdown, datastore failed',
+    l1.daemon.wait_for_logs(['test_libplugin: shutdown called',
+                             'misc_notifications.py: via lightningd shutdown, datastore failed',
                              'test_libplugin: failed to self-terminate in time, killing.'])

@SimonVrouwe (Collaborator, Author)

-    l1.daemon.wait_for_log(r'test_libplugin: shutdown called')
-    l1.daemon.wait_for_logs(['misc_notifications.py: via lightningd shutdown, datastore failed',
+    l1.daemon.wait_for_logs(['test_libplugin: shutdown called',
+                             'misc_notifications.py: via lightningd shutdown, datastore failed',
                              'test_libplugin: failed to self-terminate in time, killing.'])
  • fixup with the above recommendation (I keep tripping over that same stone)
  • fixup adding --lease-fee-base-msat as plugin_option_deprecated and another Changelog note
  • rebased on latest master and squashed fixups with relevant commits

@SimonVrouwe (Collaborator, Author)

Trying to debug the failing test:

def test_ping_timeout(node_factory):
    # Disconnects after this, but doesn't know it.
    l1_disconnects = ['xWIRE_PING']
    l1, l2 = node_factory.line_graph(2, opts=[{'dev-no-reconnect': None,
                                               'disconnect': l1_disconnects},
                                              {}])
    # Takes 15-45 seconds, then another to try second ping
    l1.daemon.wait_for_log('Last ping unreturned: hanging up',
                           timeout=45 + 45 + 5)

This is in the log: instead of l1, it is l2 that logs the awaited line. But now I think it doesn't matter which one logs that line, see the explanation below.

lightningd-1: 2022-02-25T18:10:59.080Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: dev_disconnect: xWIRE_PING (WIRE_PING)
lightningd-2: 2022-02-25T18:11:38.118Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Last ping unreturned: hanging up

I can reproduce it locally after a few runs with valgrind=1.
Added this status_peer_debug(&peer->id, "peer->expecting_pong %i", peer->expecting_pong); just above the first if in:

static void send_ping(struct peer *peer)
{
        /* Already have a ping in flight? */
        if (peer->expecting_pong != PONG_UNEXPECTED) {
                status_peer_debug(&peer->id, "Last ping unreturned: hanging up");
                if (peer->to_peer)
                        io_close(peer->to_peer);
                return;
        }
        inject_peer_msg(peer, take(make_ping(NULL, 1, 0)));
        peer->expecting_pong = PONG_EXPECTED_PROBING;
        set_ping_timer(peer);
}

Then I ran the test with log-level=io:connectd; below is the case where the test fails.
I substituted l1's id with L1 etc. and trimmed the message IO in the log a bit for clarity; the <== parts are my comments.

lightningd-2: 2022-02-26T15:27:53.007Z DEBUG   L1-connectd: peer->expecting_pong 0
lightningd-2: 2022-02-26T15:27:53.008Z L1-connectd: [OUT] WIRE_PING
lightningd-1: 2022-02-26T15:27:53.229Z L2-connectd: [IN] WIRE_PING
lightningd-1: 2022-02-26T15:27:53.230Z L2-connectd: [OUT] WIRE_PONG
lightningd-2: 2022-02-26T15:27:53.438Z L1-connectd: [IN] WIRE_PONG
lightningd-2: 2022-02-26T15:27:53.440Z DEBUG   L1-connectd: handle_pong_in, case PONG_EXPECTED_PROBING <== this resets peer->expecting_pong

lightningd-1: 2022-02-26T15:28:07.992Z DEBUG   L2-connectd: peer->expecting_pong 0
lightningd-1: 2022-02-26T15:28:07.994Z L2-connectd: [OUT] WIRE_PING
lightningd-1: 2022-02-26T15:28:07.996Z DEBUG   L2-connectd: dev_disconnect: xWIRE_PING (WIRE_PING)
lightningd-2: 2022-02-26T15:28:08.221Z L1-connectd: [IN] WIRE_PING      <== leaves l2's peer->expecting_pong alone, see handle_ping_in
lightningd-2: 2022-02-26T15:28:08.223Z L1-connectd: [OUT] WIRE_PONG     <==

lightningd-2: 2022-02-26T15:28:16.009Z DEBUG   L1-connectd: peer->expecting_pong 0
lightningd-2: 2022-02-26T15:28:16.010Z L1-connectd: [OUT] WIRE_PING     <== Bingo! This sets peer->expecting_pong to 2
~ 32s since last l2's WIRE_PING <== is 15 + pseudorand(30)
lightningd-2: 2022-02-26T15:28:48.019Z DEBUG   L1-connectd: peer->expecting_pong 2
lightningd-2: 2022-02-26T15:28:48.021Z DEBUG   L1-connectd: Last ping unreturned: hanging up <== which also kills l1's ping timer

What is happening in the above (failing) case:
l2's last [OUT] WIRE_PING is never PONGed because of l1's dev_disconnect.
l2's ping_timer goes off (in this failing case before l1's ping_timer) and wants to send [OUT] WIRE_PING, but it still expects a pong and therefore closes the connection.
Closing the connection makes l1 delete its ping_timer (I think), so it never sends its [OUT] WIRE_PING and never logs Last ping unreturned: hanging up.

The solution is to check for the desired log line in both l1 and l2.
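A hedged sketch of what that check could look like in the test (hypothetical helper, not necessarily the exact fix that was merged; it assumes the daemon object exposes an is_in_log() check):

import time


def wait_for_log_on_either(nodes, regex, timeout=95):
    # Hypothetical helper: poll both nodes until one of them has logged the
    # line, since the race decides whose ping timer fires first.
    deadline = time.time() + timeout
    while time.time() < deadline:
        for node in nodes:
            if node.daemon.is_in_log(regex):
                return node
        time.sleep(1)
    raise TimeoutError('neither node logged {!r}'.format(regex))


# Usage in the test would then be something like:
# wait_for_log_on_either([l1, l2], r'Last ping unreturned: hanging up',
#                        timeout=45 + 45 + 5)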

@SimonVrouwe (Collaborator, Author) commented Feb 27, 2022

Just a last point with respect to getinfo's "our_features" field.
Instead of a hex-string we can make it a comma-separated string of bit numbers (using fmt_featurebits).
The highest bit number that can show up in "our_features" currently is 105 (shutdown_wrong_funding), giving hex-strings like:

"our_features": {
      "init": "0200000000000000008828226aa2",
      "node": "0200000000000080008828226aa2",
      "channel": "",
      "invoice": "20024200"
   }

would become:

"our_features": {
      "init": "1,5,7,9,11,13,14,17,21,27,29,35,39,105",
      "node": "1,5,7,9,11,13,14,17,21,27,29,35,39,55,105",
      "channel": "",
      "invoice": "9,14,17,29"
   }

I think the latter is more readable and would be more compact when bit numbers get higher.
For example, bit number 2023 (no idea what it means) exists in many node_announcements:

   "nodes": [
      {
         "nodeid": "03b09440c75e335ae72000231ff0358c47e5943eef73be5746799f624cfc034cf0",
         "alias": "LabNode",
         "color": "3399ff",
         "last_timestamp": 1645837821,
         "features": "800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000808a52a1",
         "addresses": [
            {
               "type": "torv3",
               "address": "yerr22pgypq23lp4sppm7rkudtq24beaq4uyppjgwua5xbtchiukdqyd.onion",
               "port": 9735

as comma-separated string:

"features":  "0,5,7,9,12,14,17,19,23,31,45,2023"

However, it would not be consistent with the "features" field returned by connect, listchannels and listnodes, which all return a hex string.
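For reference, the conversion shown in the examples above can also be reproduced client-side from the hex string; a small sketch, assuming the hex encodes the feature bitmap big-endian with bit 0 as the least significant bit:

def feature_bits(hexstr):
    # Return the set bit numbers of a feature bitmap given as a hex string
    # (big-endian, bit 0 = least significant bit).
    if not hexstr:
        return []
    value = int(hexstr, 16)
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]


print(','.join(str(b) for b in feature_bits('20024200')))
# prints: 9,14,17,29  (matches the "invoice" example above)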

@rustyrussell (Contributor)

Nice, I'm folding fixups now...

@rustyrussell (Contributor)

> Just a last point with respect to getinfo's "our_features" field. Instead of a hex-string we can make it a comma-separated string of bit numbers (using fmt_featurebits). […]
>
> However it would not be consistent with "features" field returned by connect, listchannels and listnodes, these all return hex-string.

Let's go one better and add a "featuremap"? {0: "option_data_loss_protect/even", 1023: "option_unknown_odd_1022/odd"}, etc.?

Separate patch though, IMHO.
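As an illustration only, the suggested shape could be produced roughly like this (hypothetical sketch, not lightningd code; the known-names table is a stand-in):

def featuremap(bits, known_names):
    # Hypothetical sketch of the suggested output: bit number -> "name/parity",
    # with a placeholder name for bits we don't recognise.
    out = {}
    for bit in bits:
        parity = 'odd' if bit % 2 else 'even'
        base = bit & ~1  # a feature pair shares one name; the odd bit is optional
        name = known_names.get(base, 'option_unknown_{}_{}'.format(parity, base))
        out[bit] = '{}/{}'.format(name, parity)
    return out


print(featuremap([0, 1023], {0: 'option_data_loss_protect'}))
# {0: 'option_data_loss_protect/even', 1023: 'option_unknown_odd_1022/odd'}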

…ent with tlvs

Terminal prompt got messed up because of a missing newline in the case of empty fields.

printwire_addresses expected it to be the last field, which is not
the case for a node_announcement with tlvs.
All build flags and (experimental) options make it hard to find
out what features are supported or enabled.

And the undocumented `--list-features-only` does not account for all
of our featurebits, for example bit 55 (keysend).

Changelog-Added: JSON-RPC: `getinfo` result now includes `our_features` (bits) for various BOLT #9 contexts
It did not pick up the last (bolt11) bits set.
…` field returned by listnodes, listchannels and connect.
…z-percent

Changelog-Experimental: option `--lease-fee-base-msat` renamed to `--lease-fee-base-sat`

Changelog-Experimental: option `--lease-fee-base-msat` deprecated and will be removed next release
…o works with dualfund

and some cleanup; fundchannel has been returning tx, txid and outnum for years
…re reliable

it now waits for 'alias' in node_announcement, not just block confirms.
more cleanup
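Relating to the Changelog-Added entry for `getinfo` above: once merged, the new field can be inspected from Python via pyln-client, for example (the socket path below is an assumption; point it at your own lightning-dir and network):

from pyln.client import LightningRpc

# Example path only; adjust to your node's RPC socket.
rpc = LightningRpc('/home/user/.lightning/bitcoin/lightning-rpc')
print(rpc.getinfo().get('our_features'))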
@rustyrussell (Contributor)

Ack 3dd54c9

@rustyrussell merged commit 175e0b5 into ElementsProject:master on Feb 28, 2022