Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't retrieve link database for devices that aren't the PLM #2

Closed
rspier opened this issue Dec 30, 2018 · 42 comments
Closed

can't retrieve link database for devices that aren't the PLM #2

rspier opened this issue Dec 30, 2018 · 42 comments
Assignees

Comments

@rspier
Copy link
Contributor

rspier commented Dec 30, 2018

(forked from #1 (comment))

This is using a modified version of message.go that sets TTL and Max TLL to 3.

$ cmd/ic/ic -timeout 60s -log trace device info 33.A9.41
2018/12/30 16:49:52 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 SD     3:3 Engine Version []} to network
2018/12/30 16:49:52 DEBUG Sending packet to port
2018/12/30 16:49:53 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 0f 0d 00
2018/12/30 16:49:53 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 33 a9 41 0f 0d 00 ACK
2018/12/30 16:49:53 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 34 78 d9 2f 0d 02
2018/12/30 16:49:53 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 34.78.d9 SD Ack 3:3 Command(0x02, 0x0d, 0x02) []}
2018/12/30 16:49:53 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 SD     3:3 ID Request []} to network
2018/12/30 16:49:53 DEBUG Sending packet to port
2018/12/30 16:49:53 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 0f 10 00
2018/12/30 16:49:53 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 33 a9 41 0f 10 00 ACK
2018/12/30 16:49:53 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 34 78 d9 2f 10 00
2018/12/30 16:49:53 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 34.78.d9 SD Ack 3:3 Command(0x02, 0x10, 0x00) []}
2018/12/30 16:49:53 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 01 20 44 8f 01 76
2018/12/30 16:49:53 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 01 20 44 8f 01 76
2018/12/30 16:49:53 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 01.20.44 SB     3:3 Set-button Pressed (responder)(118) []}
2018/12/30 16:49:53 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 01.20.44 SB     3:3 Set-button Pressed (responder)(118) []}
       Device: Dimmable Light (33.a9.41)
     Category: 01.20
     Firmware: 0x44
2018/12/30 16:49:53 DEBUG Retrieving Device link database
2018/12/30 16:49:53 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 ED     3:3 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/30 16:49:53 DEBUG Sending packet to port
2018/12/30 16:49:54 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 1f 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1

<hang>

It seems to work ok, if slowly using a nodeJS library. (home-controller)

$ time ./insteon-link /dev/ttyUSB0 33a941
Connecting to /dev/ttyUSB0
Polling device: 33a941
Device found: 33a941 - Dimmable Lighting Control
Getting links for device: 33a941
Found 6 link(s):

real 0m42.134s
user 0m0.400s
sys 0m0.016s

I'll try and compare the requests between the two libraries.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

The node library is sending a slightly different message.

home-controller:insteon Send: {"command":{"cmd1":"2F","cmd2":"00","extended":true,"userData":["00","00","0FFF","01"],"type":"62","checksum":194,"raw":"026233a9411F2F0000000FFF010000000000000000C2","id":"33a941"},"deferred":{"promise":{}},"timeout":5000} +1ms

2018/12/30 16:49:53 DEBUG Retrieving Device link database
2018/12/30 16:49:53 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 ED 3:3 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/30 16:49:53 DEBUG Sending packet to port
2018/12/30 16:49:54 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 1f 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1

                        d1 d2 d3 d4 d5 d6            
A: 02 62 33 a9 41 1f 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1
B: 02 62 33 a9 41 1f 2f 00 00 00 0f ff 01 00 00 00 00 00 00 00 00 c2

This patch makes the sent packets match:

diff --git a/i2device.go b/i2device.go
index 7cde1e8..cf92481 100644
--- a/i2device.go
+++ b/i2device.go
@@ -45,7 +45,7 @@ func (i2 *I2Device) RemoveLinks(oldLinks ...*LinkRecord) error {
 func (i2 *I2Device) Links() (links []*LinkRecord, err error) {
        Log.Debugf("Retrieving Device link database")
        lastAddress := MemAddress(0)
-       buf, _ := (&LinkRequest{Type: ReadLink, NumRecords: 0}).MarshalBinary()
+       buf, _ := (&LinkRequest{Type: ReadLink, MemAddress: BaseLinkDBAddress, NumRecords: 1}).MarshalBinary()
        recvCh, err := i2.SendCommandAndListen(CmdReadWriteALDB, buf)
 
        for response := range recvCh {

But it still hangs:

./ic -log trace device info 33.A9.41
2018/12/30 17:13:33 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 SD     3:3 Engine Version []} to network
2018/12/30 17:13:33 DEBUG Sending packet to port
2018/12/30 17:13:34 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 0f 0d 00
2018/12/30 17:13:34 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 33 a9 41 0f 0d 00 ACK
2018/12/30 17:13:34 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 34 78 d9 2f 0d 02
2018/12/30 17:13:34 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 34.78.d9 SD Ack 3:3 Command(0x02, 0x0d, 0x02) []}
2018/12/30 17:13:34 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 SD     3:3 ID Request []} to network
2018/12/30 17:13:34 DEBUG Sending packet to port
2018/12/30 17:13:34 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 0f 10 00
2018/12/30 17:13:34 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 33 a9 41 0f 10 00 ACK
2018/12/30 17:13:34 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 34 78 d9 2f 10 00
2018/12/30 17:13:34 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 34.78.d9 SD Ack 3:3 Command(0x02, 0x10, 0x00) []}
2018/12/30 17:13:34 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 01 20 44 8f 01 76
2018/12/30 17:13:34 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 01.20.44 SB     3:3 Set-button Pressed (responder)(118) []}
       Device: Dimmable Light (33.a9.41)
     Category: 01.20
     Firmware: 0x44
2018/12/30 17:13:34 DEBUG Retrieving Device link database
2018/12/30 17:13:34 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 ED     3:3 Read/Write ALDB [0 0 15 255 1 0 0 0 0 0 0 0 0 0]} to network
2018/12/30 17:13:34 DEBUG Sending packet to port
2018/12/30 17:13:35 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 1f 2f 00 00 00 0f ff 01 00 00 00 00 00 00 00 00 c2
2018/12/30 17:13:35 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 01 20 44 8f 01 76
2018/12/30 17:13:40 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 01.20.44 SB     3:3 Set-button Pressed (responder)(118) []}

@abates
Copy link
Owner

abates commented Dec 31, 2018

Is that the complete trace? It seems like there are some PLM acks missing to me. When I do the same command to one of my devices it looks like this:

abates@smilinjack:~/local/src/github.com/abates/insteon/cmd/ic$ ./ic -log trace device info 4c.0e.d4
2018/12/30 21:07:34 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 4c.0e.d4 SD     2:2 Engine Version []} to network
2018/12/30 21:07:34 DEBUG Sending packet to port
2018/12/30 21:07:34 plm.(*Port).send:94 TRACE TX 02 62 4c 0e d4 0a 0d 00
2018/12/30 21:07:34 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 4c 0e d4 0a 0d 00 ACK
2018/12/30 21:07:34 plm.(*PLM).receive:158 TRACE RX Std Msg Received 4c 0e d4 3d 96 e1 26 0d 02
2018/12/30 21:07:34 insteon.(*Network).receive:100 TRACE Received Insteon Message &{4c.0e.d4 3d.96.e1 SD Ack 2:1 Command(0x02, 0x0d, 0x02) []}
2018/12/30 21:07:34 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 4c.0e.d4 SD     2:2 ID Request []} to network
2018/12/30 21:07:34 DEBUG Sending packet to port
2018/12/30 21:07:35 plm.(*Port).send:94 TRACE TX 02 62 4c 0e d4 0a 10 00
2018/12/30 21:07:35 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 4c 0e d4 0a 10 00 ACK
2018/12/30 21:07:35 plm.(*PLM).receive:158 TRACE RX Std Msg Received 4c 0e d4 3d 96 e1 26 10 00
2018/12/30 21:07:35 insteon.(*Network).receive:100 TRACE Received Insteon Message &{4c.0e.d4 3d.96.e1 SD Ack 2:1 Command(0x02, 0x10, 0x00) []}
2018/12/30 21:07:35 plm.(*PLM).receive:158 TRACE RX Std Msg Received 4c 0e d4 02 2a 45 83 01 72
2018/12/30 21:07:35 insteon.(*Network).receive:100 TRACE Received Insteon Message &{4c.0e.d4 02.2a.45 SB     3:0 Set-button Pressed (responder)(114) []}
       Device: Switch (4c.0e.d4)
     Category: 02.2a
     Firmware: 0x45
2018/12/30 21:07:35 DEBUG Retrieving Device link database
2018/12/30 21:07:35 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 4c.0e.d4 ED     2:2 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/30 21:07:35 DEBUG Sending packet to port
2018/12/30 21:07:35 plm.(*Port).send:94 TRACE TX 02 62 4c 0e d4 1a 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1
2018/12/30 21:07:35 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 4c 0e d4 1a 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1 ACK
2018/12/30 21:07:35 plm.(*PLM).receive:158 TRACE RX Std Msg Received 4c 0e d4 3d 96 e1 26 2f 00
...

@abates abates self-assigned this Dec 31, 2018
@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

Yes, there's no ACK from the Read ALDB command.

$ ./ic -log trace device info 33.A9.41
2018/12/30 23:07:59 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 SD     3:3 Engine Version []} to network
2018/12/30 23:07:59 DEBUG Sending packet to port
2018/12/30 23:07:59 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 0f 0d 00
2018/12/30 23:07:59 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 33 a9 41 0f 0d 00 ACK
2018/12/30 23:07:59 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 34 78 d9 2f 0d 02
2018/12/30 23:07:59 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 34.78.d9 SD Ack 3:3 Command(0x02, 0x0d, 0x02) []}
2018/12/30 23:07:59 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 SD     3:3 ID Request []} to network
2018/12/30 23:07:59 DEBUG Sending packet to port
2018/12/30 23:08:00 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 0f 10 00
2018/12/30 23:08:00 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 33 a9 41 0f 10 00 ACK
2018/12/30 23:08:00 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 34 78 d9 2f 10 00
2018/12/30 23:08:00 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 34.78.d9 SD Ack 3:3 Command(0x02, 0x10, 0x00) []}
2018/12/30 23:08:00 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 01 20 44 8f 01 76
2018/12/30 23:08:00 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 01.20.44 SB     3:3 Set-button Pressed (responder)(118) []}
       Device: Dimmable Light (33.a9.41)
     Category: 01.20
     Firmware: 0x44
2018/12/30 23:08:00 DEBUG Retrieving Device link database
2018/12/30 23:08:00 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 33.a9.41 ED     3:3 Read/Write ALDB [0 0 15 255 1 0 0 0 0 0 0 0 0 0]} to network
2018/12/30 23:08:00 DEBUG Sending packet to port
2018/12/30 23:08:00 plm.(*Port).send:94 TRACE TX 02 62 33 a9 41 1f 2f 00 00 00 0f ff 01 00 00 00 00 00 00 00 00 c2
2018/12/30 23:08:00 plm.(*PLM).receive:158 TRACE RX Std Msg Received 33 a9 41 01 20 44 8f 01 76
2018/12/30 23:08:05 insteon.(*Network).receive:100 TRACE Received Insteon Message &{33.a9.41 01.20.44 SB     3:3 Set-button Pressed (responder)(118) []}
# waited 5 minutes and nothing else happened

Here's another with similar hardware to what you're using (a Switch)

$ ./ic -log trace device info 2C.7D.24
2018/12/30 23:19:40 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 SD     3:3 Engine Version []} to network
2018/12/30 23:19:40 DEBUG Sending packet to port
2018/12/30 23:19:40 plm.(*Port).send:94 TRACE TX 02 62 2c 7d 24 0f 0d 00
2018/12/30 23:19:40 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 2c 7d 24 0f 0d 00 ACK
2018/12/30 23:19:40 plm.(*PLM).receive:158 TRACE RX Std Msg Received 2c 7d 24 34 78 d9 2f 0d 02
2018/12/30 23:19:40 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 34.78.d9 SD Ack 3:3 Command(0x02, 0x0d, 0x02) []}
2018/12/30 23:19:40 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 SD     3:3 ID Request []} to network
2018/12/30 23:19:40 DEBUG Sending packet to port
2018/12/30 23:19:41 plm.(*Port).send:94 TRACE TX 02 62 2c 7d 24 0f 10 00
2018/12/30 23:19:41 plm.(*PLM).receive:158 TRACE RX Send INSTEON Msg 00 00 00 2c 7d 24 0f 10 00 ACK
2018/12/30 23:19:41 plm.(*PLM).receive:158 TRACE RX Std Msg Received 2c 7d 24 34 78 d9 2f 10 00
2018/12/30 23:19:41 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 34.78.d9 SD Ack 3:3 Command(0x02, 0x10, 0x00) []}
2018/12/30 23:19:41 plm.(*PLM).receive:158 TRACE RX Std Msg Received 2c 7d 24 02 2f 43 8f 01 00
2018/12/30 23:19:41 plm.(*PLM).receive:158 TRACE RX Std Msg Received 2c 7d 24 02 2f 43 8f 01 00
2018/12/30 23:19:41 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 02.2f.43 SB     3:3 Set-button Pressed (responder) []}
2018/12/30 23:19:41 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 02.2f.43 SB     3:3 Set-button Pressed (responder) []}
       Device: Switch (2c.7d.24)
     Category: 02.2f
     Firmware: 0x43
2018/12/30 23:19:41 DEBUG Retrieving Device link database
2018/12/30 23:19:41 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 ED     3:3 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/30 23:19:41 DEBUG Sending packet to port
2018/12/30 23:19:41 plm.(*Port).send:94 TRACE TX 02 62 2c 7d 24 1f 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1
# crickets

@abates
Copy link
Owner

abates commented Dec 31, 2018

So the way the PLM and Insteon works is this:

  1. Send a serial message via tty/usb to request an Insteon standard or extended message be sent
  2. PLM sends the message to powerline/rf mesh and then sends an ack back to the serial port. This ACKs that the PLM received the request and sent the message
    2.a) If the network is busy and the PLM cannot deliver the message, it responds with a NAK
  3. Network device (dimmer, switch, etc) receives the message and either sends an ACK (message received) or sends a NAK with some information on what caused the NAK (not linked, malformed, etc).

Once the device ACK's the message it may respond further, as in the case with the Read/Write ALDB. In your case we should be seeing:

  1. Read/Write ALDB -> PLM
  2. PLM ACK/NAK
  3. Device Ack/NAK
  4. Read/Write ALDB response

It seems like we're only seeing the message being sent to the PLM but the PLM never acknowledges that it received the request. This is really strange to me and I wonder if there is something wrong with your PLM? Maybe the javascript package isn't waiting for PLM ACKs?

In any case, I thought I had timeouts set throughout the code so that if any Ack/Nak along the way didn't come then a timeout would occur and a specific error would bubble back up to the caller. I'll look into this specifically.

abates added a commit that referenced this issue Dec 31, 2018
@abates
Copy link
Owner

abates commented Dec 31, 2018

I added a timeout as well as an additional trace log. Please pull the latest commit and re-run the trace.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

2018/12/31 09:10:39 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 SD     2:2 Engine Version []} to network
2018/12/31 09:10:39 DEBUG Sending packet to port
2018/12/31 09:10:39 plm.(*Port).readLoop:60 TRACE RX 02 62 2c 7d 24 0a 0d 00 06
2018/12/31 09:10:40 plm.(*Port).readLoop:60 TRACE RX 02 50 2c 7d 24 34 78 d9 2a 0d 02
2018/12/31 09:10:40 plm.(*Port).send:95 TRACE TX 02 62 2c 7d 24 0a 0d 00
2018/12/31 09:10:40 plm.(*PLM).receive:159 TRACE RX Send INSTEON Msg 00 00 00 2c 7d 24 0a 0d 00 ACK
2018/12/31 09:10:40 plm.(*PLM).receive:159 TRACE RX Std Msg Received 2c 7d 24 34 78 d9 2a 0d 02
2018/12/31 09:10:40 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 34.78.d9 SD Ack 2:2 Command(0x02, 0x0d, 0x02) []}
2018/12/31 09:10:40 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 SD     2:2 ID Request []} to network
2018/12/31 09:10:40 DEBUG Sending packet to port
2018/12/31 09:10:40 plm.(*Port).readLoop:60 TRACE RX 02 62 2c 7d 24 0a 10 00 06
2018/12/31 09:10:40 plm.(*Port).readLoop:60 TRACE RX 02 50 2c 7d 24 34 78 d9 2a 10 00
2018/12/31 09:10:40 plm.(*Port).send:95 TRACE TX 02 62 2c 7d 24 0a 10 00
2018/12/31 09:10:40 plm.(*PLM).receive:159 TRACE RX Send INSTEON Msg 00 00 00 2c 7d 24 0a 10 00 ACK
2018/12/31 09:10:40 plm.(*Port).readLoop:60 TRACE RX 02 50 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 09:10:40 plm.(*PLM).receive:159 TRACE RX Std Msg Received 2c 7d 24 34 78 d9 2a 10 00
2018/12/31 09:10:40 plm.(*Port).readLoop:60 TRACE RX 02 50 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 09:10:40 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 34.78.d9 SD Ack 2:2 Command(0x02, 0x10, 0x00) []}
2018/12/31 09:10:40 plm.(*PLM).receive:159 TRACE RX Std Msg Received 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 09:10:40 plm.(*PLM).receive:159 TRACE RX Std Msg Received 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 09:10:40 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 02.2f.43 SB     3:3 Set-button Pressed (responder) []}
2018/12/31 09:10:40 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 02.2f.43 SB     3:3 Set-button Pressed (responder) []}
       Device: Switch (2c.7d.24)
     Category: 02.2f
     Firmware: 0x43
2018/12/31 09:10:40 DEBUG Retrieving Device link database
2018/12/31 09:10:40 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 ED     2:2 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/31 09:10:40 DEBUG Sending packet to port
2018/12/31 09:10:41 plm.(*Port).send:95 TRACE TX 02 62 2c 7d 24 1a 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1
Link Database:
    No links defined
Timeout reading from plm

@abates
Copy link
Owner

abates commented Dec 31, 2018

So the Read/Write ALDB send request is sent to the PLM and then the PLM never responds again. Serial I/O runs in a go routine and everything is traced, so it's not like something is hanging and preventing a read. Can I see a trace from the javascript package all the way through completion of querying the link database? Also, how long does that take? Maybe for some reason your PLM is taking a really long time to respond and the go package is just timing out too soon.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

Here's a trace of the JS version:
1.txt.gz

It takes a minute and 5 seconds to retrieve the entire linkDB. (Setting a 120s timeout on the go version doesn't help, it's getting into some sort of stuck state.)

I can factory reset the PLM, but it's not clear if I need to remove all the links to to the modem from the devices first or if that's going to cause me trouble later. (And doing that without using the modem... will be problematic.)

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

This is the JS library: https://github.com/automategreen/home-controller/

abates added a commit that referenced this issue Dec 31, 2018
@abates
Copy link
Owner

abates commented Dec 31, 2018

I wouldn't factory reset the modem just yet. The one place that a read blocks indefinitely in the go package is here. I've added some logging statements to see if that is indeed where we stop getting traffic from the PLM. Please try again and update the trace.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

It never gets to the new log lines:

 ./ic -timeout 30s  -log trace device info 2C.7D.24
2018/12/31 10:07:34 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 SD     2:2 Engine Version []} to network
2018/12/31 10:07:34 DEBUG Sending packet to port
2018/12/31 10:07:34 plm.(*Port).send:90 TRACE TX 02 62 2c 7d 24 0a 0d 00
2018/12/31 10:07:34 plm.(*Port).readPacket:111 TRACE Attempting to read 7 more bytes
2018/12/31 10:07:34 plm.(*Port).readPacket:113 TRACE Completed read: 02 62 2c 7d 24 0a 0d 00 06
2018/12/31 10:07:34 plm.(*Port).readLoop:56 TRACE RX 02 62 2c 7d 24 0a 0d 00 06
2018/12/31 10:07:34 plm.(*PLM).receive:173 TRACE RX Send INSTEON Msg 00 00 00 2c 7d 24 0a 0d 00 ACK
2018/12/31 10:07:35 plm.(*Port).readPacket:111 TRACE Attempting to read 9 more bytes
2018/12/31 10:07:35 plm.(*Port).readPacket:113 TRACE Completed read: 02 50 2c 7d 24 34 78 d9 2a 0d 02
2018/12/31 10:07:35 plm.(*Port).readLoop:56 TRACE RX 02 50 2c 7d 24 34 78 d9 2a 0d 02
2018/12/31 10:07:35 plm.(*PLM).receive:173 TRACE RX Std Msg Received 2c 7d 24 34 78 d9 2a 0d 02
2018/12/31 10:07:35 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 34.78.d9 SD Ack 2:2 Command(0x02, 0x0d, 0x02) []}
2018/12/31 10:07:35 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 SD     2:2 ID Request []} to network
2018/12/31 10:07:35 DEBUG Sending packet to port
2018/12/31 10:07:35 plm.(*Port).send:90 TRACE TX 02 62 2c 7d 24 0a 10 00
2018/12/31 10:07:35 plm.(*Port).readPacket:111 TRACE Attempting to read 7 more bytes
2018/12/31 10:07:35 plm.(*Port).readPacket:113 TRACE Completed read: 02 62 2c 7d 24 0a 10 00 06
2018/12/31 10:07:35 plm.(*Port).readLoop:56 TRACE RX 02 62 2c 7d 24 0a 10 00 06
2018/12/31 10:07:35 plm.(*PLM).receive:173 TRACE RX Send INSTEON Msg 00 00 00 2c 7d 24 0a 10 00 ACK
2018/12/31 10:07:35 plm.(*Port).readPacket:111 TRACE Attempting to read 9 more bytes
2018/12/31 10:07:35 plm.(*Port).readPacket:113 TRACE Completed read: 02 50 2c 7d 24 34 78 d9 2a 10 00
2018/12/31 10:07:35 plm.(*Port).readLoop:56 TRACE RX 02 50 2c 7d 24 34 78 d9 2a 10 00
2018/12/31 10:07:35 plm.(*PLM).receive:173 TRACE RX Std Msg Received 2c 7d 24 34 78 d9 2a 10 00
2018/12/31 10:07:35 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 34.78.d9 SD Ack 2:2 Command(0x02, 0x10, 0x00) []}
2018/12/31 10:07:35 plm.(*Port).readPacket:111 TRACE Attempting to read 9 more bytes
2018/12/31 10:07:35 plm.(*Port).readPacket:113 TRACE Completed read: 02 50 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 10:07:35 plm.(*Port).readLoop:56 TRACE RX 02 50 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 10:07:35 plm.(*PLM).receive:173 TRACE RX Std Msg Received 2c 7d 24 02 2f 43 8f 01 00
2018/12/31 10:07:35 insteon.(*Network).receive:100 TRACE Received Insteon Message &{2c.7d.24 02.2f.43 SB     3:3 Set-button Pressed (responder) []}
       Device: Switch (2c.7d.24)
     Category: 02.2f
     Firmware: 0x43
2018/12/31 10:07:35 DEBUG Retrieving Device link database
2018/12/31 10:07:35 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 ED     2:2 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/31 10:07:35 DEBUG Sending packet to port
2018/12/31 10:07:35 plm.(*Port).send:90 TRACE TX 02 62 2c 7d 24 1a 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1
runtime.notetsleepg(0x6251c0, 0x6d0a024ac, 0x0)
	/usr/lib/go-1.10/src/runtime/lock_futex.go:227 +0x42 fp=0xc4200fc760 sp=0xc4200fc730 pc=0x40ea72
runtime.timerproc(0x6251a0)
	/usr/lib/go-1.10/src/runtime/time.go:261 +0x2e7 fp=0xc4200fc7d8 sp=0xc4200fc760 pc=0x445377
runtime.goexit()
	/usr/lib/go-1.10/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4200fc7e0 sp=0xc4200fc7d8 pc=0x4539e1
created by runtime.(*timersBucket).addtimerLocked
	/usr/lib/go-1.10/src/runtime/time.go:160 +0x107

goroutine 1 [chan receive]:
github.com/abates/insteon.(*I1Device).sendCommand(...)
	/tmp/goi/gopath/src/github.com/abates/insteon/i1device.go:159
github.com/abates/insteon.(*I1Device).SendCommandAndListen(...)
	/tmp/goi/gopath/src/github.com/abates/insteon/i1device.go:184
github.com/abates/insteon.(*I2Device).Links(0xc42014a048, 0x0, 0x0, 0x0, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/i2device.go:49 +0x230
main.printLinkDatabase(0x7f1e87d94168, 0xc420134100, 0x1, 0x7f1e87d94168)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/linkdatabase.go:26 +0x4c
main.printDevInfo.func1(0x7f1e87d94168, 0xc420134100, 0xc420134100, 0x7f1e87d94168)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:126 +0x35
main.devLink(0x55f398, 0x12, 0xc420067b30)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:77 +0xeb
main.printDevInfo(0x5703a0, 0xc420134100, 0x0, 0x0, 0x1, 0xc4201301e0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:125 +0x29a
main.devInfoCmd(0xc4200b4070, 0x1, 0x1, 0xc4201301e0, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:108 +0x41
github.com/abates/cli.(*Command).Run(0xc4200dc0e0, 0xc4200b4070, 0x1, 0x1, 0x8, 0xc4200aa258)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:163 +0x1f1
github.com/abates/cli.(*Command).Run.func1(0xc4200b7620, 0xc420247d2c)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:159 +0x4a
main.devCmd(0xc4200b4070, 0x1, 0x1, 0xc420098e40, 0xc4200a28a8, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:55 +0x187
github.com/abates/cli.(*Command).Run(0xc4200dc070, 0xc4200b4060, 0x1, 0x1, 0x18, 0xc420067e08)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:163 +0x1f1
github.com/abates/cli.(*Command).Run.func1(0xc400000018, 0x55f200)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:159 +0x4a
main.run(0xc4200b4060, 0x2, 0x2, 0xc420098de0, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/main.go:110 +0x14a
github.com/abates/cli.(*Command).Run(0xc4200dc000, 0xc4200b4010, 0x2, 0x2, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:163 +0x1f1
main.main()
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/main.go:116 +0x76

goroutine 19 [syscall]:
syscall.Syscall(0x0, 0x3, 0xc4200e2000, 0x1000, 0xc420080780, 0xc420020080, 0xc4200eac00)
	/usr/lib/go-1.10/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x3, 0xc4200e2000, 0x1000, 0x1000, 0xc420020001, 0x0, 0x0)
	/usr/lib/go-1.10/src/syscall/zsyscall_linux_amd64.go:749 +0x5f
syscall.Read(0x3, 0xc4200e2000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
	/usr/lib/go-1.10/src/syscall/syscall_unix.go:162 +0x49
internal/poll.(*FD).Read(0xc4200b01e0, 0xc4200e2000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/usr/lib/go-1.10/src/internal/poll/fd_unix.go:153 +0x118
os.(*File).read(0xc4200ae020, 0xc4200e2000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
	/usr/lib/go-1.10/src/os/file_unix.go:226 +0x4e
os.(*File).Read(0xc4200ae020, 0xc4200e2000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
	/usr/lib/go-1.10/src/os/file.go:107 +0x6a
github.com/tarm/serial.(*Port).Read(0xc4200ae028, 0xc4200e2000, 0x1000, 0x1000, 0x2, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/tarm/serial/serial_linux.go:138 +0x4c
bufio.(*Reader).fill(0xc4200b7020)
	/usr/lib/go-1.10/src/bufio/bufio.go:100 +0x11e
bufio.(*Reader).ReadByte(0xc4200b7020, 0x73368c115, 0x0, 0x0)
	/usr/lib/go-1.10/src/bufio/bufio.go:242 +0x39
github.com/abates/insteon/plm.(*Port).readPacket(0xc4200b2180, 0xc4200eafb0, 0x5, 0xc4200eafa0, 0x1, 0x1)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:102 +0xb0
github.com/abates/insteon/plm.(*Port).readLoop(0xc4200b2180)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:54 +0x40
created by github.com/abates/insteon/plm.NewPort
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:47 +0x216

goroutine 20 [select]:
github.com/abates/insteon/plm.(*Port).process(0xc4200b2180)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:66 +0x16f
created by github.com/abates/insteon/plm.NewPort
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:48 +0x238

goroutine 21 [select]:
github.com/abates/insteon/plm.(*PLM).process(0xc4200b4080)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:101 +0x1ff
created by github.com/abates/insteon/plm.New
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:91 +0x137

goroutine 23 [chan receive]:
github.com/abates/insteon/plm.(*PLM).Retry(0xc4200b4080, 0xc420130360, 0x0, 0xc4200992c0, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:214 +0xd6
github.com/abates/insteon/plm.(*PLM).connect.func1(0xc4200b7200, 0xc4200b4080, 0xc4200b7260)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:243 +0xe1
created by github.com/abates/insteon/plm.(*PLM).connect
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:241 +0xbe

goroutine 24 [chan receive]:
github.com/abates/insteon/plm.(*connection).send(...)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/connection.go:75
github.com/abates/insteon/plm.(*connection).process(0xc4200b21c0)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/connection.go:52 +0x24f
created by github.com/abates/insteon/plm.newConnection
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/connection.go:38 +0x102

goroutine 25 [select]:
github.com/abates/insteon.(*Network).process(0xc4200b7620)
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:79 +0x1e9
created by github.com/abates/insteon.New
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:72 +0x16a

goroutine 35 [select]:
github.com/abates/insteon.(*Network).sendMessage(0xc4200b7620, 0xc420130240, 0x1, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:144 +0x47e
github.com/abates/insteon.(*Network).connect.func1(0xc420148120, 0xc4200b7620, 0xc420148180)
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:210 +0x92
created by github.com/abates/insteon.(*Network).connect
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:208 +0x98

goroutine 36 [chan receive]:
github.com/abates/insteon.(*connection).send(0xc4201a6000)
	/tmp/goi/gopath/src/github.com/abates/insteon/connection.go:167 +0x13e
github.com/abates/insteon.(*connection).process(0xc4201a6000)
	/tmp/goi/gopath/src/github.com/abates/insteon/connection.go:70 +0x1d9
created by github.com/abates/insteon.newConnection
	/tmp/goi/gopath/src/github.com/abates/insteon/connection.go:48 +0x12b

goroutine 37 [select]:
github.com/abates/insteon.(*I1Device).process(0xc420148420)
	/tmp/goi/gopath/src/github.com/abates/insteon/i1device.go:59 +0x234
created by github.com/abates/insteon.NewI1Device
	/tmp/goi/gopath/src/github.com/abates/insteon/i1device.go:52 +0x133

goroutine 38 [chan receive]:
github.com/abates/insteon.(*switchedDevice).process(0xc4201301b0)
	/tmp/goi/gopath/src/github.com/abates/insteon/lighting.go:275 +0x81
created by github.com/abates/insteon.switchedDeviceFactory
	/tmp/goi/gopath/src/github.com/abates/insteon/lighting.go:520 +0x120

It looks to me like it's blocked here on reading the first byte of the PLM packet. See goroutine 19.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

Adding a timeout to the serial port, doesn't help... there just doesn't seem to be a packet to read.

2018/12/31 10:17:23 DEBUG Retrieving Device link database
2018/12/31 10:17:23 insteon.(*Network).sendMessage:134 TRACE Sending &{00.00.00 2c.7d.24 ED     2:2 Read/Write ALDB [0 0 0 0 0 0 0 0 0 0 0 0 0 0]} to network
2018/12/31 10:17:23 DEBUG Sending packet to port
2018/12/31 10:17:23 plm.(*Port).send:90 TRACE TX 02 62 2c 7d 24 1a 2f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d1
2018/12/31 10:17:28  INFO Error reading packet: EOF
2018/12/31 10:17:34  INFO Error reading packet: EOF
2018/12/31 10:17:39  INFO Error reading packet: EOF
2018/12/31 10:17:44  INFO Error reading packet: EOF
2018/12/31 10:17:49  INFO Error reading packet: EOF
Link Database:
    No links defined
Timeout reading from plm

@abates
Copy link
Owner

abates commented Dec 31, 2018

This seems like a silly question, but you don't have another process reading from /dev/ttyUSB0, do you?

@abates
Copy link
Owner

abates commented Dec 31, 2018

Reading through the trace from the javascript package (and trying to analyze their source) it seems like there are a ton of send timeouts. This would explain why it is so slow. In the js package, a send timeout results in a retry several times. This also explains why the go package gives up, because it never retries if there was no ack seen.

I think I can try to reproduce the behavior of the js package. Give me a few minutes.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

Not a silly question, and no. I don't have anything else reading from /dev/ttyUSB0.

But, progress!

Increasing writeDelay to 650ms makes things work. 600ms is too short.

I'll send a PR later with that change and a few other little things.

Related to debugging, I'm wondering if there's one or two extra levels of channels and goroutines in the library, that might make more sense being in the application level. But that's for a different day if it works. :)

@abates
Copy link
Owner

abates commented Dec 31, 2018

Okay, branch issue-2 has the updated code that retries on read timeouts. Strange that increasing writeDelay makes things work. That could (will?) make communication really really slow.

Out of curiosity, how many Insteon devices do you have on your network?

@abates
Copy link
Owner

abates commented Dec 31, 2018

Re Debugging: I'm not sure I understand your question, but I'll be the first to admit that the plm portion of code could probably use some work. I feel like everything in insteon is in pretty good shape, but insteon/plm is not well tested or designed.

@rspier
Copy link
Contributor Author

rspier commented Dec 31, 2018

9 devices including the PLM.

the issue-2 branch hangs:

$ time ./ic  device info 2C.7D.24
       Device: Switch (2c.7d.24)
     Category: 02.2f
     Firmware: 0x43
2018/12/31 10:56:53  INFO Read timeout waiting for PLM ACK
2018/12/31 10:56:53  INFO Read timeout waiting for PLM ACK
# 2 minutes have passed

The stack trace is less obvious about what's blocking:

goroutine 0 [idle]:
runtime.futex(0x623da8, 0x0, 0x0, 0x0, 0x7ffe00000000, 0x0, 0x0, 0x0, 0x7ffef017c9e8, 0x40e78b, ...)
	/usr/lib/go-1.10/src/runtime/sys_linux_amd64.s:530 +0x21
runtime.futexsleep(0x623da8, 0x0, 0xffffffffffffffff)
	/usr/lib/go-1.10/src/runtime/os_linux.go:45 +0x4b
runtime.notesleep(0x623da8)
	/usr/lib/go-1.10/src/runtime/lock_futex.go:151 +0x9b
runtime.stopm()
	/usr/lib/go-1.10/src/runtime/proc.go:1952 +0xe5
runtime.findrunnable(0xc420024a00, 0x0)
	/usr/lib/go-1.10/src/runtime/proc.go:2415 +0x50c
runtime.schedule()
	/usr/lib/go-1.10/src/runtime/proc.go:2541 +0x13b
runtime.park_m(0xc420000180)
	/usr/lib/go-1.10/src/runtime/proc.go:2604 +0xb6
runtime.mcall(0x0)
	/usr/lib/go-1.10/src/runtime/asm_amd64.s:351 +0x5b

goroutine 1 [chan receive]:
github.com/abates/insteon.(*I2Device).Links(0xc42000e0d8, 0x0, 0x0, 0x0, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/i2device.go:51 +0x2bd
main.printLinkDatabase(0x7f66540a8200, 0xc42000c260, 0x1, 0x7f66540a8200)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/linkdatabase.go:26 +0x4c
main.printDevInfo.func1(0x7f66540a8200, 0xc42000c260, 0xc42000c260, 0x7f66540a8200)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:126 +0x35
main.devLink(0x5603b8, 0x12, 0xc420065b30)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:77 +0xeb
main.printDevInfo(0x5713e0, 0xc42000c260, 0x0, 0x0, 0x1, 0xc420092f30)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:125 +0x29a
main.devInfoCmd(0xc420010130, 0x1, 0x1, 0xc420092f30, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:108 +0x41
github.com/abates/cli.(*Command).Run(0xc4200c20e0, 0xc420010130, 0x1, 0x1, 0x8, 0xc4200182d0)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:163 +0x1f1
github.com/abates/cli.(*Command).Run.func1(0xc420073680, 0xc420247d2c)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:159 +0x4a
main.devCmd(0xc420010130, 0x1, 0x1, 0xc420092e10, 0xc42009a708, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/device.go:55 +0x187
github.com/abates/cli.(*Command).Run(0xc4200c2070, 0xc420010120, 0x1, 0x1, 0x18, 0xc420065e08)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:163 +0x1f1
github.com/abates/cli.(*Command).Run.func1(0xc400000018, 0x560220)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:159 +0x4a
main.run(0xc420010120, 0x2, 0x2, 0xc420092db0, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/main.go:110 +0x14a
github.com/abates/cli.(*Command).Run(0xc4200c2000, 0xc420010110, 0x2, 0x2, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/cli/commands.go:163 +0x1f1
main.main()
	/tmp/goi/gopath/src/github.com/abates/insteon/cmd/ic/main.go:116 +0x76

goroutine 5 [chan send]:
github.com/abates/insteon/plm.(*Port).readLoop(0xc420010280)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:57 +0x1ed
created by github.com/abates/insteon/plm.NewPort
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:47 +0x216

goroutine 6 [chan send]:
github.com/abates/insteon/plm.(*Port).process(0xc420010280)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:68 +0x30a
created by github.com/abates/insteon/plm.NewPort
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/port.go:48 +0x238

goroutine 7 [chan send]:
github.com/abates/insteon/plm.(*PLM).receive(0xc4200ca000, 0xc4201420a0, 0x19, 0x20)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:176 +0x12f
github.com/abates/insteon/plm.(*PLM).process(0xc4200ca000)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:107 +0x5ea
created by github.com/abates/insteon/plm.New
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:91 +0x137

goroutine 9 [chan receive]:
github.com/abates/insteon/plm.(*PLM).Retry(0xc4200ca000, 0xc420100060, 0x2, 0xc4200e6150, 0x0, 0x0)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:223 +0x269
github.com/abates/insteon/plm.(*PLM).connect.func1(0xc420073260, 0xc4200ca000, 0xc4200732c0)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:246 +0xe1
created by github.com/abates/insteon/plm.(*PLM).connect
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/plm.go:244 +0xbe

goroutine 10 [chan receive]:
github.com/abates/insteon/plm.(*connection).send(...)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/connection.go:75
github.com/abates/insteon/plm.(*connection).process(0xc4200102c0)
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/connection.go:52 +0x255
created by github.com/abates/insteon/plm.newConnection
	/tmp/goi/gopath/src/github.com/abates/insteon/plm/connection.go:38 +0x102

goroutine 11 [select]:
github.com/abates/insteon.(*Network).process(0xc420073680)
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:79 +0x1e9
created by github.com/abates/insteon.New
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:72 +0x16a

goroutine 66 [chan receive]:
github.com/abates/insteon.(*switchedDevice).process(0xc420092f00)
	/tmp/goi/gopath/src/github.com/abates/insteon/lighting.go:275 +0x81
created by github.com/abates/insteon.switchedDeviceFactory
	/tmp/goi/gopath/src/github.com/abates/insteon/lighting.go:520 +0x120

goroutine 14 [chan receive]:
github.com/abates/insteon.(*Network).connect.func1(0xc420073ce0, 0xc420073680, 0xc420073d40)
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:209 +0x60
created by github.com/abates/insteon.(*Network).connect
	/tmp/goi/gopath/src/github.com/abates/insteon/network.go:208 +0x98

goroutine 15 [select]:
github.com/abates/insteon.(*connection).process(0xc4200c3260)
	/tmp/goi/gopath/src/github.com/abates/insteon/connection.go:54 +0x15d
created by github.com/abates/insteon.newConnection
	/tmp/goi/gopath/src/github.com/abates/insteon/connection.go:48 +0x12b

goroutine 16 [select]:
github.com/abates/insteon.(*I1Device).process(0xc42015a000)
	/tmp/goi/gopath/src/github.com/abates/insteon/i1device.go:59 +0x234
created by github.com/abates/insteon.NewI1Device
	/tmp/goi/gopath/src/github.com/abates/insteon/i1device.go:52 +0x133

Have to step away from the computer for a little... back later.

@abates
Copy link
Owner

abates commented Jan 1, 2019

Okay, I merged the PR. @rspier: do you consider this issue resolved? I still have some lingering unease. Even if the PLM stopped sending traffic, there should have been 3 timeouts logged and then an error returned instead of the program hanging indefinitely.

@rspier
Copy link
Contributor Author

rspier commented Jan 2, 2019

I think we should leave it open. Pr #3 is more of a workaround than a fix. I've been playing with some stuff this afternoon and timeouts are still happening in an inconsistent manner. I added some more logging (and delays) to the write path. My suspicions are leading towards a race somewhere, but I haven't nailed it down yet. (Was focusing on implementing something with the library instead.)

@abates
Copy link
Owner

abates commented Jan 4, 2019

I agree that it seems like some kind of race condition. I'll continue to try and determine what's going on here and let you know if I come up with anything.

@abates
Copy link
Owner

abates commented Jan 4, 2019

I'm re-reading the PLM developer guide and something has occurred to me. The modem does not implement hardware flow control, instead implementing software flow control by echoing the bytes received from the host. The way I wrote the current code, the echoed bytes are discarded/ignored. If we rewrite this to wait for the echoed bytes and use that as flow control it may actually fix this problem.

I'll look into using the echoed bytes as flow control.

@abates
Copy link
Owner

abates commented Jan 4, 2019

Nope, that's not right, the echoed traffic is actually accounted for in the plm code...

@abates
Copy link
Owner

abates commented Jan 4, 2019

The PLM developer documentation makes this statement:

The IM buffers IM Commands as it receives them, so you can send a complete IM Command without pause. To maintain compatibility with earlier IM versions, the IM will echo each byte that it receives (earlier versions of the IM used byte echoing for flow control). You can now ignore the byte echos, but in order to avoid overrunning the IM’s receive buffer, you must wait for the IM to send its response to your current IM Command before sending a new one.

Right now, the PLM code will send a command and queue all subsequent commands until an ACK (matching packet ending in 0x06) is received. The above statement seems to indicate that earlier PLMs may rely on echoing (as opposed to acknowledging) for flow control.

@abates
Copy link
Owner

abates commented Jan 4, 2019

The plm/port code had a couple areas where channel closing or error conditions weren't handled quite correctly. I don't think it will affect this issue, but you might try and see what happens now.

@rspier
Copy link
Contributor Author

rspier commented Jan 5, 2019

I think the situation might be improved (lower writeDelay), but not completely. I have to use a writeDelay of 538ms +/- 1ms to consistently read from the linkdb from dimmers. (./ic dimmer info)

My PLM isn't that old, I bought it in April 2015. It's Firmware: 158, if that helps with PLM age determination. From the documentation I can find, that seems to be long after the echo flow control change.

The writeDelay shouldn't be needed at all, looking at "IM RS232 Port Settings" in the manual.

@abates
Copy link
Owner

abates commented Jan 5, 2019

My PLM is firmware 158 as well which makes this issue even more bizarre.

Regarding the writeDelay. I agree that it seems like it shouldn't be needed. However, I've found that without it, I get a ton of NAK's from the PLM when performing many subsequent operations. I've been thinking lately that the better way to do this is to always try to send the command to the PLM immediately, then, if the PLM responds with a NAK set a writeDelay and try again (up to N times).

One thing I've learned is that if there is active traffic on the Insteon network (not necessarily originating from the PLM), then the PLM responds with a NAK. I have a hypothesis that back to back Insteon messages (with no delay) will result in a PLM NAK for the second message. My understanding so far is that this is due to messages being repeated by every device on the network until the hopsLeft count is zero. In the case of a moderate or large Insteon installation (I have about 70 devices on my network) this adds a noticeable delay.

@abates
Copy link
Owner

abates commented Apr 2, 2019

@rspier I updated a bunch of the code to remove go routines and potential race conditions... if you get a chance, can you see if the changes fixed your timeout issue?

@rspier
Copy link
Contributor Author

rspier commented Apr 12, 2019

Sorry for the slow response...

I don't think it does.

I no longer need to raise the timeout, but I still need to raise the writeDelay. The actual value fluctuates. 537ms was working for a while, but 3 minutes later I need something higher.

At bef47b6...

Low writeDelay fails differently

$ ./ic --writeDelay 536ms device info 2c.75.ca
Device: Dimmer (2c.75.ca)
Category: 01.35
Firmware: 67
Link Database:
No links defined
Read Timeout
$ ./ic --writeDelay 536ms device info 2c.75.ca
Device: Dimmer (2c.75.ca)
Read Timeout

High writeDelay works ok

$ ./ic --writeDelay 600ms device info 2c.75.ca
Device: Dimmer (2c.75.ca)
Category: 01.35
Firmware: 67
Link Database:
Flags Group Address Data
UR 5 33.14.1d ff 1f 01
UC 1 33.14.1d 03 1f 01
UR 7 33.14.1d ff 1f 01
UR 5 33.14.bd ff 1f 01
UR 7 33.14.bd ff 1f 01
UC 1 33.14.bd 03 1f 01
UR 0 34.78.d9 ff 1f 01
UC 1 34.78.d9 03 1f 01

--log trace slows things down enough to make it happy without writeDelay.

@abates
Copy link
Owner

abates commented Apr 13, 2019

I've added a ttl flag which you might want to play with to see if it changes anything (although I'm doubtful it will have any effect). I also changed the ErrReadTimeout in one place to be an ErrAckTimeout to try and determine where things are happening.

Please try again, if you wouldn't mind. Also, if you could post a new trace, that would be helpful. I don't think the trace will really change, but with all the changes I made recently I just want to be sure.

Thanks

@rspier
Copy link
Contributor Author

rspier commented Apr 14, 2019

Good news and bad news.

Good news:
It now works without any flags on the simple Micro On/Off.

$ go run github.com/abates/insteon/cmd/ic device 2c.75.ca info
Device: Dimmer (2c.75.ca)
Category: 01.35
Firmware: 67
Link Database:
Flags Group Address Data
UR 5 33.14.1d ff 1f 01
UC 1 33.14.1d 03 1f 01
UR 7 33.14.1d ff 1f 01
UR 5 33.14.bd ff 1f 01
UR 7 33.14.bd ff 1f 01
UC 1 33.14.bd 03 1f 01
UR 0 34.78.d9 ff 1f 01
UC 1 34.78.d9 03 1f 01

Bad news:
It now works without any flags on the simple Micro On/Off, and nothing obvious changed.

Good news:
One of the keypadlincs works consistently.

Bad news:
The other (identical?) keypadlinc is flaky.

3 Runs, 3 outputs:

$ go run github.com/abates/insteon/cmd/ic device 33.14.bd info
       Device: Dimmer (33.14.bd)
     Category: 01.41
     Firmware: 67
Link Database:
    Flags Group Address    Data
<snip>
$ go run github.com/abates/insteon/cmd/ic device 33.14.bd info
Read Timeout
exit status 1
$ go run github.com/abates/insteon/cmd/ic device 33.14.bd info
Ack Timeout
exit status 1

Interestingly, if it's going to fail, it fails early (4.8 seconds). Getting the entire link database is slow. (22 or 25 seconds, 19 with trace enabled.)

Here's an Ack Timeout trace:

$ time go run github.com/abates/insteon/cmd/ic -log trace device 33.14.bd info 2>/tmp/t
       Device: Dimmer (33.14.bd)
     Category: 01.41
     Firmware: 67
Link Database:
    No links defined
Ack Timeout

real	0m5.120s
user	0m0.342s
sys	0m0.094s

acktimeout.txt

And a Read Timeout:

$ time go run github.com/abates/insteon/cmd/ic -log trace device 33.14.bd info 2>/tmp/t
       Device: Dimmer (33.14.bd)
Read Timeout

real	0m4.854s
user	0m0.292s
sys	0m0.077s

readtimeout.txt

The TTL flag is already set to it's maximum of 3. Lowering it to 1 got me one Ack timeout (likely unrelated) and 2 successes

Lowering TTL to 0 returned 3 Ack Timeouts followed by an unexpected success. Setting TTL to 4 (which should be the same as 0) performed similarly. I think that's because it takes 20 seconds for the data to get back to the PLM, and the next time I ask for it, it's there.

Thanks!

-R
.

@rspier
Copy link
Contributor Author

rspier commented Apr 14, 2019

See pull request #10 for a fix to actually set the TTL. It (surprisingly?) doesn't change any of the behavior seen previously.

@abates
Copy link
Owner

abates commented Apr 14, 2019

You know, I wonder if we stumbled onto something here. Due to the bug where TTL wasn't getting set, the PLM was actually sending out a TTL of zero, which means no other device on the network would propagate it. I don't think the PLM will change the ttl, either. Presumably, the receiving device gets the message the the PLM sent and responds, even though the TTL is zero.

After you made the changes for PR #10, if you set the TTL to 2 or 3 does everything still work as described in your most recent notes?

@abates
Copy link
Owner

abates commented Apr 14, 2019

As for the read timeout above, it looks suspiciously like a race condition. There are two ID request messages sent. The response for ID request is that the device will ACK the request and then send a broadcast "set button pressed controller/responder" message. If you look at the read timeout trace, we have two ID requests, two ID request ACKs and two broadcast messages. However, one of the broadcast messages appears to come before the ID request message. This really looks like a race condition somewhere...

@abates
Copy link
Owner

abates commented Apr 14, 2019

Oh, and I'm now able to seemingly reproduce the problem, so yay for troubleshooting on my end.

@abates
Copy link
Owner

abates commented Apr 14, 2019

there is also an intermittent failing test:

Andrews-MBP:insteon abates$ go test -race
--- FAIL: TestConnectionIDRequest (0.00s)
    connection_test.go:245: Want ErrReadTimeout got Ack Timeout

It sometimes fails, but not often.

abates added a commit that referenced this issue Apr 14, 2019
Connection really should be synchronous, this commit
removes the buffering from channels in an effort to
fix issue #2
@abates
Copy link
Owner

abates commented Apr 14, 2019

After re-reading the Insteon Developer Guide "Timeslot Synchronization" section I updated the PLM transmit code to wait for 2 * (6 or 13 depending on standard/extended messages) * ttl zero crossings (basically that value / 60). It looks like the appropriate amount of time to wait for standard messages is about 600ms and for extended length messages is 1.3s. This seems to correspond to what you are seeing for behavior on your end.

If you want to test the dynamic writeDelay just set -writeDelay 0 and see how it works.

relevant code here

@rspier
Copy link
Contributor Author

rspier commented Apr 14, 2019

ref 8102653

$ time go run -v ./cmd/ic device 33.14.bd info
took 44 seconds to successfully pull the info from the KeypadLinc. 31 seconds the second time. Read timeout after 4 seconds the third time. Ack timeout the fourth and fifth time.

after power cycling the PLM...
$ time go run -v ./cmd/ic -writeDelay 0 device 33.14.bd info

7 successful runs. fastest 19.2, slowest 23.8 seconds. avg around 22 seconds.

this is looking really good!

Don't try and run two at once :) But that's a different problem (but maybe points to needing an external lock somewhere.)

-R

@abates
Copy link
Owner

abates commented Apr 14, 2019

I wonder why power cycling the PLM changed the outcome?

@rspier
Copy link
Contributor Author

rspier commented Apr 14, 2019

I meant for that to normalize test, but I think I inadvertently confused things. It does appear something was up though.

without powercycling...

$ time go run -v ./cmd/ic device 33.14.bd info
1: 20.0
2: 18.9
3: 21.2
4: 19.9
5: 21.3
6: 18.9
7: .... more of the same
till something like 12... when it got uninteresting.

no writeDelay=0, and it's still succeeding. 🙁

-R

@abates
Copy link
Owner

abates commented Apr 15, 2019

So... where do we stand with this issue?

@rspier
Copy link
Contributor Author

rspier commented Apr 16, 2019

I think we can close it. I've been unable to trigger a failure. We can always open it again if it comes back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants