Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIR Runtime on macOS corrupts SecureRTMPChannel #2625

Open
busitech opened this issue Apr 26, 2023 · 55 comments
Open

AIR Runtime on macOS corrupts SecureRTMPChannel #2625

busitech opened this issue Apr 26, 2023 · 55 comments
Labels

Comments

@busitech
Copy link

The AIR Runtime corrupts SecureRTMPChannel connections on macOS, but not on Windows. Our application makes RemoteObject calls to the server over this channel, which begins to hang forever on macOS after a few transactions. The initial connection is usually successful, and sometimes even a few successful remote calls can be made before the hang occurs.

Compiler: AIR SDK 32.0.0.116
AIR Runtime: 32.0.0.116 to 50.2.1.1
Platform: macOS Ventura 13.3.1

@busitech busitech added the Bug label Apr 26, 2023
@busitech
Copy link
Author

@ajwfrost Is there a particular reason this serious bug report has received zero interaction in almost one month's time?

@ajwfrost
Copy link
Collaborator

Hi

Apologies for the delay, we did see this one but there have been a lot of other activities going on; there is quite an effort to set up a reproduction for this issue so it's not yet been investigated. If you're able to provide any sample code/SWF file that we could use to run this on a mac and to make it easier to reproduce, it would greatly help.

thanks

Andrew

@busitech
Copy link
Author

Hello @ajwfrost,

Thank you very much for your reply. Please help us understand what you have, and what you are missing, so we can help fill in the gaps so you can quickly recreate this.

When a fresh build of the AIR Runtime is compiled, are there regression tests of the RTMP and RTMPS communication protocols that must pass before release? If so, I am wondering, do these tests actually connect out and transfer data as part of the test? In our tests, almost everything fails, which is why I was hoping this would be easy to recreate with only the bare minimum of existing tests that you might already have at your disposal.

I think the fix for the issue will be in the low level RTMPS communication routines of the AIR Runtime, in an area of the code which is specific to macOS, because the application runs perfectly on Windows. RTMP works perfectly on macOS, but RTMPS hangs almost immediately, sometimes not even completing the handshake that bootstraps the encryption by negotiating the cipher suite and exchanging keys. Therefore, the problem must be in the area of encryption and decryption, since RTMPS is just the encrypted version of RTMP.

In order to actually use RTMPS the way we do, you would need a server that speaks the protocol, and the SWC that activates the RTMPS protocol in the AIR Runtime. Do you have a preference in this regard?

In our case, this layer is the Adobe LiveCycle Data Services (LCDS) product, and of course, we only have the binary version. Do you have access to the Adobe LCDS product, and if so, do you use it when regression testing the AIR Runtime?

Adobe LCDS was distributed with sample applications. I haven't tried those yet, but I assume that those will be a good starting point for recreating the issue quickly, if you can use LCDS on your end.

We are ready to assist once we understand what you need to proceed.

Thank you,

Matthew

@ajwfrost
Copy link
Collaborator

Hi

In order to actually use RTMPS the way we do, you would need a server that speaks the protocol, and the SWC that activates the RTMPS protocol in the AIR Runtime.
Do you have access to the Adobe LCDS product?

This is our main problem... we do not have any instance of LiveCycle or AMS which means that we aren't able to run the RTMP based testing. It's an area we've not really touched and it's not actually widely used now..

I notice that you've mentioned the issue also existed in the 32.0 runtime, which at least makes sense in terms of it not being a regression that we'd inadvertently introduced. Given your descriptions above, probably the best thing is if I can talk to Adobe and see if they can support us with LiveCycle, or if this is also something we could test with AMS then we can talk with Veriskope..

Although it might actually be that the quickest/best option is for us to work directly with you for debugging, if you have the environment and reproduction already set up. Your analysis has highlighted the particular area of code where we'd need to do the digging so this could be an option perhaps? If we sent you a build of the macOS runtime that output a lot of logging for this, we may be able to pinpoint the issue or progressively narrow it down to fix it..?

thanks

Andrew

@busitech
Copy link
Author

Yes, we'd be happy to run the debug build here and send you the logs produced.

@busitech
Copy link
Author

busitech commented Jun 1, 2023

@ajwfrost I have more information for you. There are actually two bugs. The client side is falling back to RTMPT because of a second Adobe server side bug that prevents RTMPS from connecting. It is the RTMPT fallback strategy which does not work well on the macOS side, while the Windows platform falls back to RTMPT successfully.

@ajwfrost
Copy link
Collaborator

ajwfrost commented Jun 5, 2023

Okay thanks -> curious as I'm not 100% sure what code there is platform-specific. We'll look into it and try to get you a debug build of the macOS runtime shortly..

@busitech
Copy link
Author

@ajwfrost It has been close to 6 months since your last message about getting us a debug build. We were looking forward to using your debug build of AIR for macOS to assist you with debugging.

As the macOS operating system continues to advance, the AIR runtime for macOS seems to be getting worse as time goes on. RTMP and RTMPS connections are becoming unstable and practically usable.

Is it intended that these protocols will be maintained?

@ajwfrost
Copy link
Collaborator

@busitech apologies for that! This one had completely fallen off our radar I'm afraid, do please feel free to chase up on things. From our perspective we would have hoped that these protocols would continue to work as-is, so if there are changes in the OS that are impacting them, then I'd hope we can work around them. The other option we'd be looking at is to pull the RTMP-related functionality out of the private Adobe source code so that it could be implemented in a separate, open source library (and kept up to date with what Veovera are doing).

Anyway -> let me kick this activity off again and try to get you something that outputs more details around this area...

thanks

@ajwfrost
Copy link
Collaborator

Hi @busitech

Would you be able to download a test version of the AIR runtime from the below link, which has a fair amount of debug trace added around the RTMP/RTMPS functionality. This may not immediately help, but it should show us whereabouts things are going wrong so that we can add some more targeted debugging in the specific area.
https://transfer.harman.com/message/3XvxIKiDFG0LJkNN3lp2K0

thanks

@busitech
Copy link
Author

Hello @ajwfrost

Thank you for sending the test version of the AIR runtime. It produced the debug trace log shown below:

[0:02.937] Starting mac network detector on thread 0x600002358180
[0:21.666] NetConnection - connection request for rtmps://rtmpserver:2443
[0:21.666] Connection: local 0, protocol 4, domain rtmpserver, app name 
[0:21.666] NetConnectionIO -> connect rtmpserver:2443, protocol 4
[0:21.666] Proxy type is best
[0:21.666] Broadcast fpad on port 67
[0:21.666] Net connection succeeded
[0:21.668] NetConnection IO thread 0x6000023c6140 server rtmpserver port 2443 protocol 4
[0:21.668] NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
[0:21.668] OSX SSL socket connecting(rtmpserver, 2443, 0)
[0:21.698] Sending SWF hash 0 0x3bdb44f79db0
[0:21.721] OSX SSL socket initating SSL(rtmpserver)
[0:21.743] Sending SWF hash 0 0x3bdb44f79db0
[0:21.785] Sending SWF hash 0 0x3bdb44f79db0
[0:21.826] Sending SWF hash 0 0x3bdb44f79db0
[0:21.827] OSX SSL socket - SSL connection results 8
[0:21.827] NetConnectionIO::DoConnect rtmps protocol, nagle 0, native SSL(1), connect failed.

[0:21.827] SSL result = (null)
[0:21.827] NetConnection IO - connecting with RTMPS, native 1 (1), try connect 1 (1)
[0:21.827] NetConnectionIO::DoConnect rtmps protocol, CONNECT method proxy, tunnel open failed.

[0:21.827] SSL result = (null)
[0:21.827] NetConnection IO - connecting with RTMPS, native 0 (1), try connect 0 (1)
[0:21.869] Sending SWF hash 0 0x3bdb44f79db0
[0:21.909] Sending SWF hash 0 0x3bdb44f79db0
[0:21.952] Sending SWF hash 0 0x3bdb44f79db0
[0:21.993] Sending SWF hash 0 0x3bdb44f79db0
[0:22.031] NetConnection IO connected=0
[0:22.031] Sending SWF hash 0 0x3bdb44f79db0
[0:36.702] NetConnection - connection request for rtmps://rtmpserver:2443
[0:36.702] Connection: local 0, protocol 4, domain rtmpserver, app name 
[0:36.702] NetConnectionIO -> connect rtmpserver:2443, protocol 4
[0:36.702] Proxy type is http
[0:36.702] Broadcast fpad on port 67
[0:36.702] Net connection succeeded
[0:36.702] NetConnection IO thread 0x6000023c7340 server rtmpserver port 2443 protocol 4
[0:36.703] NetConnection IO - connecting with RTMPS, native 0 (1), try connect 0 (1)
[0:36.742] Sending SWF hash 0 0x3bdb44f79db0
[0:36.783] Sending SWF hash 0 0x3bdb44f79db0
[0:36.827] Sending SWF hash 0 0x3bdb44f79db0
[0:36.869] Sending SWF hash 0 0x3bdb44f79db0
[0:36.869] NetConnection IO connected=0
[0:36.909] Sending SWF hash 0 0x3bdb44f79db0
[0:36.911] [FaultEvent fault=[RPC Fault faultString="Send failed" faultCode="Client.Error.MessageSend" faultDetail="Channel.Connect.Failed error undefined url:'rtmps://rtmpserver:2443'"] messageId="89FCA69C-730E-CDF9-0343-8A7A915E0979" type="fault" bubbles=false cancelable=true eventPhase=2]

@ajwfrost
Copy link
Collaborator

Thanks for that. So it looks like we're first trying to connect using "native" SSL within the RTMP handling code, but that this fails:
OSX SSL socket - SSL connection results 8
So then we try to connect via HTTPS, we've not go so much logging in that part but I suspect it's the same issue as caused a problem for the RTMPS.

Result 8 is a self-signed certificate... we did change some handling here a while ago for people to be able to decide for themselves whether to accept self-signed certs. Looking at the code here though, it originally accepted this (within the RTMPS handling) and we have the same override. So actually, with an SSL connection result of 8, we should have accepted that as connected...

Although I think I've also just spotted a separate bit of code that has picked up that response value and causing the socket to shut down...

We can review this part again and see if there's an issue with the connection being killed off due to the self-signed certificate on the server.

thanks

@busitech
Copy link
Author

@ajwfrost We won't be able to work on the problem this issue was opened to solve until this new problem of not being able to connect at all in 50.2.4.0 is resolved. We opened this issue when 50.2.1.1 was still current, and that version would connect just fine but then the connection would soon go down randomly after a few minutes. We are eager to work on that issue, but this new issue in the later build will need to come first since it prevents connection. I might be able to set up a server with a real certificate and possibly get a head start on that issue, if that allows 50.2.4.0 to connect. If it would be easy for you to send a debug version of 50.2.1.1, that would also work. Thank you!

@ajwfrost
Copy link
Collaborator

Okay thanks, understood - I did wonder at this because you'd mentioned it was a more sporadic issue and had been around for a while, sorry I'd missed the fact that we'd just made it worse and immediately reproducible!

I'll upload an updated version of the latest runtime that allows the SSL connection, and we can then start delving into why it may be dropping after a while.

thanks

@ajwfrost
Copy link
Collaborator

Hi again
Here's a version that should work with self-signed certificates on the server.. at least on macOS. Looking at it, I think the same thing may now be happening on Windows too, we can correct for that in our next proper release though.
thanks
https://transfer.harman.com/message/5HdEE9yYMHhk8JaZYpMUSf

@busitech
Copy link
Author

@ajwfrost It looks like no progress was made with today's version. Nevertheless, thank you for giving it a shot so quickly.

[0:05.424] Starting mac network detector on thread 0x600003c74300
[0:11.003] NetConnection - connection request for rtmps://localhost:2443
[0:11.003] Connection: local 0, protocol 4, domain localhost, app name 
[0:11.003] NetConnectionIO -> connect localhost:2443, protocol 4
[0:11.003] Proxy type is best
[0:11.003] Broadcast fpad on port 67
[0:11.003] Net connection succeeded
[0:11.006] NetConnection IO thread 0x7000020a5000 server localhost port 2443 protocol 4
[0:11.006] NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
[0:11.006] OSX SSL socket connecting(localhost, 2443, 0)
[0:11.006] OSX SSL socket initating SSL(localhost)
[0:11.039] Sending SWF hash 0 0x21bf1b101db0
[0:11.081] Sending SWF hash 0 0x21bf1b101db0
[0:11.081] OSX SSL socket - SSL connection results 8
[0:11.081] OSX SSL socket - CLOSING
[0:11.081] TCP connecting via SSL -> 8
[0:11.081] NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

[0:11.081] NetConnection IO connected=1
[0:11.081] NetConnection IO - starting read thread 0x700002128000
[0:11.082] NetConnection IO - bytes written = 1, message 0x0
[0:11.121] Sending SWF hash 0 0x21bf1b101db0
[0:26.042] NetConnection - connection request for rtmps://localhost:2443
[0:26.042] Connection: local 0, protocol 4, domain localhost, app name 
[0:26.042] NetConnectionIO -> connect localhost:2443, protocol 4
[0:26.042] Proxy type is http
[0:26.042] Broadcast fpad on port 67
[0:26.042] Net connection succeeded
[0:26.043] NetConnection IO thread 0x7000020a5000 server localhost port 2443 protocol 4
[0:26.043] NetConnection IO - connecting with RTMPS, native 0 (1), try connect 0 (1)
[0:26.080] Sending SWF hash 0 0x21bf1b101db0
[0:26.121] NetConnection IO connected=0
[0:26.121] Sending SWF hash 0 0x21bf1b101db0
[0:26.121] [ChannelFaultEvent faultCode="Channel.Connect.Failed" faultString="error" faultDetail="undefined url:'rtmps://localhost:2443'" channelId="my-rtmps" type="channelFault" bubbles=false cancelable=false eventPhase=2]

@busitech
Copy link
Author

In the mean time, I have installed a CA-signed certificate on a server, successfully connected, and was able to reproduce another issue, but I'm not sure if it is a new issue, or our original main issue.

What I found easy to reproduce today is an infinite loop and hang when a debugger is connected, such as IntelliJ IDEA, and RemoteObject calls are made. I reproduced this issue when connecting with both RTMPS and RTMP using the AIR Runtime v50.2.4.0. Previously, when using the AIR Runtime v50.2.1.1, we were able to revert to RTMP to avoid our main issue. This makes me wonder if this could be a new issue, or just a really easy way to get the same issue to happen, that now affects both network protocols.

In any event, this one is extremely consistent, I can repro in two minutes.

I hope what is below helps you get closer to the root cause, and we can test further as needed.

After RTMP or RTMPS connection, I see this line in the console being logged continuously, even though no network traffic is occurring (the message implies sending). You will notice how these messages are milliseconds apart:

[0:42.666] Sending SWF hash 0 0x19403743e348
[0:42.708] Sending SWF hash 0 0x19403743e348
[0:42.751] Sending SWF hash 0 0x19403743e348
[0:42.793] Sending SWF hash 0 0x19403743e348
[0:42.836] Sending SWF hash 0 0x19403743e348
[0:42.877] Sending SWF hash 0 0x19403743e348

After one or two successful server RemoteObject calls, all debug messages stop being logged, and CPU utilization goes sky high on the adl process. There is apparently an infinite loop in progress now, inside the runtime. The server has just finished responding to a request from the client, and the client received the message, processed it, and updated the UI. The bug is sensitive to timing, because the same number of tries is not always

The debugging functions are stopped or hung now, and the app is totally malfunctioning.

While in this state, the application is still able to send and receive Command messages (subscribe, unsubscribe) in response to certain UI events, and the server acknowledges, and even one remoting call from every remote object will send, but incoming remote object messages seem to be going into a queue that is no longer processing what is being added to the queue.

When the mouse hovers over the grid control, the background color change will follow the mouse as it usually does, so the UI itself is not frozen. The app seems hung with respect to incoming messages. Any UI events which normally fire more outbound network traffic will do so, and the server even responds, as long as the object that sends the message does not have an outstanding request. Once a given remote object has sent a message that expects a response, the response is received from the server, but since the message never gets processed, that remote object becomes hung too.

The only thing the debugger can do now is kill the app. Pause doesn't work, and of course breakpoints don't work either.

I am having trouble reproducing this if I start adl from the command line, but since that one trace message is constantly being logged, the logging fills up Scout's memory over time, so we don't have a very good shot at using Scout over a long period of time. I turned off some of the unnecessary recording just to get more time, and I'm going to leave a session run overnight and see what happens.

Here are the last trace messages that are seen in the console:

[trace] Sending SWF hash 0 0x3da12ed4c348 <---- 1,000's of these messages above....
[trace] Sending SWF hash 0 0x3da12ed4c348
[trace] Sending SWF hash 0 0x3da12ed4c348
[trace] Sending SWF hash 0 0x3da12ed4c348
[trace] Sending SWF hash 0 0x3da12ed4c348
[trace] Sending SWF hash 0 0x3da12ed4c348
[trace] NetConnection call via data stream 512 bytes
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection IO - bytes written = 1, message 0x6000017f4980
[trace] NetConnection IO - read 4096 bytes
[trace] NetConnection IO - read 4096 bytes
[trace] NetConnection IO - read 2350 bytes
[trace] NetConnection call via data stream 512 bytes
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection IO - bytes written = 1, message 0x60000178f340
[trace] NetConnection IO - read 156 bytes
[trace] NetConnection call via data stream 512 bytes
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection IO - bytes written = 1, message 0x0
[trace] NetConnection call via data stream 512 bytes <----This is the last message that will be logged.

@ajwfrost
Copy link
Collaborator

Thanks for the quick feedback!
An updated version again: https://transfer.harman.com/message/lBH7PdBBpGHoRiiCkYvAim
This one should hopefully correct the RTMPS ssl set-up... apologies for not quite getting that right last time!

The "Sending SWF hash" is actually not sending a hash (the '0' value stops this) so I've moved that so it's only reported if we do actually send a hash. This forms part of the 'idle cycle' i.e. the stuff we check within the runtime's main thread loop which also drives the AS3 events and animation cycle, hence it appearing quite a lot!

The issues you're describing sound like something is stuck within the 'read' thread, i.e. we are receiving data, maybe adding it to a queue but not processing that, and maybe then something is in a tight loop...

Would it be possible to enable core dumps on your machine, and then whilst it's hanging, kill the process from a terminal? you should be able to click on "reply" on the above message link to be able to then upload a dump file to return so that we can check the call stacks...

thanks

@busitech
Copy link
Author

You have RTMPS working with this update, thank you! I have a full core dump saved (9.5GB), and I have sent you the backtrace of all threads, by replying to the message. Please let me know if you need more information.

@busitech
Copy link
Author

@ajwfrost I believe we have the intermittent issue recreated, and the log messages captured have explained more about what is going on under the covers.

The NetConnection logs an "Address changed" event, but the address being connected to during our test was localhost, and this address never changes.

[57:53.356] NetConnection IO - processing read buffer, message 0x6000009bdc40, remaining 0
[57:53.356] NetConnection IO - message id 17, stream 0
[58:09.221] Address changed (thread %p)    <---- Impossible for localhost !!!
[58:09.221] NetConnection IO - read thread is quitting
[58:09.276] NetConnection - connection request for rtmps://localhost:2443  <---- reconnecting to localhost
[58:09.276] Connection: local 0, protocol 4, domain localhost, app name 

In the log, we also notice that the memory address of the NetConnection IO thread and the address of the next "starting read thread" are always the same values each time the same messages are logged, even though there is also a message indicating that the read thread is quitting. I'm not sure if this is interesting or not, but I thought I'd mention it.

@ajwfrost
Copy link
Collaborator

Hi

Thanks for checking this, apologies for the delay due to the Christmas/New Year period! With that hint about the "address changed", it seems a little odd, because the socket should only be reset if we're no longer able to pick up the remote connection. The callback comes from the OS itself but we should ignore this as long as we're still active - and there's even code there that checks if it's a local/loopback address, in which case it should definitely then continue!

So, a little more debug needed I'm afraid .. below is an updated version with some additional debugging around this area.
https://transfer.harman.com/message/vbUOL53Iup1t2QhmXBg1zh

many thanks

Andrew

@busitech
Copy link
Author

Andrew, I have your updated debugging information:

NetConnection IO - processing read buffer, message 0x6000020aca00, remaining 0
NetConnection IO - message id 17, stream 0
Address changed (thread %p)
NetConnection IO - NetworkChanged callback
********************* NETWORK CHANGED tSocketIO: a81ab600******************************

** tcpAddress: [::1]:51172 working: 0

Send network change AS3 event
NetConnection IO - read thread is quitting
NetConnection - connection request for rtmps://localhost:2443
Connection: local 0, protocol 4, domain localhost, app name 
NetConnectionIO -> connect localhost:2443, protocol 4
Proxy type is best
Broadcast fpad on port 67
Net connection succeeded
NetConnection IO thread 0x700002136000 server localhost port 2443 protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
OSX SSL socket connecting(localhost, 2443, 0)
OSX SSL socket initating SSL(localhost)
OSX SSL socket - SSL connection results 8
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

NetConnection IO connected=1
NetConnection IO - starting read thread 0x7000022bf000
NetConnection IO - bytes written = 1, message 0x0
NetConnection IO - read 3073 bytes

I wanted to clarify that the issue happens to both localhost connections and connections to external addresses, in case there is any doubt about that.

Also, just a reminder that we also shared the stack trace which explains how the runtime crashes when the debugger is attached, which is a separate issue. We have to run the adl command from the command line to test with the later versions of the runtime. (Attaching IDEA causes a hang within a few seconds of running the program.)

Thank you.

@ajwfrost
Copy link
Collaborator

Thanks for this.

So the bit where it says working: 0 is concerning, but then we're not seeing the subsequent traces that we had put in where it's looking to see whether or not to close the connection based on whether the remote address is still accessible. So I'm wondering if something else is blocking it or disconnecting it .. do you have any AS3 handler for the NetConnection.Connect.NetworkChange event?

We've added a bit more checking of the network situation here and trying to find out exactly why it's dropping out of the loop. And we're not sending out that AS3 event now.

Sorry to ask for another attempt, thanks for your time on this so far!
https://transfer.harman.com/message/1i4nhN1oPhxGdKxLPga7qN

The crash with debugger we will look at shortly, that may well just be because of changes we've added in for this debugging session..

thanks

@ajwfrost
Copy link
Collaborator

Hi @busitech - just wondering if you'd been able to capture some logs of this going wrong? We've done some testing here, the network changed event happens e.g. if we add a VPN connection, but the original network path still shows as valid (i.e. we've not been able to get it to show working: 0)

It may be an IPv4 vs IPv6 thing, but if you have the logs it would be handy, I would hope this is the final one needed! and I was then hoping we could get a fix in for our next release, which I've been postponing slightly in case we can squeeze this in!

thanks

@busitech
Copy link
Author

Hi @ajwfrost we went to work immediately on the last release you provided, and we are wondering if one of the changes you made could have resolved the issue, because we haven't seen the issue again since installing that last debug version you provided.

@ajwfrost
Copy link
Collaborator

Interesting! I will double-check the diffs, but my recollection was that the only functional change we made there was to swallow that NetConnection.Connect.NetworkChange event. So if you're able to look if there's a handler added for the NetStatus event to do something there, it may be good.

It would still be useful if we could see the log around that "NETWORK CHANGED" message, so that we can double-check how the IPv6 addresses are being handled - I still can't quite see how we would have ended up with working: 0 in the log, which happens prior to that network change as3 event... and I feel is probably the key to the whole thing!

thanks

@busitech
Copy link
Author

Not only are we not seeing disconnects in this version of the debug runtime, we also haven't seen the messages you're looking for either, which is why we've had nothing to send your way. The interesting log entries are not showing up now, but we're watching for them nonetheless, and will send one if we see it. You might want to send a debug version with more problems and less changes that are acting like fixes, because it is like we've lost our ability to recreate the issue.

@busitech
Copy link
Author

Hello @ajwfrost ,

We have some log data for you!

[trace] ********************* NETWORK CHANGED tSocketIO: 3e194c00******************************
[trace] MacNetworkDetector: checking if 0x7fb43e194df8 is a working local interface
[trace] IPv6 - checking address is set 1
[trace] IPv6 - family is 30 (INET 2, INET6 30)
[trace] IP string is [::1]
[trace] Storage family is %d
[trace] Comparing [fd7a:bcc9:1c03:0:0:0:0:2] with [::1] (ipv6)
[trace] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[trace] Returning 0 for 'valid' from 'is working local interface'
[trace] ** tcpAddress: [::1]:64762 working: 0
[trace] DON'T Send network change AS3 event
[trace] Not shutdown and not using HTTP
[trace] Not using RTMFP
[trace] Not RTMFP, we have a remote address, local = 1
[trace] Finished network changed handler
[trace] Address changed (thread %p)
[trace] NetConnection IO - NetworkChanged callback
[trace] Checking remote address of our TCP socket...
[trace] Got a remote address
[trace] Set IP address: loopback 1, link local 0
[trace] ********************* NETWORK CHANGED tSocketIO: 3e194c00******************************
[trace] MacNetworkDetector: checking if 0x7fb43e194df8 is a working local interface
[trace] IPv6 - checking address is set 1
[trace] IPv6 - family is 30 (INET 2, INET6 30)
[trace] IP string is [::1]
[trace] Storage family is %d
[trace] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[trace] Returning 0 for 'valid' from 'is working local interface'
[trace] ** tcpAddress: [::1]:64762 working: 0
[trace] DON'T Send network change AS3 event
[trace] Not shutdown and not using HTTP
[trace] Not using RTMFP
[trace] Not RTMFP, we have a remote address, local = 1
[trace] Finished network changed handler
[trace] Address changed (thread %p)
[trace] NetConnection IO - NetworkChanged callback
[trace] Checking remote address of our TCP socket...
[trace] Got a remote address
[trace] Set IP address: loopback 1, link local 0
[trace] ********************* NETWORK CHANGED tSocketIO: 3e194c00******************************
[trace] MacNetworkDetector: checking if 0x7fb43e194df8 is a working local interface
[trace] IPv6 - checking address is set 1
[trace] IPv6 - family is 30 (INET 2, INET6 30)
[trace] IP string is [::1]
[trace] Storage family is %d
[trace] Comparing [fd93:2ede:9e4f:0:0:0:0:2] with [::1] (ipv6)
[trace] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[trace] Returning 0 for 'valid' from 'is working local interface'
[trace] ** tcpAddress: [::1]:64762 working: 0
[trace] DON'T Send network change AS3 event
[trace] Not shutdown and not using HTTP
[trace] Not using RTMFP
[trace] Not RTMFP, we have a remote address, local = 1
[trace] Finished network changed handler
[trace] Address changed (thread %p)
[trace] NetConnection IO - NetworkChanged callback
[trace] Checking remote address of our TCP socket...
[trace] Got a remote address
[trace] Set IP address: loopback 1, link local 0
[trace] ********************* NETWORK CHANGED tSocketIO: 3e194c00******************************
[trace] MacNetworkDetector: checking if 0x7fb43e194df8 is a working local interface
[trace] IPv6 - checking address is set 1
[trace] IPv6 - family is 30 (INET 2, INET6 30)
[trace] IP string is [::1]
[trace] Storage family is %d
[trace] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[trace] Returning 0 for 'valid' from 'is working local interface'
[trace] ** tcpAddress: [::1]:64762 working: 0
[trace] DON'T Send network change AS3 event
[trace] Not shutdown and not using HTTP
[trace] Not using RTMFP
[trace] Not RTMFP, we have a remote address, local = 1
[trace] Finished network changed handler

@busitech
Copy link
Author

Good evening @ajwfrost

Here are a couple more with the timestamps enabled. We're unsure why the problem seems to be back.

[1:19.115] NetConnection IO - read 159 bytes
[1:19.115] NetConnection IO - processing read buffer, message 0x6000029c9400, remaining 0
[1:19.115] NetConnection IO - message id 17, stream 0
[1:19.115] Read thread is looping
[2:12.712] Address changed (thread %p)
[2:12.712] NetConnection IO - NetworkChanged callback
[2:12.712] Checking remote address of our TCP socket...
[2:12.712] Got a remote address
[2:12.712] Set IP address: loopback 1, link local 0
[2:12.712] ********************* NETWORK CHANGED tSocketIO: 428faa00******************************

[2:12.712] MacNetworkDetector: checking if 0x7fe8428fabf8 is a working local interface
[2:12.712] IPv6 - checking address is set 1
[2:12.712] IPv6 - family is 30 (INET 2, INET6 30)
[2:12.712] IP string is [::1]
[2:12.712] Storage family is %d
[2:12.712] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[2:12.712] Comparing [fd1f:c3e5:bb49:0:0:0:0:2] with [::1] (ipv6)
[2:12.712] Returning 0 for 'valid' from 'is working local interface'
[2:12.712] ** tcpAddress: [::1]:55554 working: 0

[2:12.712] DON'T Send network change AS3 event
[2:12.712] Not shutdown and not using HTTP
[2:12.712] Not using RTMFP
[2:12.712] Not RTMFP, we have a remote address, local = 1
[2:12.712] Finished network changed handler
[2:42.752] Address changed (thread %p)
[2:42.752] NetConnection IO - NetworkChanged callback
[2:42.752] Checking remote address of our TCP socket...
[2:42.752] Got a remote address
[2:42.752] Set IP address: loopback 1, link local 0
[2:42.752] ********************* NETWORK CHANGED tSocketIO: 428faa00******************************

[2:42.752] MacNetworkDetector: checking if 0x7fe8428fabf8 is a working local interface
[2:42.752] IPv6 - checking address is set 1
[2:42.752] IPv6 - family is 30 (INET 2, INET6 30)
[2:42.752] IP string is [::1]
[2:42.752] Storage family is %d
[2:42.752] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[2:42.752] Returning 0 for 'valid' from 'is working local interface'
[2:42.752] ** tcpAddress: [::1]:55554 working: 0

[2:42.752] DON'T Send network change AS3 event
[2:42.752] Not shutdown and not using HTTP
[2:42.752] Not using RTMFP
[2:42.752] Not RTMFP, we have a remote address, local = 1
[2:42.752] Finished network changed handler

@ajwfrost
Copy link
Collaborator

Thanks for those! It seems fairly certain that this code isn't working how we would want it to:

[2:12.712] MacNetworkDetector: checking if 0x7fe8428fabf8 is a working local interface
[2:12.712] IPv6 - family is 30 (INET 2, INET6 30)
[2:12.712] IP string is [::1]
[2:12.712] Comparing [fe80:0:0:0:c49:f376:b658:2c53] with [::1] (ipv6)
[2:12.712] Comparing [fd1f:c3e5:bb49:0:0:0:0:2] with [::1] (ipv6)
[2:12.712] Returning 0 for 'valid' from 'is working local interface'
[2:12.712] ** tcpAddress: [::1]:55554 working: 0

This is handling some change in the local network conditions and should be giving us the IP addresses of the available network interfaces, and then checking that the one we were using for our rtmp socket is still avaialble.

The problem seems to be that our existing socket has a local address of ::1.. although potentially, that is correct and it's the list that is missing this value, judging by the result of ifconfig on a mac here..

Anyway - let us check on this aspect, I think we should have enough details to find/fix it now! Many thanks for your help with the debugging :-)

@busitech
Copy link
Author

Do you need us to send logs for connections to external hosts as well? We are only debugging with the loopback interface because it is convenient, but do not want to limit the scope of the debugging to only loopback, which is not used in the real world...

@ajwfrost
Copy link
Collaborator

I think at the moment we're fine thanks - a quick test here shows that we appear to not pick up all of the inet6 addresses on macOS when we query the network adapters.. I suspect the problem is the same regardless of where the remote end of the socket is connected.

@ajwfrost
Copy link
Collaborator

Having said that of course .. would it be possible to get the results of the below command run on the client Mac machine please?

 ifconfig | grep inet6

It looks like this may come down to a 'feature' in AIR that we don't provide the IP details for link-local network addresses. So it may be that you have something like "::1%lo0" rather than just "::1" which might explain why ::1 wasn't in that list...

And when you talk about external hosts, are these external to your company, or would they still be local network addresses?

thanks

@busitech
Copy link
Author

Yes, we have the issue connecting outside our company to remote servers as well as those inside, and those on our local machines.

inet6 ::1 prefixlen 128 
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 
inet6 fe80::c49:f376:b658:2c53%en0 prefixlen 64 secured scopeid 0x4 
inet6 fe80::aede:48ff:fe00:1122%en5 prefixlen 64 scopeid 0x5 
inet6 fe80::347d:daff:fed1:ca06%ap1 prefixlen 64 scopeid 0x8 
inet6 fe80::cf2:39da:75b9:1cb6%en1 prefixlen 64 secured scopeid 0x9 
inet6 fe80::7c45:23ff:fe2b:46a5%awdl0 prefixlen 64 scopeid 0xd 
inet6 fe80::7c45:23ff:fe2b:46a5%llw0 prefixlen 64 scopeid 0xe 
inet6 fe80::7b89:7c51:949f:dc2d%utun0 prefixlen 64 scopeid 0xf 
inet6 fe80::aeb8:aee2:d11f:4299%utun1 prefixlen 64 scopeid 0x10 
inet6 fe80::17d8:e68f:5fdd:3268%utun2 prefixlen 64 scopeid 0x11 
inet6 fe80::ce81:b1c:bd2c:69e%utun3 prefixlen 64 scopeid 0x12 

@ajwfrost
Copy link
Collaborator

Thanks for all this. So, we've found the problem, although I won't pretend that we understand why it's failing!

For this check following a network changed event, we are using a macOS method SCDynamicStoreKeyCreateNetworkServiceEntity with a regular expression which should return all of the addresses in the system ... but this is missing some. Looking at scutil this also misses out a number of them.. documentation here is a little sparse sadly.

So -> we can switch to using a lower-level method to pick up the network addresses, which seems to work in our test app (using the unix getifaddrs method). This lists all of the same network interfaces and addresses that we see via ifconfig, and for sure returns the ::1 address that was missing in your case which caused the disconnect.

Unfortunately not quite in time for the release which is about to come out, but we'll follow up soon with a macOS update I think as there is another mac-specific fix being investigated currently.. but once the code is updated, I can upload a fix candidate perhaps..

thanks

@busitech
Copy link
Author

Are you able to see a pathway by which the problem identified would affect connections to hosts that are not local? If not we will need to start looking for another issue which affects remote servers.

@ajwfrost
Copy link
Collaborator

Yes .. essentially, the code that's meant to work out whether the network interface being used for the RTMP connection is still available, uses a function to check what local interfaces are still present/available. It may work okay for IPv4 but for IPv6 for some reason the query is failing to retrieve the full set of interfaces.

So any connection, regardless of the actual remote/server location, is likely to be impacted by this. The initial connection works fine, it's just whenever macOS detects any network change, the AIR code then mistakenly thinks that the connection will be cut and so disconnects and reconnects.

thanks

@busitech
Copy link
Author

That is awesome, thank you very much.

@ajwfrost
Copy link
Collaborator

Hi

So a fix proposal is below..
https://transfer.harman.com/message/CTSuV9LUDdtXuPKBZCsQbg

This is still outputting a lot of logging for us, just to check! If you're able to capture the logs (actually a lot of NSLog calls too so if you are able to run your app from the command-line, these should come out..) - we're looking to see the "NETWORK CHANGED" trace come out, and it's then the new code handling around that which we need to exercise.

I'm hopeful that the connection will just continue and this error will then go away...

thanks

@busitech
Copy link
Author

In the latest version, the channel still becomes disconnected.

Starting mac network detector on thread 0x600000a840c0
NetConnectionIO -> connect localhost:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

NetConnection IO connected=1
Found a network interface: lo0 - address count 3
  IPv4 network address 127.0.0.1

  IPv6 network address ::1
  IPv6 network address fe80::1%lo0
Found a network interface: gif0 - address count 0
Found a network interface: stf0 - address count 0
Found a network interface: en0 - address count 2
  IPv6 network address fe80::4d3:8b8c:1f21:220a%en0
  IPv4 network address 192.168.101.38

Found a network interface: en5 - address count 1
  IPv6 network address fe80::aede:48ff:fe00:1122%en5
Found a network interface: ap1 - address count 0
Found a network interface: en1 - address count 0
Found a network interface: awdl0 - address count 0
Found a network interface: llw0 - address count 0
Found a network interface: en2 - address count 0
Found a network interface: en3 - address count 0
Found a network interface: bridge0 - address count 0
Found a network interface: utun0 - address count 1
  IPv6 network address fe80::c723:7c5a:d1d8:6f03%utun0
Found a network interface: utun1 - address count 1
  IPv6 network address fe80::98d2:3e3d:7d3b:ee6f%utun1
Found a network interface: utun2 - address count 1
  IPv6 network address fe80::cd1d:b678:8bf0:eeeb%utun2
Found a network interface: utun3 - address count 1
  IPv6 network address fe80::ce81:b1c:bd2c:69e%utun3
Found a network interface: utun4 - address count 2
  IPv6 network address fe80::3e7d:aff:fe09:8137%utun4
  IPv6 network address fd4d:d5f7:14c::2
Found a network interface: en4 - address count 0
Found a network interface: en6 - address count 2
  IPv6 network address fe80::1814:e1cb:5c13:93dd%en6
  IPv4 network address 169.254.109.31

Found a network interface: en7 - address count 1
  IPv6 network address fe80::985f:98ff:fe6b:3f1f%en7
Address changed (thread %p)
NetConnection IO - NetworkChanged callback
Checking remote address of our TCP socket...
Got a remote address
Set IP address: loopback 1, link local 0
********************* NETWORK CHANGED tSocketIO: 768d2400******************************

MacNetworkDetector: checking if 0x7fbe768d25f8 is a working local interface
IPv6 - checking address is set 1
IPv6 - family is 30 (INET 2, INET6 30)
IP string is [::1]
Storage family is 30
Comparing [fe80:0:0:0:4d3:8b8c:1f21:220a] with [::1] (ipv6)
Comparing [fe80:0:0:0:aede:48ff:fe00:1122] with [::1] (ipv6)
Comparing [fe80:0:0:0:c723:7c5a:d1d8:6f03] with [::1] (ipv6)
Comparing [fe80:0:0:0:98d2:3e3d:7d3b:ee6f] with [::1] (ipv6)
Comparing [fe80:0:0:0:cd1d:b678:8bf0:eeeb] with [::1] (ipv6)
Comparing [fe80:0:0:0:ce81:b1c:bd2c:69e] with [::1] (ipv6)
Comparing [fe80:0:0:0:3e7d:aff:fe09:8137] with [::1] (ipv6)
Comparing [fd4d:d5f7:14c:0:0:0:0:2] with [::1] (ipv6)
Returning 0 for 'valid' from 'is working local interface'
** tcpAddress: [::1]:50540 working: 0

Send network change AS3 event - updated to send
NetConnection IO - read thread is quitting
Finished network changed handler
NetConnectionIO -> connect localhost:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

NetConnection IO connected=1
Address changed (thread %p)
NetConnection IO - NetworkChanged callback
Checking remote address of our TCP socket...
Got a remote address
Set IP address: loopback 1, link local 0
********************* NETWORK CHANGED tSocketIO: 8b00a000******************************

MacNetworkDetector: checking if 0x7fbe8b00a1f8 is a working local interface
IPv6 - checking address is set 1
IPv6 - family is 30 (INET 2, INET6 30)
IP string is [::1]
Storage family is 30
Comparing [fe80:0:0:0:4d3:8b8c:1f21:220a] with [::1] (ipv6)
Comparing [fe80:0:0:0:aede:48ff:fe00:1122] with [::1] (ipv6)
Comparing [fe80:0:0:0:c723:7c5a:d1d8:6f03] with [::1] (ipv6)
Comparing [fe80:0:0:0:98d2:3e3d:7d3b:ee6f] with [::1] (ipv6)
Comparing [fe80:0:0:0:cd1d:b678:8bf0:eeeb] with [::1] (ipv6)
Comparing [fe80:0:0:0:ce81:b1c:bd2c:69e] with [::1] (ipv6)
Returning 0 for 'valid' from 'is working local interface'
** tcpAddress: [::1]:52312 working: 0

Send network change AS3 event - updated to send
NetConnection IO - read thread is quitting
Finished network changed handler
NetConnectionIO -> connect localhost:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

NetConnection IO connected=1
Address changed (thread %p)
NetConnection IO - NetworkChanged callback
Checking remote address of our TCP socket...
Got a remote address
Set IP address: loopback 1, link local 0
********************* NETWORK CHANGED tSocketIO: 8d00e800******************************

MacNetworkDetector: checking if 0x7fbe8d00e9f8 is a working local interface
IPv6 - checking address is set 1
IPv6 - family is 30 (INET 2, INET6 30)
IP string is [::1]
Storage family is 30
Comparing [fe80:0:0:0:4d3:8b8c:1f21:220a] with [::1] (ipv6)
Comparing [fe80:0:0:0:aede:48ff:fe00:1122] with [::1] (ipv6)
Comparing [fe80:0:0:0:c723:7c5a:d1d8:6f03] with [::1] (ipv6)
Comparing [fe80:0:0:0:98d2:3e3d:7d3b:ee6f] with [::1] (ipv6)
Comparing [fe80:0:0:0:cd1d:b678:8bf0:eeeb] with [::1] (ipv6)
Comparing [fe80:0:0:0:ce81:b1c:bd2c:69e] with [::1] (ipv6)
Comparing [fe80:0:0:0:3e7d:aff:fe09:8137] with [::1] (ipv6)
Comparing [fd5c:222a:123d:0:0:0:0:2] with [::1] (ipv6)
Returning 0 for 'valid' from 'is working local interface'
** tcpAddress: [::1]:52325 working: 0

Send network change AS3 event - updated to send
NetConnection IO - read thread is quitting
Finished network changed handler
NetConnectionIO -> connect localhost:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

NetConnection IO connected=1
Address changed (thread %p)
NetConnection IO - NetworkChanged callback
Checking remote address of our TCP socket...
Got a remote address
Set IP address: loopback 1, link local 0
********************* NETWORK CHANGED tSocketIO: 9519ca00******************************

MacNetworkDetector: checking if 0x7fbe9519cbf8 is a working local interface
IPv6 - checking address is set 1
IPv6 - family is 30 (INET 2, INET6 30)
IP string is [::1]
Storage family is 30
Comparing [fe80:0:0:0:4d3:8b8c:1f21:220a] with [::1] (ipv6)
Comparing [fe80:0:0:0:aede:48ff:fe00:1122] with [::1] (ipv6)
Comparing [fe80:0:0:0:c723:7c5a:d1d8:6f03] with [::1] (ipv6)
Comparing [fe80:0:0:0:98d2:3e3d:7d3b:ee6f] with [::1] (ipv6)
Comparing [fe80:0:0:0:cd1d:b678:8bf0:eeeb] with [::1] (ipv6)
Comparing [fe80:0:0:0:ce81:b1c:bd2c:69e] with [::1] (ipv6)
Returning 0 for 'valid' from 'is working local interface'
** tcpAddress: [::1]:52346 working: 0

Send network change AS3 event - updated to send
NetConnection IO - read thread is quitting
Finished network changed handler
NetConnectionIO -> connect localhost:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.

NetConnection IO connected=1

@ajwfrost
Copy link
Collaborator

Hmm... so that looks like it's hitting our new code, although the NSLog output isn't coming out there. The only thing from looking through it is to see whether the ::1 network adapter has gone down .. is that a possibility? would you be able to put a monitor onto the local network interface to see whether it drops? Not sure if you open Settings -> Network whether you can just check whether the green light periodically flickers, or there are suggestions in the below link:
https://superuser.com/questions/203614/how-to-monitor-internet-connection-for-interruptions-for-mac-os-x

To check the actual network interfaces that you have, can you please try running the below command-line tool? Firstly just whenever you're about to start the AIR app, and then perhaps just after it's failed? If you run this from a terminal window, you should see the log listing all the interfaces; if "::1" isn't in the final list at the end, then please check the earlier output log to see what it says re the "lo0" interface..
nwtest.zip

thanks

@busitech
Copy link
Author

Your utility was not built for our Intel CPUs, so it does not run here.

We have seen dozens of disconnects while simultaneously sending ICMP packets to the localhost interface every 500ms with zero packets lost. Granted, 500ms is "forever" from the perspective of an operating system kernel, but at least we have proven that nothing humanly perceptible is happening on the interface.

@ajwfrost
Copy link
Collaborator

Hi

Apologies, I hadn't spotted that I needed to switch the build type there...

But I think that actually, this may not have much value, given you're seeing the interface when you run the 'ifconfig' command, and with your above testing which shows that it must be a very transient issue, if this is the case that the status switched briefly.

Probably what we need to do is:

  • firstly, update the logging within this 'network changed' handler, so that we have the full set of details for all interfaces and know why it is that ::1 isn't being listed
  • and secondly, see whether we can build in some resiliency to see if that network comes back after a few ms, as we could then (hopefully) still have the connection in place. Or to start -> if we ignore this "network changed" event, it would be interesting to know whether the behaviour at the application level then corrects itself!

Will try to get a build out to you for checking this within the next day or two..

thanks

@busitech
Copy link
Author

We have started doing tests that do not involve a local interface, and instead connects to a remote server via the en0 interface. We saw a disconnect this evening while the same client computer running the AIR app was also connected to the very same server via ssh, which did not disconnect.

We were also sending ICMP packets every 400ms (reduced from 500ms) with zero packets lost across the disconnect of AIR. Below is the log. There were no address change messages captured in this log. We did see the NetConnection IO - read thread is quitting message.

Therefore, we have learned via this test that disconnects do occur even when they are not triggered by address change events.

Starting mac network detector on thread 0x600000530080
NetConnectionIO -> connect host.masked.com:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.
NetConnection IO connected=1
Found a network interface: lo0 - address count 3
  IPv4 network address 127.0.0.1
  IPv6 network address ::1
  IPv6 network address fe80::1%lo0
Found a network interface: gif0 - address count 0
Found a network interface: stf0 - address count 0
Found a network interface: en0 - address count 2
  IPv6 network address fe80::4d3:8b8c:1f21:220a%en0
  IPv4 network address 192.168.101.38
Found a network interface: en5 - address count 1
  IPv6 network address fe80::aede:48ff:fe00:1122%en5
Found a network interface: ap1 - address count 0
Found a network interface: en1 - address count 0
Found a network interface: awdl0 - address count 0
Found a network interface: llw0 - address count 0
Found a network interface: en2 - address count 0
Found a network interface: en3 - address count 0
Found a network interface: bridge0 - address count 0
Found a network interface: utun0 - address count 1
  IPv6 network address fe80::c723:7c5a:d1d8:6f03%utun0
Found a network interface: utun1 - address count 1
  IPv6 network address fe80::98d2:3e3d:7d3b:ee6f%utun1
Found a network interface: utun2 - address count 1
  IPv6 network address fe80::cd1d:b678:8bf0:eeeb%utun2
Found a network interface: utun3 - address count 1
  IPv6 network address fe80::ce81:b1c:bd2c:69e%utun3
Found a network interface: utun4 - address count 2
  IPv6 network address fe80::3e7d:aff:fe09:8137%utun4
  IPv6 network address fd59:c60b:9c12::2
Found a network interface: en4 - address count 0
Found a network interface: en6 - address count 2
  IPv6 network address fe80::1814:e1cb:5c13:93dd%en6
  IPv4 network address 169.254.203.231
Found a network interface: en7 - address count 1
  IPv6 network address fe80::985f:98ff:fe6b:3f1f%en7
NetConnection IO - read thread is quitting
Channel disconnected before an acknowledgement was received
NetConnectionIO -> connect host.masked.com:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.
NetConnection IO connected=1
NetConnectionIO -> connect host.masked.com:2443, protocol 4
NetConnection IO - connecting with RTMPS, native 1 (1), try connect 0 (1)
TCP connecting via SSL -> 8
NetConnectionIO::DoConnect rtmps protocol, result 8, nagle 0, native SSL(1), connect succeeded.
NetConnection IO connected=1
Found a network interface: lo0 - address count 3
  IPv4 network address 127.0.0.1
  IPv6 network address ::1
  IPv6 network address fe80::1%lo0
Found a network interface: gif0 - address count 0
Found a network interface: stf0 - address count 0
Found a network interface: en0 - address count 2
  IPv6 network address fe80::4d3:8b8c:1f21:220a%en0
  IPv4 network address 192.168.101.38
Found a network interface: en5 - address count 1
  IPv6 network address fe80::aede:48ff:fe00:1122%en5
Found a network interface: ap1 - address count 0
Found a network interface: en1 - address count 0
Found a network interface: awdl0 - address count 0
Found a network interface: llw0 - address count 0
Found a network interface: en2 - address count 0
Found a network interface: en3 - address count 0
Found a network interface: bridge0 - address count 0
Found a network interface: utun0 - address count 1
  IPv6 network address fe80::c723:7c5a:d1d8:6f03%utun0
Found a network interface: utun1 - address count 1
  IPv6 network address fe80::98d2:3e3d:7d3b:ee6f%utun1
Found a network interface: utun2 - address count 1
  IPv6 network address fe80::cd1d:b678:8bf0:eeeb%utun2
Found a network interface: utun3 - address count 1
  IPv6 network address fe80::ce81:b1c:bd2c:69e%utun3
Found a network interface: utun4 - address count 2
  IPv6 network address fe80::3e7d:aff:fe09:8137%utun4
  IPv6 network address fd59:c60b:9c12::2
Found a network interface: en4 - address count 0
Found a network interface: en6 - address count 2
  IPv6 network address fe80::1814:e1cb:5c13:93dd%en6
  IPv4 network address 169.254.203.231
Found a network interface: en7 - address count 1
  IPv6 network address fe80::985f:98ff:fe6b:3f1f%en7

@busitech
Copy link
Author

Andrew, have you been able to make a build which can process the "network changed" events appropriately?

Since we last spoke, our research has discovered some very helpful information.

A recent update to macOS has introduced a new architecture for communicating with other Apple devices. macOS is now constantly creating and tearing down utun (user space network tunnel) interfaces via background processes. Some of these processes are named remotepairingd and remoted. The setup and tear down of these utun interfaces cause many new "network changed" events to be sent to all applications monitoring the status of interfaces. The macOS console logs reveal copious amounts of details surrounding these events.

We really need an AIR Runtime Framework that handles these events correctly. There are times when the application can not stay open for more than a minute or two without becoming disconnected.

@busitech
Copy link
Author

Hello @ajwfrost,

Please let us know how we can proceed and what to expect. Even a temporary workaround that disables the network change handler would help.

Thank you.

@ajwfrost
Copy link
Collaborator

Hi @busitech - thanks for the information above about the interface changes, we've not yet had a chance to look into this.. the expectation though (I would have hoped!) is that the functions we use to get the current information about available interfaces would still show the one that's being used for your RTMPS connection, so we shouldn't then disconnect...

So I'm slightly concerned that we may be trying to fix something that then turns out not to be the underlying problem!

But, your suggestion of a temporary workaround to disable the network change handler is a good one, and fairly easily implemented. I'll try to get a test runtime version created for this and upload it here...

thanks

@busitech
Copy link
Author

This issue has been open for 10 months. If the resources are not available to finish the work, please open the source code so we may maintain it. The disconnects every 60 seconds are unbearable.

@ajwfrost
Copy link
Collaborator

Hi - sadly we're not able to open up the source code, it's not really ours and the license terms are fairly strict!

But I had meant to ask, could you please check whether the AIR 51 release is more stable if you set the NetworkInfo.disableNetworkChanges value to true during start-up?

We've got a documentation update pending, as this was a last-minute addition, but you should just be able to do:

NetworkInfo.networkInfo.disableNetworkChanges = true;

and then we won't start interrupting the sockets with all the events that are happening on macOS.

Whether this is then going to help or not, depends on whether those network interfaces are actually being interrupted by the OS. It's still not clear to us why the machines are not reporting any 'localhost' availability when this is happening for you, we can get the network changed event (by starting/stopping VPN) but always see the localhost address available...

thanks

Andrew

@busitech
Copy link
Author

Okay, we'll try the new setting, and provide you with the results.

On November 29th, you had suggested the possibility of separating the RTMP feature out into an open source library, and I have been wondering if you ever spoke to Adobe about this.

I highly doubt there are any interruptions at the OS level. I am able to keep connections open for weeks using other protocols and applications.

Avoid focusing too much attention on localhost, because the same issue happens on every network interface we connect out with.

@busitech
Copy link
Author

busitech commented Mar 5, 2024

Turning on disableNetworkChanges has circumvented the disconnect problem. We opened a connection on Friday, and on Monday it was still connected and usable.

Have you updated your development environment to the latest Sonoma release to make sure you have the same operating system we do for testing this issue? This should be giving you very frequent disconnects without having to cycle a VPN connection.

@ajwfrost
Copy link
Collaborator

ajwfrost commented Mar 5, 2024

That's good to hear! I will double-check, I thought our systems were up to date. But we weren't seeing that same issue with the disconnections. We're just using wifi, in case you're using an ethernet cable for this, maybe that makes a difference?

thanks

@busitech
Copy link
Author

busitech commented Apr 9, 2024

@ajwfrost When will a permanent fix for this issue be seen the AIR runtime? The permanent fix should be one that does not require special options to be set in our source code, or a special build of the AIR application to be released.

We are expecting that end users would simply be able to update their AIR runtime in order to access this fix for macOS, while using presently deployed versions of our AIR applications.

We do not intend to make any permanent changes to our development environment or to have to recompile our various AIR applications in order for users to obtain a stable environment for their existing AIR applications in later macOS releases.

@ajwfrost
Copy link
Collaborator

Hi

** probably you can skip all this text and just jump to the 'update' section below.. but I'd written it and don't really want to delete it... **

The issue for us is that we don't see the same behaviour as your systems are showing. You mentioned earlier:

A recent update to macOS has introduced a new architecture for communicating with other Apple devices. macOS is now constantly creating and tearing down utun (user space network tunnel) interfaces via background processes. Some of these processes are named remotepairingd and remoted. The setup and tear down of these utun interfaces cause many new "network changed" events to be sent to all applications monitoring the status of interfaces. The macOS console logs reveal copious amounts of details surrounding these events.

We don't see this though, we are not getting any spurious "network changed" events. We've checked on two different macs which have the latest software on them; the only things I am wondering that may be different are (a) are you using ethernet cables? as we are only on wifi; and (b) we are using arm64-based devices rather than intel..

But, it was also clear from the earlier testing/profiling that when your systems are sending this "network changed" event, the list of available network interfaces did not include the one that was being used for the RTMP connection, which is why there was then a manual tear-down of the socket.

I don't really like changing the existing behaviour of the runtime without having some opt-in or opt-out type mechanism, but we could look perhaps at the RTMP code ignoring these network changes and just relying on the underlying OS socket implementation to let us know if there's an issue. Based on your findings with that new AS3 flag, this is all that would be needed. The risk of course is that someone else might be relying on the early detection of a changed network in order to rapidly respond/reconnect. So it's really just a case of two options:

  1. Opt-in - like we have now, you have to make a change in the code in order to have the RTMP handling ignore these spurious network adaptor changes
  2. Opt-out - which is what you're after here, i.e. we make the change for everyone but provide a way for folk who needed it to change their app so they can get back to the original functionality.

Of course, it would be much nicer if the OS didn't send these notifications or didn't miss out the required network information when we queried it, and if we had a way to reproduce it here then we would be more likely to see if there's a better workaround we could do (such as perhaps delaying a few milliseconds before checking the list of network interfaces - since from what you say, and the behaviour of other apps, this is just a glitch that shouldn't actually impact a socket connection).

Update

Just doing a little more searching to see if we could find any reasons behind these messages, and we found that the same problem had been caused in Chrome. After a lot of analysis, they had essentially added an "ignore network change notifications" but in a limited/targeted way. So, we should be able to do the same...
https://issues.chromium.org/issues/40939844#comment58

Seems like the issue causing these notifications is related to the wider use of Apple connected devices on the local network, hence us not reproducing it here in the dev environment.

Let me get someone to inspect the changes that the Chromium team did, and try to do the same thing as default in AIR...

thanks

@busitech
Copy link
Author

@ajwfrost It is incorrect to assume that macOS is broken, and that no sense can be made of the events you receive. These two assumptions are leading you to the false dichotomy of ignore vs. do not ignore. The "ignore" concept leads to a serious design flaw, the one implemented by Chromium.

My belief is that after working with Apple to understand their changes, such as by opening a case with Apple Developer Support and discussing the APIs you're using to monitor the interfaces, and the events received as a result of the utun interface changes, you will be able to achieve the original functionality without ignoring events, while fixing the bug we have reported at the same time.

The bug is tearing down perfectly good connections.
The trigger is a network change event for an interface you're not using.

The most important next step on your end is to do whatever it takes to reproduce the issue, and capture full details in your debugger. I would imagine that the interface being changed is part of the event.

All we have in the office for Apple products are two iPhones on WiFi and two Intel iMacs on Ethernet.

If in the end, you believe that Apple has introduced a bug in the APIs you are using, please, for the sake of all mankind, report the issue so they can fix it! : )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants