Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sockets on Windows simulator do not close #1058

Closed
jethrowharton opened this issue Mar 7, 2023 · 11 comments
Closed

Sockets on Windows simulator do not close #1058

jethrowharton opened this issue Mar 7, 2023 · 11 comments

Comments

@jethrowharton
Copy link
Contributor

jethrowharton commented Mar 7, 2023

Build environment: Windows 10
Target device: desktop simulator

Description
Sockets do not close on Windows xsbug simulator, causing sockets to increase without resolving. Only the first request is successful, all other requests are blocked.

Steps to Reproduce

  1. Modify epaper-mini-travel-time manifest with API key and locations, and change certificate to ca233 (Can also modify main.js to repeat more frequently for easier testing)
  2. Build and install epaper-mini-travel-time using this build command: mcconfig -d -m -p win
  3. See sockets increase without resolving

Expected behavior
Socket should close and new request should complete successfully.

Other information
We checked the Windows socket module (modSocket.c) and we added traces to the file, and confirmed that FD_CLOSE was never called. However, the sockets work as intended on the Mac simulator.

@phoddie
Copy link
Collaborator

phoddie commented Mar 7, 2023

Thank you for the detailed report.

It could be a few days before we can look into this in detail. However, in reviewing the implementation of modSocket.c for Windows, it looks like the callback ignores all message when an error is set. That seems to have the potential to suppress the delivery of FD_CLOSE. For example, if the remote end-point closed the connection, the error could be set to WSAECONNRESET. Since you are set-up to reproduce this problem, may be you could try commenting out that check to see if it allows FD_CLOSE to be received.

In addition, it looks like that same check would prevent notifications of failed connections. For example, when connect() fails on a non-blocking socket, the documentation notes that FD_CONNECT is sent with an error code.

it isn't immediately obvious why all requests but the first fail. But, let's sort of the FD_CLOSE notification and see from there.

@jethrowharton
Copy link
Contributor Author

Thank you for your response.

I have tested through xsTrace that FD_CLOSE does not occur normally and only works if either doDestructor or socketDownUseCount are commented out. However, even with FD_CLOSE working, this did not solve the issue. I understand that commenting the functions was only for debugging purposes and not a real solution.

@phoddie
Copy link
Collaborator

phoddie commented Mar 9, 2023

Thanks for the update. We'll take a deeper look.

@jethrowharton
Copy link
Contributor Author

I have been testing through trial-and-error and across different versions, and I found that version OS221101 works with sockets on Windows. I also found that version 3.5.0 does not work. There are many differences across the two versions, and I have not been able to pinpoint the exact issue yet. Another interesting discovery is that on the latest version 3.7.6 when I replace the Timer loop with calling the network function several times in succession, the sockets close as expected.

@andycarle
Copy link
Member

Thank you for the exhaustive investigation here--I really appreciate it. I'll take those notes and get back to this today to see if we can close this out.

Thanks!

@andycarle
Copy link
Member

@jethrowharton Thank you for your patience on this. I have a draft fix for this problem that is working well in my initial tests. I'm hoping to get some review of the changes by the rest of the team and then I'll get a final fix out in our next update.

@jethrowharton
Copy link
Contributor Author

Thank you for letting me know. I'm excited for the fix.

mkellner pushed a commit that referenced this issue Mar 24, 2023
mkellner pushed a commit that referenced this issue Mar 24, 2023
…ts 8e571e and replaces it with a new implementation)
@andycarle
Copy link
Member

@jethrowharton This fix is now live. Please update from GitHub and give it a try. Let me know how it goes.

As Peter rightly noted, there are also some issues with error handling in the Windows socket code. I don't believe that's directly related to what you were seeing here, but it is definitely in need of a fix. I'll have those in shortly.

@jethrowharton
Copy link
Contributor Author

I have tested and it works properly. Thank you for the quick fix!

Also, now I am able to use parallel_build again :)

@andycarle
Copy link
Member

Great! I'm glad to hear that everything is looking good there. The other topic discussed in this thread was addressed with 4188333.

If everything is looking good, shall we close this out?

@phoddie
Copy link
Collaborator

phoddie commented Apr 26, 2023

Closing.

@phoddie phoddie closed this as completed Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants