Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket: Abort pending operations on shutdown #8758

Merged
merged 1 commit into from Nov 2, 2020

Conversation

sepalani
Copy link
Contributor

@sepalani sepalani commented Apr 22, 2020

This PR will need a hardware test (similar #8389) to confirm this behaviour.

PR goal

This PR tries to fix inconsistencies regarding shutdown implementation (on a socket) and be as close as possible to what the Wii does (not sure about the GC behaviour).

What it fixes

It fixes crashes/hangs in some Just Dance games : https://dolp.in/i11910

What it doesn't fix

Hardware test

I created 2 hardware tests:

  1. one that fires shutdown on blocking operations
  2. the second one that checks return values of network functions after shutdown was called

How-to run

  1. Get the latest version of Dolphin (at least after IP/Top: Add SO_LINGER optname #9099)
  2. Enable Dolphin debugging interface under Options > Config > Interface then tick Show Debugging UI
  3. Enable the Dolphin Log window
  4. In the Log configuration tab
    • Tick "Write to File" in Logger Outputs
    • Tick "OSREPORT" and "HLE"
    • Set the verbosity level to Info or Debug
  5. Make sure you run the ELF (not the DOL) hardware tests
    (On top of that, you can also grab the result using netcat on your local IP address and the result port displayed at the end of the test)

Hardware tests results

Dolphin results are based on this build #8796 latest master and #9191 for Android.

Results of the first hwtest

Results of the second hwtest

Analysis

Result value

  • NO STATE: Failed to reach wanted state
  • TIMEOUT or --: State reached but last task didn't finish
  • DEADLOCK: Same as timeout but cleanup didn't interrupt the ioctl call (i.e. impossible to join/stop thread)
  • shutdown / callback: return values (based on devkitPPC errno.h)

Code changes

The code was changed based on the wanted Wii results, my reasoning was done as follow:

  • The Wii does nothing and returns 0 for IP_PROTO_UDP
    • based on IOS reverse-engineering
  • Adjust pending operations
    • IOCTL_SO_ACCEPT:
      • hwtest-1: B/BIND/LISTEN/ACCEPT
      • #define EINVAL 22 /* Invalid argument */
    • IOCTL_SO_CONNECT:
      • hwtest-1: B/CONNECT
      • #define ENETUNREACH 114 /* Network is unreachable */
    • IOCTLV_SO_RECVFROM:
      • hwtest-1: B/CONNECT/RECV, B/BIND/LISTEN/ACCEPT'D/RECV
      • hwtest-2: B/BIND/LISTEN/ACCEPT'D/SHUT/RECV, B/CONNECT/SHUT/RECV
      • #define ENOTCONN 128 /* Socket is not connected */
    • IOCTLV_SO_SENDTO:
      • hwtest-2: B/BIND/LISTEN/ACCEPT'D/SHUT/SEND, B/CONNECT/SHUT/SEND
      • #define ENOTCONN 128 /* Socket is not connected */

Ready to be reviewed & merged.

@sepalani
Copy link
Contributor Author

I updated the error code based on this hardware test (still based on a connect call to a ubisoft server).

wii-soconnect-shutdown.zip
wii-soconnect-shutdown

Some important points to mention:

  • I haven't handled the how parameter (i.e. might cause regression)
  • I need to write a test suite for all network calls to make sure, the error codes are correct and to know which calls are affected by the read and/or write end being closed by the shutdown call

@sepalani
Copy link
Contributor Author

sepalani commented May 8, 2020

Very early WIP hwtest, I'll post a draft PR on the hwtest repository soon:
wii-shutdown.zip

Here are the results, I'll adjust this PR accordingly:
https://gist.github.com/sepalani/e0ef0b0c3b39ae4523c4f007fa23916d

Regarding the result value:

  • NO STATE: Failed to reach wanted state
  • TIMEOUT: State reached but last task didn't finish
  • DEADLOCK: Same as timeout but cleanup didn't interrupt the ioctl call
  • shutdown / callback: return values

@sepalani
Copy link
Contributor Author

sepalani commented May 10, 2020

Test 1/2

I did some fine-tuning on the previous hwtest to fix some inconsistent behaviours:

  • I first added sleep before firing net_shutdown, now it matches the 1st hwtest I did (so-connect-shutdown) and doesn't timeout
  • I used www.example.com:80 as the TCP connection lives longer than on dns.google:443
  • I tidied the code logic a bit
    wii-shutdown-v2.zip

The results:

Test 2/2

The other part of the test where I check return values of network functions after a shutdown call.
wii-shutdown-v2p2.zip

The results:
https://gist.github.com/sepalani/ccf2bc4d4104e5f68ac18ba2287cdd07

@sepalani sepalani force-pushed the so-shutdown branch 2 times, most recently from dd2947f to ff2915f Compare May 10, 2020 16:31
@sepalani
Copy link
Contributor Author

sepalani commented May 10, 2020

@leoetlino
PR rebased, depends on #8796 and takes into account the how parameters.

This PR fixes Just Dance games: https://bugs.dolphin-emu.org/issues/11910

This PR should be thoroughly tested before being merged to be sure there is no network regression.

@sepalani
Copy link
Contributor Author

I added hwtests results for the ASUS RoG Phone II and Linux (Parrot).

@sepalani
Copy link
Contributor Author

sepalani commented Jun 2, 2020

I moved up the WiiSockMan::GetNetErrorCode call to be executed right after the shutdown call.

Still ready to be reviewed.

@sepalani
Copy link
Contributor Author

Rebased on latest master. I tested it on some online games such as Super Smash Bros. Brawl, Mario Strikers Charged Football, DJHero and some Just Dance games and I didn't find any issue.

Ready to be reviewed & merged. Depends on #8796.

@sepalani
Copy link
Contributor Author

sepalani commented Sep 1, 2020

Still ready to be reviewed & merged. This PR fixes https://bugs.dolphin-emu.org/issues/11910

As a side note, Nintendo have changed their server infrastructure and previously working games don't work anymore and that isn't related to this PR.

Many Wii games send duplicate HTTP Host headers (that's a well-known issue). But the current servers are stricter and don't accept them anymore resulting in Error Code 23400 (aka "HTTP 400 - Bad request").

@MayImilae
Copy link
Contributor

So I have a mac so I can do the macOS testing for you. Can you walk me through it?

@sepalani
Copy link
Contributor Author

@MayImilae Sure. You'll need to grab this build of dolphin and the first and second hardware test.

Then, make sure your computer has Internet and run the ELF files with the build of Dolphin you downloaded. There are two methods to grab the result. You can pick one of them or use them both.

Method 1

You can grab the result at the end of the test by connecting to your local IP address and the prompted port (it should be port 16784). I usually use the netcat command to grab the result and the tee command to redirect to a file.

# My local IP address was 192.168.1.7
# My output file was dolphin_hwtest_shutdown_v2rogII.txt
nc 192.168.1.7 16784 | tee dolphin_hwtest_shutdown_v2rogII.txt

Method 2

With this method the output of the test should appear in Dolphin log file:

  1. Enable the Dolphin Log window
  2. In the Log configuration tab
    • Tick "Write to File" in Logger Outputs
    • Tick "OSREPORT" and "HLE"
  3. Make sure you run the ELF (not the DOL) hardware tests
    .

These tests can take quite some time to run, by the way.

@MayImilae
Copy link
Contributor

MayImilae commented Sep 13, 2020

That PR build you pointed to has been merged, so... I'll just use latest master (5.0-12595). Also I'll just use Dolphin's logs since I'm familiar with them.

System specs:
MacBook Pro 13in mid-2018 with Touch Bar
macOS 10.15.4
Core i7-8559U @ 4.5ghz
Intel Iris Plus Graphics 655
16GB 2133MHz LPDDR3

Ok so, log settings are verbosity Info, write to file, and OSReport and OSHLE.

Dolphin settings are more or less normal. Things of note are OpenGL, 1x native, Scaled EFB Copy ON, Skip Presenting Duplicate Frames OFF.


Ok so, first test (wii-shutdown-v2) completed successfully.

Screen Shot 2020-09-12 at 22 21 21

Logs:

02:04:881 Core/HLE/HLE.cpp:124 I[HLE]: Patching printf 8002b310
02:04:881 Core/HLE/HLE.cpp:124 I[HLE]: Patching puts 8002b4e8


The second test (wii-shutdown-v2p2) did not complete. I tried it 4 times and it crashed each time, always late in the run. Verbosity info was exactly the same as the above v2 run that succeeded, so I enabled logging everything and let it run. This is the last line in that log.

51:09:779 Core/IOS/Device.cpp:101 I[IOS_WC24]: /dev/net/ip/top (fd 3) - IOCtl 0xe (in_size=0x8, out_size=0x0)

Nothing before that is unusual. Here's the full log.

wii-shutdown-v2p2.log.zip


So um, yea. Let me know if you want me to try anything.

@JMC47
Copy link
Contributor

JMC47 commented Sep 13, 2020

9/10 is an A-, so I think it's good to go.

@sepalani
Copy link
Contributor Author

@MayImilae My bad, it seems the OSREPORT log is missing. You'll need to enable Dolphin debugging interface under Options > Config > Interface then tick Show Debugging UI. The text you see on screen should now appear under the OSREPORT log.

I'll try on my end the second test to see if something has changed or if it's related to the Dolphin build. It can also be an OSX specific issue.

@MayImilae
Copy link
Contributor

Ok! Here is some proper logs, this time dumped from terminal since the crashed run doesn't dump any logs.

wii-shutdown-v2 (successful): https://hastebin.com/eketabapac

wii-shutdown-v2p2 (crashed): https://hastebin.com/zaraganudo

@sepalani
Copy link
Contributor Author

@MayImilae Thank you for the log and the files, there were quite useful.

After digging into the log, this is what I found so far:

  • There are some OSX specific errors which shouldn't matter for the test
    • SO_SETSOCKOPT failed with error 22: Invalid argument with SO_LINGER on some sockets (seems related)
    • SO_SETSOCKOPT failed with error 55: No buffer space available with SO_RCVBUF on all sockets from net_socket
  • Dolphin can't find the interface's IP (and that matters for the test as it prevents the use of bind)
    • Android is also impacted
    • Dolphin defaults to IP 10.0.1.30 (why?)
  • The GetSystemDefaultInterface function in Dolphin is mixing up endianness

I'll try to see if I can fix the GetSystemDefaultInterface function and eventually add Android support. However, I don't know if I'll be able to add macOS support since I don't have one to test. Unfortunately, I wasn't able to identify the reason for the crash yet.

@MayImilae
Copy link
Contributor

Let me know if there is anything else you'd like me to try on my mac! If I don't catch you posting here I'm always around on Dolphin's IRC.

@OatmealDome
Copy link
Member

@sepalani I tested this PR on macOS 11 Big Sur.

The debugger tells me that the process exited with signal 13 (SIGPIPE). Apple's documentation (under "Use POSIX sockets efficiently") says a SIGPIPE will be issued if the connection is closed by default, and the application will be quit if it isn't handled or ignored.

I don't really have any experience in POSIX sockets, so sorry if this is wrong or not of any help.

@JMC47
Copy link
Contributor

JMC47 commented Oct 9, 2020

Does this need to be rebased so that the crash fix is in? Would like to get this finally sorted out.

@sepalani
Copy link
Contributor Author

sepalani commented Oct 9, 2020

@JMC47 I should but it won't fix the default network interface not being detected properly on macOS and Android. I'd like to sort these two things out, otherwise, these tests are meaningless for those platforms.

@sepalani
Copy link
Contributor Author

I updated the Android results based on this PR: #9191

@JMC47
Copy link
Contributor

JMC47 commented Oct 24, 2020

So the only thing left is figuring out macOS now?

@sepalani
Copy link
Contributor Author

@JMC47 Yes, indeed.

I suspect there is something odd in the Linux part of the code. Part of the hardware test for both Linux and Mac shows a NO STATE result. It mustn't happen in the shutdown(NONE) (i.e. I don't call shutdown) part of the test. That's where I perform simple network operations which shouldn't fail.

@sepalani
Copy link
Contributor Author

@JMC47
I found the issue for Linux and I fixed it without noticing in #9099. The SO_LINGER constant was defined with a different value, that's probably the issue on Mac too. Both Android and Linux results are now on par and I expect the same for macOS.

@OatmealDome @MayImilae
Could you please retry the two hardware tests on macOS with the latest version of Dolphin?

You'll need to:

  1. Get the latest version of Dolphin (at least after IP/Top: Add SO_LINGER optname #9099)
  2. Enable Dolphin debugging interface under Options > Config > Interface then tick Show Debugging UI
  3. Enable the Dolphin Log window
  4. In the Log configuration tab
    • Tick "Write to File" in Logger Outputs
    • Tick "OSREPORT" and "HLE"
    • Set the verbosity level to Info or Debug
  5. Make sure you run the ELF (not the DOL) hardware tests

@OatmealDome
Copy link
Member

@sepalani I ran both tests on macOS 11 Big Sur beta 10, Dolphin 5.0-12946.

first

second

@sepalani
Copy link
Contributor Author

sepalani commented Nov 1, 2020

@OatmealDome
Thanks a lot. Sadly, it seems there are still bind issues on macOS. At least we have complete tests without a crash.

@leoetlino @JMC47
I updated the PR description with all the results. Feel free to merge this PR in its current states since it fixes the shutdown behaviour but it won't fix other network related issues on Android and macOS.

@leoetlino leoetlino merged commit 5b9cd83 into dolphin-emu:master Nov 2, 2020
10 checks passed
@sepalani sepalani deleted the so-shutdown branch November 7, 2020 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
5 participants