Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] Use fd_set according to Winsock2 specifics #4954

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lxbndr
Copy link
Contributor

@lxbndr lxbndr commented May 12, 2024

Fixes apple/swift#73532.

On Windows, socket handles in a fd_set are not represented as bit flags as in Berkeley sockets. While we have no fd_set dynamic growth in this implementation, the FD_SETSIZE defined as 1024 in CoreFoundation_Prefix.h should be enough for majority of tasks.

This also includes a test for SocketPort which uses CFSocket internally. This test fails (hangs or crashes with memory corruption error) without the fix.

@lxbndr
Copy link
Contributor Author

lxbndr commented May 12, 2024

@hjyamauchi this patch has minor differences from that version you checked. Just to be sure, could you please validate original issue with this implementation?

@parkera parkera requested a review from compnerd May 13, 2024 17:32
@parkera
Copy link
Member

parkera commented May 13, 2024

@swift-ci test

Fixes apple/swift#73532.

On Windows, socket handles in a `fd_set` are not represented as
bit flags as in Berkeley sockets. While we have no `fd_set` dynamic
growth in this implementation, the `FD_SETSIZE` defined as 1024
in `CoreFoundation_Prefix.h` should be enough for majority of tasks.
@lxbndr
Copy link
Contributor Author

lxbndr commented May 13, 2024

Force-pushed NFC (removed unused variable x in the for-loop in the test code).

@compnerd
Copy link
Collaborator

@swift-ci please test

@hjyamauchi
Copy link
Contributor

@hjyamauchi this patch has minor differences from that version you checked. Just to be sure, could you please validate original issue with this implementation?

Validated. It worked fine.

@hjyamauchi
Copy link
Contributor

CC @tristanlabelle

if (CFDataGetLength(fdSet) == 0) {
return 0;
}
return FD_SETSIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of the structure on Windows is that it has a default size but it's legit to allocate a bigger buffer and treat the array as bigger than FD_SETSIZE. I think a better implementation would calculate the number of entries from CFDataGetLength after removing the size of the count field. This would also avoid the implicit dependency where this code is assuming that the data length is at least that of fd_set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I think I will update the patch and exclude this function from Win build at all. It is not used in any of Windows code paths except one place where its result is ignored anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved per convo about large FD_SETSIZE override

}

CF_INLINE Boolean __CFSocketFdSet(CFSocketNativeHandle sock, CFMutableDataRef fdSet) {
/* returns true if a change occurred, false otherwise */
Boolean retval = false;
if (INVALID_SOCKET != sock && 0 <= sock) {
fd_set *fds;
#if TARGET_OS_WIN32
if (CFDataGetLength(fdSet) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic should support resizing the buffer past sizeof(fd_set) since fd_set's trailing array appears meant to be used as elastic-sized, and we have nothing to detect the error condition if we use a fixed size and can't add a new socket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved per convo about large FD_SETSIZE override.

retval = true;
FD_SET(sock, (fd_set *)fds_bits);
FD_SET(sock, fds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macOS code made sure to resize so knows that this will succeed. However the Windows code is capping at 64 sockets and does not handle failure.

I think we should avoid using FD_SET on Windows because it has to assume an array size of 64, but fd_set is designed such that the allocation can make the trailing array bigger than that if it needs to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved per convo about large FD_SETSIZE override, although we could add an assert.

}

fd_set *fds = (fd_set *)CFDataGetMutableBytePtr(d);
fd_set invalidFds;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be more than 64 invalid sockets if fds->fd_count is bigger than 64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolving per convo about large FD_SETSIZE override.

Copy link
Contributor

@tristanlabelle tristanlabelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is very necessary, thanks! I think there is still a problem with the Windows implementation. The code is treating fd_set as fixed-size, which will cause silent failures if we exceed the 64 array size in the structure (not a problem on macOS with bitset resizing). Reading the docs, the fd_set structure is designed to have a trailing array that can be bigger than the default 64. I think the logic should support resizing the array past 64 as needed, so that we don't have this additional silent failure condition on Windows.

@lxbndr
Copy link
Contributor Author

lxbndr commented May 14, 2024

@tristanlabelle so the major concern is the static non-growing structure, right? I agree that it is not comfortable to live with this restriction, and my initial intention was to make it dynamic.

Not sure if it is really ok to manipulate inner structure array, but I think it worth to try. I am suspicious about changing it on the fly, because FD_SETSIZE is a define, and it must be defined before including winsock2 header. So it could be baked into some macros like FD_SET or FD_ZERO.

I initially planned not to grow the array, but create additional fd_sets as needed, so we would have growing array of fd_sets instead.

Also, some good news. The default cap is 64, which is pretty low. But look what we got here. We already have FD_SETSIZE defined as 1024. That's a plenty of space.

I could try dynamic sizing, but probably not early than this weekend 😔 If that not blocking anything critical for you, we could keep this for now. Or we could land it and I'll do a follow up.

And... Do you think it makes sense to fight 1024 sockets limit, while corelibs are gradually moving away from CoreFoundation as backing implementation? WDYT?

@tristanlabelle
Copy link
Contributor

@lxbndr Ah given the 1024 socket override I don't have much of a concern. I'll resolve my comments and file a follow-up issue.

You're right that we wouldn't be able to use FD_SET or FD_ZERO on Windows with dynamic sizing, they would need their own implementation.

@tristanlabelle
Copy link
Contributor

Actually FD_SET is the only problematic one for dynamic sizing:

#define FD_CLR(fd, set) do { \
    u_int __i; \
    for (__i = 0; __i < ((fd_set FAR *)(set))->fd_count ; __i++) { \
        if (((fd_set FAR *)(set))->fd_array[__i] == fd) { \
            while (__i < ((fd_set FAR *)(set))->fd_count-1) { \
                ((fd_set FAR *)(set))->fd_array[__i] = \
                    ((fd_set FAR *)(set))->fd_array[__i+1]; \
                __i++; \
            } \
            ((fd_set FAR *)(set))->fd_count--; \
            break; \
        } \
    } \
} while(0)

#define FD_SET(fd, set) do { \
    if (((fd_set FAR *)(set))->fd_count < FD_SETSIZE) \
        ((fd_set FAR *)(set))->fd_array[((fd_set FAR *)(set))->fd_count++]=(fd);\
} while(0)

#define FD_ZERO(set) (((fd_set FAR *)(set))->fd_count=0)

#define FD_ISSET(fd, set) __WSAFDIsSet((SOCKET)(fd), (fd_set FAR *)(set))

@tristanlabelle
Copy link
Contributor

Filed #4958 as follow-up

@compnerd
Copy link
Collaborator

@swift-ci please test Windows platform

@compnerd
Copy link
Collaborator

@swift-ci please test macOS platform

@hjyamauchi
Copy link
Contributor

@swift-ci please test macOS platform

@hjyamauchi
Copy link
Contributor

Does it looks like the macos CI was broken on May 11 before this PR was created? The latest run 466 looks broken in the same way as the May 11 run 461.

May 20 (the latest):
https://ci.swift.org/job/swift-corelibs-foundation-PR-macOS/466/

May 11:
https://ci.swift.org/job/swift-corelibs-foundation-PR-macOS/461/

Failed Tests (24):
  SwiftXCTestFunctionalTests :: ArgumentParser/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Expectations/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Handler/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Misuse/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Notifications/Expectations/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Notifications/Handler/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Predicates/Expectations/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Predicates/Handler/main.swift
  SwiftXCTestFunctionalTests :: Asynchronous/Use/main.swift
  SwiftXCTestFunctionalTests :: EqualityWithAccuracy/main.swift
  SwiftXCTestFunctionalTests :: ErrorHandling/main.swift
  SwiftXCTestFunctionalTests :: FailingTestSuite/main.swift
  SwiftXCTestFunctionalTests :: FailureMessagesTestCase/main.swift
  SwiftXCTestFunctionalTests :: ListTests/main.swift
  SwiftXCTestFunctionalTests :: Observation/All/main.swift
  SwiftXCTestFunctionalTests :: Observation/MultipleObservers/main.swift
  SwiftXCTestFunctionalTests :: Observation/Selected/main.swift
  SwiftXCTestFunctionalTests :: Performance/Misuse/main.swift
  SwiftXCTestFunctionalTests :: Performance/main.swift
  SwiftXCTestFunctionalTests :: SelectedTest/main.swift
  SwiftXCTestFunctionalTests :: SingleFailingTestCase/main.swift
  SwiftXCTestFunctionalTests :: SkippingTestCase/main.swift
  SwiftXCTestFunctionalTests :: TestCaseLifecycle/Misuse/main.swift
  SwiftXCTestFunctionalTests :: TestCaseLifecycle/main.swift

@compnerd
Copy link
Collaborator

This is really windows specific, the tests haven't regressed, @parkera okay to merge ignoring the macOS issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compiler hangs occasionally on many-core CPUs on Windows
5 participants