Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UV_EAGAIN (errno 4088) related to high/low watermarks. #13

Closed
lschoe opened this issue Jan 2, 2024 · 7 comments
Closed

UV_EAGAIN (errno 4088) related to high/low watermarks. #13

lschoe opened this issue Jan 2, 2024 · 7 comments
Labels
bug Something isn't working help wanted Extra attention is needed question Further information is requested

Comments

@lschoe
Copy link
Collaborator

lschoe commented Jan 2, 2024

This issue is about the second problem mentioned in issue #12.

The problem seems to be related to the current flow control implementation in Winloop 0.1.0. An UV_EAGAIN (errno 4088) occurs when handling larger volumes of data exchanges. The problem is not present with any of the other event loops (asyncio's proactor, selector, nor with uvloop).

You can reproduce the problem running, e.g., the np_lpsolver demo for the MPyC package. The following code is from the latest version MPyC v0.9.7 on GitHub, see mpyc/asyncoro.py. This is inside the connection_made(transport) method:

        if 'winloop' in str(self.runtime._loop):
            # Workaround to avoid problems with handling of high/low watermarks when
            # running MPyC demos with large communication complexity.
            # For example, "python np_lpsolver.py -i5 -M3 -E1"
            # gives "OSError: [Errno 4088] Unknown error"  (4088 stands for UV_EAGAIN).
            # Winloop 0.1.0 default setting, with FLOW_CONTROL_HIGH_WATER = 64:
            #   high = 65536 = FLOW_CONTROL_HIGH_WATER * 1024
            #   low  = 16    = FLOW_CONTROL_HIGH_WATER // 4
            # Workaround:
            self.transport.set_write_buffer_limits(high=2**18)
            # With high=2**18, problem still present for cases with even larger communication
            # complexity, like "python np_lpsolver.py -i7 -M3 -E1", and setting
            # high=2**22 resolves this case as well.
            # Alternative workaround: set high=0 to disable use of high/low watermarks.
            # NB: these workarounds do not work with SSL,
            # for example, "python np_lpsolver.py -i5 -M3 -E1 --ssl"
            # keeps giving "OSError: [Errno 4088] Unknown error" whatever we set for high and low.
            print(f'workaround for Winloop: {self.transport.get_write_buffer_limits()=}')

Using python np_lpsolver.py -i5 -M3 -E1 in mpyc/demos runs the demo between three parties on localhost using Winloop. If the default flow control settings are used, all parties will get a 4088 error. If the high watermark is increased from 2**16 to 2**18, the problem disappears. The problem also disappears by setting high=0. But, interestingly, these workarounds do not work with SSL.

@Vizonex
Copy link
Owner

Vizonex commented Jan 3, 2024

This seems to be an issue beyond my scope of knowledge but I have seen it do things and bugs at times. Even I'm still trying to port more stuff from linux over to windows. I will flag this issue as a bug for now but if I knew which cython file this was coming from that will help me figure out how to fix it...

@Vizonex Vizonex added bug Something isn't working help wanted Extra attention is needed question Further information is requested labels Jan 3, 2024
@lschoe
Copy link
Collaborator Author

lschoe commented Jan 4, 2024

To reproduce this error easily I'm attaching a small program hi2.py. If you run start python hi2.py 1 & python hi2.py 0 on a Windows command prompt, a server and client will be run, and the above error 4088 will show.
The program can also be used to see the error for issue #12.

You can also disable the use of Winloop to see that the error does not happen with the default event loop. Or try some other settings, for which the problem sometimes disappears, or shows a different kind of error.

lschoe added a commit to lschoe/Winloop that referenced this issue Jan 7, 2024
For now, using hard coded error numbers.
@lschoe
Copy link
Collaborator Author

lschoe commented Jan 7, 2024

Looks like I've found a fix for the problem behind this issue #13, and also a fix for the other issue #12.

I had been looking at the reason why the errors are actually not printed correctly on Windows, and that way I was prepared to see the problems in these two lines of Winloop/handles/stream.pyx at some point:

elif err != uv.EAGAIN:

if nread == uv.EOF:

On Windows, we have uv.EOF equal to -1 (instead of -4095) and uv.EAGAIN equal to 11 (instead of -4088). Putting these values -4095 and -4088, hard coded in the source code for now, fixes the problem! So, the workaround above is no longer needed.

@Vizonex
Copy link
Owner

Vizonex commented Jan 8, 2024

Looks like I've found a fix for the problem behind this issue #13, and also a fix for the other issue #12.

I had been looking at the reason why the errors are actually not printed correctly on Windows, and that way I was prepared to see the problems in these two lines of Winloop/handles/stream.pyx at some point:

elif err != uv.EAGAIN:

if nread == uv.EOF:

On Windows, we have uv.EOF equal to -1 (instead of -4095) and uv.EAGAIN equal to 11 (instead of -4088). Putting these values -4095 and -4088, hard coded in the source code for now, fixes the problem! So, the workaround above is no longer needed.

@lschoe I will try to implement this in shortly thanks for figuring this out for me...

@Vizonex
Copy link
Owner

Vizonex commented Jan 8, 2024

@lschoe Hopefully this bug-fix is what you were looking for 66bfb74 I will close this issue if you think that this change is acceptable enough...

Vizonex added a commit that referenced this issue Jan 8, 2024
@Vizonex
Copy link
Owner

Vizonex commented Jan 8, 2024

@lschoe I merged your pull request instead. I was confused with what was needed but I will close this issue now that it's correctly been handled...

@Vizonex Vizonex closed this as completed Jan 8, 2024
@lschoe
Copy link
Collaborator Author

lschoe commented Jan 8, 2024

Great! This way I can now run all MPyC programs, with all datasets small to large, experiencing the same kind of speedup for Winloop as for uvloop.

The PR is a simple, direct fix. For a more systematic fix, also to cover other potential problems due to misaligned error numbers, I think a nice target for you is to first fix the error messages printed by Winloop,

Currently, we get something like "OSError: [Errno 4088] Unknown error" printed, but -- from looking at your errors.pyx --- you want something like "BlockingIOError" and then with the correct error message "resource temporarily unavailable" attached based on the line " XX(EAGAIN, "resource temporarily unavailable")" from the uv.h file. This does some macro stuff involving

typedef enum {
#define XX(code, _) UV_ ## code = UV__ ## code,
  UV_ERRNO_MAP(XX)
#undef XX
  UV_ERRNO_MAX = UV__EOF - 1
} uv_errno_t;

which takes things (prefixed with UV__) from uv/errno.h. Not sure how much you need to do still to get things working on Windows. I would hope/expect this kind of thing has been sorted out already (by/for libuv?), as it is probably needed in many other applications as well. Anyway, Errno.h has the list with Windows error codes.

Ideally, you can fix this such that you can use the existing uvloop code involving error codes without making changes at that level. Only some changes at a lower level, to deal with the mapping of the error numbers, is done in one central place in the source code for Winloop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants