Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to set SO_REUSEPORT to the server's socket #505

Merged
merged 5 commits into from May 20, 2023

Conversation

VladimirKuzmin
Copy link
Contributor

@VladimirKuzmin VladimirKuzmin commented May 2, 2022

❓ What kind of change does this PR introduce?

  • 🐞 bug fix
  • 🐣 feature
  • πŸ“‹ docs update
  • πŸ“‹ tests/coverage improvement
  • πŸ“‹ refactoring
  • πŸ’₯ other

πŸ“‹ What is the related issue number (starting with #)

Resolves #504

❓ What is the current behavior? (You can also link to an open issue here)

Cheroot cannot be used in infrastructure where bind address for it allocated and kept by some different process.

main.py:

s = socket.socket(family)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
s.bind(('127.0.0.1', 1234))
subprocess.run(['python3', 'server.py', '127.0.0.1', '1234'])

server.py:

addr = (argv[0], int(argv[1]))
server = wsgi.Server(addr, wsgi_app)
server.start()  # fails, socket already in use

❓ What is the new behavior (if this is a feature change)?

Cheroot can optionally re-use port:

main.py:

s = socket.socket(family)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
s.bind(('127.0.0.1', 1234))
subprocess.run(['python3', 'server.py', '127.0.0.1', '1234'])

server.py:

addr = (argv[0], int(argv[1]))
server = wsgi.Server(addr, wsgi_app, reuse_port=True)  # a new option reuse_port
server.start()  # works, SO_REUSEPORT has been set

πŸ“‹ Contribution checklist:

(If you're a first-timer, check out
this guide on making great pull requests)

  • I wrote descriptive pull request text above
  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after
    the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

This change is Reviewable

Copy link
Member

@webknjaz webknjaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to accept this as is right now. I wasn't aware of this option, and it'll need to be researched closer so that the PR is reviewed consciously. But I'm leaving a few notes for the future reference.

One thought is whether we really need this to be toggleable. SO_REUSEADDR cannot be switched on or off, for example. Why would it be any different for SO_REUSEPORT which is essentially a feature of the same category?
Are there any cases when it'd be undesired to have SO_REUSEADDR on by default?

Some googling suggests that it may be a good idea to have both of these options set on the socket simultaneously.

Extra reading for maintainers:

Extra questions:

  • How does this affect the ephemeral sockets?
  • How does this impact the abstract sockets?

cheroot/test/test_server.py Outdated Show resolved Hide resolved
@VladimirKuzmin
Copy link
Contributor Author

In my opinion here is the key logical difference between SO_REUSEADDR and SO_REUSEPORT (with exception to Windows):

  • SO_REUSEADDR is used to resolve conflict with some unrelated processes, and can be set by a process listening to non-wildcard addresses;
  • SO_REUSEPORT is used as an tool for system design (load balancing, assigning ports by some coordinating process, picking up port from another stuck process) and it's set intentionally by every process using the address.

Turning it on by default can lead to a situation when two independent processes (e.g two different cherrypy/cheroot servers) accidentally open connections to the same port and address, set SO_REUSEPORT just in case and get only part of their requests. It might be a rare case, but it would be hard to debug. So I am still inclined to use the extra parameter

I will add better handling of some special cases:

  • Windows can use SO_REUSEADDR instead of SO_REUSEPORT;
  • Ephemeral socket should not be used together with reuse_port, will add a check or switch off SO_REUSEPORT if we go with the solution without the extra parameter;
  • Will do research on abstract sockets.

@VladimirKuzmin VladimirKuzmin force-pushed the reuse-port branch 5 times, most recently from c203f77 to b05d84e Compare March 8, 2023 18:33
@VladimirKuzmin
Copy link
Contributor Author

Sorry for the long delay. I updated my PR and resolved previously discussed issues.

Also, here are some examples of port reuse implementations in Python web servers:
https://github.com/benoitc/gunicorn/blob/20.1.0/gunicorn/sock.py#L41
https://github.com/unbit/uwsgi/blob/2.0.21/core/socket.c#L699
https://github.com/jonashaag/bjoern/blob/3.2.2/bjoern.py#L24

Alternatively, if this feature is too expensive to maintain, I can send PR making prepare_socket a class method instead of a static method, so that method can be overridden.

@VladimirKuzmin VladimirKuzmin force-pushed the reuse-port branch 2 times, most recently from 3831a91 to d8e4e03 Compare March 9, 2023 17:24
Copy link
Member

@jaraco jaraco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. It seems to follow similar conventions of other servers and unblocks an important use case.

I find it a little awkward that REUSE_ADDR is now used in two different locations (under re-use port for Windows and by default for non-ephemeral addresses on other platforms).

  • Windows can use SO_REUSEADDR instead of SO_REUSEPORT;

I'm not sure I understand this motivation. Can you provide a reference for why SO_REUSEADDR is appropriate on Windows when the user is requesting SO_REUSEPORT?

cheroot/server.py Show resolved Hide resolved
cheroot/server.py Outdated Show resolved Hide resolved
@VladimirKuzmin
Copy link
Contributor Author

VladimirKuzmin commented May 18, 2023

I'm not sure I understand this motivation. Can you provide a reference for why SO_REUSEADDR is appropriate on Windows when the user is requesting SO_REUSEPORT?

Historically SO_REUSEADDR in Windows works as a combination of SO_REUSEADDR and SO_REUSEPORT. It's the reason this flag is skipped for Windows in IS_EPHEMERAL_PORT section (unintended port sharing could lead to the long debugging).

At the same time Windows has a flag SO_EXCLUSIVEADDRUSE which apparently is "anti-SO_REUSEPORT". That special case can probably be handled in the IS_EPHEMERAL_PORT section, but that would spread reuse port logic even more:

    if not IS_EPHEMERAL_PORT:
        """Enable SO_REUSEADDR for the current socket.
        Skip for Windows (has different semantics)
        or ephemeral ports (can steal ports from others).
        Refs:
        * https://msdn.microsoft.com/en-us/library/ms740621(v=vs.85).aspx
        * https://github.com/cherrypy/cheroot/issues/114
        * https://gavv.github.io/blog/ephemeral-port-reuse/
        """
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if IS_WINDOWS and not reuse_port and hasattr(socket, "SO_EXCLUSIVEADDRUSE"):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)

@jaraco
Copy link
Member

jaraco commented May 20, 2023

Commit 081aa090ac:
3: B6 Body message is missing

Commit 78d59c73d6:
3: B6 Body message is missing

I'm not fixing that.

@jaraco jaraco merged commit 2b3b3eb into cherrypy:main May 20, 2023
56 of 57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an option for port re-using
3 participants