New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wsgi: add peer UNIX credential to a non-standard environment variable #37

Merged
merged 20 commits into from Apr 10, 2018

Conversation

Projects
None yet
2 participants
@bazsi
Contributor

bazsi commented Jun 25, 2017

Signed-off-by: Balazs Scheidler balazs.scheidler@balabit.com

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

It adds the UNIX UID of a unix domain socket that connects to our HTTP service over socket.

  • What is the related issue number (starting with #)

none

  • What is the current behavior? (You can also link to an open issue here)

peer credentials are not available to apps even if they bind to a unix domain socket

  • What is the new behavior (if this is a feature change)?

I've added a non-standard _REMOTE_UID value to environ, that is available to apps. I've originally did this patch to an older version of cherrypy and this is a forward port.

  • Other information:

@bazsi bazsi force-pushed the bazsi:add-peer-cred-to-environ branch from e7d4824 to 5bedde2 Jun 25, 2017

@bazsi

This comment has been minimized.

Contributor

bazsi commented Jun 25, 2017

I've attempted to fix automatic tests, but the ones in circle ci seems to be an infrastructure problem. The rest of the tests were green.

I'd be really grateful about the feature, especially the naming of the environment variable. I'd like to deploy something like this in production, and I'd like to use a name that will match the 'official' one.

thanks.

@bazsi

This comment has been minimized.

Contributor

bazsi commented Jun 27, 2017

Any comments?

@webknjaz

This comment has been minimized.

Member

webknjaz commented Jun 27, 2017

Oh, sorry @bazsi.. I was busy and didn't check notifications during past few days.

Nevermind about CircleCI, it's not configured yet.
Regarding your change, it's on my list to check it. I'll do this ASAP once I have some free time.

@@ -276,6 +278,11 @@ def get_environ(self):

return env

def _get_peer_uid(self):
creds = self.req.server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize('3i'))

This comment has been minimized.

@webknjaz

This comment has been minimized.

@bazsi

bazsi Jun 28, 2017

Contributor

Done in 831ec2d

@@ -276,6 +278,11 @@ def get_environ(self):

return env

def _get_peer_uid(self):

This comment has been minimized.

@webknjaz

webknjaz Jun 27, 2017

Member

It isn't obvious what this function purpose is, so it's a good practice to leave comments + docstring.

This comment has been minimized.

@bazsi

bazsi Jun 28, 2017

Contributor

also done in 831ec2d

I would be squashing that into the base commit as review is complete.

@@ -254,6 +255,7 @@ def get_environ(self):
# AF_UNIX. This isn't really allowed by WSGI, which doesn't
# address unix domain sockets. But it's better than nothing.
env['SERVER_PORT'] = ''
env['_REMOTE_UID'] = self._get_peer_uid()

This comment has been minimized.

@webknjaz

webknjaz Jun 27, 2017

Member

Why do you need this as an environment variable?

@@ -276,6 +278,11 @@ def get_environ(self):

return env

def _get_peer_uid(self):
creds = self.req.server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize('3i'))

This comment has been minimized.

@webknjaz

webknjaz Jun 27, 2017

Member

Can this be moved to the request or connection entity?

This comment has been minimized.

@bazsi

bazsi Jun 28, 2017

Contributor

I felt this was a lot more connected to REMOTE_ADDR and the likes, as I've felt it is a property of the connection rather than the request itself.

In our internal HTTP based service, we are using the environment to populate our internal data structures and by that time the connection itself is not available.

If you meant that _REMOTE_UID stays in the environment, but the implementation itself moves to HTTPConnection, that could make sense. Is that what you are thinking?

Thanks for the feedback.

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

Could you please share your snippet where you cannot access HTTPConnection and still can access env vars? I'm really not sure whether all other users need this as it adds extra syscall overhead.

@bazsi

This comment has been minimized.

Contributor

bazsi commented Jun 28, 2017

I have now pushed a couple of fixups to address your notes. If you are ok with them, I would rebase the branch and squash them to the same patch.

@@ -26,8 +26,9 @@ def my_crazy_app(environ, start_response):
"""

import sys

import struct

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

It's unused in this module

@@ -254,6 +255,9 @@ def get_environ(self):
# AF_UNIX. This isn't really allowed by WSGI, which doesn't
# address unix domain sockets. But it's better than nothing.
env['SERVER_PORT'] = ''
uid = self.req.conn.get_peer_uid()
if uid is not None:
env['_REMOTE_UID'] = str(uid)

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

Are there any other WSGI implementations providing similar env var?

This comment has been minimized.

@bazsi

bazsi Jul 1, 2017

Contributor

I've found a number of WSGI/HTTP servers/libs that support unix domain sockets:

With that said, the only HTTP over UNIX domain socket implementation, that I've found to also support SO_PEERCRED based authentication is cups, which has its embedded http implementation (in C, brr) and does not use CGI like environment variables.

I've also looked up rfc3875, the original specification of CGI, which seems to be the origin for these environment variables: http://www.ietf.org/rfc/rfc3875. In the RFC, there's a standard REMOTE_USER environment variable, which seems to be what we aim for, but would require username lookup instead of just passing on the UID. username resolution could be expensive, especially as it would need to be done on every request.

That same RFC, also specifies that non-standard extensions should be prefixed with an X_.

With all this said, I think:

  • unix domain based authentication makes sense, there is at least one implementation, and a use-case that matches mine.
  • environment wise, we have three options:
    1. use X_REMOTE_UID and pass the uid only
    2. resolve the uid to username (with caching?), and reuse the standard REMOTE_USER attribute
    3. disregard the CGI origin, and use something different.

I've also found this document about WSGI: http://wsgi.readthedocs.io/en/latest/specifications/simple_authentication.html. This says, that the username needs to go to REMOTE_USER, thus

My opinion and the above spec point to option number 2) above.

If you agree, I can change my patches this way.

This comment has been minimized.

@webknjaz

webknjaz Jul 2, 2017

Member
  1. This shouldn't be a default behavior to avoid overhead for other users who may not want such logic to be triggered, so I would prefer that the calling logic go to CherryPy tool, which would be easy to turn on or switch off easily via config option.
  2. It would be enough to provide @properties, which evaluate creads lookup lazily, once per request as I commented earlier.
  3. It would be ok to have some function/method populating all of REMOTE_USER, X_REMOTE_UID, X_REMOTE_GID, X_REMOTE_PID or just some of them as configured.

This comment has been minimized.

@bazsi

bazsi Jul 2, 2017

Contributor

hmm.. cherrypy does not really have access to HTTPConnection objects, so I don't see how that would go back to the socket to do SO_PEERCRED. This is how it looks like (obviously I don't have intimate knowledge of these parts, I just did a quick browse of the related source code):

  • cherrypy instantiates a cheroot.wsgi.Server instance (in ServerAdapter). wsgi.Server is derived from the "native" HTTP server, cheroot.server.HTTPServer
  • as a connection arrives, a HTTPConnection is instantiated to encapsulate the connection specific socket
  • we parse the incoming request, and a cheroot specific HTTPRequest is created, to encapsulate the low-level HTTP details. Here we still have a reference to the connection/server objects.
  • then a cheroot.wsgi.Gateway is created, one that takes the low-level HTTPRequest, and populates a WSGI environment, and passes it to the the WSGI application via a __call__ method.
  • The environment is the only input to the WSGI app, even headers are translated into and from environment name-value pairs. (with ugly translations like Accept-Language becoming HTTP_ACCEPT_LANGUAGE)

There seems to be an intention to decouple the HTTP specific pieces and the app side, and the means to exchange information is the environment itself. It is thus not possible to go back from the app to HTTPConnection & HTTPRequest where the information is available.

The Native interface is one abstraction short (e.g. environment is not created and HTTPRequest is directly translated to cherrypy request objects), but that is not the default how cherrypy operates. That interface does not support authentication (e.g. request.login is not populated at all).

I could inject the socket into the environment (with a key like "wsgi.socket" to match other attributes), but my opinion is that this would be ugly and would derail the entire idea to decouple apps from HTTP specifics.

Short of adding this ugly hack to environment, we need to do the authentication on the cheroot side and pass the information via environment variables.

Still, REMOTE_USER looks to be the best solution, as that would work out of the box with cherrypy (e.g. request.login is filled automatically).

I could make the username resolution stuff optional by adding an additional constructor parameter to cheroot.wsgi.Server that could be driven by a cherrypy config variable (e.g. server.socket_file_auth True/False).

I can also amortize the performance impact by caching the username lookup, because in most cases, it would only be a very limited number of users that would use the interface.

I can see these courses of action at this point if we are to use REMOTE_USER:

  1. additional parameter w/o caching
  2. additional parameter w/ caching
  3. no additional parameter w/ caching

OR we can use X_REMOTE_UID, GID and PID as you suggested, in which case username resolution is not needed at all, so performance impact should be small. But in that case cherrypy.request.login will not be set automatically.

What do you think?

import six
import socket

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

Unused as well

"""

if self.socket.family != socket.AF_UNIX:
return None

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

I would prefer raising exception rather than returning "magic values"

pid, uid, gid = struct.unpack('3i', creds)
return uid
except socket.error:
pass

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

Please, don't swallow exception. Log it and re-raise appropriate one.

This comment has been minimized.

@bazsi

bazsi Jul 1, 2017

Contributor

I actually wanted to ignore errors and in those cases not add this to the environment. It seems to be a Linux specific feature. I wouldn't want UNIX domain sockets not to work if this is not available.

perhaps a one-time warning would suffice?

This comment has been minimized.

@webknjaz

webknjaz Jul 2, 2017

Member

Alright, warning would be ok (mabe even NOTICE). But I still prefer raising exception instead of returning None.

@@ -1155,6 +1156,28 @@ def close(self):
# Apache does, but not today.
pass

def get_peer_uid(self):

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

I'd prefer a method returning the whole tuple of creds and 3 @property methods returning each of 'em.

# NOTE: the value for SO_PEERCRED can be architecture specific, in
# which case the getsockopt() will hopefully fail. The arch
# specific value could be derived from platform.processor()
SO_PEERCRED = getattr(socket, 'SO_PEERCRED', 17)

This comment has been minimized.

@webknjaz

webknjaz Jun 28, 2017

Member

I'd add this as a monkey-patch next to similar win-specific code @ line 80 to be consistent

@webknjaz

This comment has been minimized.

Member

webknjaz commented Jun 28, 2017

@bazsi it would be very helpful to have tests and documentation for this feature as well + fix linter errors failing in Travis CI

@bazsi

This comment has been minimized.

Contributor

bazsi commented Jul 1, 2017

I'll do another iteration and push the updated patches.

@bazsi

This comment has been minimized.

Contributor

bazsi commented Jul 15, 2017

I still have not progressed with this. coming back once I have a timeslice.

@webknjaz

This comment has been minimized.

Member

webknjaz commented Jul 15, 2017

No worries. I'll be waiting for updates from your side and maybe even push a few commits to your branch if I have some free time for this.

@stale stale bot added the stale label Sep 13, 2017

@stale

This comment has been minimized.

stale bot commented Sep 13, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@webknjaz webknjaz force-pushed the bazsi:add-peer-cred-to-environ branch from 776c139 to 528f3ab Sep 14, 2017

@stale stale bot removed the stale label Sep 14, 2017

@webknjaz

This comment has been minimized.

Member

webknjaz commented Sep 14, 2017

@bazsi I've rebased your branch on top of current master

@webknjaz

This comment has been minimized.

Member

webknjaz commented Sep 14, 2017

@bazsi I've added bits of refactoring, but it's not considered as complete yet.

I'd like to address:

@webknjaz

This comment has been minimized.

Member

webknjaz commented Sep 14, 2017

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Unnecessary pass

Comment posted by lgtm.com

@codecov

This comment has been minimized.

codecov bot commented Sep 14, 2017

Codecov Report

Merging #37 into master will decrease coverage by 0.44%.
The diff coverage is 41.58%.

@@            Coverage Diff             @@
##           master      #37      +/-   ##
==========================================
- Coverage   66.42%   65.97%   -0.45%     
==========================================
  Files          17       17              
  Lines        2847     2942      +95     
==========================================
+ Hits         1891     1941      +50     
- Misses        956     1001      +45
@stale

This comment has been minimized.

stale bot commented Nov 13, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 13, 2017

@webknjaz webknjaz force-pushed the bazsi:add-peer-cred-to-environ branch 2 times, most recently from 0bcde15 to f18d472 Apr 7, 2018

@webknjaz

This comment has been minimized.

Member

webknjaz commented Apr 8, 2018

@bazsi I think, PoC is ready.

Additionally to #37 (review) I'm thinking that we could also looking at REMOTE_IDENT (RFC 3875 section 4.1.10., RFC 1413)

I'm almost ready to merge this, but it would be great to see some tests around it.

bazsi and others added some commits Jun 25, 2017

wsgi: add peer UNIX credential to a non-standard environment variable
Signed-off-by: Balazs Scheidler <balazs.scheidler@balabit.com>
Drop unnecessary pass statement
Reported by lgtm.com

@webknjaz webknjaz force-pushed the bazsi:add-peer-cred-to-environ branch from 670c8b0 to 899d943 Apr 8, 2018

balabit-sync pushed a commit to balabit-deps/balabit-os-6-python3-cherrypy8 that referenced this pull request Apr 9, 2018

wsgiserver: add _REMOTE_UID to contain the UNIX uid of the peer
This patch enables SO_PEERCRED based authentication into the NNX REST
server, by adding the _REMOTE_UID to the environment, which then
can be used to base authentication on.

A similar patch was submitted to CherryPy upstream, here:

cherrypy/cheroot#37

Signed-off-by: Balazs Scheidler <balazs.scheidler@balabit.com>
@webknjaz

This comment has been minimized.

Member

webknjaz commented Apr 10, 2018

@bazsi
Our test client doesn't support making HTTP requests over UNIX socket, so I'll have to postpone covering this with tests and doing any other modifications in the code base withing this PR.
I'm going to merge and release it now and expect that we could polish/work it out with bugfix releases if necessary.
Feel free to send new PRs with adding tests/fixing test client to support unix sockets etc.
I'm going to wire up CherryPy config options to this feature a bit earlier.

webknjaz added some commits Apr 9, 2018

@webknjaz webknjaz force-pushed the bazsi:add-peer-cred-to-environ branch from 0ecd13e to b2f1921 Apr 10, 2018

@webknjaz webknjaz merged commit b2f1921 into cherrypy:master Apr 10, 2018

2 of 7 checks passed

ci/circleci: linux-build CircleCI is running your tests
Details
ci/circleci: macos-build CircleCI is running your tests
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
lgtm analysis: Python Running analyses for revisions
Details
WIP ready for review
Details
codeclimate Approved by Sviatoslav Sydorenko.
Details

webknjaz added a commit that referenced this pull request Apr 10, 2018

Add PEEDCRED lookup caps for HTTP conns
PR #37 by @bazsi:

Implement PEERCRED lookup over UNIX-socket HTTP connection.

  * Discover connected process' PID/UID/GID

  * Respect server switches: ``peercreds_enabled`` and
    ``peercreds_resolve_enabled``

  * ``get_peer_creds`` and ``resolve_peer_creds``  methods on connection

  * ``peer_pid``, ``peer_uid``, ``peer_gid``, ``peer_user`` and ``peer_group``
    properties on connection

  * ``X_REMOTE_PID``, ``X_REMOTE_UID``, ``X_REMOTE_GID``, ``X_REMOTE_USER``
    (``REMOTE_USER``) and ``X_REMOTE_GROUP`` WSGI environment variables when
    enabled and supported

  * Per-connection caching to reduce lookup cost
@webknjaz

This comment has been minimized.

Member

webknjaz commented Apr 10, 2018

New minor version v6.2.0 is being released: https://travis-ci.org/cherrypy/cheroot/builds/364381847 (ETA: 30 min)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment