-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional use of header to get user IP address #6973
Optional use of header to get user IP address #6973
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qqmyers would you like me to provide a review for this? I see some potential for optimization and have opinions 😉
Sure. That would be helpful. FWIW @landreev and I are looking into issues of multiple values/multiple headers right now. |
Great. It's quite late already over here, so please let me add my review tomorrow (while you are asleep 😄). |
We absolutely want to hear your feedback. |
@poikilotherm - based on a side discussion, I've updated the code to handle cases where there may be multiple headers and/or headers with several IP addresses. Not sure how that affects your existing comments but hopefully you can still review. We believe that the ordering is always such that the last value is closest to Dataverse. This code should cover the single proxy/load balancer case and enable IP group support with, for example, a campus proxy and load balancer, although in the latter case, MDC gets the campus proxy IP address (since that's what would be needed for IP groups.). If you think the logic is confused or the code isn't clear, we can hop into chat, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a very high level of looking at this and keeping in mind its security relevance, I would not recommend implementing these checks on such a deep level of the application.
From my point of view, it would make much more sense to implement this as a HTTPFilter
class.
- Can be done without changes to
web.xml
by using the@WebFilter
annotation. - Should catch calls to the REST API, too.
- Can replace the remote address in the HttpServletRequest with the client IP instead of the proxy
- Can be configured once on startup to avoid the repeated validation issue
I'm also missing tests for this crucial part of the code.
|
||
private static String headerToUse = System.getProperty("dataverse.useripaddresssourceheader"); | ||
|
||
private static final HashSet<String> ALLOWED_HEADERS = new HashSet<String>(Arrays.asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing support for RFC 7239 based headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. No reason those shouldn't be included except that they have more structure and it would take more time to understand them/test, etc.
int index = ip.indexOf(','); | ||
if (index >= 0) { | ||
ip = ip.substring(index + 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is attempting to support the comma separated list used in the X-Forwarded-For header.
What I don't understand: the indexOf
will return an index to first appearance of ,
in the list. Then this is used to get the part beyond that as the IP.
Let's have an example:
10.0.0.1, 123.13.123.1, 134.94.231.0
- The code will result in a substring " 123.13.123.1, 134.94.231.0", which is no valid IP.
- It just extracted the IP of the proxies between the service and the client. Please review the format:
X-Forwarded-For: client, proxy1, proxy2
- This doesn't handle whitespace, too.
I don't think this has been intended here, does it?
Maybe using commons.lang3.StringUtils.substringBefore()
is the easiest way forward, null safe etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Straight up bug - lastIndexOf() was what I intended.
String remoteAddressFromRequest = aHttpServletRequest.getRemoteAddr(); | ||
if (remoteAddressFromRequest != null) { | ||
remoteAddressStr = remoteAddressFromRequest; | ||
try { | ||
address = IpAddress.valueOf(remoteAddressStr); | ||
} catch (IllegalArgumentException iae) { | ||
address = IpAddress.valueOf(saneDefault); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very verbose. Wouldn't it make sense to refactor the code of IpAddress.valueOf()
to be null safe and use a default value as second parameter? That would render this into a oneliner, much easier to read and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable.
private final User user; | ||
private final IpAddress sourceAddress; | ||
|
||
private final static String undefined = "0.0.0.0"; | ||
private final static String localhost = "127.0.0.1"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this variable is not used anymore?
public DataverseRequest(User aUser, HttpServletRequest aHttpServletRequest) { | ||
this.user = aUser; | ||
|
||
final String undefined = "0.0.0.0"; | ||
String saneDefault = undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to refactor this to use a class variable saneDefault
directly then?
String saneDefault = undefined; | ||
String remoteAddressStr = saneDefault; | ||
|
||
IpAddress address = null; | ||
|
||
if (aHttpServletRequest != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no handling of when the request IS null
, which would be a major failure. Instead, the object is created anyway. Either remove the if
or throw an exception in case it's null
. (Why would that ever happen?) You might think of using @NotNull
annotation for the parameter to document that you assume this will be never null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are places in the code where null is sent. I'd have to look again to find specific examples but I think they are places where there's internal/async processing and possibly for tests. Refactoring could probably be done to avoid that.
// One can have multiple instances of a given header. They SHOULD be in order. | ||
Enumeration<String> ipEnumeration = aHttpServletRequest.getHeaders(headerToUse); | ||
if (ipEnumeration.hasMoreElements()) { | ||
// Always get the last header, which SHOULD be from the proxy closest to | ||
// Dataverse | ||
String ip = ipEnumeration.nextElement(); | ||
while (ipEnumeration.hasMoreElements()) { | ||
ip = ipEnumeration.nextElement(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a source this behavior is actually more than grey theory? I could imagine that you might have a situation where there are multiple different headers caused by different proxy implementations beyond each other (which is unsupported here). At least the most common de-facto standard X-Forwarded-For
is handling multiple proxies differently, see below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We spent time yesterday looking online for specs/practices and saw that multiple headers are allowed. It wasn't clear if they are in practice. Non-buggy software is supposed to have later proxies put their X-Forwarded-For headers after the initial ones so there is a global order even if there are multiple headers and some headers with multiple values. We debated whether we should try to support this or just flag it and fail, but the challenge is that, if you just use getHeader() you get the first header, which is not the one you'd want, so it looked like you still need to getHeaders() at some point.
logger.warning("Ignoring invalid IP address received in header " + headerToUse + " : " + ip); | ||
address = null; | ||
} | ||
if (address!= null && address.isLocalhost()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the use of .isLocalhost()
instead of checking for a String. 👍
if (ip != null) { | ||
// This conversion will throw an IllegalArgumentException if it can't be parsed. | ||
try { | ||
address = IpAddress.valueOf(ip); | ||
} catch (IllegalArgumentException iae) { | ||
logger.warning("Ignoring invalid IP address received in header " + headerToUse + " : " + ip); | ||
address = null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very verbose. Please see my comment below about refactoring IpAddress.valueOf()
becoming null safe (here: just return null
if given).
// Not allowed since it is hard to image why a localhost request would be | ||
// proxied and we want to protect | ||
// the internal admin apis that can be restricted to localhost access | ||
logger.warning("Ignoring localhost received as IP address in header " + headerToUse + " : " + ip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seriously? This is a malicious hacking attempt and all we do is log it? IMHO this request should be aborted with a 401.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attempt fails. That said, we couldn't think of a legitimate case for localhost to be sent. IP addresses that can't be parsed (the check before this) is another one where there's probably no reason to tolerate it as the code does now. Using a filter would definitely make it easier/more efficient to just refuse the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair, we don't know for the fact that this is a malicious hacking attempt. Could be a misconfigured proxy, or an honest API scripting mistake, or something of that nature. I will mention this particular warning in the guide, as something to look for.
I will also recommend that anyone behind a proxy adds %header.x-forwarded-for%
(or whatever header it is in their case) to the access log format line in their Payara configuration (something we've been doing on our servers). Then the admin should monitor the log; and specifically look for entries with the localhost address in them - something like 127.0.0.1, 111.222.333.444
- could potentially indicate someone trying to send it along maliciously. An entry like 111.222.333.444, 127.0.0.1
on the other hand would indicate that the proxy is doing something seriously wrong, possibly messing up the order of the proxies in the comma-separated list. In which case the practice of trusting the header as the source of the true ip address should be discontinued.
I will also remind everybody in that guide section, that allowing localhost connections to bypass authentication for admin calls is also optional and configurable. And that any Dataverse running behind a proxy that is interested in enabling this mechanism of extracting true IP addresses from headers may consider disabling it after weighing all the risks involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I guess the thing to emphasize here is that it's less likely to be a "malicious attempt", and much more likely a "something is very wrong in your configuration" situation. Normally, this just should not happen. Even if someone is trying to maliciously pretend to be coming from localhost, that address should never end up in the portion of the header where Dataverse would be inclined to use it. Seeing this in the log would indicate that either the header lookup option was activated on a Dataverse that is NOT behind a proxy; or the proxy is handling the headers incorrectly, etc.)
@poikilotherm - Thanks for the great comments. I'll make a few comments inline but, in terms of new code, would it make more sense for you to make a PR? If you're interested and able to do that, I'm fine with it. If not, I can work to make changes. |
W.r.t. being a filter - I agree that makes sense in terms of security. |
My main concern is that this breaks MDC geolocation anywhere using load balancers, which either means lost info, or manual work to pull IP addresses from other logs. That said, there's no emergency and we're just aiming to get this into the 5.x series (versus a 4.20.1), so I don't think a short delay is too bad. That said, I don't know how urgent the issue of IP group access is for anyone or if MDC geolocation is critical for anyone affected (using a proxy/LB). So - if others think that something has to be part of 5.0, I could make minor updates to this PR that could then be replaced when you can do a more thorough redesign job.) |
I should've said so earlier, but I would very much vote in favor of using this PR, with whatever minor cleanup you deem necessary. This implementation does the job; and it does not appear to expose us to any serious risk. Anyone who have reasons to believe the risk is unacceptable has an option of not using this header lookup feature. Or to disable the admin API auth override for localhost. If we have a PR with a more elegant solution - a filter - later on, we'll accept it too. (I want this on my own server badly enough that I will probably make a patch based on this PR or a similar solution without waiting for v5). So I'll proceed adding the security warning to the guide under the assumption that we are using this PR. |
I was up for hacking away on this in spare time. It's late, so just a first draft: #6977 |
Just updated to fix the bug identified (not getting the lastIndexOf(',') in the header string and did some minor cleanup per the comments. If we want this for 5.0 or for back-porting prior to #6977 , I think the code is ready (awaiting documentation). |
Thank you, much appreciated. |
Added a section discussing the security implications of using a proxy supplied header for tracking the addresses of incoming requests.
proxy header address lookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving the PR; anyone should feel free to review the extra documentation text that I have added; feel free to either suggest changes, or edit as needed)
@qqmyers There are some doc formatting issues: the first explanatory section is one big wall of text with no breaks. Leonid thinks the line breaks need to be formatted differently. The second involves the jvm-option section. It would be good to provide an example command line and the actual argument option list is malformatted -the first option is normally formatted but has a couple extra quotes before the name and then the remaining options are formatted as a quoted pasted section of text, indented with quoted bar. |
yeah, my blurb is a mess. will address. |
Functionality checks out, blocked when not behind lb, header address added to mdc log |
@kcondon |
(this one should finally make the github online previewer work on this file!)
formatting fixes for config.rst
@qqmyers did merging my diff overwrite the asadmin example command you had added earlier? ( |
6970-release-note
What this PR does / why we need it: Allows optional specification of an HTTP header as the source of user ip addresses. This is needed to get valid ip addresses for MDC and IP based access control when running Dataverse behind a proxy/load balancer
Which issue(s) this PR closes:
Closes #6970
Special notes for your reviewer: There are security implications if the ip can be spoofed. The header is explicitly not used/ignored if it reports a localhost (127.0.0.1) address
Suggestions on how to test this: This should work with/without a header set on a 'normal' test instance (since the header should be null, and should change the reported IP (e.g. as reported in the mdc logs) on a server behind an AWS load balancer when you use 'X-Forwarded-For' from the IP of the load balancer to the correct user IP address.
Does this PR introduce a user interface change? If mockups are available, please link/include them here: No
Is there a release notes update needed for this change?: Yes? Since it affects MDC reporting.
Additional documentation: