Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Good article on x-forwarded-for parsing #29

Open
halostatue opened this issue Mar 19, 2022 · 10 comments
Open

Good article on x-forwarded-for parsing #29

halostatue opened this issue Mar 19, 2022 · 10 comments

Comments

@halostatue
Copy link

https://adam-p.ca/blog/2022/03/x-forwarded-for/

I’m going to be looking at this in substantially more depth, but this is probably worth looking at for this library as well to see if it’s worth adding references to the documentation.

@ajvondrak
Copy link
Owner

What a fantastic article! It speaks to my soul. 😭 I definitely want to link it in the docs, at least. Thanks so much for forwarding it! And thanks @adam-p for writing it. ❤️

It also gives me some food for thought. One thing the post calls out is

A default list of places to look for the client IP makes no sense

which is something I was feeling the last time I was doing significant work on this library:

In production, you likely want this to be a singleton - a list of only one
string. There are a couple reasons:
1. You usually can't rely on servers to preserve the relative ordering of
headers in the HTTP request. For example, the
[Cowboy](https://github.com/ninenines/cowboy/) server presently [uses
maps](https://github.com/elixir-plug/plug_cowboy/blob/f82f2ff982f04fb4faa3a12fd2b08a7cc56ebe15/lib/plug/cowboy/conn.ex#L125-L127)
to represent headers, which don't preserve key order. The order in which
we process IPs matters because we take that as the routing information for
the request. So if you have multiple competing headers, the routing might
be ambiguous, and you could get bad results.
2. It could also be a security issue. Say you're only expecting one header
like `X-Forwarded-For`, but configure multiple headers like
`["x-forwarded-for", "x-real-ip"]`. Then it'd be easy for a malicious user
to just set an extra `X-Real-Ip` header and interfere with the IP parsing
(again, due to the sensitive nature of header ordering).
We still allow multiple headers because:
1. Users can get up & running faster if the default configuration recognizes
all of the common headers.
2. You shouldn't be relying that heavily on IP addresses for security. Even a
single plain-text header has enough problems on its own that we can't
guarantee its results are accurate. For more details, see the
documentation for [the algorithm](algorithm.md).
3. It's more general. Networking setups are often very idiosyncratic, and we
want to give users the option to use multiple headers if that's what they
need.

There's definitely an upgrade path we could follow to get people to specify their headers explicitly. Plus, maybe replace the :headers option with :header, because multiple headers get weird. Strictly speaking, you could plug RemoteIp multiple times with different :header values, which has the benefit of at least setting a precedence order. 🤔 Will have to think on it.

They also opened the issue at sebest/xff#15 pointing out that their list of private IPs is incomplete, which is strictly true for remote_ip as well (the list originally came from the Rails implementation). Migrating that might be more of a pain, though.

@adam-p
Copy link

adam-p commented Apr 9, 2022

You're welcome! I wanted the post to be useful to people just like you, so I'm super happy you found it.

This is among the best libraries I've seen. I really liked your algorithm document.

You've already found the post, so I'll only add a couple of things...

Generic comma-separated headers like X-Forwarded-For, X-Real-Ip, and X-Client-Ip

Do you treat the latter two like a comma-separated list? (I think you do, but I have never tried to read Elixir.) They shouldn't be, but I guess there's no big downside. If the user has configured the lib to use one of those headers then they shouldn't be spoofable (a trusted reverse proxy set it) and treating it like a list-of-one is fine, and if the header is spoofable, then treating it like a list is the least of the user's problems. (It'd be nice to give an error or something to indicate to the user that something is wrong, but... again, if it's spoofable then it probably won't be spoofed with comma-separated values. Why would an attacker bother doing that?)

After writing the post I created a "reference implementation" in Go that you might be interested in: https://github.com/realclientip/realclientip-go I would really like to be able to point to implementations in other languages, so if yours ends up similar I'd be happy to add a stub realclientip-elixir repo to the organization that points to yours. (I mean, you'd also be welcome to move your repo under the org, but I doubt you will want to.)

@halostatue
Copy link
Author

@ajvondrak @adam-p I don’t have time to look this over now, but I’d be happy to provide an analysis of both codebases to see where they differ.

@halostatue
Copy link
Author

@ajvondrak, the short version is that remote_ip implements RightmostNonPrivateStrategy (more or less), but treats both the list headers and the single IP headers as equivalent.

What follows is mostly pulled from @adam-p's documentation and the implementation. I believe that it would be possible to do this with multiple plugs (for the use of single strategies). I have some thoughts about how this could be implemented, and it would depend on the ChainStrategy by default.

Before I get into my thoughts, let me cover the strategies in realclientip.

  • All strategies are configured for a single header, except ChainStrategy, which is configured with a list of strategies.

  • SingleIPHeaderStrategy derives an IP address from a single-IP header, such as X-Real-IP, CF-Connecting-IP, True-Client-IP, Fastly-Client-IP, X-Azure-ClientIP, and X-Azure-SocketIP.

    • This strategy should be used when the given header is added by a trusted reverse proxy.

    • You must ensure that this header is not spoofable (as is possible with Akamai's use of True-Client-IP, Fastly's default use of Fastly-Client-IP, and Azure's X-Azure-ClientIP). See the single-IP wiki page for more info.

    • SingleIPHeaderStrategy grabs the last known value of the specified header from the headers list. This is a reasonable default as RFC 2616 does not allow multiple instances of single-IP headers (or any non-list header). It is debatable whether it is better to treat multiple such headers as an error (more correct) or simply pick one of them (more flexible). As we've already told the user tom make sure the header is not spoofable, we're going to use the last header instance if there are multiple. (Using the last is arbitrary, but in theory it should be the newest value.)

  • LeftmostNonPrivateStrategy derives the client IP from the leftmost valid and non-private IP address in the X-Forwarded-For or Forwarded header.

    • This strategy should be used when a valid, non-private IP closest to the client is desired. Note that this MUST NOT BE USED FOR SECURITY PURPOSES. This IP can be TRIVIALLY SPOOFED.

    • The provided header name must be X-Forwarded-For or Forwarded.

    • It combines all good IP addresses from all instances of the header into a single list and then loops through them in top-down, left-right order, returning the first non-private IP address.

  • RightmostNonPrivateStrategy derives the client IP from the rightmost valid, non-private/non-internal IP address in the X-Fowarded-For or Forwarded header.

    • This strategy should be used when all reverse proxies between the internet and the server have private-space IP addresses.

    • The provided header name must be X-Forwarded-For or Forwarded.

    • It combines all good IP addresses from all instances of the header into a single list and then loops through them in bottom-up, right-left order, returning the first non-private IP address.

  • RightmostTrustedCountStrategy derives the client IP from the valid IP address added by the first trusted reverse proxy to the X-Forwarded-For or Forwarded header.

    • This Strategy should be used when there is a fixed number of trusted reverse proxies that are appending IP addresses to the header.

    • The provided header name must be X-Forwarded-For or Forwarded.

    • The count must be a positive integer.

    • It combines all good IP addresses from all instances of the header into a single list. The entry returned is (len(addresses) - 1) - (count - 1), or an empty value if there are fewer IPs than required or if the proxy didn't add a valid IP address.

    • In Elixir terms, we would actually grab at(reverse(addressess), count) (it may be (count + 1)).

  • RightmostTrustedRangeStrategy derives the client IP from the rightmost valid IP address in the X-Forwarded-For or Forwarded header which is not in a set of trusted IP ranges.

    • This strategy should be used when the IP ranges of the reverse proxies between the internet and the server are known.

    • If a third-party WAF, CDN, etc., is used, you SHOULD use a method of verifying its access to your origin that is stronger than checking its IP address (e.g., using authenticated pulls). Failure to do so can result in scenarios like:

      • You use AWS CloudFront in front of a server you host elsewhere. An attacker creates a CF distribution that points at your origin server. The attacker uses Lambda@Edge to spoof the Host and X-Forwarded-For headers. Now your "trusted" reverse proxy is no longer trustworthy.
    • The provided header name must be X-Forwarded-For or Forwarded.

    • The trusted ranges must contain all trusted reverse proxies on the path to this server. The ranges can be private/internal or external (for example, if a third-party reverse proxy is used).

    • It combines all good IP addresses from all instances of the header into a single list and then loops through them in bottom-up, right-left order. Trusted IPs in that order are skipped and the first entry is returned.

  • ChainStrategy attempts to use the given strategies in order. If the first one returns an empty string, the second one is tried, and so on, until a good IP is found or the strategies are exhausted.

    • A common use for this is if a server is both directly connected to the internet and expecting a header to check.

Here’s my thoughts on how this could work in Elixir:

  • The strategies could be implemented similar to how they are implemented now, but would be deferred to strategy implementation modules (possibly using a @behaviour as is done for the parsers).

  • The implementations would in be something like RemoteIp.Strategies.SingleIPHeaderStrategy, RemoteIp.Strategies.RightmostNonPrivateStrategy, etc.

  • I would not implement RemoteIp.Strategies.ChainStrategy, but instead would implement the plug as the wrapper to the ChainStrategy. The configuration could be one of:

    • A strategy-configuration tuple:

      plug RemoteIp, {:rightmost_non_private, header: "x-forwarded-for"}
    • A keyword list where the only key is header (implies
      :rightmost_non_private):

      plug RemoteIp, header: "x-forwarded-for"
    • A list of strategy-configuration tuples:

      [
        {:single_ip_header, header: "x-real-ip"},
        {:rightmost_trusted_count, header: "forwarded", count: 3},
        {:rightmost_non_private, header: "x-forwarded-for"},
      ]
    • The strategy configuration tuples could also be strategy modules (allowing custom extension with a behaviour):

      [
        {MyApp.MyStrategy, header: "forwarded"},
        {RemoteIp.Strategies.SingleIPHeaderStrategy, header: "x-real-ip"},
        {RemoteIp.Strategies.RightmostTrustedCountStrategy, header: "forwarded", count: 3},
        {RemoteIp.Strategies.RightmostNonPrivateStrategy, header: "x-forwarded-for"},
      ]
  • The implementation should probably be optimized to avoid parsing the wanted headers multiple times.

This is extremely doable.

@adam-p
Copy link

adam-p commented Apr 11, 2022

Sorry for the tangent, but when you get a moment could you explain this to me:

You usually can't rely on servers to preserve the relative ordering of headers in the HTTP request. For example, the Cowboy server presently uses maps to represent headers, which don't preserve key order. The order in which we process IPs matters because we take that as the routing information for the request. So if you have multiple competing headers, the routing might be ambiguous, and you could get bad results.

The order of multiple XFF or Forwarded headers obviously matters (and surely doesn't ever get reordered, right?), but how does the order of other headers matter?

(An answer I can see being possible is: Instead of looking through a list of headers names in order of preference, you go through the actual headers and take the first that matches something in the preference list. Is that happening?)

@halostatue
Copy link
Author

That would need to be something that would be raised with the makers of Cowboy or other HTTP servers, unfortunately. It appears that Erlang maps are sorted by key, meaning that they are stable, but unordered.

1> maps:put(a, 2, #{z => 1}).
#{a => 2,z => 1}

Keyword lists would be better (those are tuple lists), but would end up being a fairly major API change for Cowboy &c. But that’s only if the order of the headers matters. According to Cowboy documentation, RFC7230 3.2.2 indicates that the order of headers with different names does not matter.

So, I would bet that Cowboy (but I can’t find the code offhand):

  • parses the headers in order received;
  • puts new headers in the map; and
  • appends multiple instances of the same header to the header list.

It might have some optimizations for headers that are only supposed to appear once (e.g., Host) such that either the first value or the last value is used or that an exception is raised, but I’m not sure.

@ajvondrak
Copy link
Owner

(An answer I can see being possible is: Instead of looking through a list of headers names in order of preference, you go through the actual headers and take the first that matches something in the preference list. Is that happening?)

@adam-p Sort of. It's true that the :headers configuration is not in order of preference. Rather, every header it names is considered.

Elixir's Plug.Conn struct represents req_headers as an ordered list of tuples. So let's say it looks something like:

[
  {"accept", "*/*"},
  {"x-forwarded-for", "1.1.1.1, 2.2.2.2"},
  {"host", "localhost:4000"},
  {"x-client-ip", "3.3.3.3"},
  {"user-agent", "curl/7.77.0"},
]

We take this list and select just the headers whose names appear in the :headers configuration. The order of the req_headers is preserved irrespective of the order of the :headers configuration. So you could have

plug RemoteIp, headers: ["x-forwarded-for", "x-client-ip"]

or equivalently

plug RemoteIp, headers: ["x-client-ip", "x-forwarded-for"]

Either way, we'd whittle the above req_headers down to

[
  {"x-forwarded-for", "1.1.1.1, 2.2.2.2"},
  {"x-client-ip", "3.3.3.3"},
]

Each header is parsed for a list of IPs, then those lists are concatenated together, still respecting their relative ordering. Thus:

[
  {1, 1, 1, 1},
  {2, 2, 2, 2},
  {3, 3, 3, 3},
]

So, this list is taken to represent a network path like:

  1. Client 1.1.1.1 sends request to Proxy A
  2. Proxy A with IP 2.2.2.2 sets X-Forwarded-For: 1.1.1.1 and forwards to Proxy B
  3. Proxy B with IP 3.3.3.3 sets X-Forwarded-For: 2.2.2.2 and forwards to Proxy C
  4. Proxy C (whose IP isn't in the headers) sets X-Client-IP: 3.3.3.3 and forwards to Server
  5. Server receives request with the req_headers from above

However, as I was trying to convey in the docs, there are several things "off" about this. Really, it's based on the faulty premise that the order of Elixir's req_headers is sacred. As @halostatue illuminates:

  • Cowboy (the server most everyone is using) actually parses the headers into an Erlang map (which doesn't guarantee key order) before passing them on to Elixir's Plug, which converts the map into the list we see.
  • Moreover, the relative ordering of different headers doesn't even matter on the protocol level! 😅 I hadn't even realized this before, so thanks for pointing that out, @halostatue.

Then there's also the matter of how multiple headers with the same name get collapsed into comma-separated values. Suppose we had

  1. Proxy A sets X-Forwarded-For: 1.1.1.1
  2. Proxy B sets X-Client-IP: 2.2.2.2
  3. Proxy C adds X-Forwarded-For: 3.3.3.3, which gets concatenated with the existing header

This could come in to the req_headers as

[
  {"x-forwarded-for", "1.1.1.1, 3.3.3.3"},
  {"x-client-ip", "2.2.2.2"},
]

or

[
  {"x-client-ip", "2.2.2.2"},
  {"x-forwarded-for", "1.1.1.1, 3.3.3.3"},
]

Neither order gives us a list of parsed IPs that goes 1.1.1.1 -> 2.2.2.2 -> 3.3.3.3. I guess if multiple headers are getting mixed, it's up to the intermediate proxies to be pretty smart about how they're handled. 😕

Ultimately, if we can't rely on the relative ordering of separate headers, the :headers plural option doesn't make much sense as it is. Now that I think of it, I also recall being wary of treating it as a priority-ordered list because I didn't want the default to impose one header's precedence over another's. E.g., who am I to say if X-Real-IP should be preferred over X-Client-IP?

I guess users could set their own priority after a fashion by calling the plugin multiple times:

plug RemoteIp, headers: ["x-lo-priority"]
plug RemoteIp, headers: ["x-hi-priority"]
  • If neither header is set, neither call has an effect on the remote_ip
  • If only x-lo-priority is set, then the first call clobbers the remote_ip and the second call has no effect
  • If only x-hi-priority is set, then the first call has no effect and the second call clobbers remote_ip
  • If both headers are set, the first call clobbers the remote_ip with x-lo-priority, but the second call clobbers remote_ip all over again with x-hi-priority

Although interactions might get weird here when skipping proxy IPs. Especially if the two plugins are configured with different values for their :proxies + :clients. 🤔

@ajvondrak
Copy link
Owner

Do you treat the latter two like a comma-separated list? (I think you do, but I have never tried to read Elixir.) They shouldn't be, but I guess there's no big downside. If the user has configured the lib to use one of those headers then they shouldn't be spoofable (a trusted reverse proxy set it) and treating it like a list-of-one is fine, and if the header is spoofable, then treating it like a list is the least of the user's problems. (It'd be nice to give an error or something to indicate to the user that something is wrong, but... again, if it's spoofable then it probably won't be spoofed with comma-separated values. Why would an attacker bother doing that?)

@adam-p Yes, and I followed pretty much the same line of reasoning, lol. However, now that v1.0 has custom parsers, it'd be pretty easy to make one that expects just a single IP and use that as the default for x-client-ip and x-real-ip. Or even make ReomteIp.Parsers.Generic the single-IP parser and set up RemoteIp.Parsers.XForwardedFor as the comma-separated one (but that wouldn't be backwards compatible).

So many ideas flowing from this thread! 🤩

@ajvondrak
Copy link
Owner

@halostatue Thanks for the thorough breakdown in #29 (comment) 💜

It'd be interesting to offer all these different strategies. This library was always pretty opinionated about the algorithm, although everything else around it has become super configurable. So I guess making the algorithm configurable is kind of the next frontier, lol. Definitely going to be thinking on this. Want to get my thoughts organized enough to start spawning some follow-up tickets.

@adam-p
Copy link

adam-p commented Apr 13, 2022

(I revisited your algorithm document just now and I see that the answer to my question was there. Sorry!)

@ajvondrak IMO, putting the values from different headers into a single ordered list doesn't seem right. Even ignoring the random header ordering. That's just not what the different headers mean -- they're not meant to be a single order list altogether.

However...

If the user has configured a single header (which really is what they should do), then it's okay.

If you're using a strictly rightmost algorithm AND the user's reverse proxies are replacing any headers that are configured to be looked at AND XFF is being appended properly AND X-Real-IP (etc.) is being set properly AND there's only one reverse proxy, then it's okay.

For example:

X-Forwarded-For: 1.1.1.1, 2.2.2.2
X-Real-IP: 2.2.2.2

You get one of these two lists:

1.1.1.1, 2.2.2.2, 2.2.2.2
2.2.2.2, 1.1.1.1, 2.2.2.2

In either case, the rightmost is the same.

Now let's say the server is behind Cloudflare and one more internal reverse proxy:

X-Forwarded-For: 1.1.1.1, 2.2.2.2, 103.21.244.1 (this is Cf), 10.0.0.1
Cf-Connecting-IP: 2.2.2.2
==>
1.1.1.1, 2.2.2.2, 103.21.244.1, 10.0.0.1, 2.2.2.2
2.2.2.2, 1.1.1.1, 2.2.2.2, 103.21.244.1, 10.0.0.1

Same result in either case: RightmostTrustedRangeStrategy

Different result between cases: Strict rightmost, LeftmostNonPrivateStrategy, RightmostTrustedCountStrategy, RightmostNonPrivateStrategy

(Of course, some of those aren't correct in this case no matter what. But the point is that they'll randomly give different answers.)

So... the current approach is okay for most cases, but a) it feels weird to me, and b) it falls apart if you start handling different algorithms/strategies (and maybe if the reverse proxy configuration gets complex).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants