Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex rewriting of upstream/downstream headers in proxy #2144

Merged
merged 4 commits into from Mar 6, 2019

Conversation

@comp500
Copy link
Contributor

commented Apr 28, 2018

1. What does this change do, exactly?

Adds an extra parameter to header_upstream and header_downstream that allows regex-based replacement of headers. They are in the format header_upstream [header] [regex] [replacement]

This allows unwanted values from the server and client (e.g. redirects, cookies) to be modified by Caddy. This therefore allows search (and mobile) to be fixed on wikipedia.matt.life, and wikimirror.

2. Please link to the relevant issues.

#442 - An existing (but not merged) implementation of Location rewriting
#606 - Nginx proxy_redirect feature request
This change may be obsoleted by #1639, the proxy middleware rewrite.

3. Which documentation changes (if any) need to be made because of this PR?

Add the extra parameter to header_upstream and header_downstream, and describe it. Give examples of it being used to change the Location header, possibly referencing proxy_redirect.

4. Checklist

  • I have written tests and verified that they fail without my change
  • I have squashed any insignificant commits
  • This change has comments for package types, values, functions, and non-obvious lines of code
  • I am willing to help maintain this change if there are issues with it later
@CLAassistant

This comment has been minimized.

Copy link

commented Apr 28, 2018

CLA assistant check
All committers have signed the CLA.

@tobya

This comment has been minimized.

Copy link
Collaborator

commented Apr 30, 2018

Hi @comp500
Thank you for the PR. Can you give a specific example of how you would write the header_upstream directive with your new functionality and what it would be used for?
Thanks.

@comp500

This comment has been minimized.

Copy link
Contributor Author

commented Apr 30, 2018

It could be used for modifying redirects for a mirrored site. For example, Wikipedia uses absolute redirects to en.wikipedia.org for the homepage and search.

The directive could be used like this:

wikipedia.yoursite.com

proxy / https://en.wikipedia.org {
    header_downstream Location en.wikipedia.org wikipedia.yoursite.com
}

This would ensure that all redirects are for the correct host, rather than just the main page as in Matt's tweet.
It works for any header sent to or from the proxied server, in the format header_upstream [header] [regex] [replacement] or header_downstream [header] [regex] [replacement].

@mholt

This comment has been minimized.

Copy link
Member

commented Jun 17, 2018

This is cool. Hopefully I or someone else gets a chance to review this soon!

@lukenowak

This comment has been minimized.

Copy link
Contributor

commented Aug 10, 2018

Are there any news regarding this pull request? Can I help somehow (eg. giving it a try in my environments and sharing the results)?

{"proxy / localhost:8080 {\n transparent \nheader_upstream X-Test Tester \nheader_upstream X-Test Test Host \n}"},

// Test #3: transparent preset with multiple params
{"proxy / localhost:8080 {\n transparent \nheader_upstream X-Test Tester \nheader_upstream X-Test Test Host \nheader_upstream X-Test er ing \n}"},

This comment has been minimized.

Copy link
@dyrkin

dyrkin Aug 16, 2018

@comp500
As I understand it, the first arg is a name of a header, the second is the regular expression of what we want to replace, and the last argument is the value for the replacement.
Will it work if the last argument contains white spaces?

This comment has been minimized.

Copy link
@comp500

comp500 Aug 16, 2018

Author Contributor

You are correct. It will work if any argument contains white spaces, because you can surround each argument in quotes, so the white spaces within are part of the argument.
e.g. header_downstream Location https://en.wikipedia.org "http://local host"

@tobya

This comment has been minimized.

Copy link
Collaborator

commented Aug 16, 2018

@lukenowak it would be great if you could test in your environment.,

@dyrkin

This comment has been minimized.

Copy link

commented Aug 16, 2018

UPD

I'm trying to run it locally, but I get an error:
2018/08/16 13:28:52 /etc/caddy/Caddyfile:39 - Error during parsing: unknown property '443'

This is what I did:

dyrkin@nas:~/go/src/github.com/comp500/caddy/caddy$ git branch
  master
* rewriteheader

dyrkin@nas:~/go/src/github.com/comp500/caddy/caddy$ go run build.go

dyrkin@nas:~/go/src/github.com/comp500/caddy/caddy$ ./caddy -log stdout -agree=true -conf=/etc/caddy/Caddyfile -root=/var/tmp
    2018/08/16 13:28:52 /etc/caddy/Caddyfile:39 - Error during parsing: unknown property '443'

My Caddyfile

webmin.example.com {
    import wildcard_cert

    proxy / http://localhost:10000/ {
        transparent
        header_downstream Location "10000" "443"
    }
}

Did I miss something?

Update
I figured out the root cause of the issue. To fix it I removed original github.com/mholt and renamed github.com/comp500 folder to github.com/mholt, after that I ran the build again.
Sorry, I'm new in go.

I verified this functionality and it works excellent.
Directive header_downstream Location http://webmin.example.com:10000/ https://webmin.example.com/ works for me.

@comp500

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2018

Your caddyfile seems to work fine for me. Try checking out the branch in the ~/go/src/github.com/mholt/caddy folder instead, as the build script might be using the wrong package.

@lukenowak

This comment has been minimized.

Copy link
Contributor

commented Aug 28, 2018

@tobya I was able to remove specific cookies from the browser sent to the upstream by using:

http://luke.example.com:11080 {
  bind 127.0.10.1
  proxy / http://127.0.10.1:8888 {
    header_upstream Cookie "(.*)(^Chocolate=[^;]*; |; Chocolate=[^;]*|^Chocolate=[^;]*$)(.*)" "$1 $3"
    header_upstream Cookie "(.*)(^Cacao=[^;]*; |; Cacao=[^;]*|^Cacao=[^;]*$)(.*)" "$1 $3"
  }
}

It is super cool, that one header can be rewritten many times.

So for me this feature is super useful and flexible.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Sep 3, 2018
Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144
Copy link
Member

left a comment

Thanks, this is a useful change and definitely useful. I've only made a brief pass, but it's looking mostly good so far.

Add the extra parameter to header_upstream and header_downstream, and describe it. Give examples of it being used to change the Location header, possibly referencing proxy_redirect.

Can you write up the specific docs for this? This is a complex feature and it needs to be documented correctly. For example, are placeholders (captures) supported? To what extent can people use Go's regular expressions here?

caddyhttp/proxy/upstream.go Outdated Show resolved Hide resolved
caddyhttp/proxy/upstream.go Outdated Show resolved Hide resolved
@comp500

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2018

I've written some docs in this gist, to be added to https://caddyserver.com/docs/proxy. If anything is unclear in it, please comment on the gist.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Sep 11, 2018
Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144
@LaurensBosscher

This comment has been minimized.

Copy link

commented Nov 2, 2018

@mholt , this PR would be very useful for us and seems almost complete.

Is there any way I can contribute to getting this merged?

Thanks!

@francislavoie

This comment has been minimized.

Copy link
Collaborator

commented Nov 2, 2018

@LaurensBosscher You could help by testing the changes. Check out the latest master and merge this PR locally, compiling from source. You can also review the code and point out any potentially questionable behaviour if any.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Dec 5, 2018
Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144

Caddy v0.11.1 fixes QUIC issues.

Zenexer fixes for regerssion in v0.11.1

Note: Since Caddy release v0.11.0 certificates has to match sites
      served by Caddy. This will result in lack of response for sites served
      on HTTPS for which certificate names (subject COMMON_NAME,
      subjectAltName DNS or IP) does not match site name.
NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Dec 6, 2018
Caddy v0.11.1 fixes QUIC issues.

Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144

Not released yet functionality for ca_certifices in proxy:
caddyserver/caddy#2380

Zenexer fixes for regerssion in v0.11.1

Note: Since Caddy release v0.11.0 certificates has to match sites
      served by Caddy. This will result in lack of response for sites served
      on HTTPS for which certificate names (subject COMMON_NAME,
      subjectAltName DNS or IP) does not match site name.
@whitestrake whitestrake requested a review from mholt Feb 26, 2019
@Ramshackles

This comment has been minimized.

Copy link

commented Feb 27, 2019

Very much interested in this PR 👍

@tobya

This comment has been minimized.

Copy link
Collaborator

commented Mar 4, 2019

@Ramshackles Any chance you would be able to compile locally and test this change?

@lukenowak

This comment has been minimized.

Copy link
Contributor

commented Mar 6, 2019

@comp500 are you willing to continue working on this? Rebase on top of master (trivial change)?

@tobya I tried to run the test suite after rebasing this work on master, but my go environment seems "damaged" (caddytls/config.go:410:31: undefined: tls.VersionTLS13 and FAIL github.com/mholt/caddy/caddytls [build failed] during go test ./...), for now I have no time to investigate it further.

@mholt

This comment has been minimized.

Copy link
Member

commented Mar 6, 2019

@lukenowak Based on that I bet you just need to update to Go 1.12 to fix the error.

@Ramshackles

This comment has been minimized.

Copy link

commented Mar 6, 2019

@Ramshackles Any chance you would be able to compile locally and test this change?

Not sure how I would go about doing it, but guess I could try if needed.

@comp500 comp500 force-pushed the comp500:rewriteheader branch from e33be14 to 80c5c15 Mar 6, 2019
@comp500

This comment has been minimized.

Copy link
Contributor Author

commented Mar 6, 2019

@lukenowak I am certainly willing to continue working on this. I've rebased my PR on top of master, although the AppVeyor build seems to still be using Go 1.11 for some reason.

@francislavoie

This comment has been minimized.

Copy link
Collaborator

commented Mar 6, 2019

Yeah, just ignore the AppVeyor CI failures for now. AppVeyor still doesn't support Go 1.12 appveyor/ci#2875

@mholt
mholt approved these changes Mar 6, 2019
Copy link
Member

left a comment

Thanks for the work. :) Let's give this a shot.

@mholt mholt merged commit 47b7871 into caddyserver:master Mar 6, 2019
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
@yannouchou

This comment has been minimized.

Copy link

commented Apr 2, 2019

Any chance to ship this one into a release soon? I found out that many back-ends behave badly and don't honor the X-Scheme or X-Forwarded-Proto headers and send 302 redirects to an http scheme even when the original request is https, and this is often blocked by the browser...
The only workaround is downstream Location header rewriting, so I'm really hoping to get this PR released...

@mholt

This comment has been minimized.

Copy link
Member

commented Apr 2, 2019

It'll go out in our next release, which should be 1.0beta1.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Apr 26, 2019
This reverts commit 6d2019b, as new caddy
has issues with tls certificate configuration:
caddyserver/caddy#2588

About nxd-v0.11.5-4-g9d3151db:

 * not released yet functionality for regular expression cookie rewriting
   is available: caddyserver/caddy#2144

 * not released yet functionality for ca_certifices in proxy:
   caddyserver/caddy#2380

 * support for builtin log rotation disabling
@CristianCantoro

This comment has been minimized.

Copy link

commented Sep 12, 2019

@comp500 said:

This allows unwanted values from the server and client (e.g. redirects, cookies) to be modified by Caddy. This therefore allows search (and mobile) to be fixed on wikipedia.matt.life, and wikimirror.

Oh my goodness, more than one year after I discover this that in fact fixes a problem that I reported just recently on Wikimedia Phabricator (T232213) and here (#1011).

This is so awesome! It's hearthwarming knowing that there is somebody out there fixing my bugs even without me knowing it!

I have already pushed it on vikiansiklopedi.org and wikiproxy (formerly wikimirror, because - well - it's a proxy ont a mirror)

@mholt

This comment has been minimized.

Copy link
Member

commented Sep 14, 2019

FWIW, I just got this implemented into Caddy 2: https://github.com/caddyserver/caddy/wiki/v2:-Documentation#httphandlersreverse_proxy

It reuses the existing http.handlers.headers module, which also has the same ability to mutate headers by adding, deleting, setting, or replacing substrings in header values. It's pretty powerful. Try it out and let me know what you think!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.