Support HTTP Proxy #709

Closed
2 of 3 tasks
mikz opened this issue May 15, 2018 · 13 comments

@mikz
Contributor

mikz commented May 15, 2018

APIcast should support proxying outgoing HTTP(S) connections through an HTTP(S) proxy.

The underlying support is there: ledgetech/lua-resty-http#112

JIRA https://issues.jboss.org/browse/THREESCALE-221

Use Cases

Concerns

Using keepalives with an HTTPS proxy could require a small patch to the lua-resty-http library (ledgetech/lua-resty-http#112 (comment)).

APIcast might need to use lua-resty-http instead of the native upstream+balancer when using an HTTP proxy. Fortunately, this could be abstracted away by the http_ng library.

@mikz
Contributor Author

mikz commented Aug 22, 2018

@3scale/product this is now finished, but without Basic authentication. That requirement is not in JIRA. We probably won't have time to implement that in this release anyway. If we need support for Basic authentication, please open another ticket.

@albgus

albgus commented Nov 2, 2018

So this appears to make traffic to the API backend go through the proxy as well, and that seems weird.

Is there a way to use the proxy only for the Service Management API and Account Management API?

@mikz
Contributor Author

mikz commented Nov 2, 2018

@albgus you can use the NO_PROXY ENV variable to disable the proxy for some (internal/upstream) domains.

This is the standard way of configuring a proxy server. I don't know of any software that would deliberately use a proxy for just a subset of outgoing connections.

@albgus

albgus commented Nov 2, 2018

There is nothing standard about having a reverse proxy, which directs everything through another proxy, to finally reach the intended API backend.

Look in the THREESCALE-221 ticket, and specifically the sidecar proxy workaround. The issue needing solving is when the gateway is deployed into an internal network and needs to go through a corporate proxy to reach the internet and talk to the APIs of the 3scale SaaS. I would imagine that everyone in this situation wants the API Gateway to serve API Backends that are also located in the internal networks and don't need to go through the proxy.

@mikz
Contributor Author

mikz commented Nov 2, 2018

@albgus by "configuring a proxy server" I meant configuring any software to use a proxy server.

If you don't want internal connections to go through the proxy, just set NO_PROXY=*.internal. This is how your OS, browser and every other piece of software is configured.
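
As a rough illustration of how a client might apply such a whitelist, here is a minimal Python sketch. The function name `bypasses_proxy` and the exact matching rules (comma-separated entries, `*` matching everything, `*.internal`, `.internal` and `internal` all treated as a domain suffix) are assumptions made for the example; they are not taken from APIcast, and individual tools differ in which forms they accept.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry of a NO_PROXY-style list.

    Assumed semantics (hypothetical, not APIcast's): entries are comma
    separated, "*" matches everything, and "*.example", ".example" and
    "example" all match the domain itself and any of its subdomains.
    """
    host = host.lower().rstrip(".")
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        # Normalise "*.internal" and ".internal" to the bare suffix "internal".
        suffix = entry.lstrip("*").lstrip(".")
        if not suffix:
            continue
        if host == suffix or host.endswith("." + suffix):
            return True
    return False


if __name__ == "__main__":
    for name in ("backend.api.internal", "su1.3scale.net"):
        route = "direct" if bypasses_proxy(name, "*.internal") else "via proxy"
        print(f"{name}: {route}")
```

Only the hostname string is compared here; nothing is resolved through DNS.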

@magnusvage

magnusvage commented Nov 7, 2018

@mikz how would you suggest using that variable in a very complex network environment? Hundreds of lines in the autoconfig proxy script used by browsers in the network, with a plethora of rules depending on both your own subnet and the destination subnet. Could such a script be used instead of the NO_PROXY environment variable? (Without having to write the mother of all homegrown unmaintainable scripts to parse the browser config script into a NO_PROXY string)

@mikz
Contributor Author

mikz commented Nov 7, 2018

@magnusvage Unfortunately I don't have any suggestion. We support only the standard env variables, not the Proxy auto-config (PAC).
PAC is a JavaScript function that can contain arbitrary logic, so it can't really be translated into NO_PROXY in a generic way. I don't really see us wanting to support that.

In server components, it is pretty common to rely on a simple NO_PROXY and whitelist internal domains. The only software I know of that supports PAC is web browsers.

@albgus

albgus commented Nov 8, 2018

So, to further illustrate the point: I imagine that most companies where a proxy is required to reach the internet have a network structure that looks something like this, with the proxy used for reaching 3scale SaaS services.

     <Internet>
          ^                    <Public API>
          |                         |
+---------|---------------+---------|---------------------+
|         |               |DMZ      ⌄                     |
|    <Corp proxy> <------------<apicast>------*           |
|                         |         ^          |          |
|Corp Infra               |         |          |          |
+-------------------------+---------|--+-------|----------+
|Office                             |  |       |          |
|                       <PCs>-------*  |       ⌄          |
|                                      |   <API Backend>  |
|                                      |                  |
|                                      |Services zone     |
+--------------------------------------+------------------+

Now, with the current implementation, API Backend traffic also goes through the proxy, so this happens instead, and that causes problems.

        <Internet>
             ^                 <Public API>
             |                      |
+------------|------------+---------|---------------------+
|            |            |         ⌄                  DMZ|
|    <Corp proxy> <------------<apicast>                  |
|           |             |         ^                     |
|Corp Infra |             |         |                     |
+-----------|-------------+---------|--+------------------+
|Office     |                       |  |                  |
|           |           <PCs>-------*  |                  |
|           *------------------------>X|   <API Backend>  |
|                                      |                  |
|                                      |Services zone     |
+--------------------------------------+------------------+

@mikz
Contributor Author

mikz commented Nov 8, 2018

@albgus So how would you make curl work from the DMZ? It is configured through the same env variables.

You can just give your API Backend services a proper DNS name such as *.api.internal and set NO_PROXY=*.api.internal. Then APIcast will route that traffic directly, not through the proxy.
This is how UNIX-based utilities are configured.

Even Operating Systems are configured the same way. You enable the proxy for everything and then whitelist what is not going through the proxy.

[Screenshot: 2018-11-08 at 09:51:00]

@magnusvage

If NO_PROXY would accept CIDR notation and not only hostnames, that would be a great leap forward (Yes, of course a proper DNS structure should be set up, but reality isn't always perfect).

@mikz
Contributor Author

mikz commented Nov 8, 2018

@magnusvage that is a valid requirement and something we should be able to do. But, then you'd have to set up your services as IP addresses, not hostnames. If that is ok we could open an issue for CIDR support in the NO_PROXY.
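
To make the CIDR idea concrete, here is a minimal Python sketch of what such matching could look like when the upstream is configured as an IP address. The function name `ip_bypasses_proxy` and the mixed list format (CIDR ranges next to hostname entries) are assumptions for illustration; this is not an existing APIcast feature.

```python
import ipaddress


def ip_bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` is an IP literal inside a NO_PROXY CIDR entry.

    Hypothetical helper: hostname entries in the list are ignored here,
    and hostnames passed as `host` never match.
    """
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, so CIDR rules cannot apply

    for entry in no_proxy.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            # strict=False accepts entries such as "10.1.2.3/8" as well.
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # a hostname entry such as "*.internal"
        if address in network:
            return True
    return False


if __name__ == "__main__":
    rules = "10.0.0.0/8,*.internal"
    for host in ("10.20.30.40", "192.168.1.1", "backend.internal"):
        print(host, ip_bypasses_proxy(host, rules))
```

As noted above, this only helps when the service is addressed by IP; a hostname is never resolved in this sketch.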

@magnusvage

magnusvage commented Nov 8, 2018

Couldn't the client check both the hostname and the resolved IP against NO_PROXY? If either matches, then go direct, otherwise proxy. Similar to your screenshot, NO_PROXY should support both types of entries. In an organisation with a rapidly changing PAC, my feeling is that most of the changes would not affect these parts of the network. If a new subdomain or subnet needs to be accessed directly, I would think it is reasonable to have to edit NO_PROXY. With a little forethought, the overhead should be minimal.
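
A minimal Python sketch of that two-step check, assuming a NO_PROXY list that mixes hostname suffixes and CIDR ranges. The names `should_bypass` and `default_resolver` are hypothetical, the resolver is injectable so the logic can be exercised without real DNS, and none of this reflects how APIcast or curl actually behave.

```python
import ipaddress
import socket
from typing import Callable, List


def default_resolver(host: str) -> List[str]:
    """Resolve `host` to IP address strings with the system resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def should_bypass(host: str, no_proxy: str,
                  resolve: Callable[[str], List[str]] = default_resolver) -> bool:
    """Go direct if the hostname OR any resolved IP matches NO_PROXY."""
    networks, suffixes = [], []
    for entry in (e.strip().lower() for e in no_proxy.split(",")):
        if not entry:
            continue
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            # Not an IP or CIDR range: treat it as a domain suffix.
            suffixes.append(entry.lstrip("*").lstrip("."))

    host = host.lower().rstrip(".")
    # Step 1: compare the hostname itself, without any DNS lookup.
    if any(s and (host == s or host.endswith("." + s)) for s in suffixes):
        return True
    if not networks:
        return False

    # Step 2: resolve and compare every address against the CIDR entries.
    try:
        addresses = resolve(host)
    except OSError:
        return False  # cannot resolve locally: fall back to the proxy
    for text in addresses:
        try:
            address = ipaddress.ip_address(text)
        except ValueError:
            continue
        if any(address in network for network in networks):
            return True
    return False


if __name__ == "__main__":
    # Fake DNS data so the example runs without network access.
    table = {"backend.corp.example": ["10.1.2.3"],
             "api.public.example": ["203.0.113.7"]}
    for name in table:
        print(name, should_bypass(name, "10.0.0.0/8", resolve=table.__getitem__))
```

One trade-off of this design is that the client must attempt a DNS lookup for every host that fails the hostname check, including external names that may only be resolvable by the proxy itself; the sketch treats a failed lookup as "use the proxy".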

@mikz
Contributor Author

mikz commented Nov 8, 2018

@magnusvage I believe the setting on the screenshot applies only to IP addresses. So NO_PROXY should support both types, but it does not do resolving before applying the whitelist.

Try it for yourself:

NO_PROXY=127.0.0.1 ALL_PROXY=example.com:80 curl -v http://localhost:8080
NO_PROXY=127.0.0.1 ALL_PROXY=example.com:80 curl -v http://127.0.0.1:8080

The first request connects to the proxy even though localhost resolves to 127.0.0.1; the second one goes direct. NO_PROXY is applied to the hostname as written, not to the resolved IP address.
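
The same hostname-only matching can be observed outside curl. As an illustration only, and not a statement about APIcast, the sketch below uses `proxy_bypass_environment` from Python's `urllib.request`, a helper that exists in CPython's standard library but is not prominently documented, so its details may vary between versions. The `proxies` dictionary stands in for the environment, with the `"no"` key playing the role of NO_PROXY.

```python
from urllib.request import proxy_bypass_environment

# Equivalent of NO_PROXY=127.0.0.1, passed explicitly instead of via os.environ.
proxies = {"no": "127.0.0.1"}

# The helper compares the host string against the list; it does no DNS lookup.
results = {
    host: bool(proxy_bypass_environment(host, proxies))
    for host in ("localhost:8080", "127.0.0.1:8080")
}

for host, bypass in results.items():
    print(f"{host}: {'direct' if bypass else 'via proxy'}")
```

Here `localhost:8080` is not treated as a match for `127.0.0.1`, while the literal address is, which mirrors the two curl invocations.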

I don't see a big deal in adding DNS entries for your internal API services under one internal domain, for example *.dmz.internal. Then you know those are accessible from the DMZ. It is just another kind of whitelist. DNS is a many-to-many relationship, so your services can have multiple DNS names.
