Add proxy authentication #1337

Merged
merged 6 commits into from Dec 18, 2018

mmmheeren
Contributor

Configuration from either Maven settings or system properties
Issues: #525, #1304

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


@mmmheeren
Contributor Author

I signed the CLA

@mmmheeren
Contributor Author

mmmheeren commented Dec 12, 2018 via email

Member

@chanseokoh chanseokoh left a comment

Thanks for putting this up! We have no real expertise in proxies, so we value your input. @GoogleContainerTools/java-tools

(So, not having expertise, if I am asking a question in my comments, it is not a rhetorical question.)

I'm not sure why the CLA bot cannot verify that you signed the CLA. Maybe try signing it again?

Lastly, please try to add some tests.

Integer.parseInt(System.getProperty("http.proxyPort"))),
new UsernamePasswordCredentials(
System.getProperty("http.proxyUser"), System.getProperty("http.proxyPassword")));
} else if (System.getProperty("https.proxyUser") != null) {
Member

If both http.proxyUser and https.proxyUser are given, should we set credentials for both? That is, should we remove the else?

Contributor Author

Originally I thought to have separate sets of properties for http and https. But I just learned that Maven active proxy applies to both http and https, so we need only a single set of credential properties. I will change the code to only use http.proxyUser and http.proxyPassword, and apply them on AuthScope.ANY.

Member

Sorry if I'm not getting this correct:

What you suggested seems to make sense when they were given through the Maven settings file. What if both http.proxyUser and https.proxyUser are defined via command line system properties instead of the Maven settings file? (A Gradle project is an example, but it's also a possible scenario in Maven.)

Member

And isn't setting the scope to the given host the correct way? I wonder if AuthScope.ANY may interfere with Jib setting auth for container registries.

Related, what's the end result of calling .setCredentials()? For example, does it end up with setting the Proxy-Authorization: Basic xxx? (Or, the Authorization: header?)

Member

Related, what's the end result of calling .setCredentials()? For example, does it end up with setting the Proxy-Authorization: Basic xxx? (Or, the Authorization: header?)

I found a way to test and check this locally: #1304 (comment)

So, to answer my question myself, it sets Proxy-Authorization indeed in the proxy case. (Haven't verified myself, but it may depend on the response code and header being WWW-Authenticate or Proxy-Authenticate.) And because it's not the plain Authorization and sent only to a proxy, I believe it won't interfere with target host authentication. We are good with .setCredentials() here.

And from my testing, it is OK to call .setCredentials() multiple times with different hosts (and their corresponding user name and password pairs). The Apache client maintains a map for multiple hosts. Therefore, I think the structure should be

// one for HTTP
if (System.getProperty("http.proxyUser") != null) {
}
// the other for HTTPS
if (System.getProperty("https.proxyUser") != null) {
}

instead of chaining the if statements with else.

In any case, the Javadoc of AuthScope.ANY says

Default scope matching any host, port, realm and authentication scheme. In the future versions of HttpClient the use of this parameter will be discontinued.

so we should not use AuthScope.ANY. Also, it doesn't make sense to send auth info to totally unrelated proxies for which no proxy settings are configured.

Contributor Author

OK, indeed, if an organization has proxy servers with different credential pairs for access over https and for access over http, then we would need two separate credential pairs. I guess in most cases, like in my organization, it would be the same credential pair. So it would be nice if you could also specify just a single credential pair regardless of the protocol. I mean: by default use http.proxyUser for both protocols, unless https.proxyUser is explicitly specified. This would also align with sending just a single credential pair (http.proxyUser) from the active proxy in the Maven settings.

Contributor Author

@mmmheeren mmmheeren Dec 13, 2018

Suggestion:

  private static void addProxyCredentials(HttpTransport transport) {
    DefaultHttpClient httpClient =
        (DefaultHttpClient) ((ApacheHttpTransport) transport).getHttpClient();

    boolean httpProxy = System.getProperty("http.proxyHost") != null;
    boolean httpCreds = System.getProperty("http.proxyUser") != null;
    boolean httpsProxy = System.getProperty("https.proxyHost") != null;
    boolean httpsCreds = System.getProperty("https.proxyUser") != null;
    if (httpProxy && httpCreds) {
      httpClient
              .getCredentialsProvider()
              .setCredentials(
                      new AuthScope(System.getProperty("http.proxyHost"), Integer.parseInt(System.getProperty("http.proxyPort"))),
                      new UsernamePasswordCredentials(
                              System.getProperty("http.proxyUser"), System.getProperty("http.proxyPassword")));
    }
    if (httpsProxy && (httpsCreds || httpCreds)) {
      httpClient
              .getCredentialsProvider()
              .setCredentials(
                      new AuthScope(System.getProperty("https.proxyHost"), Integer.parseInt(System.getProperty("https.proxyPort"))),
                      new UsernamePasswordCredentials(
                              httpsCreds ? System.getProperty("https.proxyUser") : System.getProperty("http.proxyUser"), 
                              httpsCreds ? System.getProperty("https.proxyPassword") : System.getProperty("http.proxyPassword")));
    }
  }

Contributor Author

Never mind my last remark on the Maven active proxy. The DefaultProxySelector that we currently rely on selects the proxy properties matching the request protocol. So indeed it's better to take into account multiple active proxies for both http and https requests.

Contributor Author

Please have a look at my last commit. It takes into account multiple Maven active proxies and registers proxy credentials per protocol.

@loosebazooka
Member

@mmmheeren I checked the CLA service and it does not appear to have your GitHub username associated with any signing. Can you check your status here: https://cla.developers.google.com/clas

And if you do not see yourself can you try signing it again: https://cla.developers.google.com/

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@chanseokoh
Member

chanseokoh commented Dec 13, 2018

Obviously, I've been confused about this proxy business. I've been doing some research and picked up some knowledge. I am probably still wrong in many aspects, but I'd like to share some things I think are worth pointing out (hopefully they are right):

  1. Normally a proxy server is over HTTP, not HTTPS. Clients usually talk to the proxy server over plain HTTP.
    • Don't be confused. This does not mean the proxy server cannot handle HTTPS requests to a target website. Most proxy servers nowadays can handle both HTTP/HTTPS requests to target websites. That is, a proxy server over plain HTTP can act as an HTTP and HTTPS proxy server.
    • However, it looks like it is possible to set up SSL on a proxy server, but I think this is a very sophisticated setup.
      • Related, I have not found a way for the JDK java.net.URL or the Apache HttpClient to establish a TLS connection to a proxy through http.proxyHost or https.proxyHost. They all initiate a plain HTTP connection to a proxy, even if I set up a mock proxy listening on port 443 (the default HTTPS port). There might exist some advanced API usage to make this happen, but by just using http(s).proxyHost with basic Java connection code, I couldn't.
  • For JDK java.net and Apache HttpClient, the http(s) in http(s).proxyHost refers to the protocol for target websites. proxyHost in both cases refers to a proxy host. Since a proxy server can normally handle both protocols, you will often set the same proxy server for http.proxyHost and https.proxyHost, but of course it's possible to have different proxy servers depending on HTTP or HTTPS requests to target websites.
  • Interestingly, JDK java.net does something smart with proxies: if it cannot connect to a proxy server given by http(s).proxyHost, it bypasses the proxy setting and tries to reach the target website directly. On the other hand, Apache HttpClient doesn't do this. If it cannot connect to http(s).proxyHost, it'll fail.
  • Some people seem to claim that the <protocol>http</protocol> and <protocol>https</protocol> in the Maven settings are for the protocol to use to connect to a proxy (here and here), but I doubt it. Actually, some comments in the linked answers counter the argument. I want to believe that <protocol> matches the semantics of http(s) in http(s).proxyHost, i.e., a protocol for target websites, not a protocol for proxy servers.
    • Assuming that the latter argument is correct, what we do in this PR (assigning http(s).proxyHost based on <protocol>http(s)) should be the right way.
  • Some people claim that you can have multiple active proxy servers defined in the Maven settings and they do work nicely: link and link. Interestingly, some people also claim that basically there can be only one active proxy in Maven, saying only the active proxy setting encountered first will be effective and the rest will be ignored. I guess both of the arguments might be right in some sense, if for example, Maven is smart enough to work like this: tries the first active proxy, and if it doesn't work, try the next one. But that's just my guess, and it's really unclear to me which one is right.
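
To make the point about http(s).proxyHost concrete, here is a minimal sketch that asks the JDK's default ProxySelector which proxy it would use for a given target URL. The class name, host names, and ports are made up for illustration, and the sketch only exercises the JDK selector; it says nothing about how Apache HttpClient or Maven behave.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

/** Sketch: ask the JDK's default ProxySelector which proxy it would use for a target URL. */
class ProxyByTargetScheme {

  /** Sets hypothetical proxy properties; the hosts and ports are illustrative only. */
  static void configureExampleProxies() {
    System.setProperty("http.proxyHost", "plain-proxy.example");
    System.setProperty("http.proxyPort", "8086");
    System.setProperty("https.proxyHost", "tls-target-proxy.example");
    System.setProperty("https.proxyPort", "8087");
  }

  /** Describes the first proxy the default selector returns for the given target URL. */
  static String describeProxyFor(String targetUrl) {
    List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(targetUrl));
    Proxy first = proxies.get(0);
    if (first.type() == Proxy.Type.DIRECT) {
      return "DIRECT";
    }
    InetSocketAddress address = (InetSocketAddress) first.address();
    return first.type() + " " + address.getHostString() + ":" + address.getPort();
  }

  public static void main(String[] args) {
    configureExampleProxies();
    // The scheme of the TARGET decides which pair of properties is consulted.
    System.out.println(describeProxyFor("http://registry.example/v2/"));
    System.out.println(describeProxyFor("https://registry.example/v2/"));
  }
}
```

Note that java.net.Proxy.Type only has DIRECT, HTTP, and SOCKS, so the proxy chosen for an https target is still described as an HTTP proxy: the prefix selects by target scheme and does not describe the connection to the proxy itself.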

I suspect the approach we'll really need here is to populate a ProxySelector — since the proxies can be per-host —

It doesn't seem possible to specify per-host proxies in Maven settings (or through http.proxyHost and https.proxyHost), so I think it doesn't really make much sense to add support for such advanced proxy configuration in Jib for now, if ever.

@mmmheeren
Contributor Author

changed my profile to hopefully fix the cla

@chanseokoh
Member

@mmmheeren our CLA system shows that you have signed the CLA. Try commenting "I signed it!" again to see if the CLA bot picks that up.

@briandealwis
Member

The Maven settings description says:

protocol, host, port: The protocol://host:port of the proxy, separated into discrete elements.

But the Configuring a proxy doc does suggest an expectation of a single proxy.

@mmmheeren
Contributor Author

I signed it!

@chanseokoh
Member

@mmmheeren looks like the CLA bot was correct that not everyone signed the CLA. In your initial commit (ec4c65e), I see that @heeremk also contributed the code. I guess you need to have @heeremk also sign the CLA and another comment "I signed it!" from them.

@loosebazooka
Member

@mmmheeren either that, or you can rewrite/squash the commit history and force-push the change to your fork?

@mmmheeren
Contributor Author

I confirm you can have multiple active proxies in Maven settings without problem. The way I use them here is to drive the DefaultProxySelector to select a proxy based on the target registry protocol. So you would need proxy settings like the ones below to be able to access both http and https registries during a jib build.

  <proxies>
    <proxy>
      <id>https</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>ps-bxl-usr.cec.eu.int</host>
      <port>8012</port>
      <username>${env.M2_USERNAME}</username>
      <password>${env.M2_PASSWORD}</password>
      <nonProxyHosts>*.eu.int</nonProxyHosts>
    </proxy>
    <proxy>
      <id>http</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>ps-bxl-usr.cec.eu.int</host>
      <port>8012</port>
      <username>${env.M2_USERNAME}</username>
      <password>${env.M2_PASSWORD}</password>
      <nonProxyHosts>*.eu.int</nonProxyHosts>
    </proxy>
  </proxies>

Maven itself uses these settings to select a proxy for accessing maven repositories. Their proxy selector / matching algorithm is probably similar, i.e. matching by target url protocol (and not excluded by nonProxyHosts).

In https://github.com/srcclrapache1/maven-aether/blob/master/aether-util/src/main/java/org/eclipse/aether/util/repository/DefaultProxySelector.java it seems they build a map with one proxy def per protocol.

@chanseokoh
Member

chanseokoh commented Dec 13, 2018

The Maven settings description says:

protocol, host, port: The protocol://host:port of the proxy, separated into discrete elements.

This made me really confused, so I did some digging. And I think the doc is not correct. For example, I had

  <proxies>
    <proxy>
      <id>myhttpsproxy</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>localhost</host>
      <port>443</port>
      <username>proxyuser</username>
      <password>somepassword</password>
    </proxy>
  </proxies>

where I deliberately specified the port as 443 (the default HTTPS port), which rules out the possibility that Maven may not talk to a proxy over HTTPS if the port is not 443, or that Maven may automatically fall back to HTTP if an HTTPS connection attempt fails.

Then, I listened on 443 with sudo nc -lp 443, deleted some project/build dependency from ~/.m2/repository, and ran ./mvnw -U -X package. Maven logs show that it attempted connecting to localhost:443 as expected:

[DEBUG] Using connector BasicRepositoryConnector with priority 0.0 for https://repo.maven.apache.org/maven2 via localhost:443 with username=proxyuser, password=***
Downloading from central: https://repo.maven.apache.org/maven2/com/google/cloud/tools/jib-maven-plugin/0.9.10/jib-maven-plugin-0.9.10.pom

And on the command line running nc, I see the clear request in HTTP:

$ sudo nc -lp 443
CONNECT repo.maven.apache.org:443 HTTP/1.1
Host: repo.maven.apache.org
User-Agent: Apache-HttpClient/4.5.5 (Java/1.8.0_161-google-v7)
Proxy-Authorization: Basic cHJveHl1c2VyOnNvbWVwYXNzd29yZA==

So, although I'm not exactly sure what the real effect of <protocol> in the Maven settings is (UPDATED: see the paragraph below), at least the doc is incorrect to say it is for the protocol://host:port of the proxy.
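
The Proxy-Authorization value captured by nc is the standard Basic scheme: the Base64 encoding of user:password. A minimal sketch using only java.util.Base64 reproduces the value from the proxyuser / somepassword pair in the settings above; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: build and decode the value of a Basic Proxy-Authorization header. */
class BasicProxyAuthorization {

  /** Returns "Basic " followed by Base64("user:password"). */
  static String headerValue(String user, String password) {
    byte[] pair = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(pair);
  }

  /** Recovers "user:password" from a Basic header value. */
  static String decodeUserAndPassword(String headerValue) {
    String encoded = headerValue.substring("Basic ".length());
    return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String value = headerValue("proxyuser", "somepassword");
    System.out.println("Proxy-Authorization: " + value);
    System.out.println(decodeUserAndPassword(value));
  }
}
```

Because Base64 is an encoding and not encryption, anyone who can read the plain HTTP CONNECT request can recover the password.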

And actually, <protocol> may not matter much when using a single proxy for both HTTP- and HTTPS-targeting requests. When I have a single proxy entry in the Maven settings, then even if I specify http (not https), I can see my nc proxy server gets an HTTPS-targeting request (e.g., CONNECT repo.maven.apache.org:443 HTTP/1.1). So it's enough to specify only one proxy server in the Maven settings in practice. (Although this might be Maven's desperate attempt to send an HTTPS-targeting request to a proxy server that may not be able to handle HTTPS-targeting requests.) If this is true, then it kind of implies that specifying http or https doesn't matter that much if you are specifying a single proxy. However, I do see that if I specify two different proxy servers (one for HTTP-targeting and the other for HTTPS-targeting) and the target request is HTTPS, Maven does pick the proxy with <protocol>https even if that proxy setting comes after <protocol>http.

@chanseokoh
Member

chanseokoh commented Dec 13, 2018

  • Some people claim that you can have multiple active proxy servers defined in the Maven settings and they do work nicely: link and link. Interestingly, some people also claim that basically there can be only one active proxy in Maven, saying only the active proxy setting encountered first will be effective and the rest will be ignored. I guess both of the arguments might be right in some sense, if for example, Maven is smart enough to work like this: tries the first active proxy, and if it doesn't work, try the next one. But that's just my guess, and it's really unclear to me which one is right.

I also did some testing, and I think Maven uses only the first active proxy setting (per <protocol>) encountered and ignores the rest. For example, I had

  <proxies>
    <proxy>
      <id>myproxy1</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>localhost</host>
      <port>8086</port>
    </proxy>
    <proxy>
      <!-- Some people claimed it is important to use a different id. -->
      <id>myproxy2</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>localhost</host>
      <port>8087</port>
    </proxy>
    <proxy>
      <id>myproxy3</id>
      <active>true</active>
      <protocol>http</protocol> <!-- just trying "http" as a possible back-up -->
      <host>localhost</host>
      <port>8088</port>
    </proxy>
  </proxies>

And to see if Maven works in a smart way and tries another proxy when the first one fails, I listened on port 8087 with nc -lp 8087 (for the second active proxy setting) and ran ./mvnw -U -X package as before. It turns out that it only tries port 8086, never 8087.

Then, if I have

  <proxies>
    <proxy>
      <id>myproxy1</id>
      <active>true</active>
      <protocol>http</protocol> <!-- note this is http -->
      <host>localhost</host>
      <port>8086</port>
    </proxy>
    <proxy>
      <id>myproxy2</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>localhost</host>
      <port>8087</port>
    </proxy>
  </proxies>

then even though myproxy1 is the first active proxy, Maven will use the second proxy (myproxy2) for HTTPS-targeting requests. This confirms that <protocol> is for the target website protocol, and you can effectively have only one active proxy per <protocol>. For example, you can have two active proxies, one for HTTP-targeting and the other for HTTPS-targeting.
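
The selection rule observed in these experiments (the first active entry per <protocol> wins) can be sketched as follows. This only models the observed behavior; it is not Maven's actual implementation, and ProxyEntry is a hypothetical stand-in for a <proxy> element of settings.xml.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: keep only the first active proxy entry for each protocol, in file order. */
class FirstActiveProxyPerProtocol {

  /** Hypothetical stand-in for one <proxy> element of settings.xml. */
  static final class ProxyEntry {
    final String id;
    final boolean active;
    final String protocol;
    final String host;
    final int port;

    ProxyEntry(String id, boolean active, String protocol, String host, int port) {
      this.id = id;
      this.active = active;
      this.protocol = protocol;
      this.host = host;
      this.port = port;
    }
  }

  /** Returns a protocol-to-entry map; later active entries for the same protocol are ignored. */
  static Map<String, ProxyEntry> select(List<ProxyEntry> entriesInFileOrder) {
    Map<String, ProxyEntry> selected = new LinkedHashMap<>();
    for (ProxyEntry entry : entriesInFileOrder) {
      if (entry.active) {
        selected.putIfAbsent(entry.protocol, entry);
      }
    }
    return selected;
  }

  /** The three-proxy configuration from the first experiment above. */
  static List<ProxyEntry> firstExperiment() {
    return Arrays.asList(
        new ProxyEntry("myproxy1", true, "https", "localhost", 8086),
        new ProxyEntry("myproxy2", true, "https", "localhost", 8087),
        new ProxyEntry("myproxy3", true, "http", "localhost", 8088));
  }

  public static void main(String[] args) {
    for (Map.Entry<String, ProxyEntry> e : select(firstExperiment()).entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().id + " on port " + e.getValue().port);
    }
  }
}
```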

@chanseokoh
Member

chanseokoh commented Dec 13, 2018

I confirm you can have multiple active proxies in Maven settings without problem.

@mmmheeren I think it makes sense when you define multiple active proxies with different protocols like in your current settings. However, from my testing #1337 (comment), having multiple active proxies with the same protocol doesn't seem to work. (The first proxy setting always wins.)

Anyways, to summarize my conclusion:

  1. Find the first <protocol>https in the Maven settings, and if found,
    • set the values to https.proxyHost, etc.
    • call .setCredentials() with the correct scope.
  2. Find the first <protocol>http, and if found,
    • set the values to http.proxyHost, etc.
    • call .setCredentials() with the correct scope.

1 & 2 are independent and they can happen simultaneously.

boolean httpCreds = System.getProperty("http.proxyUser") != null;
boolean httpsProxy = System.getProperty("https.proxyHost") != null;
boolean httpsCreds = System.getProperty("https.proxyUser") != null;
if (httpProxy && httpCreds) {
Member

Per #1337 (comment), I think the logic should be

    if (httpProxy && httpCreds) {
      httpClient.getCredentialsProvider().setCredentials(
          new AuthScope(http.proxyHost, http.proxyPort),
          new UsernamePasswordCredentials(http.proxyUser, http.proxyPassword));
    }
    if (httpsProxy && httpsCreds) {
      httpClient.getCredentialsProvider().setCredentials(
          new AuthScope(https.proxyHost, https.proxyPort),
          new UsernamePasswordCredentials(https.proxyUser, https.proxyPassword));
    }

http.proxyHost and https.proxyHost are independent, and I don't think it makes sense to set http.proxyUser for https.proxyHost. Agreed?
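
A stdlib-only sketch of the independent, per-protocol logic proposed above. It does not call Apache HttpClient: a map from host:port to user:password stands in for the AuthScope-to-credentials registration. The class name, the empty-password fallback, and the default ports (80 for http and 443 for https, the documented JDK defaults for http.proxyPort and https.proxyPort) are assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Sketch: derive proxy credentials per protocol, never borrowing http.* values for https.*. */
class PerProtocolProxyCredentials {

  /** Returns a map from "host:port" scope to "user:password", one entry per configured protocol. */
  static Map<String, String> collect(Properties properties) {
    Map<String, String> credentialsByScope = new LinkedHashMap<>();
    register(credentialsByScope, properties, "http", "80");
    register(credentialsByScope, properties, "https", "443");
    return credentialsByScope;
  }

  private static void register(
      Map<String, String> credentialsByScope,
      Properties properties,
      String protocol,
      String defaultPort) {
    String host = properties.getProperty(protocol + ".proxyHost");
    String user = properties.getProperty(protocol + ".proxyUser");
    if (host == null || user == null) {
      return; // no proxy, or a proxy without auth: nothing to register for this protocol
    }
    String port = properties.getProperty(protocol + ".proxyPort", defaultPort);
    // Assumption of this sketch: a missing password is treated as an empty one.
    String password = properties.getProperty(protocol + ".proxyPassword", "");
    // If both protocols share the same host:port, the later (https) entry replaces the earlier one.
    credentialsByScope.put(host + ":" + port, user + ":" + password);
  }

  public static void main(String[] args) {
    Properties properties = new Properties();
    properties.setProperty("http.proxyHost", "proxy.example");
    properties.setProperty("http.proxyPort", "8012");
    properties.setProperty("http.proxyUser", "alice");
    properties.setProperty("http.proxyPassword", "secret");
    properties.setProperty("https.proxyHost", "secure-proxy.example"); // no https.proxyUser
    System.out.println(collect(properties));
  }
}
```

With these inputs only the http proxy gets credentials; the https proxy has no https.proxyUser, so nothing is registered for it.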

* @param name proxy system property name
* @param value proxy system property value
*/
private static void propagateProxyProperty(String name, String value) {
Member

Use @Nullable for value: ...(String name, @Nullable String value) {

* @param value proxy system property value
*/
private static void propagateProxyProperty(String name, String value) {
if (value != null && System.getProperty(name) == null) {
Member

I think the proxy update should be all-or-nothing. For example, someone can define https.proxyHost from the command line but leave https.proxyPort undefined, in which case the default port will be chosen. It is also possible to leave https.proxyUser undefined if the proxy doesn't require auth. So, I think the logic should basically be

if (only when none of http.x properties are defined) {
  Proxy firstActiveHttpProxy = <first active HTTP-targeting proxy from settings.xml>;
  System.setProperty(http.proxyHost, firstActiveHttpProxy.getHost());
  System.setProperty(http.proxyPort, firstActiveHttpProxy.getPort());
  ...
}
if (only when none of https.x properties are defined) {
  Proxy firstActiveHttpsProxy = <first active HTTPS-targeting proxy from settings.xml>;
  System.setProperty(https.proxyHost, firstActiveHttpsProxy.getHost());
  System.setProperty(https.proxyPort, firstActiveHttpsProxy.getPort());
  ...
}
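
A runnable sketch of the all-or-nothing rule in the pseudocode above, written against a plain java.util.Properties so it can be tried without touching the real system properties. The class name and the exact set of suffixes checked (proxyHost, proxyPort, proxyUser, proxyPassword) are assumptions of this sketch; the pseudocode leaves the full list open.

```java
import java.util.Properties;

/** Sketch: copy a settings.xml proxy into properties only if none are set for that protocol. */
class AllOrNothingProxyPropagation {

  // Assumed set of per-protocol properties; the review comment does not enumerate them all.
  private static final String[] SUFFIXES = {"proxyHost", "proxyPort", "proxyUser", "proxyPassword"};

  /** Returns true if the values were propagated, false if the user already set something. */
  static boolean propagate(
      Properties target, String protocol, String host, int port, String user, String password) {
    for (String suffix : SUFFIXES) {
      if (target.getProperty(protocol + "." + suffix) != null) {
        return false; // respect the user's partial configuration and change nothing
      }
    }
    target.setProperty(protocol + ".proxyHost", host);
    target.setProperty(protocol + ".proxyPort", Integer.toString(port));
    if (user != null) {
      target.setProperty(protocol + ".proxyUser", user);
    }
    if (password != null) {
      target.setProperty(protocol + ".proxyPassword", password);
    }
    return true;
  }

  public static void main(String[] args) {
    Properties target = new Properties();
    target.setProperty("https.proxyHost", "cli-proxy.example"); // set on the command line
    System.out.println(propagate(target, "https", "settings-proxy.example", 8012, "alice", "secret"));
    System.out.println(propagate(target, "http", "settings-proxy.example", 8012, null, null));
    System.out.println(target);
  }
}
```

The point of returning early is that a user who defines only https.proxyHost keeps the default port and no credentials, instead of getting a mix of command-line and settings.xml values.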

Contributor Author

Ok, I will refactor with approach of first protocol maven proxy wins, unless one of protocol system properties were already set.
FYI, I am thinking of moving to custom proxy selector approach where eligible maven proxies are added to proxy map inside custom RegistryProxySelector. This RegistryProxySelector can then be used by Connection / HttpTransport, to select proxy matching registry url, and register proxy credentials on transport where needed.

Member

Ok, I will refactor with approach of first protocol maven proxy wins, unless one of protocol system properties were already set.

Sounds good.

This RegistryProxySelector can then be used by Connection / HttpTransport, to select proxy matching registry url

I honestly think you don't have to do this. I don't see how you could configure per-host (or per-container-registry) proxies through settings.xml. Change the current code a little bit, and I think we're good to go.

Contributor Author

No it would still be max 1 proxy per protocol. But I thought it could be a bit more elegant way to transfer proxy config from maven plugin into core. If you tell me not to bother I will only do the small refactor. In any case I probably will not be able to finish/test before Monday.

Member

Yeah, don't be bothered. In this case, I think we can keep things simple.

Contributor Author

Ok I will commit today but test finally on Monday

Configuration from either Maven settings or system properties
@googlebot

CLAs look good, thanks!

Member

@chanseokoh chanseokoh left a comment

Overall, this is in the right direction. Thanks for fixing the issue!

@GoogleContainerTools/java-tools there are several things to fix, but I think I can take it over after merging this.

@chanseokoh
Member

@GoogleContainerTools/java-tools filed a follow-up PR #1366. If you have concerns, add comments to #1366.
