------
Guide to Advanced HTTP Wagon Configuration
------
John Casey
------
2011-12-12
------
~~ Licensed to the Apache Software Foundation (ASF) under one
~~ or more contributor license agreements. See the NOTICE file
~~ distributed with this work for additional information
~~ regarding copyright ownership. The ASF licenses this file
~~ to you under the Apache License, Version 2.0 (the
~~ "License"); you may not use this file except in compliance
~~ with the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing,
~~ software distributed under the License is distributed on an
~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~~ KIND, either express or implied. See the License for the
~~ specific language governing permissions and limitations
~~ under the License.
~~ NOTE: For help with the syntax of this file, see:
~~ http://maven.apache.org/doxia/references/apt-format.html
Advanced Configuration of the HttpClient HTTP Wagon
%{toc}
*Notice on Maven Versioning and Availability
**Maven 2.2.0
Starting in <<Maven 2.2.0>>, the HttpClient wagon was the implementation in use. The remainder of this document
deals specifically with the differences between the HttpClient- and Sun-based HTTP wagons.
**Maven 2.2.1
Due to several critical issues introduced by the HttpClient-based HTTP wagon, <<Maven 2.2.1>> reverted back to
the Sun implementation (a.k.a. 'lightweight') of the HTTP wagon as the default for HTTP/HTTPS transfers.
The issues with the HttpClient-based wagon were mainly related to checksums, transfer timeouts, and NTLM proxies,
and served as the primary cause for the release of <<2.2.1>> in the first place.
<<However>>, starting in <<Maven 2.2.1>> you have a choice: you can use the default wagon implementation for a given
protocol, or you can select an alternative wagon <<<provider>>> you'd like to use on a per-protocol basis.
For more information, see the {{{./guide-wagon-providers.html}Guide to Wagon Providers}} \[3\].
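As a rough illustration of a per-server choice, the sketch below selects the HttpClient-based provider for one server in <<<settings.xml>>>. The server id <<<my-server>>> is a placeholder, and the <<<\<wagonProvider\>>>> element and the <<<httpclient>>> provider name are assumed here to follow the Guide to Wagon Providers \[3\]; verify them against that guide before relying on this.
+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id; must match the repository id used in the POM -->
      <id>my-server</id>
      <configuration>
        <!-- assumption: provider name as documented in the wagon providers guide -->
        <wagonProvider>httpclient</wagonProvider>
      </configuration>
    </server>
  </servers>
</settings>
+---+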
**Maven 3.0.4
With 3.0.4, the default HTTP(S) wagon is the HttpClient-based wagon, built on {{{http://hc.apache.org/httpcomponents-client-ga}Apache Http Client 4.1.2}}.
It now pools HTTP connections, so Maven no longer reopens a connection to the remote server for each request.
The pool is configurable through several parameters \[4\].
This new default wagon comes with the following default configuration:
* http(s) connection pool: defaults to 20.
* readTimeout: defaults to 1800000ms (~30 minutes) (see section <<<Read time out>>> below)
* Preemptive authentication is used by default only with PUT (GET no longer uses preemptive authentication by default)
*Introduction
Using the HttpClient-based HTTP wagon, you have a lot more control over the
configuration used to access HTTP-based Maven repositories. For starters, you have fine-grained control over what HTTP headers
are used when resolving artifacts. In addition, you can also configure a wide range of parameters to control the behavior
of HttpClient itself. Best of all, you have the ability to control these headers and parameters for all requests, or individual
request types (Maven issues GET, HEAD, and PUT requests for different parts of the artifact-management subsystem).
*The Basics
Without any special configuration, Maven's HTTP wagon will use some default HTTP headers and client parameters when managing
artifacts. The default headers are:
+---+
Cache-control: no-cache
Cache-store: no-store
Pragma: no-cache
Expires: 0
Accept-Encoding: gzip
+---+
In addition, PUT requests made with the HTTP wagon will use the following HttpClient parameter:
+---+
http.protocol.expect-continue=true
+---+
From the HttpClient documentation\[2\], this parameter provides the following functionality:
-----
Activates 'Expect: 100-Continue' handshake for the entity enclosing methods.
The 'Expect: 100-Continue' handshake allows a client that is sending a request
message with a request body to determine if the origin server is willing to
accept the request (based on the request headers) before the client sends the
request body.
The use of the 'Expect: 100-continue' handshake can result in noticeable performance
improvement for entity enclosing requests (such as POST and PUT) that require
the target server's authentication.
'Expect: 100-continue' handshake should be used with caution, as it may cause
problems with HTTP servers and proxies that do not support HTTP/1.1 protocol.
-----
Without this setting, PUT requests that require authentication will transfer their entire payload to the server before that server
issues an authentication challenge. In order to complete the PUT request, the client must then re-send the payload with the proper
credentials specified in the HTTP headers. This results in twice the bandwidth usage, and twice the time to transfer each artifact.
Another option to avoid this double transfer is what's known as preemptive authentication, which involves sending the authentication
headers along with the original PUT request. However, there are a few potential issues with this approach. For one thing, in the event
you have an unused <<<\<server\>>>> entry that specifies an invalid username/password combination, some servers may respond with
a <<<401 Unauthorized>>> even if the server doesn't actually require any authentication for the request. In addition, blindly sending
authentication credentials with every request regardless of whether the server has made a challenge can result in a security hole,
since the server may not make provisions to secure credentials for paths that don't require authentication.
We'll discuss preemptive authentication in another example, below.
*Configuring GET, HEAD, PUT, or All of the Above
In all of the examples below, it's important to understand that we can configure the HTTP settings for all requests made to a given
server, or for only one method. To configure all methods for a server, you'd use the following section of the <<<settings.xml>>> file:
+---+
<settings>
  [...]
  <servers>
    <server>
      <id>the-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            [ Your configuration here. ]
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
On the other hand, if you can live with the default configuration for most requests - say, HEAD and GET requests, which are used to
check for the existence of a file and retrieve a file respectively - maybe you only need to configure the PUT method:
+---+
<settings>
  [...]
  <servers>
    <server>
      <id>the-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            [ Your configuration here. ]
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
For clarity, the other two sections are <<<\<get\>>>> for GET requests, and <<<\<head\>>>> for HEAD requests. I know that's going to
be hard to remember...
*Taking Control of Your HTTP Headers
As you may have noticed above, the default HTTP headers do have the potential to cause problems. For instance, some websites
set the encoding for downloading GZipped files as <<<gzip>>>, in spite of the fact that the HTTP request itself isn't being
sent using GZip compression. If the client is using the <<<Accept-Encoding: gzip>>> header, this can result in the client itself
decompressing the GZipped file <during the transfer> and writing the decompressed file to the local disk with the original filename.
This can be misleading to say the least, and can use up an inordinate amount of disk space on the local computer.
To turn off this default behavior, we'll simply disable the default headers. Then, we'll need to respecify the other headers
that we are still interested in, like this:
+---+
<settings>
  [...]
  <servers>
    <server>
      <id>openssl</id>
      <configuration>
        <httpConfiguration>
          <put>
            <useDefaultHeaders>false</useDefaultHeaders>
            <headers>
              <header>
                <name>Cache-control</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Cache-store</name>
                <value>no-store</value>
              </header>
              <header>
                <name>Pragma</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Expires</name>
                <value>0</value>
              </header>
              <header>
                <name>Accept-Encoding</name>
                <value>*</value>
              </header>
            </headers>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
    [...]
  </servers>
  [...]
</settings>
+---+
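Since the decompression problem described above shows up on downloads, the same override may be more useful in a <<<\<get\>>>> section. The following is a minimal sketch under that assumption: the server id <<<my-server>>> is a placeholder, and only a subset of the default headers is re-specified to keep the example short.
+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <get>
            <!-- drop all default headers for GET requests only -->
            <useDefaultHeaders>false</useDefaultHeaders>
            <headers>
              <!-- re-add only the headers you still want -->
              <header>
                <name>Cache-control</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Pragma</name>
                <value>no-cache</value>
              </header>
              <header>
                <name>Accept-Encoding</name>
                <value>*</value>
              </header>
            </headers>
          </get>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+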
*Fine-Tuning HttpClient Parameters
Going beyond HTTP request headers, HttpClient provides a host of other configuration options. In most cases,
you won't need to customize these. But in case you do, Maven provides access to specify your own fine-grained configuration
for HttpClient. Again, you can specify these parameter customizations per-method (HEAD, GET, or PUT), or for all methods of
interacting with a given server. For a complete list of supported parameters, see the link\[2\] in the Resources section below.
**Non-String Parameter Values
Many of the configuration parameters for HttpClient have simple string values; however, there are important exceptions to
this. In some cases, you may need to specify boolean, integer, or long values. In others, you may even need to specify
a collection of string values. You can specify these using a simple formatting syntax, as follows:
[[1]] <<boolean:>> <<<%b,\<value\>>>>
[[2]] <<integer:>> <<<%i,\<value\>>>>
[[3]] <<long:>> <<<%l,\<value\>>>> (yes, that's an 'L', not a '1')
[[4]] <<double:>> <<<%d,\<value\>>>>
[[5]] <<collection of strings:>> <<<%c,\<value1\>,\<value2\>,\<value3\>,...>>>, which could also be specified as:
+---+
%c,
<value1>,
<value2>,
<value3>,
...
+---+
[]
As you may have noticed, this syntax is similar to the format-and-data strategy used by functions like <<<sprintf()>>>
in many languages. The syntax has been chosen with this similarity in mind, to make it a little more intuitive to use.
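To make the prefixes concrete, here is a hedged sketch that sets one integer and one long parameter for all methods. The parameter names <<<http.socket.timeout>>> and <<<http.connection-manager.timeout>>> come from the HttpClient 3.x preference guide \[2\]; whether a given wagon version honors them is an assumption to verify, and the server id and values are placeholders.
+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <params>
              <property>
                <!-- integer value, hence the %i prefix -->
                <name>http.socket.timeout</name>
                <value>%i,60000</value>
              </property>
              <property>
                <!-- long value, hence the %l prefix -->
                <name>http.connection-manager.timeout</name>
                <value>%l,30000</value>
              </property>
            </params>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+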
**Example: Using Preemptive Authentication
Using the above syntax, we can configure preemptive authentication for PUT requests using the boolean HttpClient parameter
<<<http.authentication.preemptive>>>, like this:
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <params>
              <property>
                <name>http.authentication.preemptive</name>
                <value>%b,true</value>
              </property>
            </params>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
Another option is to write it like this:
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <usePreemptive>true</usePreemptive>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
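The same boolean syntax can also switch off the <<<Expect: 100-Continue>>> handshake described in The Basics, which may help with servers or proxies that do not support HTTP/1.1. This is a sketch rather than a recommendation: the server id is a placeholder, and disabling the handshake brings back the double transfer for authenticated PUT requests unless preemptive authentication is also used.
+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <params>
              <property>
                <!-- overrides the wagon's PUT default of true -->
                <name>http.protocol.expect-continue</name>
                <value>%b,false</value>
              </property>
            </params>
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+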
**Ignoring Cookies
Like the example above, telling the HttpClient to ignore cookies for all methods of request is a simple matter of
configuring the <<<http.protocol.cookie-policy>>> parameter (it uses a regular string value, so no special syntax
is required):
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <params>
              <param>
                <name>http.protocol.cookie-policy</name>
                <value>ignore</value>
              </param>
            </params>
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
The configuration above can be useful in cases where the repository is using cookies - like the session cookies
that are often mistakenly turned on or left on in appservers - alongside HTTP redirection. In these cases, it
becomes far more likely that the cookie issued by the appserver will use a <<<Path>>> that is inconsistent with
the one used by the client to access the server. If you have this problem, and know that you don't need to use
this session cookie, you can ignore cookies from this server with the above configuration.
*Support for General-Wagon Configuration Standards
It should be noted that configuration options previously available in the HttpClient-driven HTTP wagon are still
supported in addition to this new, fine-grained approach. These include the configuration of HTTP headers
and connection timeouts. Let's examine each of these briefly:
**HTTP Headers
In all HTTP Wagon implementations, you can add your own HTTP headers like this:
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpHeaders>
          <httpHeader>
            <name>Foo</name>
            <value>Bar</value>
          </httpHeader>
        </httpHeaders>
      </configuration>
    </server>
  </servers>
</settings>
+---+
It's important to understand that the above approach doesn't allow you to turn off all of the default HTTP headers; nor
does it allow you to specify headers on a per-method basis. However, this configuration remains available in both the
lightweight and httpclient-based Wagon implementations.
**Connection Timeouts
All wagon implementations that extend the <<<AbstractWagon>>> class, including those for SCP, HTTP, FTP, and more,
allow the configuration of a connection timeout, to allow the user to tell Maven how long to wait before giving
up on a connection that has not responded. This option is preserved in the HttpClient-based wagon, but this wagon
also provides a fine-grained alternative configuration that can allow you to specify timeouts per-method for a
given server. The old configuration option - which is still supported - looks like this:
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <timeout>6000</timeout> <!-- milliseconds -->
      </configuration>
    </server>
  </servers>
</settings>
+---+
...while the new configuration option looks more like this:
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <connectionTimeout>10000</connectionTimeout> <!-- milliseconds -->
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
If all you need is a per-server timeout configuration, you still have the option to use the old <<<\<timeout\>>>>
parameter. If you need to separate timeout preferences according to HTTP method, use the <<<\<connectionTimeout\>>>>
form shown directly above.
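For instance, a configuration along the following lines would give downloads a shorter connection timeout than uploads. The server id and both values are arbitrary placeholders, not recommendations.
+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <get>
            <connectionTimeout>5000</connectionTimeout> <!-- milliseconds -->
          </get>
          <put>
            <connectionTimeout>20000</connectionTimeout> <!-- milliseconds -->
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+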
** Read time out
With Wagon 2.0 and Apache Maven 3.0.4, the read timeout defaults to 30 minutes.
If you want to change this value, you can add the following setup in your settings:
+---+
<settings>
  <servers>
    <server>
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <put>
            <readTimeout>120000</readTimeout> <!-- milliseconds -->
          </put>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+
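Both timeouts can presumably be combined in one section. The sketch below assumes that <<<\<connectionTimeout\>>>> and <<<\<readTimeout\>>>> may sit side by side under <<<\<all\>>>>, which the examples in this guide suggest but do not show; the server id and values are placeholders.
+---+
<settings>
  <servers>
    <server>
      <!-- placeholder id -->
      <id>my-server</id>
      <configuration>
        <httpConfiguration>
          <all>
            <!-- how long to wait for the connection to be established -->
            <connectionTimeout>10000</connectionTimeout> <!-- milliseconds -->
            <!-- how long to wait for data once connected -->
            <readTimeout>120000</readTimeout> <!-- milliseconds -->
          </all>
        </httpConfiguration>
      </configuration>
    </server>
  </servers>
</settings>
+---+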
*Resources
[[1]] {{{http://hc.apache.org/httpclient-3.x/}HttpClient website}}
[[2]] {{{http://hc.apache.org/httpclient-3.x/preference-api.html}HttpClient preference architecture and configuration guide}}
[[3]] {{{./guide-wagon-providers.html}Guide to Wagon Providers}}
[[4]] {{{/wagon/wagon-providers/wagon-http/}Wagon Http}}
[]