
\chapter{HTTP header manipulation}
\label{chap:header_manipulation}

In HTTP mode, it is possible to rewrite, add or delete some of the request and
response headers based on regular expressions. It is also possible to block a
request or a response if a particular header matches a regular expression,
which is enough to stop most elementary protocol attacks, and to protect
against information leaks from the internal network. There is one limitation,
however: since HAProxy's HTTP engine does not support keep-alive, only the headers
passed during the first request of a TCP session are seen. All subsequent
headers are treated as data and are not analyzed. Furthermore, HAProxy
never touches data contents; it stops analysis at the end of the headers.
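
As a minimal sketch of the leak protection mentioned above, the fragment below
uses two of the response keywords listed later in this chapter. The header
names and the content type are assumptions chosen for the example only.

\begin{verbatim}
    # hide the server software from clients
    rspidel  ^Server:.*

    # refuse any response carrying an MS-Word content type
    rspideny ^Content-type:\ .*/ms-word
\end{verbatim}

Both rules use the case-insensitive variants because header names may be
written in any case by the server.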

There is an exception though. If HAProxy encounters an "Informational Response"
(status code 1xx), it is able to process all rsp* rules which can allow, deny,
rewrite or delete a header, but it will refuse to add a header to any such
message as this is not HTTP-compliant. The reason for still processing headers
in such responses is to stop and/or fix any possible information leak which may
happen, for instance because a downstream device unconditionally
adds a header, or because a server name appears there. When such messages are seen,
normal processing still occurs on the next non-informational messages.

This section covers common usage of the following keywords, described in detail
in section~\ref{sec:keywords_reference}:
\begin{itemize}
\item[-] reqadd     <string>
\item[-] reqallow   <search>
\item[-] reqiallow  <search>
\item[-] reqdel     <search>
\item[-] reqidel    <search>
\item[-] reqdeny    <search>
\item[-] reqideny   <search>
\item[-] reqpass    <search>
\item[-] reqipass   <search>
\item[-] reqrep     <search> <replace>
\item[-] reqirep    <search> <replace>
\item[-] reqtarpit  <search>
\item[-] reqitarpit <search>
\item[-] rspadd     <string>
\item[-] rspdel     <search>
\item[-] rspidel    <search>
\item[-] rspdeny    <search>
\item[-] rspideny   <search>
\item[-] rsprep     <search> <replace>
\item[-] rspirep    <search> <replace>
\end{itemize}

With all these keywords, the same conventions are used. The <search> parameter
is a POSIX extended regular expression (regex) which supports grouping through
parentheses (without the backslash). Spaces and other delimiters must be
prefixed with a backslash ('\verb|\|') to avoid confusion with a field delimiter.
Other characters may be prefixed with a backslash to change their meaning:

\begin{tabular}{|l|l|}
\hline
\verb|\t|   & for a tab \\
\hline
\verb|\r|   & for a carriage return (CR) \\
\hline
\verb|\n|   & for a new line (LF) \\
\hline
\verb|\ |   & to mark a space and differentiate it from a delimiter \\
\hline
\verb|\#|   & to mark a sharp and differentiate it from a comment \\
\hline
\verb|\\|   & to use a backslash in a regex \\
\hline
\verb|\\\\| & to use a backslash in the text (doubled once for the regex, once for HAProxy) \\
\hline
\verb|\xXX| & to write the ASCII hex code XX as in the C language \\
\hline
\end{tabular}
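
\medskip
As an illustration of these escaping rules, here is a minimal sketch of two
search patterns. The header names and the \verb|.local| suffix are assumptions
made for the example. The space after the colon is written \verb|\ | so that
it is not taken for a field delimiter, and the dot is escaped so that the regex
matches a literal dot.

\begin{verbatim}
    # delete any X-Forwarded-For header sent by the client
    reqidel  ^X-Forwarded-For:.*

    # deny requests whose Host header contains ".local"
    reqideny ^Host:\ .*\.local
\end{verbatim}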

The <replace> parameter contains the string to be used to replace the largest
portion of text matching the regex. It can make use of the special characters
above, and can reference a substring which is delimited by parentheses in the
regex, by writing a backslash \chr{\bslash} immediately followed by one digit from 0 to
9 indicating the group position (0 designating the entire line). This practice
is familiar to users of the "sed" program.
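
The group references described above can be sketched as follows. This is only
an illustrative fragment: the \verb|/static/| path prefix, the address and the
host name are assumptions made for the example. Each regex ends with a group
that captures the remainder of the line, so that the rewritten line keeps it.

\begin{verbatim}
    # turn "GET /static/img.png HTTP/1.0" into "GET /img.png HTTP/1.0":
    # \1 holds the method, \2 holds everything after "/static/"
    reqrep  ^([^\ ]*)\ /static/(.*)   \1\ /\2

    # hide an internal address in redirects, keeping the path in \1
    rspirep ^Location:\ http://127\.0\.0\.1:8080(.*)  Location:\ http://www.example.com\1
\end{verbatim}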

The <string> parameter represents the string which will systematically be added
after the last header line. It can also use the special character sequences above.
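
A minimal sketch of header addition is given below. The header names and values
are assumptions chosen for the example; note the escaped space after each
colon.

\begin{verbatim}
    # append "X-Proto: SSL" after the last request header
    reqadd  X-Proto:\ SSL

    # append a Cache-Control header to every response
    rspadd  Cache-Control:\ no-cache
\end{verbatim}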

Notes related to these keywords:
\begin{itemize}
\item[-] these keywords are not always convenient for allowing or denying requests
    based on header contents. It is strongly recommended to use ACLs with the
    "block" keyword instead, resulting in far more flexible and manageable rules.

\item[-] lines are always considered as a whole. It is not possible to reference
    a header name only or a value only. This is important because of the way
    headers are written (notably the number of spaces after the colon).

\item[-] the first line is always considered as a header, which makes it possible to
    rewrite or filter HTTP request URIs or response codes, but in turn makes
    it harder to distinguish between headers and the request line. The regex prefix
    \verb|^[^\ \t]*[\ \t]| matches any HTTP method followed by a space, and the prefix
    \verb|^[^\ \t:]*:| matches any header name followed by a colon.

\item[-] for performance reasons, the number of characters added to a request or to
    a response is limited at build time to a value between 1 and 4 kB. This
    should normally be far more than enough for most uses. If it occasionally
    proves too small, it is possible to gain some space by removing some
    useless headers before adding new ones.

\item[-] keywords beginning with "reqi" and "rspi" are the same as their counterparts
    without the 'i' letter except that they ignore case when matching patterns.

\item[-] when a request passes through a frontend then a backend, all req* rules
    from the frontend will be evaluated, then all req* rules from the backend
    will be evaluated. The reverse path is applied to responses.

\item[-] req* statements are applied after "block" statements, so that "block" is
    always the first one, but before "use\_backend" in order to permit rewriting
    before switching.
\end{itemize}
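
Some of the notes above can be sketched in configuration form. The fragment
below is only an illustration: the section names \verb|www| and \verb|app|,
the server address and the \verb|X-Backend| header are hypothetical. It refuses
\verb|.local| hosts with an ACL and "block" rather than with "reqideny", then
relies on the frontend req* rule being evaluated before the backend one.

\begin{verbatim}
    frontend www
        bind :80
        mode http
        # ACL-based filtering, evaluated before any req* rule
        acl local_host hdr_end(host) -i .local
        block if local_host
        # frontend req* rules are evaluated first
        reqidel ^X-Forwarded-For:.*
        default_backend app

    backend app
        mode http
        # backend req* rules are evaluated after the frontend ones
        reqadd  X-Backend:\ app
        server  srv1 192.168.0.10:8080
\end{verbatim}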