-
Notifications
You must be signed in to change notification settings - Fork 43
HTTP
cheyiliu edited this page Feb 17, 2016
·
7 revisions
Most HTTP communication is initiated by a user agent and consists of
a request to be applied to a resource on some origin server. In the
simplest case, this may be accomplished via a single connection (v)
between the user agent (UA) and the origin server (O).
request chain ------------------------>
UA -------------------v------------------- O
<----------------------- response chain
A more complicated situation occurs when one or more intermediaries
are present in the request/response chain. There are three common
forms of intermediary: proxy, gateway, and tunnel. A proxy is a
forwarding agent, receiving requests for a URI in its absolute form,
rewriting all or part of the message, and forwarding the reformatted
request toward the server identified by the URI. A gateway is a
receiving agent, acting as a layer above some other server(s) and, if
necessary, translating the requests to the underlying server's
protocol. A tunnel acts as a relay point between two connections
without changing the messages; tunnels are used when the
communication needs to pass through an intermediary (such as a
firewall) even when the intermediary cannot understand the contents
of the messages.
request chain -------------------------------------->
UA -----v----- A -----v----- B -----v----- C -----v----- O
<------------------------------------- response chain
The figure above shows three intermediaries (A, B, and C) between the
user agent and origin server. A request or response message that
travels the whole chain will pass through four separate connections.
This distinction is important because some HTTP communication options
may apply only to the connection with the nearest, non-tunnel
neighbor, only to the end-points of the chain, or to all connections
along the chain. Although the diagram is linear, each participant may
be engaged in multiple, simultaneous communications. For example, B
may be receiving requests from many clients other than A, and/or
forwarding requests to servers other than C, at the same time that it
is handling A's request.
Any party to the communication which is not acting as a tunnel may
employ an internal cache for handling requests. The effect of a cache
is that the request/response chain is shortened if one of the
participants along the chain has a cached response applicable to that
request. The following illustrates the resulting chain if B has a
cached copy of an earlier response from O (via C) for a request which
has not been cached by UA or A.
request chain ---------->
UA -----v----- A -----v----- B - - - - - - C - - - - - - O
<--------- response chain
Not all responses are usefully cacheable, and some requests may
contain modifiers which place special requirements on cache behavior.
HTTP requirements for cache behavior and cacheable responses are
defined in section 13.
HTTP communication usually takes place over TCP/IP connections. The
default port is TCP 80 [19], but other ports can be used. This does
not preclude HTTP from being implemented on top of any other protocol
on the Internet, or on other networks. HTTP only presumes a reliable
transport; any protocol that provides such guarantees can be used;
the mapping of the HTTP/1.1 request and response structures onto the
transport data units of the protocol in question is outside the scope
of this specification.
In HTTP/1.0, most implementations used a new connection for each
request/response exchange. In HTTP/1.1, a connection may be used for
one or more request/response exchanges, although connections may be
closed for a variety of reasons (see section 8.1).
- 略,基本语法规则
The HTTP version of an application is the highest HTTP version for
which the application is at least conditionally compliant.
Proxy and gateway applications need to be careful when forwarding
messages in protocol versions different from that of the application.
Since the protocol version indicates the protocol capability of the
sender, a proxy/gateway MUST NOT send a message with a version
indicator which is greater than its actual version. If a higher
version request is received, the proxy/gateway MUST either downgrade
the request version, or respond with an error, or switch to tunnel
behavior.
The HTTP protocol does not place any a priori limit on the length of
a URI. Servers MUST be able to handle the URI of any resource they
serve, and SHOULD be able to handle URIs of unbounded length if they
provide GET-based forms that could generate such URIs. A server
SHOULD return 414 (Request-URI Too Long) status if a URI is longer
than the server can handle (see section 10.4.15).
Note: Servers ought to be cautious about depending on URI lengths
above 255 bytes, because some older client or proxy
implementations might not properly support these lengths.
The "http" scheme is used to locate network resources via the HTTP
protocol. This section defines the scheme-specific syntax and
semantics for http URLs.
http_URL = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]
If the port is empty or not given, port 80 is assumed. The semantics
are that the identified resource is located at the server listening
for TCP connections on that port of that host, and the Request-URI
for the resource is abs_path (section 5.1.2). The use of IP addresses
in URLs SHOULD be avoided whenever possible (see RFC 1900 [24]). If
the abs_path is not present in the URL, it MUST be given as "/" when
used as a Request-URI for a resource (section 5.1.2). If a proxy
receives a host name which is not a fully qualified domain name, it
MAY add its domain to the host name it received. If a proxy receives
a fully qualified domain name, the proxy MUST NOT change the host
name.
When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:
- A port that is empty or not given is equivalent to the default
port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".
Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
For example, the following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
HTTP messages consist of requests from client to server and responses
from server to client.
HTTP-message = Request | Response ; HTTP/1.1 messages
Request (section 5) and Response (section 6) messages use the generic
message format of RFC 822 [9] for transferring entities (the payload
of the message). Both types of message consist of a start-line, zero
or more header fields (also known as "headers"), an empty line (i.e.,
a line with nothing preceding the CRLF) indicating the end of the
header fields, and possibly a message-body.
generic-message = start-line
*(message-header CRLF)
CRLF
[ message-body ]
start-line = Request-Line | Status-Line
A request message from a client to a server includes, within the
first line of that message, the method to be applied to the resource,
the identifier of the resource, and the protocol version in use.
Request = Request-Line ; Section 5.1
*(( general-header ; Section 4.5
| request-header ; Section 5.3
| entity-header ) CRLF) ; Section 7.1
CRLF
[ message-body ] ; Section 4.3
The Request-Line begins with a method token, followed by the
Request-URI and the protocol version, and ending with CRLF. The
elements are separated by SP characters. No CR or LF is allowed
except in the final CRLF sequence.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
The Method token indicates the method to be performed on the
resource identified by the Request-URI. The method is case-sensitive.
Method = "OPTIONS" ; Section 9.2
| "GET" ; Section 9.3
| "HEAD" ; Section 9.4
| "POST" ; Section 9.5
| "PUT" ; Section 9.6
| "DELETE" ; Section 9.7
| "TRACE" ; Section 9.8
| "CONNECT" ; Section 9.9
| extension-method
extension-method = token
The list of methods allowed by a resource can be specified in an
Allow header field (section 14.7). The return code of the response
always notifies the client whether a method is currently allowed on a
resource, since the set of allowed methods can change dynamically. An
origin server SHOULD return the status code 405 (Method Not Allowed)
if the method is known by the origin server but not allowed for the
requested resource, and 501 (Not Implemented) if the method is
unrecognized or not implemented by the origin server. The methods GET
and HEAD MUST be supported by all general-purpose servers. All other
methods are OPTIONAL; however, if the above methods are implemented,
they MUST be implemented with the same semantics as those specified
in section 9.
After receiving and interpreting a request message, a server responds
with an HTTP response message.
Response = Status-Line ; Section 6.1
*(( general-header ; Section 4.5
| response-header ; Section 6.2
| entity-header ) CRLF) ; Section 7.1
CRLF
[ message-body ] ; Section 7.2
The first line of a Response message is the Status-Line, consisting
of the protocol version followed by a numeric status code and its
associated textual phrase, with each element separated by SP
characters. No CR or LF is allowed except in the final CRLF sequence.
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
The Status-Code element is a 3-digit integer result code of the
attempt to understand and satisfy the request. These codes are fully
defined in section 10. The Reason-Phrase is intended to give a short
textual description of the Status-Code. The Status-Code is intended
for use by automata and the Reason-Phrase is intended for the human
user. The client is not required to examine or display the Reason-
Phrase.
The first digit of the Status-Code defines the class of response. The
last two digits do not have any categorization role. There are 5
values for the first digit:
- 1xx: Informational - Request received, continuing process
- 2xx: Success - The action was successfully received,
understood, and accepted
- 3xx: Redirection - Further action must be taken in order to
complete the request
- 4xx: Client Error - The request contains bad syntax or cannot
be fulfilled
- 5xx: Server Error - The server failed to fulfill an apparently
valid request
The individual values of the numeric status codes defined for
HTTP/1.1, and an example set of corresponding Reason-Phrase's, are
presented below. The reason phrases listed here are only
recommendations -- they MAY be replaced by local equivalents without
affecting the protocol.
Status-Code =
"100" ; Section 10.1.1: Continue
| "101" ; Section 10.1.2: Switching Protocols
| "200" ; Section 10.2.1: OK
| "201" ; Section 10.2.2: Created
| "202" ; Section 10.2.3: Accepted
| "203" ; Section 10.2.4: Non-Authoritative Information
| "204" ; Section 10.2.5: No Content
| "205" ; Section 10.2.6: Reset Content
| "206" ; Section 10.2.7: Partial Content
| "300" ; Section 10.3.1: Multiple Choices
| "301" ; Section 10.3.2: Moved Permanently
| "302" ; Section 10.3.3: Found
| "303" ; Section 10.3.4: See Other
| "304" ; Section 10.3.5: Not Modified
| "305" ; Section 10.3.6: Use Proxy
| "307" ; Section 10.3.8: Temporary Redirect
| "400" ; Section 10.4.1: Bad Request
| "401" ; Section 10.4.2: Unauthorized
| "402" ; Section 10.4.3: Payment Required
| "403" ; Section 10.4.4: Forbidden
| "404" ; Section 10.4.5: Not Found
| "405" ; Section 10.4.6: Method Not Allowed
| "406" ; Section 10.4.7: Not Acceptable
| "407" ; Section 10.4.8: Proxy Authentication Required
| "408" ; Section 10.4.9: Request Time-out
| "409" ; Section 10.4.10: Conflict
| "410" ; Section 10.4.11: Gone
| "411" ; Section 10.4.12: Length Required
| "412" ; Section 10.4.13: Precondition Failed
| "413" ; Section 10.4.14: Request Entity Too Large
| "414" ; Section 10.4.15: Request-URI Too Large
| "415" ; Section 10.4.16: Unsupported Media Type
| "416" ; Section 10.4.17: Requested range not satisfiable
| "417" ; Section 10.4.18: Expectation Failed
| "500" ; Section 10.5.1: Internal Server Error
| "501" ; Section 10.5.2: Not Implemented
| "502" ; Section 10.5.3: Bad Gateway
| "503" ; Section 10.5.4: Service Unavailable
| "504" ; Section 10.5.5: Gateway Time-out
| "505" ; Section 10.5.6: HTTP Version not supported
| extension-code
extension-code = 3DIGIT
Reason-Phrase = *<TEXT, excluding CR, LF>
HTTP status codes are extensible. HTTP applications are not required
to understand the meaning of all registered status codes, though such
understanding is obviously desirable. However, applications MUST
understand the class of any status code, as indicated by the first
digit, and treat any unrecognized response as being equivalent to the
x00 status code of that class, with the exception that an
unrecognized response MUST NOT be cached. For example, if an
unrecognized status code of 431 is received by the client, it can
safely assume that there was something wrong with its request and
treat the response as if it had received a 400 status code. In such
cases, user agents SHOULD present to the user the entity returned
with the response, since that entity is likely to include human-
readable information which will explain the unusual status.
Request and Response messages MAY transfer an entity if not otherwise
restricted by the request method or response status code. An entity
consists of entity-header fields and an entity-body, although some
responses will only include the entity-headers.
In this section, both sender and recipient refer to either the client
or the server, depending on who sends and who receives the entity.
Entity-header fields define metainformation about the entity-body or,
if no body is present, about the resource identified by the request.
Some of this metainformation is OPTIONAL; some might be REQUIRED by
portions of this specification.
entity-header = Allow ; Section 14.7
| Content-Encoding ; Section 14.11
| Content-Language ; Section 14.12
| Content-Length ; Section 14.13
| Content-Location ; Section 14.14
| Content-MD5 ; Section 14.15
| Content-Range ; Section 14.16
| Content-Type ; Section 14.17
| Expires ; Section 14.21
| Last-Modified ; Section 14.29
| extension-header
extension-header = message-header
The extension-header mechanism allows additional entity-header fields
to be defined without changing the protocol, but these fields cannot
be assumed to be recognizable by the recipient. Unrecognized header
fields SHOULD be ignored by the recipient and MUST be forwarded by
transparent proxies.
The entity-body (if any) sent with an HTTP request or response is in
a format and encoding defined by the entity-header fields.
entity-body = *OCTET
An entity-body is only present in a message when a message-body is
present, as described in section 4.3. The entity-body is obtained
from the message-body by decoding any Transfer-Encoding that might
have been applied to ensure safe and proper transfer of the message.
8.1 Persistent Connections
8.1.1 Purpose
Prior to persistent connections, a separate TCP connection was
established to fetch each URL, increasing the load on HTTP servers
and causing congestion on the Internet. The use of inline images and
other associated data often require a client to make multiple
requests of the same server in a short amount of time. Analysis of
these performance problems and results from a prototype
implementation are available [26] [30]. Implementation experience and
measurements of actual HTTP/1.1 (RFC 2068) implementations show good
results [39]. Alternatives have also been explored, for example,
T/TCP [27].
Persistent HTTP connections have a number of advantages:
- By opening and closing fewer TCP connections, CPU time is saved
in routers and hosts (clients, servers, proxies, gateways,
tunnels, or caches), and memory used for TCP protocol control
blocks can be saved in hosts.
- HTTP requests and responses can be pipelined on a connection.
Pipelining allows a client to make multiple requests without
waiting for each response, allowing a single TCP connection to
be used much more efficiently, with much lower elapsed time.
- Network congestion is reduced by reducing the number of packets
caused by TCP opens, and by allowing TCP sufficient time to
determine the congestion state of the network.
- Latency on subsequent requests is reduced since there is no time
spent in TCP's connection opening handshake.
- HTTP can evolve more gracefully, since errors can be reported
without the penalty of closing the TCP connection. Clients using
future versions of HTTP might optimistically try a new feature,
but if communicating with an older server, retry with old
semantics after an error is reported.
HTTP implementations SHOULD implement persistent connections.
A significant difference between HTTP/1.1 and earlier versions of
HTTP is that persistent connections are the default behavior of any
HTTP connection. That is, unless otherwise indicated, the client
SHOULD assume that the server will maintain a persistent connection,
even after error responses from the server.
Persistent connections provide a mechanism by which a client and a
server can signal the close of a TCP connection. This signaling takes
place using the Connection header field (section 14.10). Once a close
has been signaled, the client MUST NOT send any more requests on that
connection.
A client that supports persistent connections MAY "pipeline" its
requests (i.e., send multiple requests without waiting for each
response). A server MUST send its responses to those requests in the
same order that the requests were received.
Clients which assume persistent connections and pipeline immediately
after connection establishment SHOULD be prepared to retry their
connection if the first pipelined attempt fails. If a client does
such a retry, it MUST NOT pipeline before it knows the connection is
persistent. Clients MUST also be prepared to resend their requests if
the server closes the connection before sending all of the
corresponding responses.
Clients SHOULD NOT pipeline requests using non-idempotent methods or
non-idempotent sequences of methods (see section 9.1.2). Otherwise, a
premature termination of the transport connection could lead to
indeterminate results. A client wishing to send a non-idempotent
request SHOULD wait to send that request until it has received the
response status for the previous request.
NOTE, idempotent(幂等) http://baike.baidu.com/view/2067025.htm
Just build something.