This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

What data should go in headers, and what data in the response body #45

Closed
adelevie opened this issue Aug 11, 2014 · 14 comments

Comments

@adelevie
Contributor

From an HN comment:

First off: Because this functionality is in HTTP spec itself, I believe it should be leveraged when it makes sense. Otherwise, why even bother with HTTP and response codes?

Secondly, because you don't need to parse a JSON response to make decisions based on the response, both clients and servers can potentially be simpler and faster. If you got a 206 response and 1000 items back, but you made the request without an accept range header, not knowing if you were going to receive 10 items or 10000 items, you don't need to parse the JSON to find this out.

This is opposed to just retrieving a 200 response, parsing and processing 1000 items, getting to the end to find a "next" property in your JSON you weren't expecting, and then firing off another request, when you could have already queued a second request if you had read the header first before processing your response. (RFC2616 Section 3.12 gives you some leverage in specifying range units, so it's easy to define the range in items instead of bytes.)

However, I don't think that creating new headers is a good idea. My recommendation is simple: If it's in the HTTP spec and you can use it according to how it's defined (like my items range example), then you should use it. If it's not in the spec, just go ahead and put it in the URI/response. The exceptions are perhaps custom authentication mechanisms (e.g. HMAC or cookie-based authentication), or anything which may be performance sensitive. For example, I have endpoints that are used in browsers and by batch machines. It's easy to just use cookies for the browser and HMAC for batch machines.

This seems like it's in the grey area between standard and recommendation, but what does everyone think? cc @konklone @GUI
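
To make the range idea in the quoted comment concrete, here is a minimal sketch of a client deciding from the status code and headers alone whether another request is needed, before any JSON is parsed. The `items` range unit and the `Content-Range: items 0-999/10000` format are assumptions modeled on the syntax HTTP uses for byte ranges, not something a given server is known to emit; the function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ItemRange(NamedTuple):
    first: int            # index of the first item in this response
    last: int             # index of the last item in this response
    total: Optional[int]  # total item count, or None if the server sent "*"


# Matches e.g. "items 0-999/10000" or "items 0-999/*" (assumed format).
_CONTENT_RANGE = re.compile(r"^items (\d+)-(\d+)/(\d+|\*)$")


def parse_item_range(header_value: str) -> ItemRange:
    """Parse a Content-Range header that uses the assumed "items" unit."""
    match = _CONTENT_RANGE.match(header_value.strip())
    if match is None:
        raise ValueError(f"unrecognized Content-Range: {header_value!r}")
    first, last, total = match.groups()
    return ItemRange(int(first), int(last), None if total == "*" else int(total))


def next_range_header(status: int, content_range: Optional[str]) -> Optional[str]:
    """Return the Range value for the follow-up request, or None if done."""
    if status != 206 or content_range is None:
        return None  # complete response: nothing more to fetch
    current = parse_item_range(content_range)
    if current.total is not None and current.last + 1 >= current.total:
        return None  # this was the final slice
    page_size = current.last - current.first + 1
    start = current.last + 1
    end = start + page_size - 1
    if current.total is not None:
        end = min(end, current.total - 1)  # do not ask past the last item
    return f"items={start}-{end}"
```

A client could call `next_range_header(206, "items 0-999/10000")` as soon as the headers arrive and queue the request for `items=1000-1999` while the first body is still being parsed.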

@konklone
Contributor

I've never used or seen used the range acceptance headers for pagination -- and would that cover all the metadata you may need? (total count, how many per page, current offset) I get that there's a benefit to being able to get metadata about the response without actually parsing the response body. It just doesn't seem like a huge benefit.

@adelevie
Contributor Author

Is there room for some language merely flagging this for the API producer to consider?

@konklone
Contributor

I'm not personally comfortable recommending something none of us here at 18F have ever used ourselves. If that's not true, and an 18Fer has used it and thinks it's a good idea to include, then let's find a way to at least reference it.

@adelevie
Contributor Author

There are two pieces here:

  • using range values in the header
  • returning an HTTP status code separate from the response body

Don't a lot of us already do the second?

Currently, we address basic numbering for errors, but don't explicitly discuss where to put the status code. The code sample implies that it goes in the response body, but why not say that, absent a compelling reason otherwise, there needs to be an HTTP status code in the response headers (per http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html)?

@konklone
Contributor

The suggestion here is to use HTTP status code 206 for Partial Content, and I don't think that that code was meant for paginated resultsets.

There's no way to return an HTTP response without a status code. It doesn't make sense to put it "in the response headers" -- it's a core part of an HTTP response. We could potentially add language suggesting the use of response codes commonly used in APIs, like 201 Created for when content is created (though our standards don't really address write APIs), or 400 for generic invalid syntax by the client. But I wouldn't include 206 among them, myself.

@adelevie
Contributor Author

Totally agree re abstaining from any recommendation or language on pagination and codes.

Can we just make clear that if you're sending a status code in the JSON body, it should match the code in the response header? E.g. no 200s in the header when a 500 is in the JSON body? Or is this too obvious?

konklone added a commit that referenced this issue Aug 11, 2014
This can be handled in the HTTP status code alone. Having a status
code in the JSON body was really only needed over JSONP, where only
200 HTTP status codes would allow clients to see the response. Since
we're not using JSONP, this isn't necessary anymore.

Related to #45.
@konklone
Contributor

Yeah, okay, this inspired me to update the error message body in 41ab481 to remove the status code from the JSON body altogether. That makes the subsequent references to status codes unambiguously about HTTP and not the JSON body.

The only reason to have separate JSON body status codes from HTTP status codes is if you are supporting JSONP. All JSONP responses, including errors, need to use 2XX response codes, or else browsers silently drop the callback and the request never completes. So, it's common practice for JSONP error messages to use a 200 response code for errors. We're not using JSONP, so this isn't necessary for us.
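
As a rough illustration of the JSONP point above, the sketch below builds an error response both ways. The function name, the field names, and the `(status, content_type, body)` return shape are hypothetical and chosen only for this example; they are not taken from the standards document.

```python
import json
from typing import Tuple


def error_response(http_status: int, message: str, callback: str = "") -> Tuple[int, str, str]:
    """Return (status, content_type, body) for an error.

    Hypothetical helper: a real service should also validate `callback`
    before echoing it into a script body.
    """
    if callback:
        # JSONP: the body is loaded as a script, so it must arrive with a
        # 2XX status. The real status travels inside the payload instead.
        payload = json.dumps({"status": http_status, "message": message})
        return 200, "application/javascript", f"{callback}({payload});"
    # Plain JSON: the HTTP status line carries the code, so the body does
    # not need to repeat it.
    return http_status, "application/json", json.dumps({"message": message})
```

Without a callback, the status code appears only once, in the HTTP response itself, which is the situation the updated error message body assumes.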

@adelevie
Contributor Author

👍

Should we link to the HTTP spec/list of status codes in the section where we mention them? E.g.:

HTTP responses with error details should use a 4XX status code to indicate a client-side failure (such as invalid authorization, or an invalid parameter), and a 5XX status code to indicate server-side failure (such as an uncaught exception). Please refer to the HTTP Specification for a comprehensive explanation of these status codes.

@brianv0

brianv0 commented Aug 11, 2014

Hi, I wrote the original comment on HN.

First off, I'm not entirely sure of the scope of these API recommendations. My experience comes from creating APIs to support dataset discovery and large scale batch data processing (i.e. supporting thousands of parallel jobs). So that's where I'm coming from, and my users aren't the average user. Nonetheless, if an API might be used for large scale data processing, I do have a lot of experience with that.

First off: Headers are typically meant to be metadata about the request and/or underlying resource (endpoint) itself. I don't believe everything should be packed into the headers by any means, but I believe a lot of value can be added when adding headers that are actionable in nature to a response (i.e. caching).

For the Range header/partial content thing, I think it's probably something to think about, but I wouldn't take it to be strict at all. My experience has been providing APIs specifically for the return of (physics) datasets from a system akin to a virtual file system, where the quantity of datasets can easily surpass 10k items and gzipped responses can easily still be over several megabytes in size. Furthermore, due to the nature of metadata which is attached to my datasets, object size isn't entirely predictable. One of those 10k objects might be 100 bytes long, one might be 1kB.

As such, it has been my experience that Chrome can occasionally choke on processing JSON that large, and naive JSON parsers aren't always up to the task either. In these cases, it is useful to understand the nature of the response before processing. For example, another thread could fire up an additional request before it starts processing, easily saving a half second or more in latency. So maybe it makes sense when returning 10k objects, but likely it doesn't make sense when returning 10 objects (unless they are really, really big objects).

For the HTTP status code, I do think that using the standard HTTP status codes provides lots of benefits. However, it is clear some of these status codes were written for a different age, and as such, they can occasionally be awkward. A good example is the 401-Unauthorized which mandates that a WWW-Authenticate header be returned, which I'm not sure makes sense for many of the authentication schemes available today. However, I'm not a huge fan of denormalized data, and I believe the redundancy of an HTTP error code in the response body can lead to confusion for both implementers and users, unless it's explicitly specified that the status code returned is an application error code. The difference is subtle, but an application error code may or may not be different from a transport error code. In the case of a RESTful API, the application and transport error codes tend to be one and the same. In the case of an RPC-like API, or any web application that is more of an abstraction/proxy/gateway to an existing application or system, the application and transport error codes might tend to differ.

This is why you can easily end up with responses where someone returns a 200-OK semantically meaning "the web application is alive and responding correctly to HTTP requests" with responses like:

  • The request to the application was routed and successful
    {"results":"[long list of results]"}
  • The request to the application was routed but was unable to be fulfilled
    {"status":404, "message":"Unable to find object user requested", "exception":"..."}
  • The request to the application was unable to be routed/application not responding
    {"status":500, "message":"Unable to connect to database", "exception":"..."}

There is some value in this, however. One example is when web applications are behind proxies/load balancers which prefer to return a 4xx or 5xx when the proxies themselves can't find the web application or have issues. This can be semantically confusing to a user if the user only checks the status code, but it could easily be worked around if, for example, the web application typically responds with a header about the application itself which hopefully isn't sanitized by the proxy server (a Server header will often be rewritten with a proxy's Server header, for example).

For example:
x-api-id: Service-API/1.0

...Which is a good use case, I believe, for adding metadata about the response in a header.
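
A minimal sketch of how a client might use such a header to tell an application error from one generated by a proxy or load balancer. It assumes the application sets `x-api-id` on every response, errors included, and that intermediaries pass it through; the function name and return labels are hypothetical.

```python
from typing import Mapping


def error_origin(status: int, headers: Mapping[str, str], api_header: str = "x-api-id") -> str:
    """Classify a response as 'ok', 'application', or 'intermediary'."""
    if status < 400:
        return "ok"
    # HTTP header names are case-insensitive, so normalize before lookup.
    names = {name.lower() for name in headers}
    # Assumption: only the application itself ever sets this header.
    return "application" if api_header.lower() in names else "intermediary"
```

For instance, `error_origin(502, {"Server": "nginx"})` returns `"intermediary"`, while a 404 carrying `X-API-ID: Service-API/1.0` is classified as `"application"`.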

I'd concede many of these recommendations don't always make sense for many APIs, especially low volume, small response, and relatively static APIs.

Hopefully this discussion is useful for you guys :)

@adelevie
Contributor Author

I gave this a first read, and will re-read a few more times, parse, and respond. Until then, I just want to thank you for taking the time to share your knowledge and API domain experience with us and the general public, both on HN and right here.

@brianv0

brianv0 commented Aug 11, 2014

Yeah it was a little long, sorry about that.

Maybe a good recommendation on what should go in a header is this:

Whenever data can be immediately actionable by a client, server, or a proxy[1] between them, it may be useful to add this data to a header.

Examples include:

  • Caching: Include headers that support caching directives and data freshness
  • Authentication: Use Authorization header, or a user-defined header if using a custom authentication scheme.
  • Request preconditions: Useful when modifying (e.g. POST or PUT) an endpoint/resource; requests which might lead to 409, 412, or 417 typically use preconditions
  • Server, Application, or other Response Metadata: Typically user-defined; useful for application name and versioning, for example.
  • Large Responses: Range headers may be useful to include, but generally not advised for most cases.

[1] A proxy is anything which may read and act based on data in a request or response, such as a load balancer, gateway, or filter.
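
As one illustration of the "request preconditions" item, here is a simplified sketch of evaluating an `If-Match` header against a resource's current ETag before applying a write. It assumes strong ETags with no commas inside the quoted values, and the function names are hypothetical; a production server would use a full header parser.

```python
from typing import Optional


def if_match_holds(if_match: Optional[str], current_etag: Optional[str]) -> bool:
    """Return True if the write may proceed under the If-Match precondition."""
    if if_match is None:
        return True  # the client sent no precondition
    if current_etag is None:
        return False  # nothing exists to match against
    if if_match.strip() == "*":
        return True  # "*" matches any existing representation
    if current_etag.startswith("W/"):
        return False  # If-Match uses strong comparison; weak tags never match
    # Simplification: split on commas, which assumes no commas inside tags.
    candidates = [tag.strip() for tag in if_match.split(",")]
    return current_etag in candidates


def status_for_write(if_match: Optional[str], current_etag: Optional[str]) -> int:
    """Map the precondition result to a status code: 412 on failure, else 200."""
    return 200 if if_match_holds(if_match, current_etag) else 412
```

Because the decision depends only on headers, a server or gateway can reject a stale update with 412 without reading the request body.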

@GUI
Member

GUI commented Aug 12, 2014

Yeah, okay, this inspired me to update the error message body in 41ab481 to remove the status code from the JSON body altogether. That makes the subsequent references to status codes unambiguously about HTTP and not the JSON body.

👍 to this change. I've definitely encountered my fair share of API errors being returned with a 200 response status, so I think this nicely clears up the guidance (since it did seem a bit ambiguous what the intention of the "status" field was before).

Should we link to the HTTP spec/list of status codes in the section where we mention them? E.g.:

@adelevie I also like your suggestion of linking off to details about the various HTTP error codes to pick from (since I think developers can often be drawn towards just using something generic like 500 all the time, even if there might be more appropriate status codes to use). However, what about linking off to the Wikipedia page on the topic instead? The RFC 2616 page does go into nice detail, but the Wikipedia page aggregates together some of the other common status codes, like the additions from RFC 6585 (plus, the all-important 418 I'm a teapot addition from RFC 2324 ;). I'm not sure how important these additions are to link to, but if we did want a single aggregated list, the Wikipedia page might be the easiest place to link to.

And thank you @brianv0 for the detailed feedback! There's certainly a fair bit to chew over here. Regarding the specific topic of using Range headers and 206 responses for pagination, that's pretty interesting. I had no idea you could specify arbitrary range types other than bytes. I can see some potential use-cases, but it also doesn't seem like most API users are used to performing queries this way. And from the server-side, I'd also be worried about how this would interact with HTTP caching layers without more research (I didn't think most HTTP caching servers supported caching range requests, but I could be wrong). As you said, this approach for pagination is probably not advisable for most cases, so I'd be somewhat hesitant to include it in this type of general standard. But it's still an interesting approach, so thanks for bringing it up (and everything else).

@jpyuda
Member

jpyuda commented Aug 15, 2014

The folks who have worked on (and are working on) the data platform there may have useful thoughts with regards to paginating large result sets. Pinging @cndreisbach, @marcesher.

@adelevie
Contributor Author

@GUI, shall we link to both?

@mgwalker mgwalker closed this as completed Dec 7, 2021