Generated etags stay the same after updating contents #7

ysugiura · 2014-04-03T22:30:59Z

I'm having a trouble to get etag decorator working. Generated etags don't change after updating the corresponding contents (responses from drf views are updated too).

By looking at the code in drf-extensions, the default behaviour of the default_etag_func seems to be calculating etags based on requests. Wouldn't it supposed to be based on responses?

Before attempting to write a custom etag_func, I would like to make sure that I'm not missing anything.

I'm testing with following versions.

drf-extensions==0.2.2
djangorestframework==2.3.13

chibisov · 2014-04-04T04:30:54Z

The main feature of etag calculation is prevention from heavy response calculation. It's not only about saving bandwith. Thats why i think etag calculation should be based on request data, not just taking hash function from calculated response.

I would love to help you to compose custom etag function if you described here your use cases in more details.

ysugiura · 2014-04-05T19:51:22Z

Thanks for the explanation and even offering me to help writing a custom etag_func. As a matter of fact, "prevention from heavy response calculation" is exactly what I want to do.

Now that I know it is the intended design, I'll try to create custom etag_func. I may ask you some questions if I encounter any problems.

Besides, I'm still wondering if default_etag_func is working. It uses DefaultKeyConstructor which purpose, I believe, is to create a cache key and has nothing to do with the data to be served.

chibisov · 2014-04-05T20:22:47Z

Key constructors are just the abstraction that helps to calculate keys. You can use it for cache key calculation, etag key calculation, etc. It's your choices whether to use it or not. The only their purpose is to create key string that could be use by @etag or @cache_response decorators for conditional requests or caching facilities. You can easily replace them with simple function.

DefaultKeyConstructor is fairly simple and it isn't trying to be very smart. It's constructed from unique_method_id, format and language of the request. Every such part is called a bit. You can create your custom bit by following example from documentation or use default bits for your custom use cases.

ysugiura · 2014-04-05T23:58:45Z

So @etag is not working out of the box?

In case you are not aware, DefaultKeyConstructor is assigned to default_etag_func.

rest_framework_extensions/utils.py:

    70: default_cache_key_func = DefaultKeyConstructor()
    ...
    74: default_etag_func = default_cache_key_func

chibisov · 2014-04-06T06:03:48Z

I thought from documentation it's clear enough that you should tune key construction. Maybe I should add more explicit information about it.

ysugiura · 2014-04-06T07:10:52Z

I admit I should have read the documentation more carefully. Anyways, I wrote a simple custom etag_func and it's working. Thanks for your time.

mbox · 2015-10-01T11:36:09Z

Sorry @chibisov but that is not the purpose of ETag. See https://en.wikipedia.org/wiki/HTTP_ETag - an ETag should change if the CONTENT ever changes for a resource, not if the REQUEST changes.

This default implementation is dangerously wrong - if put into use it will result in all sorts of hard-to-diagnose caching issues.

chibisov · 2015-10-01T12:10:24Z

@mbox but the request changes the way content rendered. Doesn't it?

mbox · 2015-10-01T14:19:41Z

If the underlying entity has not changed, then the same request should return the same response. A different request can of course produce a different response.

So the Etag should be calculated based on the entity to be returned - I guess it would be fine to mix in things like the Language and Encoding headers when deciding if the content is the "same". But it's definitely not OK to generate the same Etag if the resource has changed.

Simple example:

1 Create a model and simple model-based APIView
2 A client requests the model via the API - server sends Etag: 12345
3 Edit the model via the admin
4 The client requests the model again, sending the Etag 12345 in the request headers.

Should the server say "hey, nothing's changed", or send back the updated data? Obviously the latter - but that only works if the Etag value calculated by the server at 2 & 4 depends on the content. However, the default etag function in drf-extensions will send a 304 response.

Say the server does the right thing, and sends Etag: 45678 along with a 200 response at 4. Now the client requests the same URL, but this time with a different Accept-Language header. If the bytes to be returned didn't change (hey, it's an API, maybe the language doesn't matter) then the Etag should still be 45678. If it did change, then the Etag is different.

Either way, the Etag is always logically calculated based on the response to be sent. It may of course be possible for the server to "cheat" rather than generate the entire response to calculate the Etag - but it should be invisible to the client.

chibisov · 2015-10-01T16:01:32Z

You're absolutely right. But DefaultKeyConstructor is simple and works under assumption that you'll have a stale cache only till CACHE_TIMEOUT time will come. If you need a custom cache invalidation logic just add it.

mbox · 2015-10-02T04:00:06Z

You're right for reads that this just means a stale value will be served for CACHE_TIMEOUT seconds - which may or may not be an issue for a particular system.

However, Etags aren't just used for read implementations - they are the foundation of detecting conflicts on updates. Allowing a window for data destruction during CACHE_TIMEOUT doesn't seem like a good idea to me.

Fundamentally I think mixing the idea of server-side cache key generation and Etag generation is a mistake. Server side caching needs to calculate a cache key based on the request to look up the response data, but the Etag calculation needs to calculate a value based on the response data - which if using caching will be the cache value.

I've two suggestions for improving this:

a) Change the default etag function to simply take a hash of the response - this won't cost any more or less server-side computation than not using etags, but at least saves the network bandwidth
b) Update the documents to point out that the default etags function has the issues listed above

For model-based views, an default etag function that simply hashed the queryset or model object would work well, since it would avoid the serialization and data transfer overheads (and if the view is already using a cache to return the ORM results, then it plays nicely).

aijogja · 2015-12-30T22:05:59Z

I also have an issue like @mbox explanation. What is the custom etag_func that I should write to handling etag when the result was changed?

I use:
djangorestframework==3.3.1
drf-extensions==0.2.8

Thanks

chibisov closed this as completed Apr 6, 2014

shanx mentioned this issue Jan 26, 2017

Correct implementation of ETAGs in drf-extensions #177

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generated etags stay the same after updating contents #7

Generated etags stay the same after updating contents #7

ysugiura commented Apr 3, 2014

chibisov commented Apr 4, 2014

ysugiura commented Apr 5, 2014

chibisov commented Apr 5, 2014

ysugiura commented Apr 5, 2014

chibisov commented Apr 6, 2014

ysugiura commented Apr 6, 2014

mbox commented Oct 1, 2015

chibisov commented Oct 1, 2015

mbox commented Oct 1, 2015

chibisov commented Oct 1, 2015

mbox commented Oct 2, 2015

aijogja commented Dec 30, 2015

Generated etags stay the same after updating contents #7

Generated etags stay the same after updating contents #7

Comments

ysugiura commented Apr 3, 2014

chibisov commented Apr 4, 2014

ysugiura commented Apr 5, 2014

chibisov commented Apr 5, 2014

ysugiura commented Apr 5, 2014

chibisov commented Apr 6, 2014

ysugiura commented Apr 6, 2014

mbox commented Oct 1, 2015

chibisov commented Oct 1, 2015

mbox commented Oct 1, 2015

chibisov commented Oct 1, 2015

mbox commented Oct 2, 2015

aijogja commented Dec 30, 2015