Simple update token (a.k.a etags) specification #18121

Open
Tracked by #22954
divega opened this issue Sep 29, 2019 · 1 comment

divega commented Sep 29, 2019

This is a copy of an old EF specification written many years ago, with only minor edits. It is a bit naïve, boring, and a lot in it may no longer be valid.

TL;DR

We can greatly simplify the code necessary to perform database updates in Web applications and services in a way that both correctly handles concurrency and minimizes database roundtrips, by introducing update tokens as a new first-class concept in the EF Core API.

Update tokens encapsulate all required original values in a way that can easily round-trip as an opaque payload alongside the current values of entity properties.

Usage would be similar to this:

// GET
var blog = context.Blogs.Find(key);
return (blog, context.Entry(blog).UpdateToken);
// PUT
context.Update(blog, updateToken); 
context.SaveChanges();

What's broken today

There is an inherent mismatch between the Unit-of-Work pattern embodied by EF's DbContext and the requirements of Web applications or services that perform simple update operations on entities in a stateless fashion:

  • EF is designed so that you can use a DbContext instance to query for some objects, then track any changes applied to those objects, and later persist those changes back to the database. In order to correctly detect changes, EF can listen to property change notification events or compare current property values against a snapshot of the original property values taken when each object is first retrieved.

  • In Web scenarios the server is stateless, and it is common to have updates split into two separate HTTP requests (e.g. GET and PUT). When using EF in ASP.NET, each request will use a separate DbContext instance. The GET operation will retrieve an object (or group of objects) from the database and return it to the view. When the PUT operation is invoked with the modified entity, the logic needs to re-hydrate the necessary state into its DbContext to mimic what would have been there if it were the same DbContext instance used in the GET operation.

The logic of PUT is hard to get right, especially given that there are so many variations, such as full vs. partial updates (in partial updates only some properties of an object are ever transferred and rendered in the view), regular vs. shadow FK properties, ad-hoc SQL vs. stored procedure updates, the variety of concurrency tokens, etc.

Although some simple patterns exist that work well in common scenarios, it is still a frequent source of customer questions and application bugs.

The simple patterns we lead customers to (primarily in the code we scaffold automatically and in the samples in our documentation) usually suffer from one or more of these limitations:

  1. They introduce an extra database roundtrip to re-hydrate the entity on the PUT request.

  2. They can't handle concurrency control correctly unless the application uses store-generated, read-only row versions.

  3. They achieve correctness through a lot of unwanted ceremony, including copying and applying individual original property values.

Clearly, preserving data consistency in concurrency scenarios isn't a concern in all use cases in all Web applications.

However, we care that customers that want to handle this concern can do so with a coding pattern that is simple and efficient.

Unfortunately, there is no single simple coding pattern available that works well across all these cases and concerns, until now.

Goals

  1. Reduce the size and complexity of the code that needs to be written in order to implement sound update operations using EF Core in controller code:

    • Update operations should be safe and respect all data consistency constraints, including optimistic concurrency control if specified in the model
    • Code should be lean and frictionless, and it should not require duplicating information we have in metadata like keys and concurrency property names
    • Code should not feel too low level and it should be easy to follow and understand with very few concepts in mind
    • Code should allow separation of concerns: it should be easy to encapsulate persistence logic in some part of the application and only expose POCO entities and simple types outside, to write good unit tests, etc.
  2. Produce useful guidance on the code patterns that customers and ASP.NET Core scaffolding should use to implement update controller actions based on diverse requirements

Current solutions

There are a few relatively simple and well-known patterns for implementing the Edit controller action when using EF. Here is a non-exhaustive list of examples:

  1. Finding the entity and then applying current values:
// PUT
var blog = context.Blogs.Find(id);
TryUpdateChanges(blog);
context.SaveChanges();
  2. Attaching and marking the whole entity as modified:
// PUT
context.Entry(blog).State = EntityState.Modified;
context.SaveChanges();
  3. Attaching and marking individual properties as modified:
// PUT
context.Entry(blog).State = EntityState.Unchanged;
context.Entry(blog).Property(b => b.Title).IsModified = true;
context.SaveChanges();
  4. Attaching a stub and then applying current values:
// PUT
var blog = new Blog {Id = id};
context.Entry(blog).State = EntityState.Unchanged;
TryUpdateChanges(blog);
context.SaveChanges();

While each of these patterns is relatively simple, they present serious limitations outside some narrow set of scenarios:

  1. In cases in which the original values of properties marked for concurrency are not preserved between requests, there is no guarantee of data consistency on concurrent updates

  2. In all cases in which we end up not having values for all properties and we want to only perform partial updates, there will be data loss if we are using stored procedures for CUD operations

  3. If properties are in shadow state, then the object instances will not carry enough information about the change, so changes can be lost or incorrectly detected

  4. Picking the right pattern can be complicated (even for us!). It is also challenging to choose exactly which original values to preserve when we know we need them.

We could focus on developing a pattern that works in the narrow set of scenarios that we believe is the most common (e.g. updating whole entities with either no concurrency control or only a read-only/store-generated timestamp property, and dynamic SQL CUD operations). However, this seems to be an opportunity to take advantage of the metadata in the model to come up with something much better that can simplify the experience for all customers while guaranteeing data consistency.

Let’s step back and try to recognize the problem again…

An old adversary

Web application programming models such as MVC typically use local method/using block scoping or the Session-per-Request pattern to control the lifecycle of the Unit-of-Work. This implies that updates often need to be performed in a Unit-of-Work that is different from the one that was originally used to populate the view (and hence the one that was used to initiate the logical transaction). On the other hand, EF’s Unit-of-Work class, i.e. DbContext, has been designed to handle update scenarios with great flexibility when it has the opportunity to track all changes performed on objects that were retrieved from the database using the same context. In fact, using its native ability to track changes, EF context objects can handle update scenarios in the broad space defined by the following dimensions:

  1. Use dynamic SQL or stored procedures for CUD operations

  2. With updates affecting only some properties or all properties in the entity

  3. With regular scalar properties and blobs

  4. In just one entity or multiple entities changing in the same business transaction

  5. With or without concurrency control specified in the model

  6. Using different types of concurrency tokens

  7. With Foreign Key associations and independent associations

  8. Using snapshot-based vs. notification-based change tracking

And they can do it while at the same time maintaining the following good attributes:

  1. Data consistency

  2. Separation of concerns

  3. Simple programming model

Let’s then propose a working hypothesis: The challenge of performing updates correctly in Web applications using EF in the general case can be described as simulating the capabilities of a long-lived context while using separate, short-lived contexts for the initial data retrieval and update operations, which is essentially equivalent to the N-Tier problem of working with disconnected graphs.

The patterns described before are no different from the patterns we previously offered for performing N-Tier with single entities. It is not a surprise that many customers try to solve those Web scenarios using the more sophisticated (and much more brittle) solution we offer for N-Tier in the form of Self-Tracking Entities.

Towards a solution

We should learn about how we and our partners solved this problem in the past (e.g. in STEs, EDSC, Data Services, RIA Services, etc.), although none of the existing solutions leads to the simple handcrafted code we want to enable in MVC.

Here is how they all work at a high level:

  1. Retrieve entity from database

  2. Store its required original values somewhere

  3. Send the full entity and preserved original values to the client

    • Note: In EDSC, as in MVC scenarios, it can be just a subset of the properties rather than the full entity
  4. Modify the current values on the client

  5. Send back both current and preserved original values to the server

  6. Re-build an entity with original values and attach it

    • Note: In Data Services we instead retrieve the entity again from the database using its keys and then we set the preserved values as original values
  7. Apply current values

  8. Save changes

    • Here ordering of operations and store-side concurrency checks happen automatically

There are several variations but they all have in common one thing: a subset of property values of the entity is preserved separate from the entity.
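
The steps above can be illustrated without any EF types. The following is only a minimal sketch under stated assumptions: the "database row" is a plain dictionary, and the names RoundTripSketch, PreserveOriginals and TrySave are hypothetical helpers invented for this illustration, not part of EF or of any of the technologies listed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, dictionary-based stand-in for the retrieve/preserve/apply/save flow.
// None of these names exist in EF; they only illustrate the idea.
public static class RoundTripSketch
{
    // Steps 1-3: copy the values that must survive the round trip to the client.
    public static Dictionary<string, object> PreserveOriginals(
        IReadOnlyDictionary<string, object> row,
        IEnumerable<string> requiredProperties)
    {
        return requiredProperties.ToDictionary(name => name, name => row[name]);
    }

    // Steps 6-8: apply the current values only if the stored row still
    // matches the preserved original values.
    public static bool TrySave(
        Dictionary<string, object> storedRow,
        IReadOnlyDictionary<string, object> preservedOriginals,
        IReadOnlyDictionary<string, object> currentValues)
    {
        foreach (var original in preservedOriginals)
        {
            if (!Equals(storedRow[original.Key], original.Value))
            {
                return false; // the row changed since it was read: conflict
            }
        }

        foreach (var current in currentValues)
        {
            storedRow[current.Key] = current.Value;
        }

        return true;
    }
}
```

Here the preserved subset plays the role that the original values play in a tracked entity: it is the only information the second request has about what the first request saw.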

Conversely, all the examples in our gallery of simple handcrafted solutions suffer from the same pitfall: they do not have sufficient data to make good decisions about what to update in all cases.

Which exact subset is necessary to preserve depends on whether:

  1. Partial updates are supported

  2. The pattern re-queries the database before saving

  3. Change tracking is performed using notification or snapshot comparisons

  4. Updates are performed with ad hoc SQL or stored procedures

Update Token

HTTP has a feature called ETags for optimistic concurrency control. Conceptually, ETags are opaque identifiers assigned by a service to a specific version of an entity found at a URL. If the version of the entity ever changes, the service will assign a new ETag to the new version. Because they are used in HTTP headers, ETags should be easy and efficient to serialize. Although clients can treat ETags as opaque identifiers, the comparison of ETags happens on the server, so the service can choose to include useful information in them for later processing.
ETags are a great solution for keeping enough out-of-band information to update an entity, but EF isn’t really a Web technology, nor is it responsible for formatting serialization payloads into HTTP headers, so we shouldn’t be in the business of creating ETag support. However, we can provide easy-to-use building blocks that enable the use of ETags or a similar approach.
By making the Update Token a first-class concept in EF we can enable simpler patterns for sound controller actions, e.g. following the re-query before update pattern:

// GET
var blog = context.Blogs.Find(key);
ViewBag.Blog = blog;
ViewBag.BlogUpdateToken = context.Entry(blog).UpdateToken;
return View();
// PUT
var blog = context.Blogs.Find(key);
context.Entry(blog).UpdateToken = eTag; 
TryUpdateChanges(blog);
context.SaveChanges();

Update Token Modes

As we established above, the actual subset of properties encoded into the update token is going to depend on several factors. Different modes can support computing update tokens optimized for different scenarios. Setting the UpdateToken property may work the same regardless of what mode was used to obtain it, i.e. it will simply set the original values available in the payload.
We can pick a reasonably resilient mode to be the default, and other modes can be turned on to leverage characteristics of the model and the pattern used, and to optimize for different parameters such as serialization payload vs. number of database roundtrips.
We can define an enum, UpdateTokenMode, with the following members:

  • AllMembers: can be used when snapshot comparisons are used for change tracking and the ability to set any property to its default value is desired
  • RequiredMembersWithPartialUpdates: returns minimal set of properties required to build the original state of an entity assuming not all current values are round-tripped
  • RequiredMembers: returns minimal set of properties required to build the original state of an entity assuming all current values are round-tripped
  • ConcurrencyMembers: returns only properties marked for concurrency, useful in the pattern in which we re-query before each update

Notice that RequiredMembersWithPartialUpdates will yield the same set of properties as RequiredMembers if using ad-hoc updates, and the same set of properties as AllMembers if stored procedure mapping is being used.
The update token is populated from original values as opposed to current values. That way, it is even possible to reconstruct changes for an entity that is modified before the update token is obtained.
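
The relationship between the modes can be written down as a small sketch. The enum members are the ones proposed above; the Resolve helper and its usesStoredProcedures parameter are hypothetical names used only to illustrate how RequiredMembersWithPartialUpdates collapses into one of the other modes. Which properties each mode actually selects is deliberately left out, since that depends on the model.

```csharp
// Enum members as proposed in this specification.
public enum UpdateTokenMode
{
    AllMembers,
    RequiredMembersWithPartialUpdates,
    RequiredMembers,
    ConcurrencyMembers
}

// Hypothetical helper, not an EF API: it only encodes the equivalence
// described in the text for RequiredMembersWithPartialUpdates.
public static class UpdateTokenModeSketch
{
    public static UpdateTokenMode Resolve(UpdateTokenMode mode, bool usesStoredProcedures)
    {
        if (mode != UpdateTokenMode.RequiredMembersWithPartialUpdates)
        {
            return mode; // the other modes do not depend on the update mechanism
        }

        // Stored procedures need every original value; ad-hoc SQL does not.
        return usesStoredProcedures
            ? UpdateTokenMode.AllMembers
            : UpdateTokenMode.RequiredMembers;
    }
}
```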

Key Token

Notice that we don’t need to include the entity key values in UpdateToken. Although update tokens are only relevant to a specific entity, the keys are unnecessary if the UpdateToken is stored alongside the entity and the entity itself includes the key values. That said, obtaining entity keys in a serialization-friendly and metadata-driven way is also useful.

In scenarios in which multiple entities can be modified in a single Unit-of-Work, it can be useful to store their UpdateToken instances in a dictionary. To define the keys of such a dictionary we need a unique representation of each entity's identity. Keys themselves are required to be unique, but it is desirable to have an opaque representation that we can always use and serialize, independent of the type of the keys, independent of whether keys are composite, and agnostic of the specific entity type. From the model, we know what the key members are, so we shouldn’t require customers to specify them every time!

var keyToken = context.Entry(blog).KeyToken;
var updateToken = context.Entry(blog).UpdateToken;
_originalValuesDictionary[keyToken] = updateToken;

Notice that KeyToken is only relevant to the key-space of the specific entity set/type but does not contain information about the type or the set. Also notice that unlike the UpdateToken property on DbEntityEntry, the KeyToken property would generally be read-only as keys are not mutable.
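
One possible shape for such an opaque key representation is sketched below. It is an assumption for illustration only: the KeyTokenSketch class and its Create method are hypothetical, and the encoding (a JSON array of the key values, Base64-encoded) is just one candidate for the wire format discussed later. Like the proposed KeyToken, it carries no information about the entity type or set.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not an EF API: turns one or more key values into a
// single opaque string that works for simple and composite keys alike.
public static class KeyTokenSketch
{
    public static string Create(params object[] keyValues)
    {
        // Serialize the ordered key values as a JSON array...
        string json = JsonSerializer.Serialize(keyValues);

        // ...and hide the structure behind Base64 so callers treat it as opaque.
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
    }
}
```

Because equal key values always produce the same string, the result can be used directly as the key of a dictionary of update tokens.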

The method DbSet.Find could be adapted to accept a KeyToken to retrieve an entity, e.g.:

var blogEntry = context.BlogEntries.Find(keyToken);

Usage

By making UpdateToken and KeyToken first-class concepts in EF we can enable simpler patterns for sound controller actions, e.g. following the re-query before update pattern:

// Context Initialization
context.Options.UpdateTokenMode = UpdateTokenMode.ConcurrencyMembers;
// GET
var blog = context.Blogs.Find(key);
ViewBag.Blog = blog;
ViewBag.BlogKeyToken = context.Entry(blog).KeyToken;
ViewBag.BlogUpdateToken = context.Entry(blog).UpdateToken;
return View();
// PUT
var blog = context.Blogs.Find(keyToken);
context.Entry(blog).UpdateToken = updateToken; 
TryUpdateChanges(blog);
context.SaveChanges();
// DELETE
var blog = context.Blogs.Find(keyToken);
context.Entry(blog).UpdateToken = updateToken; 
context.Entry(blog).State = EntityState.Deleted; 
context.SaveChanges();

If we are not following the re-query before update pattern, and we don’t need partial updates, the code would look like this:

// Context Initialization
context.Options.UpdateTokenMode = UpdateTokenMode.RequiredMembers;
// GET
var blog = context.Blogs.Find(key);
ViewBag.Blog = blog;
ViewBag.BlogUpdateToken = context.Entry(blog).UpdateToken;
return View();
// PUT
context.Entry(blog).State = EntityState.Unchanged; 
context.Entry(blog).UpdateToken = updateToken; 
context.SaveChanges();
// DELETE
context.Entry(blog).State = EntityState.Deleted; 
context.Entry(blog).UpdateToken = updateToken; 
context.SaveChanges();

Other questions/open issues

  • Update method: having to set the State of a DbEntityEntry and then assign the updateToken to it feels a bit weird and not very explicit. Can we add an Update method that makes it nicer? E.g.:
     context.Blogs.Update(blog, updateToken); 
     context.SaveChanges();
  • Persistence ignorance: UpdateToken and KeyToken are not completely orthogonal to persistence, although ETags have a recognized role in the Web that is more about service orientation and REST than about persistence per se. Given that, it is conceivable that people would adapt to the need to accept ETag-like opaque values in methods in a repository, as long as they are independent of the POCO entity itself and we use a neutral type to represent them (i.e. byte[] or string). The generic repository pattern actually becomes easier to implement thanks to the introduction of these concepts.
  • Wire format: It is desirable to have a serialization- and Web-friendly format, but it is also desirable for it to be very lean. We could consider something like JSON light if it doesn’t require EF to take on a huge dependency. We should rather consider building the feature in an extensible way, so that a dependency resolver can be used to plug in different implementations.
  • Where to put it: although general purpose, the motivation for this feature is in Web scenarios. We have already identified several other things that would ideally live in an EntityFramework.Mvc package, such as EF data validation for MVC, a library of EF-optimized controllers and model binders, and an EF driver for the OData support in Web API. If we create such a package, it could include a JSON-light formatter for the UpdateToken/KeyToken generation service. Then EntityFramework could contain a simpler formatter.
  • Runtime type: As mentioned before, there are benefits in using a neutral type such as string or byte[]. However, there are potential issues, e.g. if we choose to support string, what happens for DbSet.Find when there is an entity type that has a string key: how can we tell whether the string passed is the raw value of the string key column or something else used for UpdateToken?
  • BLOBs: This is an open issue. BLOBs may in general be too large for us to want to include them fully serialized in an UpdateToken. That said, today we only do trivial reference comparisons for BLOBs in non-key properties, which causes false positives in N-Tier and Web scenarios. False positives in change detection of BLOBs are also very bad because they force us to send the complete BLOB to the database on each save. An alternative would be to include a hash of the BLOB in the UpdateToken. We would then need to change the EF core to store that hash with the original values, and later compute the hash of the current value and compare the two, rather than doing pure reference comparisons as we do today. Since there is no single behavior that is ideal for most scenarios, this is something that would probably require knobs in the public surface to control the behavior.
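
The hash-based alternative for BLOBs could look roughly like the following. This is a sketch under assumptions: SHA-256 is picked arbitrarily as the hash, and BlobHashSketch, Hash and HasChanged are hypothetical names, not existing EF members.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper, not an EF API: compares BLOBs by content hash
// instead of by reference.
public static class BlobHashSketch
{
    // Computed when the entity is read; this is what the UpdateToken would carry.
    public static byte[] Hash(byte[] blob)
    {
        using (var sha = SHA256.Create())
        {
            return sha.ComputeHash(blob);
        }
    }

    // Computed at save time: the BLOB counts as modified only if its content
    // hash differs from the hash stored with the original values.
    public static bool HasChanged(byte[] originalHash, byte[] currentBlob)
    {
        return !Hash(currentBlob).SequenceEqual(originalHash);
    }
}
```

With this approach a BLOB that was deserialized into a new array with identical content is not reported as changed, which avoids the false positives described above at the cost of hashing the current value on each save.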

Interesting links


roji commented Sep 29, 2019

I'm sure this discussion has happened many times before and nothing I say will be very new, but just dumping some reactions here in case we discuss again soon. Am also probably missing many details.

  • Sending any unnecessary data (entity data) to HTTP clients should be avoided if possible - this means increased network bandwidth (ETag headers are also limited in size and could overflow).
  • I'm probably missing something, but "current solution 4" (stub) seems like a best practice to me, assuming concurrency tokens are also managed: the user provides EF with exactly the data needed to generate the correct UPDATE: key(s), concurrency token(s), properties to update.
  • It seems reasonable that if there are no concurrency tokens (or if these are not properly populated on the stub), there is no guarantee of data consistency. First, that may not be a problem in all applications. Second, nothing else will solve this - even re-querying a 2nd time right before saving back would be prone to inconsistency unless a concurrency token is used.
  • I can't see any best practice in which we would want to recommend re-querying the database. Aside from the perf impact (2nd query), data may of course have changed between the 1st and 2nd query, and may change again between the 2nd query and the update. The only scenario where this makes sense to me is a totally stateless application where no session can exist for some reason. I haven't done web in a while but I'm not sure we need to worry about this (I may be wrong).
  • There indeed seems to be a pattern for returning the concurrency token in ETag, and for clients to include it in the If-Match header of PUT requests (see this example). We could document this as a best practice for a well-behaved REST API application, including an ASP.NET middleware that catches DbUpdateConcurrencyException and returns HTTP 412 precondition failed.
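
To make the last point concrete, here is a dependency-free sketch of the status mapping such a middleware would perform. It is not real middleware: ConcurrencyConflictException is a hypothetical stand-in for EF Core's DbUpdateConcurrencyException (so the snippet compiles without the EF Core package), and ConcurrencyStatusSketch.Run is an invented helper.

```csharp
using System;

// Hypothetical stand-in for DbUpdateConcurrencyException.
public sealed class ConcurrencyConflictException : Exception
{
}

// Hypothetical helper: runs the save and translates a concurrency
// conflict into the HTTP status code the client should see.
public static class ConcurrencyStatusSketch
{
    public const int NoContent = 204;
    public const int PreconditionFailed = 412;

    public static int Run(Action saveChanges)
    {
        try
        {
            saveChanges();
            return NoContent; // update applied, nothing to return
        }
        catch (ConcurrencyConflictException)
        {
            // The token sent in If-Match no longer matches the stored row.
            return PreconditionFailed;
        }
    }
}
```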

Finally and most interestingly, this is making me think that we are trying to twist the unit of work into the web scenario where, well, it simply doesn't fit very well, and is another case where non-UoW support may be appropriate, i.e. bulk updates again (#17958).

At the end of the day, users simply need a way to generate UPDATE customers SET x=y WHERE id=8 (with possibly an additional clause for the concurrency token). Attaching a stub and updating is a way to make EF generate that, but it seems somewhat convoluted (or at the very least verbose/ceremonious). If we had bulk update, the user could simply express the above directly, in one line of code, without paying any overhead for tracking which isn't really needed here (or allocating the stub).
