Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.transaction() #6394

Closed
mikermcneil opened this issue Jul 13, 2013 · 12 comments
Closed

.transaction() #6394

mikermcneil opened this issue Jul 13, 2013 · 12 comments

Comments

@mikermcneil
Copy link
Member

@irlnathan @ponzicoder @particlebanana
Re our recent conversations about this subject

See bottom of this issue for proposed usage examples

tldr;

0.5) Add transaction support for uniqueness constraints (already done probably, since we're sending that down as part of the schema)

  1. Add support for same-adapter transactions, using built-in stuff for the database (e.g. mysql transactions). The API looks like User.transaction().find(....), instance.rollback(cb) and instance.commit(cb). If either of those fail, we're in deep trouble. See the very bottom of this issue.

  2. Add transaction support (using our new transaction API described above) on top of:

    • aggregate (createEach(), plural usage of update(), plural usage of destroy())
    • and composite queries (findOrCreate(), and findOrCreateEach())
  3. Add increment() and decrement(), which are just shortcuts on top of update. Only thing is they need to be transactional. Using the same new functionality, of course.

  4. Figure out how to handle commit() and rollback() errors. This may already be worked out at the adapter level.

  5. Add support for instance.attempts(), which will reattempt a waterline call n times until it works (with incremental back-off). We can probably leverage async for this.

  6. Figure out how to do cross-adapter transactions in a user-friendly way. This is explored in a NOT-user friendly way at the bottom of this issue.

Motivations:

Ensuring isolated access to complex units of business logic, beyond what is available in a single database (since these logics may bridge multiple adapters)

From http://stackoverflow.com/a/14044329/486547:

We started with full-on CRUD mutices, but soon realized that was pretty hard. Better to leave that to the database. But sometimes you don't want to-- because you want to be able to switch databases and keep your code agnostic. So then we simplified down to the next easiest piece-- named transactions.

Enabling this sort of cross-adapter behavior is crucial for allowing flexible usage of different services, and avoiding vendor lock-in (e.g. you could use the transactional storage quota support in S3, but what if you want to switch cloud storage providers? Same thing could be said of the transaction support in Oracle.)

Use cases:

Unique Data

Problem (Signup Form)

Keep in mind that this same scenario could exist if User A was making both requests- the trouble is the concurrency, not that the two users involved are coming from different sessions, et al.

  1. User A looks to see if any users exist with username "bronyboy47"
  2. User B looks to see if any users exist with username "bronyboy47"
  3. User A receives a response from Waterline and sees that no users yet exist with that username, so she calls create().
  4. User B receives a response from Waterline and sees that no users yet exist with that username, so she calls create().
  5. Two users with username "bronyboy47" are created.

Solution

Circumvent the need for transactions by using the unique constraint in your adapter.

We sidestep this issue by using the unique validation rule (as long as the adapter natively supports the unique validation-- otherwise we're screwed) But in general, most production databases should support this, so who cares? Get a new database, right? What about sails-disk during development? It would be pretty annoying to not be able to test your logic without hooking up your real production database!! Luckily, sails-disk uses an in-memory representation of your data model for simplicity, so all of its queries are transactional out of the box, since any requests against it are processed FIFO (which is a bottleneck-- but again, this is not a production situation!) Good enough for me- let's stick to the low-hanging fruit.

Increment/Decrement

This one's tougher.

Problem (Blackjack)

In the following example, a user is able to trick a Blackjack game.

  1. User A hits (i)
  2. User A hits (ii)
  3. User B hits (I)
  4. User B hits (II)
  5. User B's first request (I) is processed-- he gets dealt an "8", making the total value of his hand = 8.
  6. User A's first request (i) is processed-- she gets dealt a "9", making the total value of her hand = 9.
  7. User A hits (iii)
  8. User B hits (III)
  9. User B's second request (II) is processed-- he gets dealt an "8", making the total value of his hand = 16.
  10. User B's third request (III) is processed-- he gets dealt an "8", making the total value of his hand = 24. User B busts. User A wins! But then....
  11. User A's second request (ii) is processed-- she gets dealt a "9", making the total value of her hand = 18.
  12. User A's third request (iii) is processed-- she gets dealt a "9", making the total value of her hand = 27. This is greater than 21 and she busts. But wait.. she already won?

Solution

Use Hand's adapter's built-in support for transactions (TODO), plus Waterline's API for performing transactional increment/decrement (TODO).

var valueOfNewCard = Math.floor( Math.random() * 10 ) + 1; 
Hand.increment(currentHandId, {
  total: valueOfNewCard
}).done(cb);

But let's dig in a bit deeper and use the raw transaction primitives:

Instead of Hand, use Hand.transaction(). Instance models returned in the callback will also share access to the transaction.

Transactions are not technically written to the database until they're "committed". However, they may also cause some or all of the database to lock, so it's important that, even in the event of failure, they are released (SQL calls this "rolled back").

Notice that any logic outside of Hand's adapter will need to be "transactional-ized" manually. Within the database though, any updates, creates, or destroys will be automatically rolled back when the transaction releases the lock, either by calling "rollback" or "commit".

// Let's assume cb() is our exit

var dealtCard = Math.floor( Math.random() * 10 ) + 1; 
Hand
.transaction()
.findOne(3)
.done(function (err, hand) {
  if (hand.total > 21) {
    return hand.rollback(cb);
  }

  hand.total += dealtCard;
  hand.save().commit(cb);
})

When possible (i.e. adapter supports transactions natively, and all models involved belong to the same adapter), we should use the transaction support built into that adapter. IMO, this is where Waterline's role ends-- single adapter transactions that have native support. This functionality should be mixin-able in the adapter. findOrCreate() should use built-in transactions automatically if they exist.

If rollback or commit fails, we're in a world of trouble. See my comment below.

Conclusion

To protect an atomic unit of business logic-- where logic which may touch more than one model, or even more than one asynchronous service (i.e. multiple adapters), we have a more complex problem, where we might want to get involved at the Sails level (i.e. locking middleware). BUT! As far as I can tell right now, multi-adapter transaction problems can always be simplified into a single-adapter problem. I could quite easily be wrong though. I'd love to get your ideas.

More on this below.

@mikermcneil
Copy link
Member Author

Cross-adapter transactions

Here's an example of something that crosses multiple adapters that could be simplified to only require one transactional one. Keep in mind, in this case, we'd still need to roll back the pending File model if the upload fails, but that's a separate problem, since it doesn't have to be transactional.

In the following example, a user is able to trick a file storage quota:

  1. User A (with a file upload quota of 50GB) sends a request to POST /file with a binary payload as a 40GB multipart file upload.
  2. User A (with a file upload quota of 50GB) sends a request to POST /file with a binary payload as a 40GB multipart file upload.
  3. User A uploads 80GB of data, and both requests are successful.

Technically, only the quota checking needs to be transactional here-- as soon as an upload has been "accepted", it doesn't need to block other requests, since it can mark the 40GB as "pending" and prevent uploads until it is canceled. This is analogous to a write-ahead log.

Let's take a deeper dive.

NOTE: In this example, File implements the following adapters:

  • MySQL (for create(), find(), update(), and destroy())
  • S3 (for upstream() and downstream()-- destroy() also impacts this binary adapter)
/**
 * This policy prevents a user's storage quota from being exceeded
 */
function enforceQuota (req,res) {

  // Size of file that user is trying to upload
  var sizeOfNewFile = req.header('content-length');

  // To keep this simple, we'll assume the identity of the requesting user 
  // and her cloud storage quota are stored in the session
  var myId = req.session.userId;
  var myQuota = req.session.quota;

  // But we still need to figure out how much of her data she's used so far
  // We'll do that by finding the sum of all of her uploaded file records
  var AtomicFile = File.transaction();
  AtomicFile.findByOwner(myId).sum('size').done(function (err, bytesUsed) {
    if (err) return res.send(err,500);

    // Check if quota would be exceeded by this upload, and prevent this upload if necessary
    if ( bytesUsed + sizeOfNewFile > my.quota ) {
      return AtomicFile.rollback(function (err) {
        if (err) { /* If rollback fails, we're in crazytown!!!!! See below. */ }
        res.send(413, 'Quota exceeded!  Not enough storage available to upload this file.  ' +
                           'Please contact your system administrator to purchase more cloud storage.');
      });
    }

    // Otherwise, we're good to go!
    // We do need to keep track of the fact that we're about to upload a file though
    File.create({
      status: 'uploading', // we need to mark this file as uploading so it can be reversed
      filename: req.header('x-file-name'),
      size: req.header('Content-Length'),
      owner: myId
    }).done(function (err, file) {
      if (err) {
        return AtomicFile.rollback(function (err) {
          if (err) { /* If rollback fails, we're in crazytown!!!!! See below. */ }
          res.send(err, 500);
        });
      }

      // Finally, we can continue on to the next piece of middleware
      // confident that our quotas are consistent, and that our upload is on track to be successful
      // We'll keep track of the File though, so we can mark the upload as complete later.
      // We can implement a job to look for stranded files (files not in the "complete" state)
      // and wipe them out, repairing the owner's storage allotment.  This should technically only occur if the
      // server crashes during the upload
      req.uploadingFile = file;
      AtomicFile.commit(function (err) {
        if (err) { /* if commit fails, we're in crazytown!!!!! See below. */ }
        next();
      });
    });
  });
}

// Here's an example of what the controller method might look like
// This logic should not be accessed without passing our `enforceQuota()` policy
// The good news is that it doesn't actually have to be transactional itself!
// Finally, when we're sure we're not going to exceed the quota, do the actual file upload
function uploadFile (req, res) {

  // Use model instance ('uploadingFile') stored in request
  req.uploadingFile.upstream(req.header('x-file-name')).done(function (err) {
    // If an error occurred, we must remove the pending File model, since the quota is not actually being used
    // and I'm sure the user would like to have their bytes back
    if (err) {
      req.uploadingFile.destroy(function (err) {
        if (err) {
          /* if THIS fails, we should probably retry at least once, and if that doesn't work, inform a system administrator so that this user doesn't end up never getting his/her lost storage allotment */
        }
        return res.send('File failed to upload- storage allotment reclaimed successfully.',500);
      });
    }

    // Otherwise, everything is gravy.  We can update the state of the file.
    file.status = 'uploaded';
    file.save(function (err) {
      if (err) { /* same story as above-- if this fails, we should retry a few times, eventually informing a sysadmin if it's not working-- in this case so that the user's file isn't inadvertently cleaned up by our garbage collection job */ }
      res.send('File (' + req.header('x-file-name') + ' uploaded successfully!', 200);
    });
  });
}

Final note on commit/rollback errors

Rollbacks and commits CANNOT FAIL! If they do, serious measures need to be taken to avoid inconsistent state. In particular, there is a chance of deadlock.

Notice how we've sort of stumbled across a need here for a special kind of error -- even bigger than a system crash. If a user's financial data is at risk of being inconsistent, we need to trigger an emergency job that triggers, first, incrementally backing-off retries, or worst case, human intervention.

This is crucial. It's not ok if your check bounces because your deposit caused a mysql error!

My hypothesis is that these types of error situations only arise as a result of a failed rollback (deleting our "file" model above) or failed commit (updating the state to "uploaded"). This is in custom transactions and built-in transactions alike.

One last thing

The only reason this cross-adapter example works is because we were able to use the File model as our durable write-ahead log. We're persistently tracking the state of the transaction there, and if the server crashes at any point, we're protected since the access to that write-ahead log is protected by an ACID-compliant adapter.

So- only adapters with support for true transactions can work as a commit log for cross-adapter transactions.

@mikermcneil
Copy link
Member Author

Can we just lock middleware or routes in Sails?

If you lock a piece of middleware, you lock it for all users of your app. This is probably not what you want. In the case of storage quotas that are based on user accounts, we really just want to prevent access to a piece of logic for a particular user.

If the quotas lived at the directory level, we'd need to make access atomic per directory.

This sort of transactional gating could be wrapped up and occur at the policy level, however the controller itself would need to call commit(), and rollback() would need to occur all along the way.

A much simpler approach is probably to wrap up something like the following in a service or model method:

// UploadService.js
/**
 * @param {Integer} userId -> id of the user doing the upload, whose quota we should consider
 * @param {Integer} quota -> user's storage quota
 * @param {Integer} filesize -> size of the new file in bytes
 * @param {String} filename -> new file's name
 * @param {String} cb -> callback to trigger when upload completes successfully, or if an error occurs (including an EXTREME fatal error after retrying a commit/rollback 3 times)
*/

exports.uploadFile = function (userId, filesize, filename, quota, cb) {
  // ...
};

@particlebanana @sgress454:
what do you think of all this as it impacts catamaran? I think we can provide transactional protection for a piece, or bundle, of middleware, but we should tackle that as a last priority.

Increment/decrement is totally workable for now!

@mikermcneil
Copy link
Member Author

Handing "fiascos"

A proposal for a generic solution of dealing with commit/rollback errors

Because of the way cross-adapter transactions work, rollback/commit errors aren't always on just commit() and rollback(). Could be any waterline instance methods!

But this might work:

If the method fails, Waterline will reattempt the specified number of times (defaults to 3) with incremental back-off. Finally, if it still hasn't worked, we'll trigger the error handler.

If there's nothing we can really do, since we can't effectively rollback or commit the transaction, usually this means giving the user "their money back" (or equivalent-- reverting their quota) and sending a note to the system administrator, then letting the user know that the issue has been reported.

/* ... */

file.save()
.attempts(n)
.success(cb)
.error(handleFiasco);

// .done() works as well, but since we're using .attempts(), it makes more sense to show promisey usage for the example


// Triggered after n attempts
function handleFiasco (errors) {
   // errors is a list of errors from each attempt
  // respond to the user and let them know everything's on fire over here
  // do something to help them out
  // and probably send an email to our sysadmin
}

/* ... */

What do you think?

@mikermcneil
Copy link
Member Author

To wrap this up:

The good news is that most database vendors support this issue quite cleanly natively. Most of this can be avoided by adding native support for common atomic use cases, like increment, decrement, findOrCreate(), create([ /* all */ ]), update([ /* all */ ]), and destroy([ /* all */ ])

@mikermcneil
Copy link
Member Author

@Zolmeister added a few examples of atomic usage we discussed above. Basically it's compound queries, aggregate queries, and increment/decrement

@CWyrtzen
Copy link

CWyrtzen commented May 8, 2014

Noted~

@CWyrtzen CWyrtzen closed this as completed May 8, 2014
@lucasfeliciano
Copy link

How is the process of implementation of transactions feature? I'm looking forward to start working on it, anyone have already started do something that want to share it?

@lwansbrough
Copy link
Contributor

Hey @mikermcneil, have we locked the spec here? I'm asking because I'm not sure how well instance.commit() will work. IMO it might be more beneficial to be able to grab a transaction instance from the response and be able to reference it. How about I give an example:

User
  .transaction()
  .findOne({ username: 'lwansbrough' })
  .populate('roles')
  .done(function(err, user, transaction) {
    // handle mess of err/null user here with callback and transaction.rollback()
    User
      .transaction(transaction)
      .update({ id: user.id }, { id: user.id, username: 'loch' }, function(err, user) {
         // handle mess of err/null user here with callback and transaction.rollback()

         // run some sort of async task in a different part of the application with the transaction reference
         // where eventually transaction.commit() is called
      });
  });

I think doing it this way may be preferable. Having access to a reference of the transaction (or even just an ID that can be passed to a Transaction service of some sort) would make the transactions a little more portable. This may end up being necessary in some cases. I haven't thought about this as much as you guys have, so let me know if there's any glaring flaws. I'm also not familiar enough with how Waterline generates queries to know if this would work very well. Seems to me like putting the transaction stuff on the model doesn't make a whole lot of sense, given that most transactions encompass access to many different models (or that's what transactions should allow for, anyway).

@mikermcneil
Copy link
Member Author

@shamimevatix
Copy link

Anything for postgresql?.

@carloslfu
Copy link

@mikermcneil
Copy link
Member Author

@carloslfu @shamimevatix @lwansbrough @lucasfeliciano Transactions are now a built-in, core feature in Sails v1 / Waterline 0.13. Docs: https://github.com/balderdashy/sails-docs/blob/ac3a379e7a696a6741e7b57d8d06af6a81ba47ec/reference/waterline/datastores/transaction.md

Hit up gitter, etc. if you have any questions!

@balderdashy balderdashy locked and limited conversation to collaborators Mar 1, 2017
@balderdashy balderdashy unlocked this conversation May 20, 2019
@johnabrams7 johnabrams7 transferred this issue from balderdashy/waterline May 20, 2019
@balderdashy balderdashy locked as resolved and limited conversation to collaborators May 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

No branches or pull requests

6 participants