changes in connection #182

fabiansinz · 2015-12-29T16:10:26Z

Add max_allowed_packet to conn_info since pymysql does not fetch that info from the server.
transaction context manager only starts transaction if not in a transaction already

…eady

dimitri-yatsenko · 2015-12-29T18:18:02Z

MySQL and SQL more generally do not support nested transactions.

Users should not get to the situation when they need to check whether they are inside a transaction because code that runs within transactions should never start new transactions. No syntactic clutter. If they attempt to start a transaction from within a transaction, they should get a hard error not a warning. This allows them to realize their mistake and decide what to do with the ongoing transaction.

The risk of ignoring the start of an inner transaction is in incorrect execution of the outer transaction. When the inner transaction commits, it will prematurely commit the outer transaction.

Throwing an error and preventing entering nested transactions is the only cogent behavior in this case.

eywalker · 2015-12-29T18:23:06Z

Just curious: what's the significance of max_allowed_packet here?

eywalker · 2015-12-29T18:31:39Z

I'm also concerned about the proposed handling of the transaction. When a person writes transaction A within transaction B and if all the code pertaining to the inner transaction completes but the outer transaction hits an error and stops, they may falsely assume that the queries pertaining to the inner transaction are committed, despite the fact this is not the case. And I don't think the warning message will be enough to make people aware of this issue.

dimitri-yatsenko · 2015-12-29T18:34:46Z

Anything other than throwing an error may lead to data loss and/or corruption.

fabiansinz · 2015-12-29T18:41:07Z

@eywalker: I think this would actually be the correct behaviour. If you start a transaction you want it rolled back if anything his wrong. Even if part of the code runs through.
@dimitri-yatsenko can you make a convincing example for that claim?

dimitri-yatsenko · 2015-12-29T18:42:28Z

datajoint/connection.py

How is this used?

Yeah, I'm also curious what this does? I'm guessing that you have encountered a problem with a large query hitting the packet size limit?

When I encountered the broken pipe issues in pymysql, I found that pymysql has its own max_allowed_packet for some reason that does not reflect the servers max_allowed_packet. Therefore I set it to the maximally allowed value by MySQL to let the server settings decide which packet sizes are accepted. Alternatively, a cleaner solution would be to put that value in dj.config.

eywalker · 2015-12-29T18:45:22Z

Yes, but what I'm saying is that if a person has code and assumes that they have an inner transaction, they would expect that the completion of the inner transaction code means that the inner transaction is fully committed, even if the rest of the outer transaction fails afterwards. Obviously this is not true because there is only one level of transaction and any inner transaction calls will be ignored with the current implementation. However, even allowing such nested transaction calls to be executed may lead users to have incorrect assumptions about how transactions are handled. Since there is no such thing as nested transaction, I think it is safer to throw an error upon any attempt of nested transaction.

eywalker · 2015-12-29T18:52:55Z

I'm also generally curious what kind of use case results in a potential call to start_transaction when there is already a transaction opened?

dimitri-yatsenko · 2015-12-29T18:58:05Z

Users should never allow for nested transactions in their code because they are not supported. If they allow such a situation, they should get an error and correct their code.

Here is a pseudocode that would produce data loss or corruption if you do anything other than throw an error:

start_transaction 1
    insert(A)
    start_transaction 2
        insert(B)
    commit or rollback 2
    insert(C)  
commit or rollback 1

eywalker · 2015-12-29T19:11:51Z

But he is suggesting that inner transaction do NOT actually insert start and commit transaction statements. So given that no data corruption will occur per se. If an error occurs anywhere, everything rolls back.

On Dec 29, 2015, at 12:58 PM, Dimitri Yatsenko notifications@github.com wrote:

Users should never allow for nested transactions in their code because they are not supported. If they allow such a situation, they should get an error and correct their code.

Here is a pseudocode that would produce data loss or corruption if you do anything other than throw an error:

start_transaction 1
insert(A)
start_transaction 2
insert(B)
commit or rollback 2
insert(C)
commit or rollback 1
―
Reply to this email directly or view it on GitHub.

dimitri-yatsenko · 2015-12-29T19:21:03Z

In the pseudocode above, if the inner transaction decides to rollback its transaction, it will rollback insert(A), resulting in the unnecessary loss of A and failure to insert B.

If instead the user got an error when starting the inner transaction, they could decide to commit A or rollback the outer transaction. They would be forced to rewrite the code correctly.

eywalker · 2015-12-29T19:25:39Z

I think whether that's unnecessary depends right? If the outer transaction wants to ensure that the inner statement insert(B) to complete for it to say all of its queries are good and thus commit, it only make sense that error within causes roll back of everything outer including insert(A)

Now that being said, this behavior is not one would expect of "nested transactions", and given MySQL doesn't support nested transaction this is an expected problem...

On Dec 29, 2015, at 1:21 PM, Dimitri Yatsenko notifications@github.com wrote:

So in the pseudocode above, if the inner transaction decides to rollback its transaction, it will rollback insert(A), resulting in the unnecessary loss of A.

―
Reply to this email directly or view it on GitHub.

dimitri-yatsenko · 2015-12-29T19:32:59Z

@eywalker, Yes, if the inner transaction rollbacked, it should not rollback the outer transaction. Otherwise, it would be equivalent to a single outer transaction. The inner transaction should not commit either since the outcome of the outer transaction is still pending. The inner transaction cannot be considered a transaction in any sense and should not be allowed.

eywalker · 2015-12-29T19:42:34Z

Hmm it sounds like you are suggesting that the inner transaction should act like a thread fork that basically should not affect the parent thread? I guess at least you are suggesting that the success or failure of the inner transaction should not affect that of the outer one? I thought it made sense to say that outer transaction required then success of the inner one.

I totally agree ( and I have had since the beginning) that we cannot have a meaningful inner transaction since MySQL don't support it but I'm also not sure what the behavior of a nested transaction would be should MySQL support it... Without a good idea what a real nested transaction should be like I'm having hard time telling what should be the expected behavior when the inner transaction fails.

This being said I have a feeling that @fabee is considering quite a different use case/perspective in wanting this implementation.

On Dec 29, 2015, at 1:32 PM, Dimitri Yatsenko notifications@github.com wrote:

@Eyewalker, Yes, if the inner transaction rollbacked, it should not rollback the outer transaction. Otherwise, it would be equivalent to a single outer transaction. The inner transaction should not commit either since the outcome of the outer transaction is still pending. The inner transaction cannot be considered a transaction in any sense and should not be allowed.

―
Reply to this email directly or view it on GitHub.

fabiansinz · 2015-12-29T19:42:45Z

Ok, maybe to clarify: I put in that behaviour, because I decorated insert1 to also insert additional logging information (about git) into another table. I don't know who calls insert1 and whether that is in a transaction or not, but I want make sure that the original tuple and the additional information only get inserted together.

Regarding nested transactions, I am not convinced by your arguments for the following reason: If I start a transaction, I want that anything that gets inserted during that transaction is either inserted together or not at all (different to @dimitri-yatsenko expected behaviour, in line with @eywalker expected behaviour). If the "inner" transaction fails, then parts of the data could not be inserted. Therefore, the outer transaction should fail as well. It that sense, @dimitri-yatsenko is right: A nested transaction does not make sense. What you really want is:

if a transaction is already open, do nothing and just keep the outer transaction
if there is no open transaction, open one.

This is exactly what connection.transaction does.

I do not see how this can lead to data loss or corruption, because the alternative in @dimitri-yatsenko example would have been to run insert(B) without transaction which, if it fails, would rollback the outer transaction anyway. The transaction context manager implements exactly that behaviour.

dimitri-yatsenko · 2015-12-29T20:41:14Z

@fabiansinz, Using the pseudocode above, if transaction 2 rollsback, what will happen to insert(C)?

fabiansinz · 2015-12-29T20:53:54Z

It will not be inserted as it should because the outer transaction is only complete if A, B, and C are inserted. B fails so C does not need to be inserted.

dimitri-yatsenko · 2015-12-29T20:59:02Z

Exactly. How does the outer transaction know not to insert(C)?

dimitri-yatsenko · 2015-12-29T21:38:56Z

okay, @eywalker just explained to me that our disagreement stemmed from the fact that @fabiansinz entangled exception handling with transaction handling. So here is an example that disentangles them and exposes problems with the suggested solution:

def fun():
  try:
    with conn.transaction:
      insert(B1)
      insert(B2)
  except:
      handle_all_my_errors     


# the following code does not know that fun() uses transactions
with conn.transaction:
    insert(A)
    fun()
    insert(C)

Does this clarify why emulating nested transactions is a bad idea? What will happen if insert(B2) fails?

fabiansinz · 2015-12-29T22:11:09Z

Yes, this "entanglement" is how this context manager works. If you don't want that behaviour, don't use it, but start_transaction etc. instead.

Regarding your example, I am not convinced because it is basically a misuse of the context manager. You could have made the example just like

with conn.transaction:
    insert(A)
    try:
         insert(C)
    except: 
         # handle all my errors

and caused a inconsistent state because of the misuse of the context manager.

Alternatively, you can run into the same situation without using the context manager at all.

def fun():
  try:
      insert(B1)
      insert(B2)
  except:
      # handle_all_my_errors     

start_transaction()
try:
    insert(A)
    fun()
    insert(C)
except:
    rollback()
else:
    commit_transaction()

Same inconsistent state because the outer try does not know that fun catches exceptions.

You did convince me though that a failure in the inner transaction should already rollback the outer transaction. This would solve your example.

If you are unhappy with potential misunderstandings, what about naming the context manager ensure_transaction?

dimitri-yatsenko · 2015-12-29T22:19:52Z

@fabiansinz , No, in my example above the errors were handled outside the context manager in fun() so there was no abuse as in your counterexample. Am I not allowed to handle errors rethrown by the context manager?

dimitri-yatsenko · 2015-12-29T22:23:55Z

In your second example fun() is not using a transaction, but I stated that the main function that used it does not know what fun() does and the user should not be forced to trace through all functions that she uses in order to ensure that there are no transactions with handled exceptions.

The whole point of my example is that fun() uses transactions and handles its own errors.

dimitri-yatsenko · 2015-12-29T22:29:39Z

Here is a clearer example then:

Example 3

def fun():
  start_transaction()
  try:
    insert(B1)
    insert(B2)
  except:
    rollback()
  else:
    commit()


# the author of the following code does not know that fun() uses transactions
with conn.transaction:
    insert(A)
    fun()
    insert(C)

dimitri-yatsenko · 2015-12-29T22:37:18Z

I think the main assumption that you are making is that any subroutine that rollbacks a transaction also raises an exception immediately after. This cannot generally be true. Rolling back a transaction can be handled by subroutines without raising an exception all the way up the entire calling stack. We should be able to create routines with transactions that take care of their own exceptions.

fabiansinz · 2015-12-29T22:42:50Z

This is exactly what I am assuming with the context manager. It is the same logic as implemented in populate and _make_tuples. You are right, there are more general cases, so the context manager is not always the right tool. That's why I think that your examples point out situations in which the context manager is not a good idea, but I don't think that they demonstrate genuine problems of the "nested" transaction.

That said, I am not trying to introduce nested transactions. I just want a context manager that ensures that a transaction is open. I am happy to give it a better name like ensure_transaction, but otherwise I don't see what your problem with it is.

dimitri-yatsenko · 2015-12-29T22:46:39Z

@fabiansinz No, you misunderstood. Even if you use the context manager in Example 3, you are also assuming that all functions inside the context manager at any depth also use context managers and none of them at any level handle exceptions raised below. Please consider Example 3 carefully. The problem arises from the fact that fun() does not throw exceptions but the user does not know that.

dimitri-yatsenko · 2015-12-29T22:50:56Z

An ensure_transaction has the same problem if the code that it surrounds can call any other function that uses transactions and handles its own exceptions.

Example 4

def fun():
  start_transaction()
  try:
    insert(B1)
    insert(B2) 
  except:
    rollback()
  else:
    commit()


# the author of the following code does not know that fun() uses transactions
with conn.ensure_transaction:
    insert(A)
    fun()
    insert(C)

If insert(B2) fails, then insert(C) will be done without insert(A).

fabiansinz · 2015-12-29T22:58:05Z

Example 4 will cause a rollback automatically because you are trying to open a transaction in a transaction.

Maybe I wasn't clear enough. As I already wrote before, the assumption of the context manager is:

Any error that would cause an inconsistent state should be passed to the context manager.

It does not matter whether the error is re-raised by another context manager or whether it is re-raised by any other exception handling. If the error is not passed to the context manager, the assumption is violated and it will break. You provided enough examples for that.

By now I would like to know whether somebody will accept the pull request with the following updated solution:

I rename the context manager into ensure_transaction
As soon as any of them catches an exception, it will cause a rollback.
I'll make it clear in the documentation of the context manager what its assumptions are.

dimitri-yatsenko · 2015-12-29T23:04:21Z

Ah, I see, my bad. I was oversimplified. The following example will break things. What conventions does it break?

Example 5

def sub()
    with conn.transaction:
       insert(B1)
       insert(B2)

def fun():
  try: 
    sub()
  except: pass

# the author of the following code does not know that fun() uses transactions
with conn.ensure_transaction:
    insert(A)
    fun()
    insert(C)

If insert(B2) fails, then insert(C) will be done without insert(A).

dimitri-yatsenko · 2015-12-29T23:05:28Z

In Example 5, fun() does not produce an inconsistent state.

dimitri-yatsenko · 2015-12-29T23:19:58Z

The correct solution is to never create code that uses transactions in some cases and not others.

It should either use transactions correctly or not use them.

Here is a scenario that is quite realistic and closer to home.

Example 6

For example, if you override insert1 to do something before or after inserting the tuple. Then I write the function insert_either that attempts to insert1(A) and if that fails, insert1(B). Then if I use insert_either() within a _make_tuples call and insert1(A) fails halfway, it will raise an exception to insert_either, which catches the exception and calls insert1(B) instead. Then _make_tuples() returns successfully and commits the entire transaction, leaving insert1(A) in a partially completed state.

dimitri-yatsenko · 2015-12-29T23:23:16Z

No, I don't think I can accept a method that implicitly decides to use transactions in some contexts but not others. If something does not need to be enclosed in a transaction, it should never be enclosed in a transaction. If something does, then it always does. Let's not cheapen transactions by making them break occasionally in ways that will be difficult to debug.

fabiansinz · 2015-12-29T23:27:01Z

What's wrong with the logic:

If transaction not open, open one
If open do nothing and jump on the ongoing transaction.

dimitri-yatsenko · 2015-12-29T23:31:29Z

The code that uses this logic uses transactions in some contexts and not others. "Jumping on the ongoing transaction" is synonymous with "not using a transaction" as you described. So you need to rephrase:

What's wrong with the logic:

Use a transaction in some cases but not others when calling the same function.

Remember, the calling code might not be aware that it's inside a transaction or not. So you might be debugging without a transaction but running inside a transaction.

dimitri-yatsenko · 2015-12-29T23:33:25Z

A transaction is a contract with the calling function "I will execute everything correctly or nothing." This logic breaks that contract, effectively, annulling the function of transactions.

dimitri-yatsenko · 2015-12-29T23:38:37Z

Example 6 above illustrates one such problem that would require extensive knowledge of datajoint's works in order to debug. Instead, we should simply not allow code with transactions to be called from _make_tuples by throwing an error if a nested transaction is attempted.

fabiansinz added 4 commits December 29, 2015 10:08

add new substance

0648bbc

transaction context manager checks for transaction

b46cd05

small bugfix in transaction context manager

6306948

transaction context manager now throws a waring if in transaction alr…

a92924b

…eady

dimitri-yatsenko reviewed Dec 29, 2015
View reviewed changes

fabiansinz closed this Dec 29, 2015

changes in connection #182

changes in connection #182

Uh oh!

Conversation

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko Dec 29, 2015

Choose a reason for hiding this comment

Uh oh!

eywalker Dec 29, 2015

Choose a reason for hiding this comment

Uh oh!

fabiansinz Dec 29, 2015

Choose a reason for hiding this comment

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

eywalker commented Dec 29, 2015

Uh oh!

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Example 3

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Example 4

Uh oh!

fabiansinz commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Example 5

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Uh oh!

dimitri-yatsenko commented Dec 29, 2015

Example 6

Uh oh!

dimitri-yatsenko commented Dec 29, 2015