Skip to content

Conversation

@fabiansinz
Copy link
Contributor

  • Add max_allowed_packet to conn_info since pymysql does not fetch that info from the server.
  • transaction context manager only starts transaction if not in a transaction already

@dimitri-yatsenko
Copy link
Member

MySQL and SQL more generally do not support nested transactions.

Users should not get to the situation when they need to check whether they are inside a transaction because code that runs within transactions should never start new transactions. No syntactic clutter. If they attempt to start a transaction from within a transaction, they should get a hard error not a warning. This allows them to realize their mistake and decide what to do with the ongoing transaction.

The risk of ignoring the start of an inner transaction is in incorrect execution of the outer transaction. When the inner transaction commits, it will prematurely commit the outer transaction.

Throwing an error and preventing entering nested transactions is the only cogent behavior in this case.

@eywalker
Copy link
Contributor

Just curious: what's the significance of max_allowed_packet here?

@eywalker
Copy link
Contributor

I'm also concerned about the proposed handling of the transaction. When a person writes transaction A within transaction B and if all the code pertaining to the inner transaction completes but the outer transaction hits an error and stops, they may falsely assume that the queries pertaining to the inner transaction are committed, despite the fact this is not the case. And I don't think the warning message will be enough to make people aware of this issue.

@dimitri-yatsenko
Copy link
Member

Anything other than throwing an error may lead to data loss and/or corruption.

@fabiansinz
Copy link
Contributor Author

@eywalker: I think this would actually be the correct behaviour. If you start a transaction you want it rolled back if anything his wrong. Even if part of the code runs through.
@dimitri-yatsenko can you make a convincing example for that claim?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm also curious what this does? I'm guessing that you have encountered a problem with a large query hitting the packet size limit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I encountered the broken pipe issues in pymysql, I found that pymysql has its own max_allowed_packet for some reason that does not reflect the servers max_allowed_packet. Therefore I set it to the maximally allowed value by MySQL to let the server settings decide which packet sizes are accepted. Alternatively, a cleaner solution would be to put that value in dj.config.

@eywalker
Copy link
Contributor

Yes, but what I'm saying is that if a person has code and assumes that they have an inner transaction, they would expect that the completion of the inner transaction code means that the inner transaction is fully committed, even if the rest of the outer transaction fails afterwards. Obviously this is not true because there is only one level of transaction and any inner transaction calls will be ignored with the current implementation. However, even allowing such nested transaction calls to be executed may lead users to have incorrect assumptions about how transactions are handled. Since there is no such thing as nested transaction, I think it is safer to throw an error upon any attempt of nested transaction.

@eywalker
Copy link
Contributor

I'm also generally curious what kind of use case results in a potential call to start_transaction when there is already a transaction opened?

@dimitri-yatsenko
Copy link
Member

Users should never allow for nested transactions in their code because they are not supported. If they allow such a situation, they should get an error and correct their code.

Here is a pseudocode that would produce data loss or corruption if you do anything other than throw an error:

start_transaction 1
    insert(A)
    start_transaction 2
        insert(B)
    commit or rollback 2
    insert(C)  
commit or rollback 1

@eywalker
Copy link
Contributor

But he is suggesting that inner transaction do NOT actually insert start and commit transaction statements. So given that no data corruption will occur per se. If an error occurs anywhere, everything rolls back.

On Dec 29, 2015, at 12:58 PM, Dimitri Yatsenko notifications@github.com wrote:

Users should never allow for nested transactions in their code because they are not supported. If they allow such a situation, they should get an error and correct their code.

Here is a pseudocode that would produce data loss or corruption if you do anything other than throw an error:

start_transaction 1
insert(A)
start_transaction 2
insert(B)
commit or rollback 2
insert(C)
commit or rollback 1

Reply to this email directly or view it on GitHub.

@dimitri-yatsenko
Copy link
Member

In the pseudocode above, if the inner transaction decides to rollback its transaction, it will rollback insert(A), resulting in the unnecessary loss of A and failure to insert B.

If instead the user got an error when starting the inner transaction, they could decide to commit A or rollback the outer transaction. They would be forced to rewrite the code correctly.

@eywalker
Copy link
Contributor

I think whether that's unnecessary depends right? If the outer transaction wants to ensure that the inner statement insert(B) to complete for it to say all of its queries are good and thus commit, it only make sense that error within causes roll back of everything outer including insert(A)

Now that being said, this behavior is not one would expect of "nested transactions", and given MySQL doesn't support nested transaction this is an expected problem...

On Dec 29, 2015, at 1:21 PM, Dimitri Yatsenko notifications@github.com wrote:

So in the pseudocode above, if the inner transaction decides to rollback its transaction, it will rollback insert(A), resulting in the unnecessary loss of A.


Reply to this email directly or view it on GitHub.

@dimitri-yatsenko
Copy link
Member

@eywalker, Yes, if the inner transaction rollbacked, it should not rollback the outer transaction. Otherwise, it would be equivalent to a single outer transaction. The inner transaction should not commit either since the outcome of the outer transaction is still pending. The inner transaction cannot be considered a transaction in any sense and should not be allowed.

@eywalker
Copy link
Contributor

Hmm it sounds like you are suggesting that the inner transaction should act like a thread fork that basically should not affect the parent thread? I guess at least you are suggesting that the success or failure of the inner transaction should not affect that of the outer one? I thought it made sense to say that outer transaction required then success of the inner one.

I totally agree ( and I have had since the beginning) that we cannot have a meaningful inner transaction since MySQL don't support it but I'm also not sure what the behavior of a nested transaction would be should MySQL support it... Without a good idea what a real nested transaction should be like I'm having hard time telling what should be the expected behavior when the inner transaction fails.

This being said I have a feeling that @fabee is considering quite a different use case/perspective in wanting this implementation.

On Dec 29, 2015, at 1:32 PM, Dimitri Yatsenko notifications@github.com wrote:

@Eyewalker, Yes, if the inner transaction rollbacked, it should not rollback the outer transaction. Otherwise, it would be equivalent to a single outer transaction. The inner transaction should not commit either since the outcome of the outer transaction is still pending. The inner transaction cannot be considered a transaction in any sense and should not be allowed.


Reply to this email directly or view it on GitHub.

@fabiansinz
Copy link
Contributor Author

Ok, maybe to clarify: I put in that behaviour, because I decorated insert1 to also insert additional logging information (about git) into another table. I don't know who calls insert1 and whether that is in a transaction or not, but I want make sure that the original tuple and the additional information only get inserted together.

Regarding nested transactions, I am not convinced by your arguments for the following reason: If I start a transaction, I want that anything that gets inserted during that transaction is either inserted together or not at all (different to @dimitri-yatsenko expected behaviour, in line with @eywalker expected behaviour). If the "inner" transaction fails, then parts of the data could not be inserted. Therefore, the outer transaction should fail as well. It that sense, @dimitri-yatsenko is right: A nested transaction does not make sense. What you really want is:

  • if a transaction is already open, do nothing and just keep the outer transaction
  • if there is no open transaction, open one.

This is exactly what connection.transaction does.

I do not see how this can lead to data loss or corruption, because the alternative in @dimitri-yatsenko example would have been to run insert(B) without transaction which, if it fails, would rollback the outer transaction anyway. The transaction context manager implements exactly that behaviour.

@dimitri-yatsenko
Copy link
Member

@fabiansinz, Using the pseudocode above, if transaction 2 rollsback, what will happen to insert(C)?

@fabiansinz
Copy link
Contributor Author

It will not be inserted as it should because the outer transaction is only complete if A, B, and C are inserted. B fails so C does not need to be inserted.

@dimitri-yatsenko
Copy link
Member

Exactly. How does the outer transaction know not to insert(C)?

@dimitri-yatsenko
Copy link
Member

okay, @eywalker just explained to me that our disagreement stemmed from the fact that @fabiansinz entangled exception handling with transaction handling. So here is an example that disentangles them and exposes problems with the suggested solution:

def fun():
  try:
    with conn.transaction:
      insert(B1)
      insert(B2)
  except:
      handle_all_my_errors     


# the following code does not know that fun() uses transactions
with conn.transaction:
    insert(A)
    fun()
    insert(C)

Does this clarify why emulating nested transactions is a bad idea? What will happen if insert(B2) fails?

@fabiansinz
Copy link
Contributor Author

Yes, this "entanglement" is how this context manager works. If you don't want that behaviour, don't use it, but start_transaction etc. instead.

Regarding your example, I am not convinced because it is basically a misuse of the context manager. You could have made the example just like

with conn.transaction:
    insert(A)
    try:
         insert(C)
    except: 
         # handle all my errors

and caused a inconsistent state because of the misuse of the context manager.

Alternatively, you can run into the same situation without using the context manager at all.

def fun():
  try:
      insert(B1)
      insert(B2)
  except:
      # handle_all_my_errors     

start_transaction()
try:
    insert(A)
    fun()
    insert(C)
except:
    rollback()
else:
    commit_transaction()

Same inconsistent state because the outer try does not know that fun catches exceptions.

You did convince me though that a failure in the inner transaction should already rollback the outer transaction. This would solve your example.

If you are unhappy with potential misunderstandings, what about naming the context manager ensure_transaction?

@dimitri-yatsenko
Copy link
Member

@fabiansinz , No, in my example above the errors were handled outside the context manager in fun() so there was no abuse as in your counterexample. Am I not allowed to handle errors rethrown by the context manager?

@dimitri-yatsenko
Copy link
Member

In your second example fun() is not using a transaction, but I stated that the main function that used it does not know what fun() does and the user should not be forced to trace through all functions that she uses in order to ensure that there are no transactions with handled exceptions.

The whole point of my example is that fun() uses transactions and handles its own errors.

@dimitri-yatsenko
Copy link
Member

Here is a clearer example then:

Example 3

def fun():
  start_transaction()
  try:
    insert(B1)
    insert(B2)
  except:
    rollback()
  else:
    commit()


# the author of the following code does not know that fun() uses transactions
with conn.transaction:
    insert(A)
    fun()
    insert(C)

@dimitri-yatsenko
Copy link
Member

I think the main assumption that you are making is that any subroutine that rollbacks a transaction also raises an exception immediately after. This cannot generally be true. Rolling back a transaction can be handled by subroutines without raising an exception all the way up the entire calling stack. We should be able to create routines with transactions that take care of their own exceptions.

@fabiansinz
Copy link
Contributor Author

This is exactly what I am assuming with the context manager. It is the same logic as implemented in populate and _make_tuples. You are right, there are more general cases, so the context manager is not always the right tool. That's why I think that your examples point out situations in which the context manager is not a good idea, but I don't think that they demonstrate genuine problems of the "nested" transaction.

That said, I am not trying to introduce nested transactions. I just want a context manager that ensures that a transaction is open. I am happy to give it a better name like ensure_transaction, but otherwise I don't see what your problem with it is.

@dimitri-yatsenko
Copy link
Member

@fabiansinz No, you misunderstood. Even if you use the context manager in Example 3, you are also assuming that all functions inside the context manager at any depth also use context managers and none of them at any level handle exceptions raised below. Please consider Example 3 carefully. The problem arises from the fact that fun() does not throw exceptions but the user does not know that.

@dimitri-yatsenko
Copy link
Member

An ensure_transaction has the same problem if the code that it surrounds can call any other function that uses transactions and handles its own exceptions.

Example 4

def fun():
  start_transaction()
  try:
    insert(B1)
    insert(B2) 
  except:
    rollback()
  else:
    commit()


# the author of the following code does not know that fun() uses transactions
with conn.ensure_transaction:
    insert(A)
    fun()
    insert(C)

If insert(B2) fails, then insert(C) will be done without insert(A).

@fabiansinz
Copy link
Contributor Author

Example 4 will cause a rollback automatically because you are trying to open a transaction in a transaction.

Maybe I wasn't clear enough. As I already wrote before, the assumption of the context manager is:

Any error that would cause an inconsistent state should be passed to the context manager.

It does not matter whether the error is re-raised by another context manager or whether it is re-raised by any other exception handling. If the error is not passed to the context manager, the assumption is violated and it will break. You provided enough examples for that.

By now I would like to know whether somebody will accept the pull request with the following updated solution:

  • I rename the context manager into ensure_transaction
  • As soon as any of them catches an exception, it will cause a rollback.
  • I'll make it clear in the documentation of the context manager what its assumptions are.

@dimitri-yatsenko
Copy link
Member

Ah, I see, my bad. I was oversimplified. The following example will break things. What conventions does it break?

Example 5

def sub()
    with conn.transaction:
       insert(B1)
       insert(B2)

def fun():
  try: 
    sub()
  except: pass

# the author of the following code does not know that fun() uses transactions
with conn.ensure_transaction:
    insert(A)
    fun()
    insert(C)

If insert(B2) fails, then insert(C) will be done without insert(A).

@dimitri-yatsenko
Copy link
Member

In Example 5, fun() does not produce an inconsistent state.

@dimitri-yatsenko
Copy link
Member

The correct solution is to never create code that uses transactions in some cases and not others.

It should either use transactions correctly or not use them.

Here is a scenario that is quite realistic and closer to home.

Example 6

For example, if you override insert1 to do something before or after inserting the tuple. Then I write the function insert_either that attempts to insert1(A) and if that fails, insert1(B). Then if I use insert_either() within a _make_tuples call and insert1(A) fails halfway, it will raise an exception to insert_either, which catches the exception and calls insert1(B) instead. Then _make_tuples() returns successfully and commits the entire transaction, leaving insert1(A) in a partially completed state.

@dimitri-yatsenko
Copy link
Member

No, I don't think I can accept a method that implicitly decides to use transactions in some contexts but not others. If something does not need to be enclosed in a transaction, it should never be enclosed in a transaction. If something does, then it always does. Let's not cheapen transactions by making them break occasionally in ways that will be difficult to debug.

@fabiansinz
Copy link
Contributor Author

What's wrong with the logic:

  • If transaction not open, open one
  • If open do nothing and jump on the ongoing transaction.

@dimitri-yatsenko
Copy link
Member

The code that uses this logic uses transactions in some contexts and not others. "Jumping on the ongoing transaction" is synonymous with "not using a transaction" as you described. So you need to rephrase:

What's wrong with the logic:

  • Use a transaction in some cases but not others when calling the same function.

Remember, the calling code might not be aware that it's inside a transaction or not. So you might be debugging without a transaction but running inside a transaction.

@dimitri-yatsenko
Copy link
Member

A transaction is a contract with the calling function "I will execute everything correctly or nothing." This logic breaks that contract, effectively, annulling the function of transactions.

@dimitri-yatsenko
Copy link
Member

Example 6 above illustrates one such problem that would require extensive knowledge of datajoint's works in order to debug. Instead, we should simply not allow code with transactions to be called from _make_tuples by throwing an error if a nested transaction is attempted.

@fabiansinz fabiansinz closed this Dec 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants