[LOWPRI] Why is Orangeboard executing Cypher statements in a transaction? #88
From SO: You should try not to use transactions; open transactions prevent changes to indexes and constraints and increase memory usage. The only reason to use transactions is for the rollback potential; if you want to see what the results of the query are, and maybe undo it depending on those results, then use a transaction. Otherwise use a session. |
Thank you @dkoslicki . Fixed in Orangeboard.py. |
@dkoslicki I've updated Q1Utils.py to remove the transaction statement there. Hope that's OK. |
Hi Steve & David, It's NOT OK to remove the transaction statement. We discussed this issue before.
It's true. This is the so-called auto-commit mode. HOWEVER, it is not guaranteed that the transaction will be created and committed before your program ends.
    with session.begin_transaction() as tx:
        tx.run("query")

is equivalent to

    try:
        context_manager = session.begin_transaction()
        tx = context_manager.__enter__()
        tx.run("query")
    finally:
        context_manager.__exit__()

Usually
Best, |
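The with-block equivalence described in the comment above can be checked without a database. The sketch below is only an illustration: `RecordingTransaction`, `with_form` and `expanded_form` are hypothetical names, not part of the Neo4j driver, and the commit-on-success / rollback-on-exception behaviour of `__exit__` is an assumption about how a driver transaction would typically behave. It also shows a detail the simplified try/finally form leaves out: Python passes the exception details to `__exit__`, so the context manager can tell a clean exit from a failure.

```python
import sys


class RecordingTransaction:
    """Hypothetical stand-in for a driver transaction; it only records events."""

    def __init__(self):
        self.events = []

    def run(self, query):
        self.events.append(("run", query))

    def __enter__(self):
        self.events.append(("enter",))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumed behaviour: commit on a clean exit, roll back if the block raised.
        self.events.append(("rollback",) if exc_type else ("commit",))
        return False  # never swallow the exception


def with_form(tx, query):
    # The usual spelling.
    with tx as t:
        t.run(query)


def expanded_form(tx, query):
    # Roughly what the with-statement does behind the scenes.
    t = tx.__enter__()
    try:
        t.run(query)
    except BaseException:
        # __exit__ receives the exception details and may suppress the error.
        if not tx.__exit__(*sys.exc_info()):
            raise
    else:
        tx.__exit__(None, None, None)


a, b = RecordingTransaction(), RecordingTransaction()
with_form(a, "query")
expanded_form(b, "query")
print(a.events == b.events)
```

Both forms record the same sequence (enter, run, commit), which is the point of the equivalence: the with-block guarantees that `__exit__` runs before control leaves the block.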
In the worst case, we would lose some unbegun transactions when a Python script runs to its end. |
Yao, I switched to auto-commit as we were occasionally seeing some severe slowdowns in pushing content to Neo4j. These slowdowns have gone away after switching to auto-commit. Let’s leave the code as it currently is and discuss transactioning, after Wednesday. It is fine that you opened this issue again, as we do want to (eventually) do what is technically correct. But I am not entirely convinced that we should be using a transaction. Steve |
It's OK if we are not pushing tons of nodes into neo4j right now. However Zheng and I both reproduced errors of nodes not being pushed completely. |
On a related note, we should try transaction vs session in Q1Solution and Q2Solution. For example, when you run |
@erikyao Do you have details about the nodes that weren’t pushed or error messages returned? |
To avoid the slowdowns, we can merge all queries into ONE manually managed transaction. Obviously, not for now. |
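A minimal sketch of the "one manually managed transaction" idea, assuming a session object that exposes `begin_transaction()` as used earlier in this thread. `push_in_one_transaction` is a hypothetical helper, and `FakeSession` / `FakeTransaction` are stand-ins so the example runs without a Neo4j server; they are not driver classes, and the commit-on-exit behaviour they model is an assumption.

```python
class FakeTransaction:
    """Hypothetical stand-in for a driver transaction."""

    def __init__(self, log):
        self.log = log

    def run(self, query):
        self.log.append(query)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumed behaviour: commit on success, roll back on error.
        self.log.append("ROLLBACK" if exc_type else "COMMIT")
        return False


class FakeSession:
    """Hypothetical stand-in for a driver session."""

    def __init__(self):
        self.log = []

    def begin_transaction(self):
        return FakeTransaction(self.log)


def push_in_one_transaction(session, queries):
    """Run every Cypher statement inside a single explicit transaction."""
    with session.begin_transaction() as tx:
        for query in queries:
            tx.run(query)
    # Leaving the with-block without an exception ends the transaction once,
    # after all statements, instead of once per statement.


session = FakeSession()
push_in_one_transaction(session, ["CREATE (:A)", "CREATE (:B)"])
print(session.log)  # ['CREATE (:A)', 'CREATE (:B)', 'COMMIT']
```

The trade-off is that a failure in any statement rolls back the whole batch, and a very large batch keeps one transaction open for the entire push.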
@dkoslicki can you attach a stack backtrace? Better yet maybe a new issue since it is not (yet) clear that it is due to auto-commit? |
@dkoslicki actually there won't be any error when a transaction is lost. In our previous tests of pushing nodes twice, the last push was always lost. |
@erikyao Did you see the SO posting linked by @dkoslicki ? It specifically says not to use a Transaction unless you need the rollback potential. If you have reproducible code showing lost node pushes to Neo4j, please post a MWE here. |
Oh, and make sure you do your testing by pushing nodes to your local Neo4j; we don't want to be modifying the Neo4j on ncats.saramsey.org. |
It's a very subtle bug to find. If you have some other function calls after the last push, and it takes enough time for the auto-commit transaction to be done, everything is OK. |
See issue #120 for the issue with |
@saramsey Yes, I've read that SO thread. I issued a complaint to the python driver developers about the auto-commit mode, like 3 weeks ago. One suggestion I got is:
|
@erikyao So it should show the error if we call the cypher query and then immediately exit, right? Should be straightforward to come up with a MWE then... Correct me if I'm wrong |
Ah, I see @erikyao, in the code I wrote, I always consume the session run results before closing the session, e.g.

    res = session.run(query)
    res = [i for i in res]

So that's why I haven't seen the issue you mention come up. |
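The pattern described above (run the query, read every record, then close the session) could be wrapped in a small helper. This is a sketch under assumptions: `run_and_consume` is a hypothetical name, it expects a driver object offering `session()`, with `run()` and `close()` on the session as used in this thread, and `FakeDriver` / `FakeSession` are stand-ins so the example runs without a Neo4j server.

```python
class FakeSession:
    """Hypothetical stand-in for a driver session."""

    def __init__(self, records):
        self.records = records
        self.closed = False

    def run(self, query):
        # A real result is fetched lazily; an iterator imitates that here.
        return iter(self.records)

    def close(self):
        self.closed = True


class FakeDriver:
    """Hypothetical stand-in for a driver object."""

    def __init__(self, records):
        self.last_session = None
        self.records = records

    def session(self):
        self.last_session = FakeSession(self.records)
        return self.last_session


def run_and_consume(driver, query):
    """Run one auto-commit query, read every record, then close the session."""
    session = driver.session()
    try:
        result = session.run(query)
        # Exhausting the result means the query has finished before we return.
        return [record for record in result]
    finally:
        # Close the session even if the query raised.
        session.close()


driver = FakeDriver(records=[{"n": 1}, {"n": 2}])
rows = run_and_consume(driver, "MATCH (n) RETURN n")
print(rows, driver.last_session.closed)
```

Doing both steps in one place means a caller cannot exit between sending the query and reading its result, which is the situation discussed in this thread.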
@erikyao you said
You said always lost. Now you're saying it is a subtle bug to trip. I am confused. Furthermore, your quote from the driver devs:
... states that the transaction is guaranteed to post if we close the session. So we just need to add a python statement closing the session.... right? |
@dkoslicki you are right. If you consumed the result from your query, it would be fine. |
@erikyao so can you post the code that always caused the issue? Otherwise, I assume we could trip the error with something like: |
@erikyao Once again, if you have code that consistently trips this bug, please post it so we can reproduce the problem using local Neo4j. |
@saramsey @dkoslicki let me look for my commits first. Please wait a second. |
@saramsey @dkoslicki I am writing a new test since I cannot find my previous one. I'll test by myself first. Should be quick. |
Frustratingly, I cannot reproduce this error with the latest code. Previously we had only one session to manage all the transactions (whether managed manually or not); now we create a new session every time we run a query. I think it would suffer from the same problem, especially when the transaction is very large and lagging behind. However, I cannot run big tests like before on my workstation because of memory issues. I'll look deeper into it tonight. |
@erikyao any luck in tracking down this MWE? |
Yao,
Do we need the begin_transaction() call in this line in Orangeboard? (line 429)
Note: please do not modify Orangeboard without extensive testing including pushing a large graph from Orangeboard to a local Neo4j. And we are currently using ncats.saramsey.org and lysine.ncats.io for integration testing. So this Issue can wait until after the 11/29 demo.
Steve