fix: retry loop for exception caused by deadlock on badge node #404
… with badge node, still a draft Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>
05c2b8e to b7dba75
I have some open questions in the code comments that I need help with, so please leave your thoughts there: @dikshathakur3119 @feng-tao @jinhyukchang
Mostly looks good to me, and it solves the issue we currently face when we index Hive tables in multiple processes. Do you think we should add a note to our badge documentation so people understand why a node-ingestion task may sometimes fail even after retries when parallel tasks try to add the same Badge node?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
…nto asm-deadlock-fix
CI fails?
Summary of Changes
Explanation from @dikshathakur3119:
We index Hive tables using index_table_metadata and, to expedite indexing all the Hive tables, we leverage multiprocessing.
When a relationship is created, its endpoint nodes are write-locked. If multiple threads attempt to create relationships involving the same set of endpoint nodes, deadlocks can occur.
Previously this deadlock could not occur because we created threads per schema name, so the same table node was never processed from different threads.
But we recently added a new model, Badge, and we create a single badge node per unique badge name, whether it is a table-level or a column-level badge. E.g. there will be only one node with the badge name ‘deprecated’ or ‘partition column’. We have n:n relationships between badge-table and badge-column. Now, when we index table metadata, our tasks try to access the same badge node from different threads, which causes a deadlock.
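For readers who want the general shape of the fix, here is a minimal sketch of a retry loop around a write that can hit a deadlock. It is not the code from this PR: `DeadlockDetectedError`, `run_with_retry`, `merge_badge_relation`, and the backoff parameters are all hypothetical names and values chosen for illustration, and the exception class stands in for whatever transient error the graph database driver actually raises.

```python
import random
import time


class DeadlockDetectedError(Exception):
    """Hypothetical stand-in for the transient error raised on a deadlock."""


def run_with_retry(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a deadlock error, back off and retry.

    Re-raises the last error once max_attempts is exhausted, so a task
    can still fail if contention on the shared node persists.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockDetectedError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter, so competing workers
            # are less likely to collide again on the next attempt.
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
            sleep(delay)


# Usage: a fake write that deadlocks twice before succeeding.
attempts = {"count": 0}


def merge_badge_relation():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise DeadlockDetectedError("badge node is locked by another transaction")
    return "created"


# sleep is replaced with a no-op here so the example runs instantly.
result = run_with_retry(merge_badge_relation, sleep=lambda _: None)
```

Because the retry count is bounded, a task may still fail when many parallel workers keep writing to the same badge node; the loop only makes that less likely.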
Tests
N/A
Documentation
N/A
CheckList
Make sure you have checked all steps below to ensure a timely review.
make test