fix: retry loop for exception caused by deadlock on badge node #404

allisonsuarez · 2020-11-09T21:28:54Z

Summary of Changes

Explanation from @dikshathakur3119 :
We index Hive tables using index_table_mateadata, and to speed up indexing of all the Hive tables we use multiprocessing.
When a relationship is created, its endpoint nodes are write-locked. If multiple threads attempt to create relationships involving the same set of endpoint nodes, deadlocks can occur.
Previously this deadlock could not occur because we create threads per schema name, so the same table node was never processed from different threads.
But we recently added a new model, Badge, and we create a single badge node for each unique badge name, whether it is a table-level or a column-level badge. E.g., there will be only one node with the badge name ‘deprecated’ or ‘partition column’. We have n:n relationships between badge and table and between badge and column. Now, when we index table metadata, our tasks try to access the same badge node from different threads, which causes a deadlock.
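
As a rough illustration of the retry approach named in the title, here is a minimal sketch. It is not the code from this PR: `DeadlockDetected`, `run_with_deadlock_retry`, `max_retries`, and `base_delay_sec` are hypothetical names, and the exception is a stand-in for whatever transient deadlock error the Neo4j driver raises in the publisher.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the driver's transient deadlock error."""


def run_with_deadlock_retry(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay_sec: float = 0.05,
) -> T:
    """Run `operation`, retrying up to `max_retries` times on a deadlock.

    Other exceptions propagate immediately. If the deadlock persists after
    the last retry, the deadlock exception is re-raised to the caller.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except DeadlockDetected:
            if attempt >= max_retries:
                raise
            # Exponential backoff with jitter, so competing workers are less
            # likely to collide on the same badge node again at the same time.
            delay = base_delay_sec * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
            attempt += 1
```

With this shape, a statement that writes a relationship to a shared badge node would be wrapped in `operation`. A task can still fail if the deadlock outlasts every retry, so the retry count is a trade-off between resilience and ingestion time.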

Tests

N/A

Documentation

N/A

CheckList

Make sure you have checked all steps below to ensure a timely review.

- PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- PR includes a summary of changes.
- PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and classes in the PR contain docstrings that explain what they do.
- PR passes `make test`.

allisonsuarez · 2020-11-10T23:10:25Z

I have some open questions in the code comments that I need help with, so please leave comments with your thoughts. @dikshathakur3119 @feng-tao @jinhyukchang

databuilder/publisher/neo4j_csv_publisher.py

dikshathakur3119

Mostly looks good to me, and it solves the issue we are currently facing when we try to index Hive tables in multiple processes. Do you think we should add a note to our badge documentation so people understand why a node ingestion task may sometimes fail, even after retries, when parallel tasks try to add the same Badge node?

databuilder/publisher/neo4j_csv_publisher.py

stale · 2020-11-26T09:06:17Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

feng-tao · 2020-12-02T17:53:56Z

CI fails?

allisonsuarez added 3 commits November 9, 2020 13:25

renamed for spellcheck, and added some logic to handle deadlock issue…

9c17a67

… with badge node, still a draft Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

fixed lint issues

c00b392

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

resolved merge conflict

b7dba75

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

allisonsuarez force-pushed the asm-deadlock-fix branch from 05c2b8e to b7dba75 Compare November 10, 2020 21:48

allisonsuarez added 2 commits November 10, 2020 14:39

logic oopsie

d8a0721

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

lint...

fd32406

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

allisonsuarez marked this pull request as ready for review November 10, 2020 23:08

allisonsuarez requested review from dikshathakur3119, feng-tao and jinhyukchang as code owners November 10, 2020 23:08

feng-tao reviewed Nov 12, 2020

databuilder/publisher/neo4j_csv_publisher.py (outdated)

dikshathakur3119 reviewed Nov 12, 2020

databuilder/publisher/neo4j_csv_publisher.py (outdated)

stale bot added the "stale" label (stalebot believes this issue/PR is no longer active) Nov 26, 2020

changed retries removed comments

f2d0541

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

stale bot removed the "stale" label (stalebot believes this issue/PR is no longer active) Nov 30, 2020

allisonsuarez added 2 commits November 30, 2020 10:09

Merge branch 'master' of github.com:amundsen-io/amundsendatabuilder i…

4823160

…nto asm-deadlock-fix

added config key and set

ac55d01

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

allisonsuarez requested review from feng-tao and dikshathakur3119 November 30, 2020 21:35

lint

5564526

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

feng-tao reviewed Dec 1, 2020

databuilder/publisher/neo4j_csv_publisher.py (outdated)

feng-tao reviewed Dec 1, 2020

databuilder/publisher/neo4j_csv_publisher.py (outdated)

feng-tao reviewed Dec 1, 2020

databuilder/publisher/neo4j_csv_publisher.py (outdated)

constants and rename

054c132

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

allisonsuarez requested a review from feng-tao December 1, 2020 16:46

feng-tao added the "keep fresh" label (disables stalebot from closing an issue) Dec 2, 2020

feng-tao approved these changes Dec 3, 2020

im a dumb dumb

a0fb4a1

Signed-off-by: Allison Suarez Miranda <asuarezmiranda@lyft.com>

allisonsuarez merged commit 9fd1513 into master Dec 3, 2020

allisonsuarez deleted the asm-deadlock-fix branch December 3, 2020 20:53
