Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect cascading deletes in cases of non-primary foreign keys #4

Closed
dimitri-yatsenko opened this issue Jan 3, 2013 · 7 comments
Closed
Assignees
Labels

Comments

@dimitri-yatsenko
Copy link
Member

Currently, DataJoint implements its own cascading deletes instead of relying on MySQL's native cascading deletes.

This solution

  • allows reviewing the contents to be deleted
  • compensates for MySQL's multiple-path problem for cascading deletes

dj.BaseRelvar/del generates the lists of all dependent tables and deletes from them, starting from the bottom. Each delete is restricted by the top relation. This rule works well when all foreign key fields are also primary key fields in the tables involved but can result in deleting non-dependent tuples if one of the tables along the cascade includes non-primary fields in its foreign key.

This issue may be fixed by simply not cascading down from any table that does not include all its foreign key fields in their primary key.

@ghost ghost assigned dimitri-yatsenko Jan 3, 2013
@aecker
Copy link
Contributor

aecker commented Jan 3, 2013

Can you give an example? I don't understand how something can have a foreign key constraint but be non-dependent.

Does your solution imply that under certain conditions the delete will not cascade properly and one will have to issue multiple deletes starting from different tables?

@dimitri-yatsenko
Copy link
Member Author

Here is an example with three tables x.A, x.B, and x.C.

%{
x.A (manual)   # test dummy
a  : int  #  A's id
%}

%{
x.B (manual)  #  test dummy
b  : int  #  B's id
----
-> x.A
%}

%{
x.C (manual)  # test dummy
-> x.B
%}

Currently, del(x.A & 'a=1') will first recurse down the hierarchy to delete x.C & (x.A & 'a=1') and x.B & (x.A & 'a=1'). Since x.A and x.C do not share any fields in common, deleting x.C & (x.A & 'a=1') will delete everything in x.C.

The problem arises because x.B makes a non-primary reference to x.A. New users rarely use non-primary references, so the problem hasn't come up despite being there all along.

Proposed fix

I will fix this by making the delete explicitly refuse to cascade across non-primary references. I just need to come up with a very clear message, something like "Table x.B has dependent tuples. Please delete them first before deleting from x.A".

An alternative solution is to rely on MySQLs cascading delete functionality. Unfortunately, it does not seem to provide a way to list the deleted contents before committing. It also has problems resolving multiple dependence paths. So for now we are forced to use our own cascading mechanisms.

@aecker
Copy link
Contributor

aecker commented Jan 5, 2013

Can we not use the correct restriction for cascading in such a case? Wouldn't x.C & (x.B & (x.A & 'a=1')) work? In other words, every time we encounter a non-PK reference, we use that table for any restrictions down the hierarchy. Potential caveat: restrictions can become quite complex if the hierarchy is deep. However, the complexity should depend on the number of non-PK references, which are usually small. Am I missing something?

@dimitri-yatsenko
Copy link
Member Author

Yes, that solution would work correctly but in many cases could be prohibitively slow. We do provide non-cascading delete dj.Relvar/delQuick, which could be used in cases where the overhead associated with cascading deletes is excessive.

So we have three alternatives:

  1. A solution that relies on MySQL's cascade functionality. Fast and correct but has issues. Requires upgrading existing schema to use ON DELETE CASCADE.
  2. Refuse to cascade across non-PK references. Prompts the user to explicitly delete from the dependent table.
  3. Cascade across non-PK reference by adjusting the restriction. Correct but potentially slow.

Does option 3 make the most sense?

@aecker
Copy link
Contributor

aecker commented Jan 6, 2013

Yes I think so. We can issue a warning that this may be slow but I would argue it's the logical solution since it doesn't remove any functionality.

Also, think about what you would do if DJ told you it can't cascade the delete. You'd manually issue the exact query option 3 would do, in which case there would be no speed benefit for option 2 anyway.

I'm pretty sure option 1 isn't going to work in pretty much any of our schemas.

@dimitri-yatsenko
Copy link
Member Author

okay, implementing solution 3.

dimitri-yatsenko added a commit that referenced this issue Jan 25, 2013
@dimitri-yatsenko
Copy link
Member Author

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants