Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Disable/enable constraints for required tables only #5801

Closed
wants to merge 2 commits into from

Conversation

@akaariai
Copy link
Member

akaariai commented Dec 9, 2015

The idea is to speed up tests on mssql by disabling/enabling constraints only for those tables that have actually changed.

I believe this could give roughly two orders of magnitude better performance for constraint checking. The reason is that currently mssql checks every table for every disable/enable constraint checks call. This is around 1500 tables * 225 calls, that is roughly 330 000 disable/enable cycles. With the patch, the amount of disable/enable cycles is 473. (EDIT: Actually, currently there are roughly 750 calls to disable constraints, of which around 500 are from a loop in tests.serializers.test_data).

With this approach, it is possible that a constraint in some other table might fail if we delete a row from a referenced table, but I don't believe this is a problem for Django's test suite, and it can be a problem in fixture loading only in extreme cases (where one updates an already existing row's column pointed by a row in some other table). Of course, this could be fixed simply by adding the other table to the list of tables to disable/enable constraints for.

To get the speed benefit for mssql, the mssql backend should only disable/enable constraints for the tables required. That is, to see the full speed benefit of this change, the mssql backend needs to be updated.

This is an alternate to #5438.

I am not intending to continue work on this pull request in the immediate future - others are welcome to take over my work.

instance_count = {}
for (func, pk, klass, datum) in test_data:
with connection.constraint_checks_disabled():

This comment has been minimized.

Copy link
@akaariai

akaariai Dec 9, 2015

Author Member

Note that changing the placement of this call outside the loop drops the call count for constraint_checks_disabled() by roughly 500. In other words, this place is responsible for 2/3 of all calls.

@akaariai akaariai changed the title Disable/enable constraints for required tables only [WIP] Disable/enable constraints for required tables only Dec 9, 2015
@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 28, 2015

I've ran the test suite with django-mssql and this patch (mssql requires a couple of minor changes for the tests to run).

The results are very promising, I got a runtime of 1800 seconds (with profiling enabled). I'm currently running the test suite without this patch to compare the runtime, I'll report the results tomorrow.

The patch requires a bit more work, as there are some new failures. I don't believe those to be anything fundamental, instead they just show a couple of places where we need to be a bit more careful and make sure the tables passed to disable_constraints() actually exist.

@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 29, 2015

On my laptop the runtime of master branch without the patch is "only" about an hour (with the patch it was around half an hour). So, a nice improvement, but nothing fundamental.

@@ -432,26 +436,34 @@ def validate_no_broken_transaction(self):
# ##### Foreign key constraints checks handling #####

@contextmanager
def constraint_checks_disabled(self):
def constraint_checks_disabled(self, table_names):

This comment has been minimized.

Copy link
@timgraham

timgraham Dec 29, 2015

Member

For backwards compatibility, should table_names be optional? (at least for the backported versions?)

This comment has been minimized.

Copy link
@manfre

manfre Dec 29, 2015

Contributor

It needs to be optional for backwards compatibility. I can't decide whether it would be better to be optional in master too, or force its usage to specify specific tables to avoid unintentional performance hits.

@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 30, 2015

I'll try to write a conservative version for Django 1.8.

@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 30, 2015

I think I know the real reason for the slowdown.

Django-mssql treats empty list in table_names argument of check_constraints() similarly to None. That is, when Django requests the backend to check constraints for no tables, django-mssql checks all tables. The check for empty list happens a lot in 1.8, so for that reason the tests are very slow.

I'll post a patch to django-mssql separately, and I'll open a separate PR for the work I've done on 1.8.

@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 30, 2015

See #5894 for 1.8 version

@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 30, 2015

We should likely include only the changes in the 1.8 version in master, but without the database feature (that is, all databases must support disable/enable constraints with tables argument).

@akaariai

This comment has been minimized.

Copy link
Member Author

akaariai commented Dec 31, 2015

See #5905 for a minimal yet effective solution to mssql's test speed problem.

@akaariai akaariai closed this Dec 31, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.