
Do not blindly delete duplicate schedules #389

Merged: 5 commits into celery:master on Mar 3, 2021

Conversation

@izimobil (Contributor) commented Mar 2, 2021

Because there are no unique constraints on the schedule models (except for
SolarSchedule), the schedule tables can contain duplicates. Those duplicate
schedules should not be blindly deleted when encountered, because they can be
linked to existing tasks, and the tasks would then be deleted as well
(on-delete cascade).

Instead, we now just return the first duplicate found, to avoid creating
further duplicates.

This fixes issue #322.
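A minimal sketch of the pattern this change targets, using a hypothetical IntervalSchedule-like model (the real from_schedule methods in django-celery-beat build their spec in a more involved way; the model and field names below are illustrative assumptions):

```python
from django.core.exceptions import MultipleObjectsReturned
from django.db import models


class IntervalScheduleSketch(models.Model):
    """Hypothetical stand-in for django-celery-beat's IntervalSchedule."""

    every = models.IntegerField()
    period = models.CharField(max_length=24)

    @classmethod
    def from_schedule(cls, schedule, period="seconds"):
        # 'schedule' is assumed to be a celery.schedules.schedule whose
        # run_every attribute is a timedelta.
        spec = {"every": int(schedule.run_every.total_seconds()),
                "period": period}
        try:
            return cls.objects.get(**spec)
        except cls.DoesNotExist:
            return cls(**spec)  # unsaved instance; the caller persists it
        except MultipleObjectsReturned:
            # Old behaviour: cls.objects.filter(**spec).delete() followed by
            # return cls(**spec); the delete cascades to every PeriodicTask
            # that points at one of the duplicate rows.
            # New behaviour: keep the duplicates and reuse one of them. This
            # branch only runs when at least two rows matched, so .first()
            # is guaranteed to return an instance, never None.
            return cls.objects.filter(**spec).first()
```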

@auvipy (Member) commented Mar 2, 2021

can you check if this PR https://github.com/celery/django-celery-beat/pull/269/files is somewhat related?

@izimobil (Contributor, Author) commented Mar 2, 2021

> can you check if this PR https://github.com/celery/django-celery-beat/pull/269/files is somewhat related?

It is related in the sense that it would prevent future duplicates. The idea of adding unique constraints is good, but there should be a warning in the changelog because it's not fully backwards compatible: existing code may try to create duplicate schedules (which currently works) and get an IntegrityError exception once the constraints are added.

Furthermore, because existing databases can already contain duplicate schedules, the from_schedule methods should not be modified.

IMO you can merge my PR first (the bug is really critical in the sense that it deletes existing tasks!), release a bugfix version, and add the constraints in a future major version.

(If that PR is reworked to only add constraints, the tests in my PR will have to be removed.)
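As a hedged illustration of the backwards-compatibility concern above: SolarSchedule is the one model that already enforces uniqueness via unique_together, so a second identical create raises IntegrityError instead of silently inserting a duplicate row; adding the same kind of constraint to the other schedule models would make code that currently creates duplicates fail the same way. The test class name, event, and coordinates below are made up for the example:

```python
from django.db import IntegrityError, transaction
from django.test import TestCase

from django_celery_beat.models import SolarSchedule


class UniqueConstraintBehaviourSketch(TestCase):
    def test_duplicate_solar_schedule_raises(self):
        spec = {"event": "sunrise", "latitude": "48.85", "longitude": "2.35"}
        SolarSchedule.objects.create(**spec)
        # With the unique_together constraint in place, the second identical
        # create raises IntegrityError rather than inserting a duplicate.
        # transaction.atomic() keeps the test's outer transaction usable
        # after the expected failure.
        with self.assertRaises(IntegrityError), transaction.atomic():
            SolarSchedule.objects.create(**spec)
```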

@auvipy (Member) commented Mar 2, 2021

@arnau126 can you check this please?

@auvipy (Member) left a comment:

Hope return cls.objects.filter(**spec).first() won't create any regression, since it is a bug fix.

@izimobil (Contributor, Author) commented Mar 2, 2021

> Hope return cls.objects.filter(**spec).first() won't create any regression, since it is a bug fix.

Frankly, it can't be worse than it is now. I had tasks disappear in my production environment, and that is really difficult to repair when you have hundreds of tasks.

Also, since cls.objects.filter(**spec).first() is in the block handling the MultipleObjectsReturned exception, you are guaranteed to retrieve a schedule instance and not None.

@auvipy (Member) commented Mar 2, 2021

Just waiting for another pair of eyes to cross-check; will merge soon after that.

@hartwork (Contributor) left a comment:

Hi!

This improves over the existing code for sure. 👍
What I am wondering is:

  • Why are we using cls() over cls.objects.create(), so that those functions sometimes return model instances that are persisted already and sometimes return instances that are not yet persisted in the database? Is that a bug or a feature?
  • It seems like we effectively have a tailor-made get_or_create here. Why not use what Django offers and simplify most of this code away for good? (A hedged sketch of that idea follows after this comment.)
  • The added test cases are effectively 3 times the same test. Personally, I would use parameterized.expand and a single parametrized test to avoid duplication, but it's just my two cents and I'm aware there are multiple schools of thought on that subject.

Best, Sebastian
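A hedged sketch of the get_or_create idea from the second bullet; this is not the code that was merged, and the model is a hypothetical stand-in. Note two semantic differences: get_or_create always persists the row (whereas the current code sometimes returns an unsaved instance), and it does not resolve pre-existing duplicates (with several matching rows it raises MultipleObjectsReturned):

```python
from django.db import models


class IntervalScheduleGetOrCreateSketch(models.Model):
    """Hypothetical stand-in for django-celery-beat's IntervalSchedule."""

    every = models.IntegerField()
    period = models.CharField(max_length=24)

    @classmethod
    def from_schedule(cls, schedule, period="seconds"):
        spec = {"every": int(schedule.run_every.total_seconds()),
                "period": period}
        # Returns the existing row when exactly one matches, creates (and
        # saves) one otherwise; no duplicates are created or deleted.
        obj, _created = cls.objects.get_or_create(**spec)
        return obj
```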

# create 2 duplicate schedules
sched1 = CrontabSchedule.objects.create(hour="4")
CrontabSchedule.objects.create(hour="4")
self.assertEqual(CrontabSchedule.objects.count(), 2)
@hartwork (Contributor):

Just an idea:
I think adding .filter(hour="4") here and in line 93 would be a cheap way to make this test more robust to any potential cross-test effects. It would help fight future flakiness. What do you think?
PS: This applies to the other two tests as well.
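A hedged sketch of what that suggestion could look like for the CrontabSchedule test (the test class and method names are hypothetical):

```python
from django.test import TestCase

from django_celery_beat.models import CrontabSchedule


class CrontabDuplicateSketch(TestCase):
    def test_duplicate_schedules_scoped_count(self):
        CrontabSchedule.objects.create(hour="4")
        CrontabSchedule.objects.create(hour="4")
        # Scoping the count to hour="4" keeps schedules created by other
        # tests or fixtures from affecting the assertion.
        self.assertEqual(CrontabSchedule.objects.filter(hour="4").count(), 2)
```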

@izimobil (Contributor, Author):

Good idea

Comment on lines 121 to 124:

def test_duplicate_schedules(self):
    # Duplicates cannot be tested for solar schedules because of the
    # unique constraints in the SolarSchedule model
    pass
@hartwork (Contributor):

I'm not a big fan of adding dead code, to be honest. There are other things that we cannot test for either, right? Does this code really add value?

@izimobil (Contributor, Author):

See next comment

@hartwork (Contributor):

I checked all. Which one?

@izimobil (Contributor, Author):

Given that the constraint has been there from the very beginning (ec46367), we can indeed drop the MultipleObjectsReturned block (as well as the test method, of course).

@hartwork (Contributor):

So the test would be dropped as well, excellent. 👍

Comment on lines 124 to 127:

 except MultipleObjectsReturned:
-    cls.objects.filter(**spec).delete()
-    return cls(**spec)
+    # unique_together constraint should not permit reaching this code,
+    # but just in case, we return the first schedule found
+    return cls.objects.filter(**spec).first()
@hartwork (Contributor):

In the interest of avoiding dead code, why not drop these lines altogether?

@izimobil (Contributor, Author) commented Mar 2, 2021:

Given that the constraint has been there from the very beginning (ec46367), we can indeed drop the MultipleObjectsReturned block (as well as the test method, of course).

- removed MultipleObjectsReturned code block for solar schedules as
  unique_together constraint safely prevents duplicates
- removed related test method
@izimobil (Contributor, Author) commented Mar 2, 2021

Hi,

I've updated the PR.

> * Why are we using `cls()` over `cls.objects.create()`, so that those functions sometimes return model instances that _are_ persisted already and sometimes return instances that _are not_ yet persisted in the database? Is that a bug or a feature?
>
> * It seems like we effectively have a tailor-made `get_or_create` here. Why not use what Django offers and simplify most of this code away for good?

I asked myself the same question regarding why the schedule isn't saved, but keep in mind that I don't want to break anything with this bug-fix PR. Improving the code should be the job of another PR, IMHO.

> * The added test cases are effectively 3 times the same test. Personally, I would use [`parameterized.expand`](https://pypi.org/project/parameterized/) and a single parametrized test to avoid duplication, but it's just my two cents and I'm aware there are multiple schools of thought on that subject.

I won't take the responsibility of adding yet another requirement just for that; instead I've factored the shared code out into a mixin class (see the sketch below).

Thanks for your review, regards.
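A hedged sketch of the mixin approach mentioned above, reduced to the duplicate-counting part (the attribute names are made up, and the merged tests also exercise from_schedule):

```python
from django.test import TestCase

from django_celery_beat.models import CrontabSchedule, IntervalSchedule


class DuplicateScheduleTestMixin:
    # Concrete test cases set these two attributes; the shared test body is
    # written only once.
    schedule_model = None
    schedule_spec = None

    def test_duplicate_schedules(self):
        self.schedule_model.objects.create(**self.schedule_spec)
        self.schedule_model.objects.create(**self.schedule_spec)
        self.assertEqual(
            self.schedule_model.objects.filter(**self.schedule_spec).count(),
            2,
        )


class CrontabDuplicateTests(DuplicateScheduleTestMixin, TestCase):
    schedule_model = CrontabSchedule
    schedule_spec = {"hour": "4"}


class IntervalDuplicateTests(DuplicateScheduleTestMixin, TestCase):
    schedule_model = IntervalSchedule
    schedule_spec = {"every": 10, "period": IntervalSchedule.SECONDS}
```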

@hartwork (Contributor) left a comment:

@izimobil thanks for taking this further! 👍 🙏

@hartwork (Contributor) commented Mar 2, 2021

Travis CI taking ages to start builds as usual… 🤷‍♂️

@izimobil (Contributor, Author) commented Mar 2, 2021

@auvipy I think you can safely merge this one; I can't believe no one else has run into the dramatic impact of the previous code!
Cheers.

@auvipy (Member) commented Mar 3, 2021

I've caught the flu, but I'll review and merge it tomorrow for sure.

@auvipy auvipy merged commit bab79d9 into celery:master on Mar 3, 2021
@hartwork (Contributor) commented Mar 3, 2021

> I've caught the flu, but I'll review and merge it tomorrow for sure.

All the best!
