Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce mutex to ensure ordering consistency #218

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mkgs
Copy link

@mkgs mkgs commented May 10, 2020

I'm opening this PR to spark a discussion on how to potentially fix the race condition and order corruption issues that can frequently occur when using this library at scale. This PR is definitely not production-ready in any capacity.

I propose an OrderedMutex model, where a db lock on a mutex instance is required before modifying any order. The key for a mutex instance is compiled from a hash of db_table and the order_with_respect_to fields.

class OrderedMutex(models.Model):
    """
    Mutex model that ensures ordering integrity
    """

    key = models.PositiveIntegerField(editable=False, primary_key=True)
class OrderedModelBase(models.Model):

    ....

    def _get_mutex(self):
        hashes = [hash(self._meta.db_table)]
        if self.order_with_respect_to:
            order_with_respect_to = [self.order_with_respect_to] if isinstance(self.order_with_respect_to, str) \
                else self.order_with_respect_to
            hashes.extend([hash(get_lookup_value(self, attr)) for attr in order_with_respect_to])
        return OrderedMutex.objects.select_for_update().get_or_create(key=reduce(xor, hashes) & 0xFFFFFFFF)[0]

What do y'all think of this approach?

@imomaliev
Copy link
Collaborator

I am not sure the mutex is a way to go in this case

@imomaliev
Copy link
Collaborator

Maybe something along those lines https://github.com/viewflow/django-fsm/blob/master/django_fsm/__init__.py#L422. I need to do some research on this

@mkgs
Copy link
Author

mkgs commented Oct 15, 2020

Any other thoughts on this? Perhaps another backend (redis?) for the mutex?

We are moving to a fork of the library, as we continue to encounter order corruption problems. Whatever the implementation, the library should be responsible for order integrity imo.

@shuckc
Copy link
Member

shuckc commented Apr 15, 2021

I doubt taking select ... for update row locks over a dedicated mutex table would be better than the approach in #190 which takes them in the OrderedModel table partition itself. Although of note you have covered more of the API calls - #190 fixes only to(), you covered save, delete, swap, to, above and below.

Using another storage system (like cache system) for the locks isn't great in a hosted environment where one store goes away for a bit and you don't want to cascade failures. Did you make any further findings as to what worked for you?

@mkgs
Copy link
Author

mkgs commented Apr 15, 2021

#190 is a good improvement but I believe there still could be some strange behavior in instances where a new row has been added while a simultaneous ordering operation is occurring.

The dedicated mutex table feels hacky, and I proposed redis as an alternative since it is better suited to grabbing a temporary lock on an arbitrary key value. It also would remove the need for adding any database migrations to the package. However, I definitely hear you on the potential for introducing additional points of failure.

We'll keep using this fork as, however hacky it seems, it is definitely working. We may just change the key field on OrderedMutex to PositiveSmallIntegerField to reduce the key space size.

If I have some spare time I'll take a look at starting another fork that uses redis.

@shuckc
Copy link
Member

shuckc commented Apr 16, 2021

Hi @mkgs as I mentioned on #190 I do think additions where the ordering is calculated on the client side are the weak point of the row locks. Since you are using this code I will aim for the changes in #190 to be applied in such a way that they can be subclassed/switched to an external synchroniser. Perhaps something like providing the synchroniser instance in the Model Meta data would be suitable, and we can ship a default taking row locks.

I'm not against having a cache-based synchroniser upstream, probably not default though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants