
Improve the deduplication id. #149

Open

ryanfox1985 wants to merge 3 commits into master

Conversation

ryanfox1985

No description provided.

```ruby
def fifo_required_params(serialized_job)
  parsed_job = JSON.parse(serialized_job)
  # Derive a deterministic ID from the job class and its arguments,
  # rather than using the randomly generated job_id.
  deduplication_id = OpenSSL::Digest::MD5.hexdigest(parsed_job['job_class'] + parsed_job['arguments'].to_s)
```
ryanfox1985 (Author) commented on this diff · Feb 28, 2023

Basically this is the main change and the motivation of this PR (the other changes were caused by RuboCop). The job_id is a random UUID, so I think the deduplication ID should be something deterministic, like my suggestion.
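
To illustrate the difference (a minimal sketch, not code from the gem): hashing the job class plus arguments is stable across enqueues, while the job_id is a fresh UUID every time.

```ruby
require 'openssl'
require 'securerandom'

# The proposed digest: same class + same arguments => same deduplication ID.
digest = ->(job_class, args) { OpenSSL::Digest::MD5.hexdigest(job_class + args.to_s) }

digest.call('WithdrawalJob', ['some-account-id', 100]) ==
  digest.call('WithdrawalJob', ['some-account-id', 100]) # => true

# ActiveJob's job_id, by contrast, is a fresh UUID on every enqueue:
SecureRandom.uuid == SecureRandom.uuid # => false
```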

samsonjs commented Mar 6, 2023

It seems like this is really dependent on a specific use-case. Let's say that I have a queue of transactions for a bank account and someone withdraws $100 twice, resulting in 2 jobs with the class WithdrawalJob and arguments ['some-account-id', 100]. Those should definitely not be de-duplicated.
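
To make the hazard concrete (a sketch of the proposed hashing applied to this example):

```ruby
require 'openssl'

first  = OpenSSL::Digest::MD5.hexdigest('WithdrawalJob' + ['some-account-id', 100].to_s)
second = OpenSSL::Digest::MD5.hexdigest('WithdrawalJob' + ['some-account-id', 100].to_s)

# => true: SQS would silently drop the second, legitimate withdrawal
# within the deduplication window.
first == second
```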

ryanfox1985 (Author)

In my opinion, your example is missing a timestamp argument. For a bank, I think the time at which the withdrawal is performed is essential.

samsonjs commented Mar 7, 2023

Sorry, that wasn't very clear. The specifics aren't important; my point is only that, because it depends on the application, it should probably be optional.

ryanfox1985 (Author)

In my opinion, having a random UUID as the deduplication_id is wrong usage; as it is now, it's not possible to take advantage of AWS FIFO queues.

The motivation for the deduplication_id of FIFO queues is to not process duplicated messages within a certain time window (I think it's 5 minutes).

Your withdrawal example has a design problem when used with FIFO queues: if you want to use FIFO, then I think you should call WithdrawalJob with ['some-account-id', 100, timestamp], as I proposed.
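
For example (a minimal sketch, assuming a standard ActiveJob class named WithdrawalJob):

```ruby
# The timestamp argument makes each withdrawal's arguments, and therefore
# the derived deduplication_id, unique.
WithdrawalJob.perform_later('some-account-id', 100, Time.now.to_i)
```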

ryanfox1985 (Author)

Since the deduplication_id is mandatory for FIFO queues, having a random UUID sounds like a quick patch just to be able to use FIFO queues at all.

ryanfox1985 (Author) commented Mar 7, 2023

My bad, it seems it's not mandatory:

https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/using-messagededuplicationid-property.html

So then, I think using a random UUID is the same as not setting a deduplication_id at all.

ryanfox1985 (Author) commented Mar 7, 2023

Perhaps a good approach would be to not set this field by default, and to add a property to the library that enables setting it.

That way, the user could enable/disable the deduplication_id. What do you think?
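
Something like this, say (a purely hypothetical sketch; neither the module nor the option name is taken from the gem):

```ruby
# Hypothetical configuration; the names are invented for illustration.
SqsActiveJob.configure do |config|
  # true  => derive the deduplication_id from job class + arguments (this PR)
  # false => current behaviour (random job_id), the default
  config.deterministic_deduplication_id = true
end
```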

ryanfox1985 (Author)

Sorry, but I think both (message_group_id and deduplication_id) are mandatory for FIFO queues.

ryanfox1985 (Author)

Do you prefer this approach? 48ba66d

By the way, I dislike it; I think it's a wrong usage of FIFO queues...

samsonjs commented Mar 8, 2023

> Do you prefer this approach? 48ba66d

I'm not sure what the best approach is to configure this behaviour. Maybe some of the maintainers who are more familiar with the codebase can chime in on that. Maybe the deduplication ID could be an optional param that's set on each job somehow, because it sounds like you may want/need different behaviour per job. The parameter could be a symbol describing the strategy, so you don't have to calculate the hash yourself and pass it in.
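
For example (purely hypothetical; the deduplicate_by macro and the strategy symbols are invented to illustrate the idea):

```ruby
class WithdrawalJob < ApplicationJob
  # Every message is unique; relies on the random job_id (current behaviour).
  deduplicate_by :job_id
end

class RefreshAccountCacheJob < ApplicationJob
  # Identical class + arguments within the dedup window collapse into one message.
  deduplicate_by :content
end
```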


Looking at the SQS docs, though, I think they basically state that, depending on your specific use-case, you may want to provide a deduplication ID either to make sure messages with the same content are treated as unique, or, on the other hand, to make sure that messages with slightly different content are treated as the same.

The producer should provide message deduplication ID values for each message in the following scenarios:

  • Messages sent with identical message bodies that Amazon SQS must treat as unique.
  • Messages sent with identical content but different message attributes that Amazon SQS must treat as unique.
  • Messages sent with different content (for example, retry counts included in the message body) that Amazon SQS must treat as duplicates.

The next section about single producer/consumer also has more info about what to do in different circumstances, which really makes me think it should be parameterized somehow.
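
For instance, the last case (different content treated as duplicates) could look like this (a sketch; retry_count and enqueued_at are just examples of volatile fields excluded from the digest):

```ruby
require 'json'
require 'openssl'

def dedup_id(serialized_job)
  volatile = %w[retry_count enqueued_at]           # fields that change between attempts
  stable = JSON.parse(serialized_job).reject { |key, _| volatile.include?(key) }
  OpenSSL::Digest::MD5.hexdigest(stable.to_json)   # identical stable content => duplicate
end
```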

ryanfox1985 (Author)

In my opinion it shouldn't be parameterized, because the deduplication_id is mandatory and FIFO queues have this behaviour.

The implementation should therefore cover this deduplication; otherwise the feature is totally useless.

samsonjs commented Mar 8, 2023

Using the ActiveJob ID isn't useless. It serves the purpose of preventing 2 jobs with identical parameters from being mistakenly detected as duplicates. Other strategies are valid too, but it depends on your specific use-case.

ryanfox1985 (Author) commented Mar 8, 2023

It doesn't depend on the use-case; it's a design/implementation problem.

Your theory doesn't make sense, because if it did, then the message_deduplication_id would be optional, not mandatory.
