-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the deduplication id. #149
base: master
Are you sure you want to change the base?
Conversation
} | ||
} | ||
end | ||
|
||
def fifo_required_params(serialized_job) | ||
parsed_job = JSON.parse(serialized_job) | ||
deduplication_id = OpenSSL::Digest::MD5.hexdigest(parsed_job['job_class'] + parsed_job['arguments'].to_s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically this is the main change and the motivation of this PR (other changes are caused by Rubocop), I think the job_id is a random uuid and the deduplication id should be something deterministic like my suggestion.
It seems like this is really dependent on a specific use-case. Let's say that I have a queue of transactions for a bank account and someone withdraws $100 twice resulting in 2 jobs with the class |
In my opinion, your example is missing an argument with timestamp. I think for a bank is essential when the withdrawal is performed. |
Sorry that wasn’t very clear. The specifics aren’t important, my point is only that because it depends on the application it should probably be optional. |
In my opionion, is a wrong usage having a random UUID for deduplication_id, as it is now is not possible to take adventajes of AWS FIFO queues. The motivation Your withdraw example using FIFO queues has a design problem, if you want to use FIFO then I think you should call |
as FIFO queues is mandatory the |
|
|
Sorry, but I think both (message_group_id and deduplication_id) are mandatory for FIFO queues. |
do you prefer this approach? 48ba66d btw I dislike, I think is a wrong usage of FIFO queues... |
I'm not sure what the best approach is to configure this behaviour. Maybe some of the maintainers more familiar with the codebase can chime in on that. Maybe the deduplication ID could be an optional param that's set on each job somehow because it sounds like you may want/need different behaviour per job. The parameter could be a symbol describing the strategy so you don't have to calculate the hash yourself and pass that in. Looking at the SQS docs though I think they basically state that depending on your specific use-case you may want to provide a deduplication ID in order to make sure messages with the same content are treated as unique, or on the other hand to make sure that messages with slightly different content are treated as the same.
The next section about single producer/consumer also has more info about what to do in different circumstances, which really makes me think it should be parameterized somehow. |
In my opinion it shouldn't be parametrized because deduplication_id is mandatory and Fifo queues has this behaviour. Then the implementation should cover this deduplication thing otherwise this deduplication thing is totally useless. |
Using the ActiveJob ID isn't useless. It serves the purpose of preventing 2 jobs with identical parameters from being detected as a duplicate by mistake. Other strategies are valid too but it depends on your specific use-case. |
Not depends in user case, it's a design/implementation problem. Your theory doesn't have sense because if it has then the message_deduplication_id would be optional and not mandatory. |
No description provided.