Memory-store incoming messages for some time, to give ability to gather all messages in a flow upon DLQ #48

Open
stolsvik opened this issue Jan 15, 2022 · 0 comments
Labels
thoughts Issues describing some thoughts around a subject

Comments

@stolsvik
Contributor

In a DLQ situation, it would be awesome if a monitoring system could, upon the DLQ, instantly send out a "broadcast" on the Mats fabric asking for all messages that have been part of the flow, and store these to aid in debugging and resolution.

The idea is that all Stage Processors would add their incoming messages to a central memory store in the MatsFactory - where the MatsFactory would try to keep a good set of messages for a reasonable time, i.e. until the flow "should have DLQed", e.g. 10 minutes - or up to a max of e.g. 100 MB. This would purely be best effort: if the max was reached, it would ditch messages. Large messages would pose a problem, so some "sanity" would have to be implemented, e.g. that large messages were simply ditched, or max 1 outstanding per stage, or similar. (In my understanding, this might not be too big a problem: large messages are typically the result of some query, not part of a "process this transaction" flow. The latter are the ones that make for difficult/interesting DLQs, while the former (queries) typically don't have complex business logic (and thus don't DLQ), and aren't really important wrt. debugging.)
It would be possible to include a "this flow is finished" broadcast (when a stage doesn't have an outgoing message, and itself finishes ok), to empty out the stores on the different MatsFactories for that flowId. But this might not be worth the chatter, compared to just "best effort" and some max time limit.
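
To make the store idea a bit more concrete, here is a minimal sketch of a bounded, best-effort memory store. Everything in it is an assumption for illustration: `FlowMessageStore`, `StoredMessage`, and all method and parameter names are hypothetical and not part of the Mats API. Time is passed in explicitly (and assumed non-decreasing) to keep the sketch easy to test.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical best-effort store of incoming messages; NOT part of the Mats API. */
final class FlowMessageStore {
    /** One incoming message as seen by a stage. All field names are illustrative. */
    static final class StoredMessage {
        final String flowId;
        final String stageId;
        final byte[] payload;
        final long receivedMillis;

        StoredMessage(String flowId, String stageId, byte[] payload, long receivedMillis) {
            this.flowId = flowId;
            this.stageId = stageId;
            this.payload = payload;
            this.receivedMillis = receivedMillis;
        }
    }

    private final long maxTotalBytes;          // e.g. 100 MB
    private final long maxSingleMessageBytes;  // "sanity" limit for large messages
    private final long maxAgeMillis;           // e.g. 10 minutes
    private final Deque<StoredMessage> messages = new ArrayDeque<>(); // oldest first
    private long totalBytes;

    FlowMessageStore(long maxTotalBytes, long maxSingleMessageBytes, long maxAgeMillis) {
        if (maxSingleMessageBytes > maxTotalBytes) {
            throw new IllegalArgumentException("single-message limit exceeds total limit");
        }
        this.maxTotalBytes = maxTotalBytes;
        this.maxSingleMessageBytes = maxSingleMessageBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Called for each incoming message. Returns false if the message was ditched as too large. */
    synchronized boolean add(String flowId, String stageId, byte[] payload, long nowMillis) {
        if (payload.length > maxSingleMessageBytes) {
            return false; // large messages are simply ditched
        }
        evictOlderThan(nowMillis - maxAgeMillis);
        messages.addLast(new StoredMessage(flowId, stageId, payload, nowMillis));
        totalBytes += payload.length;
        // Best effort: when the max is reached, ditch the oldest messages first.
        while (totalBytes > maxTotalBytes) {
            totalBytes -= messages.removeFirst().payload.length;
        }
        return true;
    }

    /** Response to a DLQ "broadcast": whatever is still stored for the flow, oldest first. */
    synchronized List<StoredMessage> gather(String flowId, long nowMillis) {
        evictOlderThan(nowMillis - maxAgeMillis);
        List<StoredMessage> result = new ArrayList<>();
        for (StoredMessage m : messages) {
            if (m.flowId.equals(flowId)) {
                result.add(m);
            }
        }
        return result;
    }

    /** Optional handling of a "this flow is finished" broadcast. */
    synchronized void flowFinished(String flowId) {
        Iterator<StoredMessage> it = messages.iterator();
        while (it.hasNext()) {
            StoredMessage m = it.next();
            if (m.flowId.equals(flowId)) {
                totalBytes -= m.payload.length;
                it.remove();
            }
        }
    }

    /** Drops messages received before the cutoff; relies on the deque being time-ordered. */
    private void evictOlderThan(long cutoffMillis) {
        while (!messages.isEmpty() && messages.peekFirst().receivedMillis < cutoffMillis) {
            totalBytes -= messages.removeFirst().payload.length;
        }
    }
}
```

With the numbers mentioned above, one instance per MatsFactory could be created as `new FlowMessageStore(100L * 1024 * 1024, 1024 * 1024, 10 * 60 * 1000L)`, where the 1 MB per-message limit is just a guess. Evicting oldest-first is only one possible policy; "max 1 outstanding per stage" would need a different structure.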

For debugging, one could then "step through" the entire flow, from initiation up to and including the DLQ point. (This is also the intention of KeepTrace.FULL, but that solution has pretty high overhead, in that absolutely all flows keep all info about previous steps on the wire unless you explicitly downgrade to a lower KeepTrace level. At time of writing, COMPACT is the default, which does not give the actual message contents.)

For resolution, one could then choose to restart the flow from an earlier point by simply sending an older / previous message back onto its queue, instead of reissuing the actual DLQed message.

@stolsvik stolsvik added the thoughts Issues describing some thoughts around a subject label Aug 16, 2022