#135 #141 3 event-related fixes (deduplication bug, configurable deduplication lookback, and larger handler size) #151
Conversation
Previously, events would age out even when being de-duplicated frequently, resulting in spurious identical events making their way to the next agent in the pipeline. This isn't the most efficient code (same as previously), but it's not worth optimizing further until we've decided what to do with a more generic deduplication module.
This isn't a complete fix, because if the event.payload is (say) exactly 16MB then the handler will be >16MB and the same problem will occur. I'll push this in for now though; in practice it makes the problem much less likely to occur.
Set deduplication "look back" to be configurable (and default to a large number). I think this is preferable default behavior? Potential performance issues should be addressed via an efficient deduplication library using indexed DB fields.
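The expiry-refresh idea in the first fix can be sketched in plain Ruby. This is a hedged illustration, not Huginn's actual code: `DedupBuffer`, `Entry`, and `emit?` are hypothetical names. On seeing a repeated payload, the buffer refreshes the stored entry's expiry instead of letting it age out, so frequently re-seen events keep deduplicating rather than leaking downstream:

```ruby
# Hypothetical sketch of expiry-refreshing deduplication (names are
# invented for illustration; Huginn's real implementation differs).
class DedupBuffer
  Entry = Struct.new(:payload, :expires_at)

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}   # payload => Entry
  end

  # Returns true if the event is new (emit it downstream); false if it
  # is a duplicate, in which case we only bump the stored expiry.
  def emit?(payload, now = Time.now)
    # Age out stale entries first.
    @entries.delete_if { |_, e| e.expires_at <= now }
    if (entry = @entries[payload])
      entry.expires_at = now + @ttl   # refresh expiry on deduplication
      false
    else
      @entries[payload] = Entry.new(payload, now + @ttl)
      true
    end
  end
end
```

Because the expiry is refreshed on every duplicate, an event that is re-seen every few minutes never expires, which is exactly the behavior the original code lacked.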
Oh wait 1), I didn't notice you had a test suite for the website_agent... let me add some test cases to that first.
Oh wait 2), for some reason my sync doesn't seem to have worked; these don't have the recent basic auth changes.
Lemme close this and sort my issues out....
@@ -41,7 +43,7 @@ class WebsiteAgent < Agent

   default_schedule "every_12h"

-  UNIQUENESS_LOOK_BACK = 30
+  UNIQUENESS_LOOK_BACK = 10000
This seems like the right idea, although I think 10000 is too big as a default. Loading 10,000 events and their payloads could be quite slow.
"although I think 10000 is too big as a default. Loading 10,000 events and their payloads could be quite slow."
Here's the way I'm thinking about it...
If you have 10,000 events in your buffer then one of two things is going to happen:
- it's a bit slow to deduplicate (if lookback = 10,000)
- you get bombarded by nearly 10,000 events in your email (if lookback is a small number, say 1,000)
Neither case is ideal, but functionally the first case is clearly preferable. The performance issue should be resolved longer term by doing deduplication centrally using an indexed field.
FWIW, I have an agent with 3,000 events and it still seems reasonably snappy.
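The trade-off being discussed can be sketched as follows. This is a plain-Ruby illustration with invented names (`LookBackDeduper` is not Huginn's `WebsiteAgent`): only the most recent `look_back` payloads are scanned for duplicates, so a larger window means a slower linear check but fewer re-emitted duplicates:

```ruby
# Hypothetical sketch of a configurable deduplication look-back window
# (names invented for illustration). Duplicates are detected by a
# linear scan over the last `look_back` payloads, which is the O(n)
# cost discussed above; an indexed DB field would avoid this.
class LookBackDeduper
  DEFAULT_LOOK_BACK = 10_000

  def initialize(look_back: DEFAULT_LOOK_BACK)
    @look_back = look_back
    @recent = []   # most recent payloads, oldest first
  end

  # Returns true if the payload appears in the look-back window;
  # otherwise records it as a new event and returns false.
  def duplicate?(payload)
    window = @recent.last(@look_back)
    seen = window.include?(payload)
    @recent << payload unless seen
    seen
  end
end
```

With a small window, an old payload "falls out" of the window and is emitted again as if it were new, which is the bombardment scenario above; a large default window trades scan time for correctness.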
Let me know if I can help!
@@ -0,0 +1,19 @@
+ # Increase handler size to 16MB (consistent with events.payload)
+
+ class ChangeHandlerToMediumText < ActiveRecord::Migration
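The diff above shows only the first lines of the 19-line migration. A plausible full version might look like the sketch below; the `delayed_jobs`/`handler` names follow delayed_job's conventions, and the body is an assumption here, not the PR's actual code. In Rails's MySQL adapter, a `:text` column with a limit of up to 2**24 - 1 bytes is created as MEDIUMTEXT (16MB), matching events.payload:

```ruby
# Hedged sketch of the full migration (assumed, not copied from the PR).
class ChangeHandlerToMediumText < ActiveRecord::Migration
  def up
    # MEDIUMTEXT holds up to 16MB (2**24 - 1 bytes).
    change_column :delayed_jobs, :handler, :text, limit: 16.megabytes - 1
  end

  def down
    change_column :delayed_jobs, :handler, :text
  end
end
```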
Can you send a PR for this migration too? It's helpful!
Did you see PR #152?
Nope, somehow missed it. Thx!
Fix 1) #135 update events' expiry date on deduplication
Previously events would age out even when being de-duplicated
frequently, resulting in spurious identical events making their way to
the next agent in the pipeline. This isn't the most efficient code (same
as previously), but not worth optimizing further until we've decided
what to do with a more generic deduplication module
Fix 2) #141 change delayed_job.handler to mediumtext (16MB) like event.payload
This isn't a complete fix because if the event.payload is (say) exactly
16MB then the handler will be >16MB and the same problem will occur.
I'll push this in for now though, in practice it means it's much less
likely to occur.
Fix 3) #135 set deduplication "look back" to be configurable (and default to a large number)
I think this is preferable default behavior? Potential performance
issues should be addressed via an efficient deduplication library using
indexed DB fields