
#135 #141 3 event-related fixes (deduplication bug, configurable deduplication lookback, and larger handler size) #151

Closed
wants to merge 3 commits

Conversation

Alex-Ikanow
Contributor

Fix 1) #135 update events' expiry date on deduplication
Previously, events would age out even when they were being de-duplicated
frequently, so spurious identical events made their way to the next agent
in the pipeline. This isn't the most efficient code (it's no worse than
before), but it isn't worth optimizing further until we've decided what
to do about a more generic deduplication module.
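
A minimal sketch of the idea (illustrative, not this PR's exact code; the lookup and `new_event_expiration_date` stand in for whatever computes the agent's normal expiry window):

# Sketch only: on deduplication, refresh the matching event's expiry
# instead of letting it age out and later re-emitting an identical event.
if old_event = events.detect { |e| e.payload == result } # hypothetical lookup
  old_event.update_attribute(:expires_at, new_event_expiration_date)
else
  create_event :payload => result
end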

Fix 2) #141 change delayed_job.handler to mediumtext (16MB) like event.payload
This isn't a complete fix: if the event.payload is (say) exactly 16MB,
the serialized handler that wraps it will be >16MB and the same problem
will occur. I'll push this in for now, though; in practice it makes the
problem much less likely to occur.
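
To make the edge case concrete, a tiny illustration with made-up numbers (the real serialization overhead depends on the job format):

# Illustrative numbers only: a delayed_job handler is the serialized job,
# so it is always somewhat larger than the event payload it wraps.
payload_bytes  = 16 * 1024 * 1024 - 100   # payload just under the 16MB MEDIUMTEXT cap
overhead_bytes = 500                      # hypothetical serialization overhead
handler_bytes  = payload_bytes + overhead_bytes
puts handler_bytes > 16 * 1024 * 1024     # => true, so MEDIUMTEXT would truncate the handler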

Fix 3) #135 set deduplication "look back" to be configurable (and default to a large number)
I think this is preferable default behavior? Potential performance issues
should be addressed longer term via an efficient deduplication library
that uses indexed DB fields.
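
A rough sketch of how the configurable look-back might work (the option name and helper are illustrative, not necessarily what this PR uses):

# Sketch only: allow a per-agent option to override the (large) default.
UNIQUENESS_LOOK_BACK = 10000

def uniqueness_look_back
  if options['uniqueness_look_back'].present?
    options['uniqueness_look_back'].to_i
  else
    UNIQUENESS_LOOK_BACK
  end
end

def previous_payloads
  events.order('id desc').limit(uniqueness_look_back).pluck(:payload)
end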

@Alex-Ikanow
Contributor Author

Oh wait 1), I didn't notice you had a test suite for the website_agent... let me add some test cases to that first.

Oh wait 2), for some reason my sync doesn't seem to have worked; these commits don't have the recent basic auth changes.

Lemme close this and sort my issues out....

@@ -41,7 +43,7 @@ class WebsiteAgent < Agent
 
   default_schedule "every_12h"
 
-  UNIQUENESS_LOOK_BACK = 30
+  UNIQUENESS_LOOK_BACK = 10000
Member

This seems like the right idea, although I think 10000 is too big as a default. Loading 10,000 events and their payloads could be quite slow.

Contributor Author

"although I think 10000 is too big as a default. Loading 10,000 events and their payloads could be quite slow."

Here's the way I'm thinking about it...

If you have 10,000 events in your buffer then one of two things is going to happen:

- it's a bit slow to deduplicate (if the lookback is 10,000)
- you get bombarded by nearly 10,000 duplicate events in your email if the lookback is a small number (say 1,000)

Neither case is ideal, but functionally the first case is clearly preferable. The performance issue should be resolved longer term by doing deduplication centrally using an indexed field.

FWIW, I have an agent with 3000 events and it still seems reasonably snappy.
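
For what the longer-term fix might look like, here is an illustrative sketch of the "indexed DB fields" idea (the `payload_digest` column is hypothetical):

require 'digest'

# Sketch only: store a digest of each payload in an indexed column so
# deduplication becomes a single index lookup instead of loading and
# comparing up to 10,000 payloads.
digest = Digest::SHA256.hexdigest(result.to_json)
unless events.where(:payload_digest => digest).exists?
  create_event :payload => result # would also persist payload_digest
end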

@cantino
Member

cantino commented Jan 30, 2014

Let me know if I can help!

@@ -0,0 +1,19 @@
+# Increase handler size to 16MB (consistent with events.payload)
+
+class ChangeHandlerToMediumText < ActiveRecord::Migration
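
For reference, a minimal sketch of what the rest of the migration might look like (the full 19-line file isn't shown here, and `16777215`, MySQL's MEDIUMTEXT ceiling, is an assumption about how the limit is expressed):

# Sketch only: with the MySQL adapter, :limit => 16777215 makes Rails
# emit MEDIUMTEXT (16MB), matching the events.payload column.
class ChangeHandlerToMediumText < ActiveRecord::Migration
  def up
    change_column :delayed_jobs, :handler, :text, :limit => 16777215
  end

  def down
    change_column :delayed_jobs, :handler, :text
  end
end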
Member

Can you send a PR for this migration too? It's helpful!

Contributor Author

Did you see PR #152?

Member

Nope, somehow missed it. Thx!
