
[Algorithm] Drain cache look up optimization, evaluate accuracy impacts #12

Open
Superskyyy opened this issue Jul 19, 2022 · 3 comments
Labels: Algorithm (The work is on the algorithm side), analysis: log, enhancement (New feature or request), upstream (An issue that could be submitted to upstream repos first)
Milestone: 0.1.0

Comments

@Superskyyy
Member

Superskyyy commented Jul 19, 2022

@Liangshumin now has a prototype that uses a cache lookup to speed up Drain significantly. I moved the lookup to after masking because:

  • I intend to ingest raw logs, and raw logs carry unique timestamps, so the cache would always miss before masking :)
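A minimal Python sketch of the lookup order described above. It assumes a hypothetical `mask()` with two illustrative regex rules and a stand-in `fake_tree_search()` in place of Drain's real tree traversal; neither is the prototype's or Drain3's actual code. The cache is keyed by the masked line, so two raw lines that differ only in timestamp and numbers share one entry. The dict is unbounded here for brevity.

```python
import re

# Hypothetical masking rules (illustrative only, not Drain3's configuration):
# replace timestamps first, then any remaining standalone integers.
MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?"), "<TS>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def mask(line: str) -> str:
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line


class CachedMatcher:
    """Checks a cache keyed by the masked line before doing a tree search."""

    def __init__(self, tree_search):
        self.tree_search = tree_search  # hypothetical stand-in for Drain's tree lookup
        self.cache = {}                 # masked line -> cluster id (unbounded here)
        self.hits = 0
        self.misses = 0

    def match(self, raw_line: str):
        key = mask(raw_line)  # mask first, so unique timestamps don't defeat the cache
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        cluster_id = self.tree_search(key)  # only cache misses pay for traversal
        self.cache[key] = cluster_id
        return cluster_id


templates = {}


def fake_tree_search(masked: str) -> int:
    # Stand-in for tree traversal: assigns one id per distinct masked template.
    return templates.setdefault(masked, len(templates))


m = CachedMatcher(fake_tree_search)
m.match("2022-07-19 10:00:01 conn 42 opened")
m.match("2022-07-19 10:00:02 conn 43 opened")  # same masked key, served from cache
```

Keyed on the raw line instead, the second call would miss because the timestamps differ, which is the point made in the bullet above.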

The algorithm sped up by at least 40%, since tree traversal time becomes almost negligible. (What remains is then the masking task, a separate problem to divide and conquer.)

We should keep this optimization in mind and conduct further testing. If it's stable, we should probably contribute back upstream as it's a general purpose optimization.

This thread tracks our case-by-case testing and theoretical evaluation, in case unwanted side effects emerge.

We also need to evaluate the choice of cache size to limit memory usage; it should most likely be near the max_cluster limit.
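One way to bound memory is a fixed-capacity LRU cache. This is a sketch under assumptions: `LRUCache` and `MAX_CLUSTERS` are hypothetical names, and the capacity value is a placeholder for whatever max_cluster limit is configured, not a measured choice.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self):
        return len(self._data)


MAX_CLUSTERS = 1024  # hypothetical placeholder for the configured max_cluster limit
cache = LRUCache(capacity=MAX_CLUSTERS)

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")     # "a" becomes the most recently used key
c.put("c", 3)  # over capacity, so "b" is evicted
```

Several masked variants can map to the same cluster, so the number of distinct cache keys may exceed the cluster count; tying capacity to max_cluster is a starting point to tune, not a guaranteed hit rate.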

@Superskyyy Superskyyy added the enhancement (New feature or request), Algorithm (The work is on the algorithm side), analysis: log, and upstream (An issue that could be submitted to upstream repos first) labels Jul 19, 2022
@Superskyyy Superskyyy added this to the 0.1.0 milestone Jul 19, 2022
@Superskyyy
Member Author

The caching technique has proven effective. Implementation is ongoing.

@Superskyyy
Member Author

Addressed in #23

@Superskyyy
Member Author

Maybe an LFU cache works better in some cases? It should be provided as an option, chosen according to the data.
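For comparison, a sketch of an LFU alternative (a hypothetical `LFUCache`, not from the project's code). It evicts the least frequently used key and breaks ties by recency. The usage lines show a case where LFU and LRU differ: a frequently hit template survives a burst of one-off lines, whereas an LRU cache of the same size would evict it.

```python
from collections import OrderedDict, defaultdict


class LFUCache:
    """Evicts the least frequently used key; ties go to the least recently used."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values = {}                         # key -> value
        self._freq = {}                           # key -> access count
        self._buckets = defaultdict(OrderedDict)  # count -> keys in recency order
        self._min_freq = 0

    def _touch(self, key):
        # Move key from its current frequency bucket to the next one.
        f = self._freq[key]
        del self._buckets[f][key]
        if not self._buckets[f]:
            del self._buckets[f]
            if self._min_freq == f:
                self._min_freq = f + 1
        self._freq[key] = f + 1
        self._buckets[f + 1][key] = None

    def get(self, key, default=None):
        if key not in self._values:
            return default
        self._touch(key)
        return self._values[key]

    def put(self, key, value):
        if key in self._values:
            self._values[key] = value
            self._touch(key)
            return
        if len(self._values) >= self.capacity:
            # Oldest key in the lowest-frequency bucket is the eviction victim.
            victim, _ = self._buckets[self._min_freq].popitem(last=False)
            if not self._buckets[self._min_freq]:
                del self._buckets[self._min_freq]
            del self._values[victim]
            del self._freq[victim]
        self._values[key] = value
        self._freq[key] = 1
        self._buckets[1][key] = None
        self._min_freq = 1

    def __len__(self):
        return len(self._values)


lfu = LFUCache(2)
lfu.put("hot", 1)
for _ in range(3):
    lfu.get("hot")      # "hot" now has 4 accesses
lfu.put("rare1", 2)
lfu.put("rare2", 3)     # evicts "rare1" (1 access), keeps "hot"
```

LFU has its own failure mode: entries that were hot early can linger after the log mix changes, which is one reason to keep the eviction policy configurable rather than fixed.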

@Superskyyy Superskyyy reopened this Sep 23, 2022
Projects
Status: Done
Development

No branches or pull requests

2 participants