
Memory consumption optimization #14

Closed
FGRibreau opened this issue Apr 5, 2013 · 3 comments

@FGRibreau commented Apr 5, 2013

Hi,

I've been using bloomd in production since yesterday and I must say I'm impressed by its stability and low CPU consumption. You did a really good job there, congrats!

However, I've got some questions regarding memory consumption. Currently, bloomd's memory usage is constantly increasing (RES: 106M, SHR: 105M, VIRT: 243M).

Here are my bloom filters after one day (columns: name, probability, storage bytes, capacity, items):

f1 0.000100 300046 100000 91793
f2 0.000100 105575294 34100000 13919873
f3 0.000100 300046 100000 72710
f3 0.000100 1509656 500000 291040
f4 0.000100 1509656 500000 124098

Note that they will keep growing like that at a nearly constant rate. And since scalable bloom filters work by adding a new bloom filter each time the current one reaches its fill ratio, memory consumption will increase indefinitely.
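To make the growth concrete, here is a rough back-of-the-envelope sketch of the stacking behavior, assuming a growth factor of 4 (bloomd's `scale_size` default) and an illustrative initial capacity of 100000:

```python
# Each time a scalable bloom filter fills up, a new, larger filter is
# stacked on top of it, so the total in-memory capacity keeps growing.

def stacked_capacities(initial_capacity: int, scale_size: int, layers: int) -> list[int]:
    """Capacity of each stacked filter layer: initial * scale_size**i."""
    return [initial_capacity * scale_size**i for i in range(layers)]

caps = stacked_capacities(initial_capacity=100_000, scale_size=4, layers=5)
print(caps)       # [100000, 400000, 1600000, 6400000, 25600000]
print(sum(caps))  # 34100000 -- total capacity held in memory grows without bound
```

Notably, f2's reported capacity of 34100000 above is exactly 100000 × (1 + 4 + 16 + 64 + 256), i.e. five stacked layers with a growth factor of 4, which is consistent with this picture.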

I'm not an expert in C, but I wondered if you could update the README to explain how the "Automatically faults cold filters out of memory to save resources" feature works, so users can take advantage of it.

If I understand it well, since here my filters won't be ever cold (new data is added constantly), I thought maybe I could create filters with composed name like "f{filterid}{weekoftheyear}{year}" where "{weekoftheyear}{year}" are informations extracted and available from every data that the filters test against. That way, filters with older data could be removed from memory but still available just in case.

Is this the right approach? What do you think?
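A minimal sketch of that idea against bloomd's line-based TCP protocol (the port is bloomd's default; the filter id `f42` and the key are hypothetical):

```python
import socket
from datetime import date

def bloomd_command(sock: socket.socket, line: str) -> str:
    """Send one command and read the reply (a sketch; a real client
    should buffer until it sees a newline)."""
    sock.sendall((line + "\n").encode())
    return sock.recv(4096).decode().strip()

def weekly_filter_name(filter_id: str, day: date) -> str:
    """Compose a name like f42.14.2013 from the data's week and year."""
    year, week, _ = day.isocalendar()
    return f"{filter_id}.{week}.{year}"

with socket.create_connection(("localhost", 8673)) as sock:  # bloomd default port
    name = weekly_filter_name("f42", date(2013, 4, 5))
    print(bloomd_command(sock, f"create {name}"))          # Done (or Exists)
    print(bloomd_command(sock, f"set {name} some-key"))    # Yes
    print(bloomd_command(sock, f"check {name} some-key"))  # Yes
```

Once the week is over, a filter named this way stops receiving traffic and becomes eligible to be faulted out of memory.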

@armon (Owner) commented Apr 5, 2013

Hey Francois,

I'm glad to hear it is working well for you! You are correct in your thinking about how the
filters will work, and in the current setup your memory use will continue to grow unbounded.

The automatic cold filter faulting is pretty simple. In the config it is possible to specify a value
called cold_interval, which defaults to 3600 seconds. Basically, if a filter is not accessed
(no checks / sets) for this interval, then it is removed from memory and kept on disk.
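For reference, a sketch of the relevant part of a bloomd INI config (key names follow the bloomd README's config format; the paths and values here are illustrative):

```ini
[bloomd]
tcp_port = 8673
data_dir = /var/lib/bloomd
# Filters with no checks/sets for this many seconds are faulted
# out of memory and kept on disk.
cold_interval = 3600
```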

The way we set up our filters at Kiip is to name them something like:

  • <name>.<interval>.<date>

So we have filters like:

  • sessions.daily.2013-04-05
  • sessions.monthly.2013-04-01

This way, as you suggested, eventually it is possible for the filters
to go cold. Once the day is over, all of our daily filters get faulted out
automatically, same with the month, etc.
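A small sketch of that naming convention (the helper names are hypothetical):

```python
from datetime import date

def daily_name(metric: str, day: date) -> str:
    return f"{metric}.daily.{day:%Y-%m-%d}"    # sessions.daily.2013-04-05

def monthly_name(metric: str, day: date) -> str:
    return f"{metric}.monthly.{day:%Y-%m}-01"  # sessions.monthly.2013-04-01
```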

This is basically the same as what you suggested, so I expect that
it will work quite well for you!

Let me know if you have any other questions.

Best Regards,

Armon Dadgar


@FGRibreau (Author)

Thanks for your feedback!

I updated my code (with filters like <filterid>.DD-MM-YYYY) and set initial_capacity to 10001 in order to keep memory usage as low as possible, and it worked!

I'll keep you posted if anything weird happens

Cheers

@armon (Owner) commented Apr 8, 2013

Great, glad it worked! I would advise against just setting the initial capacity to the smallest possible value. Due to the way the scalable filters work (stacking multiple bloom filters), checks will get marginally slower as the number of stacked filters grows. It is better to select a capacity that you think will be enough right off the bat.
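For capacity planning, the textbook bloom filter sizing formula m = -n·ln(p)/(ln 2)² gives a feel for what a generous initial capacity costs in memory; a sketch (the formula is the standard one, not anything bloomd-specific):

```python
import math

def bloom_size_bytes(capacity: int, error_rate: float) -> int:
    """Size of an optimally sized bloom filter: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -capacity * math.log(error_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)

for capacity in (10_001, 100_000, 1_000_000):
    print(capacity, bloom_size_bytes(capacity, 1e-4), "bytes")
# Roughly 24 KB, 240 KB, and 2.4 MB respectively at p = 0.0001.
```

Memory scales linearly with capacity, so over-provisioning tenfold costs only tenfold memory while avoiding the per-check cost of extra stacked filters. (Compare f1 above: 300046 bytes at 100000 capacity, the same ballpark once bloomd's overhead is included.)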
