Enhancement request: For perl-GDBM_File to support gdbm_open "GDBM_NOMMAP" flag #19306
Comments
@graygnuorg, are you available to take a look at this GDBM-related issue?
Hi James,

> @graygnuorg, are you available to take a look at this GDBM-related
> issue?

Sure, I am. I'll be able to take a look at it in a couple of days.

Best,
Sergey
Hello,

To begin with, thanks for a detailed bug report. It helped to find an inefficient routine in the GDBM library. Before returning to it, let me first address the questions posed in your initial posting.

First of all, memory mapping support was added not "in the transition from gdbm 1.18 to 1.19", as you assumed, but much earlier: in GDBM version 1.9, dated 2011-08-12. It has remained the default since then.

Secondly, support for the GDBM_NOMMAP flag has already been added in commit 1d7b7043625. Additional documentation was added by commit 8b8b12225a4.

Now let's return to the problem itself. The observed slowdowns during insertions happen when extensive updates of a big database file cause splitting of several key buckets in sequence (this explains the wave-like pattern). I have fixed it in GDBM commit b8c3d13fd8. To measure the performance, I have created a benchmark suite, which was used to generate a graph comparing GDBM versions 1.18, 1.22, and git master (at commit b8c3d13fd8).

I will release the new GDBM version as soon as I finish additional testing. In the meantime, you can give the git version a try. Using 1.22, you can mitigate the adverse effect of the bug by setting the minimal cache size, e.g.:
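A rough sketch of such a call, assuming the standard gdbm_setopt interface with GDBM_SETCACHESIZE (which in recent GDBM releases takes a size_t value); the file name and the cache size below are illustrative only, not the original figures:

```c
#include <gdbm.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* "test.db" and the cache size below are placeholder values. */
    GDBM_FILE dbf = gdbm_open("test.db", 0, GDBM_WRCREAT, 0644, NULL);
    if (!dbf) {
        fprintf(stderr, "gdbm_open: %s\n", gdbm_strerror(gdbm_errno));
        return EXIT_FAILURE;
    }

    /* Explicitly set a small bucket cache; with a short cache the
       flush routine has few entries to scan. */
    size_t cache_size = 10;
    if (gdbm_setopt(dbf, GDBM_SETCACHESIZE, &cache_size, sizeof(cache_size)))
        fprintf(stderr, "gdbm_setopt: %s\n", gdbm_strerror(gdbm_errno));

    /* ... perform the insertions here ... */

    gdbm_close(dbf);
    return EXIT_SUCCESS;
}
```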
The implementation of _gdbm_cache_flush becomes prohibitively inefficient during extensive updates of large databases. The bug was reported at Perl/perl5#19306.

To fix it, make sure that all changed cache entries are placed at the head of the cache_mru list, forming a contiguous sequence. This way a potentially long iteration over all cache entries can be cut off at the first entry with ca_changed == FALSE.

This commit also gets rid of several superfluous fields in struct gdbm_file_info:

- cache_entry: Not needed, because the most recently used cache entry (cache_mru) is always the current one.
- bucket_changed: dbf->cache_mru->ca_changed reflects the status of the current bucket.
- second_changed: Not needed because _gdbm_cache_flush, which flushes all changed buckets, is now invoked unconditionally by _gdbm_end_update (and also whenever dbf->cache_mru changes).

* src/gdbmdefs.h (struct gdbm_file_info): Remove cache_entry. The current cache entry is cache_mru. Remove bucket_changed and second_changed. All uses changed.
* src/proto.h (_gdbm_current_bucket_changed): New inline function.
* src/bucket.c (_gdbm_cache_flush): Assume all changed elements form a contiguous sequence beginning with dbf->cache_mru.
(set_cache_entry): Remove. All callers changed.
(lru_link_elem, lru_unlink_elem): Update dbf->bucket as necessary.
(cache_lookup): If the obtained bucket is not changed and is going to become current, flush all changed cache elements.
* src/update.c (_gdbm_end_update): Call _gdbm_cache_flush unconditionally.
* src/findkey.c: Use dbf->cache_mru instead of the removed dbf->cache_entry.
* src/gdbmseq.c: Likewise.
* tools/gdbmshell.c (_gdbm_print_bucket_cache): Likewise.
* src/falloc.c: Use _gdbm_current_bucket_changed to mark the current bucket as changed.
* src/gdbmstore.c: Likewise.
* src/gdbmdelete.c: Likewise.
* tests/gtcacheopt.c: Fix typo.
* tests/gtload.c: New option: -cachesize
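A simplified sketch of the cut-off idea described in this log entry; this is not the actual GDBM source, and apart from cache_mru and ca_changed, which the log names, the types and helpers below are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache element; only the fields needed for the idea. */
struct cache_elem {
    struct cache_elem *ca_next;  /* next element in MRU order */
    bool ca_changed;             /* bucket has unwritten changes */
    /* ... bucket data ... */
};

/* Hypothetical stand-in for writing one dirty bucket back to disk. */
static void write_bucket(struct cache_elem *elem)
{
    /* ... write the bucket ... */
    elem->ca_changed = false;
}

/* Flush all changed buckets.  Because every changed element sits at the
   head of the MRU list, the loop stops at the first element with
   ca_changed == false instead of scanning the whole cache. */
static void cache_flush(struct cache_elem *cache_mru)
{
    for (struct cache_elem *p = cache_mru; p && p->ca_changed; p = p->ca_next)
        write_bucket(p);
}
```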
Thank you, Thank you, Thank you!
I am having an issue with gdbm, and I would like to try to prove whether mmap is or is not a factor.
This is a follow-on to issue #18884. In the transition from gdbm 1.18 to 1.19, mem-mapped db support was added, and db pre-read was enabled by default. Or maybe mmap was already present and only pre-read was added? I'm not sure. I think mmap was new with gdbm 1.19.
Anyway, based on performance issues raised in issue #18884, a gdbm_open flag, GDBM_PREREAD, was added in gdbm 1.20; pre-read is no longer enabled by default and must now be requested explicitly. But if I understand correctly, mem-mapping is still the default even though pre-read is not.
There is another gdbm_open flag to disable mem-mapping, GDBM_NOMMAP, but perl-GDBM_File does not export this flag. I'm asking for GDBM_NOMMAP to be exported so I can run experiments with and without mmap. The gdbm docs say disabling mmap will degrade performance, but I'm having two issues:
(1) DB rebuild: I saw about a 150-200x performance drop from gdbm 1.18 to 1.19, and only about a 10x perf regain from gdbm 1.19 to 1.22. That is, my application is still suffering roughly a 15x or greater perf drop relative to gdbm 1.18. I want to see if NOMMAP will give me back this other 15x loss. Note this concerns only the time to rebuild the db from a key/value text file; it does not consider the db's performance in use.
(2) In-Use: After only about a day of use (and sometimes much less), something suddenly consumes all of my mem and swap, and the program crashes. I suspect a memory leak in gdbm, but I don't have proof. I haven't been able to construct a simple test program to demo this. It is not a slow, gradual memory consumption leading to eventual out-of-mem, it is a sudden and massive mem consumption of approx 200GB within a few seconds. I want to see if NOMMAP has any effect.
I'm not reporting any bug at this time; I don't have the data. I'm just asking for perl-GDBM_File to export GDBM_NOMMAP so I can try more experiments.
(At the same time, there may be other users who could benefit from the GDBM_PREREAD flag. It is not useful to me, but maybe both could be exported?)
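For reference, both of these are ordinary gdbm_open flags that perl-GDBM_File would simply pass through once exported. A hedged C-level sketch of the equivalent direct call (the file name and permissions are placeholders) might look like this:

```c
#include <gdbm.h>
#include <stdio.h>

int main(void)
{
    /* GDBM_NOMMAP disables memory mapping; GDBM_PREREAD (gdbm >= 1.20)
       instead asks for the mapped regions to be pre-read, and would be
       OR'ed in the same way.  "words.db" and the mode are placeholders. */
    GDBM_FILE dbf = gdbm_open("words.db", 0,
                              GDBM_WRCREAT | GDBM_NOMMAP,
                              0644, NULL);
    if (!dbf) {
        fprintf(stderr, "gdbm_open: %s\n", gdbm_strerror(gdbm_errno));
        return 1;
    }

    /* ... rebuild or query the database ... */

    gdbm_close(dbf);
    return 0;
}
```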