Severe performance drop of GDBM_File, from perl v5.30.0 to v5.32.1 #18884
Comments
This would have to be related to #18435, perhaps @graygnuorg has a clue? |
As I read the original post, the deterioration occurred somewhere between perl-5.30.0 and perl-5.32.1. The commits proposed in #18435 were merged into blead on Jan 06 2021, i.e., during the 5.33 dev cycle. So I don't think they can explain the performance drop. |
I've just tried this on non-threaded builds of v5.30.0 and v5.32.1, on Debian, not Fedora. The reported slowdown did not show up.
I'll try threaded builds now. |
No slowdown with threaded builds here either:
|
Note that the OP's "machine_a" and my Debian box are running packaged versions of GDBM v1.18.1, but "machine_b", which exhibits the slowdown, is using GDBM v1.19. That might be pertinent, but I don't have time right now to bump my GDBM version and rerun the test cases. FYI, the changelog for GDBM v1.19:
|
Thanks for letting me know. The problem is not related to Perl in any way. I have identified the offending commit (4fb2326a4a). The patch will be available today. The new GDBM version, which is going to be released this week, will incorporate the fix. |
@graygnuorg - thanks for the quick diagnosis & response. |
I have committed the fix: https://git.gnu.org.ua/gdbm.git/commit/?id=d69a106c04. Will let you know when GDBM 1.20 is available. |
The commit 4fb2326 introduced pre-reading of memory mapped regions. While speeding up searches, it has a negative impact on write operations, since every remapping effectively re-reads the entire database. See Perl/perl5#18884 for details.
* NEWS: Document changes.
* doc/gdbm.texi: Document the GDBM_PREREAD flag.
* src/gdbm.h.in (GDBM_PREREAD): New flag.
* src/gdbmdefs.h (gdbm_file_info): New member: mmap_preread.
* src/gdbmopen.c (gdbm_fd_open): Set mmap_preread if requested.
* src/gdbmsetopt.c (setopt_gdbm_getflags): Report GDBM_PREREAD flag, if dbf->mmap_preread is set.
* src/mmap.c (_gdbm_internal_remap): Use pre-fault reading only if dbf->mmap_preread is set.
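To make the cost described in this commit message concrete, here is a toy model in Python. It is not GDBM code. It assumes, purely for illustration, that the database file grows in fixed steps, that each growth step triggers one remap, and that a remap with pre-reading touches every page of the new mapping while a remap without it touches only the newly added pages. The function name `pages_touched` and its parameters are hypothetical.

```python
def pages_touched(final_pages, growth_step, preread):
    """Count pages read while a mapped file grows to final_pages.

    Toy model (assumption, not GDBM internals): the file grows by
    growth_step pages at a time and is remapped after every step.
    """
    touched = 0
    mapped = 0
    while mapped < final_pages:
        new_size = min(mapped + growth_step, final_pages)
        if preread:
            # Pre-reading: the whole mapping is read again on each remap.
            touched += new_size
        else:
            # No pre-reading: only the newly added pages are read.
            touched += new_size - mapped
        mapped = new_size
    return touched


if __name__ == "__main__":
    for pages in (1_000, 10_000, 100_000):
        plain = pages_touched(pages, 10, preread=False)
        eager = pages_touched(pages, 10, preread=True)
        print(f"{pages:>7} pages: plain={plain}, preread={eager}, "
              f"ratio={eager / plain:.1f}")
```

Under these assumptions the work without pre-reading grows linearly with the final size, while the work with pre-reading grows roughly quadratically, which matches the report that the slowdown gets worse as more keys are stored.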
Thanks everyone, that's excellent! What is the procedure for closing a ticket? Is it my (OP) responsibility to close? Should I wait until I've had a chance to try the fix? |
On 6/14/21 9:36 PM, kjohnstn wrote:
> Thanks everyone, that's excellent!
> What is the procedure for closing a ticket? Is it my (OP) responsibility
> to close? Should I wait until I've had a chance to try the fix?
Simply notify us in this Issue when you've got the fix. We'll take care of closing the ticket.
|
GDBM 1.20 has been released: https://ftp.gnu.org/gnu/gdbm/gdbm-1.20.tar.gz |
Based on the discussion in this ticket, we'll assume that this release addresses the original poster's concerns and close the ticket. Any new problems with GDBM 1.20 should be raised in a new ticket. |
Version 1.20, 2021-06-17

* New bucket cache

The bucket cache support has been rewritten from scratch. The new bucket cache code provides for significant speed up of search operations.

* Change mmap prereading strategy

Pre-reading of the memory mapped regions, introduced in version 1.19, can be advantageous only when doing intensive look-ups on a read-only database. It degrades performance otherwise, especially if doing multiple inserts. Therefore, this version introduces a new flag to gdbm_open: GDBM_PREREAD. When given, it enables pre-reading of memory mapped regions. See Perl/perl5#18884 for details.
Module: GDBM_File
Description
I have a machine that I recently upgraded from FC31 to FC33. I have a program that (re)builds a GDBM db from a text file. On FC31, a db rebuild took about an hour; on FC33 it takes over 200 hours (9 days). The db contains (currently) about 95M keys. The db's only purpose is to check for key existence; $db{$key} is 1 for every $key.
I don't have any machines still running FC31, but I have a second machine running Ubuntu 20, which has the same perl MAJOR.MINOR version (v5.30.3 on FC31, v5.30.0 on Ubuntu 20). The performance drop is plainly evident between these two machines. The FC33 machine is the newer and faster machine, but its db (re)build is significantly slower.
machine_a: Older, slower:
machine_b: Newer, faster:
Steps to Reproduce
I ran two benchmarks on each machine: One benchmark creates an ordinary (in-mem) perl hash with about 12M keys. The second benchmark creates a tied GDBM db hash with the same 12M keys. The newer machine is significantly faster building the in-mem hash, but significantly slower building the tied hash.
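The original benchmark scripts are not included in this report. As a rough analogue of the two-benchmark setup described above, the following Python sketch times an in-memory dict against a disk-backed store from the standard `dbm` module. It is not the OP's Perl code: the key format, the key count, and the helper names (`make_keys`, `build_in_memory`, `build_on_disk`, `run_benchmark`) are all assumptions, and `dbm.open` picks whichever backend is installed, which may or may not be GDBM.

```python
import dbm
import os
import tempfile
import time


def make_keys(n):
    # Hypothetical key format; the report does not show the real keys.
    return [f"key{i:08d}" for i in range(n)]


def build_in_memory(keys):
    """Insert every key into a plain dict and time it."""
    start = time.perf_counter()
    table = {}
    for key in keys:
        table[key] = 1
    return len(table), time.perf_counter() - start


def build_on_disk(keys, path):
    """Insert every key into a dbm database and time it."""
    start = time.perf_counter()
    # Flag "n" always creates a new, empty database at path.
    with dbm.open(path, "n") as db:
        for key in keys:
            db[key.encode()] = b"1"
        count = len(db.keys())
    return count, time.perf_counter() - start


def run_benchmark(n):
    keys = make_keys(n)
    mem_count, mem_seconds = build_in_memory(keys)
    with tempfile.TemporaryDirectory() as tmp:
        disk_count, disk_seconds = build_on_disk(keys, os.path.join(tmp, "bench"))
    return {
        "in_memory_keys": mem_count,
        "in_memory_seconds": mem_seconds,
        "on_disk_keys": disk_count,
        "on_disk_seconds": disk_seconds,
    }


if __name__ == "__main__":
    # Far fewer keys than the 12M in the report, so the sketch runs quickly.
    result = run_benchmark(5000)
    for name, value in result.items():
        print(f"{name}: {value}")
```

Comparing the ratio of the two timings across machines, rather than the absolute numbers, is what separates a slow disk-backed store from a slow machine.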
Expected behavior
I expect a tied hash to be slower than an in-mem hash, always. That's not the issue. The 10x difference seen on machine_a is totally acceptable. However, the 500x slowdown seen on machine_b is troubling, and I think it just keeps getting worse as more keys are stored in the db. The 200x degradation of the 95M-key hash rebuild from FC31 to FC33 is between tied hashes in both cases, not tied vs mem.
Perl configuration