Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Severe performance drop of GDBM_File, from perl v5.30.0 to v5.32.1 #18884

Closed
kjohnstn opened this issue Jun 13, 2021 · 12 comments
Closed

Severe performance drop of GDBM_File, from perl v5.30.0 to v5.32.1 #18884

kjohnstn opened this issue Jun 13, 2021 · 12 comments
Labels
ext/GDBM_File Not Core Perl This is not an issue that can be addressed by this repo

Comments

@kjohnstn
Copy link

kjohnstn commented Jun 13, 2021

Module: GDBM_File

Description
I have a machine that I recently upgraded from FC31 to FC33. I have a program that (re)builds a GDBM db from a text file. On FC31, a db rebuild took about an hour; on FC33 it takes over 200 hours (9 days). The db contains (currently) about 95M keys. The db's only purpose is to check for key existence; $db{$key} is 1 for every $key.

I don't have any machines still running FC31, but I have a second machine running Ubuntu 20, which has the same perl MAJOR.MINOR version (v5.30.3 on fc31, v5.30.0 on Ubuntu 20). The performance drop is plainly evident between these two machines. The FC33 machine is the newer and faster machine, but db (re)build is significantly slower.

machine_a: Older, slower:

$ uname -a
Linux machine_a 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ perl -v
This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-gnu-thread-multi

$ perl -e'use strict; use GDBM_File; print "$GDBM_File::VERSION\n";'
1.18

$ apt list --installed | grep -i gdbm
libgdbm6/focal,now 1.18.1-5 amd64 [installed,automatic]

machine_b: Newer, faster:

$ uname -a
Linux machine_b 5.11.18-200.fc33.x86_64 #1 SMP Mon May 3 15:05:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ perl -v
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi

(Full perl -V output below.)

$ perl -e'use strict; use GDBM_File; print "$GDBM_File::VERSION\n";'
1.18

$ rpm -qa | grep -i gdbm
gdbm-1.19-1.fc33.x86_64
gdbm-libs-1.19-1.fc33.x86_64

Steps to Reproduce

I ran two benchmarks on each machine: One benchmark creates an ordinary (in-mem) perl hash with about 12M keys. The second benchmark creates a tied GDBM db hash with the same 12M keys. The newer machine is significantly faster building the in-mem hash, but significantly slower building the tied hash.

# machine_a, in-mem hash:
$ time -p perl -e'use strict; use integer; my %h; $h{$_} = 1 for ("a" .. "zzzzz");'
real 24.25
user 23.11
sys 0.91

# machine_a, tied gdbm hash:
$ time -p perl -e'use strict; use integer; use GDBM_File; my ($h, %h); $h = "gdbm_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); untie(%h);'
real 239.70
user 108.06
sys 91.56

# machine_b, in-mem hash:
$ time -p perl -e'use strict; use integer; my %h; $h{$_} = 1 for ("a" .. "zzzzz");'
real 12.27
user 11.70
sys 0.53

# machine_b, tied gdbm hash:
$ time -p perl -e'use strict; use integer; use GDBM_File; my ($h, %h); $h = "gdbm_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); untie(%h);'
real 6352.56
user 52.96
sys 6176.20

Expected behavior
I expect a tied hash to be slower than an in-mem hash, always. That's not the issue. The 10x difference seen on machine_a is totally acceptable. However, the 500x slowdown seen on machine_b is troubling, and I think it just keeps getting worse as more keys are stored in the db. The 200x degradation of the 95M-key hash rebuild from FC31 to FC33 is between tied hashes in both cases, not tied vs mem.

Perl configuration

Summary of my perl5 (revision 5 version 32 subversion 1) configuration:
   
  Platform:
    osname=linux
    osvers=5.10.15-200.fc33.x86_64
    archname=x86_64-linux-thread-multi
    uname='linux buildvm-x86-18.iad2.fedoraproject.org 5.10.15-200.fc33.x86_64 #1 smp wed feb 10 17:46:55 utc 2021 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=none -Dccflags=-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -Dldflags=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld  -Dccdlflags=-Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld  -Dlddlflags=-shared -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld  -Dshrpdir=/usr/lib64 -DDEBUGGING=-g -Dversion=5.32.1 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5/5.32 -Dsitearch=/usr/local/lib64/perl5/5.32 -Dprivlib=/usr/share/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Darchlib=/usr/lib64/perl5 -Dvendorarch=/usr/lib64/perl5/vendor_perl -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Dusedtrace=/usr/bin/dtrace -Duselargefiles -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin -Dusesitecustomize -Duse64bitint'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fwrapv -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='  -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fwrapv -fno-strict-aliasing -I/usr/local/include'
    ccversion=''
    gccversion='10.2.1 20201125 (Red Hat 10.2.1-9)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags ='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld  -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib64 /lib64 /usr/lib64 /usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib
    libs=-lpthread -lresolv -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lresolv -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.32.so
    so=so
    useshrplib=true
    libperl=libperl.so
    gnulibc_version='2.32'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,--enable-new-dtags -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '
    cccdlflags='-fPIC'
    lddlflags='-lpthread -shared -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld  -L/usr/local/lib -fstack-protector-strong'

Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_SITECUSTOMIZE
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    Fedora Patch1: Removes date check, Fedora/RHEL specific
    Fedora Patch2: support for libdir64
    Fedora Patch3: use libresolv instead of libbind
    Fedora Patch4: USE_MM_LD_RUN_PATH
    Fedora Patch5: Provide MM::maybe_command independently (bug #1129443)
    Fedora Patch6: Dont run one io test due to random builder failures
    Fedora Patch8: Define SONAME for libperl.so
    Fedora Patch9: Install libperl.so to -Dshrpdir value
    Fedora Patch10: Make *DBM_File desctructors thread-safe (RT#61912)
    Fedora Patch11: Replace EU::MakeMaker dependency with EU::MM::Utils in IPC::Cmd (bug #1129443)
    Fedora Patch12: Link XS modules to pthread library to fix linking with -z defs
    Fedora Patch13: Pass the correct CFLAGS to dtrace
    Fedora Patch14: Do not use C compiler reserved identifiers
    Fedora Patch15: Fix SvUV_nomg() macro definition
    Fedora Patch16: Fix SvTRUE() documentation
    Fedora Patch17: Fix ext/XS-APItest/t/utf8_warn_base.pl tests
    Fedora Patch18: Fix IO::Handle::error() to report write errors (GH#6799)
    Fedora Patch19: Fix IO::Handle::error() to report write errors (GH#6799)
    Fedora Patch21: Fix setting a non-blocking mode in IO::Socket::UNIX (GH#17787)
    Fedora Patch22: Fix running actions after stepping in a debugger (GH#17901)
    Fedora Patch23: Fix running actions after stepping in a debugger (GH#17901)
    Fedora Patch24: Fix running actions after stepping in a debugger (GH#17901)
    Fedora Patch25: Fix a buffer size for asctime_r() and ctime_r() functions
    Fedora Patch26: Prevent from an integer overflow in RenewDouble() macro
    Fedora Patch28: Fix a number of arguments passed to a BOOT XS subroutine (GH#17755)
    Fedora Patch29: Fix an IO::Handle spurious error reported for regular file handles (GH#18019)
    Fedora Patch30: Fix inheritance resolution of lexial objects in a debugger (GH#17661)
    Fedora Patch35: Fix sorting with a block that calls return (GH#18081)
    Fedora Patch38: Fix sv_collxfrm macro to respect locale
    Fedora Patch39: Fix an iterator signedness in handling an mro exception (GH#18155)
    Fedora Patch40: Fix a code flow in Perl_sv_inc_nomg()
    Fedora Patch41: Fix an undefined behavior in Perl_custom_op_get_field()
    Fedora Patch42: Fix Config variable names in in t/op tests
    Fedora Patch43: Fix fetching a magic on the stacked file test operators
    Fedora Patch44: Fix a crash in optimizing split() (GH#18232)
    Fedora Patch45: Fix a crash in optimizing split() (GH#18232)
    Fedora Patch46: Fix a crash in optimizing split() (GH#18232)
    Fedora Patch47: Fix a crash in optimizing split() (GH#18232)
    Fedora Patch48: Make accessing environment by DynaLoader thread-safe
    Fedora Patch49: Use duplocale() if available
    Fedora Patch50: Fix fc() in Turkish locale
    Fedora Patch51: Fix croaking on "my $_" when "use utf8" is in effect (GH#18449)
    Fedora Patch52: Fix PERL_UNUSED_ARG() definition in XSUB.h
    Fedora Patch53: Add missing entries to perldiag (GH#18276)
    Fedora Patch54: Protect locale tests from LANGUAGE environment variable
    Fedora Patch55: Prevent the number of buckets in a hash from getting too large
    Fedora Patch56: Fix a memory leak when compiling a regular expression (GH#18604)
    Fedora Patch57: Fix dumping a hash entry of PL_strtab type
    Fedora Patch57: Fix an arithmetic left shift of a minimal integer value (GH#18639)
    Fedora Patch200: Link XS modules to libperl.so with EU::CBuilder on Linux
    Fedora Patch201: Link XS modules to libperl.so with EU::MM on Linux
  Built under linux
  Compiled at Mar 31 2021 00:00:00
  @INC:
    /usr/local/lib64/perl5/5.32
    /usr/local/share/perl5/5.32
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
@Leont
Copy link
Contributor

Leont commented Jun 13, 2021

This would have to be related to #18435, perhaps @graygnuorg has a clue?

@jkeenan
Copy link
Contributor

jkeenan commented Jun 13, 2021

This would have to be related to #18435, perhaps @graygnuorg has a clue?

As I read the original post, the deterioration occurred somewhere between perl-5.30.0 and perl-5.32.1. The commits proposed in #18435 were merged into blead on Jan 06 2021, i.e., during the 5.33 dev cycle. So I don't think they can explain the performance drop.

@richardleach
Copy link
Contributor

I've just tried this on non-threaded builds of v5.30.0 and v5.32.1, on Debian, not Fedora. The reported slowdown did not show up.

----
# dpkg -l | grep gdbm
ii  gdbm-l10n                      1.18.1-4                     all          GNU dbm database routines (translation files) 
ii  libgdbm-compat4:amd64          1.18.1-4                     amd64        GNU dbm database routines (legacy support runtime version) 
ii  libgdbm-dev:amd64              1.18.1-4                     amd64        GNU dbm database routines (development files) 
ii  libgdbm6:amd64                 1.18.1-4                     amd64        GNU dbm database routines (runtime version) 
----

# ./perl5.30.0 -v

This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux

# time -p ./perl5.30.0 -e'use strict; use integer; my %h; $h{$_} = 1 for ("a" .. "zzzzz");'
real 23.83
user 22.94
sys 0.87

# time -p ./perl5.30.0 -e'use strict; use integer; use GDBM_File; my ($h, %h); $h = "gdbm_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); untie(%h);'
real 284.47
user 91.93
sys 122.20

# ./perl5.32.1 -v

This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux

# time -p ./perl5.32.1 -e'use strict; use integer; my %h; $h{$_} = 1 for ("a" .. "zzzzz");'
real 26.49
user 25.30
sys 1.16


# time -p ./perl5.32.1 -e'use strict; use integer; use GDBM_File; my ($h, %h); $h = "gdbm_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); untie(%h);'
real 293.58
user 92.70
sys 121.52

I'll try threaded builds now.

@richardleach
Copy link
Contributor

No slowdown with threaded builds here either:

# ./perl5.30.0 -v

This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux-thread-multi

# time -p ./perl5.30.0 -e'use strict; use integer; my %h; $h{$_} = 1 for ("a" .. "zzzzz");'
real 21.94
user 21.10
sys 0.83

# time -p ./perl5.30.0 -e'use strict; use integer; use GDBM_File; my ($h, %h); $h = "gdbm_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); untie(%h);'
real 287.61
user 93.53
sys 124.57

# ./perl5.32.1 -v

This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-thread-multi

# time -p ./perl5.32.1 -e'use strict; use integer; my %h; $h{$_} = 1 for ("a" .. "zzzzz");'
real 22.66
user 21.73
sys 0.91

# time -p ./perl5.32.1 -e'use strict; use integer; use GDBM_File; my ($h, %h); $h = "gdbm_test"; tie(%h, "GDBM_File", $h, GDBM_NEWDB, 0600); $h{$_} = 1 for ("a" .. "zzzzz"); untie(%h);'
real 289.64
user 97.12
sys 123.82

@richardleach
Copy link
Contributor

Note that the OP's "machine_a" and my Debian box are running packaged versions of GDBM v1.18.1, but the "machine_b" which exhibits the slowdown is using a version of GDBM v.1.19. That might be pertinent, but I don't have any time now to bump my GDBM version and rerun the test cases.

FYI. changelog for GDBM v1.19:

  • Pre-read the memory mapped regions on systems that support it.
  • gdbmtool: tagged initialization of structured data.
  • Preserve locking type during database reorganization.

@graygnuorg
Copy link
Contributor

Thanks for letting me know. The problem is not related to Perl in any way. I have identified the offending commit (4fb2326a4a). The patch will be available today. The new GDBM version, which is going to be released this week, will incorporate the fix.

@richardleach
Copy link
Contributor

@graygnuorg - thanks for the quick diagnosis & response.

@richardleach richardleach added Closable? We might be able to close this ticket, but we need to check with the reporter Not Core Perl This is not an issue that can be addressed by this repo and removed Needs Triage labels Jun 14, 2021
@graygnuorg
Copy link
Contributor

I have committed the fix: https://git.gnu.org.ua/gdbm.git/commit/?id=d69a106c04. Will let you know when GDBM 1.20 is available.

jollaitbot pushed a commit to sailfishos-mirror/gdbm that referenced this issue Jun 14, 2021
The commit 4fb2326 introduced pre-reading of memory mapped
regions.  While speeding up searches, it has a negative impact
on write operatons, since every remapping effectively re-reads
the entire database.  See Perl/perl5#18884
for details.

* NEWS: Document changes.
* doc/gdbm.texi: Document the GDBM_PREREAD flag.
* src/gdbm.h.in (GDBM_PREREAD): New flag.
* src/gdbmdefs.h (gdbm_file_info): New member: mmap_preread.
* src/gdbmopen.c (gdbm_fd_open): Set mmap_preread if requested.
* src/gdbmsetopt.c (setopt_gdbm_getflags): Report GDBM_PREREAD
flag, if dbf->mmap_preread is set.
* src/mmap.c (_gdbm_internal_remap): Use pre-fault reading only
if dbf->mmap_preread is set.
@kjohnstn
Copy link
Author

Thanks everyone, that's excellent!

What is the procedure for closing a ticket? Is it my (OP) responsibility to close? Should I wait until I've had a chance to try the fix?

@jkeenan
Copy link
Contributor

jkeenan commented Jun 15, 2021 via email

@jkeenan jkeenan self-assigned this Jun 16, 2021
@graygnuorg
Copy link
Contributor

GDBM 1.20 has been released: https://ftp.gnu.org/gnu/gdbm/gdbm-1.20.tar.gz

@jkeenan
Copy link
Contributor

jkeenan commented Jun 21, 2021

GDBM 1.20 has been released: https://ftp.gnu.org/gnu/gdbm/gdbm-1.20.tar.gz

Based on the discussion in this ticket, we'll assume that this release addresses the original poster's concerns and close the ticket. Any new problems with GDBM 1.20 should be raised in a new ticket.

@jkeenan jkeenan closed this as completed Jun 21, 2021
@jkeenan jkeenan removed the Closable? We might be able to close this ticket, but we need to check with the reporter label Jun 21, 2021
@jkeenan jkeenan removed their assignment Jun 21, 2021
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 28, 2021
Version 1.20, 2021-06-17

* New bucket cache

The bucket cache support has been rewritten from scratch.  The new
bucket cache code provides for significant speed up of search
operations.

* Change mmap prereading strategy

Pre-reading of the memory mapper regions, introduced in version 1.19
can be advantageous only when doing intensive look-ups on a read-only
database.  It degrades performance otherwise, especially if doing
multiple inserts.  Therefore, this version introduces a new flag
to gdbm_open: GDBM_PREREAD.  When given, it enables pre-reading of
memory mapped regions.

See Perl/perl5#18884 for details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ext/GDBM_File Not Core Perl This is not an issue that can be addressed by this repo
Projects
None yet
Development

No branches or pull requests

5 participants