
Enhancement request: For perl-GDBM_File to support gdbm_open "GDBM_NOMMAP" flag #19306

Open
kjohnstn opened this issue Dec 31, 2021 · 5 comments


@kjohnstn

I am having an issue with gdbm, and I would like to try to prove whether mmap is or is not a factor.

This is a follow-on to issue #18884. In the transition from gdbm 1.18 to 1.19, memory-mapped db support was added and db pre-read was enabled by default. Or maybe mmap was already present and only pre-read was new? I'm not sure; I think mmap was new with gdbm 1.19.

Anyway, based on the performance issues raised in issue #18884, a gdbm_open flag, GDBM_PREREAD, was added in gdbm 1.20; pre-read is no longer enabled by default and must now be specified explicitly. But if I understand correctly, memory mapping is still the default even though pre-read is not.

There is another gdbm_open flag to disable memory mapping, GDBM_NOMMAP, but perl-GDBM_File does not export it. I'm asking for GDBM_NOMMAP to be exported so I can run experiments with and without mmap. The gdbm docs say disabling mmap will degrade performance, but I'm having two issues:

(1) DB rebuild: I saw about a 150-200x performance drop from gdbm 1.18 to 1.19, and only about a 10x regain from gdbm 1.19 to 1.22. That is, my application is still suffering roughly a 15x performance drop relative to gdbm 1.18. I want to see if NOMMAP will give me back this remaining 15x loss. Note this only concerns the time to rebuild the db from a key/value text file; it does not consider the db's performance in use.

(2) In-use: After only about a day of use (and sometimes much less), something suddenly consumes all of my memory and swap, and the program crashes. I suspect a memory leak in gdbm, but I don't have proof, and I haven't been able to construct a simple test program to demonstrate it. It is not a slow, gradual memory consumption leading to eventual out-of-memory; it is a sudden, massive consumption of approximately 200 GB within a few seconds. I want to see if NOMMAP has any effect.

I'm not reporting any bug at this time; I don't have the data. I'm just asking for perl-GDBM_File to export GDBM_NOMMAP so I can try more experiments.

(At the same time, there may be other users who could benefit from the GDBM_PREREAD flag. It is not useful to me, but maybe both could be exported?)
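For illustration, if both flags were exported, a tie call could simply OR them into the open mode. This is a sketch only, assuming a GDBM_File build that exports GDBM_NOMMAP (the subject of this request) alongside the existing flags:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use GDBM_File;

# Sketch: open (or create) the db with memory mapping disabled.
# GDBM_NOMMAP here is the flag this request asks GDBM_File to export;
# GDBM_PREREAD could be OR'd into the mode the same way where it helps.
tie my %db, 'GDBM_File', 'test.gdb', GDBM_WRCREAT | GDBM_NOMMAP, 0640
    or die "tie failed: $!";
$db{key} = 'value';
untie %db;
```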

@jkeenan
Contributor

jkeenan commented Dec 31, 2021

@graygnuorg, are you available to take a look at this GDBM-related issue?

@graygnuorg
Contributor

graygnuorg commented Jan 2, 2022 via email

@kjohnstn
Author

kjohnstn commented Jan 3, 2022

After thinking about it a bit more, I had a new idea for a test program, and I think it worked surprisingly well. This program does not demo the out-of-mem crash that I've seen with my real application, but it does demo a difference in DB build time. It's slightly different from a rebuild scenario, since a rebuild would only do writes whereas this test program does both reads and writes. But the test program doesn't require a pre-generated multi-gigabyte key/value text file to load from, which would take literally days for me to upload since I have a really terrible internet provider. On the other hand, the test program creates a key/value text file as it goes, which could be used independently to experiment with a pure rebuild scenario. So here's the program:

#! /usr/bin/perl

package torture;
use strict;

use integer;
use POSIX;
use GDBM_File;

our $torture = "torture.gdb";
our %torture;
tie(%torture, "GDBM_File", $torture, GDBM_NEWDB, 0600);

open(N, ">", "torture.n")
  or die "Failed to open > torture.n: $!";
select(N); $| = 1;

my $n = 0;
my $t = 0;

while (++$n > 0) {
  print STDOUT "n=$n\n" if (($n & 0xfffff) == 0);
  my $k =  $n ;
  my @k = ($n);
  my $K;
  my @K = ();
  while ($k > 1) {
    if ($k & 1) {
      $k += (($k << 1) + 1);
      push(@k, $k);
      push(@K, 'O');
    }
    $k >>= 1;
    push(@k, $k);
    push(@K, 'E');
  }
  push(@K, '1');
  while (@k) {
    $K = join("", @K);
    $k = shift(@k);
    if (my $tK = $torture{$K}) {
      ($tK == $k)
        or die "DB fail: key=$K expect=$k actual=$tK\n";
    }
    else {
      $torture{$K} = $k;
      print N "$k=$K\n";
      print STDOUT "t=$t\n" if ((++$t & 0xfffff) == 0);
    }
    shift(@K);
  }
}

close(N); # not gonna get here anytime soon.
untie(%torture);

The high-level structure is 3 while loops:

while (++$n > 0) {
  while ($k > 1) {
  }
  while (@k) {
  }
}

The outer loop is just a forever loop for all practical purposes. It would theoretically quit when $n increments from 0x7fff...ffff to 0x8000...0000, but it would take a really long time. All it does is increment $n.

In the top inner loop, $k starts as a copy of $n, and then $k is stepped along its Collatz path until it reaches 1. The Collatz path is determined by:

  • If $k is Odd, multiply by 3 and add 1.
  • If $k is Even, divide it by 2.
  • If the new value is 1, stop; else repeat.

At each step, either an 'O' or an 'E' character is added to @K to indicate the result of the Odd/Even test, and the new value of $k is added to @k. When it reaches 1, a final '1' character is added to the path. (Every value less than 2**64 is known to reach 1 eventually. Some take a long path to get there.) Every unique value $n has a unique Collatz path, so the path can be used as a hash key to look up the value $n. The paths are "interesting" in the sense that nearby numbers can follow vastly different paths, of vastly different lengths.
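To make the encoding concrete, the path construction can be isolated into a small function (a sketch with a hypothetical name, collatz_path, following the same step order as the test program):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the Collatz path string for a value, using the same step order as
# the test program: an odd step ('O') is always followed by a halving ('E'),
# since 3k+1 is even; a final '1' terminates the path.
sub collatz_path {
    my ($k) = @_;
    my @K;
    while ($k > 1) {
        if ($k & 1) {
            $k = 3 * $k + 1;
            push @K, 'O';
        }
        $k >>= 1;
        push @K, 'E';
    }
    return join('', @K, '1');
}

print collatz_path(3), "\n";   # OEOEEEE1
print collatz_path(6), "\n";   # EOEOEEEE1
```

For example, n=3 goes 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1, giving the key "OEOEEEE1".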

In the lower inner loop, the key of every value that $k had along its path is looked up in the hash. We take values of $k from the front of @k, and get their key by joining all the elements of @K. In the case of a hit, we check that the hash returns the correct value $k; in the case of a miss we add this $k to the DB by its key, and also write this $k and its key to the torture.n file. (For a pure "rebuild" test, take this torture.n file and read a value and key from each line.) Once we're done with each value of $k from @k, we strip one character off the front of @K, because the next $k has a one-step-shorter path.

I ran this test on two machines, running fc31 (gdbm1.18) and fc35 (gdbm1.22) on both machines. To get the performance graphs, I ran the test program in one terminal and a simple shell loop in another:

(while sleep 60; do (ls -l; free) >> torture.log; done) &

In the first terminal, the program prints a progress update for each million values of $n, and for each million values added to the DB. Values are added to the DB faster than $n counts, because it adds new values encountered along the path taken by each $n.

In the second terminal, I can tail the log file anytime to see how big the files are and how much memory the program is using. I was hoping to catch an out-of-mem condition, but never did.

But from the log file, I could extract the size of the torture.n file at 1 minute intervals. This doesn't tell me how far $n has got, but does allow me to compare the time taken to reach a given size. I read those size records into a spreadsheet, and calculated the incremental size minute by minute, and graphed the total size and increments.
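As an aside, that extraction can be scripted rather than done by hand. Here is a sketch in Perl that assumes the `(ls -l; free)` log format produced by the shell loop above, where the size is the fifth whitespace-separated field of each `ls -l` line for torture.n:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: pull the torture.n size out of each `ls -l` snapshot in the log.
# Assumes the default `ls -l` column layout (size is field 5).
sub extract_sizes {
    my @sizes;
    for my $line (@_) {
        next unless $line =~ /\btorture\.n$/;
        my @f = split ' ', $line;
        push @sizes, $f[4];
    }
    return @sizes;
}

my @log = (
    "-rw-r--r-- 1 user user 1048576 Jan  3 12:00 torture.n",
    "              total        used        free",
    "-rw-r--r-- 1 user user 2097152 Jan  3 12:01 torture.n",
);
print join(",", extract_sizes(@log)), "\n";   # 1048576,2097152
```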

Chart "torture-charts-x02-fc35-x03-fc31" shows machine "x02" on the left running fc35/gdbm1.22, vs machine "x03" on the right running fc31/gdbm1.18:
[chart: torture-charts-x02-fc35-x03-fc31]

Column A (blue line, right-side vertical scale) is the total size of the torture.n file, and Column B (orange line, left-side vertical scale) is the incremental change in size per minute. The X axis is simply the number of minutes of data. All axes are the same for the two charts; it is quite clear how much faster the file is growing in the right hand chart, x03 running fc31/gdbm1.18.

Just in case "x03" is just a vastly faster machine than "x02", I reversed the conditions: Chart "torture-charts-x02-fc31-x03-fc35" shows x02 still on the left, but now running fc31/gdbm1.18, and x03 still on the right but now running fc35/gdbm1.22:
[chart: torture-charts-x02-fc31-x03-fc35]

I didn't run this case nearly as long, only 6h vs over 30h before, but it clearly shows the same distinctive pattern of "swoops" in the growth rate (orange line) of the file in fc35/gdbm1.22 (the right hand chart this time), so I think it is safe to assume it will continue to play out over time in the same way as before if left running.

I know this is pretty primitive data, just graphing the size of a file over time, but the construction of the file is a constant, so the time taken to reach any given size is a rough performance metric. By the end of the 30h run, gdbm1.22 (first chart, x02, on the left) had only stored 96M keys (incidentally, just about the number of keys currently in my real application DB), but gdbm1.18 (first chart, x03, on the right) had stored 96M keys in just 10h, and went on to store over 250M keys in the full 30h.

By the end of 30h, the growth rate with gdbm1.18 has dropped pretty close to gdbm1.22's. I don't know if 1.18 is really slowing down, or if the vast majority of values of $n in this region are simply already in the DB. Remember, the path taken by earlier values can grow (3n+1) as well as shrink (n/2), so it is possible to encounter long ranges of $n values that have already been seen on previous paths. The curves might look quite different if I had only added each new $n value in counting order.

@graygnuorg
Contributor

Hello,

To begin with, thanks for a detailed bug report. It helped uncover an inefficient routine in the GDBM library. Before returning to it, let me first address the questions posed in your initial posting.

First of all, memory mapping support was added not "in the transition from gdbm 1.18 to 1.19", as you assumed, but much earlier: in GDBM version 1.9, dated 2011-08-12. It has remained the default ever since.

Secondly, support for the GDBM_NOMMAP flag has already been added in commit 1d7b7043625. Additional documentation was added by commit 8b8b12225a4.

Now let's return to the problem itself. The observed slowdowns during insertions happen when extensive updates of a big database file cause several key buckets to split in sequence (this explains the wave-like pattern). I have fixed it in GDBM commit b8c3d13fd8.

To measure the performance, I have created a benchmark suite, which was used to generate the following graph showing comparative results for GDBM versions 1.18, 1.22, and git master (most recent commit b8c3d13fd8):

I will release the new GDBM version as soon as I finish additional testing. In the meantime, you can give the git version a try. With 1.22, you can mitigate the adverse effect of the bug by setting a minimal cache size, e.g.:

  $dbf = tie %hash, "GDBM_File", $dbname, GDBM_NEWDB, 0600;
  $dbf->cache_size(10);

(the cache_size method was introduced by Perl commit e03e7cd).

jollaitbot pushed a commit to sailfishos-mirror/gdbm that referenced this issue Jan 6, 2022
The implementation of _gdbm_cache_flush becomes prohibitively
inefficient during extensive updates of large databases.  The
bug was reported at Perl/perl5#19306.

To fix it, make sure that all changed cache entries are placed at
the head of the cache_mru list, forming a contiguous sequence.
This way a potentially long iteration over all cache entries can be
cut off at the first entry with ca_changed == FALSE.

This commit also gets rid of several superfluous fields in
struct gdbm_file_info:

- cache_entry

    Not needed, because the most recently used cache entry
    (cache_mru) is always the current one.

- bucket_changed

    dbf->cache_mru->ca_changed reflects the status of the current
    bucket.

- second_changed

    Not needed because _gdbm_cache_flush, which flushes all changed
    buckets, is now invoked unconditionally by _gdbm_end_update (and
    also whenever dbf->cache_mru changes).

* src/gdbmdefs.h (struct gdbm_file_info): Remove cache_entry.  The
current cache entry is cache_mru.
Remove bucket_changed, and second_changed.
All uses changed.
* src/proto.h (_gdbm_current_bucket_changed): New inline function.
* src/bucket.c (_gdbm_cache_flush): Assume all changed elements form
a contiguous sequence beginning with dbf->cache_mru.
(set_cache_entry): Remove.  All callers changed.
(lru_link_elem,lru_unlink_elem): Update dbf->bucket as necessary.
(cache_lookup): If the obtained bucket is not changed and is going
to become current, flush all changed cache elements.

* src/update.c (_gdbm_end_update): Call _gdbm_cache_flush unconditionally.
* src/findkey.c: Use dbf->cache_mru instead of the removed dbf->cache_entry.
* src/gdbmseq.c: Likewise.
* tools/gdbmshell.c (_gdbm_print_bucket_cache): Likewise.

* src/falloc.c: Use _gdbm_current_bucket_changed to mark the current
bucket as changed.
* src/gdbmstore.c: Likewise.
* src/gdbmdelete.c: Likewise.  Use _gdbm_current_bucket_changed.

* tests/gtcacheopt.c: Fix typo.
* tests/gtload.c: New option: -cachesize
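The early cut-off that this commit message describes can be sketched as follows. This is a Perl sketch mirroring the thread's language; the names and structure are illustrative only, not the actual C internals of src/bucket.c:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the flush cut-off: changed entries are kept contiguous at the
# head of the MRU list, so iteration can stop at the first clean entry
# (ca_changed false) instead of scanning the whole cache.
sub flush_changed {
    my ($mru) = @_;          # arrayref of cache entries, most-recently-used first
    my $flushed = 0;
    for my $entry (@$mru) {
        last unless $entry->{ca_changed};  # cut off: the rest are clean
        $entry->{ca_changed} = 0;          # pretend to write the bucket out
        $flushed++;
    }
    return $flushed;
}

my $cache = [ map { { ca_changed => $_ } } (1, 1, 0, 0) ];
print flush_changed($cache), "\n";   # 2
```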
@kjohnstn
Author

kjohnstn commented Jan 7, 2022

Thank you, Thank you, Thank you!
