Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BBC: Blead Breaks Config::Model #18667

Closed
eserte opened this issue Mar 28, 2021 · 14 comments
Closed

BBC: Blead Breaks Config::Model #18667

eserte opened this issue Mar 28, 2021 · 14 comments
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) Closable? We might be able to close this ticket, but we need to check with the reporter hasPatch hasTest target-5.34

Comments

@eserte
Copy link
Contributor

eserte commented Mar 28, 2021

DDUMONT/Config-Model-2.141.tar.gz fails on some systems since 5.33.5 or so.
Discussion happened so far in the CPAN module: dod38fr/config-model#26
but it's likely to be a bleadperl issue.

@eserte eserte added BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) Needs Triage affects-5.34 labels Mar 28, 2021
@hvds
Copy link
Contributor

hvds commented Mar 29, 2021

Not sure whether to comment here or at config-model, but I've started trying to look into this.

(I tried running the test script with -Dm, but the address for which "attempt to free unreferenced scalar" was reported did not appear in any of the alloc or free diagnostics, so I may be misunderstanding what -Dm does.)

Investigating one of the SVs for which it reports "attempt to free unreferenced scalar", it appears that this is part of a hash that allows lookup by the numification of a Config::Model object:

(gdb) print Perl_sv_dump(PL_tmps_stack[14])
SV = IV(0x5555581efef0) at 0x5555581eff00
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x5555581e9148
    SV = PVHV(0x55555884e2e0) at 0x5555581e9148
      REFCNT = 1
      FLAGS = (OOK,SHAREKEYS)
      AUX_FLAGS = 0
      ARRAY = 0x555558281180  (0:6, 1:2)
      hash quality = 125.0%
      KEYS = 2
      FILL = 2
      MAX = 7
      RITER = -1
      EITER = 0x0
      RAND = 0xd738dcb0
        Elt "93825001673328" HASH = 0x55b53aca
        SV = IV(0x555558265758) at 0x555558265768
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1
        Elt "93825001672968" HASH = 0xbf143ddf
        SV = IV(0x555558265aa0) at 0x555558265ab0
          REFCNT = 1
          FLAGS = (IOK,pIOK)
          IV = 1
$13 = void
(gdb) p /x 93825001673328
$14 = 0x555555e55270
(gdb) print Perl_sv_dump((SV*)0x555555e55270)
SV = PVHV(0x555555e41f40) at 0x555555e55270
  REFCNT = 1
  FLAGS = (OBJECT,OOK,SHAREKEYS)
  STASH = 0x55555675dd00        "Config::Model"
  AUX_FLAGS = 0
  ARRAY = 0x5555581ee4d0  (0:9, 1:5, 2:2)
  hash quality = 103.8%
  KEYS = 9
  FILL = 7
  MAX = 15
  RITER = -1
  EITER = 0x0
  RAND = 0xbaf865e8
  BACKREFS = 0x5555581f4908
    SV = PVAV(0x5555581d97a0) at 0x5555581f4908
      REFCNT = 2
      FLAGS = ()
      ARRAY = 0x555558260700
      FILL = 7
      MAX = 14
      FLAGS = ()

@dod38fr's suggestion that this may be related to use of weakrefs looks quite likely at the moment.

But the 67 module dependencies do make approaches such as bisection a challenge; I may have a go at bisecting manually later.

@hvds
Copy link
Contributor

hvds commented Mar 29, 2021

I may have a go at bisecting manually later.

There are only 'skip'ped commits left to test.
The first bad commit could be any of:
440c185
399fef9

I had a build failure on the first of those (440c185) which I don't understand - it wants 'char *', but that's in the list:

make[1]: Entering directory '/src/perl/git/ext/DynaLoader'
"../../miniperl" "-I../../lib" "../../lib/ExtUtils/xsubpp" -noprototypes -typemap '/src/perl/git/ext/DynaLoader/../../lib/ExtUtils/typemap'  DynaLoader.xs > DynaLoader.xsc
Could not find a typemap for C type 'char *'.
The following C types are mapped by the current typemap:
'AV *', 'Boolean', 'CV *', 'FILE *', 'FileHandle', 'HV *', 'I16', 'I32', 'I8', 'IV', 'InOutStream', 'InputStream', 'NV', 'OutputStream', 'PerlIO *', 'Result', 'STRLEN', 'SV *', 'SVREF', 'SysRet', 'SysRetLong', 'Time_t *', 'U16', 'U32', 'U8', 'UV', 'bool', 'bool_t', 'caddr_t', 'char', 'char *', 'char **', 'const char *', 'double', 'float', 'int', 'long', 'short', 'size_t', 'ssize_t', 'time_t', 'unsigned', 'unsigned char', 'unsigned char *', 'unsigned int', 'unsigned long', 'unsigned long *', 'unsigned short', 'void *', 'wchar_t', 'wchar_t *'
 in DynaLoader.xs, line 176
Makefile:385: recipe for target 'DynaLoader.c' failed
make[1]: *** [DynaLoader.c] Error 1

@richardleach do you have any clue how one of these commits could have caused an instability, possibly relating to weak refs?

@hvds
Copy link
Contributor

hvds commented Mar 29, 2021

I had a build failure on the first of those (440c185) which I don't understand

I guess this must be caused by corruption of *maxp in av_extend_guts(), since for that one commit it can get set from an uninitialized value.

@hvds
Copy link
Contributor

hvds commented Mar 29, 2021

@richardleach I can confirm that if I take blead@4e82c85b and replace the content of av_extend_guts() with the code as it was before 440c185, I can run Config::Model's tests without a problem. Unfortunately the full test file (t/array_id.t) hits that function 57,024 times, so I'm going to have to see if I can cut it down some.

@hvds
Copy link
Contributor

hvds commented Mar 30, 2021

I managed to cut it down some; this is very clearly now a perl bug:

% ./perl -we '
  my @a = (undef) x 4;
  for (0 .. 3) {
    $a[$_] = undef;
    splice @a, $_, 1;
  }
'
Attempt to free unreferenced scalar: SV 0x560bc9121fb0 at -e line 3.
% 

[Update] Unrolling the loop allows a slight further reduction:

my @data = (undef) x 4;
splice @data, 1, 1;
splice @data, 2, 1;
$data[3] = undef;
splice @data, 3, 1;

Curiously, if the first two splices are combined into splice @data, 1, 2 the error disappears.

@hvds
Copy link
Contributor

hvds commented Mar 31, 2021

I won't have time to dig further for the next 24 hours or so, but with the reduced test case it should be relatively easy to diagnose now.

@hvds
Copy link
Contributor

hvds commented Apr 1, 2021

@richardleach @tonycoz maybe one of you two could take a look?

Ok, it appears the problem is that in Perl_av_extend_guts in the case where the array is shifted, we always Move() the content up to the front of the array; it appears that with the updated code, the SVs copied down but not overwritten by later ones are no longer cleared - eg we move (0, sv1, sv2, 0) to (sv1, sv2, sv2, 0), but no longer clear the now-duplicate copy of sv2.

In the test case (the updated, unrolled one above), the second call after Perl_run() starts is of this nature. With the old extend_guts code, we had:

# on entry
(gdb) p allocp[0][0]@4
$1 = {0x0, 0x555555dfb5a8, 0x555555dfb770, 0x0}
# after the Move()
(gdb) p allocp[0][0]@4
$2 = {0x555555dfb5a8, 0x555555dfb770, 0x555555dfb770, 0x0}
# after the immediate clear
(gdb) p allocp[0][0]@4
$3 = {0x555555dfb5a8, 0x555555dfb770, 0x0, 0x0}
# after the subsequent renew and second clear
(gdb) p allocp[0][0]@7 
$4 = {0x555555dfb5a8, 0x555555dfb770, 0x0, 0x0, 0x0, 0x0, 0x0}

With the new extend_guts code, the first clear never happens, and we get:

# on entry
(gdb) p allocp[0][0]@4
$1 = {0x0, 0x555555dfb5a8, 0x555555dfb770, 0x0}
# after the Move()
(gdb) p allocp[0][0]@4
$2 = {0x555555dfb5a8, 0x555555dfb770, 0x555555dfb770, 0x0}
# after the renew and Zero()
(gdb) p allocp[0][0]@7
$3 = {0x555555dfb5a8, 0x555555dfb770, 0x555555dfb770, 0x0, 0x0, 0x0, 0x0}

.. and that now duplicate SV* is the one we end up double-freeing.

It looks like when shifting the array contents the new code is correctly adjusting the number of elements to clear (to_null), but failing to adjust ary_offset which tells it from what offset to clear. However if I try to fix this with:

--- a/av.c
+++ b/av.c
@@ -109,6 +109,7 @@ Perl_av_extend_guts(pTHX_ AV *av, SSize_t key, SSize_t *maxp, SV ***allocp,
 
         if (av && *allocp != *arrayp) { /* a shifted SV* array exists */
             to_null = *arrayp - *allocp;
+            ary_offset -= to_null;
             *maxp += to_null;
 
             Move(*arrayp, *allocp, AvFILLp(av)+1, SV*);

.. it crashes trying to build lib/buildcustomize.pl, so I guess it's not as simple as that.

(I'm also confused why we check if (key > *maxp - 10) - in this case, we have key <= *maxp so we could get away without reallocing at all. This seems like it will only ever cause unnecessary reallocs as well as needing to copy the array contents twice each time, but it has (impressively) been in there since a687059, when this file was still called array.c.)

@hvds
Copy link
Contributor

hvds commented Apr 2, 2021

I've been trying to analyse the effects of the changes in 440c185 and 399fef9, but with little success. After some preamble, the core of the code is:

  if (av && *allocp != *arrayp) {
    /* case 1: a shifted SV* array exists */
  } else if (*allocp) {
    /* case 2: a full SV* array exists */
  } else {
    /* case 3: there is no SV* array yet */
  }

As far as I can see, the functionality of case 2 is correctly preserved through the changes. Case 3 differs by zeroing the newly allocated array even when not (av && AvREAL(av)); that doesn't seem useful, and the difference could be reverted by removing the goto zero; and the associated label. Case 1 is the one I've struggled to analyse: as noted in my previous message the new code is certainly failing to clear the top of the shifted list of SVs, which is enough to cause the reported issue, but I've been unable to come up with a working remedy for that.

If nobody comes up with a better idea, I plan to revert those two commits to restore it to a working state. Optionally we could also then apply a minimal commit to replace each occurrence of while (tmp) ary[--tmp] = NULL with Zero(ary, tmp, SV*), which I think will deliver most of the intent of the original pair of patches.

@richardleach
Copy link
Contributor

@hvds - thanks for looking into this, I've very limited bandwidth at the moment but will try to take a look over the Easter weekend.

@richardleach
Copy link
Contributor

richardleach commented Apr 2, 2021

(I'm also confused why we check if (key > *maxp - 10) - in this case, we have key <= *maxp so we could get away without reallocing at all. This seems like it will only ever cause unnecessary reallocs as well as needing to copy the array contents twice each time, but it has (impressively) been in there since a687059, when this file was still called array.c.)

Yes, I had also wondered why that was the case. IIRC, I queried it in the PR (#18072) for the changes above, but noone perked up with suggestions as to why that is as it is. So I left it as-is.

@richardleach
Copy link
Contributor

It looks like when shifting the array contents the new code is correctly adjusting the number of elements to clear (to_null), but failing to adjust ary_offset which tells it from what offset to clear.

Yes, looks like ary_offset isn't set correctly when the array is both unshifted and extended. Seems like it should be possible to correct, I'll have more of a look later.

(I'm also confused why we check if (key > *maxp - 10) - in this case, we have key <= *maxp so we could get away without reallocing at all. This seems like it will only ever cause unnecessary reallocs as well as needing to copy the array contents twice each time, but it has (impressively) been in there since a687059, when this file was still called array.c.)

Yes, I had also wondered why that was the case. IIRC, I queried it in the PR (#18072) for the changes above, but noone perked up with suggestions as to why that is as it is. So I left it as-is.

Oh, one possibility: if the shift & unshift/push happens lots in a loop, extending it might reduce the number of times that av_extend() has to be called.

@hvds
Copy link
Contributor

hvds commented Apr 3, 2021

(I'm also confused why we check if (key > *maxp - 10) - in this case, we have key <= *maxp so we could get away without reallocing at all. This seems like it will only ever cause unnecessary reallocs as well as needing to copy the array contents twice each time, but it has (impressively) been in there since a687059, when this file was still called array.c.)

Yes, I had also wondered why that was the case. IIRC, I queried it in the PR (#18072) for the changes above, but noone perked up with suggestions as to why that is as it is. So I left it as-is.

Oh, one possibility: if the shift & unshift/push happens lots in a loop, extending it might reduce the number of times that av_extend() has to be called.

I've created a new issue #18681 to consider this in more detail. It's mostly questions right now ...

@richardleach richardleach added the Closable? We might be able to close this ticket, but we need to check with the reporter label Apr 5, 2021
@hvds
Copy link
Contributor

hvds commented Apr 6, 2021

If someone can confirm that Config-Model-2.141 passes tests with some blead after ce9f3c9, this ticket can be closed.

@jkeenan
Copy link
Contributor

jkeenan commented Apr 7, 2021

If someone can confirm that Config-Model-2.141 passes tests with some blead after ce9f3c9, this ticket can be closed.

Linux

$ uname -mrs
Linux 5.10.13-x86_64-linode141 x86_64

$ ./bin/perl -v | head -2 | tail -1
This is perl 5, version 33, subversion 9 (v5.33.9 (v5.33.8-26-g60eec70)) built for x86_64-linux

$ ./bin/perl -MConfig::Model -E 'say $Config::Model::VERSION;'
2.141

FreeBSD-12

$ uname -mrs
FreeBSD 12.2-STABLE amd64

$ ./bin/perl -v | head -2 | tail -1
This is perl 5, version 33, subversion 9 (v5.33.9 (v5.33.8-26-g60eec70fe6)) built for amd64-freebsd-thread-multi

$ ./bin/perl -MConfig::Model -E 'say $Config::Model::VERSION;'
2.141

LGTM. Closing ticket.

@jkeenan jkeenan closed this as completed Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BBC Blead Breaks CPAN - changes in blead broke a cpan module(s) Closable? We might be able to close this ticket, but we need to check with the reporter hasPatch hasTest target-5.34
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants