Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround XS that wants non-STRLEN for PV len #20037

Closed
wants to merge 2 commits into from

Conversation

khwilliamson
Copy link
Contributor

 This should fix GH #19983.
 
 Some of the macros that extract a PV from an SV also will set a 'len'
 parameter to how many bytes long it is.  The len parameter is supposed
 to be declared as a STRLEN (or equivalently, Size_t).  But there is a
 significant amount of code that declares the parameter wrongly, such as
 an int, and this code generally has worked.  I do believe that warnings
 are generated.  With 1ef9039 such code broke.
 
 One could view this as similar to the hash key retrieval order problem
 from years past, where we viewed the breakage as a "good thing" to catch
 real bugs early.  But in this case, an int may be large enough so that
 the issue wouldn't ever arise in practice.
 
 What this commit does is to see if the 'len' parameter is the same size
 and sign as STRLEN.  If so, it follows the code in 1ef9039.  I believe
 this is technically undefined behavior, as the only defined behavior is
 if the pointers point to the same object type, but we do such things all
 the time without negative consequences.
 
 If 'len' isn't equivalent to STRLEN, the implementation falls back to
 using gcc brace groups, when available, to only evaluate the passed in
 SV once.  If not available, it uses temporary variables for the same
 effect.

@sisyphus
Copy link
Contributor

sisyphus commented Aug 4, 2022

Allowing this mistake of passing an int* to a function that expects a STRLEN* sits a bit uncomfortably with me when the STRLEN is bigger than int && the architecture is big-endian.
On my big-endian ppc64 machine, if a C sub that expects STRLEN* is given an int*, then the 'int' ends up being zero unless the value overflows the 32 low bits.
That is, the high 32- bits of the 64-bit value is assigned to the int, not the low 32 bits.
(On little-endian machines it seems that the desired "low" bits are assigned, so it's less of a concern with them.)

Below is an XS (Inline::C) demo script that exhibits such behaviour on a big-endian machine with intsize=4 and sizesize=8.
However, I've been unable to reproduce this issue on that machine by specifying an int as the second arg to SvPV() - so perhaps there's no need to worry about this usage wrt SvPV().
Perhaps the perl internals take care of it (via casts ?), or maybe there's something wrong with my demo, or maybe this is just the beauty of undefined behaviour.

Here's the demo script:

use strict;
use warnings;
use Config;

use Inline C => Config =>
  BUILD_NOISY => 1,
;

use Inline C => <<'EOC';

void fu(STRLEN * in, STRLEN * out) {
  *out = *in;
}

void foo(SV * in) {
  dXSARGS;
  int i, count = 0, x = 0;
  STRLEN a = (STRLEN)SvUV(in);

  fu(&a, &x);

  for(i = 0; i < x; i++) {
    count++;
  }

  printf("count: %d\n", count);

  ST(0) = sv_2mortal(newSVuv(x)); /* the int */
  ST(1) = sv_2mortal(newSVuv(a)); /* the STRLEN */
  XSRETURN(2);
}

EOC

print "BYTEORDER:   $Config{byteorder}\n";
print "STRLEN size: $Config{sizesize} \n";
print "INT SIZE   : $Config{intsize}  \n\n";

for(1, 1000, 4294967295, 4294967296, 8589934591, 8589934592) {
  my @ret = foo($_);
  print "$ret[1] becomes $ret[0]\n\n";
}

And here's the output:

BYTEORDER:   87654321
STRLEN size: 8
INT SIZE   : 4

count: 0
1 becomes 0

count: 0
1000 becomes 0

count: 0
4294967295 becomes 0

count: 1
4294967296 becomes 1

count: 1
8589934591 becomes 1

count: 2
8589934592 becomes 2

On a little-endian machine, I find that there's no problem until the value of the STRLEN overflows 32-bits - which is going to be a pretty rare case where SvPV() is concerned.
The only warnings issued during compilation are the expected ones:

strlen_pl_c68c.xs: In function ‘foo’:
strlen_pl_c68c.xs:15:3: warning: passing argument 2 of ‘fu’ from incompatible pointer type [enabled by default]
strlen_pl_c68c.xs:6:6: note: expected ‘STRLEN *’ but argument is of type ‘int *’

Cheers,
Rob

@dur-randir
Copy link
Member

I'd be really, really wary for commiting workarounds that may break silently - as they're the worst ones to debug. Luckily, I can't imagine any easy way to break this (get magic looks like the only one, and they aren't common on cpan), but conceptually introducing those scratch_* variables really bothers me. Does the original idea of "evaluate SV in a macro once" really warrants this?

PS: shouldn't scratch_* vars be a subject to the same ifdef as theirs usage?

@khwilliamson
Copy link
Contributor Author

To @sisyphus, I used U16 to test this.

To @RandiR. I'm unsure as to if it is worth it or not. But this implementation is less susceptible to magic and overloading issues than the one it replaces.

But we can make it as loud as we want, including refusing to compile, or deprecating calling with a too-small len parameter, or issuing a runtime warning at the first wrong call of these at each calling instance, as examples

The core has used PL_na and PL_Sv for many years to implement things like this. But they are API. I personally spent too many hours finding a bug that turned out to be caused by having one function that used them calling another function that used them. The core needs to stop using those API ones regardless of this PR. This would be a step towards that, and so the variables are not #ifdefd Of course, each such potential use would have to be examined to be sure that the same call stack error couldn't occur. But having core-only ones would allow us to control things better than currently.

Other options include:

  1. Doing the single evaluation only for the nolen forms, and documenting that you can do a SvCUR yourself afterwards if you want that behavior

  2. Reverting the behavior to the old way of doing things if the len parameter isn't declared STRLEN, and documenting that. So XS writers would get single-evaluation if they used the proper declaration.

@tonycoz
Copy link
Contributor

tonycoz commented Aug 17, 2022

However, I've been unable to reproduce this issue on that machine by specifying an int as the second arg to SvPV() - so perhaps there's no need to worry about this usage wrt SvPV().

The problem is that when SvPV() was more than a thin wrapper around a function, it could safely be called with a non-STRLEN len parameter for many cases:

#define SvPV_flags(sv, len, flags) \
    (SvPOK_nog(sv) \
     ? ((len = SvCUR(sv)), SvPVX(sv)) : sv_2pv_flags(sv, &len, flags))

But now to avoid repeated evaluation of macro parameters it directly calls a function:

#define SvPV_flags(sv, len, flags)                                          \
   Perl_SvPV_helper(aTHX_ sv, &len, flags, SvPVnormal_type_,                \
                    Perl_sv_2pv_flags, FALSE, 0)

which causes "stack smashing" as described in #19983 and similar problems even for non-magical POK SVs, which the older definition allowed to "work".

Both versions caused compilers to produce a type mismatch diagnostic, which unfortunately some CPAN authors ignored.

@jkeenan
Copy link
Contributor

jkeenan commented Aug 24, 2022

@tonycoz, @khwilliamson, Can we get an update on the status of this p.r. (which now has merge conflicts)?

I can confirm that we are still getting CPANtester failure reports (below, Linux, unthreaded, gcc-4.9.2) on some of the distros cited in #19983:

test FAIL Locale-Hebrew-1.05 (perl-5.37.4) x86_64-linux 5.17.5-x86_64-linode154
test FAIL Digest-OAT-0.04 (perl-5.37.4) x86_64-linux 5.17.5-x86_64-linode154
...
test FAIL Math-FastGF2-0.07 (perl-5.37.4) x86_64-linux 5.17.5-x86_64-linode154

Thank you very much.

@khwilliamson
Copy link
Contributor Author

khwilliamson commented Oct 11, 2022 via email

@khwilliamson
Copy link
Contributor Author

We don't seem to be getting new failures from this, so it looks like the problems are limited in scope.

These are for internal use only, and won't collide with external uses of
the public API' PL_na'.
This should fix GH Perl#19983.

Some of the macros that extract a PV from an SV also will set a 'len'
parameter to how many bytes long it is.  The len parameter is supposed
to be declared as a STRLEN (or equivalently, Size_t).  But there is a
significant amount of code that declares the parameter wrongly, such as
an int, and this code generally has worked.  I do believe that warnings
are generated.  With 1ef9039 such code broke.

One could view this as similar to the hash key retrieval order problem
from years past, where we viewed the breakage as a "good thing" to catch
real bugs early.  But in this case, an int may be large enough so that
the issue wouldn't ever arise in practice.

What this commit does is to see if the 'len' parameter is the same size
and sign as STRLEN.  If so, it follows the code in 1ef9039.  I believe
this is technically undefined behavior, as the only defined behavior is
if the pointers point to the same object type, but we do such things all
the time without negative consequences.

If 'len' isn't equivalent to STRLEN, the implementation falls back to
using gcc brace groups, when available, to only evaluate the passed in
SV once.  If not available, it uses temporary variables for the same
effect.
@demerphq
Copy link
Collaborator

demerphq commented Feb 8, 2023

I think we should merge this. @tonycoz, @sisyphus, @dur-randir any feedback on this?

@tonycoz
Copy link
Contributor

tonycoz commented Feb 12, 2023

  1. Doing the single evaluation only for the nolen forms, and documenting that you can do a SvCUR yourself afterwards if you want that behavior

This wouldn't be correct, SvCUR() might not be set (when SvPV() calls an overload) or for a different representation (SvPVutf8() on a read only SV).

@tonycoz
Copy link
Contributor

tonycoz commented Feb 12, 2023

I think we should merge this. @tonycoz, @sisyphus, @dur-randir any feedback on this?

I'm inclined away from this, the type mismatches in the failing modules are diagnosed by the compiler - the modules were always broken.

But that's a soft rejection, I wouldn't merge it, but I won't object if someone else does.

@khwilliamson
Copy link
Contributor Author

I would push for merging this if we had continued to get failure reports. But the lack of them after the initial flurry, indicates to me that there aren't that many more modules out there that failed to act on the compiler warnings they had presented to them.

@sisyphus
Copy link
Contributor

Another alternative, and one that I would much prefer to see, is for someone to take ownership or co-maintenance of these modules and release new versions of them with the proper fix in place.
I'm quite happy to be that someone. I'm also quite happy for someone else to instead take on that role for one or more of the affected modules.

AFAICS, there are 3 modules affected:

  1. Locale::Hebrew (which is up for adoption)
  2. Digest::OAT
  3. Math::FastGF2

Are there any other affected modules with non-responsive owners/maintainers that are similarly affected ?

I find it a bit sad that we're even considering covering up these coding errors simply because we're having trouble finding someone to apply the correct fixes to them.
However, I won't complain if the cover-up patch is committed.

Cheers,
Rob

@dur-randir
Copy link
Member

I agree with Tony. I can't say this'd break anything, just my feelings.

@khwilliamson
Copy link
Contributor Author

I would push for merging this if we had continued to get failure reports. But the lack of them after the initial flurry, indicates to me that there aren't that many more modules out there that failed to act on the compiler warnings they had presented to them.

@khwilliamson
Copy link
Contributor Author

I'm reopening this for further comments, based on new information from Debian. reported by @ntyni

I can't find the details just now. They had to do a workaround for a module that is important but isn't being maintained, and with little possibility of it getting maintained. Maybe Niko can fill in some details.

@tonycoz
Copy link
Contributor

tonycoz commented Feb 14, 2024

@khwilliamson
Copy link
Contributor Author

There doesn't seem to be a need for this still, so reclosing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants