Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing tests on non-x86 archs #8

Closed
nileshpatra opened this issue Dec 19, 2021 · 6 comments
Closed

Failing tests on non-x86 archs #8

nileshpatra opened this issue Dec 19, 2021 · 6 comments

Comments

@nileshpatra
Copy link

Hi,

In the latest version 2.2.7, one test is failing on archs other than amd64 and i386 in debian, as can be seen here

It chokes on

expect_known_value(readLines(tmp_ads), "xampl_bib2ads.rds", update = FALSE)

more specifically, in the resultant bib file, the last U is missing on two lines:

readLines(tmp_ads) has changed from known value recorded in 'xampl_bib2ads.rds'.
2/313 mismatches
x[293]: "%R ..................."
y[293]: "%R ..................U"

x[300]: "%R 1988..............."
y[300]: "%R 1988..............U"

Could you please fix the problem?

====================================================================================================

CC: @GeoBosh

@GeoBosh
Copy link
Owner

GeoBosh commented Dec 19, 2021

Thanks for the report and pinpointing the wrong expectation. Do you know if the machines on which the test is failing have different big/little-endian conventions from the others?

I am writing the comments below for myself but I would appreciate help, since I don't know the logic of adsout.c and the part relevant to this bug is baffling to me.
The entry causing the trouble is

%R ..................U
%A Ünderwood, Ulrich; Ñet, Ned; {\=P}ot, Paul
%T Lower Bounds for Wishful Research Results
%X Talk at Fanstord University (this is a minimal UNPUBLISHED entry)
%W PHY
%G AUTHOR

The second one is a variant of it and I am not showing it here. Note that the Author's name starts with Ü. The code giving the U after the dots above is converting Ü to ASCII, if possible.

I traced the errors to

rbibutils/src/adsout.c

Lines 383 to 386 in 5adeb50

b1 = name[0]+256;
b2 = name[1]+256;
switch( b1 ) {

So, the switch is on b1 but the if-else conditionals are on b2. b1 and b2 may be bytes of a character (encoding?).
On my machines (amd64/ubuntu 20.04) the offending entries call initial_ascii and b1 and b2 have values b1 = 0xc3, b2 = 0x9c. This triggers the execution of the following line:
else if ( b2 >= 0x99 && b2 <= 0x9c ) return 'U';

So, 'U' is returned. The UTF-8 hex code c39c is indeed U-umlaut (https://www.charset.org/utf-8).
Also, c39a to c39c are UTF-8 points corresponding to U-accented (grave, etc).

So, one possible source of the bug is that the above depends on the big-endian/low-endian order.

@nileshpatra
Copy link
Author

@GeoBosh thanks for replying!

Thanks for the report and pinpointing the wrong expectation. Do you know if the machines on which the test is failing have different big/little-endian conventions from the others?

All the machines where test is failing are little endian.
But since you mentioned the relevant code below :-

I traced the errors to

rbibutils/src/adsout.c

Lines 383 to 386 in 5adeb50

b1 = name[0]+256;
b2 = name[1]+256;
switch( b1 ) {

Actually, a very common phenomenon in these archs for autopkgtest is because of char being interpreted differently. On x86 it is signed, on arm/others it is unsigned. So any time you want to work with numerical values of a char, you must specify if you want signed or unsigned - which the code is clearly doing, appending the values to int's and comparing, so maybe the value of b1 gets out of the cases it seeks, and it chokes there. I am not certain about this, but it is just a hunch

@GeoBosh
Copy link
Owner

GeoBosh commented Dec 19, 2021

Thanks, that's a very sound explanation. I noticed that the function wants b1, b2 to be positive and less than 256 (just before the lines I printed above), so clearly adding 256 almost assumes they are negative. But it didn't occur to me that char could be unsigned.

rbibutils passes the CRAN check on arm64 but that is on macos.

It looks like a fix will be fairly straightforward. Finding a suitable platform to test it might be problematic.

@GeoBosh
Copy link
Owner

GeoBosh commented Dec 20, 2021

I committed a fix that I believe should solve the issue if the cause is the signed/unsigned char.

@nileshpatra
Copy link
Author

Thank you, I have uploaded the package with that patch applied. I will report the status when the CI runs on it.

@nileshpatra
Copy link
Author

@GeoBosh CI is green now. Thank you very much for fixing it so quickly! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants