Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debian testing issue on VM #7

Closed
eddelbuettel opened this issue Feb 22, 2020 · 40 comments
Closed

Debian testing issue on VM #7

eddelbuettel opened this issue Feb 22, 2020 · 40 comments

Comments

@eddelbuettel
Copy link
Owner

While testing reverse dependencies for Rcpp, I noticed that RcppSimdJson is flagged as having an issue. And I can repeat that issue on the test box.

The machine is a VM running Debian testing, generally up-to-date, with g++ at version 9.2.1-28 (9.2.1 20200203).

What happens is that when I run example(validateJSON) which just invokes validateJSON() with the included (unmodified) twitter.json file just like the basic simdjson example does:

 simdjson::padded_string p = get_corpus(filename.c_str());
  simdjson::ParsedJson pj = simdjson::build_parsed_json(p); // do the parsing
  if ( !pj.is_valid() ) {
    // something went wrong
    Rcpp::stop(pj

We end up in the 'not valid' branch, but do not get useable error message. All we get is 'uninitialized'.

I verified that the same fate happens with the sligtly earlier RcppSimdJson 0.0.1 so it is not recent. Could this possibly be compiler settings we get from the distribution? Invoking R CMD INSTALL on the sources leads (on that machine, with its Debian settings) to

g++ -std=gnu++17 -I"/usr/share/R/include" -DNDEBUG  -I"/usr/local/lib/R/site-library/Rcpp/include"  -I../inst/include -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-MP6Q4u/r-base-3.6.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++17 -I"/usr/share/R/include" -DNDEBUG  -I"/usr/local/lib/R/site-library/Rcpp/include"  -I../inst/include -fopenmp -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-MP6Q4u/r-base-3.6.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c simdjson_example.cpp -o simdjson_example.o
g++ -std=gnu++17 -shared -L/usr/lib/R/lib -Wl,-z,relro -o RcppSimdJson.so RcppExports.o simdjson_example.o -fopenmp -L/usr/lib/R/lib -lR

Is there a possible interaction from -fstack-protector-strong -Wformat -Werror=format-security?

Or is it a likely interaction from the fact that I am ... in a wimpy VM and simdjson wants to talk more directly to the underlying, but virtualized away, metal? This may be the more likely culprit in which case I will just whitelist the package for the reverse dependencies checks on this VM.

@lemire Any hints from you would be most welcome, as always ;-)

@eddelbuettel eddelbuettel changed the title Debian testing issue Debian testing issue on VM Feb 22, 2020
@lemire
Copy link
Collaborator

lemire commented Feb 22, 2020

Ping @jkeiser

@lemire
Copy link
Collaborator

lemire commented Feb 22, 2020

If the hardware was not supported, you would get "UNSUPPORTED_ARCHITECTURE".

@eddelbuettel
Copy link
Owner Author

Right on. I recall scanning the C++ code and seeing a fallback for chipset detection.

For completeness, I had sprayed a few prints in. With those I get

dirk@cloud-devel:~/tmp/prrd/RcppSimdJson.Rcheck/00_pkg_src/RcppSimdJson$ Rscript -e 'library(RcppSimdJson); example(validateJSON)'

vlJSONR> jsonfile <- system.file("jsonexamples", "twitter.json", package="RcppSimdJson")

vlJSONR> validateJSON(jsonfile)
File is /usr/local/lib/R/site-library/RcppSimdJson/jsonexamples/twitter.json
String is padded 
String is parsed: with error
Uninitialized
Error in .validateJSON(Sys.glob(jsonfile)) : Uninitialized
Calls: example ... withVisible -> eval -> eval -> validateJSON -> .validateJSON
Execution halted
dirk@cloud-devel:~/tmp/prrd/RcppSimdJson.Rcheck/00_pkg_src/RcppSimdJson$ 

So I end up with a bad validation (only on that hardware platform so far; CRAN actually tests with Debian testing too) but the error printing help message comes back with the indicated lack of context / initialization. And, just as a reminder, I am running off the singleheader/ directory code from your repo.

@lemire
Copy link
Collaborator

lemire commented Feb 22, 2020

Can you make this VM accessible to the simdjson team, via ssh... if only temporarily?

@lemire
Copy link
Collaborator

lemire commented Feb 22, 2020

No matter what, it should not report "uninitialized" when it fails to parse... this is bad. We need to investigate.

@eddelbuettel
Copy link
Owner Author

Can you make this VM accessible to the simdjson team, via ssh... if only temporarily?

I doubt it. It is in a university data center in Europe and I just a guest myself so there is a fair amount of tape of a certain color ...

@eddelbuettel
Copy link
Owner Author

But I can ask. It may take a few (work) days though.

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

Yes, UNINITIALIZED indicates an internal error.

I doubt it is directly related, but if you could add a printf("implementation: %s\n", simdjson::active_implementation.name().c_str()); that would at least tell us what it detected your machine as.

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

Er. Checking: what simdjson are you using? I was sort of assuming master, because UNSUPPORTED_ARCHITECTURE and active_implementation were introduced in master only yesterday.

@lemire we do have potential for dueling error codes where sometimes ParsedJson.error could be set to one thing while json_parse returns something else. Could possibly be that one or the other has the real code ...

@eddelbuettel
Copy link
Owner Author

eddelbuettel commented Feb 22, 2020

That doesn't build -- I am using a current version of the two files in singleheader/ which themselves go back to back to a Feb 3 commit by you from #480.

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

Oh! Let me update those files. We generally don't do that until there is a release.

I don't know if latest master will fix your problem, but we should probably at least try to get there so we're all working off the same code. I know I did make changes to how error codes are passed around, and that will also let you do the active_implementation code to rule out unsupported architecture issues.

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

OK, headers in master are up to date.

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

Let me know if there are issues building against it, too: I've been changing up the interfaces a bit. I kept backwards compatibility shims to keep existing code building, but it's always possible there are use cases somewhere not covered by our tests :)

@eddelbuettel
Copy link
Owner Author

That worked. (It needed ->name() not .name() but small fries...) Now I get

implementation: unsupported
simdjson does not have an implementation supported by this CPU architecture (perhaps it's a non-SIMD CPU?).

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

OK, that does help. Can you run lscpu or cat /etc/cpuinfo on it?

@eddelbuettel
Copy link
Owner Author

dirk@cloud-devel:~/tmp/prrd/RcppSimdJson.Rcheck/00_pkg_src/RcppSimdJson$ lscpu 
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   36 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       2
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           15
Model name:                      Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
Stepping:                        6
CPU MHz:                         1996.695
CPU max MHz:                     2331.0000
CPU min MHz:                     1998.0000
BogoMIPS:                        4655.18
Virtualization:                  VT-x
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        8 MiB
NUMA node0 CPU(s):               0-3
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX EPT disabled
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, STIBP disabled, RSB filling
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good 
                                 nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm pti tpr_shadow dtherm
dirk@cloud-devel:~/tmp/prrd/RcppSimdJson.Rcheck/00_pkg_src/RcppSimdJson$

@eddelbuettel
Copy link
Owner Author

eddelbuettel commented Feb 22, 2020


processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU            5140  @ 2.33GHz
stepping        : 6
microcode       : 0x44
cpu MHz         : 1998.536
cache size      : 4096 KB
physical id     : 3
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm pti tpr_shadow dtherm
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
bogomips        : 4655.13
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

(Just the fourth of four.)

From what I recall from conversations with the data center, it is a somewhat dated box. Which, now that I think about it, makes perfect sense. It's probably a decade-old Xeon cpu.

Now I feel sheepish for wasting everybody's time :-/

Shall we add a simple predicate testing that we don't find a 386 or alike?

@lemire
Copy link
Collaborator

lemire commented Feb 22, 2020

You are not wasting anybody's time. The machine in question is indeed quite old (going back to 2006). We do not support such old systems.

However, we need to have better error reporting.

@eddelbuettel
Copy link
Owner Author

(Portable) hardware abstraction is hard. I know there is a sub-library within OpenMPI that does that. Is there something 'cheaper' we can do?

@jkeiser
Copy link

jkeiser commented Feb 22, 2020

Yep. I think master now covers the case pretty well as long as the user reports the error message.

Is the idea to make it not compile (at least by default) if you are on a machine that couldn't run it? That's not a bad idea at all.

@eddelbuettel
Copy link
Owner Author

Or compile but have a true/false flag indicating what we're at -- like capabilities[] vector.

@eddelbuettel
Copy link
Owner Author

Also seeing it in the macOS machine(s) at CRAN: https://www.r-project.org/nosvn/R.check/r-release-osx-x86_64/RcppSimdJson-00check.html

I guess my added check for C++17 wasn't enough. :-/

@eddelbuettel
Copy link
Owner Author

What about code like the one in simdjson::include/simdjson/isadetection.h ? Can we not derive a 'yes/no' heuristic from this about whether we have snowball's chance in hell to compile on the present machine or not?

@lemire
Copy link
Collaborator

lemire commented Feb 24, 2020

@eddelbuettel Do you mean "compile" or "run"?

You write compile. Let me first handle decisively the "compile" issue. You need C++17. If you have a 64-bit system and a compiler that supports C++17 and simdjson does not compile, then report a bug.

I guess my added check for C++17 wasn't enough. :-/

I am not 100% sure, but I think that the error that we are seeing is at runtime... correct? That is, it tries to parse file twitter.json and then fails.

If you mean run, and I think you do, then this is entirely different from "compiing" because whether you can run the code or not is a runtime determination. You can fail to compile simdjson on a machine because it does not have the proper compiler, and yet you could be able to run a binary (compiled elsewhere) and vice versa. So it is not at the compilation that you should check support, it is at runtime.

You are getting an "Unitialized" error. I expect that it is a runtime error (maybe the CPU is too old?).

Unlike @jkeiser, I don't think this is covered in simdjson/master. I think that there is an issue if only a matter of documentation. I am investigating.

@eddelbuettel
Copy link
Owner Author

I may have been sloppy -- I meant "compile" as in "install". I understand when something can't run, the C++17 requirements are clear to me. Yet the CRAN setup only allows me to exclude Windows (as it is currently hamstruck by an old g++-4.9.3, hopefully replace by g++-8.* in a few weeks if all goes well) but not the other. And if you go to https://cloud.r-project.org/web/checks/check_results_RcppSimdJson.html you see

image

and I am essentially reasoning about ways to work around the limitations of the ERRORing systems.
Plus ones like the old Xeon from 2006 that confused me on Saturday.

Testing for __cplusplus is one good way. It would be really nice to have a 'cheap' way to check hardware as well.

@lemire
Copy link
Collaborator

lemire commented Feb 24, 2020

Back to the original issue. So as much as I'd like to fault @jkeiser, I cannot reproduce the issue with master. Here is the program that I test...

  simdjson::padded_string p = simdjson::get_corpus(filename);
  simdjson::ParsedJson pj = simdjson::build_parsed_json(p); // do the parsing
  if(!pj.is_valid()) {
    std::cout << pj.get_error_message() << std::endl;
  }

Now, running via the Intel emulator, I can "emulate" the old hardware you have access to:

$ sde  -pnr --  myprogram jsonexamples/twitter.json 
simdjson does not have an implementation supported by this CPU architecture (perhaps it's a non-SIMD CPU?).

It gives the error code 16 as expected.

It is possible that you are testing with older code... possibly... where we have this bug, but as it is, I cannot seem to reproduce the issue.

@lemire
Copy link
Collaborator

lemire commented Feb 24, 2020

and I am essentially reasoning about ways to work around the limitations of the ERRORing systems.

Ok. So for solaris, you have "C++17 standard requested but CXX17 is not defined". That seems straight-forward. I don't know who is using Solaris these days, but it is pretty much legacy at this point, is it not? Can you even buy a Solaris box these days?

On the osx boxes, you get "Uninitialized". If it is a matter of the CPU being unsupported, you should be getting UNEXPECTED_ERROR. I am trying to reproduce the issue, but it is kind of difficult.

@jkeiser
Copy link

jkeiser commented Feb 24, 2020

@lemire he is now getting UNSUPPORTED_ARCHITECTURE, I'm fairly certain. This might clear it up (@eddelbuettel please confirm):

  • When he got "uninitialized", @eddelbuettel was not running the latest code from master--rather, he was using the amalgamated headers, which were last updated on Feb. 8.
  • I then updated the amalgamated headers in master without asking (let me know if you are worried about that, but I was pretty confident you wouldn't be, and it fixed his problem).
  • After updating to the latest code, and the new error message, meaning he must have got UNSUPPORTED_ARCHITECTURE back.

I assumed it was my fault at first, too :) In fact, it still may have been--it's possible some of my code before Feb 8 screwed this up.

@lemire
Copy link
Collaborator

lemire commented Feb 24, 2020

@jkeiser The error logs are dated from today... so I am going with the assumption that the error still remains...???

@jkeiser jkeiser mentioned this issue Feb 24, 2020
11 tasks
@lemire
Copy link
Collaborator

lemire commented Feb 24, 2020

Oh. I see... Though @eddelbuettel ran private tests with updated code, I don't think he updated his repo.

@eddelbuettel Can you update your code to what we have in master under github/.../simdjson ? My hope is that your uninitialized errors will go away.

Note that you probably want to handle gracefully the error code UNSUPPORTED_ARCHITECTURE if the servers you are using a too old. That is, getting UNSUPPORTED_ARCHITECTURE is not a sign of failure per se...

@eddelbuettel
Copy link
Owner Author

Let's slow down for moment, if we could. I have a pretty firm handle on the CRAN system and build approaches. That is what I asked here if you can help me test for particular hardware that is likely to fail. I understand that is hard(ish) problem so if we can't, we can't.

(The Solaris issues of 'C++17 standard requested but CXX17 is not defined' is effectively 100% their problem. I say that I need C++17 (in file DESCRIPTION's field SystemRequirement) but I also hardcode in src/Makevars -- and that created the issue because R-on-Solaris has not C++17 compiler so that var is empty.)

(The CRAN logs are indeed from $CALENDAR-DAILY but they use the last-released file, in out case package 0.0.2 and NOT what may be on GitHub.)

(My repo is indeed behind; I had another project with an issue and parallel discussion so I never pushed that. Done now.)

For a quick glance at how I protected the one C++ entry function from not-having-C++17, see
https://github.com/eddelbuettel/rcppsimdjson/blob/master/src/simdjson_example.cpp#L12-L23

(and uggh just noticed the smell about the using namespace ... and three out of four uses actually qualifying via :: -- guess I convert fully and rename the namespace flattening...)

For a similar glance at run-time / package-load-time reflecting a compile-time find for the R entry level, see
https://github.com/eddelbuettel/rcppsimdjson/blob/master/R/init.R#L2-L7

@eddelbuettel
Copy link
Owner Author

Note that you probably want to handle gracefully the error code UNSUPPORTED_ARCHITECTURE if the servers you are using a too old. That is, getting UNSUPPORTED_ARCHITECTURE is not a sign of failure per se...

So can we build a test predicate? A simple function returning a bool indicating simdjson can run here? Without actually parsing a document? Worst case I could mock it by parsing a throw-away JSON file but that seems silly...

@jkeiser
Copy link

jkeiser commented Feb 24, 2020

@eddelbuettel oh, if you want a runtime test, that's not too hard :) You can do this in main or wherever you want:

if (simdjson::active_implementation->name() == "unsupported") {
  printf("unsupported CPU\n"); 
  abort(); 
}

It's the compile-time test I'm unsure about.

It feels like we should have active_architecture->is_supported() for this so you're not relying on a name, but this will get you where you want to be right now!

EDIT: fixed, thanks to @lemire

@eddelbuettel
Copy link
Owner Author

That is excellent. I add that to the arsenal.

I still think (as an engineering exercise) I need to make the build better at/before compile time. Ie for the idiotic Solaris build, I can ask at 'configure' time if R knows a C++17 compiler. If so, set CXX_STD=C++17 as I do know. Consequence: I will now compile, and create a clean "empty" wrapper function.

Leaves the two stoopid macOS machines with, per an earlier comment, so old a clang version. They compile cleanly, don't produce valid code but thanks to your enhancement to the diagnostics and the 'unsupported arch' I can now flag this. So I can protect the example code, and it should no longer error.

That should do. Traveling later in the week so maybe I'll play with that fix then. Thanks for the feedback!

@eddelbuettel
Copy link
Owner Author

Thanks for the (simulataneous) posting on the compile-time checks. That looks nifty and I should play with that too, time permitting...

@jkeiser
Copy link

jkeiser commented Feb 24, 2020

Well, I realized I was at least partly wrong and deleted my second post :) is_supported() is a runtime thing, I'm not sure how I'd make it a constexpr unless the actual architecture detection method (which you can see in simdjson/src/implementation.cpp) could be a constexpr.

@lemire
Copy link
Collaborator

lemire commented Feb 24, 2020

I think @jkeiser's code won't work due to a typo, try this instead...

if (simdjson::active_implementation->name() == "unsupported") { 
  printf("unsupported CPU\n"); 
  abort(); 
}

eddelbuettel added a commit that referenced this issue Feb 25, 2020
@eddelbuettel
Copy link
Owner Author

I just added that simple in 7c1fb60 -- thanks for the tip!

The "scheme" to test for what R is configured with and only turn on C++17 if we see it promptly failed at the Solaris box it was aimed. Not sure what is happening there but not a biggie either way.

@lemire
Copy link
Collaborator

lemire commented Feb 25, 2020

Looks good to me.

@eddelbuettel
Copy link
Owner Author

eddelbuettel commented Feb 25, 2020

And just checked on the VM in question:

R> library(RcppSimdJson)
This package requires a recent CPU type and supports several  models but not the one 
on this machine. You will likely  experience failure when trying to parse JSON 
documents.
R> 

Protected tests with the same predicate so the package no longer 'fails' checks, it simply runs with reduced^Hno functionality due to lacking hardware support -- which is fine as it is beyond our control.

So closing this ticket now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants