C [libtensorflow_framework.so.1+0x744da9] _GLOBAL__sub_I_loader.cc+0x99 #2256

Closed
leeivan opened this issue Feb 7, 2021 · 9 comments

leeivan commented Feb 7, 2021

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007eff1b96eda9, pid=12627, tid=0x00007effe75f1700
#
# JRE version: OpenJDK Runtime Environment (8.0_282-b08) (build 1.8.0_282-b08)
# Java VM: OpenJDK 64-Bit Server VM (25.282-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libtensorflow_framework.so.1+0x744da9]  _GLOBAL__sub_I_loader.cc+0x99
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%207&component=java-1.8.0-openjdk
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
---------------  T H R E A D  ---------------

Current thread (0x00007effd400a000):  JavaThread "Thread-5" [_thread_in_native, id=12674, stack(0x00007effe74f1000,0x00007effe75f2000)]

siginfo: si_signo: 4 (SIGILL), si_code: 2 (ILL_ILLOPN), si_addr: 0x00007eff1b96eda9

Registers:
RAX=0x00007effd9b67500, RBX=0x00007effe75ed550, RCX=0x0000000000000000, RDX=0x0000000000000001
RSP=0x00007effe75ed500, RBP=0x00007effe75ed610, RSI=0x00007effe75ed440, RDI=0x00007effe75ed530
R8 =0x00007effd9b67580, R9 =0x00007effd9b67390, R10=0x0000000000000002, R11=0x000000000000001f
R12=0x00007ffcb5fbe1b8, R13=0x00007eff1cb86018, R14=0x0000000000000001, R15=0x00007eff1cb84510
RIP=0x00007eff1b96eda9, EFLAGS=0x0000000000010202, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
  TRAPNO=0x0000000000000006

Top of Stack: (sp=0x00007effe75ed500)
0x00007effe75ed500:   00007eff1d3038d8 0000000000000007
0x00007effe75ed510:   00007eff1cc3a1a8 00007eff25f02ff0

v  ~StubRoutines::call_stub
j  com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.readGraph(Ljava/lang/String;)Lorg/tensorflow/Graph;+12
j  com.johnsnowlabs.ml.tensorflow.TensorflowWrapper$.read(Ljava/lang/String;ZZ[Ljava/lang/String;Z)Lcom/johnsnowlabs/ml/tensorflow/Tensorflo$
j  com.johnsnowlabs.ml.tensorflow.ReadTensorflowModel$class.readTensorflowModel(Lcom/johnsnowlabs/ml/tensorflow/ReadTensorflowModel;Ljava/la$
j  com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.readTensorflowModel(Ljava/lang/String;Lorg/apache/spark/sql/SparkSession;Ljava/lang/Strin$
j  com.johnsnowlabs.nlp.embeddings.ReadBertTensorflowModel$class.readTensorflow(Lcom/johnsnowlabs/nlp/embeddings/ReadBertTensorflowModel;Lco$
j  com.johnsnowlabs.nlp.embeddings.BertEmbeddings$.readTensorflow(Lcom/johnsnowlabs/nlp/embeddings/BertEmbeddings;Ljava/lang/String;Lorg/apa$
j  com.johnsnowlabs.nlp.embeddings.ReadBertTensorflowModel$$anonfun$4.apply(Lcom/johnsnowlabs/nlp/embeddings/BertEmbeddings;Ljava/lang/Strin$
j  com.johnsnowlabs.nlp.embeddings.ReadBertTensorflowModel$$anonfun$4.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lan$
j  com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$com$johnsnowlabs$nlp$ParamsAndFeaturesReadable$$onRead$1.apply(Lscala/Function3;)$
j  com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$com$johnsnowlabs$nlp$ParamsAndFeaturesReadable$$onRead$1.apply(Ljava/lang/Object;$
J 4635 C2 scala.collection.mutable.ArrayBuffer.foreach(Lscala/Function1;)V (6 bytes) @ 0x00007f00856531a8 [0x00007f0085653120+0x88]
j  com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$class.com$johnsnowlabs$nlp$ParamsAndFeaturesReadable$$onRead(Lcom/johnsnowlabs/nlp/ParamsA$
j  com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$read$1.apply(Lcom/johnsnowlabs/nlp/HasFeatures;Ljava/lang/String;Lorg/apache/spar$
j  com.johnsnowlabs.nlp.ParamsAndFeaturesReadable$$anonfun$read$1.apply(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Ob$
j  com.johnsnowlabs.nlp.FeaturesReader.load(Ljava/lang/String;)Lcom/johnsnowlabs/nlp/HasFeatures;+40
j  com.johnsnowlabs.nlp.FeaturesReader.load(Ljava/lang/String;)Ljava/lang/Object;+2
j  com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(Lorg/apache/spark/ml/util/DefaultParamsReadable;Lcom/johnsnowlabs/nlp/p$
j  com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(Lorg/apache/spark/ml/util/DefaultParamsReadable;Ljava/lang/String;Lscal$
j  com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/$
j  com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;Ljava/lang/S$
v  ~StubRoutines::call_stub
J 9825  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 b$
J 4145 C1 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007f0085bfc$
J 5089 C2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x00007f0085cd2cec [0x00007f$
j  py4j.reflection.MethodInvoker.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+91
j  py4j.reflection.ReflectionEngine.invoke(Ljava/lang/Object;Lpy4j/reflection/MethodInvoker;[Ljava/lang/Object;)Ljava/lang/Object;+6
j  py4j.Gateway.invoke(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Lpy4j/ReturnObject;+151
j  py4j.commands.AbstractCommand.invokeMethod(Ljava/lang/String;Ljava/lang/String;Ljava/util/List;)Lpy4j/ReturnObject;+10
j  py4j.commands.CallCommand.execute(Ljava/lang/String;Ljava/io/BufferedReader;Ljava/io/BufferedWriter;)V+26
j  py4j.GatewayConnection.run()V+126
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x00007effd9be6000 JavaThread "process reaper" daemon [_thread_blocked, id=13067, stack(0x00007f000414a000,0x00007f0004183000)]
  0x00007effbc008800 JavaThread "task-result-getter-3" daemon [_thread_blocked, id=12923, stack(0x00007eff26194000,0x00007eff26295000)]
  0x00007effcc009000 JavaThread "task-result-getter-2" daemon [_thread_blocked, id=12922, stack(0x00007eff26295000,0x00007eff26396000)]
  0x00007effcc007800 JavaThread "task-result-getter-1" daemon [_thread_blocked, id=12903, stack(0x00007eff2779b000,0x00007eff2789c000)]
  0x00007eff3801b800 JavaThread "shuffle-server-7-2" daemon [_thread_in_native, id=12902, stack(0x00007eff27498000,0x00007eff27599000)]
  0x00007effd4011000 JavaThread "ForkJoinPool-1-worker-9" daemon [_thread_blocked, id=12899, stack(0x00007eff27599000,0x00007eff2769a000)]
  0x00007effd9a14000 JavaThread "java-sdk-http-connection-reaper" daemon [_thread_blocked, id=12870, stack(0x00007eff340fc000,0x00007eff341f$
  0x00007effd9842000 JavaThread "IPC Parameter Sending Thread #1" daemon [_thread_blocked, id=12869, stack(0x00007eff400ed000,0x00007eff401e$
  0x00007effbc00f800 JavaThread "task-result-getter-0" daemon [_thread_blocked, id=12867, stack(0x00007effe442f000,0x00007effe4530000)]
  0x00007eff3801a800 JavaThread "shuffle-server-7-1" daemon [_thread_in_native, id=12866, stack(0x00007effe4631000,0x00007effe4732000)]
  0x00007effd8954000 JavaThread "org.apache.hadoop.hdfs.PeerCache@3153baf" daemon [_thread_blocked, id=12865, stack(0x00007eff401ee000,0x000$
  0x00007effac015800 JavaThread "rpc-server-4-8" daemon [_thread_in_native, id=12855, stack(0x00007eff409f0000,0x00007eff40af1000)]
  0x00007effac013800 JavaThread "rpc-server-4-7" daemon [_thread_in_native, id=12854, stack(0x00007eff40af1000,0x00007eff40bf2000)]
  0x00007effac011800 JavaThread "rpc-server-4-6" daemon [_thread_in_native, id=12853, stack(0x00007eff40bf2000,0x00007eff40cf3000)]
  0x00007effac010000 JavaThread "rpc-server-4-5" daemon [_thread_in_native, id=12836, stack(0x00007eff40cf3000,0x00007eff40df4000)]
  0x00007effac00e000 JavaThread "rpc-server-4-4" daemon [_thread_in_native, id=12835, stack(0x00007eff41df5000,0x00007eff41ef6000)]
@leeivan
Author

leeivan commented Feb 7, 2021

The above is hs_err_pid12627.log, which was produced when executing:

    val bert = BertEmbeddings.pretrained("biobert_clinical_base_cased", "en")
      .setInputCols("sentence", "token")
      .setOutputCol("bert")
      .setCaseSensitive(false)

This error occurs in both Scala and Python.

@leeivan
Author

leeivan commented Feb 7, 2021

I have two Spark standalone cluster environments:
One runs Ubuntu 20.04 on a single VM, and it works correctly.
The other runs CentOS 7 on five VMs, and it fails with this error.

I guess the problem lies in the Spark cluster environment or the OS.

@maziyarpanahi
Member

Our TensorFlow version doesn't support CentOS 7.

@leeivan
Author

leeivan commented Feb 7, 2021

Thanks, but I want to keep trying. Can you give me some suggestions? @maziyarpanahi

@maziyarpanahi
Member

Of course, these might help:

@maziyarpanahi
Member

Hi @albertoandreottiATgmail

Is this error related to libstdc++ or some other dependencies/packages on the machine? If it only happens on CentOS, that is fine. (I've seen similar issues reported on the TensorFlow GitHub, but not exactly the same, since this one goes through Java/C++.)

@albertoandreottiATgmail
Contributor

Hi guys,

Yes, it looks like a native library version mismatch. Take a look at a similar error reported here:

https://stackoverflow.com/questions/46836882/access-tensorflow-from-tomcat-on-centos-linux/49568814

Probably changing the version of the system libraries will help.

Alberto.

@leeivan
Author

leeivan commented Feb 12, 2021

@maziyarpanahi, I found the cause of this problem: the legacy CPU of the VM does not support the AVX instruction set, which is an advanced feature of newer Intel CPUs. I recompiled the TensorFlow jar natively on the VM, following this guide:
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/java
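
For anyone who wants to check a node before loading a model, here is a minimal sketch that looks for the `avx` flag in `/proc/cpuinfo`. It assumes a Linux x86 host; the `CpuFlags` object and its method names are made up for this illustration and are not part of Spark NLP or TensorFlow. On a machine without `/proc/cpuinfo` it simply reports no flags.

```scala
import java.io.File
import scala.io.Source

// Hypothetical helper (not part of Spark NLP or TensorFlow).
object CpuFlags {
  // Parse the first "flags" line of /proc/cpuinfo text into a set of feature names.
  def parseFlags(cpuinfo: String): Set[String] =
    cpuinfo.split("\n")
      .find(_.startsWith("flags"))
      .map(_.split(":", 2))
      .collect { case Array(_, values) => values.trim.split("\\s+").toSet }
      .getOrElse(Set.empty[String])

  // Read the flags of the current machine; empty set if the file is not readable.
  def currentFlags(path: String = "/proc/cpuinfo"): Set[String] = {
    val file = new File(path)
    if (!file.canRead) Set.empty[String]
    else {
      val src = Source.fromFile(file)
      try parseFlags(src.mkString) finally src.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val flags = currentFlags()
    println(s"avx=${flags.contains("avx")} avx2=${flags.contains("avx2")}")
  }
}
```

In a cluster, every worker VM would need to be checked, since the native library is loaded on whichever node reads the model. If `avx` is missing, a SIGILL like the one in the log above is a plausible outcome with a TensorFlow binary built to use AVX.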

@maziyarpanahi
Member

This is very interesting! I didn't know that a lack of AVX support would result in that error. It's hard to find a CPU produced after 2011 that doesn't support AVX, but there must be some CPUs or virtualized environments that don't.

Many thanks for the update @leeivan, it will be very helpful to future users 👍🏼
