
Problem with spark-nlp #995

Closed

m-developer96 opened this issue Aug 5, 2020 · 23 comments

@m-developer96

m-developer96 commented Aug 5, 2020

Hi!
I'm using this example to create my own sentiment classifier, but when I execute the code below, I get an error.

use = BertEmbeddings.load('/home/mahdi/workTable/dataset/bert/') \
                    .setInputCols(["document"])\
                    .setOutputCol("sentence_embeddings")\
                    .setPoolingLayer(-2)

I tested it with UniversalSentenceEncoder but got the same error.

The error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007fac59e78da9, pid=1736, tid=0x00007fad517fb700
#
# JRE version: OpenJDK Runtime Environment (8.0_252-b09) (build 1.8.0_252-8u252-b09-1~18.04-b09)
# Java VM: OpenJDK 64-Bit Server VM (25.252-b09 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libtensorflow_framework.so.1+0x744da9]  _GLOBAL__sub_I_loader.cc+0x99
#
# Core dump written. Default location: /home/mahdi/workTable/core or core.1736

At first I used standalone cluster mode with one master and 3 slaves, each with 4 GB of memory and 4 cores. Then I tried one master and one slave, each with 10 GB of memory and 6 cores, but I still got the same error.

My spark initialization:

import findspark
from pyspark import SparkConf
from pyspark.sql import SparkSession, SQLContext
import sparknlp

findspark.init()
conf = SparkConf()
conf.set("spark.driver.memory", "19g")
conf.set("spark.cores.max", "16")
conf.set("spark.executor.memory", "9700m")
conf.set("spark.executor.cores", "8")
conf.set("spark.executor.instances", "8")
conf.set("spark.rpc.message.maxSize","1024")
conf.set("spark.driver.extraJavaOptions","-Djava.io.tmpdir=/home/mahdi/workTable/temp/")
conf.set("spark.executor.extraJavaOptions","-Djava.io.tmpdir=/home/mahdi/workTable/temp/")


spark = SparkSession.builder.master("spark://172.18.16.74:7077").appName("Sentiment Analysis").config(conf=conf)\
                            .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.4")\
                            .getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)

print("Spark version : " ,spark.version)
print("Spark-NLP version : " ,sparknlp.version())
# Spark version :  2.4.5
# Spark-NLP version :  2.5.4

How can I fix it?

Thanks for your help :)

@maziyarpanahi
Member

What is /home/mahdi/workTable/dataset/bert/ ?

@maziyarpanahi
Member

Also, please complete the template we provide; we need that information in order to reproduce the problem and help.

@m-developer96
Author

m-developer96 commented Aug 5, 2020

@maziyarpanahi
That's the BERT (bert_base_cased) model, downloaded from here.
It contains:

- bert:
  - bert_tensorflow
  - fields
  - metadata

Unfortunately, I couldn't complete the template, and I'm stuck here.

@m-developer96
Author

m-developer96 commented Aug 5, 2020

@maziyarpanahi
I even reduced my dataset to 1,500 sentences but got the same error.

@maziyarpanahi
Member

Thanks. What is your Operating System (with distribution and version)?

This seems to be an issue with spark-submit; could you please provide the exact command you are running?

In the meantime please add these two configs to your Spark session:

spark.kryoserializer.buffer.max 1000M
spark.serializer org.apache.spark.serializer.KryoSerializer
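
For reference, a minimal sketch of setting these in PySpark, reusing the conf object from the original post (the setting names and values are the ones suggested above):

# Set the suggested serializer configs on the existing SparkConf
# before building the SparkSession with getOrCreate().
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryoserializer.buffer.max", "1000M")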

@m-developer96
Author

@maziyarpanahi
Thanks!
I'm using Ubuntu 18.04 and running this example in a Jupyter notebook.
I just added those two configs but still got the same error.

@maziyarpanahi
Member

Got it. So the cluster is up and running, and you just run that code inside a Jupyter notebook? Is there a way to share that core dump with us? (It seems something already installed may be conflicting with the C++ code in libtensorflow_framework.so.1.)

@m-developer96
Author

m-developer96 commented Aug 5, 2020

@maziyarpanahi
Yes.
I'm sorry, but I'm new to Ubuntu. How can I get the core dump?

@maziyarpanahi
Member

It says it was written here: /home/mahdi/workTable/core

@m-developer96
Author

@maziyarpanahi
Yes, thanks!
I read it with the file command (I don't know whether that's the right way) and got this result:

core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/lib/jvm/java-8-openjdk-amd64//bin/java -cp /home/mahdi/workTable/spark/con', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: '/usr/lib/jvm/java-8-openjdk-amd64//bin/java', platform: 'x86_64'

@maziyarpanahi
Member

maziyarpanahi commented Aug 5, 2020

OK, no worries.
A question about your standalone cluster: are all the nodes on dedicated machines? Is the spark-nlp PyPI package installed on all of them? Do they all have the same operating system? And do you have tensorflow installed in the same Python path configured for PySpark? (We don't need it, and it often causes conflicts.)
PS: please also check whether you have protobuf installed, and if so, which version.

@m-developer96
Author

m-developer96 commented Aug 5, 2020

@maziyarpanahi
Yes, all of the slave nodes are on dedicated machines, spark-nlp is installed from PyPI on all of them, they all have the same operating system (Ubuntu 18.04), and I didn't install tensorflow or protobuf on them.

@maziyarpanahi
Member

Thank you, but other PyPI packages can pull in either one of them as a dependency, so could you please check all the PyPI packages in that environment to be sure?

@m-developer96
Author

@maziyarpanahi
Yes, but I'm sorry, I didn't understand exactly. Do you mean that I should check pip on the nodes?
If so, I checked and upgraded pip on all of them: pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)

@maziyarpanahi
Member

I meant: if you activate the same Python environment and run pip freeze, do you see any tensorflow or protobuf?
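
If it's easier than pip freeze, here is a rough Python sketch of the same check (my own illustration, not part of Spark NLP). Run it in the exact environment PySpark uses; note that protobuf imports as google.protobuf:

# Check whether tensorflow or protobuf can be imported in this environment.
import importlib.util

def installed(name):
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False

for name in ("tensorflow", "google.protobuf"):
    print(name, "installed:", installed(name))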

@m-developer96
Author

@maziyarpanahi
Oh, sorry.
I checked; neither of them is installed.

@maziyarpanahi
Member

Great, thank you very much. We are working on reproducing this and finding a workaround.

@m-developer96
Author

@maziyarpanahi
Thank you so much.

@maziyarpanahi
Member

@albertoandreottiATgmail It seems Ubuntu 18 has something that conflicts with TensorFlow, similar to this issue: tensorflow/tensorflow#24976

@m-developer96 I will try to reproduce this on a fresh, fully updated Ubuntu 18 today or tomorrow.

@albertoandreottiATgmail
Contributor

Hello @m-developer96, the signal the process is receiving is raised when the process runs an instruction that the current CPU architecture cannot handle.
Are all your nodes the same architecture? I have Ubuntu 18, and I use almost exactly the same OpenJDK as you.
Another possibility is that the binaries on your system are somehow corrupted. Can you try running the following,

jose@machine:~/.ivy2$ unzip ./cache/org.tensorflow/libtensorflow_jni/jars/libtensorflow_jni-1.15.0.jar org/tensorflow/native/linux-x86_64/libtensorflow_jni.so

jose@machine:~/.ivy2$ ldd org/tensorflow/native/linux-x86_64/libtensorflow_jni.so
linux-vdso.so.1 (0x00007fff100d8000)
libtensorflow_framework.so.1 => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe14d978000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe14d5da000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe14d3bb000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe14d1b3000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe14ce2a000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe14cc12000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe14c821000)
/lib64/ld-linux-x86-64.so.2 (0x00007fe156ee6000)

and post the paths you get on your system?
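
On the architecture question, here is a small sketch (my own illustration, assuming the nodes run Linux) that could be run on every node to compare CPU instruction-set flags; a SIGILL typically means the binary executed an instruction the CPU does not support:

# Print whether this CPU reports the instruction-set extensions that
# prebuilt TensorFlow binaries commonly rely on; run it on every node.
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            # The "flags" line lists the extensions this CPU supports.
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print({name: name in flags for name in ("sse4_1", "sse4_2", "avx", "avx2")})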

@rituparna-13

rituparna-13 commented Oct 15, 2020

I am facing the same issue when trying to train using BertEmbeddings from Spark NLP. I am using CentOS and spark-nlp version 2.6.2. Is this issue fixed, and is there a solution?
@maziyarpanahi @m-developer96 Were you able to solve this problem?

@maziyarpanahi
Member

@phoenix1391 Would you mind creating a new issue with the complete template so we can reproduce this? (The more info we have, the better the chance we can reproduce it, especially the OS and its version.) Unfortunately, there is no new update on this issue, but I am hoping that with your new issue we can reproduce it, or at least say what is not compatible.

@github-actions

This issue is stale because it has been open 120 days with no activity. Remove the stale label or comment, or it will be closed in 5 days.
