Using Spark NLP with Ray (https://docs.ray.io/en/latest/data/raydp.html) fails with ClassCastException #7003

@SrilekhaIG

Description

I am trying to use Spark NLP with the Ray framework, and started the SparkSession through RayDP:
spark = raydp.init_spark(
    app_name='Sparknlp',
    num_executors=2,
    executor_cores=2,
    executor_memory='4GB',
    configs={
        "spark.driver.memory": "16G",
        "spark.driver.maxResultSize": "0",
        "spark.kryoserializer.buffer.max": "2000M",
        "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.1",
    },
)

However, after running the Spark NLP pipeline, displaying the resulting Spark DataFrame fails.
I am following one of the basic examples:
https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/NER_EN.ipynb

The exception I get at result.show() is:

Py4JJavaError: An error occurred while calling o294.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (10.59.192.207 executor 1): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.objects.LambdaVariable.accessor of type scala.Function2 in instance of org.apache.spark.sql.catalyst.expressions.objects.LambdaVariable

Is there any configuration that I am missing? The error occurs for any of the complex datatypes in the Spark DataFrame.
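For comparison, Spark NLP's own sparknlp.start() helper documents a specific set of Spark configs, including the Kryo serializer (spark.serializer), which the snippet above does not set. A sketch of mirroring those configs inside raydp.init_spark follows; whether this resolves the ClassCastException on a Ray cluster is untested here, and the executor sizing values are just the ones from the original report:

```python
import raydp

# Sketch: pass the configs that Spark NLP's documentation lists for a
# plain SparkSession into RayDP. The one addition over the original
# snippet is spark.serializer; everything else is copied from the report.
spark = raydp.init_spark(
    app_name='Sparknlp',
    num_executors=2,
    executor_cores=2,
    executor_memory='4GB',
    configs={
        "spark.driver.memory": "16G",
        "spark.driver.maxResultSize": "0",
        # Spark NLP requires Kryo serialization; sparknlp.start() sets this.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryoserializer.buffer.max": "2000M",
        "spark.jars.packages": "com.johnsnowlabs.nlp:spark-nlp-spark32_2.12:3.4.1",
    },
)
```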
