When trying to load a CoNLL file into a Spark DataFrame using CoNLL.readDataset(spark, "file:///path/to/file") on AWS EMR 6.5, an error is raised which seems to indicate that the EMR builds of Spark are not supported.
An error was encountered:
invalid literal for int() with base 10: '312-amzn-1'
Traceback (most recent call last):
File "/home/hadoop/spark_venv/lib/python3.7/site-packages/sparknlp/training/conll.py", line 142, in readDataset
dataframe = self.getDataFrame(spark, jdf)
File "/home/hadoop/spark_venv/lib/python3.7/site-packages/sparknlp/internal/extended_java_wrapper.py", line 60, in getDataFrame
if self.spark_version() >= 330:
File "/home/hadoop/spark_venv/lib/python3.7/site-packages/sparknlp/internal/extended_java_wrapper.py", line 57, in spark_version
return int("".join(spark_version))
ValueError: invalid literal for int() with base 10: '312-amzn-1'
Expected Behavior
A CoNLL file should load into a dataframe successfully.
Thanks @ethnhll for reporting this. The failure comes from the method we use to read the Apache Spark version and decide which code path to use in CoNLL() (Spark 3.3.x constructs the DataFrame differently internally).
For some reason, the Spark version on Amazon (or on this specific EMR release) is not a plain numeric string like 3.1.2. It also contains a text suffix. I will make sure we account for that so it won't fail.
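For illustration, here is a minimal sketch of version parsing that tolerates vendor suffixes such as "-amzn-1". It is not the actual Spark NLP fix: the helper name parse_spark_version is hypothetical, and it returns a tuple rather than the concatenated integer seen in the traceback, so a check like ">= 330" becomes ">= (3, 3, 0)".

```python
import re
from typing import Tuple


def parse_spark_version(version: str) -> Tuple[int, int, int]:
    """Hypothetical helper: extract (major, minor, patch) from a Spark version string.

    Only the leading "X.Y.Z" digits are read, so anything after the patch
    number (for example the "-amzn-1" suffix on EMR) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized Spark version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


# The EMR version string from the report no longer raises ValueError.
emr_version = parse_spark_version("3.1.2-amzn-1")
uses_spark_33_path = emr_version >= (3, 3, 0)
```

In a real session the input would be spark.version. Comparing tuples also avoids the ambiguity that concatenated digits would have for a two-digit minor version.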
Current Behavior
An error is reported: ValueError: invalid literal for int() with base 10: '312-amzn-1'
Steps to Reproduce
Context
I am not able to load a training dataset to train an NER model.
Your Environment
Spark NLP version
4.0.2
Java version
Setup and installation (Pypi, Conda, Maven, etc.):
PyPI
Operating System and version: