Get 'JavaPackage' object is not callable error when instantiating annotators #575

Closed

nj-dsg opened this issue Jul 23, 2019 · 6 comments
nj-dsg commented Jul 23, 2019

I get a 'JavaPackage' object is not callable error when I try to instantiate a PySpark NLP-related class such as DocumentAssembler() or Finisher().
Instantiating other classes such as Pipeline() works fine.
Also, I do not get this error when running in a Jupyter notebook; it is only raised when running in other Python consoles. I installed PySpark on my Windows 10 machine following https://medium.com/big-data-engineering/how-to-install-apache-spark-2-x-in-your-pc-e2047246ffc3
and I also completed the tutorial at https://changhsinlee.com/install-pyspark-windows-jupyter/

Here is my code:

from pyspark.sql import SparkSession, SQLContext
from sparknlp.base import DocumentAssembler

spark = SparkSession.builder.appName("myapp").getOrCreate()
sqlContext = SQLContext(spark)
DocumentAssembler()   # raises 'JavaPackage' object is not callable

I read that there's a spark-nlp.jar file that's important in this process, but I don't have it anywhere on my machine.
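(One quick way to confirm the missing jar is the culprit: ask py4j for the Java class directly. This is only a diagnostic sketch; it relies on PySpark's internal _jvm handle and assumes the class path com.johnsnowlabs.nlp.DocumentAssembler from the library's Scala package layout.)

# Prints a JavaClass if the Spark NLP jar is on the JVM classpath;
# a bare JavaPackage here means the jar is missing, matching the error above.
print(spark.sparkContext._jvm.com.johnsnowlabs.nlp.DocumentAssembler)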

Any ideas?
Thanks in advance.

Description

Expected Behavior

The annotator class is instantiated without an error.

Current Behavior

Error: 'JavaPackage' object is not callable

Possible Solution

Steps to Reproduce

  1. Installed PySpark on Windows 10 following https://medium.com/big-data-engineering/how-to-install-apache-spark-2-x-in-your-pc-e2047246ffc3 and completed the tutorial at https://changhsinlee.com/install-pyspark-windows-jupyter/. (I do not get the error when running in a Jupyter notebook; it is only raised in other Python consoles.)
  2. Installed the packages: pip install pyspark spark-nlp
  3. Ran the following code:

spark = SparkSession.builder.appName("myapp").getOrCreate()
sqlContext = SQLContext(spark)
DocumentAssembler()   # raises 'JavaPackage' object is not callable

Context

Your Environment

  • Version used: spark-2.4.3-bin-hadoop2.7, Python 3, Java 1.8
  • Operating System and version (desktop or mobile): Windows 10
maziyarpanahi (Member) commented Jul 23, 2019

Hi, how did you install or use Spark NLP?

nj-dsg (Author) commented Jul 23, 2019

1/ I followed this tutorial: https://medium.com/big-data-engineering/how-to-install-apache-spark-2-x-in-your-pc-e2047246ffc3
2/ pip install pyspark spark-nlp
3/ Opened a Python console and ran the above code:

spark = SparkSession.builder.appName("myapp").getOrCreate()
sqlContext = SQLContext(spark)
DocumentAssembler()  # raises 'JavaPackage' object is not callable

maziyarpanahi (Member) commented Jul 23, 2019

If you want to use Python, I suggest the following:

  1. Make sure you are using Python 3.6 or above. Python 2.x is deprecated and we won't be supporting it. (If it works in Py2, good, but if it doesn't, we won't be fixing it.)
  2. As you mentioned, install PySpark and Spark NLP:

pip install pyspark==2.4.3
pip install spark-nlp==2.1.0

  3. Run python and, inside your Python shell:
import sparknlp 

spark = sparknlp.start()

print("Spark NLP version")
sparknlp.version()
print("Apache Spark version")
spark.version

This should start a SparkSession with Spark NLP included. Then you can run the following, just as a test:

from sparknlp.pretrained import PretrainedPipeline
pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
result = pipeline.annotate('Harry Potter is a great movie')
print(result['entities']) 
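For context, sparknlp.start() is roughly shorthand for building the session yourself with the Spark NLP package on the classpath. A sketch of what it does (the exact memory and serializer configs vary by version; the Maven coordinate shown assumes spark-nlp 2.1.0 on Scala 2.11):

from pyspark.sql import SparkSession

# Spark resolves this Maven coordinate and downloads the jar before the
# JVM context starts, so the annotator classes exist when Python calls them.
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[*]") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.1.0") \
    .getOrCreate()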

If you still receive any Java-related error, then it is about how you installed Apache Spark and Java 8 on Windows. You should ask/search in the Apache Spark communities.

nj-dsg (Author) commented Jul 24, 2019

Thanks, Maziyar.
1/ I did all that, and then the spark = sparknlp.start() line fails when run in PyCharm, with this error:

ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "C:\Users\xxx\conda\envs\x\lib\site-packages\py4j\java_gateway.py", line 1152, in send_command
answer = smart_decode(self.stream.readline()[:-1])
File "C:\Users\xxx\conda\envs\x\lib\socket.py", line 589, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\xxx\conda\envs\x\lib\site-packages\py4j\java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "C:\Users\xxx\conda\envs\x\lib\site-packages\py4j\java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
...
spark = sparknlp.start()
File "C:\Users\xxx\conda\envs\x\lib\site-packages\sparknlp_init_.py", line 49, in start
return builder.getOrCreate()
File "C:\Users\xxx.conda\envs\x\lib\site-packages\pyspark\sql\session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\xxx.conda\envs\x\lib\site-packages\pyspark\context.py", line 367, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:\Users\xxx.conda\envs\x\lib\site-packages\pyspark\context.py", line 136, in init
conf, jsc, profiler_cls)
File "C:\Users\xxx.conda\envs\x\lib\site-packages\pyspark\context.py", line 198, in _do_init
self._jsc = jsc or self._initialize_context(self._conf._jconf)
File "C:\Users\xxx.conda\envs\x\lib\site-packages\pyspark\context.py", line 306, in _initialize_context
return self._jvm.JavaSparkContext(jconf)
File "C:\Users\xxx.conda\envs\x\lib\site-packages\py4j\java_gateway.py", line 1525, in call
answer, self._gateway_client, None, self._fqn)
File "C:\Users\xxx.conda\envs\x\lib\site-packages\py4j\protocol.py", line 336, in get_return_value
format(target_id, ".", name))
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
The system cannot find the path specified.

2/ I can still replace the sparknlp.start() line with SparkSession.builder.getOrCreate() and run that, but then I get the above Java error when instantiating an annotator class.

3/ I can also run sparknlp.start() successfully in a Python console outside PyCharm, but I would very much like to continue developing in PyCharm.

4/ When running your code in an isolated Python console, it fails on the PretrainedPipeline('recognize_entities_dl', 'en') command and raises an UnsupportedOperationException saying that this operation isn't supported on Windows. I'm not concerned about this, because I'm not interested in downloading a pretrained model anyway.

Anything else I could try, before I "should ask/search in Apache Spark communities"?

Thanks in advance.

maziyarpanahi (Member) commented Jul 24, 2019

If this works in plain Python, Jupyter, and PySpark, I think you should follow this up with the PyCharm community and check your settings for Conda and Java on Windows.

I’m afraid this is not Spark NLP related.

adornes commented May 11, 2020

@nj-dsg I had this same issue, and what solved my problem (I didn't see anyone mention it here) was passing the package to pyspark and spark-submit on the command line:

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.5

and

spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.5

In summary, Spark's default session will then be initiated with this package already loaded. That's my understanding.
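For anyone launching the session from inside an IDE like PyCharm (where you don't control the pyspark launch command), here is a programmatic sketch of the same idea; the version shown is an example and should match your installed spark-nlp Python package:

from pyspark.sql import SparkSession

# Programmatic equivalent of --packages: Spark fetches the jar from Maven
# at session creation, before the JVM context starts.
spark = SparkSession.builder \
    .appName("myapp") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.5") \
    .getOrCreate()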
