Dependency issues in AWS Glue with spark-excel 11.1 and commons-compress 1.18 #128

Closed
jlscott3 opened this issue Jun 26, 2019 · 1 comment

@jlscott3

I'm creating a development endpoint in AWS Glue with the following dependencies:
spark-excel_2.11-0.11.1.jar
poi-4.1.0.jar
poi-ooxml-4.1.0.jar
xmlbeans-3.1.0.jar
spoiwo_2.12-1.4.1.jar
commons-compress-1.18.jar

When I create a notebook against the dev endpoint and attempt to load an xlsx using the following command:
spark.read.format("com.crealytics.spark.excel").options(sheetName="sheet1").options(useHeader="true").load(s3_path)

I get:

u'InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.'
Traceback (most recent call last):
  File "/mnt/yarn/usercache/livy/appcache/application_1561581531952_0001/container_1561581531952_0001_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 159, in load
    return self._df(self._jreader.load(path))
  File "/mnt/yarn/usercache/livy/appcache/application_1561581531952_0001/container_1561581531952_0001_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/mnt/yarn/usercache/livy/appcache/application_1561581531952_0001/container_1561581531952_0001_01_000001/pyspark.zip/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: u'InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.'

This appears to be similar to #93, but I've tried both spark-excel 0.11.1 and 0.10.2 and hit the same error with each.

Not sure where to go from here and I don't think I have a way of shading dependencies in Glue. Glue uses Spark 2.2.1, FWIW.
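
As a side check on the jar list above, here is a minimal sketch that groups the jars by the Scala binary-version suffix in their file names (the `_2.11` / `_2.12` part). The helper name `scala_versions` and the regular expression are illustrative, not part of spark-excel or Glue. This is not claimed to be the cause of the `InputStreamStatistics` error; it only flags that the list mixes a `_2.11` artifact with a `_2.12` one, which is worth ruling out when debugging a classpath.

```python
import re
from collections import defaultdict

# Matches jar names that carry a Scala binary-version suffix, e.g.
# "spark-excel_2.11-0.11.1.jar" -> name "spark-excel", scala "2.11".
# Plain Java jars such as "poi-4.1.0.jar" have no such suffix and are skipped.
_SCALA_JAR = re.compile(r"^(?P<name>.+)_(?P<scala>\d+\.\d+)-(?P<version>.+)\.jar$")


def scala_versions(jars):
    """Group jar file names by the Scala binary version in their suffix."""
    groups = defaultdict(list)
    for jar in jars:
        match = _SCALA_JAR.match(jar)
        if match:
            groups[match.group("scala")].append(jar)
    return dict(groups)


# The dependency list from the post, copied as-is.
jars = [
    "spark-excel_2.11-0.11.1.jar",
    "poi-4.1.0.jar",
    "poi-ooxml-4.1.0.jar",
    "xmlbeans-3.1.0.jar",
    "spoiwo_2.12-1.4.1.jar",
    "commons-compress-1.18.jar",
]

by_scala = scala_versions(jars)
if len(by_scala) > 1:
    # More than one Scala binary version among the Scala artifacts.
    for scala, names in sorted(by_scala.items()):
        print(f"Scala {scala}: {', '.join(names)}")
```

If more than one Scala version is reported, the usual fix is to pick artifacts whose suffix matches the Scala version of the Spark build in use.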

@nightscape
Collaborator

This is actually a duplicate of #93, which I can now reproduce and have reopened.
I'm closing this one; please follow the other issue.
