[SYSTEMML-2523] Update SystemML to Support Spark 2.3.0 #857
niketanpansare wants to merge 3 commits into
Conversation
Spark 2.3 (released on February 28, 2018) has updated the Antlr version from 4.3 to 4.7, which throws a warning every time we invoke SystemML.
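For readers unfamiliar with why a version bump produces a warning: a parser generated by one ANTLR version is being run against a different ANTLR runtime. The sketch below is a simplified model of such a check, not ANTLR's actual code. It assumes that only the major.minor part of the version matters, and the function names `major_minor` and `version_mismatch` are hypothetical.

```python
def major_minor(version):
    """Return the 'major.minor' prefix of a version string such as '4.7.1'."""
    # Drop any suffix such as '-SNAPSHOT' before splitting on dots.
    parts = version.split("-")[0].split(".")
    return ".".join(parts[:2])


def version_mismatch(tool_version, runtime_version):
    """True when the generating tool and the runtime differ in major.minor."""
    return major_minor(tool_version) != major_minor(runtime_version)


# 4.3 and 4.7 are the versions named in the description above;
# the patch level 4.7.1 is purely illustrative.
print(version_mismatch("4.3", "4.7"))    # True
print(version_mismatch("4.7", "4.7.1"))  # False
```

Under this assumption, a parser generated with 4.3 and run on a 4.7 runtime is flagged on every invocation, which matches the behaviour described above.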
|
@niketanpansare just trying out using niketanpansare:update_spark23 |
|
@niketanpansare getting the same error on niketanpansare:update_spark23
Py4JJavaError Traceback (most recent call last)
/opt/ibm/spark/python/pyspark/sql/dataframe.py in createOrReplaceTempView(self, name)
/opt/ibm/spark/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py in __call__(self, *args)
/opt/ibm/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
/opt/ibm/spark/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
Py4JJavaError: An error occurred while calling o121.createOrReplaceTempView. |
|
Here is the original discussion from March 2018: Back then, we tried updating ANTLR but had to revert the change because SystemML would then not run on Spark 2.2 and earlier. So we decided to accept the warning instead and keep running on all versions. |
|
Thanks @mboehm7 ... I forgot about the discussion. Since Spark 2.4 is already released and Spark 2.3 is almost a year old, we can consider option 1: we directly release for Spark 2.3 and drop 2.2 and 2.1. |
|
@romeokienzler Can you please double-check the jar? It is working for me. A jar compiled in Eclipse, which uses a different version of ANTLR, can sometimes cause these issues as well. Please try either of the following commands again:
|
|
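One way to double-check a jar, independent of any particular build command, is to look at what it actually bundles. A jar is a zip archive, so the sketch below lists any ANTLR 4 runtime classes packaged inside it. The helper name `bundled_antlr_classes` is hypothetical, and the check only shows whether the runtime is bundled, not which version it is.

```python
import zipfile

# Package path under which the ANTLR 4 runtime classes live inside a jar.
ANTLR_RUNTIME_PREFIX = "org/antlr/v4/runtime/"


def bundled_antlr_classes(jar_path):
    """Return the sorted ANTLR 4 runtime class entries found in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.startswith(ANTLR_RUNTIME_PREFIX) and name.endswith(".class")
        )
```

Calling `bundled_antlr_classes` with the path to the jar under test returns an empty list when no ANTLR runtime is bundled; a non-empty list means the jar carries its own copy, which may differ from the one Spark ships.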
@niketanpansare I love you!!! You've made my day bro! It's working fine. I've missed the So I confirm the JAR is working fine for me and you've saved me and the Coursera community by making SystemML run on Spark 2.3, thanks a lot! IMHO you can merge this PR but, of course, I have nothing to say here. |
|
The test failed on travis with |
|
@j143-bot The main problem with that is maintainability. We either (1) have separate tests for Spark 2.1 and the latest version, (2) re-run the tests before release (and push all the fixes to a separate branch), or (3) distribute a Spark 2.1 jar but don't test it (a really bad option). If we don't reach consensus, I am okay with keeping this PR open for people facing issues with SystemML + Spark 2.3. |
|
+1 |
|
@romeokienzler You are getting the error because the setup contains two SystemML jars (possibly with conflicting dependencies). There are two possible solutions to your problem:
Since there is something weird happening here, I am including the logs. I apologize in advance for the long trace. Please ignore the logs below if you agree with the above statements.
Setup 1: With only the incubating jar (FAILS !!)
Setup 2: Put the older incubating jar before the current SystemML 1.2.0 jars (FAILS !!)
Setup 3: Put the current SystemML 1.2.0 jars before the older incubating jar (FAILS !!)
Setup 4: Put the jar from the PR before the older incubating jar (SUCCEEDS !!)
Setup 5: No jar provided (SUCCEEDS !!)
Setup 6: Provide just |
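The setups above differ mainly in which jars are present and in what order. The sketch below is a simplified model of why order can matter: it assumes that the first jar on the classpath providing a class wins, which is the usual behaviour of a flat classpath but may not hold under every Spark class-loader configuration. The jar names and class names are hypothetical placeholders, not the actual artifacts from the experiments.

```python
def resolve_classes(classpath):
    """Map each class name to the first jar on the classpath that provides it."""
    winner = {}
    for jar_name, classes in classpath:
        for cls in classes:
            # setdefault keeps the earlier jar if the class was already seen.
            winner.setdefault(cls, jar_name)
    return winner


def duplicated_classes(classpath):
    """Return the classes provided by more than one jar, with their providers."""
    providers = {}
    for jar_name, classes in classpath:
        for cls in classes:
            providers.setdefault(cls, []).append(jar_name)
    return {cls: jars for cls, jars in providers.items() if len(jars) > 1}


# Hypothetical classpath: the PR jar is listed before an older jar.
classpath = [
    ("pr-build.jar", ["example.Parser", "example.Runtime"]),
    ("older-incubating.jar", ["example.Parser"]),
]
print(resolve_classes(classpath)["example.Parser"])  # pr-build.jar
print(duplicated_classes(classpath))  # {'example.Parser': ['pr-build.jar', 'older-incubating.jar']}
```

Under this model, any class reported by `duplicated_classes` is a candidate for the kind of conflict described above, and swapping the order of the two entries changes which copy is loaded.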
|
Interestingly, running similar code with
Based on the above experiments, here are my thoughts:
|
|
SystemML is working with the latest Spark version in the 2.x series. Shall we close this? cc @mboehm7 |
|
I can confirm that it works on Spark 2.3 and Spark 2.4 |
|
Since we intend to release SystemDS 2.0 in late August, it makes sense to update our Spark versions as stated in [1]. Do you, @niketanpansare, want to make this change in this PR, or should we open a new Task[4]/PR for this? It might be easier to start over since there have been multiple changes since the commits in this PR. [1] https://mail-archives.apache.org/mod_mbox/systemds-dev/202007.mbox/%3Cf34cbccf-4c5a-479c-71ba-cff8810436d1%40gmail.com%3E Bonus: this would address open CVEs: https://nvd.nist.gov/vuln/detail/CVE-2018-8024 |
|
Niketan may not respond right now!
Hi Sebastian, you are welcome to open a new PR referencing this PR to
update the Spark version. 🙂
Thank you,
Janardhan
…On Wed, 15 Jul, 2020, 01:03 Sebastian Baunsgaard, ***@***.***> wrote:
Since we intend to release SystemDS 2.0 late August, it makes sense to
update our spark versions as stated in [1].
The version to update to could be 2.3.1, but maybe we should aim for the
latest one, currently 2.4.6 [2].
Together with this update the Hadoop version could be bumped to 2.10.0 [3].
Do you, @niketanpansare <https://github.com/niketanpansare> , want to
make this change in this PR, or should we open a new Task[4]/PR for this?
It might be easier to start over since there have been multiple changes
since the commits in this PR.
[1]
https://mail-archives.apache.org/mod_mbox/systemds-dev/202007.mbox/%3Cf34cbccf-4c5a-479c-71ba-cff8810436d1%40gmail.com%3E
[2] https://spark.apache.org/downloads.html
[3] https://hadoop.apache.org/releases.html
[4] https://issues.apache.org/jira/browse/SYSTEMDS-2523
Bonus:
this would address open CVEs:
https://nvd.nist.gov/vuln/detail/CVE-2018-8024
https://nvd.nist.gov/vuln/detail/CVE-2018-1334
|
|
See #992 for continuation, closing this PR. |
@bertholdreinwald @romeokienzler @mboehm7 @prithvirajsen @nakul02 @j143 Let's use this PR for discussion and raising potential concerns.