SPARKNLP 643 detecting spark version in a safer way #13035

maziyarpanahi · 2022-11-04T09:54:27Z

Some environments report an alphanumeric Spark version string. This PR makes the check for whether we are running on Spark 3.3.0 or later (needed for some operations) safer.

We could use

cleaned_version = re.findall(r'(?:(\d+\.(?:\d+\.)*\d+))', self.sc.version)

or LooseVersion from distutils.version

LooseVersion(self.spark_version()) >= LooseVersion("3.3.0")
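For illustration, here is a minimal sketch of the regex approach as a standalone check. It is not the code merged in this PR: the helper names `parse_spark_version` and `is_spark_330_or_later` are hypothetical, and the sample string "3.1.2-amzn-1" is only an assumed example of an alphanumeric version.

```python
import re

# Same pattern as above: captures dotted numeric runs such as "3.1.2".
_VERSION_RE = re.compile(r'(?:(\d+\.(?:\d+\.)*\d+))')


def parse_spark_version(raw_version):
    """Return the first dotted numeric version in raw_version as a tuple of ints.

    Hypothetical helper: pads to at least three components so that "3.3"
    compares equal to "3.3.0".
    """
    matches = _VERSION_RE.findall(raw_version)
    if not matches:
        raise ValueError(f"no numeric version found in {raw_version!r}")
    parts = tuple(int(part) for part in matches[0].split("."))
    # Pad short versions with zeros; a negative repeat count yields an empty tuple.
    return parts + (0,) * (3 - len(parts))


def is_spark_330_or_later(raw_version):
    """Compare numerically, so that 3.10.0 sorts after 3.3.0."""
    return parse_spark_version(raw_version) >= (3, 3, 0)


if __name__ == "__main__":
    for version in ("3.3.0", "3.1.2-amzn-1", "3.10.0"):
        print(version, is_spark_330_or_later(version))
```

Comparing integer tuples rather than raw strings avoids both the failure on non-numeric suffixes and the lexicographic trap where "3.10.0" would sort before "3.3.0". It also does not depend on distutils, which is deprecated and was removed from the standard library in Python 3.12.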

Tested on EMR:

>>> from sparknlp.training import CoNLL
>>> trainingData = CoNLL().readDataset(spark, "file:///home/hadoop/en.test.lemma.conllu")
>>>

maziyarpanahi · 2022-11-04T09:55:22Z

@danilojsl I have 2 commits representing both approaches; they both work, even on EMR. Please let me know which one you are more comfortable with.

danilojsl · 2022-11-04T11:32:10Z

@maziyarpanahi the one with LooseVersion LGTM

maziyarpanahi added 2 commits November 4, 2022 10:09

SPARKNLP-643 Extract only the spark version(numbers)

146c3b9

SPARKNLP-643 Replace regex with LooseVersion from distutils

ec85f76

maziyarpanahi self-assigned this Nov 4, 2022

maziyarpanahi requested a review from danilojsl November 4, 2022 09:54

maziyarpanahi added the bug-fix label Nov 4, 2022

maziyarpanahi linked an issue Nov 4, 2022 that may be closed by this pull request

Reading CoNLL dataset on AWS EMR 6.5 fails #13033

Closed

maziyarpanahi changed the base branch from master to release/423-release-candidate November 4, 2022 11:10

maziyarpanahi merged commit 375339c into release/423-release-candidate Nov 4, 2022

maziyarpanahi mentioned this pull request Nov 4, 2022

Release/423 release candidate #13036

Merged

maziyarpanahi mentioned this pull request Nov 4, 2022

Reading CoNLL dataset on AWS EMR 6.5 fails #13033

Closed

KshitizGIT deleted the SPARKNLP-643-detecting-spark-version-inside-extended-java-wrapper-py-is-not-safe branch March 2, 2023 11:18
