Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7427] [PySpark] Make sharedParams match in Scala, Python #6023

Closed
wants to merge 10 commits into from

Conversation

gweidner
Copy link
Member

@gweidner gweidner commented May 9, 2015

Modified 2 files:
python/pyspark/ml/param/_shared_params_code_gen.py
python/pyspark/ml/param/shared.py

Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6.
python _shared_params_code_gen.py > shared.py

Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala. Note warning was displayed when committing shared.py:
warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@jkbradley
Copy link
Member

ok to test

@gweidner
Copy link
Member Author

gweidner commented May 9, 2015

Thank you - monitoring and available in case any issues.

@jkbradley
Copy link
Member

LGTM pending tests---thanks! codegen gave me the same results

@SparkQA
Copy link

SparkQA commented May 9, 2015

Test build #797 has started for PR 6023 at commit e6a865e.

@SparkQA
Copy link

SparkQA commented May 9, 2015

Test build #797 has finished for PR 6023 at commit e6a865e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Looking into how to make the Python PEP8 checker ignore that generated file...

@jkbradley
Copy link
Member

OK, found out. Can you please look at the tox.ini file in the root spark directory? Add "shared.py" to the exclude line:

exclude=cloudpickle.py,heapq3.py,shared.py

@gweidner
Copy link
Member Author

gweidner commented May 9, 2015

Definitely - my bad. I just committed shared.py so this Pull Request now shows 3 files changed.

@gweidner
Copy link
Member Author

gweidner commented May 9, 2015

Correction to my last comment - committed tox.ini which now excludes shared.py.

@SparkQA
Copy link

SparkQA commented May 10, 2015

Test build #798 has started for PR 6023 at commit 825e4a9.

@SparkQA
Copy link

SparkQA commented May 10, 2015

Test build #798 has finished for PR 6023 at commit 825e4a9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gweidner
Copy link
Member Author

In progress updating unit tests to fix failure such as:
Traceback (most recent call last):
File "pyspark/ml/tests.py", line 131, in test_param
self.assertEqual(maxIter.doc, "max number of iterations")

@gweidner
Copy link
Member Author

Committed updated python/pyspark/ml/tests.py to fix two PySpark unit tests failures (lines 131 and 159).

@gweidner
Copy link
Member Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented May 10, 2015

Test build #799 has started for PR 6023 at commit db72e32.

@jkbradley
Copy link
Member

@gweidner Thanks! Btw, only committers can generally give Jenkins instructions via comments, and even those are pretty flaky, so tests sometimes have to be started manually.

@gweidner
Copy link
Member Author

I can't thank you enough Joseph for you patience and support! I will do better next time - my mistake again but I now understand why and how to ./run-tests in my local environment.

[bigdata@bdavm159 python]$ ./run-tests
Running PySpark tests. Output is in python/unit-tests.log.
Testing with Python version:
Python 2.6.6
Run core tests ...
...
...
Running test: pyspark/streaming/tests.py
which: no pypy in (/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/node-v0.10.36-linux-x64/bin:/root/bin:/usr/lib/node-v0.10.36-linux-x64/bin:/usr/lib/node-v0.10.36-linux-x64/bin)
Tests passed.
[bigdata@bdavm159 python]$

@SparkQA
Copy link

SparkQA commented May 10, 2015

Test build #799 has finished for PR 6023 at commit db72e32.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class HasCheckpointInterval(Params):
    • class ALS(JavaEstimator, HasCheckpointInterval, HasMaxIter, HasPredictionCol, HasRegParam, HasSeed):
    • class ALSModel(JavaModel):
    • class ChiSqSelectorModel(JavaVectorTransformer):
    • class ChiSqSelector(object):

@gweidner
Copy link
Member Author

The modified ml/tests.py succeeded but the log shows the following error in last streaming/tests.py:
Run ml tests ...
Running test: pyspark/ml/feature.py ... ok (28s)
Running test: pyspark/ml/classification.py ... ok (14s)
Running test: pyspark/ml/tuning.py ... ok (9s)
Running test: pyspark/ml/tests.py ... ok (7s)
Running test: pyspark/ml/evaluation.py ... ok (11s)
Run streaming tests ...
Running test: pyspark/streaming/util.py ... ok (1s)

Running test: pyspark/streaming/tests.py ... ..................................F.....

FAIL: test_count_by_value_and_window (main.WindowFunctionTests)

Traceback (most recent call last):
File "pyspark/streaming/tests.py", line 418, in test_count_by_value_and_window
self._test_func(input, func, expected)
File "pyspark/streaming/tests.py", line 133, in _test_func
self.assertEqual(expected, result)
AssertionError: Lists differ: [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]] != [[1], [2], [3], [4], [5], [6], [6], [6]]

First list contains 2 additional elements.
First extra element 8:
[6]

  • [[1], [2], [3], [4], [5], [6], [6], [6], [6], [6]]
    ? ----------
  • [[1], [2], [3], [4], [5], [6], [6], [6]]

Ran 40 tests in 168.147s

Trying to understand how the change could have impacted the streaming test and also see why not happening locally.

@gweidner
Copy link
Member Author

@jkbradley - Could test_count_by_value_and_window() be a "flaky test"?
For example, I found the following recent information:
http://apache-spark-developers-list.1001551.n3.nabble.com/Flaky-streaming-tests-td6237.html
SPARK-7341 - Fix the flaky test: org.apache.spark.streaming.InputStreamsSuite.socket input stream

@gweidner
Copy link
Member Author

Looking at the history of streaming/tests.py noticed "[SPARK-6953] [PySpark] speed up python tests".
I'm checking if adjusting the following values impacts the test result to see if it's timing-related since as far as I can tell sharedParams not involved in streaming:

dstream.countByValueAndWindow(1, .2)
vs
dstream.countByValueAndWindow(5, 1)

@jkbradley
Copy link
Member

@gweidner Thanks for looking into it! There have been flakiness issues we've been trying to fix. For now, I'll just re-run the tests, but I'll check with TD and Davies about this one test.

@SparkQA
Copy link

SparkQA commented May 11, 2015

Test build #800 has started for PR 6023 at commit db72e32.

@jkbradley
Copy link
Member

And btw, happy to help! It's great to have new contributors.

@SparkQA
Copy link

SparkQA commented May 11, 2015

Test build #800 has finished for PR 6023 at commit db72e32.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

LGTM merging with master and branch-1.4

asfgit pushed a commit that referenced this pull request May 11, 2015
Modified 2 files:
python/pyspark/ml/param/_shared_params_code_gen.py
python/pyspark/ml/param/shared.py

Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6.
python _shared_params_code_gen.py > shared.py

Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala.  Note warning was displayed when committing shared.py:
warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py.

Author: Glenn Weidner <gweidner@us.ibm.com>

Closes #6023 from gweidner/br-7427 and squashes the following commits:

db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public

(cherry picked from commit c5aca0c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in c5aca0c May 11, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Modified 2 files:
python/pyspark/ml/param/_shared_params_code_gen.py
python/pyspark/ml/param/shared.py

Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6.
python _shared_params_code_gen.py > shared.py

Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala.  Note warning was displayed when committing shared.py:
warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py.

Author: Glenn Weidner <gweidner@us.ibm.com>

Closes apache#6023 from gweidner/br-7427 and squashes the following commits:

db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Modified 2 files:
python/pyspark/ml/param/_shared_params_code_gen.py
python/pyspark/ml/param/shared.py

Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6.
python _shared_params_code_gen.py > shared.py

Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala.  Note warning was displayed when committing shared.py:
warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py.

Author: Glenn Weidner <gweidner@us.ibm.com>

Closes apache#6023 from gweidner/br-7427 and squashes the following commits:

db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Modified 2 files:
python/pyspark/ml/param/_shared_params_code_gen.py
python/pyspark/ml/param/shared.py

Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6.
python _shared_params_code_gen.py > shared.py

Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala.  Note warning was displayed when committing shared.py:
warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py.

Author: Glenn Weidner <gweidner@us.ibm.com>

Closes apache#6023 from gweidner/br-7427 and squashes the following commits:

db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python
1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master'
d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants