[SPARK-6953] [PySpark] speed up python tests #5427

Closed
wants to merge 11 commits into from

davies
Contributor

@davies davies commented Apr 8, 2015

This PR tries to speed up some Python tests:

tests.py                  144s -> 103s    -41s
mllib/classification.py    24s ->  17s     -7s
mllib/regression.py        27s ->  15s    -12s
mllib/tree.py              27s ->  13s    -14s
mllib/tests.py             64s ->  31s    -33s
streaming/tests.py        185s ->  84s   -101s

Taking Python 3 into account, the total saving will be 558s (almost 10 minutes), because the core and streaming tests run three times and the MLlib tests run twice.
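
The 558s total can be checked from the per-file savings. This is only a sketch of the arithmetic: it assumes `tests.py` counts as a core test (three runs) and every `mllib/` file runs twice, as stated above. The names `savings` and `runs_per_build` are illustrative and not part of the PR.

```python
# Per-file savings in seconds (before minus after), copied from the table.
savings = {
    "tests.py": 144 - 103,
    "mllib/classification.py": 24 - 17,
    "mllib/regression.py": 27 - 15,
    "mllib/tree.py": 27 - 13,
    "mllib/tests.py": 64 - 31,
    "streaming/tests.py": 185 - 84,
}


def runs_per_build(name):
    """Assumed run counts: mllib files run twice, core and streaming three times."""
    return 2 if name.startswith("mllib/") else 3


total = sum(saved * runs_per_build(name) for name, saved in savings.items())
print(total)  # 558
```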

During testing, the runner now shows the elapsed time for each test file:

Run core tests ...
Running test: pyspark/rdd.py ... ok (22s)
Running test: pyspark/context.py ... ok (16s)
Running test: pyspark/conf.py ... ok (4s)
Running test: pyspark/broadcast.py ... ok (4s)
Running test: pyspark/accumulators.py ... ok (4s)
Running test: pyspark/serializers.py ... ok (6s)
Running test: pyspark/profiler.py ... ok (5s)
Running test: pyspark/shuffle.py ... ok (1s)
Running test: pyspark/tests.py ... ok (103s)   144s
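
For illustration only, per-file timing of this kind could be produced along the following lines. This is a hypothetical Python sketch, not the script changed in this PR: `run_timed` is an invented helper, and the command it runs here is a stand-in rather than a real PySpark test file.

```python
import subprocess
import sys
import time


def run_timed(label, command):
    """Run one test command and report its status with elapsed whole seconds."""
    start = time.monotonic()
    # Capture the output so only the one-line summary is printed.
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    elapsed = int(time.monotonic() - start)
    status = "ok" if result.returncode == 0 else "FAILED"
    line = "Running test: %s ... %s (%ds)" % (label, status, elapsed)
    print(line)
    return line, result.returncode


# Stand-in command so the sketch runs anywhere; a real runner would launch a test file.
run_timed("demo", [sys.executable, "-c", "pass"])
```

Printing the time next to each file makes slow test files easy to spot in the build log.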

@SparkQA

SparkQA commented Apr 8, 2015

Test build #29883 has finished for PR 5427 at commit be23e1d.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 8, 2015

Test build #29893 has finished for PR 5427 at commit 0c49785.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@davies davies changed the title [WIP] [PySpark] speed up python tests [PySpark] speed up python tests Apr 15, 2015
@davies davies changed the title [PySpark] speed up python tests [SPARK-6953] [PySpark] speed up python tests Apr 16, 2015
@SparkQA

SparkQA commented Apr 16, 2015

Test build #30386 has finished for PR 5427 at commit 945a2b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30385 has finished for PR 5427 at commit fec2da2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAttribute(nameParts: Seq[String])
    • trait CaseConversionExpression
    • final class UTF8String extends Ordered[UTF8String] with Serializable
    • case class Exchange(
    • case class SortMergeJoin(
  • This patch does not change any dependencies.

@mengxr
Contributor

mengxr commented Apr 16, 2015

@davies Did you measure the speedup on each component?

@davies
Contributor Author

davies commented Apr 16, 2015

@mengxr I have updated the description with the per-component differences.

@SparkQA

SparkQA commented Apr 16, 2015

Test build #30427 has finished for PR 5427 at commit 55bb451.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Conflicts:
	python/pyspark/mllib/tests.py
	python/pyspark/mllib/tree.py
	python/pyspark/tests.py
@SparkQA

SparkQA commented Apr 17, 2015

Test build #30444 has finished for PR 5427 at commit aa39c55.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 17, 2015

Test build #30458 has finished for PR 5427 at commit 2654bfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 18, 2015

Test build #30516 has finished for PR 5427 at commit 2654bfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@mengxr
Contributor

mengxr commented Apr 21, 2015

MLlib changes look good to me.

rxin added a commit to rxin/spark that referenced this pull request Apr 21, 2015
[SPARK-6953] [PySpark] speed up python tests

Conflicts:
	python/pyspark/streaming/tests.py
rxin added a commit to rxin/spark that referenced this pull request Apr 21, 2015
[SPARK-6953] [PySpark] speed up python tests

Signed-off-by: Reynold Xin <rxin@databricks.com>

Conflicts:
	python/pyspark/streaming/tests.py

(cherry picked from commit 21b15f5)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@rxin
Contributor

rxin commented Apr 21, 2015

I brought this up to date at #5605

asfgit pushed a commit that referenced this pull request Apr 22, 2015
This PR tries to speed up some Python tests:

```
tests.py                       144s -> 103s      -41s
mllib/classification.py         24s -> 17s        -7s
mllib/regression.py             27s -> 15s       -12s
mllib/tree.py                   27s -> 13s       -14s
mllib/tests.py                  64s -> 31s       -33s
streaming/tests.py             185s -> 84s      -101s
```
Taking Python 3 into account, the total saving will be 558s (almost 10 minutes), because the core and streaming tests run three times and the MLlib tests run twice.

During testing, the runner now shows the elapsed time for each test file:
```
Run core tests ...
Running test: pyspark/rdd.py ... ok (22s)
Running test: pyspark/context.py ... ok (16s)
Running test: pyspark/conf.py ... ok (4s)
Running test: pyspark/broadcast.py ... ok (4s)
Running test: pyspark/accumulators.py ... ok (4s)
Running test: pyspark/serializers.py ... ok (6s)
Running test: pyspark/profiler.py ... ok (5s)
Running test: pyspark/shuffle.py ... ok (1s)
Running test: pyspark/tests.py ... ok (103s)   144s
```

Author: Reynold Xin <rxin@databricks.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #5605 from rxin/python-tests-speed and squashes the following commits:

d08542d [Reynold Xin] Merge pull request #14 from mengxr/SPARK-6953
89321ee [Xiangrui Meng] fix seed in tests
3ad2387 [Reynold Xin] Merge pull request #5427 from davies/python_tests
@asfgit asfgit closed this in 41ef78a Apr 22, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015