Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py file into smaller files #23056

Conversation

BryanCutler
Copy link
Member

What changes were proposed in this pull request?

This PR breaks down the large mllib/tests.py file that contains all Python MLlib unit tests into several smaller test files to be easier to read and maintain.

The tests are broken down as follows:

pyspark
├── __init__.py
...
├── mllib
│   ├── __init__.py
...
│   ├── tests
│   │   ├── __init__.py
│   │   ├── test_algorithms.py
│   │   ├── test_feature.py
│   │   ├── test_linalg.py
│   │   ├── test_stat.py
│   │   ├── test_streaming_algorithms.py
│   │   └── test_util.py
...
├── testing
...
│   ├── mllibutils.py
...

How was this patch tested?

Ran tests manually by module to ensure test count was the same, and ran python/run-tests --modules=pyspark-mllib to verify all passing with Python 2.7 and Python 3.6. Also installed scipy to include optional tests in test_linalg.

@BryanCutler
Copy link
Member Author

Dist by line count:

313 ./test_algorithms.py
201 ./test_feature.py
642 ./test_linalg.py
197 ./test_stat.py
523 ./test_streaming_algorithms.py
115 ./test_util.py

@BryanCutler
Copy link
Member Author

cc @HyukjinKwon @squito

@SparkQA
Copy link

SparkQA commented Nov 16, 2018

Test build #98897 has finished for PR 23056 at commit 2759521.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left both comments but we could do that in a followup.

sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
sys.exit(1)
else:
import unittest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler, actually I remove this because we dropped 2.6 support while we are here. Im pretty sure we can just import unittest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wondered about that but thought it might be better to do in a followup


if __name__ == "__main__":
from pyspark.mllib.tests.test_linalg import *
if not _have_scipy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined a have_scipy in utils.py but I think we can do the clean up (like pandas_requirement_message under sqlutil.py) in a followup all together.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 16, 2018

Test build #98898 has finished for PR 23056 at commit 2759521.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 16, 2018

Test build #98905 has finished for PR 23056 at commit 2759521.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 16, 2018

Test build #98903 has finished for PR 23056 at commit 2759521.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Nov 16, 2018

Test build #98906 has finished for PR 23056 at commit 2759521.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@BryanCutler, let me merge this. Let's do the ML one and then clean up both comments throughout ML and MLlib at once.

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in a2fc48c Nov 16, 2018
@BryanCutler BryanCutler deleted the python-test-breakup-mllib-SPARK-26034 branch November 19, 2018 05:45
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…ller files

## What changes were proposed in this pull request?

This PR breaks down the large mllib/tests.py file that contains all Python MLlib unit tests into several smaller test files to be easier to read and maintain.

The tests are broken down as follows:
```
pyspark
├── __init__.py
...
├── mllib
│   ├── __init__.py
...
│   ├── tests
│   │   ├── __init__.py
│   │   ├── test_algorithms.py
│   │   ├── test_feature.py
│   │   ├── test_linalg.py
│   │   ├── test_stat.py
│   │   ├── test_streaming_algorithms.py
│   │   └── test_util.py
...
├── testing
...
│   ├── mllibutils.py
...
```

## How was this patch tested?

Ran tests manually by module to ensure test count was the same, and ran `python/run-tests --modules=pyspark-mllib` to verify all passing with Python 2.7 and Python 3.6. Also installed scipy to include optional tests in test_linalg.

Closes apache#23056 from BryanCutler/python-test-breakup-mllib-SPARK-26034.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants