[SPARK-26034][PYTHON][TESTS] Break large mllib/tests.py file into smaller files #23056
Dist by line count:
Test build #98897 has finished for PR 23056 at commit
LGTM. I left both comments but we could do that in a followup.
        sys.stderr.write('Please install unittest2 to test with Python 2.6 or earlier')
        sys.exit(1)
else:
    import unittest
@BryanCutler, actually I would remove this while we are here, because we dropped 2.6 support. I'm pretty sure we can just import unittest.
Yeah, I wondered about that but thought it might be better to do in a followup
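For reference, a minimal sketch of the simplification discussed in this thread, assuming Python 2.6 support is gone so that the `unittest2` fallback is dead code. The `SmokeTest` class and `run_smoke` helper are hypothetical and exist only to show that the plain import is enough; they are not part of the PR.

```python
import sys
import unittest  # the standard library module suffices on Python 2.7+ and 3.x


class SmokeTest(unittest.TestCase):
    """Hypothetical test case, used only to exercise the plain import."""

    def test_version_supported(self):
        # The removed branch only applied to Python 2.6 and earlier.
        self.assertGreaterEqual(sys.version_info[:2], (2, 7))


def run_smoke():
    """Run the hypothetical smoke test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_smoke()` runs a single test without any conditional import logic.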
if __name__ == "__main__":
    from pyspark.mllib.tests.test_linalg import *
    if not _have_scipy:
I defined a `have_scipy` in `utils.py` but I think we can do the clean up (like `pandas_requirement_message` under `sqlutil.py`) in a followup all together.
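A rough sketch of the shared-flag idea mentioned above, assuming the flag is computed once in a helper module and reused by the test files. The name `have_scipy` comes from the comment; the skip message, the `SciPyInteropTests` class, and its test body are hypothetical and not taken from the PR.

```python
import unittest

# Probe for the optional dependency once, at import time.
try:
    import scipy.sparse  # noqa: F401
    have_scipy = True
except ImportError:
    # SciPy is optional, so its absence should skip tests rather than fail them.
    have_scipy = False


@unittest.skipIf(not have_scipy, "SciPy is not installed")
class SciPyInteropTests(unittest.TestCase):
    """Hypothetical test class that only runs when SciPy is available."""

    def test_sparse_column_vector(self):
        import scipy.sparse as sps
        # A 3x1 sparse column with non-zero entries in rows 0 and 2.
        col = sps.csc_matrix(([1.0, 2.0], ([0, 2], [0, 0])), shape=(3, 1))
        self.assertEqual(col.nnz, 2)
```

With a single flag, each test module can import it instead of repeating the `try`/`except` probe, and the runner reports skipped tests instead of printing an ad hoc message.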
retest this please
Test build #98898 has finished for PR 23056 at commit
retest this please
Test build #98905 has finished for PR 23056 at commit
Test build #98903 has finished for PR 23056 at commit
retest this please
Test build #98906 has finished for PR 23056 at commit
@BryanCutler, let me merge this. Let's do the ML one and then clean up both comments throughout ML and MLlib at once.
Merged to master.
…ller files

## What changes were proposed in this pull request?

This PR breaks down the large mllib/tests.py file that contains all Python MLlib unit tests into several smaller test files to be easier to read and maintain.

The tests are broken down as follows:
```
pyspark
├── __init__.py
...
├── mllib
│   ├── __init__.py
...
│   ├── tests
│   │   ├── __init__.py
│   │   ├── test_algorithms.py
│   │   ├── test_feature.py
│   │   ├── test_linalg.py
│   │   ├── test_stat.py
│   │   ├── test_streaming_algorithms.py
│   │   └── test_util.py
...
├── testing
...
│   ├── mllibutils.py
...
```

## How was this patch tested?

Ran tests manually by module to ensure test count was the same, and ran `python/run-tests --modules=pyspark-mllib` to verify all passing with Python 2.7 and Python 3.6. Also installed scipy to include optional tests in test_linalg.

Closes apache#23056 from BryanCutler/python-test-breakup-mllib-SPARK-26034.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
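The manual test-count check described above could be approximated with the standard `unittest` loader. This is only a sketch under assumptions: the `count_tests` helper and the three stand-in classes are hypothetical and do not reflect the actual MLlib test classes or how the author performed the check.

```python
import unittest


def count_tests(*test_classes):
    """Return the total number of test methods across the given TestCase classes."""
    loader = unittest.TestLoader()
    return sum(loader.loadTestsFromTestCase(cls).countTestCases()
               for cls in test_classes)


# Hypothetical stand-in for a monolithic test file before the split.
class MonolithicTests(unittest.TestCase):
    def test_vector(self):
        pass

    def test_matrix(self):
        pass

    def test_summary(self):
        pass


# Hypothetical stand-ins for the same tests after being split by topic.
class LinalgTests(unittest.TestCase):
    def test_vector(self):
        pass

    def test_matrix(self):
        pass


class StatTests(unittest.TestCase):
    def test_summary(self):
        pass


before = count_tests(MonolithicTests)
after = count_tests(LinalgTests, StatTests)
```

If `before` and `after` differ, a test was dropped or duplicated during the move.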