[SYSTEMML-1451][Phase 1] Automate performance suite and report performance numbers #537

krishnakalyan3 · 2017-06-09T03:47:52Z

Please refer to https://issues.apache.org/jira/browse/SYSTEMML-1451 for more details.

Phase 1:

Generate Data
Test all algorithms with singlenode
Test all algorithms with spark-hybrid execution mode
Capture Time Taken
Generate a full set of plain text reports
Test automatic benchmark end to end

Error Handling and Reporting:

Current status of family

To test this script please navigate to the gist below

https://gist.github.com/krishnakalyan3/26f5578b7b342bd4e14d986a9889a42e

Local Machine Configuration

Operating System: OSX 10.2
Ram: 16GB DDR 3 @ 1600 MHz
Speed: 2.5 GHz
Processor: Intel Core i5

Standalone Configuration

JVM Memory Settings: -Xmx8g -Xms4g -Xmn1g

Spark Configuration

Number of Executors: 2
Memory Size (Driver): 5g
Memory of Executor: 2g
Executor Cores: 1
Spark Master Threads: 4
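
For anyone reproducing the run, the Spark settings listed above could be turned into a spark-submit argument list roughly as follows. This is only a sketch: the function name `spark_submit_args` is hypothetical, mapping "Spark Master Threads: 4" to `local[4]` is an assumption, and `--num-executors` only takes effect on cluster managers that honor it (for example YARN).

```python
def spark_submit_args(num_executors=2, driver_memory='5g',
                      executor_memory='2g', executor_cores=1,
                      master_threads=4):
    """Build a spark-submit argument list from the settings in this PR.

    Hypothetical helper, not part of the perf test scripts. Defaults
    mirror the "Spark Configuration" list in the description.
    """
    return [
        'spark-submit',
        # Assumption: "Spark Master Threads" means a local master with N threads
        '--master', 'local[{}]'.format(master_threads),
        '--driver-memory', driver_memory,
        '--num-executors', str(num_executors),
        '--executor-memory', executor_memory,
        '--executor-cores', str(executor_cores),
    ]


print(' '.join(spark_submit_args()))
```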

Performance tests were run with the following configurations on all families (this includes all algorithms).

data size: 10k_100
execution mode: standalone
matrix_type: dense, sparse
Log: https://gist.github.com/krishnakalyan3/a07d404d7192261691584123fd69140a
(More than 5 hours)

data size: 10k_100
execution mode: hybrid_spark
matrix_type: dense, sparse
Log: https://gist.github.com/krishnakalyan3/213bcca9792addee62e8a4177fbba996
(Less than one hour)

akchinSTC · 2017-06-09T03:47:54Z

Can an Admin verify this patch?

deroneriksson · 2017-06-09T19:59:21Z

Test this Jenkins

deroneriksson · 2017-06-09T19:59:31Z

add to whitelist.

akchinSTC · 2017-06-10T01:09:45Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1558/

akchinSTC · 2017-06-11T07:44:03Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1566/

nakul02

I like (most of) your variable names. Most of them are self-explanatory.

Nonetheless, please document all your functions and parameters (wherever it makes sense). When doing so myself, I have sometimes found a need to redesign the interface, usually resulting in a cleaner API. This may or may not help you the way it helped me, but it will definitely help the next person read through your code.

nakul02 · 2017-06-12T17:47:55Z

scripts/perftest/python/configuration.py

+
+    mat_shapes = split_rowcol(matrix_shape)
+
+    if job[0] == 1:


Instead of 0 & 1, use either an enum (I know they were added in 3.4 and may need some discussion) or a named constant.
Using "magic numbers" is a bad idea.
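
A minimal sketch of what the enum variant might look like. The names `JobType`, `DISABLED`, `ENABLED` and `should_run` are hypothetical, since the diff does not say what 0 and 1 stand for in `job[0]`.

```python
from enum import Enum


class JobType(Enum):
    """Hypothetical labels for the 0/1 flag stored in job[0]."""
    DISABLED = 0
    ENABLED = 1


def should_run(job):
    """Return True when the first element of job marks the step as enabled."""
    # JobType(value) raises ValueError for anything other than 0 or 1,
    # so an unexpected flag fails loudly instead of being silently ignored.
    return JobType(job[0]) is JobType.ENABLED


print(should_run([1, 'train']))
```

A module-level constant (for example `ENABLED = 1`) would achieve the same readability without depending on the `enum` module.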

nakul02 · 2017-06-12T17:49:08Z

scripts/perftest/python/configuration.py

+has_predict = ['GLM', 'Kmeans', 'l2-svm', 'm-svm', 'naive-bayes']
+
+
+def naive_bayes_datagen(matrix_type, mat_shapes, conf_dir):


I could be missing something obvious here (since I am not very familiar with Python), but it seems like this function naive_bayes_datagen has been defined twice (with the same signature).

nakul02 · 2017-06-12T17:50:10Z

scripts/perftest/python/utils.py

+
+
+def get_algo(family, ml_algo):
+    algo = []


Best to add documentation to all the functions in this file, so that someone who wants to add perf tests in the future knows what to do.
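
As an illustration of the level of documentation meant here, a docstring for get_algo might look like the sketch below. Only the signature and the `algo = []` line come from the diff; the rest of the body and the parameter descriptions are assumptions about what the function does.

```python
def get_algo(family, ml_algo):
    """
    Collect the algorithms that belong to the requested families.

    family: list of str
        Family names, for example ['regression', 'stats'].
    ml_algo: dict
        Maps a family name to the list of its algorithm names.

    return: list of str
        Algorithm names of all requested families, in request order.
    """
    algo = []
    # Assumed behavior: flatten the per-family lists into one list
    for current_family in family:
        algo.extend(ml_algo[current_family])
    return algo
```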

nakul02 · 2017-06-12T17:51:01Z

scripts/perftest/python/run_perftest.py

+           'regression': ['LinearRegDS', 'LinearRegCG', 'GLM'],
+           'stats': ['Univar-Stats', 'bivar-stats', 'stratstats']}
+
+


I am not a fan of using the function name main. Could you call it something else? Maybe perf_test_entry, or something more appropriate.

nakul02 · 2017-06-12T17:52:18Z

scripts/perftest/python/configuration.py

+from utils import split_rowcol, config_writer
+import sys
+import logging
+


A little blurb about the contents/purpose of this file would be great.

akchinSTC · 2017-06-12T18:28:10Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1572/

krishnakalyan3 · 2017-06-12T18:28:40Z

@nakul02 thanks for the review. I will incorporate these changes to the best of my understanding.

krishnakalyan3 · 2017-06-12T22:50:04Z

How to test the script.

Run the line below to see the help message:
./scripts/perftest/python/run_perftest.py --help

akchinSTC · 2017-06-12T22:52:10Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1574/

akchinSTC · 2017-06-14T04:17:43Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1579/

akchinSTC · 2017-06-17T04:08:27Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1607/

krishnakalyan3 · 2017-06-17T07:26:23Z

@niketanpansare could you please review this PR and share your feedback?

Some commands

Help
./scripts/perftest/python/run_perftest.py --help

The default Kmeans command below runs Kmeans in the following modes

datagen
train
predict

and captures the metrics in the log file

To run Kmeans using defaults
./scripts/perftest/python/run_perftest.py --algo Kmeans

Another variation.
This runs the Kmeans algorithm in all modes, once for each matrix shape given in the --mat-shape argument.

./scripts/perftest/python/run_perftest.py --algo Kmeans --mat-shape 10k_1k 20k_10 30k_50
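
To make the shape notation concrete, here is a small sketch of how a value such as 10k_1k could be turned into row and column counts. It assumes the format is rows_cols and that the suffixes k and m mean thousand and million; `parse_mat_shape` is a hypothetical name and is not the split_rowcol helper from utils.py.

```python
def parse_mat_shape(matrix_shape):
    """Convert a shape string like '10k_1k' into a (rows, cols) tuple."""
    # Assumption: 'k' is a thousand and 'm' is a million
    suffixes = {'k': 10 ** 3, 'm': 10 ** 6}

    def to_int(token):
        if token[-1] in suffixes:
            return int(token[:-1]) * suffixes[token[-1]]
        return int(token)

    rows, cols = matrix_shape.lower().split('_')
    return to_int(rows), to_int(cols)


for shape in ['10k_1k', '20k_10', '30k_50']:
    print(shape, parse_mat_shape(shape))
```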

If we just want to generate data for Kmeans

./scripts/perftest/python/run_perftest.py --family clustering --mat-shape 10k_1k 20k_10 30k_50 --mode data-gen
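
For readers unfamiliar with multi-value options, the commands above can be modeled with argparse roughly as follows. This is a sketch and not the parser in run_perftest.py: the flag names are taken from the commands in this comment, while the defaults and help texts are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags used in the commands above."""
    parser = argparse.ArgumentParser(
        description='Sketch of a perf test command line')
    # nargs='+' lets one flag take several space-separated values
    parser.add_argument('--family', nargs='+',
                        help='algorithm families, e.g. clustering')
    parser.add_argument('--algo', nargs='+',
                        help='individual algorithms, e.g. Kmeans')
    parser.add_argument('--mat-shape', nargs='+', default=['10k_100'],
                        help='matrix shapes (assumed default)')
    parser.add_argument('--mode', nargs='+',
                        default=['data-gen', 'train', 'predict'],
                        help='modes to run (assumed default: all)')
    return parser


# Parse an explicit list so the example does not depend on sys.argv
args = build_parser().parse_args(
    ['--algo', 'Kmeans', '--mat-shape', '10k_1k', '20k_10', '30k_50'])
print(args.algo, args.mat_shape, args.mode)
```

argparse converts `--mat-shape` into the attribute `mat_shape`, which is why the dash does not appear in the last line.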

PS: Please run all these scripts from the root ($SYSTEMML_HOME). Right now this performance test suite supports single-node execution with the statistics and clustering families. Please let me know if you have questions.

Thanks

akchinSTC · 2017-06-17T10:42:27Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1611/

niketanpansare · 2017-06-17T16:38:43Z

Awesome work. LGTM for tasks completed until now :)

krishnakalyan3 · 2017-06-17T23:15:26Z

@niketanpansare thank you for the review. :)

akchinSTC · 2017-06-18T01:43:03Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1612/

akchinSTC · 2017-06-18T17:10:42Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1613/

akchinSTC · 2017-06-25T12:27:37Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1639/

akchinSTC · 2017-06-26T19:36:15Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1640/

akchinSTC · 2017-06-28T15:10:26Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1669/

akchinSTC · 2017-06-28T23:59:06Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1672/

krishnakalyan3 · 2017-06-29T07:12:56Z

ping @nakul02, could you please review this PR.

akchinSTC · 2017-06-29T10:37:44Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1682/

nakul02 · 2017-06-29T23:08:20Z

This is great work @krishnakalyan3!

For the purpose of Phase 1 of GSoC, this code is merge-able as it is.

But, either in this PR or as part of your next phase, I'd like you to document the overall design and assumptions in the main entry file. This could include things like example runs, what a family is, how to add new algorithm and anything else you think appropriate.
You should also explain why a JSON file is generated for each job that is run (that is, which library you are using that makes this necessary). What was the design choice behind this library?

Towards the end, we'd also like to add a User Guide with lots of examples.

In this PR thread, or in your comment where you have the list of tasks completed, could you please indicate which families, algorithms, data sizes and shapes you have tested? Please also describe the machine you tested on (its configuration), any Spark settings (number of executors, memory sizes of driver and executor), any single-node settings (JVM memory), etc.

akchinSTC · 2017-06-30T10:33:37Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1688/

akchinSTC · 2017-07-01T00:46:45Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1698/

nakul02 · 2017-07-02T03:57:16Z

@krishnakalyan3 - do you have anything to add?
Or should I merge this PR?

krishnakalyan3 · 2017-07-02T06:31:46Z

@nakul02 please merge.

Thanks

akchinSTC · 2017-07-14T00:17:33Z

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1754/

- Single entry point to run perf tests in any combination of algorithms, families, matrix shapes & densities
- Reports time taken by a single perf test by parsing the output and grep-ing for the time
- Detects tests that did not run and reports them in the generated log
- Robust error handling and reporting, informative help message

Closes apache#537
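
The "grep-ing for the time" step can be pictured with the sketch below. It assumes the captured output contains a line of the form "Total elapsed time: 12.5 sec."; the pattern and the name `extract_time` are illustrative and not taken from the merged scripts.

```python
import re

# Assumed line format: "Total elapsed time:<whitespace>12.5 sec."
TIME_PATTERN = re.compile(r'Total elapsed time:\s*([0-9.]+)\s*sec')


def extract_time(output):
    """Return the elapsed time in seconds, or None if no time was found."""
    match = TIME_PATTERN.search(output)
    if match is None:
        # A missing time is a hint that the test did not run to completion
        return None
    return float(match.group(1))


sample = 'SystemML Statistics:\nTotal elapsed time:\t\t12.5 sec.\n'
print(extract_time(sample))
```

Returning None rather than raising lets the caller record the test as "not run" in the report and continue with the remaining tests.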

WIP perftest script

7dfc5e9

update structure

33c06dc

more robust argparse

8db9959

nakul02 reviewed Jun 12, 2017

update comments and complete data gen perf test

bddfda5

make argparse more robust

99d246b

krishnakalyan3 added 3 commits June 16, 2017 18:45

Delete configuration.py

ad785b6

kmeans

df57a3d

Merge branch 'SYSTEMML-1451-automatic-perftests' of https://github.co…

d06e87b

…m/krishnakalyan3/systemml into SYSTEMML-1451-automatic-perftests

krishnakalyan3 added 2 commits June 17, 2017 00:08

test mode

bb1f148

add predict for Kmeans

9b56486

add statistics

a7ae3d2

fix error usage and extract time

faad7f4

krishnakalyan3 added 2 commits June 25, 2017 04:33

test spark backend

093222d

regression 1

bdb7cc8

rename some functions

c83154c

update comments and fix predict

8356eea

krishnakalyan3 added 2 commits June 28, 2017 14:27

regression1

c1e84fe

regression2

180b48c

krishnakalyan3 added 4 commits June 29, 2017 00:51

add comments

e72d73b

comments

33683f5

error handling

6c222d7

error handling if folder not present

e6664d1

krishnakalyan3 changed the title ~~[SYSTEMML-1451][WIP][Phase 1] Automate performance suite and report performance numbers~~ [SYSTEMML-1451][Phase 1] Automate performance suite and report performance numbers Jun 29, 2017

fix error handling when file exists

8689cff

krishnakalyan3 added 2 commits June 30, 2017 03:01

fix error success file with predict

29e310b

remove duplicate comment

19bbbbd

remove todo

e6733be

asfgit closed this in e7cfcad Jul 2, 2017
