[SYSTEMML-1451][Phase 1] Automate performance suite and report performance numbers #537

krishnakalyan3
Member

@krishnakalyan3 krishnakalyan3 commented Jun 9, 2017

Please refer to https://issues.apache.org/jira/browse/SYSTEMML-1451 for more details.

Phase 1:

  • Generate Data
  • Test all algorithms in singlenode execution mode
  • Test all algorithms in spark-hybrid execution mode
  • Capture Time Taken
  • Generate a full set of plain text reports
  • Test automatic benchmark end to end

Error Handling and Reporting:

  • Test End to End in standalone
  • Test End to End in spark_hybrid
  • If data already exists do not generate the data again
  • Fix: take the time from stdout
  • Execution function to return failure in case the job fails
  • If data not present do not execute train or predict. (Minor)
  • Remove unused imports
  • Log stdout and stderr
  • Proper Reporting of Metrics (Minor)
  • Add current time to log
  • User Guide (This will be a separate google doc file)
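
Several of the items above (taking the time from stdout, returning failure when a job fails, logging stdout and stderr) can be sketched together. This is a minimal sketch under stated assumptions, not the code in this PR: the names `run_job` and `parse_elapsed_seconds` are hypothetical, and the `Total elapsed time:` pattern is an assumption about what the job prints when statistics are enabled.

```python
import logging
import re
import subprocess
import sys

# Assumed format of the timing line in the job's stdout; adjust to the real output.
ELAPSED_RE = re.compile(r"Total elapsed time:\s*([0-9]+(?:\.[0-9]+)?)\s*sec")


def parse_elapsed_seconds(stdout_text):
    """Return the elapsed time in seconds found in stdout_text, or None."""
    match = ELAPSED_RE.search(stdout_text)
    return float(match.group(1)) if match else None


def run_job(cmd):
    """Run cmd (a list of arguments), log its output, and report success.

    Returns (succeeded, elapsed_seconds); elapsed_seconds is None when the
    job failed or printed no timing line.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    logging.info("stdout of %s:\n%s", cmd, proc.stdout)
    if proc.stderr:
        logging.error("stderr of %s:\n%s", cmd, proc.stderr)
    if proc.returncode != 0:
        return False, None
    return True, parse_elapsed_seconds(proc.stdout)


if __name__ == "__main__":
    # Stand-in job: a Python one-liner that prints a timing line.
    fake_job = [sys.executable, "-c", "print('Total elapsed time:\\t\\t1.25 sec.')"]
    print(run_job(fake_job))
```

Returning a success flag next to the parsed time lets the caller record a failed run in the report instead of silently logging a missing number.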

Current status of family

  • Clustering
  • Binomial
  • Multinomial
  • Regression1
  • Regression2
  • Stats1
  • Stats2

To test this script please navigate to the gist below

https://gist.github.com/krishnakalyan3/26f5578b7b342bd4e14d986a9889a42e

Local Machine Configuration

Operating System: OSX 10.2
Ram: 16GB DDR 3 @ 1600 MHz
Speed: 2.5 GHz
Processor: Intel Core i5

Standalone Configuration

JVM Memory Settings: -Xmx8g -Xms4g -Xmn1g

Spark Configuration

Number of Executors: 2
Memory Size (Driver): 5g
Memory of Executor: 2g
Executor Cores: 1
Spark Master Threads: 4

Performance tests were conducted on the following configurations with all families (this includes all algorithms).

data size: 10k_100
execution mode: standalone
matrix_type: dense, sparse
Log: https://gist.github.com/krishnakalyan3/a07d404d7192261691584123fd69140a
(More than 5 hours)

data size: 10k_100
execution mode: hybrid_spark
matrix_type: dense, sparse
Log: https://gist.github.com/krishnakalyan3/213bcca9792addee62e8a4177fbba996
(Less than one hour)

@akchinSTC
Contributor

Can an Admin verify this patch?

@deroneriksson
Member

Test this Jenkins

@deroneriksson
Member

add to whitelist.

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1558/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1566/

Member

@nakul02 nakul02 left a comment

I like (most of) your variable names. Most of them are self-explanatory.

Nonetheless, please document all your functions and parameters (wherever it makes sense). When doing so myself, I have sometimes found a need to redesign the interface, usually resulting in a cleaner API. This may or may not help you the way it helped me, but it will definitely help the next person read through your code.


mat_shapes = split_rowcol(matrix_shape)

if job[0] == 1:
Member

Instead of 0 & 1, use either an enum (I know they were added in Python 3.4 and may need some discussion) or a named constant.
Using "magic numbers" is a bad idea.
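
One way the suggestion could look is sketched below. The names `JobFlag`, `SKIP`, `RUN` and `should_run` are hypothetical, since the diff excerpt does not say what the 0 and 1 in `job[0]` stand for. `IntEnum` is used so that existing integer values still compare equal to the named members.

```python
from enum import IntEnum


class JobFlag(IntEnum):
    """Hypothetical names for the 0/1 values; the real meaning is up to the script."""
    SKIP = 0
    RUN = 1


def should_run(job):
    """Return True when the first field of job marks it as runnable."""
    return job[0] == JobFlag.RUN


if __name__ == "__main__":
    print(should_run((1, "Kmeans")))
    print(should_run((JobFlag.SKIP, "Kmeans")))
```

If Python versions older than 3.4 have to be supported, two module-level constants (for example `SKIP = 0` and `RUN = 1`) give the same readability without the `enum` module.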

has_predict = ['GLM', 'Kmeans', 'l2-svm', 'm-svm', 'naive-bayes']


def naive_bayes_datagen(matrix_type, mat_shapes, conf_dir):
Member

I could be missing something obvious here (since I am not very familiar with Python), but it seems like this function naive_bayes_datagen has been defined twice (with the same signature).
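
For context on why this matters: in Python a second `def` with the same name does not raise an error. It rebinds the name, so the earlier body is silently discarded. A small illustration follows; the function bodies are placeholders made up for this example, only the signature comes from the diff.

```python
def naive_bayes_datagen(matrix_type, mat_shapes, conf_dir):
    """First definition (placeholder body for illustration)."""
    return "first"


def naive_bayes_datagen(matrix_type, mat_shapes, conf_dir):  # noqa: F811
    """Second definition; this rebinds the name and replaces the first."""
    return "second"


if __name__ == "__main__":
    # Only the later definition is reachable through the name.
    print(naive_bayes_datagen("dense", ["10k_100"], "conf"))  # prints: second
```

Linters such as pyflakes report this as a redefinition, which is a cheap way to catch it before review.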



def get_algo(family, ml_algo):
    algo = []
Member

Best to add documentation to all the functions in this file. So that someone who wants to add perf tests in the future knows what to do.
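
As an illustration of the kind of documentation meant here, the sketch below adds a docstring to `get_algo`. Only the signature and the first line of the body appear in the diff, so the rest of the body, the `ML_ALGO` name, and the exact meaning of the parameters are assumptions made for this example.

```python
# Only the two families visible in the diff excerpt are listed here; this
# mapping and its name are for illustration only.
ML_ALGO = {
    'regression': ['LinearRegDS', 'LinearRegCG', 'GLM'],
    'stats': ['Univar-Stats', 'bivar-stats', 'stratstats'],
}


def get_algo(family, ml_algo):
    """Collect the algorithms that belong to the given families.

    family: list of str
        Family names to look up, for example ['stats'].
    ml_algo: dict
        Maps each family name to the list of its algorithm names.

    return: list of str
        Algorithm names of all requested families, in the order given.
        Raises KeyError when a family is not present in ml_algo.
    """
    algo = []
    for fam in family:
        algo.extend(ml_algo[fam])
    return algo


if __name__ == "__main__":
    print(get_algo(['stats'], ML_ALGO))
```

Spelling out the parameter types and the failure case in the docstring tells someone adding a new family what the function expects without reading the call sites.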

'regression': ['LinearRegDS', 'LinearRegCG', 'GLM'],
'stats': ['Univar-Stats', 'bivar-stats', 'stratstats']}


Member

I am not a fan of using the function name main. Could you call it something else? maybe something like perf_test_entry or something more appropriate.

from utils import split_rowcol, config_writer
import sys
import logging

Member

A little blurb about the contents/purpose of this file would be great.

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1572/

@krishnakalyan3
Member Author

@nakul02 thanks for the review. I will incorporate these changes to the best of my understanding.

@krishnakalyan3
Member Author

krishnakalyan3 commented Jun 12, 2017

How to test the script.

Run the line below to see the help message:
./scripts/perftest/python/run_perftest.py --help

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1574/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1579/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1607/

@krishnakalyan3
Member Author

krishnakalyan3 commented Jun 17, 2017

@niketanpansare could you please review this PR and share your feedback?

Some commands

Help
./scripts/perftest/python/run_perftest.py --help

The command below runs Kmeans in the following modes

  • datagen
  • train
  • predict

and captures the metrics in the log file

To run Kmeans using defaults
./scripts/perftest/python/run_perftest.py --algo Kmeans

Another variation:
This runs the Kmeans algorithm in all modes, once for each matrix shape passed to the --mat-shape argument.

./scripts/perftest/python/run_perftest.py --algo Kmeans --mat-shape 10k_1k 20k_10 30k_50
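
To make the shape notation concrete, here is a minimal sketch of how a string such as `10k_1k` could be turned into row and column counts. The helper names `parse_dimension` and `parse_mat_shape` are hypothetical (this is not the `split_rowcol` from the PR), and the sketch assumes that a trailing `k` means a factor of 1000 and that rows come before columns.

```python
def parse_dimension(text):
    """Convert '10k' to 10000 and '50' to 50 (assumes 'k' means thousand)."""
    text = text.strip().lower()
    if text.endswith('k'):
        return int(text[:-1]) * 1000
    return int(text)


def parse_mat_shape(shape):
    """Convert a shape string such as '10k_1k' into a (rows, cols) tuple.

    Raises ValueError when the string is not of the form ROWS_COLS.
    """
    rows, cols = shape.split('_')
    return parse_dimension(rows), parse_dimension(cols)


if __name__ == "__main__":
    for shape in ['10k_1k', '20k_10', '30k_50']:
        print(shape, parse_mat_shape(shape))
```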

To only generate data for the clustering family (which includes Kmeans):

./scripts/perftest/python/run_perftest.py --family clustering --mat-shape 10k_1k 20k_10 30k_50 --mode data-gen
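
For readers unfamiliar with multi-value flags, the options used in the commands above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `run_perftest.py`: the flag names and the mode value `data-gen` are taken from the commands in this comment, while `build_parser`, the help texts and every default value are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that accepts one or more values per flag."""
    parser = argparse.ArgumentParser(description='Performance test runner (sketch)')
    parser.add_argument('--family', nargs='+', help='algorithm families to run')
    parser.add_argument('--algo', nargs='+', help='individual algorithms to run')
    # The default shape is an assumption made for this sketch.
    parser.add_argument('--mat-shape', nargs='+', default=['10k_100'],
                        help='matrix shapes as ROWS_COLS, e.g. 10k_1k')
    # Running every mode by default is also an assumption.
    parser.add_argument('--mode', nargs='+',
                        default=['data-gen', 'train', 'predict'],
                        choices=['data-gen', 'train', 'predict'],
                        help='phases to execute')
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ['--family', 'clustering', '--mat-shape', '10k_1k', '20k_10',
         '--mode', 'data-gen'])
    print(args.family, args.mat_shape, args.mode)
```

With `nargs='+'` each flag collects its values into a list, which is what allows several shapes to follow a single `--mat-shape`.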

PS: Please run all these scripts from the root directory ($SYSTEMML_HOME). Right now this performance test suite supports single node execution with statistics and clustering. Please let me know if you have questions.

Thanks

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1611/

@niketanpansare
Contributor

niketanpansare commented Jun 17, 2017

Awesome work. LGTM for tasks completed until now :)

@krishnakalyan3
Member Author

@niketanpansare thank you for the review. :)

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1612/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1613/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1639/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1640/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1669/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1672/

@krishnakalyan3 krishnakalyan3 changed the title from "[SYSTEMML-1451][WIP][Phase 1] Automate performance suite and report performance numbers" to "[SYSTEMML-1451][Phase 1] Automate performance suite and report performance numbers" on Jun 29, 2017
@krishnakalyan3
Member Author

ping @nakul02, could you please review this PR.

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1682/

@nakul02
Member

nakul02 commented Jun 29, 2017

This is great work @krishnakalyan3!

For the purpose of Phase 1 of GSoC, this code is merge-able as it is.

But, either in this PR or as part of your next phase, I'd like you to document the overall design and assumptions in the main entry file. This could include things like example runs, what a family is, how to add a new algorithm and anything else you think appropriate.
You should also explain why a JSON file is generated for each job that is run (that is, which library you are using that makes this necessary). What was the design choice behind this library?

Towards the end, we'd also like to add a User Guide with lots of examples.

In this PR thread, or in your comment where you have the list of tasks completed, could you please indicate which families, algorithms, data sizes and shapes you have tested? Also, which machine did you test on (its configuration), and what were the Spark settings (number of executors, memory sizes of driver and executor), any single node settings (JVM memory), etc.?

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1688/

@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1698/

@nakul02
Member

nakul02 commented Jul 2, 2017

@krishnakalyan3 - do you have anything to add?
Or should I merge this PR?

@krishnakalyan3
Member Author

@nakul02 please merge.

Thanks

@asfgit asfgit closed this in e7cfcad Jul 2, 2017
@akchinSTC
Contributor

Refer to this link for build results (access rights to CI server needed):
https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/1754/

j143-zz pushed a commit to j143-zz/systemml that referenced this pull request Nov 4, 2017
- Single entry point to run perf tests in any combination of algorithms,
  families, matrix shapes & densities
- Reports time taken by a single perf test by parsing the output and
  grep-ing for the time
- Detects tests that did not run and reports in the generated log
- Robust error handling and reporting, informative help message

Closes apache#537