Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/dev check #276

Closed
wants to merge 5 commits into from
Closed

Conversation

orhankislal
Copy link
Contributor

No description provided.

njayaram2 and others added 2 commits June 6, 2018 16:42
- The current install check is expensive since it runs various hyper param
permutations for all MADlib modules. This commits moves all of those
tests to dev-check, which can be used by developers for iterating
faster. We have now created watered down install-check for each module,
which just runs one  hyper-param combination for each MADlib function,
and does not do any asserts.
- This commit also includes changes in madpack to add a new madpack
  option for dev-check.

TODO:
- complete trimming install check for all modules.
- update documentation for developer consumption.

Co-authored-by: Arvind Sridhar <asridhar@pivotal.io>
Co-authored-by: Orhan Kislal <okislal@pivotal.io>
@asfgit
Copy link

asfgit commented Jun 8, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/499/

Copy link
Contributor

@iyerr3 iyerr3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The lmf.ic.sql_in is still too big. We should replace the 10k dataset with something considerably smaller.
  • Rename elastic_net_install_check.ic.sql_in to elastic_net.ic.sql_in
  • All graph modules create a graph before running the function. Would it make sense to combine them into a single file with a common graph (similar to GLM)?
  • Similarly, can we combine some of the PMML files (eg. combine all GLM related ones, all tree related ones) and also consolidate the regress module files.
  • I suggest including the kmeans_pp and kmeans_random as they test different function paths.

Ideal goal would be to have a single IC file for each module and have separate dev-checks for the individual elements of the module.

for sqlfile in sorted(glob.glob(sql_files), reverse=True):
algoname = os.path.basename(sqlfile).split('.')[0]
# run only algo specified
if (module in modset and modset[module] and
algoname not in modset[module]):
continue
# Do not run test/*.ic.sql_in files for dev-check.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe if you use *[!ic].sql_in in the dev-check glob expression, it will exclude the ic files. You won't need these lines if that works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, will try it out.

@njayaram2
Copy link
Contributor

Thank you for the comments @iyerr3 , will make the changes you have requested.

Having one IC file for each module makes sense, but on Greenplum, for some modules the IC run time is still quite high. For example, if a module had 5 IC files, where each ran for 10 seconds, the user would see IC progressing every 10 seconds. But if those were combined into one IC file, the progression would happen after 50 seconds. It seemed a little odd (longer IC run times for some modules in terms of user experience) when we were trying it out, so decided to keep multiple IC files for such modules.

Address review comments.
Remove pmml install-check (keep the dev-check).

Co-authored-by: Arvind Sridhar <asridhar@pivotal.io>
@asfgit
Copy link

asfgit commented Jun 18, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/517/

Signed-off-by: Orhan Kislal <okislal@pivotal.io>
@asfgit
Copy link

asfgit commented Jun 18, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/518/

Signed-off-by: Arvind Sridhar <asridhar@pivotal.io>
@asfgit
Copy link

asfgit commented Jun 19, 2018

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/520/

Copy link
Contributor

@iyerr3 iyerr3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needs the merge conflict resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants