New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/dev check #276
Feature/dev check #276
Conversation
- The current install check is expensive since it runs various hyper param permutations for all MADlib modules. This commits moves all of those tests to dev-check, which can be used by developers for iterating faster. We have now created watered down install-check for each module, which just runs one hyper-param combination for each MADlib function, and does not do any asserts. - This commit also includes changes in madpack to add a new madpack option for dev-check. TODO: - complete trimming install check for all modules. - update documentation for developer consumption. Co-authored-by: Arvind Sridhar <asridhar@pivotal.io>
Co-authored-by: Orhan Kislal <okislal@pivotal.io>
Refer to this link for build results (access rights to CI server needed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The
lmf.ic.sql_in
is still too big. We should replace the 10k dataset with something considerably smaller. - Rename
elastic_net_install_check.ic.sql_in
toelastic_net.ic.sql_in
- All graph modules create a graph before running the function. Would it make sense to combine them into a single file with a common graph (similar to GLM)?
- Similarly, can we combine some of the PMML files (eg. combine all GLM related ones, all tree related ones) and also consolidate the regress module files.
- I suggest including the
kmeans_pp
andkmeans_random
as they test different function paths.
Ideal goal would be to have a single IC file for each module and have separate dev-checks for the individual elements of the module.
src/madpack/madpack.py
Outdated
for sqlfile in sorted(glob.glob(sql_files), reverse=True): | ||
algoname = os.path.basename(sqlfile).split('.')[0] | ||
# run only algo specified | ||
if (module in modset and modset[module] and | ||
algoname not in modset[module]): | ||
continue | ||
# Do not run test/*.ic.sql_in files for dev-check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe if you use *[!ic].sql_in
in the dev-check glob expression, it will exclude the ic files. You won't need these lines if that works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, will try it out.
Thank you for the comments @iyerr3 , will make the changes you have requested. Having one IC file for each module makes sense, but on Greenplum, for some modules the IC run time is still quite high. For example, if a module had 5 IC files, where each ran for 10 seconds, the user would see IC progressing every 10 seconds. But if those were combined into one IC file, the progression would happen after 50 seconds. It seemed a little odd (longer IC run times for some modules in terms of user experience) when we were trying it out, so decided to keep multiple IC files for such modules. |
Address review comments. Remove pmml install-check (keep the dev-check). Co-authored-by: Arvind Sridhar <asridhar@pivotal.io>
Refer to this link for build results (access rights to CI server needed): |
Signed-off-by: Orhan Kislal <okislal@pivotal.io>
Refer to this link for build results (access rights to CI server needed): |
Signed-off-by: Arvind Sridhar <asridhar@pivotal.io>
Refer to this link for build results (access rights to CI server needed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just needs the merge conflict resolved.
No description provided.