added support for PowerShell #114
Conversation
Wow, @StrikerRUS, thanks for this PR 👍
Wow, this is great! Thanks a lot! 👍
.travis.yml (outdated)
env:
  matrix:
    - LANG=c
    - LANG=go
    - LANG=java
    - LANG=javascript
    - LANG=powershell
    - LANG=python
Refer to https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts and the latest commit, which failed due to exceeding 50 minutes.
I guess with the growth of the number of supported languages this change was inevitable. With the growth of the number of supported algorithms there will be a need to parallelize across them as well (or at least across tasks: classification/regression). However, it seems that parallelization across languages is enough for now.
Wow, this looks really cool!
One thing that bothers me is that we will run exactly the same set of unit tests for all of these runs. While it might seem insignificant, I would still prefer every run to be isolated.
Would it make sense to add another row here specifically for running unit tests? However, we should keep in mind that only unit tests report code coverage (this is by design), so only that single run would report it. I'm not sure about Travis specifics and how it would affect the calculation of the resulting code coverage.
I'm not sure about Travis specifics and how it would affect the calculation of the resulting code coverage.
Not sure, but I think that only the last one per PR/commit is taken into account.
Ah! Even with the split by language, one job hit the 50-minute limit!
Hm, what do you think about moving to Azure Pipelines? I can create a minimal config here or in a separate PR. But you will have to sign up there.
Not sure, but I think that only the last one per PR/commit is taken into account.
What I mean is that even for one PR/commit there are many builds, each potentially reporting different coverage. So if we separate unit and e2e tests, only a subset of our builds will report coverage, and somehow they will be aggregated into a single metric.
what do you think about moving to Azure Pipelines?
I don't have a strong argument against it; however, I think it's worth checking what exactly makes the PowerShell tests run so long, since sometimes we need the ability to run e2e tests locally.
From a purely technical point of view, it could be one of the following:
- installing dependencies;
- code generation;
- execution.
Or some combination of those, of course :)
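One rough way to narrow it down is to time code generation and execution separately. A hypothetical sketch (it assumes m2cgen's export_to_powershell entry point from this PR, pwsh on the PATH, and that the generated script exposes a Score function; adjust to the actual generated code):

```python
import subprocess
import time

import m2cgen
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
estimator = LinearRegression().fit(X, y)

# Phase 2 from the list above: code generation.
start = time.perf_counter()
code = m2cgen.export_to_powershell(estimator)
print(f"generation: {time.perf_counter() - start:.2f}s")

with open("model.ps1", "w") as f:
    f.write(code)

# Phase 3: execution (assumes the generated script defines a Score function).
vector = ", ".join("0.0" for _ in range(X.shape[1]))
start = time.perf_counter()
subprocess.run(["pwsh", "-Command", f". ./model.ps1; Score @({vector})"],
               check=True)
print(f"execution: {time.perf_counter() - start:.2f}s")
```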
So if we separate unit and e2e tests, only a subset of our builds will report coverage, and somehow they will be aggregated into a single metric.
You can click here and take a look. It seems that the coverage tool handles this situation fine.
I don't have a strong argument against it …
Fine! Then I'd like to prepare a separate PR for this today. BTW, with the new time limits we will be free to keep the present configuration without the need to split the tests.
... however, I think it's worth checking what exactly makes the PowerShell tests run so long, since sometimes we need the ability to run e2e tests locally.
Installing PowerShell Core takes 13 seconds, so to be honest, I simply don't know, sorry 🙁. I can only confirm that the same thing happens in my local Windows environment with native PowerShell (not Core): awful timings.
Then I'd like to prepare a separate PR for this today.
Can you please register at Azure Pipelines and push an empty azure-pipelines.yml file to master, so that it will be possible to see results for my incoming PR? Otherwise, without that file in master, builds will not be triggered.
Hi! Any plans for this PR? Can the following be treated as a workaround?

diff --git a/.travis.yml b/.travis.yml
index a3d4633..aec110d 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -13,8 +13,7 @@ env:
- LANG=go
- LANG=java
- LANG=javascript
- - LANG=powershell TASK=regression
- - LANG=powershell TASK=classification
+ - LANG=powershell
- LANG=python
- LANG=visual_basic
@@ -52,4 +51,4 @@ script:
- python setup.py install
- rm -rfd m2cgen/
- pytest -v tests/e2e/test_cli.py
- - pytest -v tests/e2e/test_e2e.py -m $LANG -k "train_model_$TASK"
+ - pytest -v tests/e2e/test_e2e.py -m $LANG
diff --git a/tests/utils.py b/tests/utils.py
index 5e73679..37c805f 100644
--- a/tests/utils.py
+++ b/tests/utils.py
@@ -145,6 +145,12 @@ def cartesian_e2e_params(executors_with_marks, models_with_trainers_with_marks,
# to be clean.
model = clone(model)
+ # PowerShell is extremely slow, so decrease test_fraction for it
+ if executor_mark.name == "powershell":
+ trainer_name = trainer.__name__
+ trainer = functools.partial(trainer, test_fraction=0.02)
+ trainer.__name__ = trainer_name
+
# We use custom id since pytest for some reason can't show name of
# the model in the automatic id. Which sucks.
ids.append("{} - {} - {}".format(

Travis results for the patch above: https://travis-ci.org/StrikerRUS/m2cgen/builds/616146310 (all failures due to coverage token error).
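For context, the functools.partial trick in the patch keeps the trainer's original name (pytest uses it for test ids) while pinning a smaller test_fraction; a standalone illustration with a dummy trainer:

```python
import functools

def train_model(test_fraction=0.1):
    # Dummy stand-in for a real trainer.
    return test_fraction

fast = functools.partial(train_model, test_fraction=0.02)
fast.__name__ = train_model.__name__  # partial objects accept new attributes

assert fast() == 0.02
assert fast.__name__ == "train_model"
```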
@StrikerRUS sorry, I didn't follow this PR much. @krinart can you please comment on that?
@izeigerman FYI, here are two problems:
operator_map = OrderedDict([("==", "-eq"), ("!=", "-ne"), (">=", "-ge"),
                            ("<=", "-le"), (">", "-gt"), ("<", "-lt")])
Is it possible to have comparison operator templates as well, to avoid an ugly hack like this one? For example, in different languages equality can be represented by ==, ===, -eq; inequality by !=, <>, -ne; etc.
The latest commit adds the possibility to overwrite the default comparison operators by overriding the _comp_op_overwrite() helper method of CodeGenerator.
Is it better now?
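Roughly, the hook works like this (a simplified sketch; the real CodeGenerator in m2cgen carries much more state):

```python
class CodeGenerator:
    # Default behavior: emit C-style comparison operators unchanged.
    def _comp_op_overwrite(self, op):
        return op

class PowershellCodeGenerator(CodeGenerator):
    _COMP_OPS = {"==": "-eq", "!=": "-ne", ">=": "-ge",
                 "<=": "-le", ">": "-gt", "<": "-lt"}

    # PowerShell spells comparisons differently, so remap them here.
    def _comp_op_overwrite(self, op):
        return self._COMP_OPS[op]

assert PowershellCodeGenerator()._comp_op_overwrite("==") == "-eq"
```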
Resolved conflicts with master.
.travis.yml (outdated)
@@ -10,6 +10,8 @@ env:
   - TEST=API
   - TEST=E2E LANG="c or python or java or go or javascript"
   - TEST=E2E LANG="c_sharp or visual_basic"
+  - TEST=E2E LANG=powershell TASK=regression
Should this be MODEL_TYPE instead of TASK? "Task" seems a bit vague.
Hey @StrikerRUS, regarding the suggested workaround: I'm not sure I like the suggested solution that much.
This way you can generate a custom list of tests tailored specifically for PowerShell. This will also allow us to tune the tests for it better and perhaps give up some especially demanding tests altogether.
@izeigerman Thanks for your feedback! We do not have any "special cases" for PowerShell. All tests run successfully and are equally slow. Given that all tests are run sequentially, I strongly believe that running all supported ML models on a smaller dataset is much more reliable testing than running only a subset of supported models on a bigger dataset.
@StrikerRUS In this case I'd suggest reducing the size of the test dataset for all tests and making them faster that way. Initially we didn't give much thought to the size of the test dataset, since it didn't have much impact; we just didn't care as long as it was big enough. Perhaps it's a good time to revisit this value.
Not sure I completely agree here. E.g., in the case of tree/ensemble models, a larger dataset ensures a higher likelihood of hitting more branches within each individual estimator.
Yeah, there is always a tradeoff between time and quality/coverage!
I think, ideally, it should be an easily adjustable value, which I'm happy to decrease.
You are absolutely right! Also, larger datasets increase the chances of covering more edge cases. However, I still think that ensuring fundamental compatibility between an ML model and a language is more important than covering all code branches for a particular model.
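For instance, the knob could be read from the environment so both CI and local runs can shrink the dataset without code changes (a hypothetical sketch; M2CGEN_TEST_FRACTION is an illustrative name, not an existing variable):

```python
import os

# Hypothetical: keep an illustrative default, let CI override it per job.
TEST_FRACTION = float(os.environ.get("M2CGEN_TEST_FRACTION", "0.1"))

def split_dataset(X, y, test_fraction=TEST_FRACTION):
    # Hold out the last `test_fraction` of rows for e2e scoring checks.
    n_test = max(1, int(len(X) * test_fraction))
    return (X[:-n_test], y[:-n_test]), (X[-n_test:], y[-n_test:])
```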
Ok, I think it's time to ship it. Thank you @StrikerRUS!
With this PR, Windows users will be able to execute ML models from the command line without the need to install any programming language (PowerShell comes preinstalled on Windows).
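For illustration, the Python-side step that produces such a script might look like this (a sketch assuming the export_to_powershell entry point added by this PR); the resulting model.ps1 can then be shipped to any Windows machine:

```python
import m2cgen
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

# The generated script is self-contained: no Python needed on the target box.
with open("model.ps1", "w") as f:
    f.write(m2cgen.export_to_powershell(clf))
```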