
added support for PowerShell #114

Merged: 16 commits merged into BayesWitnesses:master from StrikerRUS:powershell on Dec 23, 2019

Conversation

StrikerRUS (Member)

With this PR, Windows users will be able to execute ML models from the command line without installing any programming language (PowerShell comes preinstalled on Windows).
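For context, a rough sketch of the workflow this enables (the exact m2cgen CLI invocation and the name of the generated scoring function are assumptions here, not quoted from the project's docs):

  # transpile a pickled model to PowerShell, then score from the command line
  m2cgen model.pickle --language powershell > model.ps1
  . .\model.ps1          # dot-source the generated script
  Score @(1.0, 2.0, 3.0) # call its scoring function on a feature vector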

@coveralls commented Oct 23, 2019

Coverage Status: coverage increased (+0.1%) to 95.946% when pulling 056bd1c on StrikerRUS:powershell into d7cd068 on BayesWitnesses:master.

@izeigerman (Member) commented Oct 24, 2019

Wow, @StrikerRUS, thanks for this PR 👍

@krinart (Member) left a comment

Wow, this is great! Thanks a lot! 👍

.travis.yml (outdated), comment on lines 10 to 17:

  env:
    matrix:
      - LANG=c
      - LANG=go
      - LANG=java
      - LANG=javascript
      - LANG=powershell
      - LANG=python
@StrikerRUS (Member, Author)

Refer to https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts and the latest commit, which failed due to exceeding the 50-minute limit.

I guess this change was inevitable as the number of supported languages grew. As the number of supported algorithms grows, we will also need to parallelize across them (or at least across tasks: classification/regression; a sketch follows below). For now, however, parallelizing across languages seems to be enough.
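A hypothetical sketch of what a per-task split for a single language could look like (the same pattern appears later in this thread as a proposed workaround):

  env:
    matrix:
      - LANG=powershell TASK=regression
      - LANG=powershell TASK=classification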

Member:

Wow, this looks really cool!

One thing that bothers me is that we will run exactly the same set of unit tests for all of these runs. While it might seem insignificant, I would still prefer every run to be isolated.

Would it make sense to add another row here specifically for running unit tests (see the sketch below)? We should keep in mind, though, that only unit tests report code coverage (this is by design), so only that single run would report it. I'm not sure about Travis specifics and how this would affect the calculation of the resulting code coverage.
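For illustration, the isolation could look roughly like this (an assumed sketch; it mirrors the TEST=API / TEST=E2E split that this branch adopts later in the thread):

  env:
    matrix:
      - TEST=API
      - TEST=E2E LANG=powershell
      - TEST=E2E LANG=python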

Member:

For some reason PowerShell takes much more time to run: ~40 minutes against <10 for the other languages:

[Screen Shot 2019-10-25 at 6 09 03 PM]

@StrikerRUS (Member, Author)

> I'm not sure about Travis specifics and how this would affect the calculation of the resulting code coverage.

Not sure, but I think that only the last build for a PR/commit is taken into account.

Ah! Even with the split by language, one job hit the 50-minute limit!

Hm, what do you think about moving to Azure Pipelines? I can create a minimal config here or in a separate PR. But you would have to sign up there.

Member:

> Not sure, but I think that only the last build for a PR/commit is taken into account.

What I mean is that even for one PR/commit there are many builds, each potentially reporting different coverage. So if we separate unit and e2e tests, only a subset of our builds will report coverage, and somehow they will be aggregated into a single metric.

> What do you think about moving to Azure Pipelines?

I don't have a strong argument against it; however, I think it's worth checking what exactly makes the PowerShell tests run so long, since sometimes we need the ability to run e2e tests locally.

From a purely technical point of view, it could be one of the following:

  • installing dependencies;
  • code generation;
  • execution.

Or some combination of those, of course :) (One generic way to check is sketched below.)
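For example, a generic way to see where the test time goes (standard pytest functionality, not something this PR relies on) is to ask pytest for the slowest test phases, which separates fixture setup from the actual call:

  pytest -v tests/e2e/test_e2e.py -m powershell --durations=20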

@StrikerRUS (Member, Author)

> So if we separate unit and e2e tests, only a subset of our builds will report coverage, and somehow they will be aggregated into a single metric.

You can click here and take a look. It seems that the coverage tool deals with this situation fine:

[image: coverage report]

> I don't have a strong argument against it ...

Fine! Then I'd like to prepare a separate PR for this today. BTW, with the new time limits we would be free to keep the present configuration without having to split the tests.

> ... however, I think it's worth checking what exactly makes the PowerShell tests run so long, since sometimes we need the ability to run e2e tests locally.

Installing PowerShell Core takes 13 seconds, so to be honest, I simply don't know, sorry 🙁. I can only confirm that the same thing happens in my local Windows environment with native PowerShell (not Core): awful timings.

[image: local test timings]

@StrikerRUS (Member, Author)

> Then I'd like to prepare a separate PR for this today.

Can you please register at Azure Pipelines and push an empty azure-pipelines.yml file to master, so that it will be possible to see the result for my incoming PR? Otherwise, without that file in master, builds will not be triggered.

@StrikerRUS (Member, Author)

@krinart Please see #116.

@StrikerRUS (Member, Author)

Hi! Any plans for this PR?

Can the following be treated as a workaround?

diff --git a/.travis.yml b/.travis.yml
index a3d4633..aec110d 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -13,8 +13,7 @@ env:
     - LANG=go
     - LANG=java
     - LANG=javascript
-    - LANG=powershell TASK=regression
-    - LANG=powershell TASK=classification
+    - LANG=powershell
     - LANG=python
     - LANG=visual_basic

@@ -52,4 +51,4 @@ script:
   - python setup.py install
   - rm -rfd m2cgen/
   - pytest -v tests/e2e/test_cli.py
-  - pytest -v tests/e2e/test_e2e.py -m $LANG -k "train_model_$TASK"
+  - pytest -v tests/e2e/test_e2e.py -m $LANG
diff --git a/tests/utils.py b/tests/utils.py
index 5e73679..37c805f 100644
--- a/tests/utils.py
+++ b/tests/utils.py
@@ -145,6 +145,12 @@ def cartesian_e2e_params(executors_with_marks, models_with_trainers_with_marks,
         # to be clean.
         model = clone(model)

+        # PowerShell is extremely slow, so decrease test_fraction for it
+        if executor_mark.name == "powershell":
+            trainer_name = trainer.__name__
+            trainer = functools.partial(trainer, test_fraction=0.02)
+            trainer.__name__ = trainer_name
+
         # We use custom id since pytest for some reason can't show name of
         # the model in the automatic id. Which sucks.
         ids.append("{} - {} - {}".format(

Travis results for the patch above: https://travis-ci.org/StrikerRUS/m2cgen/builds/616146310 (all failures are due to a coverage token error).

@izeigerman (Member)

@StrikerRUS sorry, I didn't follow this PR much. @krinart, can you please comment on that?

@StrikerRUS (Member, Author) commented Nov 27, 2019

@izeigerman FYI, there are two problems here:

  • PowerShell is VERY slow (~40 min vs ~10 min for the other languages);
  • Travis has a 50-minute limit per job (therefore, splitting tests by language, by model, or by both is inevitable at some point in the future).

Comment on lines 19 to 20:

  operator_map = OrderedDict([("==", "-eq"), ("!=", "-ne"), (">=", "-ge"),
                              ("<=", "-le"), (">", "-gt"), ("<", "-lt")])

@StrikerRUS (Member, Author)

Is it possible to have comparison operator templates as well, to avoid an ugly hack like this one? For example, in different languages equality can be represented by ==, ===, or -eq; inequality by !=, <>, or -ne; and so on.

@StrikerRUS (Member, Author)

The latest commit adds the possibility to overwrite the default comparison operators by overriding the _comp_op_overwrite() helper method of CodeGenerator (see the sketch below).

Is it better now?

@izeigerman @krinart
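For illustration, such an override hook might look roughly like this; the base-class signature and the PowerShell subclass below are assumptions made for the sketch, not code copied from the actual m2cgen source:

  class CodeGenerator:
      def _comp_op_overwrite(self, op):
          # Default: emit the comparison operator verbatim ("==", "<=", ...).
          return str(op)

  class PowershellCodeGenerator(CodeGenerator):
      # Hypothetical mapping mirroring the operator_map quoted above.
      _OPERATOR_MAP = {"==": "-eq", "!=": "-ne", ">=": "-ge",
                       "<=": "-le", ">": "-gt", "<": "-lt"}

      def _comp_op_overwrite(self, op):
          # PowerShell uses word-style operators ("-eq") instead of symbols.
          return self._OPERATOR_MAP[str(op)]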

StrikerRUS mentioned this pull request Dec 1, 2019
@StrikerRUS (Member, Author)

Resolved the conflicts with master using the old approach (splitting into regression and classification). Still waiting for feedback on this way of speeding up the tests.

.travis.yml (outdated)

  @@ -10,6 +10,8 @@ env:
      - TEST=API
      - TEST=E2E LANG="c or python or java or go or javascript"
      - TEST=E2E LANG="c_sharp or visual_basic"
      - TEST=E2E LANG=powershell TASK=regression
Member:

Should this be MODEL_TYPE instead of TASK? "Task" seems a bit vague.

@izeigerman (Member)

Hey @StrikerRUS, with regard to the suggested workaround: I'm not sure I like that solution much.
Shall we perhaps consider making PowerShell a special case in E2E and configuring it manually?
In place of the <empty> placeholder here: https://github.com/BayesWitnesses/m2cgen/blob/master/tests/e2e/test_e2e.py#L221 you can start listing your custom tests like this:

  pytest.param(
      linear_regressor,
      executors.PowershellExecutor,
      utils.train_model_regression,
      marks=[POWERSHELL, REGRESSION],
  ),
  ...

This way you can generate a custom list of tests tailored specifically to PowerShell. It would also allow us to tune its tests better and perhaps give up some especially demanding tests altogether.
What do you think?

@StrikerRUS (Member, Author) commented Dec 14, 2019

@izeigerman Thanks for your feedback!

We do not have any "special cases" for PowerShell: all tests run successfully, and all of them are equally slow. Given that all tests are run row by row, I strongly believe that running all supported ML models on a smaller dataset is much more reliable testing than running only a subset of the supported models on a bigger dataset.

@izeigerman (Member) commented Dec 16, 2019

@StrikerRUS In this case I'd suggest reducing the size of the test dataset for all tests and making them faster that way. Initially we didn't give much thought to the size of the test dataset since it didn't have much impact, so we just didn't care as long as it was big enough. Perhaps it's a good time to revisit this value.

> ... but on a bigger dataset.

Not sure I completely agree here. E.g., in the case of tree/ensemble models, a larger dataset ensures a higher likelihood of hitting more branches within each individual estimator.

@StrikerRUS (Member, Author) commented Dec 18, 2019

Yeah, there is always a tradeoff between time and quality/coverage!

> Perhaps it's a good time to revisit this value.

Ideally, I think, it should be an easily adjustable test_fraction value which we can tune for different models/languages.

I'm happy to decrease test_fraction globally to push this PR forward, but I think that 0.02 (10 / 3 / 11 samples; see the arithmetic sketched below) is quite a small global value. It deserves a separate PR, I guess, because adding a new language and adjusting the test dataset size are quite different things.
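As an aside, those "10 / 3 / 11 samples" figures match int(n_samples * 0.02) for the classic sklearn toy datasets (506 / 150 / 569 samples); the dataset names below are an assumption inferred from those counts, not stated anywhere in this thread:

  # 2% of each dataset, truncated to a whole number of samples
  for name, n_samples in [("boston", 506), ("iris", 150), ("breast_cancer", 569)]:
      print(name, int(n_samples * 0.02))
  # -> boston 10, iris 3, breast_cancer 11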

> Not sure I completely agree here. E.g., in the case of tree/ensemble models, a larger dataset ensures a higher likelihood of hitting more branches within each individual estimator.

You are absolutely right! Also, larger datasets increase the chances of covering more edge cases. However, I still think that ensuring basic compatibility between an ML model and a language is more important than covering every code branch of a particular model.

@izeigerman izeigerman closed this Dec 19, 2019
@izeigerman izeigerman reopened this Dec 19, 2019
@izeigerman (Member)

Ok, I think it's time to ship it. Thank you, @StrikerRUS!
