replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 #644

vickyvishal · 2021-03-15T19:10:17Z

Replacing BaselineClassifier with DummyClassifier from Sklearn.
Relacing BaselineRegressor with DummyRegressor from Sklearn.
Created new operator with make_operator with schemas defined in BaselineClassifier and BaselineRegressor

Contributed by students of SRH University, Heidelberg.
https://github.com/tauseefhashmi
https://github.com/frankcode101
https://github.com/tanmaygaikwad
https://github.com/RajathReddy9
https://github.com/vickyvishal/

vickyvishal · 2021-03-15T19:20:35Z

@shinnar The pre-commit issue still persists. How should I run pre-commit run -a? Is it a git command?

shinnar · 2021-03-15T19:38:10Z

it looks like the previous pre-commit issues where indeed resolved by the recent fix. The remaining pre-commit issues are introduced by this PR.
as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md:

We have a pre-commit hook setup to help ensure that your code is properly formatted and passes various static checks. We highly recommend that you enable it, or at least run the check before submitting a PR. To do so, install the pre-commit python package (this is done automatically if you pip install lale[dev]). Run pre-commit install in your lale repository to enable pre-commit checking, or pre-commit run --all-files to just run the checks once.

shinnar

This PR is shaping up nicely.
It looks like DummyRegressor and DummyClassifier both have hyperparameter arguments that should be included in their schemas.

https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyRegressor.html
https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html

For the random state parameter, I would suggest copying from LogisticRegression:
https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L240-L254

note that for the constant parameter, the schema should have a "side constraint". If you have trouble with that part, I am happy to help

shinnar · 2021-03-15T19:41:19Z

lale/lib/sklearn/dummy_classifier.py

+_hyperparams_schema = {
+    "allOf": [
+        {
+            "description": "This first object lists all constructor arguments with their types, but omits constraints for conditional hyperparameters.",


please add a link to the underlying sklearn docs here (an example doing this is https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L397)

shinnar · 2021-03-15T19:41:33Z

lale/lib/sklearn/dummy_regressor.py

+_hyperparams_schema = {
+    "allOf": [
+        {
+            "description": "This first object lists all constructor arguments with their types, but omits constraints for conditional hyperparameters.",


please add a link to the underlying sklearn docs here (an example doing this is https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L397)

shinnar · 2021-03-15T19:43:29Z

lale/lib/sklearn/__init__.py

@@ -25,6 +25,7 @@
 * lale.lib.sklearn. `AdaBoostClassifier`_
 * lale.lib.sklearn. `BaggingClassifier`_
 * lale.lib.sklearn. `DecisionTreeClassifier`_
+* lale.lib.sklearn. `DummyRegressor`_


I think this was intended to be DummyClassifier

vickyvishal · 2021-03-16T21:08:39Z

it looks like the previous pre-commit issues where indeed resolved by the recent fix. The remaining pre-commit issues are introduced by this PR.
as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md:

We have a pre-commit hook setup to help ensure that your code is properly formatted and passes various static checks. We highly recommend that you enable it, or at least run the check before submitting a PR. To do so, install the pre-commit python package (this is done automatically if you pip install lale[dev]). Run pre-commit install in your lale repository to enable pre-commit checking, or pre-commit run --all-files to just run the checks once.

Thanks for the answer. I ran pre-commit run --all-files to check and it is failing for pyright check. On checking .pre-commit-config.yaml it looks like it checks for node environment. Is it necessary to run this check? Because we can't commit and push our changes

shinnar · 2021-03-16T21:59:42Z

pyright needs node to be installed. If that is not feasible, you can commit without running pyright. Note that we can not accept the PR until it passes pyright, but it is fine to optimistically commit, and then, if the static checking fails in github actions, fix the reported problem and push a fix. Note that if you have installed the pre-commit hooks, you can skip a given test for a commit by running:

SKIP=pyright git commit ...

vickyvishal · 2021-03-17T18:54:21Z

I have nodejs installed in my system but it still fails the pyright check. So I skipped the pyright check, and now I am facing an issue in the static check in the CI/CD pipeline. Any suggestion?

shinnar · 2021-03-17T19:14:41Z

apologies, but we still seem to be having some CI issues. I think we figured out a fix. once it is working, I will ping you to rebase onto it. apologies again.

* Score method for individual operators. * Score for pipelines. * update pre-commit checkers to latest versions. Fix some problems they newly flag * Trying to fix the pre-commit. * Reverting back isort version to 5.7.0. * Trying to change github action keys. Co-authored-by: Avi Shinnar <shinnar@us.ibm.com>

…umbers

shinnar

This looks good. One last thing is testing.
please add appropriate lines to
https://github.com/IBM/lale/blob/master/test/test_core_classifiers.py#L131
and
https://github.com/IBM/lale/blob/master/test/test_core_regressors.py#L91

also, please rebase to the latest on master, which will hopefully fix the CI problem.

shinnar · 2021-03-17T19:44:48Z

lale/lib/sklearn/dummy_regressor.py

+                "quantile": {
+                    "description": "The quantile to predict using the “quantile” strategy. A quantile of 0.5 corresponds to the median, while 0.0 to the minimum and 1.0 to the maximum.",
+                    "type": "float",
+                },


this should have a default and range constraints. I would suggest something like:

"quantile": { "description": "The quantile to predict using the “quantile” strategy. A quantile of 0.5 corresponds to the median, while 0.0 to the minimum and 1.0 to the maximum.", "anyOf": [{ "enum": [None] }, { "type": "float", "minimum": 0.0, "maximum": 1.0 }], "default": None, }

@vickyvishal
sorry, my fault. I was not careful when I expanded the schema:
that should be "type": "number", not "type": "float"

shinnar · 2021-03-18T11:13:56Z

lale/lib/sklearn/dummy_classifier.py

+                "constant": {
+                    "description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
+                    "side constraint": "",
+                    "type": ["int", "str"],


I think this should be "type": ["integer", "string"]

it looks like the PR is failing because of this type. it should change to:
"constant": {
"description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
"anyOf": [
{"type": ["integer", "string"]},
{"enum": [None]},
"default": None
]}

shinnar · 2021-03-18T11:19:43Z

This is looking great. once it passes the tests, we are almost ready to merge.
The last thing needed --- as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md,
we need everyone involved in this PR to sign a "Developer's Certificate of Origin", available here:
https://github.com/IBM/lale/blob/master/DCO1.1.txt
please send them to hirzel@us.ibm.com

Thanks for your contribution!

codecov · 2021-03-18T11:47:00Z

Codecov Report

Merging #644 (ff9d366) into master (7f53b63) will increase coverage by 0.05%.
The diff coverage is 86.17%.

@@            Coverage Diff             @@
##           master     #644      +/-   ##
==========================================
+ Coverage   81.23%   81.29%   +0.05%     
==========================================
  Files         308      312       +4     
  Lines       15816    15909      +93     
==========================================
+ Hits        12848    12933      +85     
- Misses       2968     2976       +8

Impacted Files	Coverage Δ
lale/lib/lale/baseline_classifier.py	`90.00% <25.00%> (-10.00%)`	⬇️
lale/lib/lale/baseline_regressor.py	`86.95% <25.00%> (-13.05%)`	⬇️
lale/operators.py	`82.98% <76.66%> (-0.05%)`	⬇️
lale/lib/lightgbm/lgbm_classifier.py	`89.18% <100.00%> (+1.31%)`	⬆️
lale/lib/lightgbm/lgbm_regressor.py	`85.71% <100.00%> (+1.84%)`	⬆️
lale/lib/sklearn/__init__.py	`100.00% <100.00%> (ø)`
lale/lib/sklearn/_common_schemas.py	`100.00% <100.00%> (ø)`
lale/lib/sklearn/ada_boost_classifier.py	`89.47% <100.00%> (+0.58%)`	⬆️
lale/lib/sklearn/ada_boost_regressor.py	`90.00% <100.00%> (+0.71%)`	⬆️
lale/lib/sklearn/dummy_classifier.py	`100.00% <100.00%> (ø)`
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f53b63...ff9d366. Read the comment docs.

shinnar · 2021-03-18T12:41:00Z

lale/lib/sklearn/dummy_classifier.py

+                },
+                "constant": {
+                    "description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
+                    "side constraint": "",


this line should be removed

Previously it was decided to have a "side constraint" for "constant" schema. Should I remove it? If there is a property to add for "side constraint" I can add it now.

sorry for the confusion. a "side constraint", in our terminology, is something like this:
https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L320-L333
which adds additional constraints that relate more than one hyperparameter.
However, I don't think it is needed for this PR.
the "side constraint" line above should definitely be removed, as it is probably what is currently breaking the tests. I would guess that is due from the space in the key, although I did not investigate that closely.

Thanks for the explanation. Now it's clear. I will remove that key and push again

tanmaygaikwad · 2021-03-18T15:06:08Z

This is looking great. once it passes the tests, we are almost ready to merge.
The last thing needed --- as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md,
we need everyone involved in this PR to sign a "Developer's Certificate of Origin", available here:
https://github.com/IBM/lale/blob/master/DCO1.1.txt
please send them to hirzel@us.ibm.com

Thanks for your contribution!

@hirzel , I have sent you "Developer's Certificate of Origin" over email. Thanks

shinnar · 2021-03-18T19:48:50Z

@vickyvishal I think that something like #644 (comment) will fix the current build problem

shinnar

I checked out the PR and found the two remaining problems, which I added as comments. With those changes made, the dummy regressor test passes locally, so this should be it.

by the way, you can try running

python -m unittest test.test_core_regressors

to run the test locally

shinnar · 2021-03-18T22:51:03Z

lale/lib/sklearn/dummy_regressor.py

+            "relevantToOptimizer": [],
+            "additionalProperties": False,
+            "required": ["strategy", "quantile"],
+            "property": {


@vickyvishal this should be "properties", not "property".

shinnar · 2021-03-18T22:52:15Z

lale/lib/sklearn/dummy_regressor.py

+                    "enum": ["mean", "median", "quantile", "constant"],
+                    "default": "mean",
+                },
+                "random_state": {


the regressor does not have a random state argument. please delete. instead, please add

"constant": { "description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.", "anyOf": [ {"type": ["integer", "string"]}, {"enum": [None]}, {"default": None}, ], },

shinnar · 2021-03-19T12:54:25Z

Thank you for your contribution!

vickyvishal added 2 commits March 15, 2021 20:09

create operator for dummy classifier and regressor

cd5eb39

Merge branch 'master' of https://github.com/IBM/lale

dadfa73

vickyvishal changed the title ~~Merge branch 'master' of https://github.com/IBM/lale~~ replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 Mar 15, 2021

vickyvishal mentioned this pull request Mar 15, 2021

Srh group8 #638

Closed

shinnar reviewed Mar 15, 2021

View reviewed changes

changes

1e5a340

kiran-kate and others added 2 commits March 17, 2021 15:16

loosen some error checks in test aif to account for the vagaries of n…

ce047c8

…umbers

shinnar reviewed Mar 17, 2021

View reviewed changes

hirzel and others added 6 commits March 17, 2021 20:01

Add VarianceThreshold operator from sklearn. (IBM#647)

b8ed10b

create operator for dummy classifier and regressor

0c9e8c3

changes

beca042

test core changes

363bac7

Merge branch 'master' of https://github.com/vickyvishal/lale

9621cc1

quantile schema changes

53e3ba7

shinnar reviewed Mar 18, 2021

View reviewed changes

vickyvishal added 2 commits March 18, 2021 12:15

changes

6681b0a

changes

afe04f1

shinnar reviewed Mar 18, 2021

View reviewed changes

dummy classifier schema changes

956e0c2

vickyvishal added 5 commits March 18, 2021 21:22

dummy classifier's constant schema changes

055107b

indentation changes

aee0f65

indentation changes

40917cd

indentation changes

1e5a2ca

type changes

93d6208

shinnar reviewed Mar 18, 2021

View reviewed changes

vickyvishal added 3 commits March 19, 2021 11:27

changes

914520e

changes

88fdee9

changes

ff9d366

shinnar merged commit 79f164d into IBM:master Mar 19, 2021

shinnar mentioned this pull request Mar 19, 2021

replace BaselineClassifier/Regressor by DummyClassifier/Regressor #618

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 #644

replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 #644

vickyvishal commented Mar 15, 2021 •

edited

vickyvishal commented Mar 15, 2021

shinnar commented Mar 15, 2021

shinnar left a comment

shinnar Mar 15, 2021

shinnar Mar 15, 2021

shinnar Mar 15, 2021

vickyvishal commented Mar 16, 2021

shinnar commented Mar 16, 2021 •

edited

vickyvishal commented Mar 17, 2021 •

edited

shinnar commented Mar 17, 2021

shinnar left a comment

shinnar Mar 17, 2021

shinnar Mar 18, 2021 •

edited

shinnar Mar 18, 2021

shinnar Mar 18, 2021 •

edited

shinnar commented Mar 18, 2021

codecov bot commented Mar 18, 2021 •

edited

shinnar Mar 18, 2021

vickyvishal Mar 18, 2021

shinnar Mar 18, 2021 •

edited

vickyvishal Mar 18, 2021

tanmaygaikwad commented Mar 18, 2021

shinnar commented Mar 18, 2021

shinnar left a comment

shinnar Mar 18, 2021

shinnar Mar 18, 2021

shinnar commented Mar 19, 2021

replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 #644

replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 #644

Conversation

vickyvishal commented Mar 15, 2021 • edited

vickyvishal commented Mar 15, 2021

shinnar commented Mar 15, 2021

shinnar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vickyvishal commented Mar 16, 2021

shinnar commented Mar 16, 2021 • edited

vickyvishal commented Mar 17, 2021 • edited

shinnar commented Mar 17, 2021

shinnar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shinnar Mar 18, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shinnar Mar 18, 2021 • edited

Choose a reason for hiding this comment

shinnar commented Mar 18, 2021

codecov bot commented Mar 18, 2021 • edited

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shinnar Mar 18, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tanmaygaikwad commented Mar 18, 2021

shinnar commented Mar 18, 2021

shinnar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shinnar commented Mar 19, 2021

vickyvishal commented Mar 15, 2021 •

edited

shinnar commented Mar 16, 2021 •

edited

vickyvishal commented Mar 17, 2021 •

edited

shinnar Mar 18, 2021 •

edited

shinnar Mar 18, 2021 •

edited

codecov bot commented Mar 18, 2021 •

edited

shinnar Mar 18, 2021 •

edited