Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 #644

Merged
merged 22 commits into from Mar 19, 2021

Conversation

vickyvishal
Copy link
Contributor

@vickyvishal vickyvishal commented Mar 15, 2021

Replacing BaselineClassifier with DummyClassifier from Sklearn.
Relacing BaselineRegressor with DummyRegressor from Sklearn.
Created new operator with make_operator with schemas defined in BaselineClassifier and BaselineRegressor

Contributed by students of SRH University, Heidelberg.
https://github.com/tauseefhashmi
https://github.com/frankcode101
https://github.com/tanmaygaikwad
https://github.com/RajathReddy9
https://github.com/vickyvishal/

@vickyvishal vickyvishal changed the title Merge branch 'master' of https://github.com/IBM/lale replace BaselineClassifier/Regressor by DummyClassifier/Regressor. Issue #618 Mar 15, 2021
@vickyvishal
Copy link
Contributor Author

@shinnar The pre-commit issue still persists. How should I run pre-commit run -a? Is it a git command?

@vickyvishal vickyvishal mentioned this pull request Mar 15, 2021
@shinnar
Copy link
Member

shinnar commented Mar 15, 2021

it looks like the previous pre-commit issues where indeed resolved by the recent fix. The remaining pre-commit issues are introduced by this PR.
as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md:

We have a pre-commit hook setup to help ensure that your code is properly formatted and passes various static checks. We highly recommend that you enable it, or at least run the check before submitting a PR. To do so, install the pre-commit python package (this is done automatically if you pip install lale[dev]). Run pre-commit install in your lale repository to enable pre-commit checking, or pre-commit run --all-files to just run the checks once.

Copy link
Member

@shinnar shinnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is shaping up nicely.
It looks like DummyRegressor and DummyClassifier both have hyperparameter arguments that should be included in their schemas.

https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyRegressor.html
https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html

For the random state parameter, I would suggest copying from LogisticRegression:
https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L240-L254

note that for the constant parameter, the schema should have a "side constraint". If you have trouble with that part, I am happy to help

_hyperparams_schema = {
"allOf": [
{
"description": "This first object lists all constructor arguments with their types, but omits constraints for conditional hyperparameters.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a link to the underlying sklearn docs here (an example doing this is https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L397)

_hyperparams_schema = {
"allOf": [
{
"description": "This first object lists all constructor arguments with their types, but omits constraints for conditional hyperparameters.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a link to the underlying sklearn docs here (an example doing this is https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L397)

@@ -25,6 +25,7 @@
* lale.lib.sklearn. `AdaBoostClassifier`_
* lale.lib.sklearn. `BaggingClassifier`_
* lale.lib.sklearn. `DecisionTreeClassifier`_
* lale.lib.sklearn. `DummyRegressor`_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was intended to be DummyClassifier

@vickyvishal
Copy link
Contributor Author

it looks like the previous pre-commit issues where indeed resolved by the recent fix. The remaining pre-commit issues are introduced by this PR.
as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md:

We have a pre-commit hook setup to help ensure that your code is properly formatted and passes various static checks. We highly recommend that you enable it, or at least run the check before submitting a PR. To do so, install the pre-commit python package (this is done automatically if you pip install lale[dev]). Run pre-commit install in your lale repository to enable pre-commit checking, or pre-commit run --all-files to just run the checks once.

Thanks for the answer. I ran pre-commit run --all-files to check and it is failing for pyright check. On checking .pre-commit-config.yaml it looks like it checks for node environment. Is it necessary to run this check? Because we can't commit and push our changes

@shinnar
Copy link
Member

shinnar commented Mar 16, 2021

pyright needs node to be installed. If that is not feasible, you can commit without running pyright. Note that we can not accept the PR until it passes pyright, but it is fine to optimistically commit, and then, if the static checking fails in github actions, fix the reported problem and push a fix. Note that if you have installed the pre-commit hooks, you can skip a given test for a commit by running:

SKIP=pyright git commit ...

@vickyvishal
Copy link
Contributor Author

vickyvishal commented Mar 17, 2021

I have nodejs installed in my system but it still fails the pyright check. So I skipped the pyright check, and now I am facing an issue in the static check in the CI/CD pipeline. Any suggestion?

@shinnar
Copy link
Member

shinnar commented Mar 17, 2021

apologies, but we still seem to be having some CI issues. I think we figured out a fix. once it is working, I will ping you to rebase onto it. apologies again.

kiran-kate and others added 2 commits March 17, 2021 15:16
* Score method for individual operators.

* Score for pipelines.

* update pre-commit checkers to latest versions.  Fix some problems they newly flag

* Trying to fix the pre-commit.

* Reverting back isort version to 5.7.0.

* Trying to change github action keys.

Co-authored-by: Avi Shinnar <shinnar@us.ibm.com>
Copy link
Member

@shinnar shinnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. One last thing is testing.
please add appropriate lines to
https://github.com/IBM/lale/blob/master/test/test_core_classifiers.py#L131
and
https://github.com/IBM/lale/blob/master/test/test_core_regressors.py#L91

also, please rebase to the latest on master, which will hopefully fix the CI problem.

Comment on lines 55 to 58
"quantile": {
"description": "The quantile to predict using the “quantile” strategy. A quantile of 0.5 corresponds to the median, while 0.0 to the minimum and 1.0 to the maximum.",
"type": "float",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have a default and range constraints. I would suggest something like:

"quantile": {
"description": "The quantile to predict using the “quantile” strategy. A quantile of 0.5 corresponds to the median, while 0.0 to the minimum and 1.0 to the maximum.",
"anyOf": [{
                   "enum": [None]
},
{
                    "type": "float",
		    "minimum": 0.0,
		    "maximum": 1.0
}],
"default": None,
}

Copy link
Member

@shinnar shinnar Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vickyvishal
sorry, my fault. I was not careful when I expanded the schema:
that should be "type": "number", not "type": "float"

"constant": {
"description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
"side constraint": "",
"type": ["int", "str"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be "type": ["integer", "string"]

Copy link
Member

@shinnar shinnar Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the PR is failing because of this type. it should change to:
"constant": {
"description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
"anyOf": [
{"type": ["integer", "string"]},
{"enum": [None]},
"default": None
]}

@shinnar
Copy link
Member

shinnar commented Mar 18, 2021

This is looking great. once it passes the tests, we are almost ready to merge.
The last thing needed --- as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md,
we need everyone involved in this PR to sign a "Developer's Certificate of Origin", available here:
https://github.com/IBM/lale/blob/master/DCO1.1.txt
please send them to hirzel@us.ibm.com

Thanks for your contribution!

@codecov
Copy link

codecov bot commented Mar 18, 2021

Codecov Report

Merging #644 (ff9d366) into master (7f53b63) will increase coverage by 0.05%.
The diff coverage is 86.17%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #644      +/-   ##
==========================================
+ Coverage   81.23%   81.29%   +0.05%     
==========================================
  Files         308      312       +4     
  Lines       15816    15909      +93     
==========================================
+ Hits        12848    12933      +85     
- Misses       2968     2976       +8     
Impacted Files Coverage Δ
lale/lib/lale/baseline_classifier.py 90.00% <25.00%> (-10.00%) ⬇️
lale/lib/lale/baseline_regressor.py 86.95% <25.00%> (-13.05%) ⬇️
lale/operators.py 82.98% <76.66%> (-0.05%) ⬇️
lale/lib/lightgbm/lgbm_classifier.py 89.18% <100.00%> (+1.31%) ⬆️
lale/lib/lightgbm/lgbm_regressor.py 85.71% <100.00%> (+1.84%) ⬆️
lale/lib/sklearn/__init__.py 100.00% <100.00%> (ø)
lale/lib/sklearn/_common_schemas.py 100.00% <100.00%> (ø)
lale/lib/sklearn/ada_boost_classifier.py 89.47% <100.00%> (+0.58%) ⬆️
lale/lib/sklearn/ada_boost_regressor.py 90.00% <100.00%> (+0.71%) ⬆️
lale/lib/sklearn/dummy_classifier.py 100.00% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f53b63...ff9d366. Read the comment docs.

},
"constant": {
"description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
"side constraint": "",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously it was decided to have a "side constraint" for "constant" schema. Should I remove it? If there is a property to add for "side constraint" I can add it now.

Copy link
Member

@shinnar shinnar Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the confusion. a "side constraint", in our terminology, is something like this:
https://github.com/IBM/lale/blob/master/lale/lib/sklearn/logistic_regression.py#L320-L333
which adds additional constraints that relate more than one hyperparameter.
However, I don't think it is needed for this PR.
the "side constraint" line above should definitely be removed, as it is probably what is currently breaking the tests. I would guess that is due from the space in the key, although I did not investigate that closely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. Now it's clear. I will remove that key and push again

@tanmaygaikwad
Copy link

This is looking great. once it passes the tests, we are almost ready to merge.
The last thing needed --- as per https://github.com/IBM/lale/blob/master/CONTRIBUTING.md,
we need everyone involved in this PR to sign a "Developer's Certificate of Origin", available here:
https://github.com/IBM/lale/blob/master/DCO1.1.txt
please send them to hirzel@us.ibm.com

Thanks for your contribution!

@hirzel , I have sent you "Developer's Certificate of Origin" over email. Thanks

@shinnar
Copy link
Member

shinnar commented Mar 18, 2021

@vickyvishal I think that something like #644 (comment) will fix the current build problem

Copy link
Member

@shinnar shinnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out the PR and found the two remaining problems, which I added as comments. With those changes made, the dummy regressor test passes locally, so this should be it.

by the way, you can try running

python -m unittest test.test_core_regressors

to run the test locally

"relevantToOptimizer": [],
"additionalProperties": False,
"required": ["strategy", "quantile"],
"property": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vickyvishal this should be "properties", not "property".

"enum": ["mean", "median", "quantile", "constant"],
"default": "mean",
},
"random_state": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the regressor does not have a random state argument. please delete. instead, please add

"constant": {
                    "description": "The explicit constant as predicted by the “constant” strategy. This parameter is useful only for the “constant” strategy.",
                    "anyOf": [
                        {"type": ["integer", "string"]},
                        {"enum": [None]},
                        {"default": None},
                    ],
                },

@shinnar shinnar merged commit 79f164d into IBM:master Mar 19, 2021
@shinnar
Copy link
Member

shinnar commented Mar 19, 2021

Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants