
Merge learner widgets #1657

Merged 58 commits into biolab:master on Dec 5, 2016

Conversation

5 participants
@pavlin-policar (Collaborator) commented Oct 12, 2016

A much simpler approach to handling data with continuous or discrete target variables inside learner widgets.

The BaseLearner now defines a problem_type context setting that tells us whether the data represents a classification or regression problem. This setting can be manually set with radio buttons if there is no data (this is required to work properly with the Test&Score widget). If there is data on the input, we disable the radio buttons and assign the appropriate type. The radio buttons are completely opt-in and can be enabled with a flag on the widget, so widgets that only handle one kind of target variable don't have to deal with any extra logic.

Each individual learner can then override the on_type_change method, where they can do whatever updates they need to the UI or anything else.
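A minimal runnable sketch of this idea (`BaseLearner`, `problem_type`, and `on_type_change` come from the description above; the `Data` stand-in, `set_data`, and `TreeLearnerWidget` are hypothetical names, and Orange's actual settings and GUI machinery are not shown):

```python
# Illustrative sketch only; not Orange's actual implementation.
from dataclasses import dataclass

CLASSIFICATION, REGRESSION = 0, 1

@dataclass
class Data:                        # stand-in for an Orange data table
    has_discrete_target: bool

class BaseLearner:
    problem_type = CLASSIFICATION  # would be a context setting in Orange

    def set_data(self, data):
        # With data on the input the type is inferred (and the radio
        # buttons, if enabled, would be disabled); without data, the
        # user's manual radio-button choice is kept.
        if data is not None:
            self.problem_type = (CLASSIFICATION if data.has_discrete_target
                                 else REGRESSION)
        self.on_type_change()

    def on_type_change(self):
        """Hook for subclasses: update the UI for the new problem type."""

class TreeLearnerWidget(BaseLearner):
    def on_type_change(self):
        self.last_seen_type = self.problem_type

w = TreeLearnerWidget()
w.set_data(Data(has_discrete_target=False))
print(w.problem_type)  # 1 (REGRESSION)
```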

I've made the modifications necessary for this approach to work to the TreeLearner widget. Please let me know if this is an okay approach and if I should continue this route and port the other widgets to something like this.

@janezd (Contributor) commented Oct 14, 2016

This was an example of bad communication: tree classifier and regressor were already merged into a single widget, but unfortunately within a PR that had been neglected since July. Sorry about that... The PR was finally merged today; please update to today's master and see the regression tree widget.

@@ -17,6 +17,7 @@ def widget_discovery(discovery):
"Orange.widgets.visualize",
"Orange.widgets.classify",
"Orange.widgets.regression",
"Orange.widgets.learners",

@janezd (Contributor) commented Oct 14, 2016

@BlazZupan, will the merged category (classify + regression) be called "learners"?

  • We have "Unsupervised", so to keep in line with that the new category would have to be "Supervised". It's weird, I don't like it.
  • A better option is "Modelling". The form doesn't fit, but we already have a mess -- "Visualize", "Classify" and "Regression", and not "Visualization", "Classification" and "Regression" (or "Visualize", "Classify" and "Regress"!?).
  • The problem with "Modelling" is that it also covers unsupervised modelling.

Still, I vote for "Model" or "Modelling".

@janezd (Contributor) commented Nov 4, 2016

@BlazZupan agrees with Model as the category name.

@@ -123,6 +128,7 @@ def __init__(self):
self.preprocessors = None
self.outdated_settings = False
self.setup_layout()
# TODO: Is this really necessary?

@janezd (Contributor) commented Oct 14, 2016

Yes. This is a "forced commit": the widget needs to set its output when constructed or upon receiving new data, even if auto-commit is off.

@janezd (Contributor) commented Oct 14, 2016

I don't like the radio buttons. Besides using space, they unnecessarily burden the user. Say the user connects a File widget and a learner widget to Test Learners. From the user's perspective, the learner widget outputs a "modelling algorithm". The algorithm "should be able to handle classification and regression data". Having to specify the kind of data seems "unreasonable", since the algorithm "sees" what kind of data it got.

The philosophy is that the user can decide how the algorithm should run, not that they have to set the algorithm up properly for a particular data type (when the decision is trivial and can be automated).

The need for the radio buttons hence arises not because the user needs to decide something, but because classifier and regressor are implemented as separate classes and Test Learners needs to be given the correct one.

I suggest two solutions.

  • Ideally, learning algorithms should be able to handle both regression and classification, in particular if they share most of the code, such as in knn.
  • Otherwise, we can have a general dispatcher for learning. It would be derived from Fitter, and contain a classifier and regressor, e.g. fitter = FitterDispatcher(tree_classifier, tree_regressor), and then it would self.send('Learner', fitter) to Test and Score. The dispatcher would then delegate the learning to either classifier or regressor, depending on the data type.
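The second option can be sketched roughly like this (the `FitterDispatcher` name comes from the comment above; the toy lambda learners and the `domain.has_discrete_class` check are stand-ins for Orange's actual API):

```python
# Illustrative sketch of the proposed dispatcher; not Orange's Fitter.
from types import SimpleNamespace

class FitterDispatcher:
    """Delegates fitting to a classifier or a regressor by target type."""
    def __init__(self, classifier, regressor):
        self.classifier, self.regressor = classifier, regressor

    def __call__(self, data):
        # Choose the underlying learner from the data's target type.
        learner = (self.classifier if data.domain.has_discrete_class
                   else self.regressor)
        return learner(data)

# Toy stand-ins for tree_classifier / tree_regressor:
fitter = FitterDispatcher(lambda data: "classification model",
                          lambda data: "regression model")
data = SimpleNamespace(domain=SimpleNamespace(has_discrete_class=False))
print(fitter(data))  # regression model
```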
@pavlin-policar (Collaborator, author) commented Oct 14, 2016

I agree, this would be a much better approach.

But merging the learners into a single one still leaves a problem. While some of the learners (KNN, AdaBoost, RF) can easily be merged since their UIs are identical, others take different parameters that need to be set in the UI.

  • Linear regression supports none, L1, L2 and elastic net regularization whereas logistic regression only supports L1 and L2 regularization.
  • In the new tree widgets, the classification has an extra option to stop when the majority reaches a certain percent while the regression one does not have this option.
  • The SVM can output either a C-SVM model for classification or an epsilon-SVM for regression.

Of course it makes sense that regression and classification problems can have slightly different parameters, but if we merge them into single learners, where should these problem-specific parameters appear, or should they disappear altogether? Should the parameters be renamed to something more generic so they apply to either problem (though that might lead to confusion)?

If the learner widget has an input table, then displaying the correct options is simple and requires no input from the user. But if we just connect it to the Test&Score widget, how would the user be able to adjust the parameters specific to, say, regression, since the learner widget has no awareness of the data (which makes sense)? This was the reasoning behind the radio buttons; I haven't been able to come up with a better solution other than removing the specific parameters altogether.

As an aside, I've noticed that the SGDClassifier can be turned into logistic regression and supports all the regularization types that linear regression does, so this may not be a problem at all: the two could then share the same parameters.

@janezd (Contributor) commented Oct 14, 2016

Huh, I see. I'd say that if two methods (classification/regression) require substantially different GUIs that cannot be nicely merged into one, they should remain separate widgets. An extra option is not a problem; even a few options in separate (always visible?) boxes for classification and regression are OK, IMHO.

However, we will always have pure classification or regression methods (CN2?), and we could even keep logistic and linear regression in separate widgets if we had to.

@pavlin-policar (Collaborator, author) commented Oct 14, 2016

I do think that always-visible options could be confusing and not the most user-friendly: when I change an option, I expect a change, but that may not always happen in this case. Maybe help text, tooltips, help icons, ...

Anyway, I'll get on merging the ones that make sense first.

@kernc (Member) commented Oct 14, 2016

Could remain visible but become disabled.

@janezd (Contributor) commented Oct 14, 2016

@kernc, read more carefully. Disabled based upon ... what?

@kernc (Member) commented Oct 14, 2016

Input, when provided.

@janezd (Contributor) commented Oct 14, 2016

I'm wasting time, I guess, but I'll bite. :)

> If the learner widget has an input table, then displaying the correct options is simple and requires no input from the user, but if we just connect them to the Test&Score widget, how would the user be able adjust the specific parameters for instance regression since the learner widget has no awareness of the data (which makes sense)?

@kernc (Member) commented Oct 14, 2016

Right.

@pavlin-policar pavlin-policar force-pushed the pavlin-policar:merge-learners branch 3 times, most recently from 3d48e5c to 5a26464 Oct 15, 2016

@codecov-io commented Oct 18, 2016

Current coverage is 89.03% (diff: 100%)

Merging #1657 into master will increase coverage by 0.06%

@@             master      #1657   diff @@
==========================================
  Files            82         85     +3   
  Lines          8963       9019    +56   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           7974       8030    +56   
  Misses          989        989          
  Partials          0          0          

Powered by Codecov. Last update 11c3f11...058dc41

@pavlin-policar (Collaborator, author) commented Oct 19, 2016

What might be a good solution when there are different options in the UI in dropdown menus that are specific to classification and regression problems?

For example, AdaBoost is identical in all aspects except the algorithm and should definitely be merged, but the learner is completely unaware of the data (until it gets some, though that doesn't matter, since it has to work with Test&Score).

I'm not sure how I would go about solving this.

@janezd (Contributor) commented Oct 19, 2016

I would have boxes "Discrete target (Classification)" and "Numeric target (Regression)", and put the specific controls into them. (I may change my mind about the box names, but at the moment I like them. :)

While you're changing the interface, could you also tidy it up a bit? Make the spin boxes the same width; see the classification tree widget for an example. I guess the two combo boxes should also have the same width (but not the same as the (narrower) spins).

@pavlin-policar pavlin-policar force-pushed the pavlin-policar:merge-learners branch from e0607cf to f794d43 Oct 23, 2016

@pavlin-policar pavlin-policar force-pushed the pavlin-policar:merge-learners branch from f794d43 to 02e1eea Nov 3, 2016

@@ -68,7 +73,7 @@ def preprocess(self, data):
"""
Apply the `preprocessors` to the data.
"""
for pp in self.preprocessors:
for pp in set(self.preprocessors) | set(type(self).preprocessors):

@janezd (Contributor) commented Nov 4, 2016

This changes the order of preprocessors. I guess you shouldn't do this.
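For illustration, an order-preserving alternative to the set union above (a sketch, not the fix actually applied in the PR; it assumes Python dicts that keep insertion order, i.e. 3.7+):

```python
def combined_preprocessors(instance_pps, class_pps):
    # dict keys keep insertion order, so duplicates are dropped while
    # the original ordering of the preprocessors is preserved.
    return list(dict.fromkeys(list(instance_pps) + list(class_pps)))

print(combined_preprocessors(["normalize", "impute"],
                             ["impute", "discretize"]))
# ['normalize', 'impute', 'discretize']
```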

return (("Number of neighbours", self.n_neighbors),
("Metric", self.metrics[self.metric_index].capitalize()),
("Weight", self.weights[self.weight_type].capitalize()))
class OWKNNLearner(OWKNNBase):

@janezd (Contributor) commented Nov 4, 2016

OWKNNLearner = OWKNNBase

@pavlin-policar pavlin-policar force-pushed the pavlin-policar:merge-learners branch 2 times, most recently from 10a2205 to 201b791 Nov 20, 2016

'Regression learner should override default preprocessors '
'when specified in constructor')

def test_use_deafult_preprocessors_property(self):

@janezd (Contributor) commented Nov 25, 2016

Typo.

@pavlin-policar (Collaborator, author) commented Nov 25, 2016

@janezd After looking over what we discussed more carefully, there should actually be no need to set the domain on the learner inside the fitter __call__ method, since each individual learner sets its own in its own __call__ method. It's actually the fitter instance that remains without the domain, which should be fine: it doesn't need it and shouldn't need it.

if self.__handles_any_input_type():
# Only add a classification section if the method is overridden
if type(self).add_classification_layout is not \
OWBaseLearner.add_classification_layout:

@janezd (Contributor) commented Nov 26, 2016

@astaric, we're seeing more and more of this checking whether a class overrides an inherited method. (Not that it's nice, but it's useful.) Is it time to implement something like

def overrides(method, base):
    return getattr(type(method.__self__), method.__name__) != getattr(base, method.__name__)

to be used like (here above) if overrides(self.add_classification_layout, OWBaseLearner)?
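A self-contained check that the proposed helper behaves as intended (the widget class names below are made up for the demonstration; only `OWBaseLearner` and `add_classification_layout` appear in the PR):

```python
def overrides(method, base):
    # True when the bound method's own class supplies an implementation
    # different from the one inherited from `base`.
    return getattr(type(method.__self__), method.__name__) != \
        getattr(base, method.__name__)

class OWBaseLearner:
    def add_classification_layout(self):
        pass

class PlainWidget(OWBaseLearner):          # inherits the default
    pass

class CustomWidget(OWBaseLearner):         # provides its own layout
    def add_classification_layout(self):
        pass

print(overrides(PlainWidget().add_classification_layout, OWBaseLearner))
# False
print(overrides(CustomWidget().add_classification_layout, OWBaseLearner))
# True
```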

@janezd (Contributor) commented Nov 26, 2016

I see, of course.

It looks safe to merge, unless @kernc has any objections. Just one thing: can you remove the change to ada_boost, so the PR doesn't touch more files than necessary?

@janezd approved these changes Nov 26, 2016

@pavlin-policar (Collaborator, author) commented Nov 26, 2016

Sure thing.

@pavlin-policar pavlin-policar force-pushed the pavlin-policar:merge-learners branch from 7479bf4 to 058dc41 Dec 2, 2016

@kernc (Member) left a comment

Looks good. Sorry for being so pedantic. Best start with a tidy core this time around; it spoils soon enough anyway.


def __get_kwargs(self, kwargs, problem_type):
params = self._get_learner_kwargs(self.__fits__[problem_type])
return {k: kwargs[k] for k in params & set(kwargs.keys())}

@kernc (Member) commented Dec 2, 2016

You can elegantly avoid the static method below if you use a more verbose variable name:

def __kwargs(self, problem_type):
    learner_kwargs = set(self.__fits__[problem_type].__init__.__code__.co_varnames[1:])
    return {k: v for k, v in self.kwargs.items() if k in learner_kwargs}

You also needn't pass kwargs separately.
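A standalone sketch of this kind of filtering, using `inspect.signature` rather than `co_varnames` (which also lists local variables, not just parameters); `TreeRegressor` and `filter_kwargs` are hypothetical names, not the PR's actual code:

```python
import inspect

class TreeRegressor:                       # hypothetical learner class
    def __init__(self, max_depth=None, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split

def filter_kwargs(learner_cls, kwargs):
    # Keep only the keyword arguments that learner_cls.__init__
    # actually accepts, so unrelated kwargs are silently dropped.
    params = set(inspect.signature(learner_cls.__init__).parameters) - {"self"}
    return {k: v for k, v in kwargs.items() if k in params}

print(filter_kwargs(TreeRegressor, {"max_depth": 3, "C": 1.0}))
# {'max_depth': 3}
```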

"""Get the learner for a given problem type."""
# Prevent trying to access the learner when problem type is None
if problem_type not in self.__fits__:
raise AttributeError(

@kernc (Member) commented Dec 2, 2016

Is this more an AttributeError or a TypeError?

attribute with a valid value."""
def __new__(mcs, name, bases, kwargs):
# Check that a fitter implementation defines a valid `__fits__`
if kwargs.get('name', False):

@kernc (Member) commented Dec 2, 2016

I still don't think checking the name attribute is adequate. e6a14f0 in #1771 makes the redundant short names on Learner types obsolete. Can you instead check that the class's name equals 'Fitter', or that none of the bases is named 'Fitter', or similar?

Also, kwargs is actually a dict of class attributes, so attrs or similar might be more appropriate.
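One possible shape for the suggested check, detecting the base class through the metaclass rather than a `name` attribute (illustrative only; the real Fitter metaclass may well differ):

```python
class FitterMeta(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # The abstract base (no FitterMeta-made bases yet) is exempt;
        # every concrete subclass must define a valid `__fits__` dict.
        is_base = not any(isinstance(b, FitterMeta) for b in bases)
        if not is_base and not isinstance(getattr(cls, "__fits__", None), dict):
            raise TypeError("%s must define a `__fits__` dict" % name)
        return cls

class Fitter(metaclass=FitterMeta):
    pass

class TreeFitter(Fitter):
    __fits__ = {"classification": object, "regression": object}

print(sorted(TreeFitter.__fits__))  # ['classification', 'regression']
```

Defining a `Fitter` subclass without `__fits__` would then raise `TypeError` at class-creation time.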

@janezd janezd merged commit 0bb5de1 into biolab:master Dec 5, 2016

5 checks passed

  • codecov/patch: 100% of diff hit (target 95.00%)
  • codecov/project: 89.03% (+0.06%) compared to 11c3f11
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed
  • licence/cla: Contributor License Agreement is signed.

@kernc kernc referenced this pull request Jan 3, 2017

Merged

Merge OWRandomForest #1817

2 of 3 tasks complete

@astaric astaric modified the milestones: 3.3.10, future Jan 4, 2017

@astaric astaric modified the milestone: future Apr 19, 2017

@pavlin-policar pavlin-policar deleted the pavlin-policar:merge-learners branch Oct 22, 2017
