Merge learner widgets #1657
Conversation
This was an example of bad communication: the tree classifier and regressor were already merged into a single widget, but unfortunately within a PR that had been neglected since July. Sorry about that... The PR was finally merged today; please update to today's master and see the regression tree widget.
janezd reviewed on Oct 14, 2016
@@ -17,6 +17,7 @@ def widget_discovery(discovery):
    "Orange.widgets.visualize",
    "Orange.widgets.classify",
    "Orange.widgets.regression",
    "Orange.widgets.learners",
janezd (Contributor) commented on Oct 14, 2016
@BlazZupan, will the merged category (classify + regression) be called "learners"?
- We have "Unsupervised", so to keep in line with that the new category would have to be "Supervised". It's weird, I don't like it.
- A better option is "Modelling". The form doesn't fit, but we already have a mess -- "Visualize", "Classify" and "Regression", and not "Visualization", "Classification" and "Regression" (or "Visualize", "Classify" and "Regress"!?).
- The problem with "Modelling" is that it also covers unsupervised modelling.
Still, I vote for "Model" or "Modelling".
janezd reviewed on Oct 14, 2016
@@ -123,6 +128,7 @@ def __init__(self):
    self.preprocessors = None
    self.outdated_settings = False
    self.setup_layout()
    # TODO: Is this really necessary?
janezd (Contributor) commented on Oct 14, 2016
Yes. This is a "forced commit": the widget needs to set its output when constructed or upon receiving new data, even if auto-commit is off.
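For readers unfamiliar with the pattern, here is a minimal sketch of a forced commit in an Orange widget, assuming the usual `gui.auto_commit` setup; the widget name and settings below are made up for illustration, not code from this PR:

from Orange.widgets import gui, widget

class OWExampleLearner(widget.OWWidget):
    # Hypothetical widget illustrating the "forced commit" pattern.
    name = "Example Learner"
    auto_apply = True  # bound to the auto-commit checkbox

    def __init__(self):
        super().__init__()
        # gui.auto_commit wraps commit() so it runs only when auto_apply is
        # checked, and keeps the original available as unconditional_commit()
        gui.auto_commit(self.controlArea, self, "auto_apply", "Apply")
        # forced commit: send an initial output even if auto-apply is off
        self.unconditional_commit()

    def set_data(self, data):
        self.data = data
        # forced commit again whenever new data arrives
        self.unconditional_commit()

    def commit(self):
        pass  # build and send the widget's outputs here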
I don't like the radio buttons. Besides using space, they unnecessarily burden the user. Say the user connects a File widget and a learner widget to Test Learners. From the user's perspective, the learner widget outputs a "modelling algorithm". The algorithm "should be able to handle classification and regression data". Having to specify the kind of data seems "unreasonable", since the algorithm "sees" what kind of data it got. The philosophy is that the user can decide how the algorithm should run, not that they have to set the algorithm up properly for a particular data type (when the decision is trivial and can be automated). The need for the radio buttons hence doesn't arise because the user needs to decide something, but just because classifier and regressor are implemented as separate classes and Test Learners needs to be given the correct class. I suggest two solutions.
I agree, this would be a much better approach. But merging the learners into a single one still leaves a problem. While some of the learners (KNN, AdaBoost, RF) can easily be merged since their UIs are identical, others take different parameters that need to be set in the UI.

Of course it makes sense that regression and classification problems can have slightly different parameters, but if we merge them into single learners, where should these problem-specific parameters appear, or should they disappear altogether? Should the parameters be renamed to something more generic so they refer to either problem (but that might lead to confusion)? If the learner widget has an input table, then displaying the correct options is simple and requires no input from the user; but if we just connect them to the Test&Score widget, how would the user be able to adjust the parameters specific to, say, regression, since the learner widget has no awareness of the data (which makes sense)? This was the reasoning behind the radio buttons, since I haven't been able to come up with a better solution other than removing the specific parameters altogether. As an aside, I've noticed that the SGDClassifier can be turned into logistic regression and does support all the types of regularization that linear regression does, so this may not be a problem at all, since then they could have the same parameters.
Huh, I see. I'd say that if two methods (classification/regression) require substantially different GUIs that cannot be nicely merged into one, they should remain separate widgets. An extra option is not a problem; even a few options in separate (always visible?) boxes for classification and regression are OK, IMHO. However, we will always have pure classification or regression methods (CN2?), and we could even keep logistic and linear regression in separate widgets if we had to.
I do think that options that are always visible could be confusing and not the most user-friendly, since when I change an option I expect a change, but that may not always happen in this case. Maybe help text, tooltips, help icons, ... Anyway, I'll get on merging the ones that make sense first.
They could remain visible but become disabled.
@kernc, read more carefully. Disabled based upon ... what?
Input, when provided.
I'm wasting time, I guess, but I'll bite. :)
Right.
pavlin-policar force-pushed the pavlin-policar:merge-learners branch 3 times, most recently from 3d48e5c to 5a26464 on Oct 15, 2016
codecov-io commented on Oct 18, 2016
Current coverage is 89.03% (diff: 100%)

@@             master     #1657     diff @@
===========================================
  Files            82        85       +3
  Lines          8963      9019      +56
  Methods           0         0
  Messages          0         0
  Branches          0         0
===========================================
+ Hits           7974      8030      +56
  Misses          989       989
  Partials          0         0
What might be a good solution when there are options in UI dropdown menus that are specific to classification and regression problems? For example, AdaBoost is identical in all aspects except the algorithm and should definitely be merged, but the learner is completely unaware of the data (until it gets some, but that doesn't matter, since it has to work with Test&Score). I'm not sure how I would go about solving this.
I would have boxes "Discrete target (Classification)" and "Numeric target (Regression)", and put the specific controls into them. (I may change my mind about the box names, but at the moment I like them. :)

While you're changing the interface, could you also tidy it up a bit? Make the spin boxes the same width; see the classification tree widget for an example. I guess the two combo boxes should also have the same width (but not the same as the (narrower) spins).
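A rough sketch of what such problem-specific boxes could look like in a widget built on the OWBaseLearner base class that appears later in this PR; the module path, setting names, and control items below are assumptions for illustration, not code from the PR:

from Orange.widgets import gui
from Orange.widgets.settings import Setting
from Orange.widgets.utils.owlearnerwidget import OWBaseLearner

class OWExampleAdaBoost(OWBaseLearner):
    # Hypothetical merged widget with separate sections per target type.
    name = "Example AdaBoost"
    cls_algorithm = Setting(0)
    reg_loss = Setting(0)

    def add_classification_layout(self, box):
        # Controls that only make sense for a discrete target.
        cls_box = gui.widgetBox(box, "Discrete target (Classification)")
        gui.comboBox(cls_box, self, "cls_algorithm", label="Algorithm",
                     items=["SAMME", "SAMME.R"])

    def add_regression_layout(self, box):
        # Controls that only make sense for a numeric target.
        reg_box = gui.widgetBox(box, "Numeric target (Regression)")
        gui.comboBox(reg_box, self, "reg_loss", label="Loss",
                     items=["Linear", "Square", "Exponential"])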
pavlin-policar force-pushed the pavlin-policar:merge-learners branch from e0607cf to f794d43 on Oct 23, 2016
pavlin-policar force-pushed the pavlin-policar:merge-learners branch from f794d43 to 02e1eea on Nov 3, 2016
janezd reviewed on Nov 4, 2016
@@ -68,7 +73,7 @@ def preprocess(self, data):
    """
    Apply the `preprocessors` to the data.
    """
-   for pp in self.preprocessors:
+   for pp in set(self.preprocessors) | set(type(self).preprocessors):
janezd (Contributor) commented on Nov 4, 2016
This changes the order of preprocessors. I guess you shouldn't do this.
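One order-preserving alternative (just a sketch, not necessarily what the PR ended up doing, and assuming the loop body applies each preprocessor to the data) is to chain the two sequences and skip duplicates while keeping the first occurrence:

from itertools import chain

def preprocess(self, data):
    """Apply instance- and class-level preprocessors in order, without duplicates."""
    seen = set()
    for pp in chain(self.preprocessors, type(self).preprocessors):
        if pp in seen:
            continue
        seen.add(pp)
        data = pp(data)
    return data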
janezd reviewed on Nov 4, 2016
    return (("Number of neighbours", self.n_neighbors),
            ("Metric", self.metrics[self.metric_index].capitalize()),
            ("Weight", self.weights[self.weight_type].capitalize()))

class OWKNNLearner(OWKNNBase):
pavlin-policar force-pushed the pavlin-policar:merge-learners branch 2 times, most recently from 10a2205 to 201b791 on Nov 20, 2016
janezd reviewed on Nov 25, 2016
    'Regression learner should override default preprocessors '
    'when specified in constructor')


def test_use_deafult_preprocessors_property(self):
@janezd After looking over what we discussed more carefully, there should actually not be any need to set the domain on the learner inside the fitter.
janezd reviewed on Nov 26, 2016
if self.__handles_any_input_type():
    # Only add a classification section if the method is overridden
    if type(self).add_classification_layout is not \
            OWBaseLearner.add_classification_layout:
janezd (Contributor) commented on Nov 26, 2016
@astaric, we're seeing more and more of this checking whether a class overrides an inherited method. (Not that it's nice, but it's useful.) Is it time to implement something like

def overrides(method, base):
    return getattr(type(method.__self__), method.__name__) != getattr(base, method.__name__)

to be used (as here above) like `if overrides(self.add_classification_layout, OWBaseLearner)`?
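For what it's worth, a self-contained illustration of how such a helper behaves (the class names below are made up):

def overrides(method, base):
    """True if a bound method is not the one inherited from `base`."""
    return getattr(type(method.__self__), method.__name__) \
        != getattr(base, method.__name__)

class Base:
    def add_classification_layout(self):
        pass

class Plain(Base):
    pass

class Custom(Base):
    def add_classification_layout(self):
        print("custom layout")

print(overrides(Plain().add_classification_layout, Base))   # False
print(overrides(Custom().add_classification_layout, Base))  # True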
I see, of course. It looks safe to merge, unless @kernc has any objections. Just one thing: can you remove the change to ada_boost, so the PR doesn't touch more files than necessary?
janezd approved these changes on Nov 26, 2016
Sure thing.
pavlin-policar added some commits on Nov 26, 2016
pavlin-policar force-pushed the pavlin-policar:merge-learners branch from 7479bf4 to 058dc41 on Dec 2, 2016
kernc reviewed on Dec 2, 2016
Looks good. Sorry for being so pedantic. Best start with a tidy core this time around; it spoils soon enough anyway.
def __get_kwargs(self, kwargs, problem_type):
    params = self._get_learner_kwargs(self.__fits__[problem_type])
    return {k: kwargs[k] for k in params & set(kwargs.keys())}
kernc (Member) commented on Dec 2, 2016
You can elegantly avoid the static method below if you use a more verbose variable name:

def __kwargs(self, problem_type):
    learner_kwargs = set(self.__fits__[problem_type].__init__.__code__.co_varnames[1:])
    return {k: v for k, v in self.kwargs.items() if k in learner_kwargs}

You also needn't pass kwargs separately.
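As a side note, co_varnames lists a function's local variables as well as its parameters, so a variant based on inspect.signature may be more robust; this is a hedged sketch, not code from the PR:

import inspect

def __kwargs(self, problem_type):
    # signature() exposes only real parameters, unlike __code__.co_varnames,
    # which also includes the function's local variable names
    learner_params = set(
        inspect.signature(self.__fits__[problem_type].__init__).parameters) - {"self"}
    return {k: v for k, v in self.kwargs.items() if k in learner_params}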
| """Get the learner for a given problem type.""" | ||
| # Prevent trying to access the learner when problem type is None | ||
| if problem_type not in self.__fits__: | ||
| raise AttributeError( |
    attribute with a valid value."""

    def __new__(mcs, name, bases, kwargs):
        # Check that a fitter implementation defines a valid `__fits__`
        if kwargs.get('name', False):
kernc (Member) commented on Dec 2, 2016
I still don't think checking the name attribute is adequate. e6a14f0 in #1771 obsoletes redundant short names on Learner types. Can you instead check that the class's name equals 'Fitter', or that none of its bases is named 'Fitter', or similar?

Also, kwargs is actually a dict of class attributes, so attrs or similar might be a more appropriate name.
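A sketch of the kind of check being suggested here, with a hypothetical metaclass name and simplified validation:

class FitterMeta(type):
    # Hypothetical metaclass: validate __fits__ on every class except the
    # abstract base, which is recognised by its class name rather than by a
    # user-facing `name` attribute.
    def __new__(mcs, name, bases, attrs):
        if name != 'Fitter':
            fits = attrs.get('__fits__')
            assert isinstance(fits, dict) and fits, \
                "fitters must define a non-empty __fits__ mapping"
        return super().__new__(mcs, name, bases, attrs)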
pavlin-policar commented on Oct 12, 2016
A much simpler approach to handling data with continuous or discrete target variables inside learner widgets.

`BaseLearner` now defines a `problem_type` context setting that tells us whether the data represents a classification or a regression problem. This setting can be set manually with radio buttons if there is no data (this is required to work properly with the Test&Score widget). If there is data on the input, we disable the radio buttons and assign the appropriate type. The radio buttons are completely opt-in and can be enabled with a flag on the widget, so widgets that only handle one kind of target variable don't have to deal with any extra logic.

Each individual learner can then override the `on_type_change` method, where it can do whatever updates it needs to the UI or anything else.

I've made the modifications necessary for this approach to work in the `TreeLearner` widget. Please let me know if this is an okay approach and whether I should continue down this route and port the other widgets to something like this.
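For illustration, a minimal sketch of what a subclass using this mechanism might look like; only `problem_type` and `on_type_change` come from the description above, while the flag name, setting type, and base-class import are assumptions:

from Orange.widgets.settings import ContextSetting
from Orange.widgets.utils.owlearnerwidget import OWBaseLearner

class OWExampleTreeLearner(OWBaseLearner):
    # Hypothetical merged tree widget using the problem_type mechanism.
    name = "Example Tree"
    want_type_selection = True        # opt into the radio buttons (made-up flag)
    problem_type = ContextSetting(0)  # 0 = classification, 1 = regression

    def on_type_change(self):
        # Called when problem_type changes, either via the radio buttons
        # (when no data is connected) or automatically from the input data's
        # target. A real widget would enable/disable its problem-specific
        # controls here; printing stands in for that.
        print("regression" if self.problem_type == 1 else "classification")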