[ENH] Rule induction (CN2) #1397

Merged
janezd merged 38 commits into biolab:master on Sep 16, 2016

8 participants
@matevzkren
Member

matevzkren commented Jul 1, 2016

A more general framework of replaceable individual components that can be fine-tuned to specific needs. To induce rules from examples, a separate-and-conquer (covering) strategy is applied. The CN2 and CN2Unordered algorithms have been implemented so far. Next up: widgets!
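
To illustrate the induction strategy mentioned above, here is a minimal sketch of a covering loop: find one rule, remove the examples it covers, and repeat on the rest. It is not the code from this PR and not CN2 itself. The rule finder below is a deliberately simple greedy search over single attribute-value tests (CN2 searches over conjunctions with a beam), and the names `find_best_rule`, `induce_rules`, and the toy data are hypothetical.

```python
from collections import Counter


def find_best_rule(examples):
    """Pick the single (attribute, value) test whose covered examples are purest.

    Ties are broken by coverage, then by first occurrence in sorted order.
    """
    best = None
    for attr in sorted(examples[0][0]):
        for value in sorted({x[attr] for x, _ in examples}):
            covered = [y for x, y in examples if x[attr] == value]
            label, hits = Counter(covered).most_common(1)[0]
            score = (hits / len(covered), len(covered))
            if best is None or score > best[0]:
                best = (score, attr, value, label)
    _, attr, value, label = best
    return attr, value, label


def induce_rules(examples):
    """Covering loop: learn a rule, drop what it covers, repeat."""
    rules = []
    remaining = list(examples)
    while remaining:
        attr, value, label = find_best_rule(remaining)
        rules.append((attr, value, label))
        # Every rule covers at least one remaining example, so this terminates.
        remaining = [(x, y) for x, y in remaining if x[attr] != value]
    return rules


# Toy data, invented for this example.
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
]
rules = induce_rules(data)
for attr, value, label in rules:
    print(f"IF {attr} == {value} THEN {label}")
```

Because each rule is learned only on the examples left over by earlier rules, the result is an ordered list: a later rule is meant to apply only when no earlier rule fires.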

@matevzkren matevzkren added the gsoc label Jul 1, 2016

@codecov-io

codecov-io commented Jul 6, 2016

Current coverage is 88.66% (diff: 95.02%)

Merging #1397 into master will increase coverage by 0.40%

@@             master      #1397   diff @@
==========================================
  Files            77         78     +1   
  Lines          7624       8066   +442   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           6729       7152   +423   
- Misses          895        914    +19   
  Partials          0          0          

Powered by Codecov. Last update 2b02285...c678871
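
The percentages in the report above can be reproduced from the Hits and Lines rows. The sketch below assumes that the reported figures are truncated to two decimals rather than rounded; that assumption is not stated in the report, but it matches the numbers shown. The helper name `coverage_percent` is hypothetical.

```python
import math


def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


master = coverage_percent(6729, 7624)   # master column
merged = coverage_percent(7152, 8066)   # #1397 column

# Compute the change from the unrounded ratios, then truncate.
delta = math.floor((7152 / 8066 - 6729 / 7624) * 10000) / 100

print(master, merged, delta)
```

The diff coverage of 95.02% cannot be derived the same way from the net +423 and +442 figures, since it is reported over the lines changed by the PR rather than over the net change in totals.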

@BlazZupan

Contributor

BlazZupan commented Jul 22, 2016

Trying to run on brown-selected.tab (only 186 data instances, 79 attributes). The induction is very slow on this data set, perhaps a problem with missing values?

matevzkren added some commits Aug 8, 2016

Compact view.
Implemented a compact view that offers a more suitable one-line display of rules, regardless of their size. Only one column is manipulated to achieve the result (section size, resize mode, wrap). Other small changes are included (more expressive relation signs, work on copy mode).
Implemented 'copy to clipboard'
Enables users to save selected rules to clipboard, for later use.
Adjusted rule labels.
Labels are now more expressive if compact view is ON.
QSelectionModel can now be sorted.
The selection indexes are not really sorted. According to the sorting procedure, new selection indexes (QModelIndex) are determined/calculated and selected manually.
Re-building the RuleViewer.
To enable some specific widget behavior, the core of the widget was re-built. Selection now sticks not only when sorting, but also if 'compact view' is toggled. Copying rules to clipboard was cleaned up. Also implemented output signals. In general, the code is now more expressive and easier to follow.
No duplicate rules allowed.
A new method for comparing rules was implemented. The approach is not an obvious one (the covered_examples truth vectors are compared), but the implementation is fast, correct, and elegant. It works for all included learners, regardless of the chosen covering algorithm and rule ordering.
Unified rule.is_significant() to accommodate cleaner code in the low-level search procedure and the top-level control procedure.
Selection model sorting.
The selection persists through ordering, compact view toggling, and restoring the original order. The solution is not yet optimal, however. Looking into QSortFilterProxyModel next.
Improved selection handling with QSortFilterProxyModel.
PyTableModel will be updated at a later time to provide the same functionality.
RuleViewer widget unit tests.
Also included are some missing docstrings.
Rule comparison FIX.
Small fix involving rule comparison. In rare cases, not copying the covered examples (a numpy array) resulted in an incorrect comparison result.
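
A rough sketch of the comparison idea from the two rule-comparison commits above: two rules are treated as duplicates when they cover exactly the same examples, whatever their conditions say. This is not the code from the PR. The commits mention numpy arrays; the sketch uses plain tuples to stay dependency-free. The `Rule` class and `drop_duplicates` helper are hypothetical, and the aliasing scenario shown is only one assumed way in which a missing copy could corrupt a comparison.

```python
class Rule:
    """Toy stand-in for an induced rule; holds only what the comparison needs."""

    def __init__(self, description, covered_examples):
        self.description = description
        # Store an immutable copy, so later changes to the caller's mask
        # cannot silently change what this rule appears to cover.
        self.covered_examples = tuple(bool(c) for c in covered_examples)

    def __eq__(self, other):
        if not isinstance(other, Rule):
            return NotImplemented
        return self.covered_examples == other.covered_examples

    def __hash__(self):
        return hash(self.covered_examples)


def drop_duplicates(rules):
    """Keep the first rule for each distinct coverage vector, preserving order."""
    seen = set()
    unique = []
    for rule in rules:
        if rule not in seen:
            seen.add(rule)
            unique.append(rule)
    return unique


mask = [True, True, False, False]
a = Rule("petal length <= 2.5", mask)
mask[2] = True  # the caller reuses its buffer; rule `a` is unaffected
b = Rule("petal width <= 0.8", [True, True, False, False])
c = Rule("sepal length >= 6.0", [False, False, True, True])

unique = drop_duplicates([a, b, c])
print([r.description for r in unique])
```

Comparing coverage rather than conditions means that two syntactically different rules collapse into one whenever they select the same training examples.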
[FIX] OWRuleLearner: Progress bar updates are now handled through a callback function.

Fixes a bug previously produced when using Test & Score on Windows machines. Other instances using the generated learner no longer affect the widget's progress bar.
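
One way to picture the callback approach from the fix above is the sketch below. How the widget and learner are actually wired in Orange is not shown in this conversation, so the sketch is an assumption: `ToyRuleLearner`, `ToyWidget`, and the `progress_callback` parameter are hypothetical names. The point is that progress is reported only to a function passed in for one particular call, so other users of the same learner cannot move the widget's progress bar.

```python
class ToyRuleLearner:
    """Hypothetical learner: progress goes only to a callback supplied for
    the current call and is never stored on the learner itself."""

    def __call__(self, n_steps, progress_callback=None):
        for step in range(1, n_steps + 1):
            # One unit of pretend rule-induction work would go here.
            if progress_callback is not None:
                progress_callback(step / n_steps)
        return "model"


class ToyWidget:
    """Hypothetical widget that owns a progress bar (here, just a log)."""

    def __init__(self):
        self.learner = ToyRuleLearner()  # also handed to other consumers
        self.progress_log = []

    def set_progress(self, fraction):
        self.progress_log.append(fraction)

    def train(self, n_steps):
        # The widget passes its own callback for this call only.
        return self.learner(n_steps, progress_callback=self.set_progress)


widget = ToyWidget()
widget.train(4)
widget.learner(10)  # another consumer of the learner; no callback passed
print(widget.progress_log)
```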
@janezd

Contributor

janezd commented Aug 26, 2016

Having the data input is a good idea, but ...

For classification trees, I can build the schema File -> Classification tree -> Classification tree viewer -> Table. When I select a node in the viewer, I see the corresponding subset of the training data. If I try to do the same with CN2 rules, I have to connect File to the Viewer.

Have the data stored with the rules (either always, or add it just in the widget), so the viewer can get it from there.

@astaric

Member

astaric commented Sep 8, 2016

Could this PR be merged in the present state and additional features added in subsequent PRs? Are there any obvious parts missing?

@janezd

Contributor

janezd commented Sep 8, 2016

That was my plan, too -- merge and perhaps add features later. If you'd do it, yes, please give it a quick check and merge.

@matevzkren

Member

matevzkren commented Sep 9, 2016

The existing code is fully functional and ready to be merged as is. Some features are being worked on (including additional algorithms) and should be ready by the end of next week. Those can however be handled in subsequent PRs.

@matevzkren matevzkren changed the title from [WIP] [ENH] Rule induction (CN2) to [ENH] Rule induction (CN2) Sep 9, 2016

@janezd janezd merged commit 9749ee2 into biolab:master Sep 16, 2016

4 checks passed

codecov/patch 95.02% of diff hit (target 95.00%)
codecov/project 88.66% (+0.40%) compared to 2b02285
continuous-integration/travis-ci/pr The Travis CI build passed
licence/cla Contributor License Agreement is signed.
@markotoplak

Member

markotoplak commented Sep 16, 2016

Merging this increased the running time of "python setup.py test" on my computer from 17 to 33 seconds. @matevzkren, could you look into this?
