Additional Feature Selection operators #50

Merged (6 commits) on Dec 8, 2015

Conversation

@bartleyn bartleyn commented Dec 7, 2015

Per #45, this is a first stab at the four additional feature selection operators.

Things to note:

  • Could use some optimization (caching?)
  • Could use some toying around with out-of-bounds parameters (e.g., if num_features < 0 in RFE)
  • Terminals for both estimators & scoring functions that plug into the new operators
    • i.e., RFE takes a supervised estimator so I hard-coded in a linear-kernel SVC; likewise Select* takes a scoring function, so I used chi2.
    • I think this will be of particular importance for generalizing this to regression tasks.

@rhiever rhiever self-assigned this Dec 7, 2015
rhiever commented Dec 7, 2015

Reviewing this now. Is RFE the one that's quite slow?

I may also drop dt_feature_selection() as a part of this upgrade since it's not supported in sklearn, and exporting it is quite ugly. Perhaps I'll contact the sklearn folks about merging a variant of dt_feature_selection() into sklearn.

Probably going to drop subset_df() as well, as I can't imagine it being useful at this point.

bartleyn commented Dec 7, 2015

Yeah, RFE's the slowest, perhaps because I instantiate the estimator every time.

if '_select_kbest' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import SelectKBest'
if '_select_percentile' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import SelectPercentile'
if '_select_percentile' or '_select_kbest' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import chi2'
if '_rfe' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import RFE'
These imports need newlines \n at the end of them.

_rfe should also import SVC.
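Folding both inline comments into the snippet, plus one more fix: the chi2 condition as written is always true, because the non-empty string '_select_percentile' is truthy on its own regardless of operators_used. A corrected sketch (the operators_used value is just an example):

```python
pipeline_text = ''
operators_used = ['_select_kbest', '_rfe']  # example value

if '_select_kbest' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import SelectKBest\n'
if '_select_percentile' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import SelectPercentile\n'
# Fixed precedence: test membership for each string explicitly
if '_select_percentile' in operators_used or '_select_kbest' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import chi2\n'
if '_rfe' in operators_used:
    pipeline_text += 'from sklearn.feature_selection import RFE\n'
    pipeline_text += 'from sklearn.svm import SVC\n'
```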

rhiever commented Dec 7, 2015

In addition to the comments I've made directly inline: as you mentioned, the export code needs to ensure that the parameters stay within their proper limits.

try:
    selector.fit(training_features)
except ValueError:
    return input_df.copy()
I'm thinking _variance_threshold() should return an "empty" DF (with only class, group, and guess) in the case where none of the columns are above the threshold. Otherwise it may become an issue where these _variance_threshold() operators with high thresholds are in the pipeline but not actually doing anything.

rhiever commented Dec 7, 2015

Alrighty, I think that's all of the comments for now. In all, very nice work on this PR! Thank you for putting this together. Let me know if you want to hack at these comments, else I can merge the PR and clean it up later this week.

> Could use some optimization (caching?)

At least with large steps (>= 0.1), I found RFE to run pretty quickly on my test data sets. Maybe we could limit the RFE steps to >= 0.05 or so and it'll be fine?

> Could use some toying around with out-of-bounds parameters (e.g., if num_features < 0 in RFE)

Except in the export function, everything looks good to me. A great way to test it is to run TPOT with the new operators for several hundred generations. If it doesn't crash by the end of the run, your code has run the gauntlet and probably caught all of the possible parameter edge cases. :-)

> Terminals for both estimators & scoring functions that plug into the new operators

Great idea! I'd imagine we can easily encode various estimators and scoring functions with integer values. It may be tricky to have actual estimators and scoring functions as terminals.

Let's file an issue for this to work on after this is merged.
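One hedged sketch of the integer-encoding idea (none of these names exist in TPOT; this only illustrates the lookup-table approach):

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import chi2, f_classif

# GP terminals stay plain integers; each operator decodes them into a
# concrete estimator or scoring function via a lookup table.
ESTIMATORS = {0: lambda: SVC(kernel='linear'),
              1: lambda: DecisionTreeClassifier()}
SCORING_FUNCS = {0: chi2, 1: f_classif}

def decode_estimator(code):
    # Wrap out-of-range codes around, in the same spirit as clamping
    # num_features, so any integer the GP produces is valid.
    return ESTIMATORS[code % len(ESTIMATORS)]()

def decode_scoring_func(code):
    return SCORING_FUNCS[code % len(SCORING_FUNCS)]
```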

bartleyn commented Dec 7, 2015

Yeah I'm on it. I'll work on it when I get home tonight.

rhiever commented Dec 7, 2015

You rock! 👍

{2} = {0}[['guess', 'class', 'group']]
try:
    mask = selector.get_support(True)
    mask_cols = list(training_features[mask].columns) + ['guess', 'class', 'group']
There are no guess and group columns in the export code.

try:
    selector.fit(training_features.values)
except ValueError:
    {2} = {0}[['guess', 'class', 'group']]
There are no guess and group columns in the export code.

rhiever commented Dec 8, 2015

Just noting some minor issues in the export code. About to do some final tests of the pipeline operators themselves, and if those turn out fine, I'll merge this and do some final cleanup.

bartleyn commented Dec 8, 2015

Gotcha, thanks! I won't concern myself with cleaning up the export code then. I'll get it right one day though, I promise :P.

rhiever pushed a commit that referenced this pull request Dec 8, 2015
Additional Feature Selection operators
@rhiever rhiever merged commit 36cc872 into EpistasisLab:master Dec 8, 2015
rhiever commented Dec 8, 2015

Looks good! Thanks again for your PR! :-)
