-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional Feature Selection operators #50
Conversation
Reviewing this now. Is RFE the one that's quite slow? I may also drop Probably going to drop |
Yeah RFE's the slowest, perhaps because I instantiate the estimator every time. |
if '_select_kbest' in operators_used: pipeline_text += 'from sklearn.feature_selection import SelectKBest' | ||
if '_select_percentile' in operators_used: pipeline_text += 'from sklearn.feature_selection import SelectPercentile' | ||
if '_select_percentile' or '_select_kbest' in operators_used: pipeline_text += 'from sklearn.feature_selection import chi2' | ||
if '_rfe' in operators_used: pipeline_text += 'from sklearn.feature_selection import RFE' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These imports need newlines \n
at the end of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_rfe
should also import SVC.
In addition to the comments I've made directly inline, as you mentioned the exports need to ensure that the variables are within their proper limits. |
try: | ||
selector.fit(training_features) | ||
except ValueError: | ||
return input_df.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking _variance_threshold()
should return an "empty" DF (with only class
, group
, and guess
) in the case where none of the columns are above the threshold. Otherwise it may become an issue where these _variance_threshold()
operators with high thresholds are in the pipeline but not actually doing anything.
Alrighty, I think that's all of the comments for now. In all, very nice work on this PR! Thank you for putting this together. Let me know if you want to hack at these comments, else I can merge the PR and clean it up later this week.
At least with large steps (>= 0.1), I found RFE to run pretty quickly on my test data sets. Maybe we could limit the RFE steps to >= 0.05 or so and it'll be fine?
Except in the export function, everything looks good to me. A great way to test it is to run TPOT with the new operators for several hundred generations. If it doesn't crash by the end of the run, your code has run the gauntlet and probably caught all of the possible parameter edge cases. :-)
Great idea! I'd imagine we can easily encode various estimators and scoring functions with integer values. It may be tricky to have actual estimators and scoring functions as terminals. Let's file an issue for this to work on after this is merged. |
Yeah I'm on it. I'll work on it when I get home tonight. |
You rock! 👍 |
…checks, cleaning up
{2} = {0}[['guess', 'class', 'group']] | ||
try: | ||
mask = selector.get_support(True) | ||
mask_cols = list(training_features[mask].columns) + ['guess', 'class', 'group'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no guess
and group
in the export code.
try: | ||
selector.fit(training_features.values) | ||
except ValueError: | ||
{2} = {0}[['guess', 'class', 'group']] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no guess
and group
in the export code.
Just noting some minor issues in the export code. About to do some final tests of the pipeline operators themselves, and if those turn out fine, I'll merge this and do some final cleanup. |
Gotcha, thanks! I won't concern myself with cleaning up the export code then. I'll get it right one day though, I promise :P. |
Additional Feature Selection operators
Looks good! Thanks again for your PR! :-) |
Per #45 , this is a first stab at the four additional feature selection operators.
Things to note: