Using the tpot object for prediction #67
Comments
Hi @kadarakos!

Error with .predict for iris example

Please check if the iris features and classes are encoded as numerical features. This is likely the source of your error. We've raised issue #61 to address this problem in the near future.

Interpreting generated code

Happy to see feedback about the generated code! The following is occurring in the pipeline you posted:
If you have thoughts on how to make the generated code clearer or easier to use, please let me know. Best, Randy
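To illustrate the pipeline behaviour described above, here is a minimal sketch of the pattern in question: each classifier's predictions are appended to the data frame as a new column, so the next classifier can use them as an input feature. The column names (`guess1`, `guess`, `group`) and the train/test split are made up for illustration; this is not TPOT's actual generated code.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sketch of the "stacked guesses" pattern discussed above.
iris = load_iris()
result = pd.DataFrame(iris.data, columns=['f0', 'f1', 'f2', 'f3'])
result['class'] = iris.target
# Mark every 4th row as the testing group (illustrative split).
result['group'] = np.where(np.arange(len(result)) % 4 == 0, 'testing', 'training')

train = result[result['group'] == 'training']

# First classifier: its guesses are stored as a new feature column.
clf1 = DecisionTreeClassifier(max_depth=2, random_state=0)
clf1.fit(train[['f0', 'f1', 'f2', 'f3']], train['class'])
result['guess1'] = clf1.predict(result[['f0', 'f1', 'f2', 'f3']])

# Second classifier sees the original features plus the first guess.
features2 = ['f0', 'f1', 'f2', 'f3', 'guess1']
clf2 = DecisionTreeClassifier(max_depth=4, random_state=0)
clf2.fit(result.loc[result['group'] == 'training', features2],
         result.loc[result['group'] == 'training', 'class'])
result['guess'] = clf2.predict(result[features2])

print(result.loc[result['group'] == 'testing', 'guess'].values[:5])
```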
Hi @rhiever, Both the iris features and classes are encoded as floats. Your explanation makes it clear how to interpret the generated code. It makes me wonder, however, whether this is the best way to ensemble models. IMHO, using the VotingClassifier object would be a more standard and straightforward way of ensembling different classifiers, plus it provides some additional flexibility.
Ah, I see what happened. The predict function is missing the .loc indexer:

return result[result['group'] == 'testing', 'guess'].values

should be

return result.loc[result['group'] == 'testing', 'guess'].values

This has already been fixed in the development version, but I haven't rolled it out to pip yet. I will do this soon!
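A minimal demonstration of the difference, on a toy data frame with column names chosen to mirror the snippet above: plain `[]` indexing with a (mask, column) tuple raises an exception, while `.loc` accepts exactly that form.

```python
import pandas as pd

result = pd.DataFrame({
    'group': ['training', 'testing', 'testing'],
    'guess': [0, 1, 2],
})

# The broken form raises an exception (the bug described in this issue):
err = None
try:
    result[result['group'] == 'testing', 'guess'].values
except Exception as exc:
    err = exc
    print('plain [] indexing raised', type(exc).__name__)

# The corrected form selects the 'guess' column for the testing rows:
fixed = result.loc[result['group'] == 'testing', 'guess'].values
print(fixed)
```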
Regarding ensembles of classifiers: I agree 100%! This is also something we're working on in the near future -- adding a pipeline operator that pools classifications from multiple classifiers in different ways (majority vote, etc.).
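For reference, the standard sklearn route mentioned earlier in the thread looks roughly like this; the choice of estimators and hyper-parameters here is illustrative, not anything TPOT generates.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Pool several classifiers with majority ("hard") voting.
voter = VotingClassifier(
    estimators=[
        ('tree', DecisionTreeClassifier(max_depth=3, random_state=0)),
        ('knn', KNeighborsClassifier(n_neighbors=5)),
    ],
    voting='hard',  # 'soft' would average predicted probabilities instead
)
voter.fit(X, y)
print(voter.score(X, y))
```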
Thanks for the quick reply! I evolved another piece of code that scores 1.0 on the iris data set, which is pretty impressive. However, it did raise some questions.
A minor issue was that the DecisionTreeClassifier wasn't imported for the feature selection. Apart from that, I was a bit surprised by the way the feature selection part was implemented. I believe it could be replaced with the shorter - and maybe more general - code snippet from the sklearn documentation:
Is it just me, or would this be a bit more concise? Best,
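The sklearn snippet referenced above was not preserved in this thread. A sketch of that idiom, assuming it is SelectFromModel wrapped around a fitted DecisionTreeClassifier, would be:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Fit a tree, then keep only the features whose importances clear
# the threshold (by default, the mean importance).
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
selector = SelectFromModel(tree, prefit=True)
X_reduced = selector.transform(X)

print(X.shape, '->', X_reduced.shape)
```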
Actually, from observing the code a bit more closely, it seems to me that "result3" is just a sorted version of the original features:
and then the kNN is fitted to this sorted data frame
, so as far as I understand, the feature selection was not actually performed. Running this piece of code
actually shows - unsurprisingly - that the most informative features are the decisions of the previous classifiers.
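The inspection code referenced above was also not preserved. A hypothetical reconstruction of the check: append two classifiers' guesses as columns, fit a probe tree on everything, and look at feature_importances_. The guess columns dominate, since in-sample they already encode the label almost perfectly.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
base = ['f0', 'f1', 'f2', 'f3']
df = pd.DataFrame(iris.data, columns=base)
y = iris.target

# Append each classifier's (near-perfect, in-sample) guesses as new columns.
df['guess1'] = DecisionTreeClassifier(random_state=0).fit(df[base], y).predict(df[base])
df['guess2'] = DecisionTreeClassifier(random_state=1).fit(df[base], y).predict(df[base])

# Fit a probe tree on everything and inspect which features it relies on.
probe = DecisionTreeClassifier(random_state=0).fit(df, y)
for name, importance in zip(df.columns, probe.feature_importances_):
    print(name, round(importance, 3))
```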
That's exactly right. It seems the feature selection in this case was "junk code" that wasn't pruned by the optimization process, i.e., because the feature selection didn't do anything, it wasn't optimized away. I'm currently working on code that selects against bloat like that. In the most recent version, we've actually removed the decision tree-based feature selection entirely and replaced it with more standard feature selection operators from sklearn: RFE, variance threshold, and various forms of univariate feature selection. Hopefully that will be out soon. You can check it out on the development version in the meantime.
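For concreteness, here is a sketch of those standard sklearn selectors on the iris data; the parameter choices are mine for illustration, not TPOT's defaults.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Recursive feature elimination around an estimator exposing coef_.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2)
X_rfe = rfe.fit_transform(X, y)

# Drop features whose variance falls below a threshold (unsupervised).
X_vt = VarianceThreshold(threshold=0.3).fit_transform(X)

# Univariate selection: keep the k features with the best ANOVA F-score.
X_kb = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

print(X_rfe.shape, X_vt.shape, X_kb.shape)
```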
Error with .predict for iris example
But when I try to use the pipeline as a predictor:

tpot.predict(X_train, y_train, X_test)

this is the error I get (IPython debugger output):
Interpreting generated code
Running the iris example generated this piece of code:
I struggle a bit to understand the intended idea behind providing this result2 dataframe. There are two classification results in the above example, both with decision trees but with different hyper-parameters; how do these get combined?