[QUESTION] untransform encoded categorical values and change type of problem #21
Comments
Hi @fjpa121197 👍
Hi @AutoViML, the last part, using XGBoost, gives the following output:
And outputs[0] contains only the target variable (as a dataframe). The first suggestion solved my problem, but when I look at the transformed dataset (the dataset with the selected features), I find my categorical variables encoded with an OrdinalEncoder. Is this the default way the XGBoost part finds the most important features? I'm not sure that assuming an ordinal relationship is appropriate for all categorical columns.
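(A minimal sketch, not featurewiz internals, of what ordinal encoding does and why the implied ordering can be inappropriate for nominal categories; the `property_type` column and its values are hypothetical:)

```python
# Sketch: ordinal encoding assigns an integer code per category, which
# implicitly imposes an order the data may not have.
# "property_type" and its values are hypothetical examples.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"property_type": ["Apartment", "House", "Condo", "House"]})
enc = OrdinalEncoder()
df["property_type_enc"] = enc.fit_transform(df[["property_type"]])
print(df)
# Codes are assigned alphabetically: Apartment=0, Condo=1, House=2,
# so a tree split like "property_type_enc < 2" groups Apartment with Condo.
print(enc.categories_)  # the code-to-label mapping
```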
Hi @fjpa121197 👍
Hi @AutoViML, that did solve my problem, and I was able to run the last part without problems, thanks! I do still have questions about this: is there any way to see whether the results differ when using one-hot encoding, and to see the actual features after encoding? For example, let's say I have a categorical column. After using featurewiz, the selected features are returned like this:
Is there any way to know the actual value, or which category it refers to?
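(For comparison, a minimal sketch of one-hot encoding with pandas, where the category value survives in the generated column name, so a selected feature stays self-describing; the column name is hypothetical:)

```python
# Sketch: one-hot encoding keeps the original category value in the
# generated column name. "property_type" is a hypothetical column.
import pandas as pd

df = pd.DataFrame({"property_type": ["Apartment", "House", "Condo"]})
dummies = pd.get_dummies(df, columns=["property_type"])
print(dummies.columns.tolist())
# ['property_type_Apartment', 'property_type_Condo', 'property_type_House']
```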
Hi @fjpa121197 👍
Hi @AutoViML, the first option sounds good to me! I can handle the inverse/untransformation of the columns with the output from featurewiz, and avoid assuming an ordinal relationship for my categorical features. Sorry for another question, but I'm really interested in, and amazed by, the automation part: is there any way to know the performance of the XGBoost estimator at the different stages where it reduces features?
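(A minimal sketch of the inverse/untransformation mentioned above, assuming the encoding was done with scikit-learn's OrdinalEncoder and the fitted encoder was kept around:)

```python
# Sketch: keep the fitted encoder so encoded columns can be mapped back
# to their original labels after feature selection.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

train = pd.DataFrame({"property_type": ["Apartment", "House", "Condo", "House"]})
enc = OrdinalEncoder()
encoded = enc.fit_transform(train[["property_type"]])

# ... feature selection runs on the encoded data ...

restored = enc.inverse_transform(encoded)  # back to the original labels
print(restored.ravel())  # ['Apartment' 'House' 'Condo' 'House']
```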
Hi @fjpa121197 👍
You should not worry too much about performance in each round, since Recursive XGBoost uses fewer and fewer features in its modeling. That means the actual performance in each round might be falling, but that is not what matters. What matters is knowing which of the remaining variables stands out as the most important. That's why I don't show the performance: it would give a misleading picture. If you don't believe this method will work for you, the best thing to do is to compare its results against other feature-selection techniques. If this answers your question, please consider closing this issue.
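(To make this concrete, here is a minimal sketch of recursive, importance-based selection with XGBoost; it illustrates the general idea only, not featurewiz's actual implementation, and prints a per-round CV score to show why the score naturally drifts as the feature pool shrinks:)

```python
# Sketch of recursive, importance-based feature selection with XGBoost.
# Illustrative only -- not featurewiz's implementation.
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def recursive_select(X, y, keep_frac=0.5, min_features=5):
    cols = list(X.columns)
    while len(cols) > min_features:
        model = XGBRegressor(n_estimators=100, verbosity=0)
        model.fit(X[cols], y)
        # Per-round score: expect it to drift as the pool shrinks.
        score = cross_val_score(model, X[cols], y, cv=3).mean()
        print(f"{len(cols)} features, CV R^2 = {score:.3f}")
        # Keep the top fraction of features by importance.
        order = np.argsort(model.feature_importances_)[::-1]
        n_keep = max(min_features, int(len(cols) * keep_frac))
        cols = [cols[i] for i in order[:n_keep]]
    return cols
```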
That is understandable; I think I will compare the results with other techniques. But overall, a great tool. Thanks for the help and for answering these questions! Closing this.
Original issue (fjpa121197):
Hello, I'm testing featurewiz with a dataframe that has numerical and categorical variables, and a target variable that ranges from 0 to 55, with most of its values between 0 and 6.
My first question comes from the fact that when I run:
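(The exact command was not captured on the page; a typical invocation, following the featurewiz README, looks roughly like this, with the dataframe and target name as placeholders:)

```python
# Illustrative call only; the poster's exact command was not captured.
# "train_df" and "my_target" are placeholders.
from featurewiz import featurewiz

outputs = featurewiz(dataname=train_df, target="my_target",
                     corr_limit=0.70, verbose=2)
```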
Everything runs fine, but the final output is like this:
Is there any chance I can know what `property_type_1` refers to? Or at least have it transformed back to its original name?
On the other hand, is there any way to override the detected type of problem? I want to set it to a regression problem, but featurewiz is treating the target variable as multi-class classification (and the XGBoost part ends up not working).
Thanks
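(One possible workaround for the problem-type question, assuming featurewiz infers the task from the target's dtype and cardinality, as similar AutoViML tools do: cast an integer-valued target to float so it is read as continuous. This is an assumption, not a documented override:)

```python
# Assumption: an integer target with few unique values may be read as
# multi-class, so casting to float nudges detection toward regression.
# "train_df" and "my_target" are placeholders.
train_df["my_target"] = train_df["my_target"].astype(float)
```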