Feature importance on classifier #1161

Closed
pecade opened this issue Jul 26, 2023 · 0 comments

pecade commented Jul 26, 2023

"Variable importance" measures which features (variables) contribute most to a classification model's predictions of the target variable. It is usually obtained by fitting a model and then inspecting its coefficients, or with dedicated techniques such as Permutation Feature Importance (PFI) or the importances derived from decision trees.
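To make the PFI idea concrete, here is a minimal, model-agnostic sketch using only the Python standard library. It is not tied to dedupe or any other library: `predict`, the toy data, and the helper names `accuracy` and `permutation_importance` are all assumptions made up for illustration. The importance of a feature is taken as the mean drop in accuracy when that feature's column is shuffled.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where predict(row) equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy data (assumption): the label depends only on feature 0; feature 1 is noise.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

def predict(row):
    # Hypothetical stand-in for a fitted classifier.
    return int(row[0] > 0.5)

importances = permutation_importance(predict, X, y)
print(importances)
```

Because the stand-in classifier ignores feature 1, shuffling that column cannot change any prediction, so its importance is zero, while shuffling feature 0 costs a large share of the accuracy.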

dedupe trains a classifier to distinguish pairs of records that are duplicates from pairs that are not, but it does not provide an easy way to examine the importance of each variable directly. Instead, dedupe focuses on offering a simple interface for the de-duplication task and hides many of the model's internal details. However, I think it would be very valuable to get insights into how the defined variables contribute, in order to better understand feature interactions and decide on next steps.
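As a rough illustration of what I am asking for, the sketch below fits a small logistic regression on per-field distances between record pairs and ranks the fields by absolute weight. It does not call dedupe at all: the field names, the synthetic distances, and the `fit_logistic` helper are hypothetical, and it assumes a linear classifier whose coefficients can be read as importances.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical fields; each feature is a distance in [0, 1] between the
# two records of a pair (0 = identical values, 1 = completely different).
fields = ["name", "address", "phone"]
rng = random.Random(0)
X, y = [], []
for _ in range(300):
    is_dup = rng.random() < 0.5
    name = rng.uniform(0.0, 0.3) if is_dup else rng.uniform(0.5, 1.0)
    address = rng.uniform(0.0, 0.6) if is_dup else rng.uniform(0.4, 1.0)
    phone = rng.random()  # uninformative by construction
    X.append([name, address, phone])
    y.append(int(is_dup))

weights, bias = fit_logistic(X, y)

# Rank fields by the magnitude of their learned weight.
ranked = sorted(zip(fields, weights), key=lambda fw: abs(fw[1]), reverse=True)
for field, weight in ranked:
    print(f"{field:8s} {weight:+.3f}")
```

A negative weight here means that a larger distance on that field lowers the predicted probability of a duplicate. Comparing magnitudes only makes sense because all three distances share the same [0, 1] scale; with differently scaled features the weights would need to be standardized first.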

@fgregg fgregg closed this as completed Dec 16, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 1, 2024