[FEAT] Feature correlation research #1421

noamzbr · 2022-05-11T12:29:41Z

Is your feature request related to a problem? Please describe.
In order to implement #1164, we need a robust and scalable way to compute correlation between all features: not just numeric to numeric, but also categorical to categorical and numeric to categorical.

Describe the solution you'd like
A proven and tested (in the sense that it has tests) utility function that can compute these arbitrary correlations.

Source

Nadav-Barak · 2022-05-18T09:18:00Z

Categorical to categorical
Theil's U is an asymmetric, entropy-based measure with range [0, 1]. Run time is M^2 times the cost of a single entropy calculation (which in turn is a function of the number of unique values).
The alternative is Cramer's V. The problem is that it is symmetric and therefore loses some information.
We need to compare them in practice (performance and run time). Both have a solid theoretical foundation.
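To make the asymmetry concrete, here is a minimal pure-Python sketch of Theil's U, assuming the usual definition U(X|Y) = (H(X) - H(X|Y)) / H(X). The function names (`entropy`, `conditional_entropy`, `theils_u`) are hypothetical and are not taken from the linked pull request.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) of a sequence of categorical values (natural log)."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) estimated from paired samples."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    # H(X|Y) = -sum_{x,y} p(x, y) * log(p(x, y) / p(y))
    return -sum(
        (c / n) * math.log(c / y_counts[y_val])
        for (_, y_val), c in xy_counts.items()
    )


def theils_u(x, y):
    """Theil's U(X | Y): fraction of the uncertainty in x removed by knowing y."""
    h_x = entropy(x)
    if h_x == 0:
        # Assumption: a constant x is treated as fully predictable.
        return 1.0
    return (h_x - conditional_entropy(x, y)) / h_x


# y determines x completely, but x only narrows y down to two options.
x = ["a", "a", "b", "b"]
y = ["p", "q", "r", "s"]
u_x_given_y = theils_u(x, y)
u_y_given_x = theils_u(y, x)
```

In this toy example `u_x_given_y` is 1 while `u_y_given_x` is 0.5, which is exactly the directional information a symmetric measure such as Cramer's V cannot express.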

Nadav-Barak · 2022-05-18T10:58:03Z

Numeric to categorical
Correlation ratio looks most promising. It is an asymmetric (single direction!) variance-based method that measures how much of a numeric feature's variance is explained by a categorical feature. It returns a value in [0, 1].
It groups the values of the numeric feature by their corresponding category and divides the weighted variance of the group means by the variance of all samples; this quotient is the squared correlation ratio.
The other option is ANOVA, which uses the same grouping mechanism but a simpler (yet similar) metric: it compares the variance between group means with the average variance inside a group.
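The grouping described above can be sketched as follows. This is a minimal illustration in pure Python, assuming the standard definition of the correlation ratio (square root of between-group sum of squares over total sum of squares); the function name `correlation_ratio` and the zero-variance convention are assumptions, not the implementation merged for this issue.

```python
import math
from collections import defaultdict


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numeric sequence."""
    # Group the numeric values by their corresponding category.
    groups = defaultdict(list)
    for category, value in zip(categories, values):
        groups[category].append(value)

    overall_mean = sum(values) / len(values)

    # Between-group sum of squares: group means weighted by group size.
    between = sum(
        len(group) * (sum(group) / len(group) - overall_mean) ** 2
        for group in groups.values()
    )
    # Total sum of squares over all samples.
    total = sum((value - overall_mean) ** 2 for value in values)

    if total == 0:
        # Assumption: a constant numeric feature is reported as uncorrelated.
        return 0.0
    return math.sqrt(between / total)


cats = ["a", "a", "b", "b"]
eta_separated = correlation_ratio(cats, [1.0, 1.0, 3.0, 3.0])  # groups fully separate the values
eta_mixed = correlation_ratio(cats, [1.0, 3.0, 1.0, 3.0])      # both groups share the same mean
```

When the categories separate the numeric values perfectly the result is 1, and when every group has the same mean it is 0.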

Nadav-Barak · 2022-05-18T12:44:34Z

After discussion with the one and only @noamzbr, it was decided to use Theil's U for categorical to categorical (bidirectional) and correlation ratio for numeric to categorical. The output will look something like this:

We need to consider presenting only some of the columns.

noamzbr added the suggestion label May 11, 2022

noamzbr added this to the 0.6.4 milestone May 11, 2022

noamzbr assigned Nadav-Barak May 11, 2022

noamzbr mentioned this issue May 11, 2022

[FEAT] tabular - new check - correlation between features #1164

Closed

noamzbr removed the suggestion label May 18, 2022

Nadav-Barak linked a pull request May 19, 2022 that will close this issue

Nb/feat/correlation methods #1484

Merged

Nadav-Barak closed this as completed in #1484 May 23, 2022
