
Implement PhiK correlation between categorical and numerical variables #547

Open
aecio opened this issue May 11, 2020 · 2 comments


@aecio

aecio commented May 11, 2020

Describe the solution you'd like

Phi_k is a new correlation coefficient with Pearson characteristics that works across categorical, ordinal, and interval variables. Paper: https://arxiv.org/pdf/1811.11440.pdf

Phi_k appears to be based on the chi-squared (Chi2) contingency test and uses Brent's method for optimization. Both seem to be available in this library already, so it may not be too hard to implement in Smile.
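As a hedged illustration of the first building block mentioned here, the sketch below computes Pearson's chi-squared statistic for an r x c contingency table of observed counts. It is plain Java with no dependencies; the class and method names (`ContingencyChi2`, `chiSquared`) are hypothetical and are not part of Smile's API. It does not compute Phi_k itself. As we read the paper, Phi_k additionally bins interval variables and then uses a one-dimensional search (e.g. Brent's method) to find the bivariate-normal correlation whose expected chi-squared matches the observed one.

```java
/**
 * Hypothetical helper (not part of Smile's API): Pearson's chi-squared
 * statistic for an r x c contingency table of observed counts.
 */
public final class ContingencyChi2 {

    private ContingencyChi2() {}

    public static double chiSquared(long[][] counts) {
        if (counts.length == 0 || counts[0].length == 0) {
            throw new IllegalArgumentException("empty contingency table");
        }
        int rows = counts.length;
        int cols = counts[0].length;

        // Marginal totals and grand total.
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }

        // Sum of (observed - expected)^2 / expected, where the expected count
        // under independence is rowSum * colSum / total.
        double chi2 = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / total;
                // An expected count of 0 means an empty row or column, whose
                // observed count is also 0, so the cell contributes nothing.
                if (expected > 0.0) {
                    double diff = counts[i][j] - expected;
                    chi2 += diff * diff / expected;
                }
            }
        }
        return chi2;
    }

    public static void main(String[] args) {
        long[][] table = {{10, 20}, {30, 40}};
        System.out.println(chiSquared(table));
    }
}
```

For the sample table, `main` prints a value of about 0.794 (exactly 50/63); a table whose rows are proportional to each other gives 0.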

Besides working with categorical, ordinal, and interval variables, it also captures non-linear dependency, so it would be a powerful tool for data analysis.

Describe alternatives you've considered

There is an official Python implementation based on NumPy and SciPy available here: https://github.com/KaveIO/PhiK. It is well integrated with pandas and allows computing correlation matrices of pandas data frames.

--

I'm curious to know if this is something you'd like to include in smile.

@haifengl
Owner

Thanks. Would you like to make a PR?

@aecio
Author

aecio commented May 12, 2020

Thanks for the reply. I'm not sure I'll be able to do it in the next month or so, given my current project priorities, but I may be able to in the future. I first wanted to gauge whether this was already on your radar and whether you would be interested in adding it to SMILE.
