Uniform Count Discretization using Dynamic Programming #3

Open
tawheeler opened this issue Mar 28, 2016 · 0 comments

@tawheeler
Contributor

Uniform Count Discretization breaks a set of values into $k$ bins that each hold roughly the same number of entries. This works well for most continuous data, but runs into corner cases when many values are repeated: identical values must all land in the same bin, so the bin counts cannot always be made equal.

I have a problem with "a roughly equal number of entries" and would like to more rigorously define an optimal discretization scheme.

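We ideally want $M/k$ entries per bin, where $M$ is the number of data points and $k$ is the number of bins.

If we use an L2 loss, the score of a particular discretization is simply $\sum_{i=1}^{k} (b_i - M/k)^2$, where $b_i$ is the number of entries in bin $i$. Lower is better, and a score of zero means every bin holds exactly $M/k$ entries.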

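Since bin edges can only be placed between distinct values, minimizing this score means choosing which $k-1$ of those candidate edges to use, which is a dynamic programming problem.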
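To make the idea concrete, here is a minimal Python sketch of one way the dynamic program could be set up. It is not an implementation from this project: the function name `uniform_count_bins` and its return format are made up for illustration, and it assumes that bin edges may only fall between distinct sorted values and that every bin must be non-empty.

```python
from collections import Counter
from math import inf


def uniform_count_bins(values, k):
    """Hypothetical helper: return (loss, bin_sizes, lower_edges).

    Minimizes sum_i (b_i - M/k)^2 over all ways of splitting the sorted
    distinct values into k contiguous, non-empty bins.
    """
    counts = sorted(Counter(values).items())  # (distinct value, multiplicity)
    n = len(counts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distinct values")
    target = len(values) / k  # the ideal M/k entries per bin

    # prefix[i] = number of entries covered by the first i distinct values
    prefix = [0]
    for _, c in counts:
        prefix.append(prefix[-1] + c)

    # cost[j][i] = best loss using j bins for the first i distinct values
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):  # last bin holds distinct values s..i-1
                if cost[j - 1][s] == inf:
                    continue
                c = cost[j - 1][s] + (prefix[i] - prefix[s] - target) ** 2
                if c < cost[j][i]:
                    cost[j][i] = c
                    back[j][i] = s

    # Walk the back-pointers to recover bin sizes and interior edges.
    sizes, edges = [], []
    i = n
    for j in range(k, 0, -1):
        s = back[j][i]
        sizes.append(prefix[i] - prefix[s])
        if s > 0:
            edges.append(counts[s][0])  # smallest value in this bin
        i = s
    sizes.reverse()
    edges.reverse()
    return cost[k][n], sizes, edges


loss, sizes, edges = uniform_count_bins([0, 0, 0, 0, 0, 0, 1, 2], 2)
print(loss, sizes, edges)
```

With six repeated zeros and $k = 2$, no split can reach the ideal of 4 entries per bin, and the sketch settles on bins of size 6 and 2. The triple loop costs $O(k n^2)$ for $n$ distinct values, which should be fine for modest $n$ but is not tuned for large inputs.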

@tawheeler tawheeler self-assigned this Mar 28, 2016
@tawheeler tawheeler added the bug label Mar 28, 2016