Test out Gaussian Process for dealing with outliers #52

Open
sonyahanson opened this issue Apr 28, 2016 · 8 comments

Comments

@sonyahanson
Contributor

Thanks to Patrick and Bas for chatting about this over coffee after my lab meeting. Sounds like Lee had a very promising answer: Gaussian Processes!

Here are some potentially useful links I found by googling 'gaussian process outliers python':
https://bugra.github.io/work/notes/2014-05-11/robust-regression-and-outlier-detection-via-gaussian-processes/
https://ocefpaf.github.io/python4oceanographers/blog/2015/03/16/outlier_detection/

@jchodera
Member

I don't think this is what we want. GPs are great when there is a natural spatial relationship among the collected data points but that relationship has to be learned. We are dealing with a very different case: we already know what the relationship is, through the dissociation constant equations and mass conservation laws. Using a GP of the sort in those examples would not only "forget" that information, but also would not let us propagate any uncertainty about which points are outliers into the posterior.
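
To make "we know what the relationship is" concrete, here is a minimal sketch that assumes the simplest case, a 1:1 binding reaction P + L <-> PL. The thread does not say which binding model applies, so the function name `complex_concentration` and its arguments are hypothetical. Combining the dissociation constant Kd = [P][L]/[PL] with mass conservation for protein and ligand gives a quadratic in [PL], and the smaller root is the physical one:

```python
import numpy as np

def complex_concentration(p_total, l_total, kd):
    """Equilibrium [PL] for P + L <-> PL (assumed 1:1 binding).

    Mass conservation: p_total = [P] + [PL], l_total = [L] + [PL].
    Dissociation constant: kd = [P][L] / [PL].
    """
    b = p_total + l_total + kd
    # Smaller root of [PL]^2 - b*[PL] + p_total*l_total = 0 is the physical one.
    return 0.5 * (b - np.sqrt(b ** 2 - 4.0 * p_total * l_total))

# Hypothetical titration: fixed protein, increasing ligand (same units as kd).
l_total = np.array([0.1, 1.0, 2.0, 10.0])
pl = complex_concentration(1.0, l_total, 0.5)
```

A curve like this has only a few physically meaningful parameters, which is the information a generic GP fit would throw away.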

@jchodera
Member

Instead, I think we should use an approach like this, where there is a prior on the fraction of outliers and the outlier distribution has a mean and variance that is inferred (and marginalized out) during MCMC sampling:
http://www.astroml.org/book_figures/chapter8/fig_outlier_rejection.html
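
A rough sketch of that kind of model, under stated assumptions: each point comes either from the known model plus measurement noise, or from a broad Gaussian outlier distribution whose mean and width are sampled along with everything else. Summing the two components marginalizes the per-point outlier flag. A straight line stands in for the real binding model here, the parameter names and prior bounds are made up for illustration, and the hand-rolled Metropolis sampler is only there to keep the example dependency-free.

```python
import numpy as np

def log_normal(y, mean, var):
    """Elementwise log density of a normal distribution."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def log_components(theta, x, y, sigma):
    """Per-point log weights of the inlier and outlier components."""
    slope, intercept, frac_out, mu_out, log_sd_out = theta
    # Inlier: the known model (a line, as a stand-in) with noise sigma.
    log_in = np.log1p(-frac_out) + log_normal(y, slope * x + intercept, sigma ** 2)
    # Outlier: broad Gaussian with free mean and width.
    var_out = np.exp(2.0 * log_sd_out) + sigma ** 2
    log_out = np.log(frac_out) + log_normal(y, mu_out, var_out)
    return log_in, log_out

def log_posterior(theta, x, y, sigma):
    """Mixture log-likelihood with flat priors inside assumed bounds."""
    frac_out, log_sd_out = theta[2], theta[4]
    if not (0.0 < frac_out < 1.0 and -5.0 < log_sd_out < 5.0):
        return -np.inf
    log_in, log_out = log_components(theta, x, y, sigma)
    # logaddexp sums the two components, marginalizing each outlier flag.
    return np.sum(np.logaddexp(log_in, log_out))

def outlier_probability(theta, x, y, sigma):
    """Probability that each point is an outlier, given theta."""
    log_in, log_out = log_components(theta, x, y, sigma)
    return np.exp(log_out - np.logaddexp(log_in, log_out))

def metropolis(log_post, theta0, step, n_steps, rng):
    """Plain random-walk Metropolis; theta0 must have finite log_post."""
    theta = np.asarray(theta0, dtype=float)
    current = log_post(theta)
    chain = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step * rng.normal(size=theta.size)
        candidate = log_post(proposal)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        chain[i] = theta
    return chain

# Synthetic data: a line with Gaussian noise and one injected outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + sigma * rng.normal(size=x.size)
y[5] += 15.0

theta0 = np.array([2.0, 1.0, 0.1, 10.0, np.log(10.0)])
chain = metropolis(lambda t: log_posterior(t, x, y, sigma), theta0, 0.05, 2000, rng)
# Average over samples so uncertainty in theta carries into the outlier calls.
p_out = np.mean([outlier_probability(t, x, y, sigma) for t in chain[500:]], axis=0)
```

Because `p_out` is averaged over the chain rather than computed at a single best fit, the uncertainty about which points are outliers ends up in the posterior instead of being fixed by a hard cut.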

@jchodera
Member

But before we even talk about models, we absolutely need to collect some examples of the outliers and look at them to see what they tell us about the nature of the data.

@sonyahanson
Contributor Author

Just making a note here that this is something we should keep at the front of our minds.

@jchodera
Member

jchodera commented Jun 6, 2016

Agreed! Would be great to compile a list of data with outliers to find a strategy that works!

@jchodera
Member

jchodera commented Aug 1, 2016

Awesome! This is exactly what we need to make this work! Thanks!

@sonyahanson
Contributor Author

@jchodera has an idea about Bayesian outlier detection that he is interested in implementing.


2 participants