We can learn a lot about our function by measuring its values at various locations. But we still have uncertainty, because...
- ...a continuous function has (uncountably) infinitely many values, but we can only perform finitely many measurements.
- ...even the values we do measure are often subject to noise.
The best we can hope for is to compute the probability that any given candidate function is the true curve.
Using probabilities to quantify uncertainty is known as *Bayesian* analysis. `gppois` performs Bayesian uncertainty analysis for continuous functions.
To find the probability that a given function is the true curve, we ask two questions.
- How well does it fit the data we measured?
- How plausible is it in the first place?
These questions have standard names in Bayesian analysis:
- The *likelihood* measures how well a candidate function fits the measured data.
- The *prior* measures how plausible it is in the first place.
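In symbols, Bayes' theorem combines the answers to these two questions into a posterior probability for each candidate function $f$:

```latex
P(f \mid \text{data}) \;\propto\; \underbrace{P(\text{data} \mid f)}_{\text{likelihood}} \;\times\; \underbrace{P(f)}_{\text{prior}}
```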
The likelihood depends on the noise model.
The easiest case is i.i.d. Gaussian noise. The computation becomes extremely simple: we only have to invert a matrix to get probabilities for all our curves.
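To make the "invert a matrix" step concrete, here is a minimal sketch of GP regression under i.i.d. Gaussian noise, in plain R. All names and values here are illustrative assumptions for this sketch; this is not the `gppois` API.

```r
# Squared-exponential covariance between every pair of points
# (one common choice of covariance function).
k <- function(x1, x2, ell=1.0, sigma_f=1.0) {
  sigma_f^2 * exp(-outer(x1, x2, "-")^2 / (2 * ell^2))
}

set.seed(1)
x       <- c(0, 1, 2, 3)                     # measurement locations
y       <- sin(x) + rnorm(length(x), sd=0.1) # noisy measured values
sigma_n <- 0.1                               # assumed noise level
x_star  <- seq(0, 3, by=0.5)                 # prediction locations

# The single matrix we must invert: covariance of the data,
# plus the noise variance on the diagonal.
K      <- k(x, x) + sigma_n^2 * diag(length(x))
K_star <- k(x_star, x)

mean_post <- K_star %*% solve(K, y)  # posterior mean at x_star
cov_post  <- k(x_star, x_star) - K_star %*% solve(K, t(K_star))  # posterior covariance
```

Everything falls out of a single linear solve against `K`; the noise level simply adds to the diagonal.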
Poisson noise (a.k.a. "counting statistics") is harder: not only is the noise level different at different places, but it depends on the value of the true function! (And if we knew that...) The probabilities are still perfectly computable, but it requires MCMC simulation (and all its associated headaches).
Fortunately, there's a trick which transforms Poisson-noised data into i.i.d. Gaussian data. It's called the Anscombe Transform, and we use it extensively in this package.
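For reference, the transform itself is a one-liner. This is the standard textbook formula; the function name below is ours for illustration, not necessarily the one `gppois` exports.

```r
# The Anscombe transform: maps Poisson counts n to values whose noise
# is approximately Gaussian with constant variance 1 (for counts that
# aren't too small).
anscombe <- function(n) 2 * sqrt(n + 3/8)

# Quick check: Poisson draws with mean 20 have variance ~20,
# but their Anscombe-transformed variance is close to 1.
set.seed(42)
n <- rpois(100000, lambda=20)
var(n)            # roughly 20
var(anscombe(n))  # roughly 1
```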
To define what makes a function "plausible", we need to state our assumptions.
We prefer robust assumptions -- e.g., the function is continuous, and (in some sense) smooth. But we don't want to tie ourselves to a particular functional form without good reason. How can we assign probabilities to arbitrary curves, without assuming some functional form?
The answer is Gaussian Processes (see our paper, or this excellent freely-available text). We break a function into pieces: a function is simply a collection of values, indexed by a continuous variable. Each of these values has uncertainty, and therefore a probability distribution.
But the individual values only tell half the story. How they relate to each other is the real key.
- Values which are very close (in x) are strongly correlated
- Values which are far apart are practically independent
This gives us smoothness and continuity, without tying us to a particular functional form.
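The distance-dependent correlation above can be sketched with a squared-exponential covariance function (one common choice; `gppois` may offer others). The name and length scale below are illustrative.

```r
# Correlation between two function values depends only on their
# separation d in x, falling off over a length scale ell.
se_corr <- function(d, ell=1.0) exp(-d^2 / (2 * ell^2))

se_corr(0.1)  # nearby points: correlation near 1
se_corr(5)    # distant points: correlation near 0
```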
The `gp` in the name stands for Gaussian Processes, and the `pois` stands for Poisson noise.
There are several ways to install `gppois`. In any case, after installation, it can be loaded by typing `library("gppois")` at the prompt in an R session.
`gppois` is on CRAN. This makes it very easy to install.
Execute one of the following commands, depending on your preferences:
full install (recommended):
```r
# If you have R 2.15 or newer:
install.packages("gppois", dependencies=TRUE)
# If you have R 2.14.xx, uncomment and use this instead:
#install.packages("gppois", dependencies=c("Depends", "Imports", "LinkingTo", "Suggests"))
```
And you're done!
Alternatively, you can download the package file and install from the command line. Note that you will have to take this approach if you want to install an old version.
devtools / install_github
The `devtools` package includes a function, `install_github`, which will grab and install the latest version from GitHub. I have not tested this myself! However, it should be fairly straightforward.
Obviously, you will first need to install and load the `devtools` package. Type or copy-paste the following commands into an R session:

```r
# Install the 'devtools' package from CRAN
install.packages("devtools")
# Load the 'devtools' package for this session
library("devtools")
```
Now you can use `devtools` to install `gppois`. If you want to use the latest development version, enter the following command from inside an R session:

```r
install_github(repo="gppois", username="chiphogg", branch="develop")
```
If you just want the latest stable version, type the following instead:

```r
install_github(repo="gppois", username="chiphogg")
```
Note that the latter will almost always be equivalent to the version on CRAN.