
Issue/838 hmm #840

Open: wants to merge 7 commits into master
Conversation

charlesm93 (Contributor)

Submission Checklist

  • Builds locally
  • New functions marked with <<{ since VERSION }>>
  • Declare copyright holder and open-source license: see below

Summary

Addresses issue #838 and updates user-doc on HMMs.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Charles Margossian, Simons Foundation

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@bob-carpenter (Member)

I took a look through this and while I'm in favor of minimal examples, this one's a bit too minimal. I would really like to see the model laid out in more detail than the conditional distributions. Here are some concrete suggestions:

  1. Use z for latent discrete parameters. We already use x for covariates, so it's confusing to use x. I've used z elsewhere in the user manual for this in the latent discrete parameter chapter.

  2. Define the complete data likelihood $p(y, z \mid \phi) = p(z \mid \phi) \, p(y \mid z, \phi)$ and then the marginalization $p(y \mid \phi)$ that we actually fit.

  3. Mention that $\phi$ is actually several parameters.

  4. I find the constrained version of the transition matrix where you insert zeros makes this first example too challenging. If you want to discuss that case, I'd suggest another section after this one where you talk about imposing structural zeros. Otherwise it makes the simple case seem too complicated. And then you can just define

```stan
array[3] simplex[3] gamma_arr;
matrix[3, 3] gamma;
for (n in 1:3) {
  gamma[n] = gamma_arr[n]';  // transpose: a simplex is a (column) vector, matrix rows are row_vectors
}
```
  5. For the doc, those mu and sigma are not just the measurement model---that's all the error terms.

  6. Wherever you have repetition, use loops. It's less error prone and clearer that it's a homogeneous operation:

```stan
for (n in 1:N) {
  for (k in 1:3) {
    log_omega[k, n] = normal_lpdf(y[n] | mu[k], sigma);
  }
}
```
  7. Given that you're tying the parameter sigma across outputs, you need to mention that. I'd recommend just keeping this simple with a vector of sigma values.

  8. You can't just say "computes the relevant log marginal distribution"---you have to say what that is. I don't mean including the marginalization algorithm, I just mean as I wrote it above.

  9. For more details, since -> For more details, see. Also, I wouldn't say "corresponding case study", I'd just say it's a case study on HMMs. And then you should put it in the bibtex file and cite it properly with reference to Ben (the author). And you should cite that you "borrowed" the example from Ben Bales's case study.
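As a sanity check on point 2, the marginalization $p(y \mid \phi) = \sum_z p(z \mid \phi) \, p(y \mid z, \phi)$ can be verified numerically: the forward algorithm (what `hmm_marginal` implements internally) agrees with brute-force summation of the complete-data likelihood over all $K^N$ paths. A minimal sketch, not part of the PR; all parameter values and the normal emission model here are illustrative.

```python
# Illustrative check: forward algorithm vs. brute-force marginalization
# of the complete-data likelihood p(y, z | phi) over all latent paths z.
import itertools
import numpy as np

rng = np.random.default_rng(0)

K, N = 3, 5                                  # number of states, observations
rho = np.array([0.5, 0.3, 0.2])              # initial distribution p(z_1 = k)
Gamma = np.array([[0.8, 0.1, 0.1],           # right-stochastic transition matrix
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])
mu, sigma = np.array([-2.0, 0.0, 2.0]), 1.0  # toy normal emission parameters
y = rng.normal(size=N)

def emis(k, n):
    """Emission density p(y_n | z_n = k), here a normal."""
    return np.exp(-0.5 * ((y[n] - mu[k]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Brute force: sum p(z | phi) p(y | z, phi) over all K^N paths z.
brute = 0.0
for z in itertools.product(range(K), repeat=N):
    p = rho[z[0]] * emis(z[0], 0)
    for n in range(1, N):
        p *= Gamma[z[n - 1], z[n]] * emis(z[n], n)
    brute += p

# Forward algorithm: alpha[k] = p(y_{1:n}, z_n = k | phi), O(N K^2) instead of O(K^N).
alpha = rho * np.array([emis(k, 0) for k in range(K)])
for n in range(1, N):
    alpha = (alpha @ Gamma) * np.array([emis(k, n) for k in range(K)])
forward = alpha.sum()

print(np.isclose(brute, forward))
```

The two computations agree to floating-point precision; in practice the forward recursion is done on the log scale for numerical stability.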

@charlesm93 (Contributor, Author)

I'm rewriting the code to make the transition matrix less constrained (per comment 4) and I wanted to check: we don't have a stochastic matrix type, right?

@WardBrian (Member)

We have row- and column-stochastic matrix types, but not doubly stochastic yet: https://mc-stan.org/docs/reference-manual/types.html#stochastic-matrices

@charlesm93 (Contributor, Author)

@bob-carpenter I implemented your feedback.

Some questions:

  1. What exactly is the difference between a measurement and an error model? I'm ok to use either term, and I know they have different conceptual implications, but I'm wondering if they have a formal definition.

  2. I'm keeping sigma a scalar. I'm not sure it's simpler conceptually to pass it as a vector.

@bob-carpenter (Member)

For (5), I think the idea's that there are three sources of error: measurement error, modeling error, and sampling error. For example, sampling error arises when you subsample a population and use that for estimation. You get modeling error if you use a linear regression for a relationship that's not linear or use normal errors when the errors are skewed, and so on. If you're weighing things with a scale and you know the scale's biased to the high side, you can correct that measurement error. You can explicitly add a measurement error model if you know your measurement model (e.g., gravitational lensing is part of the measurement error model; your work with Bruno et al. on deconvolving galactic dust is part of the measurement model for the CMB, etc.).

@bob-carpenter (Member) left a review comment

There's a ton of little things to fix, but nothing major.

@@ -294,6 +294,21 @@ pagetitle: Alphabetical Index
- <div class='index-container'>[distribution statement](unbounded_discrete_distributions.qmd#index-entry-0c7465aa1beceb6e7e303af36b60e2b847fc562a) <span class='detail'>(unbounded_discrete_distributions.html)</span></div>


<a id='beta_neg_binomial_cdf' href='#beta_neg_binomial_cdf' class='anchored unlink'>**beta_neg_binomial_cdf**:</a>
@bob-carpenter (Member):

Why is this PR touching negative binomial? Are you up to date with the main branch?

latent state $k$ is parameterized by a $V$-simplex $\phi_k$. The
observed output $y_t$ at time $t$ is generated based on the hidden
state indicator $z_t$ at time $t$,
When $z_{1:N}$ is continuous, the user can explicitly encode these distributions
@bob-carpenter (Member):

Isn't $z_{1:N} \in \{1, \ldots, K\}^N$ when $K$ is the number of states? I don't think it can be continuous.

Did you perhaps mean to say that the output $y_1, \ldots, y_T$ can be continuous? It can also be mixed discrete and continuous, it can be multivariate, etc.---it can really be anything.

Next, we introduce the $K \times K$ transition matrix, $\Gamma$, with
$$
\Gamma_{ij} = p(z_n = j \mid z_{n - 1} = i, \phi).
$$
@bob-carpenter (Member):

I think you need to mention that this is a right-stochastic matrix, where we now have that data type built in.

$$
Finally, we define the initial state $K$-vector $\rho$, with
$$
\rho_k = p(z_0 = k \mid \phi).
@bob-carpenter (Member):

I would mention at this point that it is common to take $\rho$ to be the stationary distribution of the HMM. Many packages, like movementHMM in R, let you set it to the stationary distribution by default. You can mention that the stationary distribution is the first eigenvector (that's why they called Google's PageRank algorithm, which computed the stationary distribution of their random web-walking model using an iterative algorithm, the "billion-dollar eigenvector"). It's the solution for $\pi$ in $\Gamma \pi = \pi$ for a left-stochastic transition matrix $\Gamma$.
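For a right-stochastic $\Gamma$ (rows summing to one, as in the chapter), the same stationary distribution is the normalized left eigenvector for eigenvalue 1, i.e., the solution of $\pi^\top \Gamma = \pi^\top$. A quick numerical illustration, with a made-up transition matrix not taken from the PR:

```python
# Illustrative: stationary distribution of a right-stochastic transition
# matrix Gamma as the left eigenvector with eigenvalue 1.
import numpy as np

Gamma = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.3, 0.6]])

# Left eigenvectors of Gamma are right eigenvectors of Gamma.T.
vals, vecs = np.linalg.eig(Gamma.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])  # Perron eigenvector, eigenvalue 1
pi = pi / pi.sum()                               # normalize onto the probability simplex

print(np.allclose(pi @ Gamma, pi))
```

Dividing by the sum both normalizes the eigenvector and fixes its sign, since all components of the Perron eigenvector share one sign.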

In the situation where the hidden states are known, the following
naive model can be used to fit the parameters $\theta$ and $\phi$.
As an example, consider a three-state model, with $K = 1, 2, 3$.
The observations are normally distributed with
@bob-carpenter (Member):

$K$ is an integer, so I think that should just be $K = 3$.

The model for the supervised data does not change; the unsupervised
data are handled with the following Stan implementation of the forward
algorithm.
The last function `hmm_marginal` takes in all the ingredients of the HMM,
@bob-carpenter (Member):

Eliminate the comma---English doesn't use a comma between two conjuncts unless there are more than two. So it's just "A and B", but it's either "A, B, and C" (Oxford style) or "A, B and C" (defective American style).

different from sample to sample in the posterior.
To obtain samples from the posterior distribution of $z$,
we use the generated quantities block and draw, for each sample $\phi$,
a sample from $p(z \mid y, \phi)$.
@bob-carpenter (Member):

You don't typically need draws from $z$. If you do need statistics involving $z$, it's often easier to compute them in expectation (i.e., take advantage of the Rao-Blackwellization you get from marginalizing). For example, if there are low-probability but non-zero $z$, they're very hard to evaluate with sampling. There are examples in the latent discrete parameters chapter.

You can also use the forward-backward algorithm to work out the marginals $p(z_n \mid y)$, and you can use Viterbi with backtracking to find the most probable $z$ given $y$.
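A hypothetical sketch of Viterbi with backtracking, not taken from the PR (all parameter values are illustrative), checked against brute-force enumeration of every path:

```python
# Illustrative Viterbi decoder: log-space dynamic programming plus
# backtracking, verified against brute-force search over all K^N paths.
import itertools
import numpy as np

rng = np.random.default_rng(1)
K, N = 3, 6
log_rho = np.log([0.5, 0.3, 0.2])
log_Gamma = np.log([[0.8, 0.1, 0.1],
                    [0.2, 0.6, 0.2],
                    [0.1, 0.3, 0.6]])
mu, sigma = np.array([-2.0, 0.0, 2.0]), 1.0
y = rng.normal(size=N)
# [K, N] log emission densities, up to an additive constant shared across states
log_omega = -0.5 * ((y[None, :] - mu[:, None]) / sigma) ** 2

# Forward pass: best[k, n] = max log-prob of any path ending in state k at time n.
best = np.zeros((K, N))
back = np.zeros((K, N), dtype=int)
best[:, 0] = log_rho + log_omega[:, 0]
for n in range(1, N):
    scores = best[:, n - 1][:, None] + log_Gamma      # [from, to]
    back[:, n] = np.argmax(scores, axis=0)            # best predecessor of each state
    best[:, n] = np.max(scores, axis=0) + log_omega[:, n]

# Backtracking: recover the argmax path from the stored predecessors.
path = [int(np.argmax(best[:, -1]))]
for n in range(N - 1, 0, -1):
    path.append(int(back[path[-1], n]))
path = path[::-1]

# Brute-force check.
def logp(z):
    lp = log_rho[z[0]] + log_omega[z[0], 0]
    for n in range(1, N):
        lp += log_Gamma[z[n - 1], z[n]] + log_omega[z[n], n]
    return lp

brute = max(itertools.product(range(K), repeat=N), key=logp)
print(path == list(brute))
```

The dynamic program costs $O(NK^2)$ rather than the $O(K^N)$ of enumeration, which is the same trick the forward algorithm uses with max replaced by sum.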

and with the draw in generated quantities, we obtain draws from
$p(\phi \mid y) p(z \mid y, \phi) = p(z, \phi \mid y)$.
It is also possible to compute the posterior probability of each hidden state,
that is $\text{Pr}(z_n = k \mid \phi, y)$.
@bob-carpenter (Member):

Typically this $\phi$ is marginalized out and we look at $p(z_n = k \mid y)$.

This function cannot be used to compute the joint probability
$\text{Pr}(z \mid \phi, y)$, because such calculation requires accounting
for the posterior correlation between the different components of $z$.
Therefore, `hidden_probs` should NOT be used to obtain posterior samples.
@bob-carpenter (Member):

NOT -> not (that'll make it italics, which is the standard way to add emphasis to typeset text).

This point is a bit more subtle, though. You can use this to obtain posterior draws of the marginals $z_n \sim p(z_n \mid y)$, but if you put together a sequence of draws from $p(z_1 \mid y), \ldots, p(z_N \mid y)$, you don't get a draw from $p(z \mid y)$.
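The subtlety shows up even in a tiny example (toy numbers, not from the PR): on a sticky two-state chain, the exact joint posterior $p(z \mid y)$ differs from the product of its marginals, because the components of $z$ are correlated a posteriori.

```python
# Illustrative: joint posterior p(z | y) vs. the product of the
# per-step marginals p(z_n | y) on a two-state, two-step toy HMM.
import itertools
import numpy as np

K, N = 2, 2
rho = np.array([0.5, 0.5])
Gamma = np.array([[0.9, 0.1],       # "sticky" transitions induce correlation
                  [0.1, 0.9]])
omega = np.array([[0.7, 0.7],       # toy emission table: p(y_n | z_n = k)
                  [0.3, 0.3]])

# Exact joint posterior by enumerating all K^N paths.
joint = {}
for z in itertools.product(range(K), repeat=N):
    p = rho[z[0]] * omega[z[0], 0]
    for n in range(1, N):
        p *= Gamma[z[n - 1], z[n]] * omega[z[n], n]
    joint[z] = p
norm = sum(joint.values())
joint = {z: p / norm for z, p in joint.items()}

# Marginals p(z_n = k | y) by summing the joint.
marg = np.zeros((N, K))
for z, p in joint.items():
    for n in range(N):
        marg[n, z[n]] += p

# The product of marginals underestimates the probability of staying in state 0.
print(joint[(0, 0)], marg[0, 0] * marg[1, 0])
```

With these numbers the joint probability of the path $(0, 0)$ is about 0.78 while the product of the marginals is about 0.67, so sampling each $z_n$ independently from its marginal overweights state switches.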

$\text{Pr}(z \mid \phi, y)$, because such calculation requires accounting
for the posterior correlation between the different components of $z$.
Therefore, `hidden_probs` should NOT be used to obtain posterior samples.
Instead, users should rely on `hmm_latent_rng`.
@bob-carpenter (Member):

Although not necessary, it's nice to have examples of how to do that. Like:

```stan
generated quantities {
  array[N] int<lower=1, upper=K> z = hmm_latent_rng(...fill-in params here to match example...);
}
```

I know this feels obvious, but it can be hard for the users to put the arguments and result types together.
