
Implement Truncated RVs #131

Open
wants to merge 2 commits into base: main

Conversation

ricardoV94
Contributor

@ricardoV94 ricardoV94 commented Apr 6, 2022

Closes #96

@ricardoV94 ricardoV94 force-pushed the truncated_rvs branch 6 times, most recently from d67ee37 to cc14e41 Compare April 6, 2022 16:08
@codecov

codecov bot commented Apr 6, 2022

Codecov Report

Base: 96.12% // Head: 96.23% // Increases project coverage by +0.11% 🎉

Coverage data is based on head (5ef727c) compared to base (7890228).
Patch coverage: 98.23% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #131      +/-   ##
==========================================
+ Coverage   96.12%   96.23%   +0.11%     
==========================================
  Files          12       13       +1     
  Lines        1987     2100     +113     
  Branches      241      253      +12     
==========================================
+ Hits         1910     2021     +111     
  Misses         39       39              
- Partials       38       40       +2     
Impacted Files Coverage Δ
aeppl/truncation.py 98.23% <98.23%> (ø)


☔ View full report at Codecov.

@ricardoV94 ricardoV94 force-pushed the truncated_rvs branch 2 times, most recently from b8f681a to 773dc50 Compare April 6, 2022 17:03
@brandonwillard brandonwillard added the enhancement (New feature or request), important (used to indicate priority), and op-probability (involves the implementation of log-probabilities for Aesara `Op`s) labels Apr 6, 2022
@ricardoV94 ricardoV94 force-pushed the truncated_rvs branch 2 times, most recently from 982e469 to 77343ca Compare April 11, 2022 14:38
@ricardoV94 ricardoV94 marked this pull request as ready for review April 12, 2022 06:22
@ricardoV94
Contributor Author

Okay, this seems to be working reasonably well. Ready for review

@ricardoV94 ricardoV94 force-pushed the truncated_rvs branch 4 times, most recently from ad22781 to 697af2f Compare April 12, 2022 17:43
Member

@brandonwillard brandonwillard left a comment


This looks great; it only needs to be refactored to use RandomStreams and return updates.

@ricardoV94
Contributor Author

Added RandomStream as a separate commit for review.

brandonwillard
brandonwillard previously approved these changes Apr 29, 2022
@ricardoV94 ricardoV94 marked this pull request as draft June 16, 2022 12:24
@rlouf
Member

rlouf commented Nov 2, 2022

AFAIR, truncation means that you cannot observe certain data, and that you know that before observation. Censoring is when we don't know the value of every data point, but only that it's greater (or smaller) than a given value.

With that in mind it seems to me that a natural implementation for truncation would be:

import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream(0)
x_rv = srng.normal(0, 1)
y = at.clip(x_rv, -1, 1)

Here we are specifying that we know that y can only take values between -1 and 1. To see this, it is enough to consider prior predictive sampling: y can only take values between -1 and 1. In the censored case, the prior predictive distribution should also return values that are out of bounds. I think the censored/truncated interface should be revamped as it currently doesn't reflect the mathematical meaning of these quantities.

at.clip returns its endpoints and doesn't reject the samples so that wouldn't work.

@ricardoV94
Contributor Author

ricardoV94 commented Nov 2, 2022

Truncation means you cannot observe a value in a certain interval, and you wouldn't know if it occurred. The generative graph involves rejection sampling (if you want a fixed data size) or subtensoring (if you don't care about a fixed data size).

Censoring means you cannot observe a value in a certain interval, but you know when that occurs. The generative process involves clipping or rounding (or any other encoding that loses resolution but not the number of datapoints).

There is a good pymc example that explains the difference: https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/GLM-truncated-censored-regression.html

The Stan manual is also nice:
https://mc-stan.org/docs/stan-users-guide/truncated-data.html
https://mc-stan.org/docs/stan-users-guide/censored-data.html
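The two generative processes described above can be sketched outside of Aesara with plain NumPy (the helper names are illustrative, not part of this PR): truncation resamples until every value falls inside the bounds, while censoring keeps every draw but records out-of-bounds values as the bound itself.

```python
import numpy as np

def truncated_normal(rng, size, lower=-1.0, upper=1.0):
    # Rejection sampling: draws outside [lower, upper] are discarded and
    # redrawn, so the sample size stays fixed and every value lies inside.
    out = np.empty(size)
    filled = 0
    while filled < size:
        draws = rng.normal(0, 1, size=size)
        keep = draws[(draws >= lower) & (draws <= upper)]
        n = min(keep.size, size - filled)
        out[filled:filled + n] = keep[:n]
        filled += n
    return out

def censored_normal(rng, size, lower=-1.0, upper=1.0):
    # Clipping: every draw is kept, but values beyond a bound are recorded
    # as the bound itself, so probability mass piles up at the endpoints.
    return np.clip(rng.normal(0, 1, size=size), lower, upper)

rng = np.random.default_rng(0)
xt = truncated_normal(rng, 1000)
xc = censored_normal(rng, 1000)
print(np.all((xt >= -1) & (xt <= 1)))  # True: all samples inside the interval
```

Note that both keep the sample size fixed; the difference is that censoring loses resolution at the bounds while truncation removes the out-of-bounds region entirely.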

@rlouf
Member

rlouf commented Nov 2, 2022

Take prior predictive samples from the code example I shared: where do the values lie, and what does this model generate?

Truncation is about the generative model for your data, censoring is about how your data is observed.

@ricardoV94
Contributor Author

ricardoV94 commented Nov 2, 2022

Your example code generates censored data

@rlouf
Member

rlouf commented Nov 2, 2022

To generate censored data you generate from the full distribution (normal(0,1)) and then apply a >= threshold filter to your prior samples to mimic the observation process.

If that's not it, how do you call the process I just described?

@brandonwillard
Member

Are you guys essentially talking about how the endpoints of the clipped intervals are/should be handled/understood?

@rlouf
Member

rlouf commented Nov 2, 2022

Are you guys essentially talking about how the endpoints of the clipped intervals are/should be handled/understood?

Yes.

In this PR, are we talking about truncated random variables or random variables with a truncated distribution? There isn't a 1:1 map between these two concepts. We also call "truncated random variable" a variable $Y$ that is defined such that $Y=X$ if $X < c$ and $Y=c$ if $X \geq c$, where $c$ is a constant. This can be represented by at.clip, assuming in my example x_rv represents a random variable object (and not a distribution).

Regarding censoring, there's an implicit assumption about the meaning of the threshold values when using at.clip that is not clearly stated, and not completely obvious. Saying that the value is equal to the value of the threshold is not the same as saying that it is greater than the threshold, these two correspond to different types. This probably touches on broader design concerns wrt the representation of these objects in the library.

@brandonwillard
Member

brandonwillard commented Nov 2, 2022

There's an implicit assumption about the meaning of the bounds when using at.clip that is not clearly stated, and not completely obvious. Saying that the value is equal to the value of the threshold is not the same as saying that it is greater than the threshold, these two correspond to different types. This probably touches on a deeper problem with the representation of these objects in the library.

Yes, there are some fundamental representation issues involved with using existing Aesara Ops for these types of derived measures. Simply put, if samples from the original model graphs (e.g. that use clip or whatever else) don't match the measures/densities we assign them in AePPL, then we have a representation issue that needs to be fixed, and that might be the case with the representations of truncation and/or censoring used here. If so, the easy fix is to create graphs or Ops that explicitly represent truncation and/or censoring. As I recall, that's what was done here, though.

@ricardoV94
Contributor Author

ricardoV94 commented Nov 2, 2022

There's an implicit assumption about the meaning of the threshold values when using at.clip that is not clearly stated, and not completely obvious. Saying that the value is equal to the value of the threshold is not the same as saying that it is greater than the threshold, these two correspond to different types. This probably touches on broader design concerns wrt the representation of these objects in the library.

The logprob is (hopefully) correct for a data generative process that uses at.clip. This is somewhat analogous to a censoring process, but it's not the only way of obtaining censored data, and it is also perhaps not the most intuitive. In a recent PR I added censoring by rounding and took away the emphasis on clipping as censoring; I just called it a measurable clip or something.

Back to clip: some other packages use two vectors of data, one with the censored values and another with boolean/categorical values that indicate whether each value was censored or not. You could try to handle this in aeppl as well.

Those have slightly different meanings at the edges. In the clip case you assume edge values could only have come from censoring for continuous RVs (ignoring precision issues; it's a pdf vs a pmf), and a mixture for discrete RVs (IIRC?), whereas in the double-vector approach you don't have to make that assumption for continuous RVs, or count a mixture for discrete RVs, because the source of the value is disambiguated by the user. The double vector retains more information about the data generating process.
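A small NumPy sketch of the two encodings (variable names are illustrative): the clip encoding carries one vector in which a value equal to the bound is ambiguous, while the double-vector encoding carries the observed values plus an explicit flag that disambiguates them.

```python
import numpy as np

rng = np.random.default_rng(42)
upper = 1.0
latent = rng.normal(0, 1, size=8)  # the unobservable uncensored draws

# Clip encoding: a single vector; an observation exactly equal to the
# bound is *assumed* to have been censored.
clipped = np.clip(latent, None, upper)

# Double-vector encoding: observed values plus an explicit indicator,
# so a value equal to the bound stays unambiguous.
censored_flag = latent > upper
observed = np.where(censored_flag, upper, latent)

# Both encodings agree on the recorded values...
assert np.allclose(clipped, observed)
# ...but only the second says *which* points were actually censored.
print(censored_flag)
```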

Anyway, this PR was about truncation not censoring.

@rlouf
Member

rlouf commented Nov 2, 2022

Anyway, this PR was about truncation not censoring.

You're right; I will open a separate discussion for my remaining questions about censoring.

This is still relevant and important here:

In this PR, are we talking about truncated random variables or random variables with a truncated distribution? There isn't a 1:1 map between these two concepts. We also call "truncated random variable" a variable $Y$ that is defined such that $Y=X$ if $X < c$ and $Y=c$ if $X \geq c$, where $c$ is a constant. This can be represented by at.clip, assuming in my example x_rv represents a random variable object (and not a distribution).

@ricardoV94
Contributor Author

ricardoV94 commented Nov 3, 2022

Anyway, this PR was about truncation not censoring.
This is still relevant and important here:

In this PR, are we talking about truncated random variables or random variables with a truncated distribution? There isn't a 1:1 map between these two concepts. We also call "truncated random variable" a variable $Y$ that is defined such that $Y=X$ if $X < c$ and $Y=c$ if $X \geq c$, where $c$ is a constant. This can be represented by at.clip, assuming in my example x_rv represents a random variable object (and not a distribution).

I think that still refers to censoring in the literature, not truncation. It's just one-sided censoring instead of two-sided censoring. You can obtain those graphs with y = at.clip(x, -np.inf, c). That's considered in the clip_logprob to generate a more efficient expression. There is also an issue for arbitrary censoring via set_subtensor that you might be interested in: #177

In general, if you can lose information about the values of your data but not the size of your data, you are always talking about censoring.

@rlouf
Member

rlouf commented Nov 3, 2022

Does this mean this PR is about RVs with a truncated probability distribution, then?

@ricardoV94
Contributor Author

Does this mean this PR is about RVs with a truncated probability distribution, then?

I guess so. I am not sure what the difference is between RVs and distributions that you mentioned above. It's definitely about truncation, not censoring.

@rlouf
Member

rlouf commented Nov 3, 2022

Ok, that's all I needed to know, and it's fine as long as it is clearly stated. I believe there is a difference between RVs with a truncated distribution (the sample space is the truncated interval) and a truncated RV, which is the product of another RV (whose sample space can be the real line) with an indicator function. At the very least, if there is an equivalence, it is not obvious.

When the terminology in the literature is so confused, here between measure-theory people and people who only work with densities, it is important to be very specific about the relation between our representation and the different mathematical representations. This is not about splitting hairs for the sake of it, but rather about not ending up with the confusion and ambiguities that PPLs are usually riddled with.
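To make the "RV with a truncated distribution" reading concrete: its density is the base density renormalized over the truncation interval, i.e. $f(x) / (F(b) - F(a))$ on $[a, b]$ and zero outside. A minimal pure-Python sketch for a standard normal (the helper names are illustrative, not AePPL's API):

```python
import math

def norm_logpdf(x):
    # log density of the standard normal
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def norm_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_norm_logpdf(x, lower, upper):
    # Base log density minus the log of the probability mass inside the
    # interval; -inf outside, since the truncated RV never lands there.
    if x < lower or x > upper:
        return -math.inf
    return norm_logpdf(x) - math.log(norm_cdf(upper) - norm_cdf(lower))
```

The renormalization is what distinguishes this from a clip logprob: clipping leaves point masses at the bounds, while truncation rescales the density between them.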

@brandonwillard brandonwillard force-pushed the truncated_rvs branch 3 times, most recently from da00b21 to cee8573 Compare January 18, 2023 19:08
@rlouf
Member

rlouf commented Jan 20, 2023

As I expressed above, I have strong reservations when it comes to this PR and the meaning of the operator that is proposed. The problem can be seen clearly when comparing:

import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream()
x = at.random.uniform(0, 10, name="x", size=100)
xt, _ = truncate(x, lower=5, upper=15, srng=srng)

to:

import aesara.tensor as at
from aesara.tensor.random.utils import RandomStream

srng = RandomStream(0)
x = srng.uniform(0, 10)
xt = at.clip(x, 5, 15)

In the latter case, if we assume that $x$ represents a random variable the code can be correctly read as defining a new random variable $xt$ such that $xt = \operatorname{clip} \circ \operatorname{UniformRV}$. But if we admit this here, we also have to interpret the first snippet as defining a new random variable $xt$ such that $xt = \operatorname{truncate} \circ \operatorname{UniformRV}$. This is obviously not correct, and that is because $\operatorname{truncate}$ operates on measures (if we follow the standard definition).

I think that the abstractions that are present today in Aesara do not allow us to work on measure transformations since measures are not first-class citizens, but are present only implicitly.

Here is a modest proposal to alleviate these limitations. In this proposal we represent e.g. the measure with a standard normal density wrt the Lebesgue measure as

x_m = normal(0, 1)

the $\operatorname{truncate}$ operator acts on measures, so we can define the truncated measure as

x_t = truncate(normal(0, 1), lower=0)

to bridge the gap with the existing RandomVariable API in Aesara, we define the $\operatorname{sample}$ operator which takes a measure and returns an independent random variable with this measure each time it is called:

x_rv = sample(rng, normal(0, 1))

here we retain the existing ambiguity in Aesara between random variables and elements of $E$ (which does not matter mathematically). The RandomStream interface can be retained with

srng = at.random.RandomStream(0)
x_rv = srng.sample(normal(0, 1))

So the first code snippet above could be rewritten:

import aesara.tensor as at
import aesara.tensor.random as ar

srng = ar.RandomStream(0)
x_m = truncate(ar.uniform(0, 10), lower=5, upper=15)
xt = srng.sample(x_m)

This interface has a few nice incidental properties:

  • The size argument gets a new interpretation. Doing x_rv = srng.sample(normal(0, 1), size=(10,)) means that we are defining 10 independent random variables with the same associated measure normal(0, 1) (we should probably rename it to iid or something similar);
  • Provided we can make the RandomStream interface work this way, it would be much easier for one to add a custom measure and use it with this interface;
  • I have a feeling this would make it easy to go the "explicitly representing rng key advancement" way if we ever choose to.

I can write an extended API proposal if you think this makes sense, and try to work out the potential challenges linked to this representation of probabilistic objects in Aesara and AePPL.
