Sprint20 Add Random Variables (Using Abstract Syntax Tree) #243

nabriis · 2023-05-29T21:33:31Z

closes #202, #85, #29

Related to #250

Tracking of final changes to PR

17/11/2023

Consider making name of random variable different than parameter name.
Update logic in to_cuqi_format (removed again)
Explain condition method and gradient method
Explain geometry and dim etc.

18/11/2023

Consider input name to JointDistribution and BayesianProblem (components)
Consider if force_ndarray should detect distribution and give useful error message (potential common mistake), as well as Model.forward(Distribution)
Consdier if it should be par_name for Distribution. Maybe another name?
Disable condition related properties of RVs for now.
Remove __call__ method of Distribution. Rename to e.g. fix or given
Update demo files? (Moved to legacy folder with readme)
Check if text descriptions near updates from Dist to RV should be added (e.g. if we write stuff like. Define distribution, when we actually now define RV).

24/11/2023

Whiteboard meeting on RV API

nabriis · 2023-06-03T10:50:29Z

Hi @amal-ghamdi and @jakobsj. Following the seminar I have now updated the tutorial. Feel free to review

nabriis · 2023-06-14T11:31:31Z

Minor issue if you do

A = cuqi.model.LinearModel(lambda x: x, lambda y: y, 1, 1))
x = cuqi.distribution.Gaussian(0,1)
y = x+1
A@y # returns x+1

This is because A evaluates the lambda expression (which is identity) and thus simply returns y. This behavior is unexpected and LinearModel should have special behavior implemented for RandomVariable class.

EDIT: now fixed.

jakobsj

Quick approval. Few tiny comments inline. Looks really good, exciting new functionality. Merge pending more detailed review and approval from @amal-ghamdi

cuqi/distribution/_cmrf.py

cuqi/distribution/_gmrf.py

cuqi/utilities/_utilities.py

demos/tutorials/RandomVariables.py

amal-ghamdi

Thank you @nabriis for this rich feature. Really impressive work you have done here that I enjoyed reviewing and learning from!.

I have some comments here and there for your consideration.

cuqi/distribution/_cmrf.py

cuqi/distribution/_distribution.py

cuqi/distribution/_gamma.py

cuqi/distribution/_gmrf.py

cuqi/randomvariable/_randomvariable.py

nabriis · 2023-08-15T23:19:38Z

Need to decide if .rv should even be exposed at this point in time.

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

nabriis · 2023-11-18T21:30:50Z

Hello @jakobsj, @amal-ghamdi, @chaozg. I updated the code according to Jakobs comments. I also reverted the to_cuqi_format change, which greatly simplified the additional code to be reviewed (e.g. changes to specific distributions etc.).

…ibution to RandomVariable

jakobsj

Thank you @nabriis, I've run through the tutorial again and really like the updates you have made. Most points are now addressed. Running through did result in a number of new questions/points, not sure if bugs, please see specific comments. I also resolved many of the conversations. I will take a look at the remaining ones. I haven't got more time today so I think it makes sense if you can address this round of comments, and any from others, then after that I can take a look again, including at other parts of the code.

demos/tutorials/RandomVariables.py

jakobsj · 2023-11-20T16:40:15Z

I have gone through another time and resolved conversations that had been addressed. There are four old ones, I think all having to do with if_dist_force_rv. I think this functionality has changed back and forth to to_cuqi_array or similar, and I'd like to understand these changes and behavior better before resolving these.

…ments

Co-authored-by: Jakob Sauer Jørgensen <jakobsj@users.noreply.github.com>

nabriis · 2023-11-20T21:16:47Z

I have gone through another time and resolved conversations that had been addressed. There are four old ones, I think all having to do with if_dist_force_rv. I think this functionality has changed back and forth to to_cuqi_array or similar, and I'd like to understand these changes and behavior better before resolving these.

Thanks @jakobsj. I updated once again. In relation to the force_ndarray and to_cuqi_format these methods are used to have inputs to distributions conform to the expected format. After removing to_cuqi_format again, we are currently only forcing ndarray in scalar-type inputs. For example if the user creates a Gaussian like this:

X = Gaussian(0, 1)

both the mean and covariance will become ndarrays.

The reason to generalize to to_cuqi_format would be to add e.g. error handling for common pitfalls. For example:

# A Gaussian distribution with conditioning variable x (from cov) is created in line two:
x = Gaussian(0, 1).rv
y = Gaussian(0, x).rv

The above is as expected. However, in the following case we get odd behavior that we would like to catch

# A Gaussian distribution with a ??Gaussian distribution?? as cov is created in line two:
x = Gaussian(0, 1) # Forgot to put .rv
y = Gaussian(0, x).rv

This behavior was also present in current CUQIpy. We could catch this mistake by parsing inputs and if they were a Distribution object, throw error with helpful message asking if the user forgot .rv using a more general method named e.g. to_cuqi_format or parse_input or cuqify_input

Originally I had a if_dist_force_rv method that called .rv on distributions given as input to other distributions, but I now rather think we should throw an error with a helpful message instead.

amal-ghamdi

Thank you @nabriis again for this carefully and creatively designed PR, powerful and impressive work!

I left some comments throughout, feel free to address or immediately resolve those as it makes sense to you.

I have few general thoughts for your consideration:

I thought we can make par_name more specific and clear if we call it rv_name, if that is applicable in all cases.
I would assume we need issues to update other demo/plugin repos to address this change.

Well done!

demos/tutorials/Gibbs.py

amal-ghamdi · 2023-11-20T20:35:03Z

cuqi/distribution/_distribution.py

+    # The following methods define algebraic operations on distributions
+    # The return type is a RandomVariable instance


are these two comments outdated? if not, it seems a bit unclear

Good catch! Removed.

amal-ghamdi · 2023-11-20T20:46:41Z

cuqi/likelihood/_likelihood.py

+    @property
+    def _par_name(self):
+        """ Return name of likelihood """
+        return self.distribution._par_name
+
+    @_par_name.setter
+    def _par_name(self, value):
+        self.distribution._par_name = value


just curious if the property par_name is sufficient here? May be I am missing why we need property _par_name

Good catch. This is now removed and it helped simplify some checks in RV class, which checked for the private property _par_name where it should not have done so.

amal-ghamdi · 2023-11-20T20:49:44Z

cuqi/likelihood/_likelihood.py

-            self._name = _get_python_variable_name(self)
-        return self._name
+        if self._par_name is None:
+            self._par_name = _get_python_variable_name(self) # TODO


we have an issue #184 about refactoring UserDefinedLikelihood. I am not sure what the TODO here refer to but thought may be it can be added to that issue?

Its related. Currently the UserDefinedLikelihood can infer python variable names where normally after this change it was supposed to only be RV class that does that. However, we have some use cases, where users define their UserDefinedLikelihood using e.g.
y = UserDefinedLikelihood(...)

Assuming it gets the name y. I think we can add this info into the issue you mentioned.

cuqi/density/_density.py

amal-ghamdi · 2023-11-22T07:29:32Z

cuqi/randomvariable/_randomvariable.py

+    def get_conditioning_variables(self):
+        """ Get conditioning variables. """
+        if self.is_transformed:
+            raise NotImplementedError("Extracting conditioning variables is not implemented for transformed random variables")


Do you see this as high priority issue or not necessarily? if so, we can create an issue to address it.

I believe we should generate a tracking issue to track all of the features we have disabled and aim to create in the future for RV class. There are more than thus, but if we create the issue we can add bullets to it.

amal-ghamdi · 2023-11-22T07:38:37Z

cuqi/randomvariable/_randomvariable.py

+        new_variable._distributions = copy(self.distributions)
+        new_variable._tree = copy(self._tree)


are these two lines needed? Don’t they happen automatically after first line? not sure though.

Not, it is a shallow copy. So copying here ensures we get a new pointer to the distributions, meaning if we change internal parameters it only affects this new copied RV.

Think of it as a 2-level deep copy.

amal-ghamdi · 2023-11-22T09:36:20Z

demos/tutorials/RandomVariables.py

+# Providing data to the Bayesian problem creates a posterior as shown.
+#
+
+BP.set_data(y=y_obs)


Just a comment here, the resulting distribution BP is JointDistribution. I think this is expected because our posterior is very specific and not general to handle this case. Do we need/have an issue for that?

I can't recall. But yes its not a "Posterior" class, but a JointDistribution representing a posterior over joint set of variables

amal-ghamdi · 2023-11-22T11:01:25Z

demos/tutorials/RandomVariables.py

I experimented a bit with the rvs, and one scenario in inferring the distribution name came to mind (below). Depending on the flag, z2 name will be different. Is that something we want to capture in an issue? Also, is this is something we should prevent by not allowing creating more than one rv from a distribution (kind of 1 to 1 association in terms of class relationship).

Z_s = cuqi.distribution.Gaussian(0, lambda s: s) Z = Z_s(s=3) z = Z.rv flag = False if flag: print(z.name) z2 = Z.rv print(z2.name)

Good point here. I actually updated now so that .rv creates a copy of the distribution and does NOT introduce side effects to the original variables. Thus Z_s and Z in the above example does not get their parameter name updated. Instead its updated in z.dist.par_name.

amal-ghamdi · 2023-11-22T11:05:22Z

demos/tutorials/RandomVariables.py

+
+# %%
+######################################################################
+# Direct evaluation


May be we can clarify to the user here or in some other place in the tutorial that using () operator means different things when we talk about X and x, X(1) is logpdf and X.rv(1) is the evaluation as described here.

jakobsj · 2023-11-22T11:25:35Z

I believe I have now understood the points about forcing rv, etc, and I agree that in the spirit of being clear about when something is an RV and a distb respectively, we should not convert automatically between the two behind the scenes. Instead, as you suggest suitable error messages should be given when a user tries to mix.

jakobsj · 2023-11-22T11:32:42Z

I have spent the morning thinking more about distributions, random variables and the "call" syntax. I am not sure if this is the best place to capture these, but for now I will put them here. Please note, that I honestly feel the random variable is an amazing addition, and I am only giving it so much scrutiny, because it introduces such a fundamental change in the usage of CUQIpy, that we want to make sure everything remains (or becomes!) consistent.

This will be the first part, outlining some thoughts first on "call" syntax with only distributions, followed by using only random variables. I expect to follow up later (but out of time for now), with additional thoughts on potential cases of combining the two. Sorry if it reads slightly rambling at times, but I thought I'd sent it now, rather than let it sit until I have time to streamline it more.

Only distributions

Before introduction of RVs, we have distributions

X = Gaussian(0,1)

One can do

X(4)
EvaluatedDensity([-8.91893853])

in which the “call” syntax amounts to returning the scalar value of the pdf at the input value.

We can have a “simple” conditional distribution but eg omitting an input

Y = Gaussian(0)
Y(2) # or Y(cov=2)
CUQI Gaussian.

In which the “call” syntax amounts to conditioning on cov=2, and returning the resulting (unconditional) distribution.

Similarly using lambda function can set up more involved conditional distributions such as
Z = Gaussian(lambda t: 1/t)
Z(4)
CUQI Gaussian.

Two issues here:

The “call” syntax means two different things, depending on whether used with an unconditional or a conditional distribution. In the extreme, one can in fact employ both in a single call
Y(4)(3)
which first conditions on cov=4, then evaluates the pdf at 3. If one knows what is going on, this can be understood, but is probably very confusing to users.
For the conditional distribution with lambda function, it is unclear what
Z(4)
refers to, cov or t. Turns out one can in fact call with either! Followed by evaluation, we can see we get different results:
Z(cov=4)(0)
EvaluatedDensity([-1.61208571])

Z(t=4)(0)
EvaluatedDensity([-0.22579135])

and doing this without keyword we see that Z(4) corresponds to Z(t=4):
Z(4)(0)
EvaluatedDensity([-0.22579135])
as we get the same value.

It surprises me that using “cov” is permitted, when we have explicitly made Y conditional on t, instead of cov. Is that what we want?

Ideas to consider here: Replace both call syntax by more explicit syntax. And consider disallowing the usage of “cov” in conditioning when lambda function is used. And/or disallow positional inputs, allowing only keyword based for clarity.

Only random variables:

Introducing random variables (RVs):

X = Gaussian(0, 1)
x = X.rv

we can now use “call” syntax (of the RV) to do
x(4) # or x(x=4)
producing
4
because this is “direct evaluation”, meaning we fix x at 4, which means the value of x is indeed 4, ignoring that it also happens to be a Gaussian random variable.

For a transformed RV, eg.
y = (x + 10)**2
using the “call” syntax is consistent with this (and a bit clearer), since it also simply fixes the value of x, e.g.,
y(x=3)
produces 169, again ignoring that x is a Gaussian random variable.

Conceptually, this appears very similar to conditioning for distributions, in the sense that we fix one and given that evaluate the other. Except the transformed variable here is doing algebraic functions, not feeding through a distribution. But we do want to use RVs to specify relations between Rvs/distributions, not just algebraic manipulations. So we want to do eg
a = Gaussian(0,1).rv
b = Gaussian(0, 1/a).rv
which makes both RVs:
a ~ CUQI Gaussian.
b ~ CUQI Gaussian. Conditioning variables ['a'].
What happens if we use call syntax?
a(5) # or a(a=5)
5 which is consistent with the x above.
Same for b, which completely disregards that it is conditional on a:
b(5) # or b(b=5):
5
We get an error if we try
b(a=5)
ValueError: Expected arguments ['b'], got arguments {'a': 5}
So we cannot use “call” syntax to “directly evaluate” b for a fixed value of a, as in the case of y given a value of x above. This is probably because y only used algebraic operations, while b is a conditional distribution w.r.t. a. Currently, one then needs to do
b.condition(a=5)
yielding
b ~ CUQI Gaussian.
So now it has been conditioned on a. Maybe this usage is indeed not “direct evaluation”, since we end up with a fully specified distribution, not a value as in direct evaluation. But in general, it seems a subtle difference, and I could imagine more involved combinations of algebraic expressions and conditional distribution RVs for defining an RV. As well as wanting to call b.sample() and expect it to draw a sample from a, then use that in b. That gives an error at present, but mathematically nothing prevents us from expecting this to work.

An additional aspect is that before conditioning we have
print(b)
b ~ CUQI Gaussian. Conditioning variables ['a'].
but above, and also if we assign to, say c, we still get “b” printed:
c = b.condition(a=5)
print(c)
b ~ CUQI Gaussian.

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

…lgebra

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

…hat is not the original one prior to conditioning

amal-ghamdi requested review from amal-ghamdi and jakobsj May 31, 2023 07:24

nabriis force-pushed the sprint20_randomvariables_ast branch from 3cbec6e to 99974ac Compare June 3, 2023 07:36

jakobsj approved these changes Jun 23, 2023

View reviewed changes

amal-ghamdi reviewed Jul 3, 2023

View reviewed changes

nabriis added 10 commits August 16, 2023 01:05

Initial version of random variables (still needs some iteration)

0d2db47

Fix confict in imports

746a6f8

Add tests to random variable

0eac912

Add tests on distribution conditioning

ccb745f

Add forcing method from dist to rv in most distributions

0b5bb9a

Add basic abstract syntax tree

44a9590

Add initial version of AST to allow algebra between random variables

1564add

Add sample method

b194c6f

Add self contained ordered set

1db371a

Fix Model not working with RandomVariable class

3d73813

nabriis force-pushed the sprint20_randomvariables_ast branch from 6ab12d9 to 3d73813 Compare August 15, 2023 23:11

nabriis added 3 commits August 16, 2023 01:13

Removed unused import

c982ada

Remove utf issue characters

a453787

Make as random variable private

650919a

nabriis and others added 8 commits August 16, 2023 23:32

Update force ndarray to a more general cuqi format converter

9f490d9

Simplify wording

2948ac8

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

Make wording more clear

a8ad74f

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

Clarify definition of set

20700f0

Add docstrings to ast

917727e

Add further documentation on ast for RVs

89e335c

Explain index

55ea49f

Add comment explaining use of ordered set

0391b4e

nabriis added 2 commits November 18, 2023 22:19

Revert to_cuqi_format back to force_ndarray

7583139

update deconvolution example

bc7cb18

nabriis requested a review from jakobsj November 18, 2023 21:29

nabriis added 2 commits November 19, 2023 10:12

Update text descriptions near .rv method to reflect update from Distr…

7b8dfb7

…ibution to RandomVariable

Move demo files to legacy folder

4ebce0b

jakobsj requested changes Nov 20, 2023

View reviewed changes

nabriis and others added 4 commits November 20, 2023 21:35

Add one more condition to check if RV is evaluated with too many argu…

d02626e

…ments

Apply suggestions from jakj code review on RV tutorial

4d93e45

Co-authored-by: Jakob Sauer Jørgensen <jakobsj@users.noreply.github.com>

Update parameter for conditioning example in RV tutorial

60bc852

Set name to underlying distribution if extracting name from RV

c756eb5

nabriis requested a review from jakobsj November 20, 2023 21:20

amal-ghamdi reviewed Nov 22, 2023

View reviewed changes

nabriis and others added 13 commits November 22, 2023 22:04

Update wording in Gibbs tutorial

3e63151

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

Update error message for conditioned density

b33fbba

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

Update wording in deconvolution howto

1d358a6

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

distinguish from logical negation in NegateNode

9fafa02

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

Update name consistancy check for density

8b5e4e4

Ensure .rv method creates copy of distribution to avoid side effects

4ed3200

Removed unnecceary code comment

fc040f6

Remove some technical debt related to _par_name checking in RV class

99152a1

Update RV docs to reflect distributions no longer creating RVs from a…

fbf1ea8

…lgebra

Update cuqi/randomvariable/_randomvariable.py

67303a2

Co-authored-by: amal-ghamdi <amal.m.alghamdi@gmail.com>

Update error message when attemptign to set name of random variable t…

4ad59d0

…hat is not the original one prior to conditioning

Fix issue with order of name inference

bf32c18

Update formating of tutorial for RV

8ab8323

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sprint20 Add Random Variables (Using Abstract Syntax Tree) #243

Sprint20 Add Random Variables (Using Abstract Syntax Tree) #243

nabriis commented May 29, 2023 •

edited

Loading

nabriis commented Jun 3, 2023

nabriis commented Jun 14, 2023 •

edited

Loading

jakobsj left a comment

amal-ghamdi left a comment

nabriis commented Aug 15, 2023

nabriis commented Nov 18, 2023

jakobsj left a comment

jakobsj commented Nov 20, 2023

nabriis commented Nov 20, 2023 •

edited

Loading

amal-ghamdi left a comment

amal-ghamdi Nov 20, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 20, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 20, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 22, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 22, 2023

nabriis Nov 22, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 22, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 22, 2023

nabriis Nov 22, 2023

amal-ghamdi Nov 22, 2023

jakobsj commented Nov 22, 2023

jakobsj commented Nov 22, 2023

		# The following methods define algebraic operations on distributions
		# The return type is a RandomVariable instance

		new_variable._distributions = copy(self.distributions)
		new_variable._tree = copy(self._tree)

Sprint20 Add Random Variables (Using Abstract Syntax Tree) #243

Are you sure you want to change the base?

Sprint20 Add Random Variables (Using Abstract Syntax Tree) #243

Conversation

nabriis commented May 29, 2023 • edited Loading

Tracking of final changes to PR

nabriis commented Jun 3, 2023

nabriis commented Jun 14, 2023 • edited Loading

jakobsj left a comment

Choose a reason for hiding this comment

amal-ghamdi left a comment

Choose a reason for hiding this comment

nabriis commented Aug 15, 2023

nabriis commented Nov 18, 2023

jakobsj left a comment

Choose a reason for hiding this comment

jakobsj commented Nov 20, 2023

nabriis commented Nov 20, 2023 • edited Loading

amal-ghamdi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakobsj commented Nov 22, 2023

jakobsj commented Nov 22, 2023

nabriis commented May 29, 2023 •

edited

Loading

nabriis commented Jun 14, 2023 •

edited

Loading

nabriis commented Nov 20, 2023 •

edited

Loading