
Conversation


@rlouf rlouf commented Sep 29, 2021

Adds the dual averaging algorithm that is commonly used to adapt the step size of HMC algorithms. Closes #34.

This one is completely independent from the rest of the code so I'll be working on it while we solve the current issues with HMC and NUTS.

Reference: http://webdoc.sub.gwdg.de/ebook/serien/e/CORE/dp2005_67.pdf
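For context, below is a minimal NumPy sketch of the dual averaging update in the parameterization commonly used for HMC step-size adaptation (Hoffman & Gelman, 2014), which builds on the Nesterov paper referenced above. The function name, signature, and default values (including `mu = log(10.0)`, i.e. an initial step size of 1) are illustrative assumptions and do not reflect the API added in this PR.

```python
# Hedged sketch of the dual averaging update for HMC step-size adaptation.
# Names, signature, and defaults are illustrative, not this PR's interface.
import numpy as np


def dual_averaging_step(
    state, alpha, target=0.65, gamma=0.05, t0=10, kappa=0.75, mu=np.log(10.0)
):
    """One update of the running statistic and the (log) step size.

    `state` is (t, log_eps, log_eps_avg, h_avg); `alpha` is the acceptance
    probability observed at the current iteration.
    """
    t, log_eps, log_eps_avg, h_avg = state
    t = t + 1
    eta = 1.0 / (t + t0)
    h_avg = (1.0 - eta) * h_avg + eta * (target - alpha)  # running average of (target - acceptance)
    log_eps = mu - np.sqrt(t) / gamma * h_avg             # proposed log step size
    w = t ** (-kappa)
    log_eps_avg = w * log_eps + (1.0 - w) * log_eps_avg   # averaged iterate used after warmup
    return t, log_eps, log_eps_avg, h_avg
```

In practice one would feed each warmup iteration's acceptance probability into this update and switch to the averaged step size, `exp(log_eps_avg)`, once adaptation stops.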

@rlouf rlouf added the enhancement New feature or request label Sep 29, 2021
@rlouf rlouf self-assigned this Sep 29, 2021

rlouf commented Sep 29, 2021

@brandonwillard As you can see in test_algorithm.py, I have to set the type of x_init to float64. If I set it to float32 I get an error because `scan` upcasts the parameters.

This also happens if I set `aesara.config.floatX = "float32"`. This is not the first time I have run into this upcasting annoyance, and I am wondering whether it is something you would be willing to address on the Aesara side?


codecov bot commented Sep 29, 2021

Codecov Report

Merging #35 (fa43113) into main (0d1d7c1) will not change coverage.
The diff coverage is 100.00%.

❗ Current head fa43113 differs from pull request most recent head 416ba7d. Consider uploading reports for the commit 416ba7d to get more accurate results

```
@@            Coverage Diff            @@
##              main       #35   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            9        10    +1     
  Lines          345       361   +16     
  Branches        14        14           
=========================================
+ Hits           345       361   +16     
```

| Impacted Files | Coverage Δ |
| --- | --- |
| aehmc/algorithms.py | 100.00% <100.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@brandonwillard
Member

> This also happens if I set `aesara.config.floatX = "float32"`. This is not the first time I have run into this upcasting annoyance, and I am wondering whether it is something you would be willing to address on the Aesara side?

We can definitely address this; however, I first need to understand the entire context of the upcasting. In general, there are some casting configuration options (e.g. cast_policy) and defaults that we need to revisit.

brandonwillard previously approved these changes Sep 29, 2021
```python
gradient = aesara.grad(value, x)
return update(gradient, step, x, x_avg, gradient_avg)

x_init = at.as_tensor(0.0, dtype="float64")
```
@brandonwillard brandonwillard Sep 29, 2021

Does this fail if you use aesara.config.floatX?

I'm guessing that the (CI and local) tests are running with aesara.config.floatX set to "float64", so that's why an explicit "float64" is fine, but setting this to aesara.config.floatX and changing that config option to "float32" (before loading Aesara) should not cause a problem.

If that's the upcasting issue you described, then, yes, that sounds like an Aesara issue. Just don't forget to set that option before loading Aesara; otherwise, it won't be active and you will get confusing casting issues. In other words, that option can't be properly changed after loading Aesara, because sometimes that value is used during class/type/object creation.

@rlouf (Member Author)

If I set aesara.config.floatX = "float32" right after the imports I still get an error if I don't specify x_init's type or set it to float32.

But I’m not sure what pytest does with the code so I'll try in a stand-alone script.

@brandonwillard brandonwillard Sep 29, 2021

Try it with a conftest.py setup like Aesara's, but with floatX set, of course, or try the same with the AESARA_FLAGS env variable. That will make sure the setting is available before the imports.
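A conftest.py along these lines could look like the sketch below; this is a minimal illustration of the env-variable approach rather than the contents of Aesara's own conftest.py, and the flag value is just the one being tested here.

```python
# conftest.py at the repository root -- a minimal sketch.
# Setting the environment variable here runs before any test module is
# imported, so Aesara picks up floatX the first time it is loaded.
import os

os.environ["AESARA_FLAGS"] = "floatX=float32"
```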

@rlouf rlouf Sep 30, 2021

I added the conftest.py file with floatX set to float32; I checked that print(aesara.config.floatX) in the test_dual_averaging function prints float32. However, the test still fails with the following error:

```
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (`outputs_info` in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32, while the result of the inner function (`fn`) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.
```

@rlouf (Member Author)

I found this related question with a simpler example on SO.

@brandonwillard (Member)

I just provided an answer to that question.


rlouf commented Oct 4, 2021

The type error has nothing to do with `scan`; it is a result of Aesara's default casting behavior. I don't know whether that behavior is good or bad, but it is surprising. The following piece of code:

```python
import aesara.tensor as at

t0 = 10
step = at.as_tensor(0, dtype="int32")
eta = 1.0 / (step + t0)
print(eta.dtype)
```

will print `float64`, even if `floatX` is set to `float32`.

On the other hand, the following code

```python
import aesara.tensor as at

t0 = 10
step = at.as_tensor(0, dtype="int32")
eta = (1.0 / (step + t0)).astype("floatX")
print(eta.dtype)
```

will print whatever `floatX` is set to. I would expect `eta` to be cast to `floatX` in the first example as well.

Another thing that I find very confusing is that

```python
import aesara.tensor as at

x_init = at.as_tensor(1.0)
print(x_init.dtype)
```

will print `float32` regardless of the value of the `floatX` flag.


rlouf commented Oct 4, 2021

I ended up adding `astype("floatX")` wherever needed. Good to merge if the tests pass.

@rlouf rlouf force-pushed the dual-averaging branch 2 times, most recently from eb475b6 to 7e1635e on October 4, 2021 14:44
@rlouf rlouf force-pushed the dual-averaging branch 3 times, most recently from fa43113 to ce59d26 on October 4, 2021 15:36

brandonwillard commented Oct 4, 2021

> The type error has nothing to do with `scan`; it is a result of Aesara's default casting behavior. I don't know whether that behavior is good or bad, but it is surprising. The following piece of code:
>
> ```python
> import aesara.tensor as at
>
> t0 = 10
> step = at.as_tensor(0, dtype="int32")
> eta = 1.0 / (step + t0)
> print(eta.dtype)
> ```
>
> will print `float64`, even if `floatX` is set to `float32`.

In this instance, 1.0 is a floating point number and the denominator is an integer (possibly int64), so, according to the casting/promotion rules, Aesara will promote the result to a "larger" floating point type—regardless of aesara.config.floatX.

NumPy has the same behavior:

```python
import numpy as np

# float32 combined with int32 is promoted to float64
(np.array(1.0, dtype=np.float32) / np.array(10, dtype=np.int32)).dtype
# dtype('float64')
```

> Another thing that I find very confusing is that
>
> ```python
> import aesara.tensor as at
>
> x_init = at.as_tensor(1.0)
> print(x_init.dtype)
> ```
>
> will print `float32` regardless of the value of the `floatX` flag.

I'm not seeing that locally:

```python
import os

os.environ["AESARA_FLAGS"] = "floatX=float32"

import aesara
import aesara.tensor as at


assert aesara.config.floatX == "float32"

x_init = at.as_tensor(1.0)
x_init.dtype
# 'float32'
```

@brandonwillard brandonwillard left a comment

In general, we need to go along with the configured Aesara upcasting/promotion rules, and this often necessitates the use of dtypes from user-provided graphs. I'll try running this again locally to see where that could be done.
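One way this could look in practice is sketched below. It is a hedged illustration of "use the dtype from the user-provided graph": the variable names mirror the test discussed above, and this is not necessarily how the fix was actually written in aehmc.

```python
import aesara.tensor as at

# Reuse the dtype of the user-supplied initial value instead of hard-coding
# "float64" (or "floatX") for intermediate quantities.
x_init = at.as_tensor(0.0)                       # dtype follows the user's input / floatX
step = at.as_tensor(0, dtype="int32")
t0 = 10

eta = (1.0 / (step + t0)).astype(x_init.dtype)   # stays consistent with the user's graph
```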


rlouf commented Oct 5, 2021

Ok, that makes a lot more sense. I'll try to see how we could use the dtypes from the graphs the user provides.

As for the previous example, I get the same thing as you do on my machine. The following is confusing though:

```python
import os

os.environ["AESARA_FLAGS"] = "floatX=float64"

import aesara
import aesara.tensor as at


assert aesara.config.floatX == "float64"

x_init = at.as_tensor(1.0)
print(x_init.dtype)
# 'float32'
```

@rlouf rlouf merged commit 9011be2 into main Oct 14, 2021
@rlouf rlouf deleted the dual-averaging branch October 14, 2021 10:53