
Update parameter to be savable #1518

Merged: 22 commits merged into develop on Aug 17, 2020

Conversation

@awav (Member) commented on Jun 30, 2020

An attempt to make the custom tensor-like Parameter savable (and loadable) with tf.saved_model.
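For context, a minimal sketch of the round trip this change is meant to enable. The module and method here are illustrative, not code from this PR:

```python
import numpy as np
import tensorflow as tf
import gpflow

class Scaler(tf.Module):
    def __init__(self):
        super().__init__()
        # gpflow.Parameter wraps an unconstrained tf.Variable behind a
        # positive() transform; this is the object that needs to be savable.
        self.variance = gpflow.Parameter(1.0, transform=gpflow.utilities.positive())

    @tf.function(input_signature=[tf.TensorSpec([], tf.float64)])
    def scale(self, x):
        return self.variance * x

module = Scaler()
tf.saved_model.save(module, "/tmp/saved_parameter")
loaded = tf.saved_model.load("/tmp/saved_parameter")
x = tf.constant(2.0, tf.float64)
np.testing.assert_allclose(loaded.scale(x), module.scale(x))
```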

@vdutor self-requested a review on July 8, 2020

@vdutor (Contributor) commented on Jul 8, 2020

Hey @awav, how would you like to proceed with this?

@awav marked this pull request as ready for review on August 10, 2020

@codecov (bot) commented on Aug 11, 2020

Codecov Report

Merging #1518 into develop will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff            @@
##           develop    #1518   +/-   ##
========================================
  Coverage    95.31%   95.32%           
========================================
  Files           85       85           
  Lines         3823     3766   -57     
========================================
- Hits          3644     3590   -54     
+ Misses         179      176    -3     
Impacted Files Coverage Δ
gpflow/base.py 91.83% <100.00%> (-0.57%) ⬇️
gpflow/optimizers/natgrad.py 85.03% <100.00%> (+0.11%) ⬆️
gpflow/utilities/utilities.py 94.23% <100.00%> (+0.07%) ⬆️
gpflow/optimizers/mcmc.py 92.50% <0.00%> (-2.50%) ⬇️

Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4682722...700eb46.

@joelberkeley-secondmind (Contributor) commented on Aug 13, 2020

These types can be removed if they're no longer in use: https://github.com/GPflow/GPflow/pull/1518/files#diff-e2281e00a6a807545fc8c393630c5c36L17-L19

OK, the link doesn't work... VariableData, Transform, Prior, maybe others.

@vdutor (Contributor) left a comment

Really nice work, thanks. The only thing missing is a proper unit test showing that a model is actually savable.
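A sketch of the kind of test being asked for, assuming pytest and its tmp_path fixture; this is illustrative, not the test that was eventually added:

```python
import numpy as np
import tensorflow as tf
import gpflow

def test_model_is_savable(tmp_path):
    X = np.linspace(0, 1, 5).reshape(-1, 1)
    Y = np.sin(X)
    model = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())
    # tf.saved_model needs a traced signature to export.
    model.predict = tf.function(
        model.predict_f, input_signature=[tf.TensorSpec([None, 1], tf.float64)]
    )
    tf.saved_model.save(model, str(tmp_path))
    loaded = tf.saved_model.load(str(tmp_path))
    mean_before, _ = model.predict(X)
    mean_after, _ = loaded.predict(X)
    np.testing.assert_allclose(mean_before, mean_after)
```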


Parameter._OverloadAllOperators()
tf.register_tensor_conversion_function(Parameter, lambda x, *args, **kwds: x.read_value())


def _cast_to_dtype(
Contributor

Didn't TF 2.2 resolve the issue with cast? Can we simply use tf.cast now?

Contributor

We still support TF 2.1.
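For context on the cast issue being worked around: in older TF versions, passing a Python float through tf.cast materialises it as float32 first, losing precision. A sketch of the behavior (inferred from the test_cast_to_dtype_precision_issue test further down, not an exact reproduction of the helper):

```python
import tensorflow as tf

# tf.cast first converts the Python float to a float32 tensor, then casts:
tf.cast(0.2, tf.float64)                     # ~0.20000000298023224
# Converting directly to the target dtype keeps float64 precision:
tf.convert_to_tensor(0.2, dtype=tf.float64)  # 0.2
```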

gpflow/base.py
- unconstrained_value = self.validate_unconstrained_value(value, self.dtype)
- return self._unconstrained.assign(
+ unconstrained_value = _validate_unconstrained_value(value, self.transform, self.dtype)
+ return self.unconstrained_variable.assign(
Contributor

The superclass already has this implemented. Making use of it makes it clearer what is happening:

  1. validate that the unconstrained value exists.
  2. assign it.

Suggested change:
- return self.unconstrained_variable.assign(
+ return super().assign(value, use_locking, name, read_value)

Contributor

I think your comment is right, but the suggestion is wrong (I guess you know that though). If you're going to do that, you can simply remove the method; it'll be inherited from the parent.

I suppose what you're suggesting is

unconstrained_value = super().assign(value, use_locking, name, read_value)
message = (
    "gpflow.Parameter: the value to be assigned is incompatible with this parameter's "
    "transform (the corresponding unconstrained value has NaN or Inf) and hence cannot be "
    "assigned."
)
return tf.debugging.assert_all_finite(unconstrained_value, message=message)

which makes sense to me.

@awav (Member Author)

In that case, the transformation would be performed twice. We redefine the method so that we can do extra steps before assigning the value, without sacrificing performance.

Contributor

Not sure I understand what you mean. The snippet @sam-willis posted seems correct to me?

Contributor

It will only be performed twice if you call _validate_unconstrained_value, which is why I suggested just using tf.debugging.assert_all_finite. You could make the message a constant if you don't want code duplication.

Contributor

I think you can remove _validate_unconstrained_value, _unconstrained_value and _constrained_value, and just check that after either assignment or initialisation the unconstrained value is finite.

@awav (Member Author) commented on Aug 13, 2020

I can't apply Sam's suggestion because it changes the value first and then checks correctness, whereas it should be the other way around.

Also, I don't understand what you mean by "I think you can remove _validate_unconstrained_value, _unconstrained_value and _constrained_value".
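For reference, the order-of-operations point here, as a sketch built from the diff above (the validation helper computes the unconstrained value and raises on NaN/Inf before anything is mutated; the method body beyond the two diffed lines is illustrative):

```python
def assign(self, value, use_locking=False, name=None, read_value=True):
    # Validate first: if the transformed (unconstrained) value is not finite,
    # this raises and the parameter is left unchanged.
    unconstrained_value = _validate_unconstrained_value(value, self.transform, self.dtype)
    # Only then mutate the underlying variable.
    return self.unconstrained_variable.assign(
        unconstrained_value, use_locking=use_locking, name=name, read_value=read_value
    )
```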

Contributor

Ah, I do see your point; maybe it's better not to change the value if it's invalid. An exception is thrown either way, though, so I'm not sure how important it is. I guess if you catch the exception, your current code makes recovering easier.

Contributor

_constrained_value is unused, and technically you should always have the identity for the transform now, so you don't need the if in _unconstrained_value, but I think those are very minor.

@property
def unconstrained_variable(self) -> tf.Variable:
-     return self._unconstrained
+     return self._pretransformed_input

@property
def transform(self) -> Optional[Transform]:
Contributor

I understand this is mainly because of legacy, but why are we still using the term transform? We could make the interface cleaner if we followed TFP's naming and used the word bijector instead. Having two words for the same object makes the code harder to read and understand. This PR seems like the right place and time to update this?

@awav (Member Author)

I don't like the Bijector name, because not every bijector in TFP is actually a bijection. The Transform name has been used since the start of GPflow, and I think keeping it reflects the meaning better. Another reason for transform: it keeps the commit diff minimal.

@@ -190,17 +190,22 @@ def leaf_components(input: tf.Module):
def _merge_leaf_components(
    input: Dict[str, Union[tf.Variable, tf.Tensor, Parameter]]
) -> Dict[str, Union[tf.Variable, tf.Tensor, Parameter]]:
-   input_values = set([value.experimental_ref() for value in input.values()])
+   ref_fn = lambda x: (x if isinstance(x, Parameter) else x.ref())
Contributor

Why not add a ref / deref method to Parameter? It seems like this is just missing from TFP's TransformedVariable API.

def ref(self):
    return self.unconstrained.ref()

@awav (Member Author) commented on Aug 13, 2020

The parameter is not a tensor object, nor does it implement the tensor class's interface; it is simply a wrapper around a tensor. You cannot use the same reference for two different objects. That was one of the reasons why saving didn't work.

Contributor

Would it not be cleaner to have

def ref(self):
    return self

in order to get rid of ref_fn and deref_fn? It seems to me that this referencing functionality could be handy elsewhere as well at some point.

@awav (Member Author)

ref() belongs to the tensor API, and ref() returns a different type. There must be a distinction between the two for clarity. I would prefer not to introduce this method on the parameter class because of a printing issue.
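For context, a sketch of how the ref_fn / deref_fn pair from the diff above can deduplicate leaf components while keeping Parameters as plain hashable objects (the surrounding helper is illustrative, not GPflow's implementation):

```python
from gpflow import Parameter

# Parameters are hashable objects in their own right and are kept as-is;
# raw tf.Variables / tf.Tensors go through .ref() so they can live in a set.
ref_fn = lambda x: x if isinstance(x, Parameter) else x.ref()
deref_fn = lambda x: x if isinstance(x, Parameter) else x.deref()

def unique_components(values):
    # Deduplicate via refs, then map back to the original objects.
    return [deref_fn(r) for r in {ref_fn(v) for v in values}]
```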


@sam-willis (Contributor) left a comment

I was going to add more comments, but I see @vdutor has already addressed them all.

@sam-willis (Contributor) left a comment

I think it would be nice to have a test for parameter saving. Otherwise LGTM.

@awav (Member Author) commented on Aug 14, 2020

@vdutor, @sam-willis I added a test.

@@ -55,3 +56,24 @@ def test_cast_to_dtype_precision_issue():
assert actual_value.dtype == np.float64
expected_value = np.float64(0.2)
assert actual_value == expected_value


def test_parameter_saved():
@vdutor (Contributor) commented on Aug 14, 2020

Can you also check that the number of trainable parameters is the same in the original and the reloaded model? I've had issues with that before.

@awav (Member Author)

The exec returns the same result. If it were a different parameter, the result would be different, no? Or do you mean something else?

Contributor

If you run model.variables on both models, do you get the same number of parameters back? There used to be an issue where certain variables / parameters were dropped even though the execution of exec gave the same result.
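A sketch of the check being suggested (names are illustrative; see the replies below for why a plain tf.saved_model round trip does not expose .variables on the loaded object):

```python
loaded = tf.saved_model.load(save_dir)
assert len(loaded.variables) == len(model.variables)
assert len(loaded.trainable_variables) == len(model.trainable_variables)
```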

@awav (Member Author) commented on Aug 14, 2020

As far as I understand, tf.saved_model saves models for deployment purposes only (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md#builder). m1 can be a different object that wraps the graph computation and doesn't support the tf.Module interface.

@awav (Member Author) commented on Aug 14, 2020

Keras uses the tf.saved_model format in a different way. Standard tf.saved_model.load does not return the same kind of object that you passed in at saving time:

> m1
<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x13f553fd0>

> m1.variables
AttributeError: '_UserObject' object has no attribute 'variables'

@awav merged commit 8c65ef6 into develop on Aug 17, 2020
@awav deleted the awav/save-parameter-fix branch on August 17, 2020
vdutor added a commit that referenced this pull request Aug 27, 2020
* Update pull request template (#1510)

Clarify template to make it easier for contributors to fill in relevant information.

* Temporary workaround for tensorflow_probability dependency issue (#1522)

* pin cloudpickle==1.3.0 as temporary workaround for tensorflow/probability#991 to unblock our build (to be reverted once fixed upstream)

* Update readme with new project using GPflow (#1530)

* fix bug in varying_noise notebook (#1526)

* Fix formatting in docs (intro.md) and restore link removed by #1498 (#1520)

* pin tensorflow<2.3 tensorflow-probability<0.11 (#1537)

* Quadrature Refactoring (#1505)

* WIP: quadrature refactoring

* Removing old ndiagquad code

* deleted test code

* formatting and type-hint

* merge modules

* black formatting

* formatting

* solving failing tests

* fixing failing tests

* fixes

* adapting tests for new syntax, keeping numerical behavior

* black formatting

* remove printf

* changed code for compiled tf compatibility

* black

* restored to original version

* undoing changes

* renaming

* renaming

* renaming

* reshape kwargs

* quadrature along axis=-2, simplified broadcasting

* black

* docs

* docs

* helper function

* docstrings and typing

* added new and old quadrature equivalence tests

* black

* Removing comments

Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* Typo

Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* notation

Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* reshape_Z_dZ return docstring fix

* FIX: quad_old computed with the ndiagquad_old

Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* more readable implementation

Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* tf.ensure_shape added

* removed ndiagquad

* removed ndiagquad

* Revert "removed ndiagquad"

This reverts commit 7bb0e9f.

* FIX: shape checking of dZ

* Revert "removed ndiagquad"

This reverts commit 8e23524.

Co-authored-by: Gustavo Carvalho <gustavo.carvalho@delfosim.com>
Co-authored-by: ST John <st@prowler.io>
Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* Add base_conditional_with_lm function (#1528)

* Added base_conditional_with_lm function, which accepts Lm instead of Kmm

Co-authored-by: Neil Ferguson <neil@prowler.io>
Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>
Co-authored-by: st-- <st--@users.noreply.github.com>

* Fixed separate_independent_conditional to correctly handle q_sqrt=None. (#1533)

* Fixed separate_independent_conditional to correctly handle q_sqrt=None.

Co-authored-by: Aidan Scannell <scannell.aidan@gmail.com>
Co-authored-by: st-- <st--@users.noreply.github.com>

* Bump version numbers to 2.1.0. (#1544)

* Re-introduce pytest-xdist (#1541)

Enables pytest-xdist for locally running tests (`make test`) on multiple cores in parallel.

* check dependency versions are valid on CI (#1536)

* Update to not use custom image (#1545)

* Update to not use custom image

* Add test requirements

* Update parameter to be savable (#1518)

* Fix for quadrature failure mode when autograph was set to False (#1548)

* Fix and test

* Change shape of quadrature tensors for better broadcasting (#1542)

* using the first dimension to hold the quadrature summation

* adapting ndiagquad wrapper

* Changed bf for bX in docstrings

Co-authored-by: Gustavo Carvalho <gustavo.carvalho@delfosim.com>
Co-authored-by: st-- <st--@users.noreply.github.com>
Co-authored-by: Vincent Dutordoir <dutordoirv@gmail.com>

* Update min TFP supported version to 0.10 (#1551)

* Broadcasting constant and zero mean function (#1550)

* Broadcasting constant and zero mean function

* Use rank instead of ndim

Co-authored-by: st-- <st--@users.noreply.github.com>
Co-authored-by: joelberkeley-pio <joel.berkeley@prowler.io>
Co-authored-by: gustavocmv <47801305+gustavocmv@users.noreply.github.com>
Co-authored-by: Gustavo Carvalho <gustavo.carvalho@delfosim.com>
Co-authored-by: ST John <st@prowler.io>
Co-authored-by: Neil Ferguson <nfergu@users.noreply.github.com>
Co-authored-by: Neil Ferguson <neil@prowler.io>
Co-authored-by: Aidan Scannell <as12528@my.bristol.ac.uk>
Co-authored-by: Aidan Scannell <scannell.aidan@gmail.com>
Co-authored-by: Sandeep Tailor <s.tailor@insysion.net>
Co-authored-by: Artem Artemev <art.art.v@gmail.com>
@st-- mentioned this pull request on Sep 14, 2020
@st-- mentioned this pull request on Oct 5, 2020