
Simplify scalers, move to gluonts.torch.scaler #2632

Merged
lostella merged 8 commits into awslabs:dev from torch-scalers-dataclasses on Feb 16, 2023

Conversation


@lostella lostella commented Feb 9, 2023

Description of changes: There's no need for scalers to be torch.nn.Module since they don't really hold parameters. Also fixes the default keepdim of MeanScaler for consistency.
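
For illustration, a minimal sketch (assumed names and signature, not the actual GluonTS implementation) of what a mean scaler can look like as a plain callable object rather than an nn.Module:

import torch

class MeanScaler:
    # hypothetical sketch: no nn.Module needed, since there are no parameters or buffers
    def __init__(self, dim: int = -1, keepdim: bool = False, minimum_scale: float = 1e-10):
        self.dim = dim
        self.keepdim = keepdim
        self.minimum_scale = minimum_scale

    def __call__(self, data: torch.Tensor, weights: torch.Tensor):
        # weighted mean of absolute values along `dim`, clamped from below
        denom = weights.sum(self.dim, keepdim=True).clamp(min=1.0)
        scale = ((data.abs() * weights).sum(self.dim, keepdim=True) / denom).clamp(
            min=self.minimum_scale
        )
        scaled = data / scale
        return scaled, (scale if self.keepdim else scale.squeeze(self.dim))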

cc @kashif

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

@lostella lostella added the BREAKING This is a breaking change (one of pr required labels) label Feb 9, 2023

kashif commented Feb 9, 2023

very cool! LGTM!

@lostella lostella changed the title Turn torch scalers into callable dataclasses, move to gluonts.torch.scaler Make torch scalers callable dataclasses, move to gluonts.torch.scaler Feb 9, 2023
@@ -37,21 +36,12 @@ class MeanScaler(nn.Module):
minimum possible scale that is used for any item.
"""

@validated()
Contributor

We lose the validation property?

Contributor Author

Should I use our own dataclass?

Contributor

Yes, let's give it a try

Contributor Author

I'll just keep them validated for now

Comment on lines 41 to 50
default_scale: float = 0.0
minimum_scale: float = 1e-10
Contributor

I think these need to be torch.tensor

Contributor Author

torch.clamp also works with plain numbers; and for default_scale, isn't this some kind of "immutable" (mind the quotes) property of the object, so that replacing torch.where with a plain if is not really harmful?
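
A small sketch of both points (illustrative values only): torch.clamp takes a plain Python number as the bound, and a branch on a constant attribute can stand in for torch.where:

import torch

x = torch.rand(2, 5)
# torch.clamp accepts a plain float, no tensor wrapper needed
scale = x.abs().mean(dim=-1, keepdim=True).clamp(min=1e-10)

default_scale = None  # hypothetical: fixed for the lifetime of the scaler object
if default_scale is not None:  # ordinary Python branch instead of torch.where
    scale = torch.full_like(scale, default_scale)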

self.register_buffer("minimum_scale", torch.tensor(minimum_scale))
dim: int = -1
keepdim: bool = False
minimum_scale: float = 1e-5
Contributor

Also, minimum_scale should be a torch.Tensor.

Contributor Author

Why? I think we can add a torch.Tensor to a number
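
A trivial check of that claim (illustrative values):

import torch

scale = torch.rand(3, 1)
minimum_scale = 1e-5  # plain Python float
print(scale + minimum_scale)                  # broadcasting handles tensor + number
print(torch.clamp(scale, min=minimum_scale))  # same for clamp bounds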


lostella commented Feb 9, 2023

It's funny that all tests pass on macOS

@lostella lostella added the torch This concerns the PyTorch side of GluonTS label Feb 9, 2023
self.default_scale,
batch_scale,
)
if self.default_scale is not None:
Contributor

What is the effect of this change on tracing?

Contributor Author

I think that, as long as self.default_scale stays constant (which it should), tracing should be fine with it and produce the right code for the model. But I need to verify.

Contributor Author

Also, according to the warning here https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch-jit-trace, it should be fine as long as the control flow is not affected by the values in input tensors (and, I would add, as long as it isn't otherwise changed throughout model execution, e.g. by the forward changing the value of self.default_scale for some reason).
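
A minimal sketch (hypothetical toy module, not the actual GluonTS code) of why this should be trace-safe: the branch depends only on a constant attribute, so torch.jit.trace simply records whichever branch was taken at trace time:

import torch
from torch import nn

class ToyModel(nn.Module):
    def __init__(self, default_scale=None):
        super().__init__()
        self.default_scale = default_scale  # fixed after construction

    def forward(self, x):
        scale = x.abs().mean(dim=-1, keepdim=True).clamp(min=1e-10)
        if self.default_scale is not None:  # depends on a constant, not on tensor values
            scale = torch.full_like(scale, self.default_scale)
        return x / scale

traced = torch.jit.trace(ToyModel(default_scale=1.0), torch.randn(2, 5))
print(traced(torch.randn(2, 5)).shape)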

@lostella lostella changed the title Make torch scalers callable dataclasses, move to gluonts.torch.scaler Make torch scalers callable objects, move to gluonts.torch.scaler Feb 10, 2023
@lostella lostella changed the title Make torch scalers callable objects, move to gluonts.torch.scaler Simplify scalers, move to gluonts.torch.scaler Feb 10, 2023
@abdulfatir

Sorry for the late comment, but any takers for scalers being of torch.distributions.transforms.Transform type, AffineTransform in particular? This provides us with the inverse function, which you can just apply back to your scaled output, and also provides log_abs_det_jacobian, if needed for loss computation.


kashif commented Feb 10, 2023

@abdulfatir on the output side the distribution is an AffineTransform which takes the loc and scale from the scalers... but do you mean more general scalers? And I believe the scalers are currently in an appropriate place (in the model), since the model can then use the loc and scale as inputs as well (instead of just on the emission side).


abdulfatir commented Feb 10, 2023

Currently, I don't have an example of a general scaler in mind, but yes, using torch.distributions.transforms.Transform can provide that flexibility. Regarding access of scaler params (loc, scale) in the models, they can also be accessed from the scaler/transform object's properties.

The primary benefit IMO is clarity. You have a scaler (Transform) that normalizes the data and then use its inverse (and log_abs_det_jacobian) at the output side.
@kashif
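
A rough sketch of that pattern (illustrative only, not a concrete API proposal), with the statistics computed from the data:

import torch
from torch.distributions.transforms import AffineTransform

x = torch.randn(4, 24) * 7 + 3                   # toy batch of series
loc = x.mean(dim=-1, keepdim=True)
scale = x.std(dim=-1, keepdim=True).clamp(min=1e-10)

tr = AffineTransform(loc=loc, scale=scale)
z = tr.inv(x)                                    # normalized input fed to the model
x_back = tr(z)                                   # map outputs back to the original scale
log_det = tr.log_abs_det_jacobian(z, x_back)     # change-of-variables term for the loss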

@abdulfatir

Also, inside models I think we should be able to provide a Scaler() object instead of a boolean scaling as currently implemented.

@lostella

@abdulfatir I think it makes sense to consider this. The (log) scale should be accessible via log_abs_det_jacobian if one needs that as input to the model, but not the location. I’ll give it a try and share how that goes

@abdulfatir

@lostella If the transform is of AffineTransform type, then we can access the .loc and .scale attributes. How a model handles/uses transforms could be their internal matter. For instance, for DeepAR like models we can restrict the scale to be of AffineTransform type and freely access these properties.

@abdulfatir

For completeness, I would like to add that the inverse transform (which we would need at the output side) is available via the .inv attribute.

import torch
from torch.distributions.transforms import AffineTransform

tr = AffineTransform(10., 1.)
inv_tr = tr.inv

x = torch.rand(2, 3, 4)
y = tr(x)

torch.allclose(x, inv_tr(y), atol=1e-5) # True


lostella commented Feb 13, 2023

@lostella If the transform is of AffineTransform type, then we can access the .loc and .scale attributes. How a model handles/uses transforms could be their internal matter. For instance, for DeepAR like models we can restrict the scale to be of AffineTransform type and freely access these properties.

The catch is that these scaling operations are not really affine transformations of the data: for some array x, both x and 2*x get scaled down to the same vector. They are affine once you fix loc and scale, so you can access those as properties of the transformation, but here scale (and possibly loc) depends on the input data.
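
A quick illustration of that point, using a plain mean-absolute scaling (illustrative only):

import torch

def mean_scale(x):
    # data-dependent scale: mean absolute value along the last dim
    return x / x.abs().mean(dim=-1, keepdim=True).clamp(min=1e-10)

x = torch.tensor([[1.0, 2.0, 3.0]])
print(torch.allclose(mean_scale(x), mean_scale(2 * x)))  # True: x and 2*x map to the same vector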

@lostella

Also, inside models I think we should be able to provide a Scaler() object instead of a boolean scaling as currently implemented.

@abdulfatir agreed! We can do that in a separate PR

@lostella lostella enabled auto-merge (squash) February 16, 2023 13:02
@lostella lostella merged commit c5b64b4 into awslabs:dev Feb 16, 2023
@lostella lostella deleted the torch-scalers-dataclasses branch May 23, 2023 20:02