Consolidate DeepNPTSEstimator #2496

Merged: 7 commits merged into awslabs:dev on Jan 20, 2023

Conversation

lostella (Contributor)

Description of changes:

  • Align the arguments of DeepNPTSEstimator.train with those of Estimator and, in particular, PyTorchLightningEstimator, by moving many of them to the constructor (see the usage sketch after this list)
  • Make dropout_rate non-Optional: it is a plain float that can be zero
  • Consolidate the tests for DeepNPTSEstimator with the analogous tests for other estimator classes
  • Rename DeepNPTSMultiStepPredictor -> DeepNPTSMultiStepNetwork to avoid confusion, since it is not a Predictor subclass
  • Fix docstrings that were incorrectly formatted
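For context, a minimal usage sketch of what the consolidated interface looks like after this change. The training-related parameter names (epochs, batch_size, num_batches_per_epoch) are assumptions about the new constructor; the diff is the authoritative reference for the exact signature.

```python
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.torch.model.deep_npts import DeepNPTSEstimator

dataset = get_dataset("airpassengers")

# Training hyperparameters now go to the constructor (previously some were
# arguments of .train()), matching the pattern of other estimators.
estimator = DeepNPTSEstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    context_length=60,
    epochs=5,                  # assumed constructor argument after this PR
    batch_size=32,             # assumed constructor argument after this PR
    num_batches_per_epoch=50,  # assumed constructor argument after this PR
    dropout_rate=0.0,          # plain float now, no longer Optional
)

# .train() now only takes data-related arguments, as for other estimators.
predictor = estimator.train(dataset.train)
```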

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

lostella added the labels enhancement (New feature or request) and BREAKING (This is a breaking change, one of the PR required labels) on Dec 14, 2022
lostella (Contributor, Author) commented on Dec 14, 2022

Unsure about the failure: it only happens on Python 3.8 on Ubuntu, while the same setup works locally (macOS) on Python 3.8.13 with torch 1.13.0 (the same versions as in the tests):

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/gluonts/torch/model/deep_npts/_estimator.py:413: in train
    pred_net = self.train_model(
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/gluonts/torch/model/deep_npts/_estimator.py:372: in train_model
    loss.backward()
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/torch/_tensor.py:487: in backward
    torch.autograd.backward(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

tensors = (tensor(0.),), grad_tensors = None, retain_graph = False
create_graph = False, grad_variables = None, inputs = ()

    def backward(
        tensors: _TensorOrTensors,
        grad_tensors: Optional[_TensorOrTensors] = None,
        retain_graph: Optional[bool] = None,
        create_graph: bool = False,
        grad_variables: Optional[_TensorOrTensors] = None,
        inputs: Optional[_TensorOrTensors] = None,
    ) -> None:
        r"""Computes the sum of gradients of given tensors with respect to graph
        leaves.
    
        The graph is differentiated using the chain rule. If any of ``tensors``
        are non-scalar (i.e. their data has more than one element) and require
        gradient, then the Jacobian-vector product would be computed, in this
        case the function additionally requires specifying ``grad_tensors``.
        It should be a sequence of matching length, that contains the "vector"
        in the Jacobian-vector product, usually the gradient of the differentiated
        function w.r.t. corresponding tensors (``None`` is an acceptable value for
        all tensors that don't need gradient tensors).
    
        This function accumulates gradients in the leaves - you might need to zero
        ``.grad`` attributes or set them to ``None`` before calling it.
        See :ref:`Default gradient layouts<default-grad-layouts>`
        for details on the memory layout of accumulated gradients.
    
        .. note::
            Using this method with ``create_graph=True`` will create a reference cycle
            between the parameter and its gradient which can cause a memory leak.
            We recommend using ``autograd.grad`` when creating the graph to avoid this.
            If you have to use this function, make sure to reset the ``.grad`` fields of your
            parameters to ``None`` after use to break the cycle and avoid the leak.
    
        .. note::
    
            If you run any forward ops, create ``grad_tensors``, and/or call ``backward``
            in a user-specified CUDA stream context, see
            :ref:`Stream semantics of backward passes<bwd-cuda-stream-semantics>`.
    
        .. note::
    
            When ``inputs`` are provided and a given input is not a leaf,
            the current implementation will call its grad_fn (even though it is not strictly needed to get this gradients).
            It is an implementation detail on which the user should not rely.
            See https://github.com/pytorch/pytorch/pull/60521#issuecomment-867061780 for more details.
    
        Args:
            tensors (Sequence[Tensor] or Tensor): Tensors of which the derivative will be
                computed.
            grad_tensors (Sequence[Tensor or None] or Tensor, optional): The "vector" in
                the Jacobian-vector product, usually gradients w.r.t. each element of
                corresponding tensors. None values can be specified for scalar Tensors or
                ones that don't require grad. If a None value would be acceptable for all
                grad_tensors, then this argument is optional.
            retain_graph (bool, optional): If ``False``, the graph used to compute the grad
                will be freed. Note that in nearly all cases setting this option to ``True``
                is not needed and often can be worked around in a much more efficient
                way. Defaults to the value of ``create_graph``.
            create_graph (bool, optional): If ``True``, graph of the derivative will
                be constructed, allowing to compute higher order derivative products.
                Defaults to ``False``.
            inputs (Sequence[Tensor] or Tensor, optional): Inputs w.r.t. which the gradient
                be will accumulated into ``.grad``. All other Tensors will be ignored. If
                not provided, the gradient is accumulated into all the leaf Tensors that
                were used to compute the attr::tensors.
        """
        if torch._C._are_functorch_transforms_active():
            raise RuntimeError(
                "backward() called inside a functorch transform. This is not "
                "supported, please use functorch.grad or functorch.vjp instead "
                "or call backward() outside of functorch transforms.")
    
        if grad_variables is not None:
            warnings.warn("'grad_variables' is deprecated. Use 'grad_tensors' instead.")
            if grad_tensors is None:
                grad_tensors = grad_variables
            else:
                raise RuntimeError("'grad_tensors' and 'grad_variables' (deprecated) "
                                   "arguments both passed to backward(). Please only "
                                   "use 'grad_tensors'.")
        if inputs is not None and len(inputs) == 0:
            raise RuntimeError("'inputs' argument to backward() cannot be empty.")
    
        tensors = (tensors,) if isinstance(tensors, torch.Tensor) else tuple(tensors)
        inputs = (inputs,) if isinstance(inputs, torch.Tensor) else \
            tuple(inputs) if inputs is not None else tuple()
    
        grad_tensors_ = _tensor_or_tensors_to_tuple(grad_tensors, len(tensors))
        grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
        if retain_graph is None:
            retain_graph = create_graph
    
        # The reason we repeat same the comment below is that
        # some Python versions print out the first line of a multi-line function
        # calls in the traceback and some print out the last line
>       Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
            tensors, grad_tensors_, retain_graph, create_graph, inputs,
            allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
E       RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/xdist/dsession.py:70: PytestDeprecationWarning: The hookimpl DSession.pytest_sessionstart uses old-style configuration options (marks or attributes).
Please use the pytest.hookimpl(trylast=True) decorator instead
 to configure the hooks.
 See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
  @pytest.mark.trylast
/opt/hostedtoolcache/Python/3.8.15/x64/lib/python3.8/site-packages/xdist/dsession.py:93: PytestDeprecationWarning: The hookimpl DSession.pytest_xdist_make_scheduler uses old-style configuration options (marks or attributes).
Please use the pytest.hookimpl(trylast=True) decorator instead
 to configure the hooks.
 See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
  @pytest.mark.trylast
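For reference, the RuntimeError above is what PyTorch raises whenever .backward() is called on a tensor that is not connected to any parameter with requires_grad=True (note tensors = (tensor(0.),) in the log: the loss is a plain scalar with no grad_fn). A minimal standalone sketch, independent of GluonTS, that triggers the same error:

```python
import torch

# A scalar "loss" created outside any autograd graph: requires_grad is False
# and it has no grad_fn, mirroring the tensor(0.) shown in the log above.
loss = torch.tensor(0.0)

# Calling backward() on such a tensor raises:
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
loss.backward()
```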

lostella added this to the v0.12 milestone on Jan 17, 2023
lostella added the labels models (This item concerns models implementations) and torch (This concerns the PyTorch side of GluonTS) on Jan 20, 2023
lostella enabled auto-merge (squash) on January 20, 2023 at 16:23
lostella merged commit 9c64127 into awslabs:dev on Jan 20, 2023
lostella mentioned this pull request on Feb 24, 2023
Labels: BREAKING, enhancement, models, torch
2 participants