
Pytorch Engine, LSTM, [W RNN.cpp:982] Warning: RNN module weights are not part of single contiguous chunk of memory. #3061

Open
pmingkr opened this issue Apr 3, 2024 · 1 comment
Labels
bug Something isn't working

pmingkr commented Apr 3, 2024

Description

A warning message is emitted by PyTorch's LSTM when training through the PyTorch Engine.

Expected Behavior

No warning; the LSTM weights should be kept in a single contiguous chunk of memory so cuDNN does not have to compact them on every call.

Error Message

[W RNN.cpp:982] Warning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted 
at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters(). (function _cudnn_rnn)  
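For background, this warning comes from PyTorch itself, not DJL: cuDNN expects an RNN's weights to live in one flat buffer, and the warning fires when that buffer has become fragmented (for example after parameters are re-allocated or moved). In the Python API the remedy is `flatten_parameters()` on the RNN module; as far as I can tell, DJL does not expose an equivalent call, which is presumably where a fix would need to go. A minimal sketch in plain PyTorch (not DJL), mirroring the shapes used in the Java repro below:

```python
import torch

# Single-layer LSTM matching the DJL repro: input feature size 1, state size 2.
lstm = torch.nn.LSTM(input_size=1, hidden_size=2, num_layers=1)

# On CUDA, flatten_parameters() re-compacts the weights into one contiguous
# buffer and silences the warning; on CPU it is effectively a no-op.
lstm.flatten_parameters()

# Input shaped (seq_len=1, batch=1, features=1), as in the Java code below.
out, (h, c) = lstm(torch.zeros(1, 1, 1))
print(out.shape)  # torch.Size([1, 1, 2])
```

The key point is that `flatten_parameters()` must be re-invoked after anything that replaces the weight tensors, so a one-time call at model creation is not sufficient during training.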

How to Reproduce?

Steps to reproduce

  1. Use an LSTM block on the PyTorch Engine (CUDA) and run one training step.
  • Code
package kr.hepi.mre.djl;

import java.io.IOException;
import java.util.List;

import ai.djl.Model;
import ai.djl.ndarray.NDList;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;
import ai.djl.nn.recurrent.LSTM;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.EasyTrain;
import ai.djl.training.Trainer;
import ai.djl.training.dataset.Batch;
import ai.djl.training.dataset.Dataset;
import ai.djl.training.loss.Loss;
import ai.djl.training.optimizer.Optimizer;
import ai.djl.translate.StackBatchifier;
import ai.djl.translate.TranslateException;
import ai.djl.util.Progress;

public class Main {
    public static void main(String[] args) throws IOException, TranslateException {
        System.out.println("Hello world!");

        // A minimal single-layer LSTM; one training step is enough to trigger the warning.
        final var model = Model.newInstance("tempmodel");
        model.setBlock(LSTM.builder().setNumLayers(1).setStateSize(2).build());

        try (var trainer = new Trainer(model, new DefaultTrainingConfig(Loss.l2Loss()).optOptimizer(Optimizer.adam().build()))) {
            EasyTrain.fit(trainer, 1, new Dataset() {

                @Override
                public Iterable<Batch> getData(NDManager manager) throws IOException, TranslateException {
                    // One dummy batch: a single zero input step (1, 1, 1) with a
                    // two-unit label (1, 1, 2) to match the LSTM's state size.
                    final var batchifier = new StackBatchifier();
                    final var input = manager.zeros(new Shape(1, 1, 1));
                    final var label = manager.ones(new Shape(1, 1, 2));
                    return List.of(new Batch(manager, new NDList(input), new NDList(label), 1, batchifier, batchifier, 0, 0));
                }

                @Override
                public void prepare(Progress progress) throws IOException, TranslateException {
                }
                
            }, null);
        }
    }
}

What have you tried to solve it?

  1. Inspected DJL's parameter structure to see how the LSTM weights are laid out, but was overwhelmed by it.

Environment Info

  • Windows, pytorch-engine@0.26.0, pytorch-native-cu121@2.1.1, api@0.26.0
  • Maven, pom.xml
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-engine</artifactId>
            <version>0.26.0</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-native-cu121</artifactId>
            <classifier>win-x86_64</classifier>
            <version>2.1.1</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>ai.djl.pytorch</groupId>
            <artifactId>pytorch-jni</artifactId>
            <version>2.1.1-0.26.0</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>ai.djl</groupId>
            <artifactId>api</artifactId>
            <version>0.26.0</version>
        </dependency>

@pmingkr pmingkr added the bug Something isn't working label Apr 3, 2024

Daewju commented Apr 7, 2024

I'm running into the same issue with an LSTM on CUDA.
