
EmbeddingSequenceLayer not working with BertIterator #9481

Closed
thomas-trendsoft opened this issue Oct 11, 2021 · 1 comment · Fixed by #9493

Issue Description

I am trying to train a model with an EmbeddingSequenceLayer fed by a BertIterator. I expected the model to fit, but fit() instead throws a DataType exception; see the additional information below.

Version Information


  • Deeplearning4j version = M1.1
  • Platform information = MacOS
  • CUDA = no

Additional Information

Code:

public static void main(String[] args) throws IOException, InterruptedException {
    int sentlen = 100;

    LabeledSentenceProvider provider = ... some provider ...

    BertWordPieceTokenizerFactory tokenizerFactory =
            new BertWordPieceTokenizerFactory(new File("./vocab.txt"), false, true, Charsets.UTF_8);

    BertIterator iter = new BertIterator.Builder()
            .tokenizer(tokenizerFactory)
            .lengthHandling(LengthHandling.FIXED_LENGTH, sentlen)
            .minibatchSize(10)
            .sentenceProvider(provider)
            .task(Task.UNSUPERVISED)
            .vocabMap(tokenizerFactory.getVocab())
            .featureArrays(BertIterator.FeatureArrays.INDICES_MASK)
            .masker(new BertMaskedLMMasker(new Random(12345), 0.2, 0.5, 0.5))
            .unsupervisedLabelFormat(BertIterator.UnsupervisedLabelFormat.RANK3_NCL)
            .maskToken("[MASK]")
            .build();

    int vocabsize = tokenizerFactory.getVocab().size();

    // note: this preprocessor map is created but never attached to the builder
    HashMap<String, InputPreProcessor> preproc = new HashMap<>();
    preproc.put("output", new RnnToFeedForwardPreProcessor(RNNFormat.NCW));

    ComputationGraphConfiguration.GraphBuilder builder = new NeuralNetConfiguration.Builder()
            .seed(42345)
            .l2(0.0001)
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(0.0015))
            .graphBuilder();

    builder.setInputTypes(InputType.recurrent(vocabsize, sentlen, RNNFormat.NCW));
    builder.addInputs("token");

    System.out.println("VOCAB size: " + vocabsize);

    // token embedding layer
    builder.addLayer("emb", new EmbeddingSequenceLayer.Builder()
            .nIn(vocabsize)
            .nOut(756)
            .build(), "token");

    // try a single transformer block first

    // multi-head self attention
    builder.addLayer("attention1", new SelfAttentionLayer.Builder()
            .nIn(756)
            .nOut(756)
            .nHeads(2)
            .projectInput(true)
            .build(), "emb");

    // feed forward
    builder.addLayer("ffint1", new LSTM.Builder()
            .nOut(756)
            .build(), "attention1");

    // output layer
    builder.addLayer("output", new RnnOutputLayer.Builder()
            .nOut(vocabsize)
            .dataFormat(RNNFormat.NCW)
            .activation(Activation.SOFTMAX)
            .build(), "ffint1");

    builder.setOutputs("output");

    ComputationGraph model = new ComputationGraph(builder.build());

    model.fit(iter);
}


StackTrace:

Exception in thread "main" java.lang.IllegalArgumentException: Op.X must have same data type as Op.Y: X.datatype=FLOAT, Y.datatype=INT
	at org.nd4j.common.base.Preconditions.throwEx(Preconditions.java:633)
	at org.nd4j.common.base.Preconditions.checkArgument(Preconditions.java:134)
	at org.nd4j.linalg.api.ops.BaseBroadcastOp.validateDataTypes(BaseBroadcastOp.java:200)
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:889)
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:879)
	at org.nd4j.linalg.factory.Broadcast.mul(Broadcast.java:149)
	at org.deeplearning4j.nn.layers.feedforward.embedding.EmbeddingSequenceLayer.backpropGradient(EmbeddingSequenceLayer.java:64)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doBackward(LayerVertex.java:148)
	at org.deeplearning4j.nn.graph.ComputationGraph.calcBackpropGradients(ComputationGraph.java:2772)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1381)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1341)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1165)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1115)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1082)
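For context, the failing op is the broadcast multiply in EmbeddingSequenceLayer.backpropGradient between the FLOAT gradient array and the INT mask array coming out of the BertIterator. The same mismatch can be reproduced in plain ND4J (a minimal sketch; the shapes here are made up for illustration):

    import org.nd4j.linalg.api.buffer.DataType;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    // FLOAT array, standing in for the backprop gradient
    INDArray grad = Nd4j.rand(DataType.FLOAT, 2, 3);
    // INT array, standing in for the int[]-backed mask
    INDArray mask = Nd4j.createFromArray(new int[][]{{1, 0, 1}, {1, 1, 0}});

    // grad.mul(mask) fails with "Op.X must have same data type as Op.Y";
    // casting the mask to the gradient's type makes the multiply work:
    INDArray masked = grad.mul(mask.castTo(DataType.FLOAT));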

Contributing

Sorry, no fix from me, just a big thanks for your great work.

treo (Member) commented Oct 11, 2021

https://github.com/eclipse/deeplearning4j/blob/master/deeplearning4j/deeplearning4j-nlp-parent/deeplearning4j-nlp/src/main/java/org/deeplearning4j/iterator/BertIterator.java#L206 uses int[] to create the mask arrays.

This looks like it was missed when we added support for keeping the original array type on creation instead of forcing all tensors to be either float or double.
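
Until the iterator itself is fixed, one possible user-side workaround (a sketch, assuming BertIterator applies a preprocessor set through MultiDataSetIterator.setPreProcessor, as MultiDataSetIterator implementations generally do, and using the iter from the code above) is to cast the mask arrays to FLOAT before training:

    import org.nd4j.linalg.api.buffer.DataType;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor;

    // Cast integer feature masks to FLOAT so the backward-pass broadcast sees matching types
    MultiDataSetPreProcessor castMasks = mds -> {
        INDArray[] masks = mds.getFeaturesMaskArrays();
        if (masks == null) return;
        for (int i = 0; i < masks.length; i++) {
            if (masks[i] != null && masks[i].dataType() != DataType.FLOAT) {
                masks[i] = masks[i].castTo(DataType.FLOAT);
            }
        }
        mds.setFeaturesMaskArrays(masks);
    };
    iter.setPreProcessor(castMasks); // set before calling model.fit(iter)

Depending on the network, the label mask arrays may need the same treatment.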
