Using LODTensor instead of Tensor in every operator. #3717

Closed · 2 tasks · reyoung opened this issue Aug 28, 2017 · 9 comments

reyoung (Collaborator) commented Aug 28, 2017

Since we are switching from Tensor to LODTensor as the basic data type for operators' Input/Output variables, we should:

  • Use LODTensor instead of Tensor in the current code. For example, in the current implementation, like here, we should use Input<LODTensor>() instead of Input<Tensor>().

  • Decide how an output inherits LOD information from one of the operator's inputs:

    • Should that be registered in OpInfo? E.g., register that Output["out"] inherits Input["X"]'s LOD information (a hypothetical sketch follows this list).
    • Or should each developer hand-write it in every operator implementation (which could be very noisy)?
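
A hypothetical sketch of the registration approach. None of these names exist in Paddle yet; they only illustrate letting the framework, rather than each kernel, copy LOD from input to output:

#include <map>
#include <string>

struct OpInfo {
  // ... existing fields ...
  // Hypothetical: maps each output name to the input whose LOD it inherits.
  std::map<std::string, std::string> lod_inheritance;
};

// Registration for, e.g., minus_op: "Out" inherits "X"'s LOD.
//   info.lod_inheritance["Out"] = "X";
// Before invoking the kernel, the framework would then run roughly:
//   ctx.Output<LODTensor>("Out")->set_lod(ctx.Input<LODTensor>("X")->lod());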
wangkuiyi (Collaborator) commented Aug 28, 2017

Just to add a link to #3693, where I have a protobuf message description of the LODTensor's shape.

reyoung (Collaborator, Author) commented Aug 28, 2017

Current operators use Tensor heavily. With the command

grep -n '<Tensor>' `find paddle/operators/ -name '*_op.h' -o -name '*_op.cc'`

we find 128 usages. I copy the usage in minus_op.h and minus_op.cc here.

// In compute method
    auto* left_tensor = context.Input<framework::Tensor>("X");
    auto* right_tensor = context.Input<framework::Tensor>("Y");
    auto* out_tensor = context.Output<framework::Tensor>("Out");

    out_tensor->mutable_data<T>(context.GetPlace());
    auto& dev = context.GetEigenDevice<Place>();
    framework::EigenVector<T>::Flatten(*out_tensor).device(dev) =
        framework::EigenVector<T>::Flatten(*left_tensor) -
        framework::EigenVector<T>::Flatten(*right_tensor);
// In InferShape method
    auto *left_tensor = ctx.Input<framework::Tensor>("X");
    auto *right_tensor = ctx.Input<framework::Tensor>("Y");

    PADDLE_ENFORCE_EQ(
        framework::product(left_tensor->dims()),
        framework::product(right_tensor->dims()),
        "Minus operator must take two tensor with same num of elements");
    ctx.Output<framework::Tensor>("Out")->Resize(left_tensor->dims());
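
For illustration, a mechanical replacement in the compute method could look like the sketch below. It assumes LODTensor keeps Tensor's interface and adds lod()/set_lod() accessors, which is not yet decided:

// Sketch: minus_op's compute method after the replacement.
    auto* left_tensor = context.Input<framework::LODTensor>("X");
    auto* right_tensor = context.Input<framework::LODTensor>("Y");
    auto* out_tensor = context.Output<framework::LODTensor>("Out");

    // minus is not sequence-aware, so it passes the LOD through unchanged.
    out_tensor->set_lod(left_tensor->lod());
    out_tensor->mutable_data<T>(context.GetPlace());
    auto& dev = context.GetEigenDevice<Place>();
    framework::EigenVector<T>::Flatten(*out_tensor).device(dev) =
        framework::EigenVector<T>::Flatten(*left_tensor) -
        framework::EigenVector<T>::Flatten(*right_tensor);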

On the other hand, only Tensor is exposed to Python via pybind. How do we expose LODTensor, and how do we use it in Python?
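
A pybind11 binding could be as small as the following sketch; the LODTensor accessors (lod, set_lod) are assumptions carried over from the discussion above, not an existing API:

#include <pybind11/pybind11.h>
namespace py = pybind11;

// Sketch: expose LODTensor to Python next to the existing Tensor binding.
void BindLODTensor(py::module& m) {
  py::class_<framework::LODTensor>(m, "LODTensor")
      .def("lod", &framework::LODTensor::lod)          // read sequence info
      .def("set_lod", &framework::LODTensor::set_lod); // set sequence info
}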

QiJune self-assigned this Aug 28, 2017
QiJune (Member) commented Aug 28, 2017

Paddle's existing solution

Paddle has implemented highly optimized recurrent networks that can handle variable-length sequences without padding, and I believe we can learn something from that original design. The solution is as follows:

Paddle uses Argument as the input and output of a Layer, where Argument is a struct holding a value plus sequence info:

struct Argument {
  ......
  MatrixPtr value;
  ICpuGpuVectorPtr sequenceStartPositions;
  ICpuGpuVectorPtr subSequenceStartPositions;
  ......
};

Each layer handles Argument in its own way: a layer that does not need sequence info just uses the data value and passes the sequence info along if necessary; a layer that does need sequence info uses both the value and the sequence info.

Let's take SequencePoolLayer and FullyConnectedLayer as examples:

forward method of SequencePoolLayer:

void SequencePoolLayer::forward(PassType passType) {
  Layer::forward(passType);

  const Argument& input = getInput(0);
  CHECK(input.hasSeq() || input.hasSubseq())
      << "Input should be a sequence or subsequence for layer " << getName();

  newBatchSize_ = type_ ? input.getNumSubSequences() : input.getNumSequences();
  size_t dim = getSize();
  // check
  CHECK_EQ(dim, input.value->getWidth());
  startPositions_ =
      type_ ? input.subSequenceStartPositions : input.sequenceStartPositions;
  auto starts = startPositions_->getVector(false);
  CHECK_EQ(starts->getData()[newBatchSize_], input.getBatchSize());
  CHECK_EQ(newBatchSize_, starts->getSize() - 1);
  ......
  if (type_) {
    CHECK(input.subSequenceStartPositions)
        << "when trans_type = seq, input must hasSubseq";
    output_.degradeSequence(input);
  }
  ......
  resetOutput(newBatchSize_, dim);
}

forward method of FullyConnectedLayer:

void FullyConnectedLayer::forward(PassType passType) {
  ......
  Layer::forward(passType);
  MatrixPtr outV = getOutputValue();

  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    auto input = getInput(i);
    CHECK(input.value) << "The input of 'fc' layer must be matrix";
    REGISTER_TIMER_INFO("FwMulTimer", getName().c_str());
    i == 0 ? outV->mul(*input.value, *weights_[i]->getW(), 1, 0)
           : outV->mul(*input.value, *weights_[i]->getW(), 1, 1);
  }

  /* add the bias-vector */
  if (biases_.get() != NULL) {
    REGISTER_TIMER_INFO("FwBiasTimer", getName().c_str());
    outV->addBias(*(biases_->getW()), 1);
  }

  /* activation */ {
    REGISTER_TIMER_INFO("FwAtvTimer", getName().c_str());
    forwardActivation();
  }
  ......
}

SequencePoolLayer uses the sequence info, whereas FullyConnectedLayer just passes it on to the next layer.

For the transmission of sequence info, every derived layer class has to call the base class's forward method, just like FullyConnectedLayer does:

virtual void forward(PassType passType) {
    passType_ = passType;
    if (!inputLayers_.empty() && needSequenceInfo_) {
      const Argument& input = getInput(0);
      output_.sequenceStartPositions = input.sequenceStartPositions;
      output_.subSequenceStartPositions = input.subSequenceStartPositions;
      output_.cpuSequenceDims = input.cpuSequenceDims;
    }
  }

In the Python API, there are actually 4 × 3 = 12 kinds of input data.

Four data types:

  • dense vector
  • sparse binary vector
  • sparse float vector
  • integer

Three sequence types:

  • non-sequence
  • sequence
  • sub-sequence

DataProviderConverter is defined to convert Python input data into a C++ Argument.

New solution

If we follow Paddle's former design, we can define LODTensor like this:

class LOD : public std::vector<Vector<size_t>> {
 public:
  LOD SliceLevels(size_t level_begin, size_t level_end) const;
  LOD SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) const;
};

class LODTensor {
 private:
  LOD* lod_;      // sequence info; may be absent for plain tensors
  Tensor* data_;  // the underlying dense data
};

After we replace Tensor with LODTensor in the current code, only sequence-related Ops will handle both the lod_ and data_ fields in their InferShape and Run methods; every other Op will handle only the data_ field and pass the lod_ field through (see the sketch below).
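
As a sketch of the sequence-related case, a sum-pooling Op could consume the level-0 LOD like this; the accessor names follow the class sketch above and are assumptions:

// Sketch: a sequence Op reading level-0 LOD to pool each sequence,
// assuming lod() returns the LOD by reference.
auto* in = ctx.Input<framework::LODTensor>("X");
auto* out = ctx.Output<framework::LODTensor>("Out");
// Level-0 offsets: sequence i occupies rows [starts[i], starts[i+1]).
const auto& starts = in->lod()[0];
for (size_t i = 0; i + 1 < starts.size(); ++i) {
  // Reduce rows starts[i] .. starts[i+1]-1 of in's data_ into row i of out.
}
// Pooling erases one level of sequence structure, so out's LOD is not a
// plain copy of in's.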

LODTensor will be exposed to Python, so users can set the sequence info and the data directly. But for consistency with the v2 API, we also need to implement a converter that takes a data reader in and produces a LODTensor.

By using composition, we can unify the data type and avoid potential type deduction.

The cost is an additional LOD pointer field per tensor: 4 bytes on a 32-bit platform (8 on a 64-bit one). With 1000 tensors that contain no sequence info, that is roughly 4-8KB of extra memory.

luotao1 (Contributor) commented Aug 28, 2017

> A layer that does not need sequence info just uses the data value; a layer that needs sequence info uses both the value and the sequence info.

Every layer has sequence info. In the example above, fc's output sequence info defaults to being the same as its input's; see layer.h:

virtual void forward(PassType passType) {
    passType_ = passType;
    if (!inputLayers_.empty() && needSequenceInfo_) {
      const Argument& input = getInput(0);
      output_.sequenceStartPositions = input.sequenceStartPositions;
      output_.subSequenceStartPositions = input.subSequenceStartPositions;
      output_.cpuSequenceDims = input.cpuSequenceDims;
    }
  }

The purpose of this design: take maxlayer1->fc->maxlayer2 as an example. If fc did not carry the sequence info, then by the time we reached maxlayer2, there would be no sequence info left to use.

> After we replace Tensor with LODTensor in the current code, only sequence-related Ops will handle both the lod_ and data_ fields in their InferShape and Run methods; other Ops will handle only the data_ field.

So every Op needs to carry both the lod_ and data_ fields.

QiJune (Member) commented Aug 28, 2017

@luotao1 Thanks for pointing this out! I will update my comment accordingly.

wangkuiyi (Collaborator) commented

In my mind, LODTensor is a subclass of Tensor, so it inherits all methods from Tensor.

I agree with @QiJune that LoD is optional, and with @luotao1 that most layers would just copy LoD information from input to output.
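
In sketch form, that subclass alternative would be (accessor names are assumptions):

// Sketch: LODTensor inherits every Tensor method and only adds an
// optional LOD; an empty LOD means "no sequence info".
class LODTensor : public framework::Tensor {
 public:
  const LOD& lod() const { return lod_; }
  void set_lod(const LOD& lod) { lod_ = lod; }

 private:
  LOD lod_;
};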

QiJune (Member) commented Aug 29, 2017

@wangkuiyi @reyoung @Superjom
LODTensor is infectious: once the input data has a LOD field, the output gains a LOD field too. Of course, some Ops implement sequence pooling and can erase the LOD info; for those Ops, the input data type and the output data type differ.

For example:

input_data = LODTensor
LODTensor = Op1(LODTensor)
LODTensor = Op2(LODTensor)
Tensor = Op3(LODTensor)

For these special Ops, such as Op3, the input must be a LODTensor and the output must be a Tensor. For the other Ops, such as Op1 and Op2, the output type follows the input type: if the input is a Tensor, the output is a Tensor; if the input is a LODTensor, the output is a LODTensor.

So we need an InferType pass, and during InferType we have to check the type of the input data:

  • For an ordinary Op, the output type matches the input type.
  • For a special Op, the input data type must be LODTensor and the output data type must be Tensor.

Moreover, InferType must finish before InferShape, because when InferShape sets a datum's size, it needs to know what that datum's type is.

But the type information is actually only available at run time, from the type of the data the user feeds in, while the variable's GetMutable interface must fix a type at compile time.

Isn't that a contradiction?
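
In sketch form, the tension is that Variable::GetMutable<T>() fixes T when the operator is written, while Tensor vs. LODTensor would only be known from the user's data at run time:

// The operator author must commit to one of these at compile time:
auto* t = var->GetMutable<framework::Tensor>();      // plain data
auto* lt = var->GetMutable<framework::LODTensor>();  // data + sequence info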

reyoung (Collaborator, Author) commented Aug 29, 2017

We don't need an InferType pass. From the earlier discussion between @wangkuiyi and @Superjom, we will change the inputs and outputs of all Ops to LODTensor, whether or not an Op actually uses the LOD info.

QiJune (Member) commented Aug 29, 2017

Then there are three things to do now:

  • Change every place in the C++ code that currently uses Tensor to use LODTensor.
  • On the Python side, expose LODTensor instead of Tensor.
  • For Operators with multiple inputs and multiple outputs, design a mechanism to determine which particular input each output's LOD info is taken from.

@wangkuiyi @reyoung @Superjom Is my understanding correct?
