Using LODTensor instead of Tensor in every operator. #3717

Closed · 2 tasks · reyoung opened this issue Aug 28, 2017 · 9 comments

reyoung (Collaborator) commented Aug 28, 2017

Since we are switching from Tensor to LODTensor as the basic data type for operators' Input/Output variables, we should:

  • Use LODTensor instead of Tensor in the current code. For example, in the current implementation, like here, we should use Input<LODTensor>() instead of Input<Tensor>().

  • Decide how an output inherits LOD information from one of the operator's inputs:

    • Should that be registered in OpInfo? E.g., register that Output["out"] inherits Input["X"]'s LOD information (a hypothetical sketch follows this list).
    • Or should each developer hand-write it in every operator implementation (which could be very noisy)?
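
A hypothetical sketch of the registration approach. None of these names exist in Paddle yet; they only illustrate letting the framework, rather than each kernel, copy LOD from input to output:

#include <map>
#include <string>

struct OpInfo {
  // ... existing fields ...
  // Hypothetical: maps each output name to the input whose LOD it inherits.
  std::map<std::string, std::string> lod_inheritance;
};

// Registration for, e.g., minus_op: "Out" inherits "X"'s LOD.
//   info.lod_inheritance["Out"] = "X";
// Before invoking the kernel, the framework would then run roughly:
//   ctx.Output<LODTensor>("Out")->set_lod(ctx.Input<LODTensor>("X")->lod());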
wangkuiyi (Collaborator) commented Aug 28, 2017

Just to add a link to #3693, where I have a protobuf message description of the LODTensor's shape.

reyoung (Collaborator, Author) commented Aug 28, 2017

Current operators use Tensor heavily. With the command

grep -n '<Tensor>' `find paddle/operators/ -name '*_op.h' -o -name '*_op.cc'`

we find 128 usages. I copy the usage in minus_op.h and minus_op.cc here.

// In compute method
    auto* left_tensor = context.Input<framework::Tensor>("X");
    auto* right_tensor = context.Input<framework::Tensor>("Y");
    auto* out_tensor = context.Output<framework::Tensor>("Out");

    out_tensor->mutable_data<T>(context.GetPlace());
    auto& dev = context.GetEigenDevice<Place>();
    framework::EigenVector<T>::Flatten(*out_tensor).device(dev) =
        framework::EigenVector<T>::Flatten(*left_tensor) -
        framework::EigenVector<T>::Flatten(*right_tensor);
// In InferShape method
    auto *left_tensor = ctx.Input<framework::Tensor>("X");
    auto *right_tensor = ctx.Input<framework::Tensor>("Y");

    PADDLE_ENFORCE_EQ(
        framework::product(left_tensor->dims()),
        framework::product(right_tensor->dims()),
        "Minus operator must take two tensor with same num of elements");
    ctx.Output<framework::Tensor>("Out")->Resize(left_tensor->dims());
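
For illustration, a mechanical replacement in the compute method could look like the sketch below. It assumes LODTensor keeps Tensor's interface and adds lod()/set_lod() accessors, which is not yet decided:

// Sketch: minus_op's compute method after the replacement.
    auto* left_tensor = context.Input<framework::LODTensor>("X");
    auto* right_tensor = context.Input<framework::LODTensor>("Y");
    auto* out_tensor = context.Output<framework::LODTensor>("Out");

    // minus is not sequence-aware, so it passes the LOD through unchanged.
    out_tensor->set_lod(left_tensor->lod());
    out_tensor->mutable_data<T>(context.GetPlace());
    auto& dev = context.GetEigenDevice<Place>();
    framework::EigenVector<T>::Flatten(*out_tensor).device(dev) =
        framework::EigenVector<T>::Flatten(*left_tensor) -
        framework::EigenVector<T>::Flatten(*right_tensor);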

On the other hand, only Tensor is exposed to Python via pybind. How do we expose LODTensor, and how do we use it in Python?
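
A pybind11 binding could be as small as the following sketch; the LODTensor accessors (lod, set_lod) are assumptions carried over from the discussion above, not an existing API:

#include <pybind11/pybind11.h>
namespace py = pybind11;

// Sketch: expose LODTensor to Python next to the existing Tensor binding.
void BindLODTensor(py::module& m) {
  py::class_<framework::LODTensor>(m, "LODTensor")
      .def("lod", &framework::LODTensor::lod)          // read sequence info
      .def("set_lod", &framework::LODTensor::set_lod); // set sequence info
}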

QiJune self-assigned this Aug 28, 2017
QiJune (Member) commented Aug 28, 2017

Paddle's existing solution

Paddle has implemented highly optimized recurrent networks that can handle variable-length sequences without padding, and I believe we can learn something from that original design. The solution is as follows:

Paddle uses Argument as the input and output of a Layer, where Argument is a struct holding a value plus sequence info:

struct Argument {
  ......
  MatrixPtr value;
  ICpuGpuVectorPtr sequenceStartPositions;
  ICpuGpuVectorPtr subSequenceStartPositions;
  ......
};

Each layer handles Argument in its own way: a layer that does not need sequence info just uses the data value and passes the sequence info along if necessary; a layer that does need sequence info uses both the value and the sequence info.

Let's take SequencePoolLayer and FullyConnectedLayer as examples:

forward method of SequencePoolLayer:

void SequencePoolLayer::forward(PassType passType) {
  Layer::forward(passType);

  const Argument& input = getInput(0);
  CHECK(input.hasSeq() || input.hasSubseq())
      << "Input should be a sequence or subsequence for layer " << getName();

  newBatchSize_ = type_ ? input.getNumSubSequences() : input.getNumSequences();
  size_t dim = getSize();
  // check
  CHECK_EQ(dim, input.value->getWidth());
  startPositions_ =
      type_ ? input.subSequenceStartPositions : input.sequenceStartPositions;
  auto starts = startPositions_->getVector(false);
  CHECK_EQ(starts->getData()[newBatchSize_], input.getBatchSize());
  CHECK_EQ(newBatchSize_, starts->getSize() - 1);
  ......
  if (type_) {
    CHECK(input.subSequenceStartPositions)
        << "when trans_type = seq, input must hasSubseq";
    output_.degradeSequence(input);
  }
  ......
  resetOutput(newBatchSize_, dim);
}

forward method of FullyConnectedLayer:

void FullyConnectedLayer::forward(PassType passType) {
  ......
  Layer::forward(passType);
  MatrixPtr outV = getOutputValue();

  for (size_t i = 0; i != inputLayers_.size(); ++i) {
    auto input = getInput(i);
    CHECK(input.value) << "The input of 'fc' layer must be matrix";
    REGISTER_TIMER_INFO("FwMulTimer", getName().c_str());
    i == 0 ? outV->mul(*input.value, *weights_[i]->getW(), 1, 0)
           : outV->mul(*input.value, *weights_[i]->getW(), 1, 1);
  }

  /* add the bias-vector */
  if (biases_.get() != NULL) {
    REGISTER_TIMER_INFO("FwBiasTimer", getName().c_str());
    outV->addBias(*(biases_->getW()), 1);
  }

  /* activation */ {
    REGISTER_TIMER_INFO("FwAtvTimer", getName().c_str());
    forwardActivation();
  }
  ......
}

SequencePoolLayer uses the sequence info, whereas FullyConnectedLayer just passes it on to the next layer.

For the transmission of sequence info, every derived layer class has to call the base class's forward method, just like FullyConnectedLayer does:

virtual void forward(PassType passType) {
    passType_ = passType;
    if (!inputLayers_.empty() && needSequenceInfo_) {
      const Argument& input = getInput(0);
      output_.sequenceStartPositions = input.sequenceStartPositions;
      output_.subSequenceStartPositions = input.subSequenceStartPositions;
      output_.cpuSequenceDims = input.cpuSequenceDims;
    }
  }

In the Python API, there are actually 4 × 3 = 12 kinds of input data.

Four data types:

  • dense vector
  • sparse binary vector
  • sparse float vector
  • integer

Three sequence types:

  • non-sequence
  • sequence
  • sub-sequence

DataProviderConverter is defined to convert Python input data into a C++ Argument.

New solution

If we follow Paddle's former design, we can define LODTensor like this:

class LOD : public std::vector<Vector<size_t>> {
 public:
  LOD SliceLevels(size_t level_begin, size_t level_end) const;
  LOD SliceInLevel(size_t level, size_t elem_begin, size_t elem_end) const;
};

class LODTensor {
 private:
  LOD* lod_;      // sequence info; may be absent for plain tensors
  Tensor* data_;  // the underlying dense data
};

After we replace Tensor with LODTensor in the current code, only sequence-related Ops will handle both the lod_ and data_ fields in their InferShape and Run methods; every other Op will handle only the data_ field and pass the lod_ field through (see the sketch below).
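
As a sketch of the sequence-related case, a sum-pooling Op could consume the level-0 LOD like this; the accessor names follow the class sketch above and are assumptions:

// Sketch: a sequence Op reading level-0 LOD to pool each sequence,
// assuming lod() returns the LOD by reference.
auto* in = ctx.Input<framework::LODTensor>("X");
auto* out = ctx.Output<framework::LODTensor>("Out");
// Level-0 offsets: sequence i occupies rows [starts[i], starts[i+1]).
const auto& starts = in->lod()[0];
for (size_t i = 0; i + 1 < starts.size(); ++i) {
  // Reduce rows starts[i] .. starts[i+1]-1 of in's data_ into row i of out.
}
// Pooling erases one level of sequence structure, so out's LOD is not a
// plain copy of in's.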

LODTensor will be exposed to Python, so users can set the sequence info and the data directly. But for consistency with the v2 API, we also need to implement a converter that takes a data reader in and produces a LODTensor.

By using composition, we can unify the data type and avoid potential type deduction.

The cost is an additional LOD pointer field per tensor: 4 bytes on a 32-bit platform (8 on a 64-bit one). With 1000 tensors that contain no sequence info, that is roughly 4-8KB of extra memory.

luotao1 (Contributor) commented Aug 28, 2017

> A layer that does not need sequence info just uses the data value; a layer that needs sequence info uses both the value and the sequence info.

Every layer has sequence info. In the example above, fc's output sequence info defaults to being the same as its input's; see layer.h:

virtual void forward(PassType passType) {
    passType_ = passType;
    if (!inputLayers_.empty() && needSequenceInfo_) {
      const Argument& input = getInput(0);
      output_.sequenceStartPositions = input.sequenceStartPositions;
      output_.subSequenceStartPositions = input.subSequenceStartPositions;
      output_.cpuSequenceDims = input.cpuSequenceDims;
    }
  }

The purpose of this design: take maxlayer1->fc->maxlayer2 as an example. If fc did not carry the sequence info, then by the time we reached maxlayer2, there would be no sequence info left to use.

> After we replace Tensor with LODTensor in the current code, only sequence-related Ops will handle both the lod_ and data_ fields in their InferShape and Run methods; other Ops will handle only the data_ field.

So every Op needs to carry both the lod_ and data_ fields.

QiJune (Member) commented Aug 28, 2017

@luotao1 Thanks for pointing this out! I will update my comment accordingly.

wangkuiyi (Collaborator) commented

In my mind, LODTensor is a subclass of Tensor, so it inherits all methods from Tensor.

I agree with @QiJune that LoD is optional, and with @luotao1 that most layers would just copy LoD information from input to output.
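
In sketch form, that subclass alternative would be (accessor names are assumptions):

// Sketch: LODTensor inherits every Tensor method and only adds an
// optional LOD; an empty LOD means "no sequence info".
class LODTensor : public framework::Tensor {
 public:
  const LOD& lod() const { return lod_; }
  void set_lod(const LOD& lod) { lod_ = lod; }

 private:
  LOD lod_;
};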

QiJune (Member) commented Aug 29, 2017

@wangkuiyi @reyoung @Superjom
LODTensor is infectious: once the input data has a LOD field, the output gains a LOD field too. Of course, some Ops implement sequence pooling and can erase the LOD info; for those Ops, the input data type and the output data type differ.

For example:

input_data = LODTensor
LODTensor = Op1(LODTensor)
LODTensor = Op2(LODTensor)
Tensor = Op3(LODTensor)

For these special Ops, such as Op3, the input must be a LODTensor and the output must be a Tensor. For the other Ops, such as Op1 and Op2, the output type follows the input type: if the input is a Tensor, the output is a Tensor; if the input is a LODTensor, the output is a LODTensor.

So we need an InferType pass, and during InferType we have to check the type of the input data:

  • For an ordinary Op, the output type matches the input type.
  • For a special Op, the input data type must be LODTensor and the output data type must be Tensor.

Moreover, InferType must finish before InferShape, because when InferShape sets a datum's size, it needs to know what that datum's type is.

But the type information is actually only available at run time, from the type of the data the user feeds in, while the variable's GetMutable interface must fix a type at compile time.

Isn't that a contradiction?
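
In sketch form, the tension is that Variable::GetMutable<T>() fixes T when the operator is written, while Tensor vs. LODTensor would only be known from the user's data at run time:

// The operator author must commit to one of these at compile time:
auto* t = var->GetMutable<framework::Tensor>();      // plain data
auto* lt = var->GetMutable<framework::LODTensor>();  // data + sequence info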

reyoung (Collaborator, Author) commented Aug 29, 2017

We don't need an InferType pass. From the earlier discussion between @wangkuiyi and @Superjom, we will change the inputs and outputs of all Ops to LODTensor, whether or not an Op actually uses the LOD info.

QiJune (Member) commented Aug 29, 2017

Then there are three things to do now:

  • Change every place in the C++ code that currently uses Tensor to use LODTensor.
  • On the Python side, expose LODTensor instead of Tensor.
  • For Operators with multiple inputs and multiple outputs, design a mechanism to determine which particular input each output's LOD info is taken from.

@wangkuiyi @reyoung @Superjom Is my understanding correct?
