Re-define the type system #8352

wangkuiyi · 2018-02-10T03:13:19Z

Related to #8339

This PR is not supposed to work. It is just a reference to how we need to re-design Fluid's type system.

jacquesqiao · 2018-02-10T12:00:26Z

paddle/framework/framework.proto

  }
+  optional LoDTensorDesc lod_tensor = 3;


LoDTensorDesc => LoDTensor?

jacquesqiao · 2018-02-10T12:03:19Z

paddle/framework/framework.proto

+  required Type type = 1;
+
+  message Tensor {
+    required Type data_type = 1;


Is the data_type here for Tensor restricted to POD types?

You are right. Our C++ implementation of Tensor requires that elements have to be C++ POD types:

https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/tensor.h#L169

It looks to me that we can check in the C++ code that data_type is a POD type. But I am open to better solutions. What do you think?
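A minimal sketch of what such a C++-side check could look like, assuming a single `Type` enum in which the POD values are 0 through 6 as in the diff. `IsPOD`, `TensorDesc` and `MakeTensorDesc` are hypothetical names used for illustration, not existing Paddle functions, and the composite values shown are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Single enum mixing POD and composite types.
// Only the POD values come from the diff; the composite values are assumed.
enum class Type {
  BOOL = 0, INT16 = 1, INT32 = 2, INT64 = 3, FP16 = 4, FP32 = 5, FP64 = 6,
  LOD_TENSOR = 7, SELECTED_ROWS = 8
};

// Hypothetical helper: true only for the POD element types.
inline bool IsPOD(Type t) {
  switch (t) {
    case Type::BOOL:
    case Type::INT16:
    case Type::INT32:
    case Type::INT64:
    case Type::FP16:
    case Type::FP32:
    case Type::FP64:
      return true;
    default:
      return false;
  }
}

// Hypothetical in-memory counterpart of the Tensor message.
struct TensorDesc {
  Type data_type;
  std::vector<int64_t> dims;  // unknown dimensions are stored as -1
};

// Hypothetical factory: rejects non-POD element types at construction time.
inline TensorDesc MakeTensorDesc(Type data_type, std::vector<int64_t> dims) {
  if (!IsPOD(data_type)) {
    throw std::invalid_argument("Tensor data_type must be a POD type");
  }
  return TensorDesc{data_type, std::move(dims)};
}
```

With this approach the protobuf schema stays unchanged, and the constraint is enforced wherever a tensor description is built, for example `MakeTensorDesc(Type::FP32, {-1, 640, 480})`.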

As suggested by @wangkuiyi, one way could be to leave it as it is and check in the C++ code that data_type must be a POD. Another way could be to define this inside protobuf. We can split Type into 2 different enums:

enum PodType {
    // POD types
    BOOL = 0;
    INT16 = 1;
    INT32 = 2;
    INT64 = 3;
    FP16 = 4;
    FP32 = 5;
    FP64 = 6;
}

enum CompositeType {
    LOD_TENSOR = 1;    // LoDTensor
    SELECTED_ROWS = 2; // Tensor
    FEED_MINIBATCH = 3;
    FETCH_LIST = 4;
    STEP_SCOPES = 5;
    LOD_RANK_TABLE = 6;
    // LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType
    // PLACE_LIST = 8; // replaced by array-of-place. c.f. ArrayType
    READER = 7; // ReaderType
}

Once we have done this, we can define a field type in VarType as follows:

message VarType {
    oneof type {
        PodType ptype = 1;
        CompositeType ctype = 2;
    }
}

This way we can capture the requirement of our Tensor and LODTensor inside protobuf message. The tensor can be defined as follows:

message Tensor {
    required PodType data_type = 1;
    repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}

What do you guys think @wangkuiyi and @jacquesqiao ?

jacquesqiao · 2018-02-10T12:08:46Z

paddle/framework/framework.proto

+  }
+  optional Array array = 4;
+
+  message Reader { repeated LoDTensorDesc lod_tensor = 1; }


LoDTensorDesc => LoDTensor

jacquesqiao · 2018-02-10T12:59:37Z

paddle/framework/framework.proto

+    FP64 = 6;
+
+    // other types, that might need additional descriptions
+    LOD_TENSOR = 1; // LoDTensorType


Because the other types live in the same enum Type as the POD types, their values must not duplicate the POD values. So they should not start from 1.
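The collision can be illustrated with a small check over the enum values. This is only a sketch: `HasDuplicateValue` is a hypothetical helper, the POD values 0 through 6 are taken from the diff, and the composite values are assumptions chosen for illustration. Note also that protoc itself rejects duplicate values within one enum unless `allow_alias` is enabled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: returns true if any value appears more than once.
inline bool HasDuplicateValue(const std::vector<int>& values) {
  std::set<int> seen;
  for (int v : values) {
    // insert().second is false when the value was already present.
    if (!seen.insert(v).second) return true;
  }
  return false;
}

// POD values as listed in the diff: BOOL = 0 ... FP64 = 6.
inline std::vector<int> PodValues() { return {0, 1, 2, 3, 4, 5, 6}; }

// Builds the value list of a single enum in which `count` composite
// entries are numbered consecutively starting at `first_composite`.
inline std::vector<int> CombinedValues(int first_composite, int count) {
  std::vector<int> values = PodValues();
  for (int i = 0; i < count; ++i) values.push_back(first_composite + i);
  return values;
}
```

Starting the composite entries at 1 collides with INT16 and the values after it, while starting them after FP64 (at 7) keeps every value unique.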

chengduoZH · 2018-02-11T02:11:54Z

paddle/fluid/framework/framework.proto


-message ReaderDesc { repeated LoDTensorDesc lod_tensor = 1; }
+  message Array { VarType elem_type = 1; }


Should the VarType be repeated?
Then message Array could serve both as an array of VarType, in which every element has the same VarType, and as an n-tuple, in which the element VarTypes can differ, right?

I don't think the VarType in the array should be repeated. This is because the array will have all elements of the same VarType. In an n-tuple, the VarType will be repeated.

I agree with Abhinav's comment.

@abhinavarora Thanks, you are right, I finally know where I was wrong.
We should add message Tuple {repeated VarType tuple_type = 1;}

Yes, this has also been discussed here: #8336. Instead of VarDesc we will have VarType, as has been done in this PR.

Yes, a channel of channels is a commonly used pattern in CSP.

If a channel can hold so many types, we cannot make shape inference easily at compile time.

I think the Tuple and Array should be two different types.

An Array holds elements that have the same type and shape at compile time. The shape of array_read can be inferred at compile time only if the array holds a single type.

The length of an Array can only be decided at runtime, since there are array_write operators for RNN. If we let an array hold different types, we cannot infer the type at compile time.

The Tuple is an immutable type. The length of a Tuple and its elements are fixed at compile time and do not change.

@reyoung You are right, Tuple and Array are 2 different types. The type of the array elements will be known at compile time. Tuple should be immutable and its length and elements will not change. I am just a little curious how we could implement this in our C++ code. Today, we map the LODTensorArray directly to a Vector<LODTensor>. For an Array, would we have to create mappings for all types?

would we have to create mappings for all types?

I am curious about this too.
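The distinction between Array and Tuple discussed in this thread can be sketched at the type-description level. The following is an illustration under assumptions, not the PR's implementation: `TypeDesc`, `Leaf`, `ArrayOf`, `TupleOf`, `InferArrayRead` and `InferTupleGet` are hypothetical names. It does not answer the open question of how the runtime containers would be mapped in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type descriptor: a named leaf type, an array of one element
// type, or a tuple with a fixed list of element types.
struct TypeDesc {
  enum class Kind { LEAF, ARRAY, TUPLE };
  Kind kind = Kind::LEAF;
  std::string name;             // used when kind == LEAF
  std::vector<TypeDesc> elems;  // one entry for ARRAY, N entries for TUPLE
};

inline TypeDesc Leaf(std::string name) {
  TypeDesc t;
  t.name = std::move(name);
  return t;
}

// Mirrors `message Array { VarType elem_type = 1; }`: a single element type.
inline TypeDesc ArrayOf(TypeDesc elem) {
  TypeDesc t;
  t.kind = TypeDesc::Kind::ARRAY;
  t.elems.push_back(std::move(elem));
  return t;
}

// Mirrors `message Tuple { repeated VarType tuple_type = 1; }`.
inline TypeDesc TupleOf(std::vector<TypeDesc> elems) {
  TypeDesc t;
  t.kind = TypeDesc::Kind::TUPLE;
  t.elems = std::move(elems);
  return t;
}

// array_read: all elements share one type, so the result type is known
// without knowing the index or the runtime length of the array.
inline const TypeDesc& InferArrayRead(const TypeDesc& array) {
  if (array.kind != TypeDesc::Kind::ARRAY) {
    throw std::invalid_argument("expected an array type");
  }
  return array.elems.front();
}

// Tuple access: the length is fixed, so a constant index selects a known type.
inline const TypeDesc& InferTupleGet(const TypeDesc& tuple, std::size_t index) {
  if (tuple.kind != TypeDesc::Kind::TUPLE) {
    throw std::invalid_argument("expected a tuple type");
  }
  return tuple.elems.at(index);  // throws std::out_of_range on a bad index
}
```

In this sketch the array descriptor never records a length, which matches the point that array_write can grow an array at runtime, while the tuple descriptor records every element type up front.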

abhinavarora

Thank you for the PR! This was exactly as we discussed on Friday. I have a few more questions that I have asked in the comments.

abhinavarora · 2018-02-11T03:26:07Z

Since @wangkuiyi was unable to see my comment, I am pasting it here again:

There are 2 ways to make sure that Tensors and LODTensors only contain POD data types. One way, as suggested by @wangkuiyi, is to leave it as it is and check in the C++ code that data_type must be a POD. Another way could be to define this inside protobuf. We can split Type into 2 different enums:

enum PODType {
    // POD types
    BOOL = 0;
    INT16 = 1;
    INT32 = 2;
    INT64 = 3;
    FP16 = 4;
    FP32 = 5;
    FP64 = 6;
}

enum CompositeType {
    LOD_TENSOR = 1;    // LoDTensor
    SELECTED_ROWS = 2; // Tensor
    FEED_MINIBATCH = 3;
    FETCH_LIST = 4;
    STEP_SCOPES = 5;
    LOD_RANK_TABLE = 6;
    // LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType
    // PLACE_LIST = 8; // replaced by array-of-place. c.f. ArrayType
    READER = 7; // ReaderType
}

Once we have done this, we can define a field type in VarType as follows:

message VarType {
    oneof type {
        PODType ptype = 1;
        CompositeType ctype = 2;
    }
   ...
}

This way we can capture the requirement of our Tensor and LODTensor inside protobuf message. The tensor can be defined as follows:

message Tensor {
    required PODType data_type = 1;
    repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}

More information about oneof can be found in the protobuf documentation.
What do you guys think @wangkuiyi and @jacquesqiao ?
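To make the effect of the oneof concrete, here is a hand-written C++ stand-in for a VarType with a `oneof type`. It is a sketch under assumptions: the class below is not generated code, `FromPOD`, `FromComposite` and `IsPOD` are hypothetical names, and only the `type_case()` accessor is modeled on the kind of case accessor that protobuf generates for a oneof.

```cpp
#include <cassert>

enum class PODType { BOOL = 0, INT16 = 1, INT32 = 2, INT64 = 3,
                     FP16 = 4, FP32 = 5, FP64 = 6 };

enum class CompositeType { LOD_TENSOR = 1, SELECTED_ROWS = 2,
                           FEED_MINIBATCH = 3, FETCH_LIST = 4,
                           STEP_SCOPES = 5, LOD_RANK_TABLE = 6, READER = 7 };

// Stand-in for a message with `oneof type { ptype = 1; ctype = 2; }`:
// at most one of the two members is considered set at any time.
class VarType {
 public:
  enum TypeCase { TYPE_NOT_SET = 0, kPtype = 1, kCtype = 2 };

  static VarType FromPOD(PODType p) {
    VarType t;
    t.case_ = kPtype;
    t.ptype_ = p;
    return t;
  }

  static VarType FromComposite(CompositeType c) {
    VarType t;
    t.case_ = kCtype;
    t.ctype_ = c;
    return t;
  }

  TypeCase type_case() const { return case_; }
  PODType ptype() const { return ptype_; }
  CompositeType ctype() const { return ctype_; }

 private:
  TypeCase case_ = TYPE_NOT_SET;
  PODType ptype_ = PODType::BOOL;
  CompositeType ctype_ = CompositeType::LOD_TENSOR;
};

// With the split enums, the POD check only inspects which member is set;
// it does not need to list the individual POD values.
inline bool IsPOD(const VarType& t) {
  return t.type_case() == VarType::kPtype;
}
```

Adding a new POD value such as INT8 to `PODType` would not require touching `IsPOD` in this sketch.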

kavyasrinet · 2018-02-11T03:29:51Z

Isn't this similar to what @wangkuiyi has proposed in the PR, except for splitting the VarType into two separate enums? Or am I missing something?

kavyasrinet

Thank you so much for starting off the PR. The refinement done here should suffice for now and is the same idea as we had discussed on Friday.

abhinavarora · 2018-02-11T03:31:08Z

@kavyasrinet The above comment was in response to the following thread -> #8352 (comment)

kavyasrinet · 2018-02-11T03:32:27Z

I see. Thanks for the clarification, I agree with both approaches.

JiayiFeng · 2018-02-11T04:15:06Z

paddle/fluid/framework/framework.proto

-    LOD_TENSOR_ARRAY = 7;
-    PLACE_LIST = 8;
-    READER = 9;
-  }
  required string name = 1;
  required VarType type = 2;


Can VarDesc::type be a POD type like BOOL or INT16? Or can it only be LOD_TENSOR, SELECTED_ROWS, ...?

VarType::Type includes the POD types.

reyoung · 2018-02-12T04:33:59Z

paddle/fluid/framework/framework.proto

+    FETCH_LIST = 10;
+    STEP_SCOPES = 11;
+    LOD_RANK_TABLE = 12;
+    // LOD_TENSOR_ARRAY = 7; // replaced by array-of-tensors. c.f. ArrayType


All tensors in the LOD_TENSOR_ARRAY will hold the same type and the same shape at compile time.

Good point. So LoDTensorArray is not Array. We got that. Thanks for the reminder!

jacquesqiao · 2018-02-12T07:06:23Z

@abhinavarora @wangkuiyi I personally prefer the first way to separate the two kinds of Type: PODType and CompositeType. There are several reasons:

In the future, if we want to add a new POD type such as int8 to our framework.proto, it will be better to append it after PODType::FP64 rather than after Type::LOD_RANK_TABLE.
IsPOD will be easier to implement with oneof: we do not need to repeat all the POD types when doing the check in C++ and Python.

wangkuiyi · 2018-02-12T18:34:22Z

@jacquesqiao @abhinavarora @reyoung and everyone: Let us use a single enum and do NOT separate POD types from composite types, as I posted in the PR.

I understand your point is due to the fact that the composite type Tensor requires that its elements are POD types. Let us follow this logic and imagine that in the future we might have an additional composite type, whose elements must be, say, tensors. How are we going to handle this? Do we need to separate LOD_TENSOR and SELECTED_ROWS out from CompositeType and make it a new enum named TensorType? If we do so, we will have

enum PODType {...}
enum TensorType {...}
enum CompositeType {...}
message VarType {
  oneof {
    PODType
    TensorType
    CompositeType
  }
  ...
}

There are two critical problems with the above strategy:

It is intractable to maintain an ever-growing number of enums as Fluid develops.

Logically, TensorType and CompositeType are not peers to each other and cannot simply be described by the oneof keyword. Instead, we must have methods like:

bool IsPOD(const VarType&);
bool IsComposite(const VarType&); // returns true for tensors and other composite types
bool IsTensor(const VarType&); // returns true for tensors but not other composite types
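A minimal sketch of how these three predicates could be implemented over a single enum. It rests on assumptions: the `VarType` struct below is a simplified stand-in for the protobuf message, the values 10 through 12 follow the diff quoted in this thread, and the remaining composite values (including READER) are assumed for illustration.

```cpp
#include <cassert>

// Single enum holding both POD and composite types.
enum class Type {
  BOOL = 0, INT16 = 1, INT32 = 2, INT64 = 3, FP16 = 4, FP32 = 5, FP64 = 6,
  LOD_TENSOR = 7, SELECTED_ROWS = 8, FEED_MINIBATCH = 9,  // assumed values
  FETCH_LIST = 10, STEP_SCOPES = 11, LOD_RANK_TABLE = 12,
  READER = 13  // assumed value
};

// Simplified stand-in for the VarType message.
struct VarType {
  Type type;
};

// True for the POD element types only.
inline bool IsPOD(const VarType& t) {
  switch (t.type) {
    case Type::BOOL:
    case Type::INT16:
    case Type::INT32:
    case Type::INT64:
    case Type::FP16:
    case Type::FP32:
    case Type::FP64:
      return true;
    default:
      return false;
  }
}

// True for tensors but not for other composite types.
inline bool IsTensor(const VarType& t) {
  return t.type == Type::LOD_TENSOR || t.type == Type::SELECTED_ROWS;
}

// True for tensors and all other composite types.
inline bool IsComposite(const VarType& t) { return !IsPOD(t); }
```

Because the tensor types are a subset of the composite types here, the nesting is expressed by the predicates rather than by separate enums.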

paddle-bot-old · 2020-05-22T06:40:20Z

Since you haven't replied for a long time, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.

Re-define the type system

e415000

wangkuiyi requested review from abhinavarora, reyoung, jacquesqiao, kavyasrinet, JiayiFeng and chengduoZH February 10, 2018 03:13

jacquesqiao reviewed Feb 10, 2018

wangkuiyi added 2 commits February 10, 2018 13:39

Merge branch 'develop' of http://github.com/paddlepaddle/paddle into …

89a64e0

…type_system

In response to comments from Longfei

1d57804

chengduoZH reviewed Feb 11, 2018

abhinavarora reviewed Feb 11, 2018

kavyasrinet reviewed Feb 11, 2018

JiayiFeng reviewed Feb 11, 2018

reyoung reviewed Feb 12, 2018

kavyasrinet added this to Doing in Re-design the type system of Fluid Feb 12, 2018

abhinavarora mentioned this pull request Feb 12, 2018

Separate VarType from VarDesc in framework.proto and fix all related compiler errors #8414

Merged

kavyasrinet mentioned this pull request Feb 14, 2018

Move POD_Type inside of VarType #8445

Closed

abhinavarora mentioned this pull request Feb 14, 2018

Move DataType enum inside VarType #8447

Merged

This was referenced Feb 22, 2018

Implement the type of data in channel #8284

Closed

Add tuple type #8519

Merged

paddle-bot-old bot closed this May 22, 2020
