Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tensor Structure Definition #1

Closed
tqchen opened this issue Feb 24, 2017 · 23 comments
Closed

Tensor Structure Definition #1

tqchen opened this issue Feb 24, 2017 · 23 comments

Comments

@tqchen
Copy link
Member

tqchen commented Feb 24, 2017

migrated from apache/mxnet#4735

@tqchen
Copy link
Member Author

tqchen commented Feb 24, 2017

I take a stab in making the basic tensor structure here https://github.com/dmlc/dlpack/blob/master/include/dlpack/dlpack.h based on existing discussions

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

I would like to open a vote discussion for current data structure in dlpack.h, please voice your opinion or simply vote by +1 if you agree. Everyone is welcomed to vote and voice their opinion. If there is no objection in the proposal in one week, we will consider it finalized.

@soumith @edgarriba @bhack @piiswrong @Yangqing

@Yangqing
Copy link
Contributor

Thanks @tqchen ! It looks great in general, a few minor comments:

  • What is kCPUPinned? I understand the implication (i.e. cudaMallocHost) but it seems that the implementation won't distinguish kCPU vs kCPUPinned. A bit more documentation would be great.

  • For DLTypeCodeKind, it might be good to explicitly specify the length, like int8_t, int16_t, int32_t and int64_t. Using int and long has been a headache because compilers have different ways of dealing with them, even templating (see https://github.com/caffe2/caffe2/blob/master/cmake/MiscCheck.cmake#L5-L22)

  • For DLContext - did you intend to use enum instead of int?

  • For DLTensor - I like the opaque pointer! This has been something I was very vocal about but not that well accepted by some frameworks.

@bhack
Copy link
Collaborator

bhack commented Feb 27, 2017

/cc @naibaf7

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

@Yangqing One thing that hangs on the DLDataType, is to make it future compatibile with weird types, like int1_t.

But you are right that having such thing makes it a bit hard for templatizing, maybe we can consider use bit packing to pack the code and bits into a single int.

@bhack
Copy link
Collaborator

bhack commented Feb 27, 2017

Taking a look to TF data types seems that bool could be useful. What about quantized?

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

The current data type cane expression quantized data types like int4_t, by specifying bits=4 and type_code=kInt.

I think what @Yangqing mentioned is being able to use type constant as template argument, using a struct kinda of goes against that. Maybe using a bit packed integer(with same layout as struct) helps to resolve that problem

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

@Yangqing Consider open a PR with respect your suggested changes?

@Yangqing
Copy link
Contributor

@tqchen Might be a bit tricky due to my day duties but I'll try at my earliest availability :)

@Yangqing
Copy link
Contributor

RE templating - I was basically promoting the use of explicit int32_t, int64_t instead of int and long, because that makes cross-platform much easier. I don't think having a struct / enum blocks it and I am not against it :)

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

Agree with explicit sized types.

@bhack
Copy link
Collaborator

bhack commented Feb 27, 2017

Yes. Also if 4bit integer as I remember it is not covered by fixed width integer types right?

@Yangqing
Copy link
Contributor

@bhack for anything under 8 bit we probably need to be creative.

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

Normally you do not want to have a array with int4, instead it will be something like int4x32 (a vector type with 32 4bit integer), so operations are vectorized. With the existing type system, it is

type_code=int, bits=4, lanes=32

@bhack
Copy link
Collaborator

bhack commented Feb 27, 2017

And for larger registers?

@bhack
Copy link
Collaborator

bhack commented Feb 27, 2017

We will handle this with lanes?

@tqchen
Copy link
Member Author

tqchen commented Feb 27, 2017

The tensor structure does not define the register types, it only defines the flags for the data types. I suppose the operator implementation should deal with vectorization and we know before hand that the vector type can be loaded safely from the storage without worrying about alignment, e.g. opencl have type float4.

@bhack
Copy link
Collaborator

bhack commented Feb 27, 2017

Ok it is only flag releated.

@Yangqing Yangqing mentioned this issue Feb 28, 2017
@Yangqing
Copy link
Contributor

/cc @bwasti FYI. What if we have the use case of int5 or int7? :)

(Not saying that we will need those in actual computation unless we have specialized hardware, but might be interesting in serialization if we ever need it.)

@tqchen
Copy link
Member Author

tqchen commented Feb 28, 2017

The usecase of int5 or int7 could be a good reason to keep bit encoded in type codes, but maybe only defined common enums to let user choose from.

@tqchen
Copy link
Member Author

tqchen commented Mar 11, 2017

I am going to close the vote in two days, given it is two weeks since the last change and we will view the data structure as pinned.

@bhack
Copy link
Collaborator

bhack commented Mar 11, 2017

Ok

@Yangqing
Copy link
Contributor

sgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants