tensor draft for review #2645

JiayiFeng · 2017-06-28T07:38:25Z

This is only a draft of the tensor implementation for review, and it has not been fully tested.
So DO NOT MERGE THIS PR.
Comments are welcome if you have any ideas or suggestions for my code.

In addition, during my coding work I came up with several other questions that need to be discussed:

Do we need to rename functions from Majel (such as member functions of DDim and Place) to follow the Google C++ style?
In the ported Majel code we use glog for assertions, while in the rest of the refactored Paddle we plan to use our own macros (paddle/platform/assert.h). Are we going to unify them?
Stride is removed in current tensor design. Is that going to be a problem?

reyoung · 2017-06-28T08:28:49Z

paddle/framework/tensor.cc

+  return static_cast<const T*>(holder->Ptr());
+}
+
+bool Tensor::NeedReset() {


NeedReset() should be declared const.

pkuyym · 2017-06-28T08:38:11Z

paddle/framework/tensor.cc

+
+template <typename T>
+void CopyFrom(const Tensor& src) {
+  if ((void*)&src == (void*)this) {


Maybe more checks are needed? Does CopyFrom always succeed?

There are some checks in Tensor::Data(). I can't see any other potential problems for now. Do you have any ideas?

qingqing01 · 2017-06-28T08:19:07Z

paddle/framework/tensor.cc

+  T* src_ptr = src.Data<T>();
+  T* dst_ptr = MutableData<T>(src.Dims());
+  for (int i = 0; i < len; ++i) {
+    dst_ptr[i] = src_ptr[i];


This is not correct for tensor with GPUPlace.

Yes, it's just for CPU at present. GPU part will be implemented later.

jacquesqiao · 2017-06-28T08:59:47Z

paddle/framework/tensor.cc

+
+void Tensor::Resize(const DDim& dims) {
+  dims_ = dims;
+  return;


Why add so many return; statements at the end of void functions?

It's just a personal habit: I'm used to adding return; to mark every return point of a void function.
Of course, they can be removed if they break the common code style or might cause confusion.

Please strictly follow a common style as we work as a team. Let's remove all unnecessary code.

jacquesqiao · 2017-06-28T09:04:28Z

paddle/framework/tensor.cc

+  if (product(dims) != product(dims_)) {
+    // TODO: error: "Reshape() can not change tensor's numel".
+  }
+  _dims = dim;


_dims should be dims_.
What is the difference between Resize and Reshape? Is it just a checking step?

Resize() can change the tensor's numel, so the data block might be re-allocated, while Reshape() never changes numel, so all data is guaranteed to be retained. When users invoke Resize() or Reshape(), they have different expectations about how we will handle their data (retain or discard).

However, from the implementation perspective, the difference is really just a checking step.
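The difference described above (Reshape() is Resize() plus a numel check) can be sketched as follows. This is a minimal illustration rather than the PR's code: Dims is a hypothetical stand-in for DDim, ShapeOnlyTensor tracks only the shape, and an exception stands in for the TODO error message.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for DDim: a list of dimension sizes.
using Dims = std::vector<int64_t>;

// Number of elements implied by a shape (plays the role of product(dims)).
inline int64_t Product(const Dims& dims) {
  return std::accumulate(dims.begin(), dims.end(), int64_t{1},
                         std::multiplies<int64_t>());
}

class ShapeOnlyTensor {
 public:
  explicit ShapeOnlyTensor(Dims dims) : dims_(std::move(dims)) {}

  // Any new shape is accepted. numel may change, so a real tensor might
  // have to re-allocate its block later and discard the old data.
  void Resize(const Dims& dims) { dims_ = dims; }

  // Same as Resize() plus one check: numel must not change, so the
  // existing data is guaranteed to be retained.
  void Reshape(const Dims& dims) {
    if (Product(dims) != Product(dims_)) {
      throw std::invalid_argument("Reshape() can not change tensor's numel");
    }
    dims_ = dims;
  }

  const Dims& dims() const { return dims_; }
  int64_t numel() const { return Product(dims_); }

 private:
  Dims dims_;
};
```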

reyoung · 2017-06-28T09:10:12Z

paddle/framework/tensor.cc

+namespace framework {
+
+template <typename T>
+const T* Tensor::Data() const {


Templates should be implemented in the header.

reyoung · 2017-06-28T09:14:22Z

paddle/framework/tensor.cc

+
+void Tensor::Reshape(const DDim& dims) {
+  if (product(dims) != product(dims_)) {
+    // TODO: error: "Reshape() can not change tensor's numel".


Why can't Reshape() change the tensor's numel?

In each mini-batch of training, the output of each Op needs to be reshaped, because the shape of the input tensor may change.

Users can use Resize() to change the tensor's numel.

reyoung · 2017-06-28T09:24:28Z

paddle/framework/tensor.h

+          size_(size) {}
+
+    virtual std::type_info TypeInfo() const { return typeid(T); }
+    virtual void* Ptr() const { return static_cast<void*>(ptr_.get()); }


Maybe Ptr() should not be part of the interface. It should return T*, like:

    struct Placeholder {
      virtual ~Placeholder() {}  // for rtti
    };

    template <typename T>
    struct PlaceholderImpl : public Placeholder {
      T* Ptr() const { return ptr_.get(); }
    };

    template <typename T>
    const T* Data() const {
      auto holder = std::dynamic_pointer_cast<PlaceholderImpl<T>>(holder_);
      ASSERT(holder != nullptr);
      return holder->Ptr();
    }

Using static_cast everywhere prevents the compiler from checking types for us.

See PR #2647
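A runnable sketch of the suggestion above, assuming a typed PlaceholderImpl<T> that owns a value-initialized array. TypedTensor and its mutable_data(size_t) are hypothetical names, and data<T>() returns nullptr instead of calling ASSERT so that a mismatch is observable. Because std::dynamic_pointer_cast checks the dynamic type, asking for the wrong T fails instead of silently reinterpreting the memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

class TypedTensor {
 public:
  // Hypothetical allocator: replaces the block with n zeroed elements of T.
  template <typename T>
  T* mutable_data(size_t n) {
    auto impl = std::make_shared<PlaceholderImpl<T>>(n);
    holder_ = impl;
    return impl->Ptr();
  }

  // Type-checked access: nullptr if the block was not allocated as T.
  template <typename T>
  const T* data() const {
    auto holder = std::dynamic_pointer_cast<PlaceholderImpl<T>>(holder_);
    return holder ? holder->Ptr() : nullptr;
  }

 private:
  struct Placeholder {
    virtual ~Placeholder() = default;  // polymorphic base enables RTTI
  };

  template <typename T>
  struct PlaceholderImpl : public Placeholder {
    explicit PlaceholderImpl(size_t n) : ptr_(new T[n]()) {}
    T* Ptr() const { return ptr_.get(); }
    std::unique_ptr<T[]> ptr_;  // owns the typed memory block
  };

  std::shared_ptr<Placeholder> holder_;
};
```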

Xreki · 2017-06-28T09:37:43Z

paddle/framework/tensor.cc

+const T* Tensor::Data() const {
+  PADDLE_ASSERT(holder_ != nullptr);
+  PADDLE_ASSERT(holder_->Place() == place_);
+  PADDLE_ASSERT(holder_->Size() >= product(dims_) * sizeof(T));


Do we need to check the type of holder_ here?

It is worth discussing, I think.

Xreki · 2017-06-28T09:43:15Z

paddle/framework/tensor.cc

+
+bool Tensor::NeedReset() {
+  return (holder_ == nullptr || holder_->Place() != place_ ||
+          holder_->Size() < product(dims_) * sizeof(T));


There is no T here. Does this need to be a template? And again, do we need to check the type of holder_ here?

QiJune · 2017-06-28T10:06:27Z

paddle/framework/tensor.cc

+
+void Tensor::ShareData(const Tensor& src) {
+  if (src.NeedReset()) {
+    // TODO: error: "Src tensor need to be reseted before calling ShareData()".


What's the meaning of reset? Does it actually mean resize?

NeedReset means the underlying memory block needs to be re-allocated.
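A sketch of NeedReset() as a const predicate with the three conditions from the draft. It also bears on the question about T: the required byte count depends on sizeof(T), so the check remains a template. Place, MemoryBlock, LazyTensor and Allocate() are hypothetical; MemoryBlock only records the place and size of a pretend allocation and holds no real memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

enum class Place { kCPU, kGPU };  // hypothetical stand-in for platform::Place

struct MemoryBlock {
  Place place;
  size_t size;  // size of the block in bytes
};

class LazyTensor {
 public:
  LazyTensor(size_t numel, Place place) : numel_(numel), place_(place) {}

  // const: only inspects state. True when the block must be (re)allocated:
  // nothing allocated yet, the block is on another place, or it is too small.
  template <typename T>
  bool NeedReset() const {
    return holder_ == nullptr || holder_->place != place_ ||
           holder_->size < numel_ * sizeof(T);
  }

  // Records a pretend allocation of `bytes` bytes on `place`.
  void Allocate(size_t bytes, Place place) {
    holder_ = std::make_shared<MemoryBlock>(MemoryBlock{place, bytes});
  }

 private:
  size_t numel_;
  Place place_;
  std::shared_ptr<MemoryBlock> holder_;
};
```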

hedaoyuan · 2017-06-28T12:38:22Z

paddle/framework/tensor.cc

+  return holder_;
+}
+
+const DDim& Tensor::Dims() const { return dims_; }


Put these simple functions into the header file so that the compiler can inline them.

hedaoyuan · 2017-06-28T12:47:48Z

paddle/framework/tensor.h

+    int len = product(src.Dims());
+    T* src_ptr = src.Data<T>();
+    T* dst_ptr = MutableData<T>(src.Dims());
+    for (int i = 0; i < len; ++i) {


Why not use memcpy?
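For the CPU-only case, a single std::memcpy can replace the element-by-element loop when the element type is trivially copyable. CopyElements is a hypothetical helper; it takes the source as const T*, since Data<T>() returns const T* in this draft. A GPU tensor would need cudaMemcpy with the appropriate copy direction instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical CPU-only copy of numel elements from src to dst.
template <typename T>
void CopyElements(const T* src, T* dst, size_t numel) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  // memcpy on overlapping memory is undefined, so a self-copy is skipped.
  if (numel == 0 || src == dst) return;
  std::memcpy(dst, src, numel * sizeof(T));
}
```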

wangkuiyi

I am not sure we need so many methods defined here. By Occam's Razor, I'd delete those that are not mandatory.

wangkuiyi · 2017-06-28T20:24:33Z

paddle/framework/tensor.cc

+
+void Tensor::Resize(const DDim& dims) {
+  dims_ = dims;
+  return;


Please strictly follow a common style as we work as a team. Let's remove all unnecessary code.

wangkuiyi · 2017-06-28T20:25:13Z

paddle/framework/tensor.cc

+
+int Tensor::Numel() const { return product(dims_); }
+
+void Tensor::Resize(const DDim& dims) {


In my design, I don't see that we need Resize. Indeed, I followed @qingqing01 's suggestion and removed the ability of the constructor to set the size.

Do you mean that if users want to change the tensor's size, they can invoke mutable_data(DDim) directly?

wangkuiyi · 2017-06-28T20:25:57Z

paddle/framework/tensor.cc

+  return;
+}
+
+const std::shared_ptr<Tensor::Placeholder>& Tensor::Holder() const {


I think that users don't need to and shouldn't know the concept "holder", which is defined as a private type inside Tensor in my design.

wangkuiyi · 2017-06-28T20:27:12Z

paddle/framework/tensor.h

+
+  // must be POD types
+  template <typename T, typename = std::enable_if<std::is_pod<T>::value>::type>
+  T* MutableData() {


Please follow Google C++ style guide carefully -- there is a reason there of using the name mutable_data.

Accessors and mutators (get and set functions) may be named like variables. These often correspond to actual member variables, but this is not required.

Yes, it's OK to use mutable_data().

By the way, do we need to rename functions in DDim and Place to follow Google C++ style?

I think it is a good idea.
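A sketch combining the Google-style name with the POD restriction and lazy allocation; calling mutable_data<T>(dims) again with a new shape is also how the tensor would be resized. SketchTensor, Dims and IsPod are hypothetical, and the block is an untyped std::shared_ptr<void>. Note that the draft's enable_if default argument needs typename before std::enable_if<...>::type in pre-C++20 code, because ::type depends on T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <new>
#include <numeric>
#include <type_traits>
#include <vector>

using Dims = std::vector<int64_t>;  // hypothetical stand-in for DDim

// Same test as std::is_pod, which is deprecated in C++20.
template <typename T>
struct IsPod
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                       std::is_standard_layout<T>::value> {};

class SketchTensor {
 public:
  // `typename` is needed (pre-C++20) because ::type is a dependent name.
  template <typename T,
            typename = typename std::enable_if<IsPod<T>::value>::type>
  T* mutable_data(const Dims& dims) {
    dims_ = dims;
    size_t bytes = static_cast<size_t>(numel()) * sizeof(T);
    if (holder_ == nullptr || capacity_ < bytes) {
      // Lazy allocation: get a new block only when the current one is too small.
      holder_ = std::shared_ptr<void>(::operator new(bytes),
                                      [](void* p) { ::operator delete(p); });
      capacity_ = bytes;
    }
    return static_cast<T*>(holder_.get());
  }

  int64_t numel() const {
    return std::accumulate(dims_.begin(), dims_.end(), int64_t{1},
                           std::multiplies<int64_t>());
  }

 private:
  Dims dims_;
  std::shared_ptr<void> holder_;  // untyped memory block
  size_t capacity_ = 0;           // bytes currently held by holder_
};
```

Reshaping to a shape with the same number of bytes reuses the existing block, so the data is retained; a larger request allocates a new block.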

wangkuiyi · 2017-06-28T20:27:30Z

paddle/framework/tensor.h

+  }
+
+  template <typename T>
+  void CopyFrom(const Tensor& src) {


Why do we need CopyFrom?

We may need CopyFrom when the src Tensor is on GPU. We may need to copy GPU Tensor to CPU and do something such as writing to a file.

I agree with @Xreki .

It sounds like what we need is a Serialize method that fills in a protobuf message, instead of Copy or CopyFrom?

    void Tensor::Serialize(TensorProto* proto) {
      if (is_gpu_place(Place())) {
        Tensor cpu_tensor;
        cudaMemcpy(cpu_tensor.mutable_data(Size(), CPUPlace()), data(), Size(),
                   cudaMemcpyDeviceToHost);
        cpu_tensor.Serialize(proto);
      } else {
        // fill in proto
      }
    }

Xreki · 2017-06-29T09:21:53Z

paddle/framework/tensor.h

+  }
+
+  template <typename T>
+  bool NeedReset() const {


I think @hedaoyuan 's comment means these functions can be inlined by using the inline keyword.

I think functions defined inside the class body are implicitly inline, even without the inline keyword.

wangkuiyi

This PR started with many features not necessary or shouldn't be there (partially due to a misunderstanding of the lazy memory allocation described in https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/tensor.md).

As the review went on, it got closer and closer to #2611.

I'd suggest that we close this PR and merge #2611. After that, we can change it to use shared_ptr rather than unique_ptr and support shared data.
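A sketch of that follow-up: a std::shared_ptr holder can be copied (a std::unique_ptr cannot), so two tensors can refer to the same block, which is freed when the last one goes away. SharedTensor and ShareDataWith() are hypothetical names, and the element type is fixed to float to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

class SharedTensor {
 public:
  // Allocates a zeroed block only if none exists or it is too small.
  float* mutable_data(size_t numel) {
    if (holder_ == nullptr || numel_ < numel) {
      holder_ = std::shared_ptr<float>(new float[numel](),
                                       std::default_delete<float[]>());
      numel_ = numel;
    }
    return holder_.get();
  }

  // Both tensors now point at the same block; no data is copied.
  void ShareDataWith(const SharedTensor& src) {
    holder_ = src.holder_;
    numel_ = src.numel_;
  }

  const float* data() const { return holder_.get(); }
  long use_count() const { return holder_.use_count(); }

 private:
  std::shared_ptr<float> holder_;
  size_t numel_ = 0;
};
```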

wangkuiyi · 2017-06-30T19:57:15Z

paddle/framework/tensor.h

+  using paddle::platform::get_place;
+
+ public:
+  explicit Tensor(DDim dims) : dims_(dims), place_(get_place()) {}


Please be aware that I followed @qingqing01 's suggestion 3 days ago #2611 (review) and removed Tensor::Tensor, Tensor::place_, and Tensor::dims_ in my PR #2611 so as to make the syntax concise and consistent.

wangkuiyi · 2017-06-30T19:57:32Z

paddle/framework/tensor.h

+  }
+
+  template <typename T>
+  bool NeedReset() const {


NeedReset is not something that should be exposed.

wangkuiyi · 2017-06-30T19:57:59Z

paddle/framework/tensor.h

+  }
+
+  // must be POD types
+  template <typename T, typename = std::enable_if<std::is_pod<T>::value>::type>


Also, I followed @qingqing01 's suggestion and removed multiple signatures of mutable_data.

wangkuiyi · 2017-06-30T19:58:22Z

paddle/framework/tensor.h

+    return mutable_data<T>();
+  }
+
+  int Rank() const { return arity(dims_); }


This should be removed following @qingqing01 's suggestion.

wangkuiyi · 2017-06-30T19:58:37Z

paddle/framework/tensor.h

+
+  int Rank() const { return arity(dims_); }
+
+  int Numel() const { return product(dims_); }


This should be removed too.

wangkuiyi · 2017-06-30T19:59:11Z

paddle/framework/tensor.h

+
+  int Numel() const { return product(dims_); }
+
+  void Reshape(const DDim& dims) {


There is no need for Reshape, because we can simply call mutable_data(new_place, new_dim).

wangkuiyi · 2017-06-30T20:01:22Z

paddle/framework/tensor.h

+  const paddle::platform::Place& Place() const { return place_; }
+
+  template <typename T>
+  bool IsType() const {


IsType should be removed because we can interpret a Tensor of any type -- just call mutable_data<T>(place, dim).

wangkuiyi · 2017-06-30T20:01:51Z

paddle/framework/tensor.h

+    size_t size_;  // size of the memory block.
+  };
+
+  std::shared_ptr<Placeholder> holder_;  // holds the memory block if allocated.


Following @qingqing01 's suggestion, dims_ and place_ should be removed.

JiayiFeng · 2017-07-01T02:30:58Z

I agree to close this PR and merge #2611.

wangkuiyi · 2017-07-03T00:47:57Z

@Canpio Could you please approve PR #2611 ? Thanks.

JiayiFeng · 2017-07-03T02:22:59Z

see #2611

tensor draft for review

8912719

JiayiFeng requested review from reyoung, wangkuiyi, pkuyym, luotao1, qingqing01, Xreki, hedaoyuan and QiJune June 28, 2017 07:42

JiayiFeng mentioned this pull request Jun 28, 2017

Add tensor.h #2611

Merged

reyoung reviewed Jun 28, 2017

pkuyym reviewed Jun 28, 2017

qingqing01 reviewed Jun 28, 2017

jacquesqiao reviewed Jun 28, 2017

fix some mistyping

271ee6d

reyoung reviewed Jun 28, 2017

Xreki reviewed Jun 28, 2017

QiJune reviewed Jun 28, 2017

move all template implementations to tensor.h

db4347e

hedaoyuan reviewed Jun 28, 2017

wangkuiyi reviewed Jun 28, 2017

JiayiFeng and others added 3 commits June 29, 2017 15:25

move all tensor implementation into .h file

b73b8a0

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

ebf06ad

… feature/tensor

Replace product(src.Dims()) by Numel() in tensor.

67fe709

Xreki reviewed Jun 29, 2017

jacquesqiao added this to Doing in PaddlePaddle Refactoring: Phase 1 Jun 29, 2017

JiayiFeng added 2 commits June 29, 2017 19:42

remove CopyFrom() and Resize()

d73ee9e

Merge branch 'tensor_draft' of https://github.com/Canpio/Paddle into …

6366d12

…tensor_draft

wangkuiyi requested changes Jun 30, 2017

JiayiFeng closed this Jul 3, 2017

JiayiFeng deleted the tensor_draft branch August 1, 2017 18:55

reyoung moved this from Doing to Done in PaddlePaddle Refactoring: Phase 1 Aug 2, 2017

