Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go optimizer: integrate Go with optimizer library #2610

Merged
merged 28 commits into from
Jul 3, 2017

Conversation

dzhwinter
Copy link
Contributor

fix #2516
related #2560

// TODO(zhihong): move compile flags to cmake go_library
#cgo pkg-config: protobuf
#cgo CFLAGS: -I ../../
#cgo LDFLAGS: /Users/dzh/.go/src/github.com/PaddlePaddle/Paddle/build/go/pserver/cclient/libpaddle_go_optimizer.a -lstdc++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/Users/dzh/.go/src/github.com need to be a relative path: #cgo LDFLAGS: -L${SRCDIR}/xxx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cgo cannot get cmake predefined variable here. use relative path here
fix done.

c := paramWithConfigs.Config
var cbuffer unsafe.Pointer
cbuffer = unsafe.Pointer(&p.Content[0])
o.opt = C.paddle_create_optimizer((*C.uchar)(&c[0]), C.int(len(c)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&p.Content[0] is a pointer allocated in Go's stack, which is owned by Go. The optimizer will own the parameter buffer. So we need to either make a copy of &p.Content[0] here, or inside the implementation of paddle_element_type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix done.

if len(p.Content) != len(g.Content) {
return fmt.Errorf("Name: %s, parameter and gradient length not match, parameter: %d, gradient: %d", p.Name, len(p.Content), len(g.Content))
func (o *optimizer) GetWeights(p *Parameter) error {
// FIXME: get weigths from optimizer has bug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimizer is working before, we should not check in code that has bug and make optimizer not working. I think we need to debug the bug before merge in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

r := C.paddle_update_parameter(o.opt, unsafe.Pointer(&p.Content[0]), C.paddle_element_type(p.ElementType), unsafe.Pointer(&g.Content[0]), C.int(len(g.Content)))
// FIXME: do we need a copy? discard g.Content by GC ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to make a copy of &g.Content[0], since C.paddle_update_parameter will not use &g.Content[0] anymore after return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

opt *optimizer
paramMap map[string]Parameter
mu sync.Mutex
// injection from parameter to optimizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand the comment here, is this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

func (o *optimizer) UpdateParameter(p Parameter, g Gradient) error {
if len(p.Content) != len(g.Content) {
return fmt.Errorf("Name: %s, parameter and gradient length not match, parameter: %d, gradient: %d", p.Name, len(p.Content), len(g.Content))
func (o *optimizer) GetWeights(p *Parameter) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe func (o *optimizer) GetWeights() []byte is easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix done

ElementType: Int32,
}
p.Content = []byte{1, 3}
config, err := ioutil.ReadFile("./cclient/test/testdata/optimizer.pb.txt")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test file is not generated nor pushed to git.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pushed under the testdata folder. here it is
https://github.com/dzhwinter/Paddle/blob/242df4f07bda6de94399cc60b31cb22f27a94434/go/pserver/cclient/test/testdata/optimizer.pb.txt
Binary file can not shown in PR reviews. : )

var nullPtr = unsafe.Pointer(uintptr(0))

type optimizer struct {
opt *C.struct_paddle_optimizer
// used in GetParam, reconstruct Parameter from optimizer
ElementType ElementType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ElementType ElementType -> elementType ElementType. We should not expose anything more than necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

var nullPtr = unsafe.Pointer(uintptr(0))

type optimizer struct {
opt *C.struct_paddle_optimizer
// used in GetParam, reconstruct Parameter from optimizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the code ElementType ElementType is already self documenting, if user want to know where ElementType is used, he can just search through the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

r := C.paddle_update_parameter(o.opt, unsafe.Pointer(&p.Content[0]), C.paddle_element_type(p.ElementType), unsafe.Pointer(&g.Content[0]), C.int(len(g.Content)))
fmt.Println(g)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is not necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

cbuffer_len := int(unsafe.Sizeof(p.Content[0])) * len(p.Content)
cbuffer = C.malloc(C.size_t(cbuffer_len))
C.memcpy(cbuffer, unsafe.Pointer(&p.Content[0]), C.size_t(cbuffer_len))
o.opt = C.paddle_create_optimizer((*C.uchar)(&c[0]), C.int(len(c)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a functionfunc (o *optimizer) Release() that releases the C++ optimizer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have a funcition CleanUp

  76func (o *optimizer) Cleanup() {
  77  if unsafe.Pointer(o.opt) != nullPtr {
  78    C.paddle_release_optimizer(o.opt)
  79    o.opt = (*C.struct_paddle_optimizer)(nullPtr)
  80  }
  81}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! My bad!

p := paramWithConfigs.Param
c := paramWithConfigs.Config
var cbuffer unsafe.Pointer
cbuffer_len := int(unsafe.Sizeof(p.Content[0])) * len(p.Content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's a good idea to log here for Param.ElementType, len(Param.Content), len(Config.Content).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea. Done.

C.memcpy(cbuffer, unsafe.Pointer(&p.Content[0]), C.size_t(cbuffer_len))
o.opt = C.paddle_create_optimizer((*C.uchar)(&c[0]), C.int(len(c)),
C.paddle_element_type(p.ElementType), cbuffer, C.int(len(p.Content)),
(*C.char)(nullPtr), 0)
Copy link
Contributor

@helinwang helinwang Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought CleanUp is enough

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad, I thought (*C.char)(nullPtr), 0 is for parameter config, but it's actually for recover state, which is not necessary here.

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM++ Maybe need to rebase, because there are merge conflict.

// #cgo pkg-config: protobuf
// #cgo CFLAGS: -I ../../
// //FIXME: ldflags contain "build" path
// #cgo LDFLAGS: ../../build/go/pserver/cclient/libpaddle_go_optimizer.a -lstdc++
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add -lm after -lstdc++, or build will error at:

# github.com/PaddlePaddle/Paddle/go/pserver
/usr/bin/ld: ../../build/go/pserver/cclient/libpaddle_go_optimizer.a(adam_optimizer.cc.o): undefined reference to symbol 'pow@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libm.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no any part use math library, I cannot reproduce it in my computer and docker environment. Will talk with you offline
anyway, add the flags.

@typhoonzero
Copy link
Contributor

In client.go:SendGrads:

for err := range errCh {
		if err != nil {
			return err
		}

		recv++
		if recv == len(grads) {
			break
		}
	}

should not return err, or the function will stop waiting for other goroutines to finish, causing unexpected errors, and should log the error.

@typhoonzero
Copy link
Contributor

typhoonzero commented Jul 1, 2017

client_test.go create parameter.Content with size n*100, then at optimizer.cc doing Tensor* parameter = new Tensor(reinterpret_cast<float*>(param_buffer), num_bytes);, num_bytes is still the byte array size, but the re-casted data param_buffer will be 4 times less, because float is of 4 bytes.

I tried to use

Tensor* parameter =
      new Tensor(reinterpret_cast<float*>(param_buffer), num_bytes/4);

and tested ok. But better fix should be at optimizer.go:

        C.memcpy(cbuffer, unsafe.Pointer(&p.Content[0]), C.size_t(len(p.Content)))
	o.opt = C.paddle_create_optimizer((*C.uchar)(&c[0]), C.int(len(c)),
		C.paddle_element_type(p.ElementType), cbuffer, C.int(len(p.Content)),
		(*C.char)(nullPtr), 0)

to get proper size from the element type, instead of C.int(len(p.Content))

@dzhwinter
Copy link
Contributor Author

Thanks a lot! fix it @typhoonzero

@dzhwinter dzhwinter merged commit f448edf into PaddlePaddle:develop Jul 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate new Pserver C optimizer lib with Go.
3 participants