This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Bug of group2ctx? Wrong device placement? #7934

Open
x10000year opened this issue Sep 18, 2017 · 8 comments
Labels
Symbol v1.x Targeting v1.x branch

Comments

@x10000year

x10000year commented Sep 18, 2017

For the following code:

import mxnet as mx

# MyOp is a custom C++ operator with no inputs (described below).
x = mx.symbol.MyOp()
exe = x.bind(mx.gpu(), {}, group2ctx={"a": mx.cpu(), "b": mx.gpu()})
exe.forward()

where MyOp is a custom operator written in C++ that prints "CPU" when it runs in a CPU context and "GPU" when it runs in a GPU context. MyOp has no inputs.

I don't use mx.AttrScope to specify a group for x, so the default context should be used for x. However, the code above prints "CPU", which means that x runs in the CPU context. Why?

If I set group2ctx={"b": mx.cpu(), "a": mx.gpu()}, it prints "GPU".

After more tests, I found that x is placed on the context of the group whose name comes first alphabetically. Very strange.

Is this a bug? How does device placement work?

@formath
Contributor

formath commented Sep 20, 2017

During PlaceDevice, if an op can't be assigned a device id, its device id is set to 0. The default context is only assigned after PlaceDevice, so it has no effect on such an op. Which context device id 0 refers to depends on group2ctx: it is the index obtained when traversing a map, so the order is alphabetical. This looks like a bug. @tqchen @piiswrong
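The behavior described in this comment can be mimicked without MXNet. The following is a minimal sketch, assuming that groups are numbered in sorted key order (as when iterating an ordered map) and that a node without a group falls back to device id 0. `place_devices` and its arguments are hypothetical names chosen for illustration, not MXNet functions, and contexts are plain strings.

```python
def place_devices(node_groups, group2ctx):
    """Toy model of the placement described above (not MXNet's actual code).

    node_groups: dict mapping node name -> ctx group name, or None if the
                 node was created without any ctx group.
    group2ctx:   dict mapping ctx group name -> context label (a string here).
    """
    # Assumption: groups are numbered in sorted (alphabetical) key order,
    # mirroring iteration over an ordered map.
    ordered_groups = sorted(group2ctx)
    device_of_group = {g: i for i, g in enumerate(ordered_groups)}
    contexts = [group2ctx[g] for g in ordered_groups]

    placement = {}
    for node, group in node_groups.items():
        # A node without a known group cannot be assigned, so it gets id 0.
        device_id = device_of_group.get(group, 0)
        placement[node] = contexts[device_id]
    return placement


# x has no group, so it lands on whichever group sorts first.
print(place_devices({"x": None}, {"a": "cpu", "b": "gpu"}))  # {'x': 'cpu'}
print(place_devices({"x": None}, {"b": "cpu", "a": "gpu"}))  # {'x': 'gpu'}
```

Under these assumptions the sketch reproduces both observations from the original report: swapping which group name maps to the CPU flips where the ungrouped node ends up.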

@x10000year
Author

@formath What do you mean by "if an op can't be assigned a device id"? How is an op's device id assigned?

@formath
Contributor

formath commented Sep 20, 2017

It depends on the device id of the op's first control node or input node. Your op may not get a device id assigned this way.
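To make the inheritance rule concrete, here is a simplified single-pass sketch. It assumes nodes are visited in topological order and that an ungrouped node takes the device of its first input that already has one; control dependencies and any backward propagation are left out. `propagate_devices` is a hypothetical helper, not part of MXNet.

```python
def propagate_devices(nodes):
    """Toy single-pass device propagation (an illustration, not MXNet's code).

    nodes: list of (name, group_or_None, input_names) in topological order.
    Returns a dict mapping node name -> group name, or None if unassigned.
    """
    device = {}
    for name, group, inputs in nodes:
        if group is not None:
            # The node was explicitly given a ctx group.
            device[name] = group
            continue
        # Otherwise inherit from the first input that already has a device.
        device[name] = next(
            (device[i] for i in inputs if device.get(i) is not None), None
        )
    return device


graph = [
    ("b", "cpu_group", []),   # explicitly grouped
    ("a", None, ["b"]),       # ungrouped, has input b
    ("x", None, []),          # ungrouped, no inputs (like MyOp)
]
print(propagate_devices(graph))
```

In this model `a` inherits `cpu_group` from `b`, while `x` has nothing to inherit from and stays unassigned, which is the case where the fallback to device id 0 would apply.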

@x10000year
Author

@formath So if a symbol a has input b, and b is assigned to the CPU device, then a will also be automatically assigned to the CPU device regardless of the default context? Why is it designed this way? This is surprising, and I could not find any documentation for it.

@formath
Contributor

formath commented Sep 20, 2017

exe = x.bind(mx.gpu(), {}, group2ctx={"a": mx.cpu(), "b": mx.gpu()})

The default context only indicates that the executor runs on that context; it does not determine the context of the individual nodes or ops in the symbol.

@x10000year
Author

@formath Is the executor just a scheduler? What does "the executor runs on this context" mean? What does the executor itself run? Parameter updates?

@szha
Member

szha commented Dec 22, 2017

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

@kalyc
Contributor

kalyc commented Jun 12, 2018

@x10000year Thanks for submitting the issue - were you able to resolve it?

@szha szha added Symbol v1.x Targeting v1.x branch and removed needs triage labels Jul 16, 2020