Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Format][C++] Recommended/required value for ArrowDeviceArray.device_id int in case of CPU data #40801

Closed
jorisvandenbossche opened this issue Mar 26, 2024 · 7 comments

Comments

@jorisvandenbossche
Copy link
Member

In #40717 (review), @paleolimbot noted that the current implementation of the C Device interface in Arrow C++ is using a device_id of -1 when exporting CPU data (while nanoarrow is using 0).

The spec itself doesn't say anything about this (https://arrow.apache.org/docs/format/CDeviceDataInterface.html#c.ArrowDeviceArray.device_id):

Mandatory. The device id to identify a specific device if multiple devices of this type are on the system. The semantics of the id will be hardware dependent, but we use an int64_t to future-proof the id as devices change over time.

Should we recommend or require a specific value for this case?

I noticed that the DLPack specification, from which the device type/id structure was taken, does specify to use 0 (https://dmlc.github.io/dlpack/latest/c_api.html#c.DLDevice.device_id):

The device index. For vanilla CPU memory, pinned memory, or managed memory, this is set to 0.

Should we adopt the same guideline for our spec?

cc @pitrou @zeroshade @kkraus14

@zeroshade
Copy link
Member

Since the device_id should never be utilized or have any meaning in the case of CPU memory I'm fine with it either being unspecified or being explicitly specified to be 0. In my opinion it's an arbitrary decision / value

@pitrou
Copy link
Member

pitrou commented Mar 26, 2024

Perhaps we could add something like: "If the device type does not have a notion of device id, it is recommended to return 0 as a convention".

@kkraus14
Copy link
Contributor

Keeping aligned with dlpack and using 0 probably makes the most sense but there is some downsides.

If we wanted to optionally denote which NUMA node CPU memory is on for example, being able to use numa node 0 would be good. Using -1 as the default would be more ideal in this situation.

For managed memory, if that memory has been touched on a specific GPU then it would be local to that GPU and while it could be accessed by any device, it would be expensive to do so where it would be ideal to use an actual device id for that. If it hasn't been touched yet then again something like -1 would be ideal to indicate that it's not known if the memory is local anywhere.

@zeroshade
Copy link
Member

Given @kkraus14's comments, I think we'd be better served with recommending -1 then

@raulcd
Copy link
Member

raulcd commented Apr 9, 2024

My understanding is that this is a doc clarification, should we add it for 16.0.0?

@paleolimbot
Copy link
Member

I'll give it a go in the next few minutes!

raulcd pushed a commit that referenced this issue Apr 10, 2024
… C Device data interface (#41101)

### Rationale for this change

It is not explicit what the value of the `ArrowDeviceArray::device_id` should be when a given device type has no notion of a device identifier (e.g., there is always only one).

### What changes are included in this PR?

The text was clarified to recommend a value of -1. This was the value already used by Arrow C++.

### Are these changes tested?

No tests needed (documentation)

### Are there any user-facing changes?

No
* GitHub Issue: #40801

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
@raulcd
Copy link
Member

raulcd commented Apr 10, 2024

Issue resolved by pull request 41101
#41101

@raulcd raulcd closed this as completed Apr 10, 2024
raulcd pushed a commit that referenced this issue Apr 11, 2024
… C Device data interface (#41101)

### Rationale for this change

It is not explicit what the value of the `ArrowDeviceArray::device_id` should be when a given device type has no notion of a device identifier (e.g., there is always only one).

### What changes are included in this PR?

The text was clarified to recommend a value of -1. This was the value already used by Arrow C++.

### Are these changes tested?

No tests needed (documentation)

### Are there any user-facing changes?

No
* GitHub Issue: #40801

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue Apr 15, 2024
… Arrow C Device data interface (apache#41101)

### Rationale for this change

It is not explicit what the value of the `ArrowDeviceArray::device_id` should be when a given device type has no notion of a device identifier (e.g., there is always only one).

### What changes are included in this PR?

The text was clarified to recommend a value of -1. This was the value already used by Arrow C++.

### Are these changes tested?

No tests needed (documentation)

### Are there any user-facing changes?

No
* GitHub Issue: apache#40801

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
… Arrow C Device data interface (apache#41101)

### Rationale for this change

It is not explicit what the value of the `ArrowDeviceArray::device_id` should be when a given device type has no notion of a device identifier (e.g., there is always only one).

### What changes are included in this PR?

The text was clarified to recommend a value of -1. This was the value already used by Arrow C++.

### Are these changes tested?

No tests needed (documentation)

### Are there any user-facing changes?

No
* GitHub Issue: apache#40801

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
… Arrow C Device data interface (apache#41101)

### Rationale for this change

It is not explicit what the value of the `ArrowDeviceArray::device_id` should be when a given device type has no notion of a device identifier (e.g., there is always only one).

### What changes are included in this PR?

The text was clarified to recommend a value of -1. This was the value already used by Arrow C++.

### Are these changes tested?

No tests needed (documentation)

### Are there any user-facing changes?

No
* GitHub Issue: apache#40801

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants