Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] .nbytes should release the GIL while calculating the data size #39096

Closed
hendrikmakait opened this issue Dec 6, 2023 · 1 comment · Fixed by #39097
Closed

[Python] .nbytes should release the GIL while calculating the data size #39096

hendrikmakait opened this issue Dec 6, 2023 · 1 comment · Fixed by #39097

Comments

@hendrikmakait
Copy link
Contributor

hendrikmakait commented Dec 6, 2023

Describe the enhancement requested

The .nbytes holds the GIL while computing the data size in C++, which has caused performance issues in Dask because threads were blocking each other. Am I missing a valid reason for blocking the GIL during the computation? From my understanding, there should be no possible race conditions. Similarly, I noticed that .get_total_buffer_size holds the GIL as well, but I'm not sure if that's a problem because the computation is much cheaper.

Component(s)

Python

@hendrikmakait hendrikmakait changed the title [Python] .nbytes blocks the GIL while calculating the data size [Python] .nbytes should release the GIL while calculating the data size Dec 6, 2023
@jorisvandenbossche
Copy link
Member

No, I don't think there is a specific reason for not releasing the GIL, apart from missing to do that when this was added (or assuming this was cheap, I think initially nbytes returned the cheap number)

jorisvandenbossche pushed a commit that referenced this issue Dec 7, 2023
### Rationale for this change

The `.nbytes` holds the GIL while computing the data size in C++, which has caused performance issues in Dask because threads were blocking each other

See #39096

### Are these changes tested?

I am not sure if additional tests are necessary here. If so, I'm happy to add them but would welcome some pointers.

### Are there any user-facing changes?

No

* Closes: #39096

Authored-by: Hendrik Makait <hendrik@makait.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 15.0.0 milestone Dec 7, 2023
mapleFU pushed a commit to mapleFU/arrow that referenced this issue Dec 13, 2023
### Rationale for this change

The `.nbytes` holds the GIL while computing the data size in C++, which has caused performance issues in Dask because threads were blocking each other

See apache#39096

### Are these changes tested?

I am not sure if additional tests are necessary here. If so, I'm happy to add them but would welcome some pointers.

### Are there any user-facing changes?

No

* Closes: apache#39096

Authored-by: Hendrik Makait <hendrik@makait.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
### Rationale for this change

The `.nbytes` holds the GIL while computing the data size in C++, which has caused performance issues in Dask because threads were blocking each other

See apache#39096

### Are these changes tested?

I am not sure if additional tests are necessary here. If so, I'm happy to add them but would welcome some pointers.

### Are there any user-facing changes?

No

* Closes: apache#39096

Authored-by: Hendrik Makait <hendrik@makait.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants