[Python] Race condition in _pandas_api#is_data_frame #39313
Comments
take
pitrou added a commit that referenced this issue on Jan 9, 2024
…39314)

### Rationale for this change

See:

```
cdef inline bint _have_pandas_internal(self):
    if not self._tried_importing_pandas:
        self._check_import(raise_=False)
    return self._have_pandas
```

The method `_check_import`:

1) sets `_tried_importing_pandas` to true
2) does some things which take time...
3) sets `_have_pandas` to true (if we indeed do have pandas)

Suppose thread 1 calls `_have_pandas_internal`. If thread 1 is at step 2 while thread 2 calls `_have_pandas_internal`, `_have_pandas_internal` may incorrectly return False for thread 2 as thread 1 has set `_tried_importing_pandas` to true, but has not yet (but will) set `_have_pandas` to True. `_have_pandas_internal` will return True for thread 1.

After my fix, `_have_pandas_internal` will not return an incorrect value in the scenario described above. It would instead result in a redundant, but (I believe) harmless, invocation of `_check_import`.

### What changes are included in this PR?

Changes ordering of "trying to import pandas" and "recording that pandas import has been tried"

### Are these changes tested?

yes, see test committed

### Are there any user-facing changes?

This PR resolves a user-facing race condition #39313

* Closes: #39313

Lead-authored-by: Thomas Jarosz <thomas.jarosz@c3.ai>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
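The interleaving described in the commit message can be modelled in plain Python. The following is only a sketch, not pyarrow code: the classes `BuggyApi` and `FixedApi`, the helper `second_caller_result`, and the `_slow_import` stand-in are hypothetical names invented for illustration, and the "fixed" ordering is an assumption based on the PR description (record that the import was tried only after it has finished). Events are used to pause the first caller in the middle of step 2 so the outcome does not depend on timing.

```python
import threading


class BuggyApi:
    """Pure-Python model: records 'tried' BEFORE the slow import finishes."""

    def __init__(self):
        self._tried_importing_pandas = False
        self._have_pandas = False
        self._lock = threading.Lock()
        self._import_started = False
        self.in_slow_step = threading.Event()  # set once the first caller is mid-import
        self.release = threading.Event()       # lets the first caller finish

    def _slow_import(self):
        # Stand-in for the time-consuming import. Only the first caller is
        # paused (a cold import); any later call returns immediately.
        with self._lock:
            first = not self._import_started
            self._import_started = True
        if first:
            self.in_slow_step.set()
            self.release.wait()

    def _check_import(self):
        self._tried_importing_pandas = True  # step 1
        self._slow_import()                  # step 2
        self._have_pandas = True             # step 3

    def have_pandas(self):
        if not self._tried_importing_pandas:
            self._check_import()
        return self._have_pandas


class FixedApi(BuggyApi):
    """Assumed fixed ordering: the 'tried' flag is set last."""

    def _check_import(self):
        self._slow_import()
        self._have_pandas = True
        self._tried_importing_pandas = True


def second_caller_result(api_cls):
    """Return what a second caller sees while the first caller is at step 2."""
    api = api_cls()
    first = threading.Thread(target=api.have_pandas)
    first.start()
    api.in_slow_step.wait()     # the first caller is now inside step 2
    result = api.have_pandas()  # the main thread plays the role of thread 2
    api.release.set()
    first.join()
    return result
```

In this model the second caller gets `False` from `BuggyApi` and `True` from `FixedApi`, at the cost of one redundant `_check_import` call in the fixed variant.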
clayburn pushed a commit to clayburn/arrow that referenced this issue on Jan 23, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue on Feb 19, 2024
raulcd pushed a commit that referenced this issue on Feb 20, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue on Feb 28, 2024
Describe the bug, including details regarding any error messages, version, and platform.
Concurrent invocation of `_pandas_api#is_data_frame` can result in incorrect behavior (returning false when provided a dataframe). This can cause upstream issues when using higher-level public arrow APIs (such as `write_feather`).

I have authored a pytest attached below which reproduces the issue:
Component(s)
Python