Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] "iso_calendar" kernel returns incorrect results for array length > 32 #38655

Closed
mroeschke opened this issue Nov 9, 2023 · 2 comments · Fixed by #39360
Closed

[C++] "iso_calendar" kernel returns incorrect results for array length > 32 #38655

mroeschke opened this issue Nov 9, 2023 · 2 comments · Fixed by #39360
Assignees
Labels
Component: Python Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Milestone

Comments

@mroeschke
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

In [1]: import pyarrow as pa

In [2]: pa.__version__
Out[2]: '14.0.0'

In [3]: import pyarrow.compute as pc

In [4]: from datetime import  datetime

In [5]: arr = pa.array([datetime(2019, 1, 3, 5, 11)]*50)

In [6]: pc.iso_calendar(arr)
Out[6]: 
<pyarrow.lib.StructArray object at 0x1500ff640>
-- is_valid: all not null
-- child 0 type: int64
  [
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    ...
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0
  ]
-- child 1 type: int64
  [
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    2019,
    ...
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0
  ]
-- child 2 type: int64
  [
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    ...
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0
  ]

In [7]: pc.iso_calendar(arr)
zsh: segmentation fault  ipython

Component(s)

Python

@mroeschke mroeschke changed the title pyarrow.compute.iso_calendar returns incorrect results and segfaults for multiple repeated timestamps [Python] pyarrow.compute.iso_calendar returns incorrect results and segfaults for multiple repeated timestamps Nov 9, 2023
@jorisvandenbossche jorisvandenbossche changed the title [Python] pyarrow.compute.iso_calendar returns incorrect results and segfaults for multiple repeated timestamps [C++] "iso_calendar" kernel returns incorrect results for array length > 32 Nov 14, 2023
@jorisvandenbossche
Copy link
Member

Thanks! I can't reproduce the segfault, but it's of course obviously wrong anyway.

It seems the repeated values don't matter, I can also reproduce this with a date range, and it starts to happens once above 32 values:

arr = pa.array(pd.date_range("2019-01-01", periods=33, freq="3H"))
pc.iso_calendar(arr)

cc @rok

@rok
Copy link
Member

rok commented Nov 20, 2023

Oh, that is interesting. I'll take a look this week.

jorisvandenbossche added a commit that referenced this issue Jan 23, 2024
…rray length > 32 (#39360)

### Rationale for this change

When defining `StructArray`'s field builders for `ISOCalendar` we don't pre-allocate memory and then use unsafe append. This causes the resulting array to be at most 32 rows long.

### What changes are included in this PR?

This introduces required memory pre-allocation in the `ISOCalendar` c++ kernel.

### Are these changes tested?

This adds a test for the Python wrapper.

### Are there any user-facing changes?

Fixes the behavior of `iso_calendar` kernel.
* Closes: #38655

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 15.0.1 milestone Jan 23, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
… for array length > 32 (apache#39360)

### Rationale for this change

When defining `StructArray`'s field builders for `ISOCalendar` we don't pre-allocate memory and then use unsafe append. This causes the resulting array to be at most 32 rows long.

### What changes are included in this PR?

This introduces required memory pre-allocation in the `ISOCalendar` c++ kernel.

### Are these changes tested?

This adds a test for the Python wrapper.

### Are there any user-facing changes?

Fixes the behavior of `iso_calendar` kernel.
* Closes: apache#38655

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
raulcd pushed a commit that referenced this issue Feb 20, 2024
…rray length > 32 (#39360)

### Rationale for this change

When defining `StructArray`'s field builders for `ISOCalendar` we don't pre-allocate memory and then use unsafe append. This causes the resulting array to be at most 32 rows long.

### What changes are included in this PR?

This introduces required memory pre-allocation in the `ISOCalendar` c++ kernel.

### Are these changes tested?

This adds a test for the Python wrapper.

### Are there any user-facing changes?

Fixes the behavior of `iso_calendar` kernel.
* Closes: #38655

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@amoeba amoeba added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Feb 28, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
… for array length > 32 (apache#39360)

### Rationale for this change

When defining `StructArray`'s field builders for `ISOCalendar` we don't pre-allocate memory and then use unsafe append. This causes the resulting array to be at most 32 rows long.

### What changes are included in this PR?

This introduces required memory pre-allocation in the `ISOCalendar` c++ kernel.

### Are these changes tested?

This adds a test for the Python wrapper.

### Are there any user-facing changes?

Fixes the behavior of `iso_calendar` kernel.
* Closes: apache#38655

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
… for array length > 32 (apache#39360)

### Rationale for this change

When defining `StructArray`'s field builders for `ISOCalendar` we don't pre-allocate memory and then use unsafe append. This causes the resulting array to be at most 32 rows long.

### What changes are included in this PR?

This introduces required memory pre-allocation in the `ISOCalendar` c++ kernel.

### Are these changes tested?

This adds a test for the Python wrapper.

### Are there any user-facing changes?

Fixes the behavior of `iso_calendar` kernel.
* Closes: apache#38655

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Python Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants