Closes #4984: ak.array with negative numbers still has problems #4985
Conversation
ajpotts
left a comment
I think this is good for now. We need to refactor this `ak.array` function to clean up the logic, but I created a separate issue for that: #4990
jaketrookman
left a comment
Looks good
drculhane
left a comment
I ran the unit tests, and also the one specific example you cited when you created the issue. Looks good. I concur that ak.array has become a bit of a mess, and glad to see that we now have an issue for that, too.
…es (#5044)

While PR #4985 technically fixed the issue of general negative bigint problems, it caused a performance regression. I believe this is primarily due to the overhead of creating the sign array.

Instead, here's basically what happens in the code:

```python
any_neg = np.any(flat < 0)
req_bits: int
if any_neg:
    req_bits = max(flat.max().bit_length(), (-flat.min()).bit_length()) + 1
else:
    req_bits = flat.max().bit_length()
```

Then the code figures out how many times to pull off `uint64` limbs, rather than waiting until everything is zero (which doesn't happen in the negative case; it just stays at -1).

The code branches out to ~~two~~ three separate Chapel functions depending on the case:

- If the input is just an `int64` or `uint64` array, it converts it directly to bigint (similarly for `float64` or some other kind of floating-point input).
- If the input is numpy's version of a bigint array, it goes to the multi-limb version (unfortunately, this is the case even if everything comes out to just one limb, but I think the performance loss here is not that bad).
- If the input data had any negative values, it treats the limbs as signed (all bits are positive and the top bit is negative). However, it's hard to create a bigint like this (AFAIK), so Chapel-side it creates a signs array, strips off the top bit of every limb, and keeps it as a bool to reference later.

Either way, it goes to the "Horner fold" step, which, as ChatGPT tells me, is possibly a faster way to create the bigints. Previously the code was bit-shifting the limbs into the right spot and then adding them to the bigint value. The idea here is that you start with the highest limb of the data, then you bit-shift it and add in the next limb, bit-shift what you have and add in the next limb, and so on. You can read more about the generic version of this [here](https://en.wikipedia.org/wiki/Horner%27s_method) (take x = `2**64`). Then it adds in the sign bit as necessary.
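The Horner fold and the signed-top-bit interpretation described above can be sketched in plain Python (a minimal sketch, not the actual Chapel code; `from_limbs` is a hypothetical name, and limbs are assumed most-significant first):

```python
def from_limbs(limbs, signed=False):
    # Horner fold with x = 2**64: shift the accumulated value left by one
    # limb and OR in the next limb, starting from the most-significant limb.
    value = 0
    for limb in limbs:
        value = (value << 64) | limb
    if signed and limbs and (limbs[0] >> 63) & 1:
        # Two's-complement interpretation: the top bit of the highest limb
        # carries negative weight, so subtract 2**(64 * n_limbs).
        value -= 1 << (64 * len(limbs))
    return value

# 2**100 + 7 splits into two limbs: [2**36, 7]
from_limbs([2**36, 7])                              # unsigned
from_limbs([2**64 - 1, 2**64 - 1], signed=True)     # all-ones limbs read as -1
```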
It also handles the case where `max_bits` is not -1.

Hopefully any loss in performance is made up by a few factors:

1. Previously the Python code was stripping the limbs off by modding by `2**64` and then integer-dividing by the same value. I think it could speed the code up to do a bitmask by `2**64 - 1` and then a bitshift by 64.
2. Supposedly the Horner fold has better performance.
3. If the input is only one limb of `int64` or `uint64` data, it should go to the single-limb version and that should run quicker.

~~This also handles all cases of numeric data to bigint output in a single function, so if the performance is back up, then I can cut out some code in the array function.~~ I went ahead and cut the old bigint code out.

Closes #5043: Investigate performance loss from negative bigint changes
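Point 1 above can be illustrated with a small comparison (a hypothetical sketch, not the PR's actual code; function names are made up). In Python both forms produce identical limbs, including for negatives, since Python ints behave as infinite two's complement under `&` and `>>`:

```python
MASK = 2**64 - 1

def limbs_mod_div(value, n_limbs):
    # Old approach: mod by 2**64, then integer-divide by 2**64.
    out = []
    for _ in range(n_limbs):
        out.append(value % 2**64)
        value //= 2**64
    return out

def limbs_mask_shift(value, n_limbs):
    # Proposed approach: mask off the low 64 bits, then shift right by 64,
    # avoiding two big-integer divisions per limb.
    out = []
    for _ in range(n_limbs):
        out.append(value & MASK)
        value >>= 64
    return out
```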
I mainly was trying to fix issues surrounding things like `ak.array([-2**200])`, but while I was messing around in `ak.array` I noticed a few other things that needed fixing. This is needed for #4593.
Summary of Changes
- Added a `str` check to raise a clear `TypeError` for scalar string inputs.
- Left `np.ndarray` and `pd.Series` inputs intact.
- Fixed `bigint` inference for `object` arrays containing only integers.

Purpose:
Fixes incorrect bigint conversion for large negative numbers and cleans up input normalization and dtype inference logic.
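The "one extra bit for the sign" rule from the snippet earlier in the thread can be checked standalone for the original failing input (a hypothetical illustration, not Arkouda's actual code; `required_bits` is a made-up helper):

```python
import numpy as np

def required_bits(flat):
    # flat: object-dtype numpy array of Python ints.
    if np.any(flat < 0):
        # Negative values need one extra bit so the sign survives
        # two's-complement limb packing.
        return max(int(flat.max()).bit_length(),
                   int(-flat.min()).bit_length()) + 1
    return int(flat.max()).bit_length()

flat = np.array([-2**200, 5], dtype=object)
bits = required_bits(flat)   # 201 magnitude bits + 1 sign bit
limbs = -(-bits // 64)       # ceil division: number of uint64 limbs needed
```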
Closes #4984: ak.array with negative numbers still has problems