[Python][CI] Hypothesis tests are failing (test-conda-python-3.11-hypothesis crossbow build) #40379

Closed
jorisvandenbossche opened this issue Mar 6, 2024 · 1 comment

Comments

@jorisvandenbossche

See e.g. https://github.com/ursacomputing/crossbow/actions/runs/8104553122/job/22151381641

They started failing between 2024-02-28 and 2024-02-29. The first failure (https://github.com/ursacomputing/crossbow/actions/runs/8089011272/job/22104197020) shows this error:

________________________ test_array_to_pylist_roundtrip ________________________

    @h.given(past.all_arrays)
>   def test_array_to_pylist_roundtrip(arr):

opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/test_convert_builtin.py:2209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
opt/conda/envs/arrow/lib/python3.11/site-packages/pyarrow/tests/strategies.py:316: in arrays
    value = st.binary(min_size=ty.byte_width, max_size=ty.byte_width)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: Less than one byte
E   while generating 'arr' from arrays(type=one_of(one_of(one_of(one_of(one_of(one_of(one_of(just(DataType(null)), just(DataType(bool)), one_of(one_of(sampled_from([DataType(int8), DataType(int16), DataType(int32), DataType(int64)]), sampled_from([DataType(uint8), DataType(uint16), DataType(uint32), DataType(uint64)])), sampled_from([DataType(halffloat), DataType(float), DataType(double)]), builds(<cyfunction decimal128 at 0x7f8d784937a0>, precision=integers(min_value=1, max_value=38), scale=integers(min_value=1, max_value=38)), builds(<cyfunction decimal256 at 0x7f8d78493890>, precision=integers(min_value=1, max_value=76), scale=integers(min_value=1, max_value=76))), one_of(sampled_from([DataType(date32[day]), DataType(date64[ms])]), sampled_from([Time32Type(time32[s]), Time32Type(time32[ms]), Time64Type(time64[us]), Time64Type(time64[ns])]), builds(<cyfunction timestamp at 0x7f8d78492e40>, tz=one_of(none(), timezones(), timezones()), unit=sampled_from(['s', 'ms', 'us', 'ns'])), builds(<cyfunction duration at 0x7f8d78493110>, sampled_from(['s', 'ms', 'us', 'ns'])), just(DataType(month_day_nano_interval))), one_of(just(DataType(binary)), just(DataType(string)), just(DataType(large_binary)), just(DataType(large_string)), builds(<cyfunction binary at 0x7f8d78493b60>, integers(min_value=0, max_value=16)))), one_of(one_of(builds(<cyfunction list_ at 0x7f8d78498140>, one_of(just(DataType(null)), just(DataType(bool)), one_of(one_of(sampled_from([DataType(int8), DataType(int16), DataType(int32), DataType(int64)]), sampled_from([DataType(uint8), DataType(uint16), DataType(uint32), DataType(uint64)])), sampled_from([DataType(halffloat), DataType(float), DataType(double)]), builds(<cyfunction decimal128 at 0x7f8d784937a0>, precision=integers(min_value=1, max_value=38), scale=integers(min_value=1, max_value=38)), builds(<cyfunction decimal256 at 0x7f8d78493890>, precision=integers(min_value=1, max_value=76), scale=integers(min_value=1, max_value=76))), 
one_of(sampled_from([DataType(date32[day]), DataType(date64[ms])]), sampled_from([Time32Type(time32[s]), Time32Type(time32[ms]), Time64Type(time64[us]), Time64Type(time64[ns])]), builds(<cyfunction timestamp at 0x7f8d78492e40>, tz=one_of(none(), timezones(), timezones()), unit=sampled_from(['s', 'ms', 'us', 'ns'])), builds(<cyfunction duration at 0x7f8d78493110>, sampled_from(['s', 'ms', 'us', 'ns'])), just(DataType(month_day_nano_interval))), one_of(just(DataType(binary)), just(DataType(string)), just(DataType(large_binary)), just(DataType(large_string)), builds(<cyfunction binary at 0x7f8d78493b60>, integers(min_value=0, max_value=16))))), builds(<cyfunction large_list at 0x7f8d78498230>, one_of(just(DataType(null)), just(DataType(bool)), one_of(one_of(sampled_from([DataType(int8), DataType(int16), DataType(int32), DataType(int64)]), sampled_from([DataType(uint8), DataType(uint16), DataType(uint32), DataType(uint64)])), sampled_from([DataType(halffloat), DataType(float), DataType(double)]), builds(<cyfunction decimal128 at 0x7f8d784937a0>, precision=integers(min_value=1, max_value=38), scale=integers(min_value=1, max_value=38)), builds(<cyfunction decimal256 at 0x7f8d78493890>, precision=integers(min_value=1, max_value=76), scale=integers(min_value=1, max_value=76))), one_of(sampled_from([DataType(date32[day]), DataType(date64[ms])]), sampled_from([Time32Type(time32[s]), Time32Type(time32[ms]), Time64Type(time64[us]), Time64Type(time64[ns])]), builds(<cyfunction timestamp at 0x7f8d78492e40>, tz=one_of(none(), timezones(), timezones()), unit=sampled_from(['s', 'ms', 'us', 'ns'])), builds(<cyfunction duration at 0x7f8d78493110>, sampled_from(['s', 'ms', 'us', 'ns'])), just(DataType(month_day_nano_interval))), one_of(just(DataType(binary)), just(DataType(string)), just(DataType(large_binary)), just(DataType(large_string)), builds(<cyfunction binary at 0x7f8d78493b60>, integers(min_value=0, max_value=16)))))), builds(<cyfunction list_ at 0x7f8d78498140>, 
one_of(just(DataType(null)), just(DataType(bool)), one_of(one_of(sampled_from([DataType(int8), DataType(int16), DataType(int32), DataType(int64)]), sampled_from([DataType(uint8), DataType(uint16), DataType(uint32), DataType(uint64)])), sampled_from([DataType(halffloat), DataType(float), DataType(double)]), builds(<cyfunction decimal128 at 0x7f8d784937a0>, precision=integers(min_value=1, max_value=38), scale=integers(min_value=1, max_value=38)), builds(<cyfunction decimal256 at 0x7f8d78493890>, precision=integers(min_value=1, max_value=76), scale=integers(min_value=1, max_value=76))), one_of(sampled_from([DataType(date32[day]), DataType(date64[ms])]), sampled_from([Time32Type(time32[s]), Time32Type(time32[ms]), Time64Type(time64[us]), Time64Type(time64[ns])]), builds(<cyfunction timestamp at 0x7f8d78492e40>, tz=one_of(none(), timezones(), timezones()), unit=sampled_from(['s', 'ms', 'us', 'ns'])), builds(<cyfunction duration at 0x7f8d78493110>, sampled_from(['s', 'ms', 'us', 'ns'])), just(DataType(month_day_nano_interval))), one_of(just(DataType(binary)), just(DataType(string)), just(DataType(large_binary)), just(DataType(large_string)), builds(<cyfunction binary at 0x7f8d78493b60>, integers(min_value=0, max_value=16)))), integers(min_value=0, max_value=16)))), struct_types()), builds(<cyfunction dictionary at 0x7f8d784985f0>, sampled_from([DataType(int8), DataType(int16), DataType(int32), DataType(int64)]), one_of(just(DataType(bool)), one_of(sampled_from([DataType(int8), DataType(int16), DataType(int32), DataType(int64)]), sampled_from([DataType(uint8), DataType(uint16), DataType(uint32), DataType(uint64)])), sampled_from([DataType(float), DataType(double)]), just(DataType(binary)), just(DataType(string)), builds(<cyfunction binary at 0x7f8d78493b60>, integers(min_value=0, max_value=16))))), map_types()), one_of(one_of(builds(<cyfunction list_ at 0x7f8d78498140>, (deferred@140245131593456)), builds(<cyfunction large_list at 0x7f8d78498230>, 
(deferred@140245131593456))), builds(<cyfunction list_ at 0x7f8d78498140>, (deferred@140245131593456), integers(min_value=0, max_value=16)))), struct_types(item_strategy=(deferred@140245131593456))))

pyarrow/types.pxi:279: ValueError

So that might be related to #39592 (although I don't directly see how).

More failures appeared over the last two days, most likely due to #40160, which edited a hypothesis strategy; we forgot to trigger the hypothesis tests in that PR.
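To make the first failure easier to reason about, here is a minimal sketch of one way a `byte_width` check could produce `ValueError: Less than one byte`. The strategy in the traceback includes `builds(binary, integers(min_value=0, max_value=16))`, so a fixed-size binary type of width 0 can be generated. The sketch assumes (this is a guess, not confirmed by the log) that the check rejected every width below one byte, including zero. `FixedWidthType`, `strict_byte_width` and `relaxed_byte_width` are hypothetical stand-ins for illustration and are not pyarrow code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedWidthType:
    """Hypothetical stand-in for a fixed-width type (NOT pyarrow's DataType)."""

    bit_width: int

    @property
    def strict_byte_width(self) -> int:
        # Assumed shape of the failing check: anything narrower than one
        # byte is rejected, which also rejects a legitimate zero-width type.
        width = self.bit_width // 8
        if width == 0:
            raise ValueError("Less than one byte")
        return width

    @property
    def relaxed_byte_width(self) -> int:
        # Relaxed variant: only reject types that have some bits but not a
        # full byte (e.g. a 1-bit boolean); a zero-width type reports 0.
        width = self.bit_width // 8
        if width == 0 and self.bit_width != 0:
            raise ValueError("Less than one byte")
        return width


# Analogous to a fixed-size binary type of width 0 drawn by the strategy.
zero_width = FixedWidthType(bit_width=0)

try:
    zero_width.strict_byte_width
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(strict_error)
print(zero_width.relaxed_byte_width)
```

Under this assumption, the strict check fails as soon as hypothesis draws width 0, while the relaxed check still rejects genuinely sub-byte types.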

jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Mar 7, 2024
pitrou pushed a commit that referenced this issue Mar 7, 2024
#40381)

### Rationale for this change

Fixing the hypothesis tests:

- fixup untested changes to the strategies from #40160
- fix a bug in the `byte_width` attribute discovered by hypothesis (introduced by #39592)

* GitHub Issue: #40379

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
@pitrou pitrou added this to the 16.0.0 milestone Mar 7, 2024
@pitrou

pitrou commented Mar 7, 2024

Issue resolved by pull request 40381
#40381

@pitrou pitrou closed this as completed Mar 7, 2024
thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
…s tests (apache#40381)

### Rationale for this change

Fixing the hypothesis tests:

- fixup untested changes to the strategies from apache#40160
- fix a bug in the `byte_width` attribute discovered by hypothesis (introduced by apache#39592)

* GitHub Issue: apache#40379

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>