
Clean-up and slightly optimise extra.array_api.ArrayStrategy #3105

Closed
wants to merge 9 commits

Conversation


@honno honno commented Sep 21, 2021

Following up from #3065 where I mentioned cleaning some internal logic up (no public API changes). Personally I find it easier to follow now, although if folk disagree then that'd be good to know!

I also decided not to repeat check_set_value()'s logic on values that have already been seen, since I think we can be fairly confident that values are deterministically promoted once assigned into an array. With numpy.array_api this gives a ~15% performance boost locally when I compared and averaged the results of typical xps.arrays() runs using this scrappy script... note I'm fairly new to benchmarking and Hypothesis is obviously a tricky beast here, so I might need to revisit this.


@Zac-HD Zac-HD left a comment


Very nice! It took me a while to be convinced that the edge cases all worked, but with the comments below done I'll be happy to merge such a perf improvement. Thanks again 🤩

f"of dtype {type(val)} to array of dtype {result.dtype}."
) from e
self.check_set_value(val, result[i], strategy)
self.check_hist = set()
Member

This is pretty tricky!

  • It's safe to share between multiple test cases if and only if we never add disallowed elements to the set.
    A comment to this effect would be very useful; it took me a while to convince myself it was OK.
  • We have potentially unbounded memory usage, if e.g. you set max_examples=10**9 and leave int64 arrays overnight. This can be cheaply mitigated by checking once per array (at the start of do_draw) whether the set has more than say 100K elements, and clearing it if so.

Member Author

I've made a comment which hopefully clarifies how check_hist works, hope that's okay.

Thanks for catching the potential memory problems—I've added a reset mechanism at the start of do_draw() to that effect.

if len(self.check_hist) >= 100_000:
    self.check_hist = set()

This does hurt the performance gains. I tried tracking the cache size in a separate attribute, but len() on sets is so fast that it isn't worth the cognitive load, and using a list plus a size attribute performed comparably anyway. So right now this PR seems to improve performance by 5%-10% for typical uses of xps.arrays().

I wrote test cases to cover both points... they are both admittedly a bit funky heh.

Member

Hmm, I'd rather not have that drop. Try a range up to say 10M? I think 80MB overhead is probably acceptable if it's a nontrivial perf improvement for arrays!

If you're benchmarking it would also be interesting to see perf without any checks at all - in some cases we could manage up-front analysis proving that we don't need them.

Finally, sharing the check_hist between strategies would also improve the hit rate: perhaps a global defaultdict(set) keyed off the dtype, assigned to an instance variable in __init__? (and then self.check_hist.clear() instead of assigning a new set)
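The shared-cache idea could be sketched roughly as follows. All names here (_CHECK_HIST, ArrayStrategySketch, the method names) are illustrative, not Hypothesis internals:

```python
from collections import defaultdict

# Module-level cache shared by every strategy instance, keyed by dtype.
_CHECK_HIST = defaultdict(set)

class ArrayStrategySketch:
    def __init__(self, dtype):
        self.dtype = dtype
        # Instances constructed with equal dtypes alias the same set.
        self.check_hist = _CHECK_HIST[dtype]

    def needs_check(self, val):
        return val not in self.check_hist

    def record_checked(self, val):
        self.check_hist.add(val)
```

Two strategies built for the same dtype then share cache hits, while different dtypes stay isolated. This assumes dtype objects are hashable (see the caveat raised below).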

Member Author

@honno honno Sep 23, 2021

perhaps a global defaultdict(set) keyed off the dtype

Just noting there's a potential hashing problem here, although @asmeurer mentioned to me recently that it's probably safe to assume dtypes will be hashable. I could always explore using the dtype name (strings) too.

hypothesis-python/src/hypothesis/extra/array_api.py (review thread resolved, outdated)
hypothesis-python/tests/array_api/test_arrays.py (review thread resolved, outdated)

honno commented Sep 22, 2021

For future reference, something like this can reach all the error messages.
>>> xps.arrays(dtype=xp.uint8, shape=10, unique=True, elements=st.characters(), fill=st.nothing()).example()
InvalidArgument: Generated elements [H, , ..., , 򘶍] from strategy <hypothesis.strategies._internal.strings.OneCharStringStrategy object at 0x7f4a514e6ee0> could not be converted to array of dtype uint8. Consider if elements of type <class 'str'> are compatible with uint8.
>>> xps.arrays(dtype=xp.uint8, shape=5, unique=True, elements=st.characters() | st.floats(), fill=st.nothing()).example()
InvalidArgument: Generated elements ['5', '@', 'g', 2.2250738585072014e-308, '𫌓'] from strategy one_of(characters(), floats()) could not be converted to array of dtype uint8. Consider if elements of types (<class 'float'>, <class 'str'>) are compatible with uint8.
>>> xps.arrays(dtype=xp.uint8, shape=10, elements=st.integers(500, 510)).example()
InvalidArgument: Generated array element 500 from strategy integers(500, 510) cannot be represented with dtype uint8. Array module numpy.array_api instead represents the element as 244. Consider using a more precise elements strategy, for example passing the width argument to floats().
>>> xps.arrays(dtype=xp.uint8, shape=10, fill=st.characters()).example()
InvalidArgument: Could not create full array of dtype=uint8 with fill value 'I'
>>> xps.arrays(dtype=xp.uint8, shape=10, fill=st.characters()).example()
InvalidArgument: Generated array element '0' from strategy <hypothesis.strategies._internal.strings.OneCharStringStrategy object at 0x7fc6e39dbd60> cannot be represented with dtype uint8. Array module numpy.array_api instead represents the element as 0. Consider using a more precise elements strategy, for example passing the width argument to floats().
>>> xps.arrays(dtype=xp.uint8, shape=10, elements=st.characters(), fill=st.just(10)).example()
InvalidArgument: Could not add generated array element '0' of type <class 'str'> to array of dtype uint8.

@honno honno requested a review from Zac-HD September 22, 2021 17:38

rsokl commented Sep 22, 2021

Are there any concerns that users might provide an elements strategy that produces unhashable values?

E.g. this would no longer work, I believe:

>>> hnp.arrays(
...     dtype="float",
...     shape=(4, 2),
...     elements=st.just(np.array(1)),
...     fill=st.nothing(),
... ).example()
array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

Perhaps it is worthwhile to have this optimization be skipped over gracefully in this sort of scenario (although I see that the same issue would occur when unique=True).

I doubt that this would occur very often, but I could imagine a strategy that serves up 0D arrays as elements, for example, being used to "feed" another strategy.
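The failure mode here is easy to reproduce without NumPy: membership tests against a set hash the candidate value, so any unhashable element (a 0d ndarray, for instance) raises TypeError. A sketch of skipping the cache gracefully in that case — the names and fallback behaviour are assumptions, not the PR's actual code:

```python
class UnhashableElement:
    # Stand-in for a 0d ndarray, which is likewise unhashable.
    __hash__ = None

def seen_before(val, cache):
    """Cache lookup that degrades gracefully for unhashable values."""
    try:
        return val in cache
    except TypeError:
        # Unhashable: bypass the cache and fall back to a full check.
        return False

cache = {1.0, 2.5}
```

With this guard, hashable values still benefit from the cache while unhashable ones simply take the slow path every time.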


honno commented Sep 22, 2021

Are there any concerns that users might provide an elements strategy that produces unhashable values?

Oh yeah, nice catch! We could cache the cast builtin we find anyway, but that would mean using array equalities with builtins in the if val in self.check_hist: line... which would defeat the purpose of avoiding expensive checks!

Regarding set behaviour, I do think you're onto something, but yeah, it requires rethinking how we check values. I'll have to sleep on both things anyhow.

  • Write a test case for 0d arrays as the element strategy
  • Figure out how 1d+ arrays should work, if at all (see what extra.numpy does)

If folk want, I'm happy to close this and open a PR tomorrow with just the clean-up, w/ array-as-elements-strategy test(s).


Zac-HD commented Sep 23, 2021

High-level suggestion: let's try splitting into "ArrayStrategy which may need (expensive) checks", and "FastArrayStrategy which we know does not need any checks on element and fill strategies". How much faster is the latter?

(just make it unsafe for now; if it's +20% or better as I expect I'll write or help with the element-strategy-inspection logic)


honno commented Sep 23, 2021

Using this benchmark script with -n 50, I get a ~6.5% performance gain with no checks at all (the "FastArrayStrategy" route, compared to master), and ~2.5% with the current bounded cached checks (~4.5% when unbounded).

Results
# master
normal          16.675665184000536
unique          24.683974191000743
no fill         22.35953766900002
no fill unique  26.520580290999533
total           90.23975733500083

# no checks / "FastArrayStrategy" 
normal          16.634187734999614
unique          23.857498895999925
no fill         19.61512034900079
no fill unique  24.16638457299996
total           84.27319155300029

# this PR with bounded memory check
normal          17.363103243000296
unique          25.210598065999875
no fill         20.39611107300061
no fill unique  25.09125650900023
total           88.06106889100101

# this PR without bounded memory check
normal          17.36311352499979
unique          24.74212677700052
no fill         20.376063377999344
no fill unique  23.49789137100015
total           85.9791950509998

I was using smaller benchmark loops and more precise strategies before (e.g. dtype=xp.uint8, shape=10 instead of dtype=xps.numeric_dtypes(), shape=xps.array_shapes()), which is interesting to note, but it does mean the performance gain isn't as significant as I initially thought 😅

In terms of benchmarking, I have no idea about how Hypothesis generally caches things, e.g. if Hypothesis is caching something, could I benchmark repeated loops of mock test cases without triggering such mechanisms?


As expected, check_hist caching helps high-collision scenarios like no fill and small element ranges, e.g. xps.from_dtype(xp.uint8). In practical terms, that seems like a relatively rare use case?


Zac-HD commented Sep 23, 2021

Is that with a shared, or a per-strategy cache? Because the latter would be much less effective with varied shapes/dtypes (desugars to separate underlying strategies!).


honno commented Sep 23, 2021

Is that with a shared, or a per-strategy cache?

Per-strategy! Yep your shared suggestion seems promising—might not get round to an implementation this week, will keep you posted when I do.


Zac-HD commented Sep 23, 2021

If that's per-strategy and we still get 2-5% gains, I'm pretty optimistic again! A quick sanity-check is that I'd expect the gains to increase with max_examples for a while (probably only for shared caches) as the hit-rate improves.

I have a fair backlog of other issues I've been meaning to work on, so I'm going to leave this one to you unless you ask me to step in and help with something.


honno commented Sep 26, 2021

I've implemented @Zac-HD's shared cache idea. I benchmarked by timing pytest tests/array_api/ (avg. of 3 runs). I removed the test_check_hist_* tests before benchmarking HEAD and HEAD~1.

commit                       gain    time
HEAD (shared cache)          +15.5%  25.27
HEAD~1 (per-strategy cache)  +15%    25.42
master (no cache)            +0%     29.89

Also interestingly, here is my PR and current master with no checks. (Note test_may_not_use_overflowing_integers expectedly fails, but it only takes ~0.08s overall in HEAD when I look at "slowest 20 durations".)

commit               gain    time
HEAD~1 w/ no checks  +16%    25.12
master w/ no checks  +14.4%  25.60

Of course there's some variation in pytest runs, so I wouldn't care about one or two percent differences in benchmarks.


So unless our... hypothesis... that shared caching would really help is in fact wrong, I've probably messed something up. I did write a test case checking that the cache does in fact get shared, test_check_hist_shared_between_instances, hmm. I'll have to sleep on it!

Note I kept the bounded memory check when benchmarking HEAD and HEAD~1. I'll have to see how much it hurts performance again, and experiment with improvements.

Also next time I'll benchmark with just test_arrays.py.


def do_draw(self, data):
    if 0 in self.shape:
        return self.xp.zeros(self.shape, dtype=self.dtype)

    # We reset check_hist when it reaches an arbitrarily large size to
    # prevent unbounded memory usage.
    if len(self.check_hist[self.dtype]) >= 100_000:

Have you considered resetting the set at 157284 items? A Python set reallocates for new elements following the pattern 0 --> 4 --> 18 --> 76 --> 306 --> 1228 --> 4914 --> 19660 --> 78642 --> 157285 --> 314572 --> ..., where once the set reaches 4, 18, 76, etc. items, it resizes behind the scenes to prepare for using more memory. By resetting right before the reallocation at 157285 items, the cache can grow as large as possible without resizing to make room for another ~150k items.


You could also reset at 78641 of course; I haven't tested the speed either way. Unless it's set to 75_000, 100_000, 150_000, or some other nice-looking number, it'd be useful to include in the comments why you chose the number you did. I found out these numbers with some basic brute-force tests, as I can't be bothered to put effort into understanding the C source code, but if somebody has time they can check out https://github.com/python/cpython/blob/f25f2e2e8c8e48490d22b0cdf67f575608701f6f/Objects/setobject.c#L177. Be warned, it doesn't follow a clean doubling after 50k items, and I'm not sure why.

Member

75_000 would be a good reset point then; it's > 2**16 and below the realloc threshold. I'd be very surprised if this made a measurable difference to performance though.

@Theelx Theelx Sep 27, 2021

Performance-wise, it won't make a huge difference, at most a few hundredths of a second every time it clears. Memory-wise though, it prevents python from allocating a bunch more space than it needs to, which can make a difference of 4 MB of RAM for each such set that is prevented from resizing larger (a set with 150k or so elements takes up 8MB of space, not including the elements in it, while a set with 75k takes up 4MB).
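The memory figures above can be sanity-checked with sys.getsizeof, which reports the size of the set's hash table (excluding the elements themselves). The exact resize thresholds are CPython implementation details and vary between versions, so treat the specific numbers as approximate:

```python
import sys

def table_bytes(n):
    # Size of the set object's backing table for n small ints,
    # not counting the int objects themselves.
    return sys.getsizeof(set(range(n)))

# A set held at 75k elements keeps a much smaller table than one
# that has grown past the next reallocation threshold.
print(table_bytes(75_000), table_bytes(150_000))
```

On a typical CPython build the 150k-element table is roughly twice the size of the 75k one, matching the 4 MB vs 8 MB estimate above.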


Zac-HD commented Sep 27, 2021

Hmm, if no-checks is +16% and shared-cache is +15.5% then we seem to be very close to the maximum possible gain from caching (and in particular it doesn't seem worth adding a no-cache special case).

So unless our... hypothesis... that shared caching would really help is infact wrong, I've probably messed something up. I did write a test case that the cache does infact get shared in test_check_hist_shared_between_instances, hmm. I'll have to sleep on it!

There's a cache on arrays(), so if you construct with the same arguments you'll get the same underlying ArrayStrategy object! This probably explains why the non-shared cache didn't degrade much, though it still might if you had e.g. varied elements strategies.

Overall: I'd be happy to merge a fairly simple caching strategy for any >= 10% speedup.
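The same-arguments, same-object behaviour described above is ordinary memoisation. A toy version (not Hypothesis's actual @cacheable machinery) shows why a per-instance cache already gets reused across repeated arrays(...) calls with equal arguments:

```python
from functools import lru_cache

class StrategySketch:
    def __init__(self, dtype, shape):
        self.dtype, self.shape = dtype, shape
        self.check_hist = set()  # per-instance cache

@lru_cache(maxsize=None)
def arrays_sketch(dtype, shape):
    # Equal (hashable) arguments return the same cached strategy object.
    return StrategySketch(dtype, shape)

s1 = arrays_sketch("uint8", 10)
s2 = arrays_sketch("uint8", 10)
s3 = arrays_sketch("int64", 10)
```

Because s1 and s2 are literally the same object, their "per-instance" check_hist is shared too; only differing arguments (s3) produce a separate instance and cache.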


honno commented Sep 27, 2021

Benchmarking pytest tests/array_api/test_arrays.py (excluding test_check_hist_*, avg. of 5 runs) doesn't show as great a performance gain 😕

I also tried putting the check_hist variable at module level (so as not to interfere with the strategy caching stuff... right?), as opposed to it being an ArrayStrategy class attribute; that seems significantly better (I just couldn't figure out how to test it today).

version                     gain    time
HEAD w/ module-level cache  +5.6%   14.06
HEAD                        +1.5%   14.56
master                      +0.0%   14.80

I'm confused, because xps.arrays() is barely used outside of test_arrays.py... again, I need to revisit how I benchmark this.


Theelx commented Sep 27, 2021

When I try to run the benchmark scripts with numpy 1.21.2, I get this error:

from numpy import array_api as xp
ImportError: cannot import name 'array_api' from 'numpy'

What version of numpy are you using?

Edit: replacing that with import numpy as xp works fine.


honno commented Sep 27, 2021

What version of numpy are you using?

@Theelx Ah the Array API isn't in a release yet, you'll have to build it yourself, or get the nightly build:

pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy

Edit: replacing that with import numpy as xp works fine.

Note numpy is very similar to numpy.array_api, but there are tons of subtle differences. For the purposes of how it's used in Hypothesis, those differences don't matter (we in fact mock the Array API with just numpy proper; see mock_xp at the bottom of array_api.py).


honno commented Sep 28, 2021

I replaced my use of self.check_hist with ArrayStrategy.check_hist inside ArrayStrategy, which I believe allows value-checking caching to bypass the existing caching of strategy calls with identical arguments; I think test_check_hist_persists_between_instances correctly tests that this is the case.

I'll work out a benchmark today/tomorrow.
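A class attribute gives this sharing for free: every instance reads the same set, regardless of which cached strategy object it came from. A minimal illustration (the class name is hypothetical; the key point is mutating via the class, since assignment through self would shadow the shared set with an instance attribute):

```python
class ArrayStrategySketch:
    # One set shared by all instances of the class.
    check_hist = set()

    def mark_checked(self, val):
        # Mutate the class-level set; `self.check_hist = ...` would
        # instead create a per-instance attribute and break sharing.
        ArrayStrategySketch.check_hist.add(val)

a = ArrayStrategySketch()
b = ArrayStrategySketch()
a.mark_checked(42)
```

Any value marked through one instance is immediately visible from every other, which is exactly the persistence-between-instances property the test checks.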


honno commented Sep 28, 2021

Good news: I'm pretty happy with this benchmarking script now. It basically mocks running pytest test/array_api/test_arrays a few times... also I make sure to leave my computer alone while it's running haha.

Bad news: caching might not be worth it. The same goes for doing away with checks completely.

version                          gain    time
HEAD but no checks               +5.1%   13.52 (+0.08 for failing tests)
HEAD (ArrayStrategy.check_hist)  +2.0%   14.04
HEAD with no caching (control)   +0.0%   14.33
Raw
HEAD (ArrayStrategy.check_hist)
 14.40
 14.22
 13.73
 13.93
 13.85
 14.04
 14.08
 14.00
 14.09
 14.05
------
 14.04  (min=13.73, max=14.40, dev=0.18)

HEAD but no caching

 14.47
 14.00
 14.27
 14.47
 14.39
 14.31
 14.14
 14.72
 14.33
 14.24
------
 14.33  (min=14.00, max=14.72, dev=0.19)

HEAD but no set value checking
 13.90
 13.67
 13.63
 13.57
 13.42
 13.40
 13.38
 13.38
 13.42
 13.49
------
 13.52  (min=13.38, max=13.90, dev=0.16)

HEAD~1 but self.check_hist

 14.16
 13.93
 13.79
 14.04
 14.04
 13.98
 14.08
 14.06
 13.89
 14.04
------
 14.00  (min=13.79, max=14.16, dev=0.10)

HEAD but module-level check_hist
 14.18
 13.99
 13.89
 13.77
 13.94
 13.97
 14.09
 14.05
 14.07
 14.06
------
 14.00  (min=13.77, max=14.18, dev=0.11)

@Zac-HD Anything else you'd like me to explore? I could see about benchmarking high-collision scenarios, although as I said before I don't know if they're used much.

If you'd like to ditch this for now, I'll be sure to benchmark a no-checks scenario again when other array libraries get compliant enough for extra.array_api. That would be interesting on its own, and could lead to exploring caching. Not to be pre-emptive, but just to keep in mind that benchmarking builtin casting would be a good starting point:

$ python -c "from numpy import array_api as xp; from timeit import timeit; print(timeit(lambda: int(xp.asarray(42, dtype=xp.uint8))))"
3.403678069000307
$ python -c "import torch; from timeit import timeit; print(timeit(lambda: int(torch.as_tensor(42, dtype=torch.uint8))))"
3.1453401199978543

Hopefully you still want the clean-up part of this PR—if so, I can drop the caching stuff here, or start a new one with just that. I'll save what I've done here anyway, but it might be a good idea to keep this PR around closed as-is, as the tests especially could be interesting in future. I also need to explore element strategies generating 0d arrays.


Zac-HD commented Sep 29, 2021

I think our conclusions are that caching doesn't give enough of a speedup to justify the code complexity, and a no-checks path probably doesn't either (though I might experiment with this later).

On that basis, let's close this PR to leave it as a linkable reference. I'll be happy to take a PR inlining set_element() for the cleanup part at your convenience 😀

@Zac-HD Zac-HD closed this Sep 29, 2021
@honno honno deleted the optimise-array-strat branch February 28, 2024 11:10