Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test Suite: AbstractArrays and Numbers #98

Open
willtebbutt opened this issue Dec 28, 2020 · 12 comments
Open

Test Suite: AbstractArrays and Numbers #98

willtebbutt opened this issue Dec 28, 2020 · 12 comments
Labels
enhancement New feature or request

Comments

@willtebbutt
Copy link
Member

willtebbutt commented Dec 28, 2020

This PR is the latest to propose a rule for AbstractArrays that, in my opinion, is too broadly-typed. While I obviously sympathise with rule authors wanting to do this (it feels like the natural thing to do), we know that it causes real problems.

I wonder whether we should implement a test util comprising rrule_tests and frule_tests on a suite of different types of arrays that we insist any new rule that purports to support AbstractArrays, or some subtype thereof, must pass. The idea being would would include

  1. examples of every concrete type of array that we can find in Base / the standard libraries
  2. examples from popular external packages, such as StaticArrays, FillArrays, and BlockArrays.

Crucially, I propose that the tests should pass if, for all arrays in the suite, either

  1. the standard rule tests pass, or
  2. no rule is found for that array.

This means that the rule author can pick and choose the kinds of arrays that they support, and obvious "gaming" of the tests would be addressed at review time (i.e. the idea is not to implement a rule of AbstractArrays and then implement rules that return nothing for each non-Array array in the test suite).

This approach is obviously not a panacea since you clearly can't test against all concrete subtypes of AbstractArray that will ever be written. Rather, the goal is to have enough coverage of types of arrays that break assumptions that hold for Array-like AbstractArrays to ensure that rule authors and reviewers don't accidentally implement something too loosely-typed. My hope is that this will encourage rule authors (and reviewers) to be cognisant of the issues surrounding implementing rules for abstract types.

While Numbers have generally been less troublesome in practice, they obviously suffer from the same issues and have caused problems from time to time. It might make sense to consider a similar thing for Numbers.

Maybe there are some other common types where this stuff goes wrong on a regular basis?

I'm not entirely sure how this should look in practice (do we want a formal testing function that actually runs a load of tests, or do we just want to provide a global const Vector of subtypes of AbstractArray that we insist gets used when writing tests?), but if people like the general idea then we can think a little more carefully about how to make it work.

@nickrobinson251 nickrobinson251 added the enhancement New feature or request label Dec 28, 2020
@nickrobinson251
Copy link
Contributor

nickrobinson251 commented Jan 1, 2021

potentially a case where this would be useful to have: JuliaDiff/ChainRules.jl#337 (comment)

I made a label on ChainRules.jl for issues/PRs where something like this may be useful (or which more generally raise questions about type constraints)
https://github.com/JuliaDiff/ChainRules.jl/labels/type%20constraints
https://github.com/JuliaDiff/ChainRules.jl/pulls?q=is%3Apr+label%3A%22type+constraints%22+

@mcabbott
Copy link
Member

mcabbott commented Jan 1, 2021

Testing generic code more deeply is a great idea, but I'd vote for a less adversarial phrasing phrasing here. We're not trying to keep hackers away from some automated bug million-dollar bug bounty; we are trying to supplement human eyeballs looking for oversights. And trying to make things that are widely useful, even if this is more work than treating just Vector{Float64}.

Perhaps we can simultaneously cut a lot of boilerplate out of existing tests -- so many of them go through this dance of making 3 sets of random arrays all the time, and extending that to more types gets ever more complicated. Maybe rrule_test should just accept sizes (for the most common situation of testing on arrays) and then it could easily try several things of that size:

rrule_test(f, Rand(3,))  # test on randn(3), randn(ComplexF64, 3), 1:3, @SVector(rand(3))
rrule_test(f, Rand(3,3))  # test on randn(3,3), and Diagonal(rand(3)), and Hermitian(rand(ComplexF64,3,3))
rrule_test(f, Rand(2,3,4)) # test on randn(2,3,4), and PermutedDimsArray(rand(2,3,4), (1,2,3))

I don't quite know what the list is, but I propose that it should not be an exhaustive combinatorial explosion. It needs some structural zeros, some immutable arrays, some complex & some integers. To try to cover the various ways we've seen things go wrong. (And again, the goal is to look for mistakes in generic code, which isn't specialising on the particular types.) Here Rand is some trivial struct to signal this, so as to allow scalar arguments like rrule_test(norm, Rand(3), 2) without confusion (although obviously other interfaces can be imagined.)

@willtebbutt
Copy link
Member Author

willtebbutt commented Jan 1, 2021

Testing generic code more deeply is a great idea, but I'd vote for a less adversarial phrasing phrasing here. We're not trying to keep hackers away from some automated bug million-dollar bug bounty; we are trying to supplement human eyeballs looking for oversights. And trying to make things that are widely useful, even if this is more work than treating just Vector{Float64}.

Apologies for coming across as adversarial -- this wasn't my intention. I agree with your sentiment that this is about supporting rule-implementers by giving them better tooling to make it easier to do a good job of implementing rules.

Perhaps we can simultaneously cut a lot of boilerplate out of existing tests -- so many of them go through this dance of making 3 sets of random arrays all the time, and extending that to more types gets ever more complicated. Maybe rrule_test should just accept sizes (for the most common situation of testing on arrays) and then it could easily try several things of that size:

This definitely seems like a good idea.

I don't quite know what the list is, but I propose that it should not be an exhaustive combinatorial explosion.

Agreed -- the tests already take long enough to evaluate 😬 .

@mcabbott
Copy link
Member

mcabbott commented Jan 3, 2021

Here's a sketch of how a testing thing might work: mcabbott@6964a69

Maybe someone with more of the test set loaded into cortex can say whether this seems like the right sort of thing. Should there be an easy way to specify just Array, and what should it look like?

@willtebbutt
Copy link
Member Author

I like this approach. One quick question though: how do you envision ensuring an author that a certain combination of things get tested? Manually call rrule_test?

@mcabbott
Copy link
Member

mcabbott commented Jan 7, 2021

I'm not sure I follow, but I was not envisaging providing a way to specify an arbitrary list of types. You could still test any one particular weird thing as you can today, in addition to or instead of whatever standard list Rand uses.

I wrote it with a Real/Complex option, but perhaps that's a waste if almost all rules support complex numbers. Perhaps it would be nice to allow (say) Rand(Array, 3, 3) to mean "test with real & complex numbers here, but no strange types" as that seems more common?

@willtebbutt
Copy link
Member Author

I'm not sure I follow, but I was not envisaging providing a way to specify an arbitrary list of types. You could still test any one particular weird thing as you can today, in addition to or instead of whatever standard list Rand uses.

Great, that makes sense to me.

I wrote it with a Real/Complex option, but perhaps that's a waste if almost all rules support complex numbers. Perhaps it would be nice to allow (say) Rand(Array, 3, 3) to mean "test with real & complex numbers here, but no strange types" as that seems more common?

I think generally testing with both Real and Complex is a good idea. By "strange types" you meaning something like quaternions, or some other less common number type?

@mcabbott
Copy link
Member

mcabbott commented Jan 7, 2021

Sorry, by "strange types" I mean the various non-Array types this tests (from LinearAlgebra, plus Base's view/permuted/reshaped etc.). Those are the main goal here, but perhaps a second mode which tests only real + complex Array would be nice to have, just to simplify existing tests.

Are there other modes worth adding? A way to test only real numbers probably isn't necessary. Arrays of booleans are special but perhaps also not worth it.

@willtebbutt
Copy link
Member Author

Oh I see. Good question. With your gist as it currently is, I agree that it makes sense to have a few different modes. However, if you were to refactor slightly so that the tests pass if no rule is found (i.e. more along the lines of what I proposed initially) then I'm not sure it makes sense to provide that option, since the utility of the tests would be to make sure that new rule doesn't accidentally catch cases that it didn't mean to catch. So I guess it depends on how we plan to use these tests.

@mcabbott
Copy link
Member

mcabbott commented Jan 7, 2021

OK. The reason I wasn't so keen on the automated approach is that accidentally writing the signature more narrowly than you intended seems like an easy mistake to make (e.g. you forget one <:) and this would then silently pass. I'd prefer it to clear from looking at the tests what's expected to work.

@willtebbutt
Copy link
Member Author

Ohh I see. Maybe these tests need to be able to operate in either of two modes then:

  1. limited scope: all types must return a rule, all rules must be correct
  2. broad scope: not all types must return a rule, all rules returned must be correct
    ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants