Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Create truly constrained non-deterministic numbers #2

Closed
ploeh opened this Issue · 9 comments

2 participants

Mark Seemann Nikos Baxevanis
Mark Seemann
Owner

Numbers are currently created using a strictly monotonically increasing sequence. This may be too deterministic, causing test developers to (inadvertently) take advantage of this trait, as a result creating brittle tests.

Numbers should be created according to an appropriate assumption about a proper default Equivalence Class for numbers.

For numbers (integers, decimal values and floats alike) the assumption is that a small positive integer is rarely a problem. While negative numbers or zero are frequently invalid in various APIs, positive numbers tend to be valid. Even when input can be a decimal or a float, integers (the mathematical concept, not the data type) tend to be valid too. Large numbers can sometimes cause problems for APIs, so small numbers are preferred.

That’s what originally caused me to come up with the algorithm of using a strictly monotonically increasing sequence of integers, because that was the simplest algorithm I could come up with that fit the criteria.

A better algorithm would be to pick random small (e.g. less than 256) positive integers, making sure that none are repeated, and then expand to a bigger range (e.g. less than 32767) until that one is all used up and so on. This could be a new optional ISpecimenBuilder in AutoFixture 2.x, but should be the default in AutoFixture 3.0.

Nikos Baxevanis
Owner

For each current range of positive numbers, we could pick the lower bound randomly and set the highest value as the upper bound. Then, we could first return the Boundary Values (the upper and lower bounds) and then continue with the rest of the numbers.

As an example, for small positive integers up to 256 we could have a range (lower bound, 256) where lower bound has been selected randomly.

If lower bound is 30, we could return: 30, 256, random, etc.

That way we may provide the best set of input values by using the Boundary Values of the default Equivalence Class.

Mark Seemann
Owner

Why would you want to randomly pick the lower bound?

Nikos Baxevanis
Owner

Actually, I assume that we start with the Boundary Values of the current range.

In the default Equivalence Class the lower bound could always be a very small number, possibly 1. In that case, we could define a non-deterministic set of values constrained to the current, default, range. That way we could randomly pick the lower bound to make sure that small positive numbers could be returned but not always by default.

To verify a particular behavior of the SUT, this may be overridden with a custom Equivalence Class. If we can also inject a custom range (e.g. as we can do with URI schemes) then we would have to use both lower and upper bounds.

Mark Seemann
Owner

Given that we already pick randomly from each set, I don't see a reason to make things more complicated than defining progressive sets like this:

[1, 255]
[256, 32767]
etc.

What purpose would it serve to pick the lower bound randomly?

Nikos Baxevanis
Owner

If we would start with the Boundary Values we would have 1 255 random random etc. Picking randomly the lower bound would avoid always returning 1.

If that's not the case for the default Equivalence Class there is no reason to pick the lower bound randomly.

Mark Seemann
Owner

It was never my intention that the algorithm had to start with the boundary values. The algorithm I had in mind was that it should start picking random numbers from the set [1, 255], and when that set was depleted, then random numbers from the set [256, 32767], etc.

Thus, there's a 1 in 255 chance that the first number would be 1, but I think that's perfectly fine. 1 is a good number :)

Nikos Baxevanis
Owner

How about naming the new ISpecimenBuilder UnorderedNumericSequenceGenerator? At first I though about the Random word as part of the name but since it never returns duplicates it is not that random..

The range turned up slightly different - The default range (before it is expanded) is 255. In that range, [1, 255], the code yields unique, unordered, numbers. When it runs out of numbers the range is expanded to [256, 65025] where 65025 is (255 * 255).

The reason for expanding the range in that way is because there is a constructor overload that allows the user to specify a custom range to start with (e.g. 10, which will yield unique numbers in the range [1, 10], [11, 100], [101, 10000], and so on).

So, basically, the only difference is that after [1, 255] instead of expanding to [256, 32767] it expands to [256, 65025], and so on - and there is also support for a custom range to start with.

How does that sound? :)

Mark Seemann
Owner

Well, I picked the original ranges to match the limit of CLS compliant primitives... The smallest CLS compliant primitive is System.Byte, whic has an upper limit of 255. The next CLS compliant primitive is Int16, which has an upper bound of 32767. The reason why this is interesting is that we tend to use the same sequence for various different data types.

This means that we are guaranteed that we can always generate 255 numbers before a byte will overflow, and we can always generate 32767 numbers before an Int16 will overflow. If we used 65025 as the next number, once we were past the first 255 numbers, there would be a 50 % risk that an Int16 would overflow.

If we want to be able to customize the ranges I guess it can be done with a simple sequence of limits, e.g. 255, 32767, 2147483647... but I don't think that's terribly important :)

About the name... Unordered is fine, but I'd personally prefer Random. The algorithm most certainly is random - it's just a question of whether or not you allow a picked number to return to the pool. What you were thinking about as 'more random' is what in university we called a random walk. The algorithm we are currently discussing isn't a random walk - it's more like a random selection process, just like in game of bingo where numbers are randomly drawn from a pool of numbers and, once drawn, the number isn't returned to the pool. It's still random - one would hope :)

Nikos Baxevanis moodmosaic was assigned
Nikos Baxevanis
Owner

Resolved in version 2.13.0 (changeset 2b72506)

Nikos Baxevanis moodmosaic closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.