
Conversation

WebDrake
Contributor

This patch fixes a problem where the public methods .index() or .popFront() might be called before the first value of the sample has been determined, which would then cause spurious results. The runtime initialization check currently performed in .front has been extended to those methods, along with a number of implementation tweaks that give an overall performance improvement.

In addition, the .index method has been made a @property, which I take to be an oversight in the original code.

@monarchdodra
Collaborator

I'm going to paste this on every pull that has indent changes from now on.

To view the diff ignoring whitespace changes, add ?w=1 to the end of the path, e.g.:

https://github.com/D-Programming-Language/phobos/pull/1533/files?w=1

Makes reviewing much easier in a lot of cases.

@WebDrake
Contributor Author

Good call. It's a shame there isn't a meld-like view in GitHub, as far as I'm aware -- that sort of side-by-side view makes these things much easier desktop-side.

What do you think of the solution here? I was a little nervous about the delegate-based approach, whether it had safety implications, but it does make for a more elegant and overall faster solution.

@WebDrake
Contributor Author

Oh, one technical note -- there's no unittest in place for the case where .popFront() is the first public function to be called, just because it's a bit tricky to work out how this should be handled in a unittest scenario. The measure of whether it's working or not is the statistical distribution of items in the (remainder of the) sample.

Without the check for initialization (with Algorithm D), the popFront'ed sample would just give you an evenly distributed sample of size (n - 1), which is obviously wrong -- if you instead take whole samples and then look at the statistical distribution of all points except the first, you'd get a bias against earlier values in the input sequence.

Probably this is something whose testing should be part of a separate quality-of-randomness test suite.

@WebDrake
Contributor Author

There's no unittest in place for the case where .popFront() is the first public function to be called

Although if .popFront() didn't correctly account for this, it would fail, because skip would be null when called. So maybe I'm worrying too much.

@WebDrake
Contributor Author

DON'T MERGE for now. I think there may be an issue with the current code. Closing, will re-open once it's fixed or shown to be a false alarm.

@WebDrake WebDrake closed this Aug 31, 2013
@WebDrake
Contributor Author

The problem seems to be that inside the skipA() or skipD() functions, whichever is used, the perceived values of _toSelect and _available get out of sync with what they actually are (and are reported to be in other functions). Suggests that the delegate approach has some problems?

@WebDrake
Contributor Author

WebDrake commented Sep 1, 2013

OK, the reason is now clear. A delegate is a reference type. When you do something like this:

auto sample = randomSample(...);
sample.popFront();
foreach (s; sample)
{
   ...
}

... the foreach loop takes a copy of the sample. This means that while all the internal values (_toSelect, _available, etc.) are copied, the delegate -- if non-null -- will point to the function in the original.

This doesn't matter if initializeFront() has not yet been called, because then the value of the delegate is null and it'll be reset to point to the internal function in the copy. But if it has been called before the foreach loop starts, it'll continue to point to the function in the original sample, and hence will continue to see the _toSelect and _available values from the original and not the copy. And hence it will generate wrong skip values.

It's (i) a clear sign that RandomSample also needs to be a reference range, and (ii) currently a fundamental flaw in the delegate-based approach, which is a shame, because it does allow for speedup :-(
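For readers less familiar with D's delegate semantics, the same trap can be reproduced in Python (a hypothetical analogue with made-up names, not the actual RandomSample code): a bound method cached on an instance keeps referring to the original object even after the instance is shallow-copied.

```python
import copy

class Sample:
    """Toy analogue of RandomSample: lazily caches a bound method."""
    def __init__(self, to_select):
        self.to_select = to_select
        self.skip = None  # analogue of the lazily-set delegate

    def _skip_a(self):
        # Reads *this* instance's state.
        return self.to_select

    def advance(self):
        if self.skip is None:
            # Bound method: captures a reference to this exact object.
            self.skip = self._skip_a
        return self.skip()

orig = Sample(10)
orig.advance()            # initializes the cached bound method
clone = copy.copy(orig)   # shallow copy, like D's struct copy in foreach
clone.to_select = 99      # mutate the copy's state...
print(clone.advance())    # ...but the cached method still reads orig: prints 10
```

Just as in the D case, the copy's own state is ignored because the stored callable is bound to the original object.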

@WebDrake WebDrake reopened this Sep 2, 2013
@WebDrake
Contributor Author

WebDrake commented Sep 2, 2013

The added patches should fix the identified issue with the delegate. I owe thanks to Artur Skawina for suggesting this technique -- I would have reverted to an if (_algorithmA) approach!

As Artur points out there is a small amount of maintenance risk here should anyone subsequently modify the code to try and call the _skip function pointer directly. Perhaps a warning message not to do this?

@WebDrake
Contributor Author

WebDrake commented Sep 2, 2013

Hmm, looks like my bounds for the new test case are too narrow :(

@WebDrake
Contributor Author

WebDrake commented Sep 4, 2013

Any chance of getting some review here? :-)

I think there are two principal issues to address -- (i) is the delegate-based approach to the skip() function OK, or would it be preferable to go back to an if (_algorithmA) style approach? And (ii) is the statistical test introduced OK, or too much weight to carry for unittests? A .popFront() followed by foreach is probably good to keep, but we don't have to repeat it many times and check the numbers; that can be farmed out to a random test suite.

@monarchdodra
Collaborator

Any chance of getting some review here? :-)

Apologies. I did a quick read, it looks OK in principle. I'll have to brush up on the delegate skills though (as well as study the thread you created with the difficulties you faced). Things would have been simpler with a simple enum (e.g. enum State { NotYetInitialized, UseA, UseD }), but if you made the effort to write a solution that works, there is no reason to reject it.

In regards to the delegate problems: Does D outright ban simple member function pointers? It seems it would have been simpler and safer when a struct stores it in itself...

@WebDrake
Contributor Author

WebDrake commented Sep 5, 2013

I'm not at all precious or possessive about my code, so if another solution would be preferred I'm happy to implement it. Using delegates minimises the amount of if (...) tests needed but I can understand if the maintenance or readability cost is deemed too high.

@monarchdodra
Collaborator

I'm not at all precious or possessive about my code, so if another solution would be preferred I'm happy to implement it. Using delegates minimises the amount of if (...) tests needed but I can understand if the maintenance or readability cost is deemed too high.

No, I understand what you did, and why, and it's a smart choice. In this particular case though, using delegates is working against us. Delegates store a pointer to this, which basically means you have an internal pointer, and these are banned for a reason (as you have probably noticed). You are working around it, but... just :puke:

I was going to suggest opAssign and postblit but: Still internal pointer, so there will still be ugly traps.

If a language feature is working against us, it means we are using the wrong language feature: "pointers to member functions, e.g. delegates". In particular, if we already have "this", then we don't actually need the functions to be members, do we? Why not just make skipA and skipD static functions, e.g.:

static size_t skipA(ref typeof(this) that);
static size_t skipD(ref typeof(this) that);

Then your code basically becomes:

if (!skip)
    skip = &skipA;
skip(this);

Thoughts?
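monarchdodra's static-function idea has a direct Python analogue (again a hypothetical sketch with invented names, not the patch itself): store a plain function and pass the current instance explicitly, so any copy always operates on its own state rather than on the original's.

```python
import copy

def skip_a(that):
    # Free function: operates on whichever instance is passed in.
    return that.to_select

class Sample:
    def __init__(self, to_select):
        self.to_select = to_select
        self.skip = None

    def advance(self):
        if self.skip is None:
            self.skip = skip_a   # plain function, no captured instance
        return self.skip(self)   # pass the current instance explicitly

orig = Sample(10)
orig.advance()
clone = copy.copy(orig)
clone.to_select = 99
print(clone.advance())  # 99: the copy now sees its own state
```

Because the stored callable has no hidden pointer back to the original object, the copy behaves correctly.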

@andralex
Member

andralex commented Sep 5, 2013

Using an indirect call instead of a test is overkill, so that must go. I'm not sure how the benchmark was set up but the test is almost always better. Perhaps the matter is an icache spill which could be fixed by having an if test followed by two function calls (instead of inline code).

@WebDrake
Contributor Author

WebDrake commented Sep 5, 2013

No, I understand what you did, and why, and it's a smart choice. In this particular case though, using delegates is working against us. Delegates store a pointer to this, which basically means you have an internal pointer, and these are banned for a reason (as you have probably noticed). You are working around it, but... just :puke:

Yea, I found myself feeling similar about the workaround to be honest. I'll tweak accordingly.

Using an indirect call instead of a test is overkill, so that must go. I'm not sure how the benchmark was set up but the test is almost always better. Perhaps the matter is an icache spill which could be fixed by having an if test followed by two function calls (instead of inline code).

OK, I'll give it a go. :-)

@WebDrake
Contributor Author

WebDrake commented Sep 8, 2013

OK, I worked out why the indirect call was seemingly faster than the test.

The original skipD() function included a test that determined if Algorithm D really should be used, or whether it was better to switch to Algorithm A:

if ((_alphaInverse * _toSelect) > _available)
{
    _algorithmA = true;
    return skipA();
}

Once that's moved out of skipD() to the controller function, skip(), everything speeds up. So skip() now reads:

    private size_t skip()
    {
        assert(_skip != Skip.None);

        // Step D1: if the number of points still to select is greater
        // than a certain proportion of the remaining data points, i.e.
        // if n >= alpha * N where alpha = 1/13, we carry out the
        // sampling with Algorithm A.
        if (_skip == Skip.A)
        {
            return skipA();
        }
        else if ((_alphaInverse * _toSelect) > _available)
        {
            // We shouldn't get here unless the current selected
            // algorithm is D.
            assert(_skip == Skip.D);
            _skip = Skip.A;
            return skipA();
        }
        else
        {
            assert(_skip == Skip.D);
            return skipD();
        }
    }

This results in a significant speed-up over all the previous implementations.
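The control flow of skip() above can be mirrored in a Python sketch; skip_a/skip_d here are placeholders standing in for Vitter's Algorithms A and D, and all names are illustrative rather than taken from the patch.

```python
from enum import Enum

class Skip(Enum):
    NONE = 0
    A = 1
    D = 2

ALPHA_INVERSE = 13  # alpha = 1/13, as in the comment in the D code

class SkipController:
    """Sketch of the hoisted step-D1 check in the controller function."""
    def __init__(self, to_select, available):
        self.to_select = to_select
        self.available = available
        self.state = Skip.D  # assume the sample was initialized with D

    def skip(self):
        assert self.state is not Skip.NONE
        if self.state is Skip.A:
            return self.skip_a()
        elif ALPHA_INVERSE * self.to_select > self.available:
            # Step D1, hoisted out of skip_d: switch permanently to A.
            assert self.state is Skip.D
            self.state = Skip.A
            return self.skip_a()
        else:
            assert self.state is Skip.D
            return self.skip_d()

    def skip_a(self):
        return "A"  # placeholder for Algorithm A's skip computation

    def skip_d(self):
        return "D"  # placeholder for Algorithm D's skip computation

c = SkipController(to_select=5, available=1000)
print(c.skip())   # 13*5 = 65 <= 1000, so Algorithm D is used: prints D
c.available = 50
print(c.skip())   # 65 > 50: switches permanently to A, prints A
print(c.state)    # Skip.A from here on
```

The point of the restructuring is that the `Skip.A` branch is a single cheap comparison on every subsequent call, with the more expensive proportion test only run while the state is still `Skip.D`.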

You can check my benchmarking code here: http://ubuntuone.com/2dFQHQiv8dSgxTzyeD48XW

This .tar.gz file contains sample code plus four different versions of std.random: the first is std.random as in current Phobos, the second is the delegate-based approach, the third is that of the penultimate commit (4ad68f9), and the fourth is the current final version in this pull request, i.e. with the extra check moved into skip().

The benchmark results contained in the archive were produced by compiling with:

ldmd2 -O -inline -release -noboundscheck sample.d test*/random.d

@WebDrake
Contributor Author

WebDrake commented Sep 9, 2013

Assuming this version is OK, would you like me to clean up the patchset? I can probably reduce it to one single patch or a smaller series of more coherent ones.

@WebDrake
Contributor Author

I take it these test failures aren't my fault ... :-\

Anyway, ping? :-)

@WebDrake
Contributor Author

.... ping? I seem to be getting into test-failure hell here.

@monarchdodra
Collaborator

I seem to be getting into test-failure hell here.

Don't worry about it. We all know there are odd issues in the auto tester. We don't actually wait for 10/10 to make a merge. A quick look at the auto tester history is good enough.

Anyway, ping? :-)

.... ping?

Yeah... sorry. It's on my to-do list. I'll get to it sometime this week.

@WebDrake
Contributor Author

Hey, it's all green. Ping? :-)

@monarchdodra
Collaborator

Sorry for the delay. This looks good for pulling, but two things:

  1. http://d.puremagic.com/test-results/pull.ghtml?projectid=1&runid=740287

It's rare, but your test can fail. Randomly (literally) failing tests will be a pain for everyone. You bumped up the tolerance for count1 (WebDrake@f480ac0); can you also bump count99's from 250 to 300?

  2. (if at all possible) could you squash a bit?

@WebDrake
Contributor Author

Yes, I already bumped up the tolerance, but not enough it seems :-( I think that in the long run this might be worth taking out and handing over to a more intense random test suite which will use a larger sample size, but for now I'll leave it in with the increased tolerance as you suggest.

I'll update later today with a single-commit version of all this.

This patch fixes a problem where the public methods .index()
or .popFront() might be called without the first value of the
sample having been determined, which would then cause spurious
results.  The runtime initialization check currently performed
in .front has been extended to those methods.

The private boolean checks in the previous implementation have
been replaced with an enum indicating the algorithm to be used
(A, D or None) with None indicating that the sample has not
been initialized.

Step D1 of Algorithm D has been moved to the skip() function,
which results in a significant performance boost.

Unittests have been introduced to cover the cases where .index
or .popFront() are called before .front.

Finally, the .index method has been made a @property, which I
take to be an oversight in the original code.
@WebDrake
Contributor Author

Single-patch rebase, and I've updated the pull request summary. :-)

@WebDrake
Contributor Author

Oh, and the patch includes updated tolerance for the .popFront() test as you requested.

@WebDrake
Contributor Author

All tests green. Ping? :-)

monarchdodra added a commit that referenced this pull request Sep 26, 2013
Fix Issue 10322 - ensure RandomSample is initialized before use
@monarchdodra monarchdodra merged commit cedee16 into dlang:master Sep 26, 2013
@monarchdodra
Collaborator

:-)

@WebDrake
Contributor Author

Thanks! This was a fun one to work on -- I think I learned useful stuff doing it :-)

@WebDrake WebDrake deleted the randomsample-init branch September 26, 2013 19:37
assert(count0 == 0);
assert(count1 < 300, text("1: ", count1, " > 300."));
assert(4_700 < count99, text("99: ", count99, " < 4700."));
assert(count99 < 5_300, text("99: ", count99, " > 5300."));

Contributor Author

I think that was always a possibility. This is rather a crude test -- mostly a sanity check -- and in the unittesting scenario we don't have time to run something really statistically rigorous. The upper/lower bounds were set to be quite forgiving, but there's always a possibility of a rare event where we do just get an extreme of variance.

If it repeats frequently then I'd look into it further, but most likely it's just a rare one-off that we should be aware of as possible. I suppose we could make the bounds even more forgiving, but given how many unittest runs have taken place without a failure, I'd rather not.
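For a rough sense of how such tolerances can be sized, here is a back-of-envelope calculation in Python; the trial count and per-index probability are illustrative assumptions for the sketch, not the values used in the actual unittest.

```python
import math

# Hypothetical parameters, purely for illustration: suppose 10,000 trials
# in which the index in question appears with probability 0.5.
trials = 10_000
p = 0.5
mean = trials * p                          # expected count: 5000
sd = math.sqrt(trials * p * (1 - p))       # binomial standard deviation: 50

# How many standard deviations out is an upper bound of 5300?
z = (5_300 - mean) / sd
print(f"mean={mean:.0f}, sd={sd:.1f}, bound is {z:.1f} sigma out")
```

Under these assumed parameters a 5300 bound sits six standard deviations from the mean, i.e. a false failure would be astronomically rare; bounds that fail in practice suggest the test's true distribution has more spread than such a naive model assumes.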

Contributor

Non-deterministic unit tests are always dangerous. However, let's assume for the moment that there is no better way of writing the tests in question. In that case, the fact that the tests can fail with a small probability should be noted somewhere, and in such a way that it is obvious even to the non-statistically inclined (preferably in the immediate vicinity of the assertions in question).

Contributor Author

How would you like it noted? Via an extra paragraph added to the comment above the asserts? Or in the assert messages themselves?

Contributor

Either of those would be fine. Just make it very obvious to anybody going to the failing line in a text editor.

Contributor Author

Will do. Thanks for bringing this to my attention.

Contributor

FWIW it was obvious.

@schveiguy
Copy link
Member

Please, let's not have any non-deterministic tests in the test suite. Everything should be repeatable. This test is not only hurting because it fails, it's hurting because I can't have any idea why it would fail -- the seed used is gone.

I'd much rather use hard-coded seeds, and forget about testing the "quality" of the random algorithm. If anyone can find a seed that causes it to be poor quality, let's examine that, figure out if we have to add that seed, and be done with it.

I'm considering that for unit tests, we should change unpredictableSeed to be predictable when running Phobos unit tests. What do you all think?
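schveiguy's suggestion of predictable seeds can be sketched in Python (assumed names, nothing from Phobos): an explicitly seeded generator makes every run repeatable, so a failing test can be replayed exactly.

```python
import random

def sample_indices(seed):
    # An explicit, locally-seeded generator: no hidden global state,
    # and the seed is right there in the test if it ever fails.
    rng = random.Random(seed)
    return sorted(rng.sample(range(100), 5))

# Same seed, same result, every run.
assert sample_indices(42) == sample_indices(42)
print(sample_indices(42))
```

The trade-off schveiguy notes still applies: a fixed seed verifies the mechanics deterministically, but says nothing about statistical quality, which is better left to a dedicated randomness test suite.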
