Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[issue 15229] Range-ified BigInt's Ctor to Accept Ranges of Characters #3876

Merged
merged 1 commit into from
Apr 29, 2016

Conversation

JackStouffer
Copy link
Member

  • This contains the commit from Added ReferenceBidirectionalRange to std.internal.test.dummyrange #3874 which this PR requires and was separated out for convenience
  • This is technically a breaking change because the ctor no longer errors on parsing problems, but throws
  • The decimal path requires only a forward range, but the hex path requires a bidirectional range. I would like to make it possible for people to pass in forward ranges if they are just using decimals, as bidirectional ranges are less common than forward ranges. But, I couldn't figure out how to make the signiture accept any forward range and only accept bidriectional ranges for hexs at run time.
  • For some reason I get a out of memory error on my machine for a specific unittest block. The error message is very unhelpful, so any guidance in this area would be appreciated.
  • If said unittest block is commented out, all other tests pass.

@JackStouffer JackStouffer force-pushed the bigint2 branch 2 times, most recently from c7c92d7 to 4f1d199 Compare December 18, 2015 14:37
@JackStouffer JackStouffer changed the title Range-ified BigInt's Ctor to Accept Ranges of Characters [issue 15229] Range-ified BigInt's Ctor to Accept Ranges of Characters Dec 18, 2015
@quickfur
Copy link
Member

quickfur commented Jan 6, 2016

Autotester is failing with some pretty nasty errors. Implementation bug, perhaps?

@JackStouffer
Copy link
Member Author

@quickfur See the last two bullet points

@quickfur
Copy link
Member

quickfur commented Jan 6, 2016

Does it help if you split the unittest into smaller chunks? Or is the code interdependent?

@JackStouffer
Copy link
Member Author

I will check that when I get the chance, but it makes me worried and I would like to know the root cause if possible.

@quickfur
Copy link
Member

quickfur commented Jan 6, 2016

I wasn't suggesting that as a solution, but more a way to narrow down what might be going wrong.

@JackStouffer
Copy link
Member Author

This assert is the culprit https://github.com/JackStouffer/phobos/blob/4f1d199538bbeaf47d14e700ae5915b3f9de8f69/std/bigint.d#L1151

commenting it out makes all tests pass. The only reason I can think of for it throwing an out of memory error is the toString function called via the toHex function in the test, as none of the rewritten functions allocate.

@quickfur
Copy link
Member

Thanks for the tip, and sorry for the delay.... I managed to reproduce the problem locally; I should be able to narrow it down to learn why it's happening. Definitely something fishy is going on with the code here...

return true;
}

auto len = (s.save.walkLength - first_non_zero + 15) / 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is the cause of your bug. In the while loop just before this, you already consumed all the leading zeroes, so walkLength no longer includes the zeroes in the count. However, you still subtract by first_non_zero. When the latter happens to be longer than the length of the remaining digits, a size_t underflow happens and wraps around to a very large value (to be precise, 4611686018427387903), which the following line then tries to allocate a buffer for. So the out-of-memory error should not be surprising. :-)

The fix is trivial: there is no need to subtract first_non_zero at all, since you have already discarded those zeroes in the preceding while loop. Remove that, and everything should work as expected.

(Aren't you glad there's a unittest that just happens to have a test case that triggered this bug? It would have slipped in unnoticed otherwise...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. In fact, you probably don't even need to compute first_non_zero, since the zeroes are already consumed by the loop, and you can just work with the remaining digits directly.

@JackStouffer JackStouffer force-pushed the bigint2 branch 2 times, most recently from 8792fd1 to f02e87f Compare January 15, 2016 03:21
@JackStouffer
Copy link
Member Author

Thank you for finding the root cause of the bug.

I have added a unit test for the decimal walk length bug that used to fail but with the latest changes passes.

Aren't you glad there's a unittest that just happens to have a test case that triggered this bug? It would have slipped in unnoticed otherwise

Thanks Don Clugston for your thorough unit tests!

@JackStouffer
Copy link
Member Author

There is still the issue of the forward vs bidirectional range issue I made in the original post. Here is a good solution IMO:

  • In existing code, all arguments passed to the Ctor are regular strings
  • Because of that, I can add a new Ctor that only covers forward ranges (excludes bidirectional ranges) and throws if it detects a hex value, while leaving the bidirectional Ctor option available.
  • All existing code will still work because it will go to the bidirectional option, and only new code could target the forward range Ctor

Opinions?


uint hi = biguintFromDecimal(tmp, s[firstNonZero..$]);
auto predict_length = (18 * 2 + 2 * s.save.walkLength) / 19;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are leading zeroes no longer stripped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@quickfur
Copy link
Member

Hmm. In theory, there is no need to require bidirectional ranges for fromHexString. Sure, the current code needs it because of foreach_reverse, but since you're already using walkLength, it doesn't seem too much worse to also count the number of actual hex digits in the input range (i.e., ignore any _'s).

Once you know the actual number of hex digits, say let's call it n, then you can calculate the shift position required for the first digit (n % number of hex digits per word), as well as which word in the BigInt it will end up in (n / number of hex digits per word). So you can actually iterate from front to back (i.e., require only a forward range) by decrementing the shift position and moving to the previous word in the BigInt.

@quickfur
Copy link
Member

P.S. in fact, counting actual hex digits (excluding _) will also give you a more accurate estimate of the number of words required to store the final value. So you could just replace walkLength with count using an appropriate predicate.

@quickfur
Copy link
Member

Oh actually, just noticed that by the time you get to biguintcore you've already filtered out _. So you could just use walkLength to compute the shift and word number directly.

@JackStouffer
Copy link
Member Author

Could you elaborate, or direct me to a resource that does already, on what you mean by "hex digits per word"?

@quickfur
Copy link
Member

Basically, BigInt stores its value as an array of words (currently either uint or ushort). Each word stores either 4 hex digits (ushort) or 8 hex digits (uint). So, if you are given a hex literal containing, say, n=21 digits, and let's say we're using ushort (4 digits per word), that means the final result will occupy 6 words (24 digits, basically n/4 rounded up). The first digit in the literal will end up as the 3rd digit of the 6th word (i.e., the (n%4)'th digit of the (n/4)'th word). So once you know this, you can process the digits in order, without needing bidirectionality: start with the 3rd digit of the 6th word, convert the first digit, then the 2nd digit of the 6th word, convert the second digit, etc., then the 4th digit of the 5th word for the 4th digit, etc.. When you reach the last digit you should be on the 1st digit of the 1st word.

To store a converted digit into the i'th digit of the j'th word, you just do tmp[j] |= (digit << (4*i)).

So basically, the initial scan of the forward range will determine how many hex digits you have, then you compute which digit of which word corresponds with the first digit, and then scan the forward range the 2nd time, converting each digit and writing them into the result.

@JackStouffer JackStouffer force-pushed the bigint2 branch 3 times, most recently from 58460d4 to a77d200 Compare March 22, 2016 19:04
@JackStouffer
Copy link
Member Author

@quickfur I don't have the time anymore to enhance this, but I think this is definitely an improvement over the current state of the code, so I think this should be pulled as is.

If this gets pulled, I will make an enhancement request on the issue tracker on the improvements that should be further made to this.

@quickfur
Copy link
Member

ping random Phobos reviewers @andralex @yebblies @burner @DmitryOlshansky @schveiguy
I don't have time to review this right now, but it's been sitting here for a long time. Can somebody else pick it up and see if it's merge-worthy?

// check for signs and if the string is a hex value
if (len > 1)
{
for (i = 0; i < 2; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this logic needlessly convoluted. Sign checking doesn't need to be interwoven with checking for 0x. It should be a very simple, 2-step process:

// Check sign
if (range.front == '+')
    range.popFront(); // skip '+'
else if (range.front == '-')
{
    neg = true;
    range.popFront();
}

// Check for hex prefix
if (range.save.startsWith("0x"))
    // is hex
else
   // is decimal

Furthermore, filtering out _ seems to be done prematurely. Do we really support literals of the form _-_0_x_1234_567A?? I would think that the negative sign and the 0x prefix should not allow any intervening _'s. So really, sign checking and hex prefix checking should be done first, and then _ gets filtered out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs do state "Underscores are permitted in any location", but that was a lie anyway, so I will update that.

@JackStouffer
Copy link
Member Author

@quickfur fixed

@quickfur
Copy link
Member

quickfur commented Apr 1, 2016

Oh no, autotester is failling with "invalid digit" errors from biguintcore. :-(

@JackStouffer
Copy link
Member Author

Works on my machine. Not quite sure what to do here.

@schveiguy
Copy link
Member

1731 contains nothing that would cause an exception:
https://github.com/JackStouffer/phobos/blob/bigint2/std/internal/math/biguintcore.d#L1731

WTF?

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
15229 BigInt(Range of chars) too

@JackStouffer
Copy link
Member Author

@quickfur @schveiguy Added back in supposedly redundant check, it's green now.

@schveiguy
Copy link
Member

64 bit tester now running out of memory. CC @braddr

@braddr
Copy link
Member

braddr commented Apr 6, 2016

Nothing wrong with the tester host itself, as far as I can tell. Something's pushed the memory usage up over the threshold or close enough to it to fail periodically.

@JackStouffer
Copy link
Member Author

Looks good now. Anything else need fixing?

@JackStouffer
Copy link
Member Author

Ping @quickfur @schveiguy

@JackStouffer
Copy link
Member Author

Ping. This has been sitting here ready to go for a while now.

import std.exception : enforce;
import std.conv : ConvException;

enforce!ConvException(!s.empty, "Can't initialize BigInt with an empty range");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use assert for such simple checks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JackStouffer
Copy link
Member Author

@wilzbach Fixed

// return true if OK; false if erroneous characters found
// FIXME: actually throws `ConvException` on error.
bool fromDecimalString(const(char)[] s) pure @trusted
/// return true if OK; false if erroneous characters found
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be documented?

@JackStouffer
Copy link
Member Author

@DmitryOlshansky Fixed

@DmitryOlshansky
Copy link
Member

Auto-merge toggled on

@wilzbach
Copy link
Member

Can we change the title so that the dlang bot finds this? :)

return true;
}

auto len = (s.save.walkLength + 15) / 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why we have the magic +15? If I read the below correctly (s.length/ 8) + 1 should work as we use at most 1/8 of tmp per char.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual trick of 16-1, so it rounds up correctly when divided by 16. However it is then multiplied by 4 so net total is /4

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiplied by 4 so net total is /4

Imho it's 8 - see:

++partcount;
if (partcount == 8)

@DmitryOlshansky DmitryOlshansky merged commit 8c3f343 into dlang:master Apr 29, 2016
@JackStouffer
Copy link
Member Author

Thanks!

@JackStouffer JackStouffer deleted the bigint2 branch April 29, 2016 14:36
@aG0aep6G
Copy link
Contributor

This has caused a regression. immutable(BigInt)("123") doesn't work anymore. In Bugzilla: issue 17330.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants