
functions to format and parse binary and hex strings #196

Closed
gavinking opened this Issue · 17 comments

5 participants

Gavin King Loic Rouchon Tom Bentley Tako Schotanus Stéphane Épardaud
Gavin King
Owner

This is really two features:

  1. parseInteger() currently only accepts decimal string representations
  2. Integer.string always produces decimal string representations

We need to also be able to handle hexadecimal and binary representations.

Arguably this functionality would be better off in ceylon.math, but since hex/bin literals are now part of the language, I guess it makes more sense here.
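For reference, the Java APIs that a native implementation could delegate to already take a radix for both parsing and formatting. The sketch below only shows those JDK calls; the class and method names (RadixRoundTrip, parse, format) are hypothetical and are not the Ceylon API discussed in this issue.

```java
class RadixRoundTrip {
    // Parse text in the given radix with the JDK's own parser.
    static long parse(String text, int radix) {
        return Long.parseLong(text, radix);
    }

    // Render a value in the given radix (lowercase digits, '-' for negatives).
    static String format(long value, int radix) {
        return Long.toString(value, radix);
    }

    public static void main(String[] args) {
        System.out.println(parse("ff", 16));   // 255
        System.out.println(parse("1010", 2));  // 10
        System.out.println(format(255, 16));   // ff
        System.out.println(format(-10, 2));    // -1010
    }
}
```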

Loic Rouchon
Collaborator

I have started working on this, and I have a draft Ceylon implementation of parseInteger(String string, Integer radix = 10) that seems to work.

But I don't really understand why this function was native.
Is that for performance reasons (I don't think so)?
Is that to handle platform-dependent stuff like Long.MIN_VALUE and Long.MAX_VALUE?

Concerning the format part, is what you want a shared String format(Integer radix = 10) method on the Integer class?

Tom Bentley
Collaborator

@loicrouchon you're right that you need a native implementation to worry about things like:

parseInteger("1000000000000000000000000000000000")

And extra care needs to be taken with Long.MIN_VALUE (its magnitude is bigger than MAX_VALUE's, so you cannot simply negate the integer you see after the - sign). So to be honest, I would keep the native, and just delegate to appropriate Java/JS APIs.
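The two hazards mentioned here can be observed directly on the JVM. This is a small illustration using only Long.parseLong; the class name MinValueParsing and the helper rejected are made up for the example.

```java
class MinValueParsing {
    // True if the JDK parser rejects the text (malformed or out of range).
    static boolean rejected(String text) {
        try {
            Long.parseLong(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // 34 digits: far outside the 64-bit range, so parsing must fail.
        System.out.println(rejected("1000000000000000000000000000000000")); // true

        // Long.MIN_VALUE parses when the sign is part of the input...
        System.out.println(Long.parseLong("-9223372036854775808") == Long.MIN_VALUE); // true

        // ...but its magnitude alone does not fit in a long,
        System.out.println(rejected("9223372036854775808")); // true

        // and negating MIN_VALUE silently wraps back to MIN_VALUE.
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE); // true
    }
}
```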

Personally I would prefer the method to be called stringOfRadix or something like that, format is a little general.

Tom Bentley
Collaborator

BTW, I have an unpushed ceylon.format knocking around on my hard drive which includes some similar things, should you be interested in helping finish it off.

Tako Schotanus
Collaborator

Personally I would prefer the method to be called stringOfRadix

And I would personally prefer format ;)

stringOfRadix I could imagine for some toplevel method, but if it's on Integer I'd say it's obvious enough what format would do and what kind of options/arguments you could expect. Again IMHO.

Tom Bentley
Collaborator

Well, we might one day want to have a Formattable interface with a format() method (indeed, I dimly recall the language module once having such a thing, probably intended to be used with String interpolation). And people might expect a method called format to be more configurable than merely handling different bases, such as thousands separators, and digit symbols (which is what ceylon.format is for). To me string is a low level kind of thing, and a variant which uses a different base is still low level, so makes sense to have a related name. But opinions clearly differ on this one.

Tako Schotanus
Collaborator

Well, we could at least go for something slightly less verbose like radixString(). The modifier "of" doesn't seem right; it should really be something like With or Using. But in both cases we don't adhere to the naming rules where methods should have a verb, meaning we'd get convertToStringWithRadix() or stringifyUsingRadix(). Brrrr

Loic Rouchon
Collaborator

@tombentley I would be more inclined toward a format method taking a radix parameter, but I understand your point and I'm not against finding a more suitable name. None of the names proposed in this issue has convinced me, though.

To go back to the parseInteger, there is common processing that we have to maintain both in JS and Java implementations: _ in Integer literals, factor suffix like 10k, + prefix, hex / binary notations

Concerning the MIN_VALUE part, I kept the logic of the Java implementation which is to parse the Integer as a negative one and then multiply it by -1 if it's a positive one, this covers the MIN_VALUE part.

My point is that instead of having all the parsing logic in both the Java and JS implementations, I would rather have a common Ceylon implementation with just native attributes for MIN_VALUE and MAX_VALUE (AFAIK those are the only backend-related things we need).
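The "parse as a negative number, then flip the sign" approach described above can be sketched as follows. This is an illustrative Java version under stated assumptions, not the code from the referenced commits: the class name NegativeAccumulationParser is hypothetical, and underscores, factor suffixes such as 10k, and hex/binary prefixes are deliberately left out. The only platform-specific inputs are the two limits, which matches the idea of keeping just the minimum and maximum values native.

```java
class NegativeAccumulationParser {
    static long parse(String text, int radix) {
        if (radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            throw new NumberFormatException("unsupported radix: " + radix);
        }
        boolean negative = text.startsWith("-");
        int start = (negative || text.startsWith("+")) ? 1 : 0;
        if (start == text.length()) {
            throw new NumberFormatException("no digits in \"" + text + "\"");
        }
        // Most negative value the running total may reach.
        long limit = negative ? Long.MIN_VALUE : -Long.MAX_VALUE;
        long result = 0; // always <= 0 while accumulating
        for (int i = start; i < text.length(); i++) {
            int digit = Character.digit(text.charAt(i), radix);
            if (digit < 0) {
                throw new NumberFormatException("bad digit in \"" + text + "\"");
            }
            // Require result * radix - digit >= limit, checked without overflow.
            // Java division truncates toward zero, which is the ceiling here
            // because the numerator is never positive.
            if (result < (limit + digit) / radix) {
                throw new NumberFormatException("out of range: \"" + text + "\"");
            }
            result = result * radix - digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parse("-9223372036854775808", 10) == Long.MIN_VALUE); // true
        System.out.println(parse("+ff", 16)); // 255
    }
}
```

Accumulating on the negative side works because every non-negative long has a negative counterpart, while the reverse does not hold for Long.MIN_VALUE.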

Tom Bentley
Collaborator

To go back to the parseInteger, there is common processing that we have to maintain both in JS and Java
implementations: _ in Integer literals, factor suffix like 10k, + prefix, hex / binary notations

Ah yes, I admit I'd forgotten about those

Concerning the MIN_VALUE part, I kept the logic of the Java implementation which is to parse the Integer as a
negative one and then multiply it by -1 if it's a positive one, this covers the MIN_VALUE part.

Well, at the moment those limits are not exposed in the language module at all, nor directly in ceylon.math. Part of the problem is knowing a good way to expose them. What's the maximum Integer in JS (which lacks an Integer type and makes do with floating point, as you probably know)? Is it the largest number we can represent? The largest number we can distinguish from its successor? Or something else? Since it's a floating point type, it has a signed zero, and doesn't have the asymmetry of extremal values that Java's 2s complement integer types have.

It might be possible to implement something robust in the face of these platform differences, but I don't see it being at all simple. I'd love to be proved wrong though :-)

Loic Rouchon
Collaborator

It's not unusual to see algorithms based on MAX/MIN value of the Integer/Long range in Java, so I think that it would make sense to have that information in ceylon.language

For the JS part, as per ECMA specification (http://ecma262-5.com/ELS5_HTML.htm#Section_8.5) we could have for Integer MIN_VALUE = -2^53 and MAX_VALUE = 2^53-3 (2^53−2 seems to be NaN)

I have to check the spec again for clarification of other special cases (for example, there are +0 and -0) to ensure we choose the right upper/lower bounds.
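One way to probe these bounds without a JS engine: Java's double is the same IEEE 754 binary64 format that ECMAScript numbers use, so the point where consecutive integers stop being distinguishable can be checked on the JVM. As far as the cited section reads, it states that integers with magnitude up to 2^53 are representable, and the 2^53−2 figure there counts distinct NaN bit patterns rather than an integer bound. The sketch below (class name DoubleIntegerLimits is hypothetical) only demonstrates the floating-point behaviour; it does not settle which bound Ceylon should expose.

```java
class DoubleIntegerLimits {
    static final long TWO_53 = 1L << 53; // 9007199254740992

    // True when the long survives a round trip through double unchanged.
    // Only meaningful well below 2^63, since the cast back to long saturates
    // near Long.MAX_VALUE.
    static boolean exactlyRepresentable(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        System.out.println(exactlyRepresentable(TWO_53 - 1)); // true
        System.out.println(exactlyRepresentable(TWO_53));     // true
        System.out.println(exactlyRepresentable(TWO_53 + 1)); // false
        // 2^53 and 2^53 + 1 collapse to the same double value.
        System.out.println((double) TWO_53 == (double) (TWO_53 + 1)); // true
    }
}
```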

Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 ceylon implementation of parseInteger (with native java implemen…
…tation for minIntegerValue and maxIntegerValue)
a0aee74
Tom Bentley
Collaborator

M6

Loic Rouchon
Collaborator

I don't have that much time these days, but that's not the only reason why I haven't progressed much on the implementation.

I'm having trouble implementing the native attributes minIntegerValue and maxIntegerValue for the JavaScript backend.

I didn't get any answers to that on the mailing list (https://groups.google.com/d/msg/ceylon-dev/RkMc13I1MCg/fnYE5HisXsQJ); maybe someone here will know how to do so.

Loic Rouchon
Collaborator

According to the specification (section 2.4.1. Numeric literals), decimal numbers may be separated by _ using groups of 3 digits.
For binary it's groups of 4 digits, and for hexadecimal it's 4 or 2.

My question is what behavior we should have when parsing integers in a base other than 2, 10 or 16.
For example, for 8 or others?

For now, I can see 4 possible solutions:

  1. support only specific bases (2, 8, 10, 16, ???) in parseInteger
    Java supports parsing from base 2 to 36 so I would like to be compatible with that

  2. choose a behavior for every other base (this is likely to add fairly heavy logic)
    For each base we can have:
    2.1 no groups
    2.2 a fixed group size
    2.3 all group sizes supported

  3. relax the spec so that it does not specify the group size
    (I don't like that one because it would allow really weird grouping in input for bases with well-known conventions, e.g. fffff_fffff)

  4. remove all restrictions on grouping in the parseInteger function but keep the restriction in the language's numeric literals.
    This would allow less logic in parseInteger (we would just ignore underscores), but it would introduce a discrepancy (do we have to care about it?) between the numeric literals and the string literals parsed by parseInteger

I think I would prefer proposition 4, but I'm not 100% sure.

WDYT?
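Proposition 4 (ignore underscores entirely, for any radix Java supports) could look roughly like this in Java. It is a sketch under that assumption, with a hypothetical class name LenientGroupingParser; it shows how little logic is needed, and also how permissive the result is.

```java
class LenientGroupingParser {
    // Drop every underscore, then delegate to the JDK parser (radix 2 to 36).
    static long parse(String text, int radix) {
        return Long.parseLong(text.replace("_", ""), radix);
    }

    public static void main(String[] args) {
        System.out.println(parse("1_000", 10));        // 1000
        System.out.println(parse("fffff_fffff", 16));  // 1099511627775
        // Accepted as well, since underscores are simply discarded:
        System.out.println(parse("_1__0_", 10));       // 10
    }
}
```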

Gavin King
Owner
Enrique Zamudio chochos referenced this issue from a commit
Enrique Zamudio chochos Bit of work for #196 56fa1c4
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 ceylon implementation of parseInteger (with native java implemen…
…tation for minIntegerValue and maxIntegerValue)
2544094
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 Migration to @Ceylon(major=5) b5abe5f
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 define min and max radix 26de611
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 test for binary/hex parseInteger 63d7f10
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 native js implementation of min/max Integer e1d6717
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 parseInteger, handle digits grouping 69797f3
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 parseInteger grouping and min/max value tests 074a2fd
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 format integer f87a822
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 format integer tests d857d02
Stéphane Épardaud
Owner

Why not allow grouping with no fixed size for groups? So any non-consecutive _ would be allowed for all radixes:

  • 0_0_0_1
  • 00_01
  • 12_34_56
  • 123_456
  • 123_456_78
  • FF_EAB_BABE
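The rule suggested here (groups of any size, but no two underscores in a row) can be sketched in Java as below. The class name FreeGroupingParser is hypothetical, and rejecting a leading or trailing underscore is an assumption of this sketch, since the comment does not say how those should be treated.

```java
class FreeGroupingParser {
    // Optional sign, then one or more alphanumeric groups separated by single underscores.
    private static final String SHAPE = "[+-]?[0-9A-Za-z]+(_[0-9A-Za-z]+)*";

    static long parse(String text, int radix) {
        if (!text.matches(SHAPE)) {
            throw new NumberFormatException("bad grouping in \"" + text + "\"");
        }
        // Digit validity for the radix is left to the JDK parser.
        return Long.parseLong(text.replace("_", ""), radix);
    }

    public static void main(String[] args) {
        System.out.println(parse("0_0_0_1", 10));    // 1
        System.out.println(parse("123_456_78", 10)); // 12345678
        System.out.println(parse("FF_EAB_BABE", 16) == 0xFFEABBABEL); // true
    }
}
```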
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 avoid to shift digits to add '-' in formatInteger 58cbb1d
Loic Rouchon loicrouchon referenced this issue from a commit in loicrouchon/ceylon.language
Loic Rouchon loicrouchon #196 add ceylondoc for min/max Integer value 2ab5532
Tom Bentley tombentley referenced this issue from a commit
Tom Bentley tombentley Merge git@github.com:loicrouchon/ceylon.language.git parse-hex-bin-in…
…teger

for #196, fixing a conflict in runtime-js/language-module.txt
b81f2c0
Gavin King gavinking closed this
Gavin King
Owner

This seems to be solved, right?

Thanks, @loicrouchon!! Appreciated.

Loic Rouchon
Collaborator

Well, the implementation is done, but @FroMage asked a question about how we decide to support digit grouping.

Gavin King
Owner

Well, digit grouping seems to be working fine to me...

Tako Schotanus quintesse referenced this issue from a commit in quintesse/obr-merge-test
Enrique Zamudio chochos more progress on ceylon/ceylon-js ceylon/ceylon.language#196 - almost…
… there.
77dc89e