functions to format and parse binary and hex strings #196

Closed
gavinking opened this Issue Feb 5, 2013 · 17 comments

Owner

gavinking commented Feb 5, 2013

This is really two features:

  1. parseInteger() currently only accepts decimal string representations
  2. Integer.string always produces decimal string representations

We need to also be able to handle hexadecimal and binary representations.
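
For comparison, here is a minimal Java sketch of the two features being requested, using the radix-aware methods on java.lang.Long. The class and method names (RadixRoundTrip, parse, format) are made up for illustration and are not the Ceylon API that was eventually adopted.

```java
// Illustrative only: thin wrappers around the JDK's radix-aware methods.
final class RadixRoundTrip {
    // Parse text written in the given radix (the JDK accepts 2..36).
    static long parse(String text, int radix) {
        return Long.parseLong(text, radix);
    }

    // Format a value in the given radix, lowercase digits, "-" for negatives.
    static String format(long value, int radix) {
        return Long.toString(value, radix);
    }

    public static void main(String[] args) {
        System.out.println(parse("ff", 16));    // 255
        System.out.println(parse("1010", 2));   // 10
        System.out.println(format(255L, 16));   // ff
        System.out.println(format(10L, 2));     // 1010
        System.out.println(format(-255L, 16));  // -ff
    }
}
```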

Arguably this functionality would be better off in ceylon.math, but since hex/bin literals are now part of the language, I guess it makes more sense here.

Member

loicrouchon commented Feb 25, 2013

I've started working on this, and I have a draft Ceylon implementation of parseInteger(String string, Integer radix = 10) that seems to work.

But I don't really understand why this function was native.
Is that for performance reasons (I don't think so)?
Is that to handle platform-dependent stuff like Long.MIN_VALUE and Long.MAX_VALUE?

Concerning the format part, is what you want a shared String format(Integer radix = 10) method on the Integer class?

Member

tombentley commented Feb 25, 2013

@loicrouchon you're right that you need a native implementation to worry about things like:

parseInteger("1000000000000000000000000000000000")

And extra care needs to be taken with Long.MIN_VALUE (its magnitude is bigger than MAX_VALUE's, so you cannot simply negate the integer you see after the - sign). So to be honest, I would keep the native, and just delegate to appropriate Java/JS APIs.
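
Both pitfalls can be seen on the JVM with a small sketch. The helper name fitsInLong is hypothetical; it only wraps Long.parseLong to show which inputs are rejected.

```java
// Illustrative only: shows out-of-range input and the MIN_VALUE asymmetry.
final class LongLimitsDemo {
    // True if the decimal text fits in a 64-bit signed integer.
    static boolean fitsInLong(String text) {
        try {
            Long.parseLong(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Far too many digits for 64 bits.
        System.out.println(fitsInLong("1000000000000000000000000000000000")); // false
        // MIN_VALUE fits, but its magnitude alone does not.
        System.out.println(fitsInLong("-9223372036854775808")); // true
        System.out.println(fitsInLong("9223372036854775808"));  // false
        // Negating MIN_VALUE silently wraps around to MIN_VALUE again.
        System.out.println(-Long.MIN_VALUE == Long.MIN_VALUE);  // true
    }
}
```

So a parser that first reads the magnitude as a positive number and negates it afterwards cannot produce Long.MIN_VALUE without overflowing along the way.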

Personally I would prefer the method to be called stringOfRadix or something like that, format is a little general.

Member

tombentley commented Feb 25, 2013

BTW, I have an unpushed ceylon.format knocking around on my hard drive which includes some similar things, should you be interested in helping finish it off.

Owner

quintesse commented Feb 25, 2013

Personally I would prefer the method to be called stringOfRadix

And I would personally prefer format ;)

stringOfRadix I could imagine for some toplevel method, but if it's on Integer I'd say it's obvious enough what format would do and what kind of options/arguments you could expect. Again IMHO.

Member

tombentley commented Feb 25, 2013

Well, we might one day want to have a Formattable interface with a format() method (indeed, I dimly recall the language module once having such a thing, probably intended to be used with String interpolation). And people might expect a method called format to be more configurable than merely handling different bases, such as thousands separators, and digit symbols (which is what ceylon.format is for). To me string is a low level kind of thing, and a variant which uses a different base is still low level, so makes sense to have a related name. But opinions clearly differ on this one.

Owner

quintesse commented Feb 25, 2013

Well we could at least go for something slightly less verbose like radixString(), because the modifier of doesn't seem right, because it should really be something like With or Using. But in both cases we don't adhere to the naming rules where methods should have a verb meaning we'd get: convertToStringWithRadix() or stringifyUsingRadix(). Brrrr

Member

loicrouchon commented Feb 25, 2013

@tombentley I would be more inclined towards a format method taking a radix parameter, but I understand your point and I'm not against finding a more suitable name. However, none of the names proposed in this issue has convinced me.

To go back to the parseInteger, there is common processing that we have to maintain both in JS and Java implementations: _ in Integer literals, factor suffix like 10k, + prefix, hex / binary notations

Concerning the MIN_VALUE part, I kept the logic of the Java implementation which is to parse the Integer as a negative one and then multiply it by -1 if it's a positive one, this covers the MIN_VALUE part.
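
A rough Java sketch of that "accumulate as a negative number" approach follows. It is not the code from the commits referenced in this issue: the class name, the null-on-failure convention, and the omission of _ grouping and factor suffixes are all simplifications made for illustration.

```java
// Illustrative only: parse by accumulating a negative value, then flip the sign.
final class NegativeAccumulationParser {
    // Returns the parsed value, or null if the text is invalid or out of range.
    static Long parse(String text, int radix) {
        if (text == null || text.isEmpty()
                || radix < Character.MIN_RADIX || radix > Character.MAX_RADIX) {
            return null;
        }
        int i = 0;
        boolean negative = false;
        char first = text.charAt(0);
        if (first == '-' || first == '+') {
            if (text.length() == 1) return null; // a sign alone is not a number
            negative = (first == '-');
            i = 1;
        }
        // The most negative value the accumulator is allowed to reach.
        long limit = negative ? Long.MIN_VALUE : -Long.MAX_VALUE;
        long result = 0; // always <= 0 while parsing
        for (; i < text.length(); i++) {
            int digit = Character.digit(text.charAt(i), radix);
            if (digit < 0) return null;               // not a digit in this radix
            if (result < limit / radix) return null;  // result * radix would pass the limit
            result *= radix;
            if (result < limit + digit) return null;  // result - digit would pass the limit
            result -= digit;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        System.out.println(parse("-9223372036854775808", 10)); // -9223372036854775808
        System.out.println(parse("9223372036854775808", 10));  // null
        System.out.println(parse("ff", 16));                   // 255
    }
}
```

Because the negative range is one larger than the positive range, every valid magnitude fits in the accumulator, and the final negation for positive input can never overflow.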

My point is that instead of having all the parsing logic in both the Java and JS implementations, I would rather have a common Ceylon implementation with just native attributes for MIN_VALUE and MAX_VALUE (AFAIK those are the only backend-related things we need).

Member

tombentley commented Feb 25, 2013

To go back to the parseInteger, there is common processing that we have to maintain both in JS and Java
implementations: _ in Integer literals, factor suffix like 10k, + prefix, hex / binary notations

Ah yes, I admit I'd forgotten about those

Concerning the MIN_VALUE part, I kept the logic of the Java implementation which is to parse the Integer as a
negative one and then multiply it by -1 if it's a positive one, this covers the MIN_VALUE part.

Well, at the moment those limits are not exposed in the language module at all, nor directly in ceylon.math. Part of the problem is knowing a good way to expose them. What's the maximum Integer in JS (which lacks an Integer type and makes do with floating point, as you probably know)? Is it the largest number we can represent? The largest number we can distinguish from its successor? Or something else? Since it's a floating point type, it has a signed zero, and doesn't have the asymmetry of extremal values that Java's 2s complement integer types have.
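
The candidate definitions can be compared concretely. JS numbers are IEEE 754 doubles, the same format as Java's double, so the sketch below uses Java to illustrate the behaviour; the helper name distinguishableFromSuccessor is made up for this example.

```java
// Illustrative only: where double stops behaving like an integer type.
final class DoubleIntegerLimits {
    // True if adding one to n yields a different double.
    static boolean distinguishableFromSuccessor(double n) {
        return n + 1.0 != n;
    }

    public static void main(String[] args) {
        double twoPow53 = (double) (1L << 53); // 9007199254740992
        System.out.println(distinguishableFromSuccessor(twoPow53 - 1.0)); // true
        System.out.println(distinguishableFromSuccessor(twoPow53));       // false
        // The largest representable finite number is vastly bigger than 2^53.
        System.out.println(Double.MAX_VALUE > twoPow53);                  // true
        // Signed zero: the two zeros compare equal but divide differently.
        System.out.println(0.0 == -0.0);                                  // true
        System.out.println(1.0 / -0.0);                                   // -Infinity
    }
}
```

Under the "distinguishable from its successor" definition the range is symmetric around zero, unlike a 2s complement integer.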

It might be possible to implement something robust in the face of these platform differences, but I don't see it being at all simple. I'd love to be proved wrong though :-)

Member

loicrouchon commented Feb 26, 2013

It's not unusual to see algorithms based on MAX/MIN value of the Integer/Long range in Java, so I think that it would make sense to have that information in ceylon.language

For the JS part, as per ECMA specification (http://ecma262-5.com/ELS5_HTML.htm#Section_8.5) we could have for Integer MIN_VALUE = -2^53 and MAX_VALUE = 2^53-3 (2^53−2 seems to be NaN)

I have to check the spec again for clarifications of other special cases (for example, there is +0 and -0) to ensure we choose the right upper/lower bounds.

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Mar 3, 2013

#196 ceylon implementation of parseInteger (with native java implementation for minIntegerValue and maxIntegerValue)
Member

tombentley commented Mar 6, 2013

M6

Member

loicrouchon commented Mar 14, 2013

I don't have that much time these days, but that's not the only reason why I haven't progressed much on the implementation.

I'm having trouble implementing the native attributes minIntegerValue and maxIntegerValue for the JavaScript backend.

I didn't get any answers to that on the mailing list (https://groups.google.com/d/msg/ceylon-dev/RkMc13I1MCg/fnYE5HisXsQJ), so maybe someone here will know how to do it.

Member

loicrouchon commented Mar 30, 2013

According to the specification (section 2.4.1. Numeric literals), decimal numbers may be separated by _ using groups of 3 digits.
For binary it's groups of 4 digits, and for hexadecimal it's groups of 4 or 2.

My question is: what behavior should we have when parsing integers in a base other than 2, 10 or 16?
For example, 8 or any other base?

For now, I can see 4 possible solutions:

  1. support only specific bases (2, 8, 10, 16, ???) in parseInteger
    Java supports parsing from base 2 to 36 so I would like to be compatible with that
  2. choose a behavior for every other base (this is likely to add rather heavy logic)
    For each base we can have:
    2.1 no groups
    2.2 a fixed group size
    2.3 all group sizes supported
  3. relax the spec so that it does not specify the group size
    (I don't like that one because it would allow really weird grouping in input for bases with well-known conventions, ex: fffff_fffff)
  4. remove all restrictions on grouping in the parseInteger function but keep the restriction in the language's number literals.
    This would mean less logic in parseInteger (we would just ignore underscores), but it would introduce a discrepancy (do we have to care about it?) between the numeric literals and the string literals parsed by parseInteger

I think I would prefer proposition 4, but I'm not 100% sure.

WDYT?

Owner

gavinking commented Mar 30, 2013

I would say have parseInteger() only support _ grouping for bases 2, 10, 16.
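
A hedged Java sketch of that rule: underscores are accepted only for radix 2, 10 and 16, with the group sizes quoted from the spec earlier in the thread (4 for binary, 3 for decimal, 4 or 2 for hexadecimal). The class and method names are hypothetical, and allowing a shorter leading group is an assumption of this sketch, not something stated in the thread.

```java
// Illustrative only: strict _ grouping for radix 2, 10 and 16.
final class StrictGrouping {
    // Returns the digits with underscores removed, or null if the grouping is not allowed.
    static String stripGroups(String digits, int radix) {
        if (digits.indexOf('_') < 0) return digits; // nothing to validate
        int[] allowedSizes;
        switch (radix) {
            case 2:  allowedSizes = new int[] {4};    break;
            case 10: allowedSizes = new int[] {3};    break;
            case 16: allowedSizes = new int[] {4, 2}; break;
            default: return null; // grouping unsupported for other bases
        }
        String[] groups = digits.split("_", -1); // -1 keeps empty trailing groups
        for (int size : allowedSizes) {
            if (matches(groups, size)) return String.join("", groups);
        }
        return null;
    }

    // Assumption: the first group may be shorter, all later groups must be exactly size long.
    private static boolean matches(String[] groups, int size) {
        if (groups[0].isEmpty() || groups[0].length() > size) return false;
        for (int i = 1; i < groups.length; i++) {
            if (groups[i].length() != size) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(stripGroups("1_000_000", 10));   // 1000000
        System.out.println(stripGroups("ff_ff", 16));       // ffff
        System.out.println(stripGroups("fffff_fffff", 16)); // null
        System.out.println(stripGroups("17_7", 8));         // null
    }
}
```

The stripped string can then be handed to a radix-aware parser such as Long.parseLong(stripped, radix), which takes care of digit validation.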


chochos added a commit that referenced this issue Apr 25, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

#196 ceylon implementation of parseInteger (with native java implementation for minIntegerValue and maxIntegerValue)

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 7, 2013

Owner

FroMage commented Jul 8, 2013

Why not allow grouping with no fixed size for groups? So any non-consecutive _ would be allowed for all radix:

  • 0_0_0_1
  • 00_01
  • 12_34_56
  • 123_456
  • 123_456_78
  • FF_EAB_BABE
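
As a sketch of this relaxed rule in Java: any single _ between digits is accepted, whatever the radix or group size. The names are hypothetical, and rejecting a leading or trailing _ is an assumption of this sketch.

```java
// Illustrative only: accept any non-consecutive underscores between digits.
final class RelaxedGrouping {
    // Returns the digits with underscores removed, or null if an underscore is misplaced.
    static String stripUnderscores(String digits) {
        if (digits.isEmpty()
                || digits.startsWith("_")   // assumption: no leading underscore
                || digits.endsWith("_")     // assumption: no trailing underscore
                || digits.contains("__")) { // consecutive underscores are rejected
            return null;
        }
        return digits.replace("_", "");
    }

    public static void main(String[] args) {
        System.out.println(stripUnderscores("0_0_0_1"));     // 0001
        System.out.println(stripUnderscores("123_456_78"));  // 12345678
        System.out.println(stripUnderscores("FF_EAB_BABE")); // FFEABBABE
        System.out.println(stripUnderscores("1__2"));        // null
    }
}
```

This keeps the parser small, at the cost of accepting groupings that the numeric literal syntax would reject.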

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 8, 2013

loicrouchon added a commit to loicrouchon/ceylon.language that referenced this issue Jul 8, 2013

tombentley added a commit that referenced this issue Jul 11, 2013

Merge git@github.com:loicrouchon/ceylon.language.git parse-hex-bin-integer

for #196, fixing a conflict in runtime-js/language-module.txt
Owner

gavinking commented Jul 31, 2013

This seems to be solved, right?

Thanks, @loicrouchon!! Appreciated.

@gavinking gavinking closed this Jul 31, 2013

@ghost ghost assigned loicrouchon Jul 31, 2013

Member

loicrouchon commented Jul 31, 2013

Well, the implementation is done, but @FroMage asked a question about how we decide to support digit grouping.

Owner

gavinking commented Jul 31, 2013

Well, digit grouping seems to be working fine to me...
