Add hexadecimal, octal and binary literals #382

Closed
FroMage opened this issue Aug 16, 2012 · 85 comments

Comments

@FroMage
Member

FroMage commented Aug 16, 2012

We can live without them, but damn that makes the code look silly. Most specs that deal with binary are defined in terms of either hexa, octal or binary numbers, which means that the code we write in Ceylon would need to have those numbers translated into decimal (first possible source of programmer error), and put the original number (hexa, octal or binary) as comments (which looks clumsy).

Case in point: UTF-8 decoding:

// 0b1100 0000 <= byte < 0b1110 0000
if(byte >= 192 && byte < 224){
}
// byte & 0x0F
Integer part1 = bytes.get() - 240;

Now, the syntax in Java is 0x prefix for hexa, 0 prefix for octal and 0b prefix for binary. I find the 0 prefix error-prone (if traditional) so I suggest we go for 0o (little "o" for Octal). I guess I'd also be fine with \x \o and \b.
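For comparison, here is a minimal sketch of the same UTF-8 checks in Python, which happens to use the 0x, 0o and 0b prefixes suggested above. It is not Ceylon, and the function names are made up for illustration; the point is only that the bit patterns from the spec can appear in the code directly instead of living in comments.

```python
# Sketch only: Python stands in for Ceylon here, and both helper names
# are hypothetical.

def is_two_byte_lead(byte: int) -> bool:
    """True if `byte` starts a 2-byte UTF-8 sequence (bit pattern 110xxxxx)."""
    return 0b1100_0000 <= byte < 0b1110_0000


def low_nibble(byte: int) -> int:
    """Keep only the four low bits of `byte`."""
    return byte & 0x0F


# The three prefixes under discussion, all spelling the same value.
assert 0xC0 == 0o300 == 0b1100_0000 == 192
```

With literals like these the decimal constants 192, 224 and 240 never have to be computed by hand.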

The other question for those literals is that of postfix quantifiers and exponents. I just don't think they apply for those numbers, but hey, open to suggestions.

WDYT?

@gavinking
Member

I always assumed that this was a job for single-quoted literals.

@FroMage
Member Author

FroMage commented Aug 16, 2012

That's a possibility, but we don't know when/if those will be supported, and it looks weird that we go through all the trouble of supporting numeric literal postfixes and exponents and separators while overlooking a pretty basic and traditional feature such as this one.

I don't think '0x23f' is more readable than 0x23f either, so we're not winning by relying on single-quoted literals. What would be a good reason to not support them and rely on single-quoted literals?

@gavinking
Member

To be honest I have no clue wtf 0x has to do with base 16 numerals or how it came to be adopted by so many programming languages. I don't think that is a tradition we should continue.

@tombentley
Member

Could we special-case this particular use of single quoted literals in the typechecker, just for use with the following three functions (which would be defined in the language module of course):

Integer i1 = hex('ffffff');
Integer i2 = octal('23634');
Integer i3 = binary('1111');

(Obviously something somewhere optimizes them to normal literals).
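As a rough illustration of what those three functions would compute, here is a Python sketch. It runs at call time rather than being folded into a literal by a compiler, and `hex_` carries a trailing underscore only because `hex` is already a Python builtin; none of this is the proposed Ceylon API.

```python
# Sketch only: runtime equivalents of the proposed hex(), octal() and
# binary() functions. The names mirror the comment above.

def hex_(digits: str) -> int:
    """Interpret `digits` as a base-16 numeral."""
    return int(digits, 16)


def octal(digits: str) -> int:
    """Interpret `digits` as a base-8 numeral."""
    return int(digits, 8)


def binary(digits: str) -> int:
    """Interpret `digits` as a base-2 numeral."""
    return int(digits, 2)


i1 = hex_("ffffff")
i2 = octal("23634")
i3 = binary("1111")
```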

@gavinking
Member

What about a literal format something like:

Integer i1 = 2AFF\F;
Integer i2 = 2363\7;
Integer i3 = 101011\1;

of course, \9 would be the implied default.
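To make the rule concrete: the character after the backslash is the largest digit of the base, so the base is that digit's value plus one. A hedged Python sketch of a reader for this notation follows; `parse_suffixed` is a hypothetical name, and the 0-9A-Z digit alphabet is an assumption that the proposal does not spell out.

```python
import string

# Assumed digit alphabet: 0-9 followed by A-Z (values 0..35).
DIGITS = string.digits + string.ascii_uppercase


def parse_suffixed(literal: str) -> int:
    """Parse 'DIGITS\\TOP', where TOP is the largest digit of the base."""
    digits, sep, top = literal.partition("\\")
    if not sep:
        top = "9"  # no suffix: the implied default is \9, i.e. base 10
    if len(top) != 1 or top.upper() not in DIGITS:
        raise ValueError(f"bad base marker in {literal!r}")
    base = DIGITS.index(top.upper()) + 1
    if base < 2:
        raise ValueError("a \\0 suffix would mean base 1")
    return int(digits, base)
```

Under this reading, 2AFF\F is base 16, 2363\7 is base 8 and 101011\1 is base 2.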

@tombentley
Member

That looks horrid to my eyes.

  • I doubt someone who didn't know that syntax would easily guess what it meant.
  • Having the \ in the middle of the token makes it look like an operator.
  • Also \7 doesn't really suggest octal to me so much as \8 might.

@FroMage
Member Author

FroMage commented Aug 17, 2012

To be honest I have no clue wtf 0x has to do with base 16 numerals

"x" stands for "heXadecimal". That's not really confusing.

I really don't like 2AFF\F because it looks like we're escaping the last F for some reason, and I think the base should come in front so my eyes can parse the thing as a hexa literal from the start and not have to redo the reading when I reach the end. Also it's not easy for me to associate the last digit in a base to the base itself. I mean, \9 doesn't really tell me this is base-10. It's certainly not obvious, even though it makes sense once someone explains what it could mean.

Also it's not consistent with our other uses of \ escapes which are prefixes (\i for example).

My favourites are still 0x, 0o and 0b, or \x, \o and \b.

@FroMage
Member Author

FroMage commented Aug 17, 2012

That looks horrid to my eyes.

Glad it's not just me being stuck in the past :)

@gavinking
Member

I doubt someone who didn't know that syntax would easily guess what it meant.

Hah, but someone who didn't know what 0xffff meant could easily guess?!

C'mon, admit it, the only reason that convention doesn't make you want to vomit is that you're used to it.

Having the \ in the middle of the token makes it look like an operator.

Perhaps, though \ isn't an operator in any language I know.

I mean, I don't really care what the escape/separator character is, it could be \, @, #, $, :, ::, \\, ~, whatever for all I care. It could be postfix or prefix if you prefer.

But what seems to me to be the craziest possible choice for an escape character is a leading zero. Nobody (except a computer programmer) thinks a leading zero in a numeral is significant. The 0x or 0b convention sucks the "x" and "b" into the digit string itself, making it difficult to visually parse, since our eyes aren't accustomed to treating numeric digits or letters as punctuation. To me, the difference between 0o2363 and 002363 is extremely difficult to pick. I mean, it's just not at all visually obvious what is the first numeric digit in 0xffff.

Sure, sure, we've taught ourselves how to interpret this crap, but it's still crap.

"x" stands for "heXadecimal". That's not really confusing.

OMG, it's so non-confusing that in 30 years programming I never figured this out for myself. That's perhaps the most arbitrary thing I've ever heard. Why not "h"?

@gavinking
Member

Perhaps, though \ isn't an operator in any language I know.

Excuse me, that's not true. I should have said "isn't an infix operator in any language I know".

@ikasiuk
Member

ikasiuk commented Aug 17, 2012

Excuse me, that's not true. I should have said "isn't an infix operator in any language I know".

Now you know one: BASIC! ;-)

@tombentley
Member

Hah, but someone who didn't know what 0xffff meant could easily guess?!

I'm not particularly fond of 0x (I'm not advocating it), and I agree it isn't obvious unless you've met it before. The one thing it does have in its favour though is historical precedent: Almost any programmer who sees 0xffff will instantly recognize a hex int literal (even if they don't know what 0x 'means'). Surely catering to existing programmers with a poor but widely recognized syntax makes sense if the alternative is a new equally poor syntax that caters to pretty much no one.

What I would like is something like hex('ffffff');.

@FroMage
Member Author

FroMage commented Aug 17, 2012

OMG, it's so non-confusing that in 30 years programming I never figured this out for myself. That's perhaps the most arbitrary thing I've ever heard.

Frankly the number of things you never figured out in 30 years of programming is the puzzling and most arbitrary thing ;)

Why not "h"?

My guess is that the 'h' is silent?

The x may be arbitrary but it is the strong sound in hexa. It's also a custom that is absolutely trans-language.

http://en.wikipedia.org/wiki/Hexadecimal#Written_representation lists a lot of representations. Most use x, though some like assembly use h. Surprisingly PostScript and Bash have an interesting solution: 16#ffee and by extension 8#644 and 2#1100, but I don't find that more intuitive or readable than the 0x or \x prefixes.
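For readers who have not met the radix#digits form, here is a small Python sketch of how such a literal can be read. `parse_radix_literal` is a made-up name, and the 2 to 36 range is simply the limit of Python's `int()`, not a claim about what any of the languages mentioned accept.

```python
def parse_radix_literal(literal: str) -> int:
    """Parse 'BASE#DIGITS' with BASE written in decimal, e.g. '16#ffee'."""
    base_text, sep, digits = literal.partition("#")
    if not sep:
        # No '#': treat the token as an ordinary decimal numeral.
        return int(literal, 10)
    base = int(base_text, 10)
    if not 2 <= base <= 36:
        raise ValueError(f"unsupported base {base}")
    return int(digits, base)
```

For example, `parse_radix_literal("8#644")` yields the same value as the octal file mode 644.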

OK, so you don't like the leading 0, I can understand that. So \xff \o664 and \b1100 would work for you? That would be in line with the \u string escape sequence too.

@chochos
Member

chochos commented Aug 17, 2012

I like the idea of stating an arbitrary base, but the 12345\5 does look ugly. In any case I prefer the PostScript 5#12345, and also stating the base, not the max symbol used (so 10#12345, not 9#12345)

@gavinking
Member

Now you know one: BASIC! ;-)

Haha, is it possible I never used integer division in my years of BASIC programming? :-)

My guess is that the 'h' is silent?

Hahahaha, true—in France at least :-)

So then this syntax is totally natural and intuitive to native speakers of French. Is that an argument for or against?

Surprisingly PostScript and Bash have an interesting solution: 16#ffee and by extension 8#644 and 2#1100.

This to me makes far more sense and is essentially similar to my proposal, except I think F is a better way to indicate base 16 than 16 because:

  • well, one character instead of two, and
  • it's a little perverse to indicate base 16 using a base-10 number.

The second criticism also applies to 2#1100 and 8#644.

So \xff \o664 and \b1100 would work for you? That would be in line with the \u string escape sequence too.

But the \x and the ff still run together without separating punctuation.

FTR, I have always found the CSS convention of #AAF0F0 far easier on the eyes than 0xAAF0F0. It's a pity that we also want binary literals, or I would just say go with the leading #.

@tombentley
Member

My problem with something like 8#644 is that I find it quite easy to gloss over the # in among all those digits, and so miss the fact that it's not a decimal integer.

Dare I suggest another alternative: x#644, o#644, b#101010?

@gavinking
Member

My problem with something like 8#644 is that I find it quite easy to gloss over the # in among all those digits.

Well, perhaps, but it's surely harder to gloss over an infix # than over a prefix 0, right?

And, FTR, I don't see any glossing over the F in F#110066, nor do I think it's possible to mistake 1#1010110011 for anything other than a binary literal. I agree that an octal literal is a little harder to recognize.

Dare I suggest another alternative: x#644, o#644, b#101010?

While it's clear that this works well enough for the cases we've identified, and might feel a little more familiar for many programmers, it wouldn't let people express numbers in any other base. Now, of course I recognize that bases other than 10, 2, and 16 almost never occur, but it still feels strange to choose something deliberately limited to four discrete cases when there are a couple of at least arguably-as-good options that cover any natural number.

@FroMage
Member Author

FroMage commented Aug 18, 2012

[Warning: this comment was written with humour in mind, read it as such]

Guys, the insanity has to stop right there.

Gavin should just be banned from having a say in this matter. After all, he admitted to being a complete stranger to hexadecimal literals and he's likely never used one in his life.

Certain domains come with jargon: special language used to describe special things. This is why there are a hundred ways to call a rubber tube, depending on its size, use and what flows inside. People familiar with a domain are familiar with jargon.

People familiar with binary shit (streams, bytes, bits, bitsets, bitmasks, bitwise operations, CPUs, registers, character sets, encodings, protocols, network and all that shit) know precisely what the prefixes 0x, 0 and 0b stand for. They know it. If you don't know that, you don't need to know. Seriously.

Those people would probably understand what x# (though that's backwards to HTML/XML where it's #x) or \x mean, but they would start trying Ceylon by typing 0xff and see an error and go blog that we're so lame we don't support hexadecimal literals. They would not (willing to bet here) even try something else, as that's just so common a convention that that's how we expect it to be. Same as if we renamed + to an add keyword/operator they would never find it in a million years.

Now, Gavin, never having used binary shit in his life, decided to reallocate the bitwise operators to sets. I find this personally insane, but I figured that if nobody else found it insane that must be me. Besides, I don't really mind calling methods on Integer that deal with bitwise operations, so I figure I'll let him do that and we'll see what comes out. Never in my life have I ever wanted to do things like xor or a negation on a Set (what does ~set even mean?), but hey, I figure Gavin must have had this need so often that he thinks it's more frequent than the same operators on bits. Whatever, I am ready to wait and see.

But F# or anything else that Gavin suggested here? That's bananas. That's crazy bat shit. That's square wheel. That's what would have happened to the world if Picasso had been an engineer. Nobody in their right mind would find it by themselves, and presented with it, I would have said that's a bloody musical literal, whatever that is.

Not only that, but are we seriously going to support base-52 numbers? How the hell do you write one after the initial 52#?

it's a little perverse to indicate base 16 using a base-10 number

I hope this is a joke, right? Because how would you even guess the base if it's spelled inside its own base? What would a# mean? Even in math bases are expressed in decimal.

Hell, I'm pretty sure that in math people never use non-decimal numbers. Whereas people dealing with binary shit? All the fucking time. Every day. Seriously!

So if you go with something that isn't 0x we will have to parse it correctly and provide a meaningful error so that people can, well, not guess, but find it after trial and error. And a similar warning (or even error to be unambiguous) for 0 prefixes. Seriously. If we don't, people are never going to guess what our syntax is. And we want Ceylon to be familiar and easy, right?

Now, I already said I found the 0 prefix for octal error-prone, and besides octal numbers I've never seen in the wild outside of UNIX File modes, but even then I wouldn't rule it out as less frequent than hexadecimal literals out of hand, even though in our case we can definitely abstract those so that Ceylon users never have to deal with octal numbers. So for that reason I'm ready to break the convention and fix it so that there's a less error-prone way to deal with octal literals, such as 0o or \o or even o#.

Now, to get back to what bases we support, let's admit that non-decimal literals are only for binary shit, and so let's only support literals for binary, octal and hexadecimal. Hell if we want to get rid of octal and we never have to use it in our APIs (though using the interop/Java APIs will become slightly harder), then whatever: let's get rid of octal and support only binary and hexa. For all other bases, let's add a method String.parseInt(Integer base) or whatever.

Now, having said that, we already have an escape sequence and it's \ so \x and \b scream to be used. That's what's consistent. That's what's not going to shock people (though we still have to add meaningful parser errors for 0x, 0 and 0b). Let's just use that. Or 0x and 0b which is what people familiar with binary shit (the target demographics for those literals) are going to be expecting, so let's not fuck with them. We don't have to. We removed the octal ambiguity so those are clear. # is not a good choice, and we might end up using it for field references, we don't know yet.

Please let's stop being crazy here, and please Gavin let the binary shit to people that deal with binary ;)

@chochos
Member

chochos commented Aug 18, 2012

Damn I've already been bitten by lack of bitwise ops yesterday while trying to solve a Project Euler problem.

The bit about having to parse 0xff anyway just to give an error is so true...

@ikasiuk
Member

ikasiuk commented Aug 18, 2012

Stef nailed it. And this thread is absolutely hilarious, I love it :-)

@gavinking
Member

@FroMage Well, that's a nice enough rant, but I'm a little confused as to how I'm supposed to take it. If it's just a rant, well, funny, well done, and let's get back to the discussion of what is the best format for numeric literals in Ceylon.

But if it's more than humor, and I'm supposed to take seriously the points contained in it, then I suppose I would need to respond. Excuse me if that's not the idea, but here goes:

Gavin should just be banned from having a say in this matter. After all, he admitted to being a complete stranger to hexadecimal literals and he's likely never used one in his life.

In fact I use them all the time, for the same purpose that the overwhelming majority of developers using a high-level language use them: to represent colors using RGB. This is likely to be the most common use of hex literals in Ceylon, since Ceylon is a very high-level language, not really intended for pushing bits around. If you're planning on doing a lot of bit-pushing, you need a language like C, C++, Rust, whatever, which gives you proper direct access to memory, something we don't have in a language like Java, C#, Ceylon, JS, Smalltalk, Ruby, etc. Horses for courses.

Certain domains come with jargon: special language used to describe special things. This is why there are a hundred ways to call a rubber tube, depending on its size, use and what flows inside. People familiar with a domain are familiar with jargon.

This is a great argument, nicely expressed, that you should save up and keep for some other argument about some totally different topic. In this particular discussion it's like a poor lost 3 year old wandering around asking for its mummy. The problem is that, as proven by the wikipedia page you linked to above, there is absolutely no standard universally accepted notation or jargon in this area, and therefore we're forced to use our brains and make a choice for ourselves.

If we're looking for what the real hardcore bit-pushers use, then we're talking assembly and according to wikipedia that would be a postfix H or prefix $ for hexadecimal. (Neither of which appears to me to be a crazy notation, and either of which I would be perfectly happy with if hex were the only additional base you were asking for.)

On the other hand, if we want Ceylon to be like other high-level languages, there are a heap of precedents to choose between, including prefix # (css, modula-2), prefix & (BASIC), 16# (Ada, bash, postscript), 16r (Smalltalk, Algol), #x or #16r (Lisp).

Now, of course there is a strong tradition behind prefix 0x: unix shells (but conspicuously not the most popular one), C/C++/C#/Java, and even ML. If this format were "just or almost as good" as the other competing possibilities, then that would be reason enough to stick with it, given the popularity of these languages, and given that Ceylon is cut mostly from the same tradition.

But if it is indeed not the case that this is a "just or almost as good" syntax, then I think we should choose something better.

it's a little perverse to indicate base 16 using a base-10 number

I hope this is a joke, right? Because how would you even guess the base if it's spelled inside its own base?

Eh? I think we have a very well-known and well-defined ordering of the latin alphabet. I actually remember reciting it out loud in grade 1!

What would a# mean?

It wasn't a joke at all. A# would mean, of course, base 11.

Now, to get back to what bases we support, let's admit that non-decimal literals are only for binary shit, and so let's only support literals for binary, octal and hexadecimal.

See that's the fucking problem here. If you were asking for "just" hex literals, which are arguably somewhat general purpose, this would be something I could pretty easily rationalize. But no, you want to bloat out the numeric literal format with three (3) extra special-purpose thingys which are designed for use in a domain that Ceylon is not even an appropriate language for. I tried to rationalize the awful special-caseyness of this to myself by trying to generalize it to one slightly less special-purposey thing, and you ridicule the idea. Fine. Sometimes you can go too far in trying to abstract something, making it deserving of ridicule.

So fine, so now let me turn this around: if you can justify adding three new special purpose things here (including a special separate feature for such incredibly, incredibly rare and endangered creature as an octal numeral), then there are about 10 new types of literal that I have way, way more justification for adding. Things that developers would use orders of magnitude more often than a fucking octal literal: dates, times, URIs, module version numbers, regexes, cron patterns, etc.

So Stef, where do we draw the line? I understand that, fancying yourself a hairy-chested bit-pusher, you would find octal literals to add a little extra convenience. Well, I would personally find date and time literals extremely convenient, oh and then cron patterns would fit in very nicely with that. Shall we add them too?

(what does ~set even mean?)

FTR, it doesn't mean anything. It's a syntax error. x~y is set complement (subtraction), an operation which I perform all the fucking time, and which you probably do too, even though you might not conceptualize it like that.

@gavinking
Member

Now, I already said I found the 0 prefix for octal error-prone, and besides octal numbers I've never seen in the wild outside of UNIX File modes [snip] So for that reason I'm ready to break the convention and fix it so that there's a less error-prone way to deal with octal literals, such as 0o or \o or even o#.

Alright, stop trying to have it both ways: do you accept that there is justification for breaking the C tradition, or don't you? If you do accept it, which is the impression I get from this passage, then what's apparent is this: that while I don't deny for a second the claim that I'm totally batshitcrazy, apparently my batshitcraziness isn't actually relevant to the discussion at hand, and we are all looking for a reasonable format for non-decimal numeric literals, that doesn't necessarily follow the tradition of C. If that's the case, then chest-thumpy rants, while great for showing off all that bit-pusher chest hair, don't actually move the discussion forward.

Therefore I ask you to look at it from my perspective: while you may not share my aversion for bullet-pointy lists of special-case language features, you surely recognize that the language definition would get completely out of hand if I added a new special-purpose syntax every time everyone wanted a minor convenience for some special usecase?

@quintesse
Member

The only thing I find strange is choosing something like F# for the base 16 numbers. I think I would just prefer having the base defined in decimal, just as if you would call a conversion method and pass the base as an argument. Because if you really want a totally flexible system where you could write down a number in base 22 I'd rather read 22#100 than m#100 and having to figure out what the m stands for. If on the other hand we think we'll only ever use binary and hexadecimal (and possibly octal) we could just go for Stef's suggestion with \b and \x which wouldn't add any new syntax to the language.

NB: I don't mind breaking C-tradition here, even though I think it's the first thing people will try when they want to try to write a hex number.

@quintesse
Member

With respect to bit operations, I think you underestimate how many binary protocols are written in Java (or other high level languages), Gavin, this is not something you can just wish away. And I think it's much more important to have those operations perform as efficiently as possible (within the limits of the language) than set operations that will most of the time be much more expensive.

So actually I think the set operations are pretty cool and I hope you'll be right and people will come to love them, but I also think we need performant bit operations (so not only as method calls on a Byte class for example).

@gavinking
Member

The only thing I find strange is choosing something like F# for the base 16 numbers.

Fine, I thought it was nice to get the same number of characters as h# or H# or x# with a "non-arbitrary" notation. 16# is slightly more verbose. Not that I especially care...

Because if you really want a totally flexible system where you could write down a number in base 22 I'd rather read 22#100 than m#100 and having to figure out what the m stands for. If on the other hand we think we'll only ever use binary and hexadecimal (and possibly octal) we could just go for Stef's suggestion with \b and \x which wouldn't add any new syntax to the language.

This is certainly a reasonable argument.

NB: I don't mind breaking C-tradition here, even though I think it's the first thing people will try when they want to try to write a hex number.

Sure, but I highly doubt that "write a hex number" is the first thing most people will try to do in Ceylon.

With respect to bit operations, I think you underestimate how many binary protocols are written in Java (or other high level languages), Gavin, this is not something you can just wish away.

Sure, but why would we rewrite this stuff in Ceylon?

And I think it's much more important to have those operations perform as efficiently as possible (within the limits of the language) ... we need performant bit operations (so not only as method calls on a Byte class for example).

Certainly, but just because something is represented as a method call at the language level doesn't mean the compiler can't optimize it.

@quintesse
Member

Sure, but why would we rewrite this stuff in Ceylon?

It's not always rewriting, the moment we have a Socket people will start writing their own binary protocols. We could always go to Java for that but personally I'd want to prevent that as much as possible. Besides people thought Java could never be used for any of that stuff either and look in what kind of situations it's being used nowadays. I'm guessing only embedded real time systems using microcontrollers can't use Java. I would hope that one day with Ceylon we can do the same.

@gavinking
Member

So, after some reflection, what I think we should do here is just stick with my original plan for this stuff. If it's important enough to have support for hex/binary literals in Ceylon 1.0, then we should simply plan to add some level of support for single-quoted literals in 1.0. Specifically, I think we should support the following syntax:

value blue = hex '0000FF';

Where hex() is a toplevel function:

Integer hex(literal '^[0-9a-fA-F]+$' Quoted hexString) { ... }

The argument literal would be validated against the regex at compile time.
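A rough Python sketch of the idea: the quoted text is checked against the same regex and only then converted. The obvious difference is that this check happens at run time, whereas the proposal validates at compile time. `hex_literal` is a hypothetical name.

```python
import re

# Same pattern as in the proposed signature above.
HEX_PATTERN = re.compile(r"^[0-9a-fA-F]+$")


def hex_literal(text: str) -> int:
    """Validate `text` against the hex pattern, then convert it."""
    if HEX_PATTERN.fullmatch(text) is None:
        raise ValueError(f"not a hexadecimal literal: {text!r}")
    return int(text, 16)


blue = hex_literal("0000FF")
```

Because the pattern admits only hex digits, an input such as 0x23f is rejected rather than silently accepted.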

Of course, anyone who wants something less verbose can just use an import alias to write h'0000FF' or x'0000FF' or whatever. Which is actually the built-in literal format in some languages.

@FroMage
Member Author

FroMage commented Aug 19, 2012

In fact I use them all the time, for the same purpose that the overwhelming majority of developers using a high-level language use them: to represent colors using RGB. This is likely to be the most common use of hex literals in Ceylon, since Ceylon is a very high-level language, not really intended for pushing bits around. If you're planning on doing a lot of bit-pushing, you need a language like C, C++, Rust, whatever, which gives you proper direct access to memory, something we don't have in a language like Java, C#, Ceylon, JS, Smalltalk, Ruby, etc. Horses for courses.

I hope you're not selling down Ceylon as a sort of high-level language which is only good for mental masturbation? I'm pretty sure Java and C# are used in plenty of places where they can do low-level stuff like binary operations, and they do it well and as fast as lower-level C stuff. This is why Java has added things like memory-mapped IO and non-blocking IO with select over the years: because these low-level things can be abstracted slightly higher and Java people hate to rely on native calls to do the dirty jobs.

I sure hope Ceylon will excel in the same area, and frankly I don't see why not. At least, I see no good reason why we should declare it unfit for that.

Now, of course there is a strong tradition behind prefix 0x: unix shells (but conspicuously not the most popular one), C/C++/C#/Java, and even ML. If this format were "just or almost as good" as the other competing possibilities, then that would be reason enough to stick with it, given the popularity of these languages, and given that Ceylon is cut mostly from the same tradition.

But if it is indeed not the case that this is a "just or almost as good" syntax, then I think we should choose something better.

Strawman: I'm not saying it should be 0x or nothing.

What I am saying is that our parser will have to recognize this to be friendly, to help people discover our own syntax. Do you agree with that?

What would a# mean?
It wasn't a joke at all. A# would mean, of course, base 11.

WRONG! ;) I asked about a#, which, according to my own convention of using Unicode-character-ordering for bases greater than 36 (0-9A-Z) is a base-37 number (0-9A-Za). What precisely is the convention for a base-345 number? Is ชิ้# the right notation?

So fine, so now let me turn this around: if you can justify adding three new special purpose things here (including a special separate feature for such incredibly, incredibly rare and endangered creature as an octal numeral), then there are about 10 new types of literal that I have way, way more justification for adding. Things that developers would use orders of magnitude more often than a fucking octal literal: dates, times, URIs, module version numbers, regexes, cron patterns, etc.

Wrong argument, and yes we will support those literals because they are also a problem worth solving, and the single-quoted literal thing is a great idea. But not supporting hex and binary numbers is a regression compared to C and Java. Perhaps we're OK with that, but I haven't seen a good reason thrown around yet why.

(what does ~set even mean?)
FTR, it doesn't mean anything. It's a syntax error. x~y is set complement (subtraction), an operation which I perform all the fucking time, and which you probably do too, even though you might not conceptualize it like that.

OK, that's confusing to me. Intuitively I would have used - for subtraction. ~number in bitwise operations returns all 0 bits turned to 1 and vice-versa.
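The two operations being contrasted, sketched in Python for illustration. Python happens to spell set difference with `-` and keeps `~` for integers; `invert_bits` is a hypothetical helper that masks the result to a fixed width, since Python integers are unbounded.

```python
# Set difference: the elements of `a` that are not in `b`.
a = {1, 2, 3, 4}
b = {3, 4, 5}
difference = a - b  # {1, 2}


def invert_bits(value: int, width: int = 8) -> int:
    """Flip every bit of `value` within a `width`-bit field."""
    return ~value & ((1 << width) - 1)


flipped = invert_bits(0b1010_1100)  # 0b0101_0011
```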

Look, I already admitted we can abstract away the single use I know of octal literals in Ceylon so I'm pretty sure we don't need them. Hexa and binary we should support. I don't care at all if it's 0x 0h or \x or \h because I find those sane notations. The first because it's traditional in the tradition that we care about (C, Java), the second because it fits with our other quotation notations \i and \u.

That's where I draw the line: a trivial feature to implement and not a can of worms. For the worms, let's use the single-quoted literals.

@FroMage
Member Author

FroMage commented Aug 19, 2012

If that's the case, then chest-thumpy rants, while great for showing off all that bit-pusher chest hair, don't actually move the discussion forward.

So what, you're the only one allowed to diatribe here? ;)

Therefore I ask you to look at it from my perspective: while you may not share my aversion for bullet-pointy lists of special-case language features, you surely recognize that the language definition would get completely out of hand if I added a new special-purpose syntax every time everyone wanted a minor convenience for some special usecase?

Come on. Binary and hex literals can hardly be called either of those things. You know it.

@FroMage
Member Author

FroMage commented Aug 19, 2012

So, after some reflection, what I think we should do here is just stick with my original plan for this stuff. If it's important enough to have support for hex/binary literals in Ceylon 1.0, then we should simply plan to add some level of support for single-quoted literals in 1.0.

I suppose I could agree to that, though I find the syntax confusing, for lack of parentheses around the function call… And we could optimise it properly. I find it ironic that you would push the regex syntax (which I love) while you told me many times how much you hated it.

I still don't see why you think that numeric exponents are more important than hexa and binary literals, although I'm pretty sure you've simply decided that they are, but I'm ready to let this one pass because I don't care enough.

What I'm convinced about though, is that if we don't support the traditional notation we must parse it and give appropriate error messages and quick-fixes.
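
To make the quick-fix idea concrete, here is a minimal Python sketch of how a tool could recognise a C/Java-style literal and propose a rewrite. The function name `suggest_quick_fix` is hypothetical, and the `#` and `$` target notations are only an assumption borrowed from the proposal made later in this thread; nothing here is the actual Ceylon typechecker or IDE code.

```python
import re

# C/Java-style prefixes: 0x / 0X for hex, 0b / 0B for binary.
_TRADITIONAL = re.compile(r"^0([xXbB])([0-9A-Fa-f_]+)$")

def suggest_quick_fix(token):
    """Return (message, replacement) for a C/Java-style literal, or None."""
    match = _TRADITIONAL.match(token)
    if match is None:
        return None
    kind, digits = match.group(1).lower(), match.group(2)
    if kind == "x":
        return ("hexadecimal literals are written with '#'", "#" + digits)
    # kind == "b": only 0, 1 and the separator are valid binary digits
    if set(digits) - set("01_"):
        return None
    return ("binary literals are written with '$'", "$" + digits)
```

A plain decimal token such as `192` yields `None`, so only the traditional prefixes trigger a suggestion.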

@FroMage
Member Author

FroMage commented Aug 30, 2012

  • We will use @ for attribute references. Something like "hello".size@ or "hello".@size.

I find "hello"@size more natural, myself.

  • A hex literal is written #fa10 or #FA10.

Sold!

  • A binary literal is written $10101100.

Sold!

if we go with the above proposal, I suppose regularity would argue for a syntax like: "\{#00E5}ngstr\{#00F6}ms"

I don't see why, unless we allow non-hex numbers in there.

BTW, we do need the braces, but we could have made the number of digits free, from 1 to 8, so that \{FF} is shorter to type. Unicode code points never use the first two digits of an 8-digit escape: nothing above 10FFFF is allocated. So you'll never see \{1234ffff}; it's always going to be at most \{0010ffff}, so allowing the leading zeros to be omitted would be nice, and easy with the braces.
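
As a rough illustration of the "1 to 8 digits inside braces" idea, a Python sketch could look like the following. The name `expand_escapes` is hypothetical and the escape form is the brace-delimited one discussed here, without the `#`; the only hard fact relied on is that Unicode code points stop at 10FFFF.

```python
import re

# Assumed escape form: backslash, '{', 1 to 8 hex digits, '}' (e.g. \{FF}).
_ESCAPE = re.compile(r"\\\{([0-9A-Fa-f]{1,8})\}")

MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode

def expand_escapes(text):
    """Replace each brace escape with the character it denotes."""
    def replace(match):
        code_point = int(match.group(1), 16)
        if code_point > MAX_CODE_POINT:
            raise ValueError("not a Unicode code point: " + match.group(0))
        return chr(code_point)
    return _ESCAPE.sub(replace, text)
```

Because the closing brace delimits the digits, `\{E5}` and `\{000000E5}` denote the same character, which is what makes omitting the leading zeros unambiguous.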

@gavinking
Member

I find "hello"@size more natural, myself.

Strange. Now that someone else is proposing this, I actually find it quite significantly non-ugly. It has a very decent chance of growing on me. Go figure.

Sold!
Sold!

OK, phew!

I don't see why, unless we allow non-hex numbers in there.

So I don't think anyone is going to want to express unicode characters in binary. Is it likely that they would want to express them in decimal?

Oh, here's a good argument: eventually it would be nice to do like python and support unicode character names, and the following syntax would be really natural for that:

"\{LATIN CAPITAL LETTER C}eylon"

@FroMage
Member Author

FroMage commented Aug 30, 2012

"\{LATIN CAPITAL LETTER C}eylon"

Well, is the name a unique ID? Do you have a list of mappings?

@gavinking
Member

Well, is the name a unique ID?

Yes.

Do you have a list of mappings?

In the unicode spec. Unfortunately I don't have it as a text file we can distribute with the language module :-(
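
For comparison, Python ships the name table in its standard `unicodedata` module, so the name-to-character mapping can be shown in a few lines. This is only an illustration of the lookup itself, not a suggestion for how the Ceylon language module would obtain the data; the helper name `char_from_name` is made up for the example.

```python
import unicodedata

def char_from_name(name):
    """Look up a character by its Unicode name; raises KeyError if unknown."""
    return unicodedata.lookup(name)

# Mirrors the "\{LATIN CAPITAL LETTER C}eylon" example from the thread.
greeting = char_from_name("LATIN CAPITAL LETTER C") + "eylon"
```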

@ikasiuk
Member

ikasiuk commented Aug 30, 2012

That use of #, $ and @ looks pretty good!

BTW: On a German keyboard $ is much easier to type than a backtick.

@FroMage
Member Author

FroMage commented Aug 30, 2012

Perhaps there's support for that in Java and JavaScript?

It's interesting, though. So we'd have \{#ff} or \{NAME OF CHAR} but no decimal literals? I can like it, but won't people try to stuff decimals in there anyways?

@FroMage
Member Author

FroMage commented Aug 30, 2012

BTW: On a German keyboard $ is much easier to type than a backtick

This is SO true on qwerty too. Can we stick backticks to single-quoted literals and take back the single quote for character literals?

It hurt my hand so much typing those backticks.

@ikasiuk
Member

ikasiuk commented Aug 30, 2012

This is SO true on qwerty too. Can we stick backticks to single-quoted literals and take back the single quote for character literals?

That's presuming that character literals will be used more often than single-quoted literals in Ceylon. And I'm absolutely not sure that that will be the case!

@FroMage
Member Author

FroMage commented Aug 30, 2012

Perhaps you're right. But my HTTP, JSON and URI parsers disagree with you ;)

@gavinking
Member

It's interesting, though. So we'd have \{#ff} or \{NAME OF CHAR} but no decimal literals? I can like it, but won't people try to stuff decimals in there anyways?

I have no problem with also allowing decimals. I don't believe that unicode character names start with digits.
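
A sketch of how the three forms could be told apart by their first character, assuming (as suggested above) that no character name starts with a digit. The function name `decode_escape_body` is hypothetical, and Python's `unicodedata` stands in for whatever name table would actually be used.

```python
import unicodedata

def decode_escape_body(body):
    """Decode the text between the braces of an escape into one character."""
    if body.startswith("#"):
        return chr(int(body[1:], 16))     # hexadecimal form, e.g. #221E
    if "0" <= body[:1] <= "9":
        return chr(int(body, 10))         # decimal form, e.g. 8734
    return unicodedata.lookup(body)       # named form, e.g. INFINITY
```

With this dispatch, `#221E`, `8734` and `INFINITY` all decode to the same character, and the leading `#` is what keeps an all-digit hex value such as `#2210` from being read as decimal.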

That's presuming that character literals will be used more often than single-quoted literals in Ceylon. And I'm absolutely not sure that that will be the case!

100% Agreed!

@RossTate
Member

Mainly because I like to reserve characters in case we ever want them in the future:

Instead of $01010101 can we do 1#01010101? Similarly 7# for octal and F# for hex (though we'll presume that's the default so # would work just as well). Clearly we can extend this if ever we want, such as 9# for explicit decimal.

@gavinking
Member

The thinking here was that $ is not a very useful character for anything important because it's hard for some people to type, and because it's hard to think of noncryptic uses for it. So we're not spending a very valuable thing here. On the other hand, for cases where binary literals really are useful, $1011 simply looks less ugly and less complex than 2#1011.


@quintesse
Member

@RossTate Personally I never really liked that notation, it's "octal" but we use a 7, it's decimal but we use a 9. Oh here we have an M... so what's that again? (counting under his breath A1, B2, .... M16, aah ok so it's base 25!)

@quintesse
Member

Hmm, so how about 123 being decimal (and short for 10#123), #FFAA being hex (and short for 16#FFAA) and just explicitly specify it for any other radix? (so binary would be 2#01010101, maybe not as nice as $01010101 but binary is probably much less used than hex anyway and it does free the $ for other things).
(BTW I'm perfectly fine with the latest suggestion of $ for binary and # for hex, just wanted to give an alternative that saves us the $)
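
To spell out the explicit-radix alternative, here is a small Python sketch. The name `parse_radix_literal` is hypothetical, and capping the radix at 36 is an assumption that comes from Python's `int`, not from anything proposed in this thread.

```python
def parse_radix_literal(text):
    """Parse 'RADIX#DIGITS'; bare '#DIGITS' is hex, bare digits are decimal."""
    radix_part, sep, digits = text.partition("#")
    if not sep:
        return int(text, 10)                           # 123 is short for 10#123
    radix = int(radix_part, 10) if radix_part else 16  # #FFAA is short for 16#FFAA
    if not 2 <= radix <= 36:
        raise ValueError("unsupported radix: " + radix_part)
    return int(digits, radix)
```

Under this scheme `2#01010101` and `#55` denote the same Integer, at the cost of two extra characters compared with `$01010101`.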

@RossTate
Member

So my impression is that, except for hex and decimal (and less often binary), these are not really used to encode numbers. Rather, they are used essentially as a DSL to encode a list of cases. Thus I imagine the programmer having some table next to them or in their head saying a maps to "magenta" or whatever. That table has a last entry, say c, so they say c#93c0. Then to translate such an Integer, say config, they do config.parseRadix(c#c) and get an Integer[]. They don't have to figure out that c corresponds to base 13 and do config.parseRadix(13).

Regardless of whether you'd rather do 13# or c#, I'd prefer to save $ just in case.

Shoot, if this is a common use, then you could have c#93c0 make the appropriate base 13 Integer, and have c$93c0 translate to the sequence [9, 3, 12, 0]. They could then take their type, say Config, and write a function Config toConfig(Integer), then do c$93c0.map(toConfig) to get a Config[]. Note that this also gets around the problem of leading zeros in this setting.
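
A sketch of the largest-digit prefix, following the convention given earlier in the thread that 1# is binary, 7# is octal and F# is hex, so the base is the prefix digit plus one. The function name `parse_max_digit_literal` is hypothetical, and it returns both readings at once (the Integer and the digit sequence) purely for illustration.

```python
def parse_max_digit_literal(text):
    """Parse 'D#DIGITS', where D is the largest digit allowed.

    Returns (value, digits): the Integer reading and the per-digit sequence.
    """
    max_digit, sep, body = text.partition("#")
    if not sep or len(max_digit) != 1:
        raise ValueError("expected a single digit before '#': " + text)
    base = int(max_digit, 36) + 1          # 1# is base 2, 7# is base 8, F# is base 16
    value = int(body, base)                # rejects digits larger than D
    digits = [int(ch, 36) for ch in body]  # keeps leading zeros as list entries
    return value, digits
```

The digit list preserves leading zeros that the Integer reading loses, which is the point made above about the sequence form.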

gavinking added a commit that referenced this issue Jan 6, 2013
change format of unicode escape to \{#XXXX}
@gavinking
Member

I have implemented support for #FF0001AA, #FF00_01AA, $00101011, and $0010_1011. The format for unicode character escapes is now \{#221E}, in order to accommodate future support for decimal or named unicode characters.

I don't want to hear a single peep out of you guys or I swear I'll roll it back!

@quintesse
Member

Peep?

@chochos
Member

chochos commented Jan 6, 2013

Great, so can we get rid of the fugly hex and bin functions to handle those literals in the backends?

@gavinking
Member

Great, so can we get rid of the fugly hex and bin functions to handle those literals in the backends?

Yes. Please.

chochos added a commit to ceylon/ceylon.language that referenced this issue Jan 7, 2013
chochos added a commit to ceylon/ceylon-js that referenced this issue Jan 7, 2013
@chochos
Member

chochos commented Jan 7, 2013

Stuff like #ff_ff is still unsupported (typechecker throws an error)

@gavinking
Member

@chochos I figured groups of 4 digits made more sense, but is groups of 2 better?

@chochos
Member

chochos commented Jan 7, 2013

Ah I didn't test with groups of 4 digits. It was just some arbitrary test; I suppose for binary it's also 4 or 8 digits, right?

chochos added a commit to ceylon/ceylon.language that referenced this issue Jan 7, 2013
@gavinking
Member

It is groups of 4 digits for both.
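
For reference, a Python sketch of the grouping rule as described here. It assumes that a literal either has no separators at all, or has a leading group of 1 to 4 digits followed by underscore-separated groups of exactly 4; whether a shorter leading group is really accepted is an assumption, and `parse_literal` is a hypothetical name rather than anything in the typechecker.

```python
import re

# Assumed rule: no separators, or a leading group of 1 to 4 digits followed by
# '_'-separated groups of exactly 4 digits.
_HEX = re.compile(r"#([0-9A-Fa-f]+|[0-9A-Fa-f]{1,4}(?:_[0-9A-Fa-f]{4})+)")
_BIN = re.compile(r"\$([01]+|[01]{1,4}(?:_[01]{4})+)")

def parse_literal(text):
    """Parse a '#' hex or '$' binary literal into an int, else raise ValueError."""
    for pattern, base in ((_HEX, 16), (_BIN, 2)):
        match = pattern.fullmatch(text)
        if match:
            return int(match.group(1).replace("_", ""), base)
    raise ValueError("malformed literal: " + text)
```

Under this rule `#FF00_01AA` and `$0010_1011` parse, while `#ff_ff` is rejected because its second group has only two digits.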
