Add hexadecimal, octal and binary literals #382
Comments
I always assumed that this was a job for single-quoted literals. |
That's a possibility, but we don't know when/if those will be supported, and it looks weird that we go through all the trouble of supporting numeric literal postfixes and exponents and separators while overlooking a pretty basic and traditional feature such as this one. I don't think |
To be honest I have no clue wtf 0x has to do with base 16 numerals or how it came to be adopted by so many programming languages. I don't think that is a tradition we should continue.
|
Could we special-case this particular use of single quoted literals in the typechecker, just for use with the following three functions (which would be defined in the language module of course):
(Obviously something somewhere optimizes them to normal literals). |
What about a literal format something like:
of course, |
That looks horrid to my eyes.
|
"x" stands for "heXadecimal". That's not really confusing. I really don't like Also it's not consistent with our other uses of My favourites are still |
Glad it's not just me being stuck in the past :) |
Hah, but someone who didn't know what C'mon, admit it, the only reason that convention doesn't make you want to vomit is that you're used to it.
Perhaps, though I mean, I don't really care what the escape/separator character is, it could be But what seems to me to be the craziest possible choice for an escape character is a leading zero. Nobody (except a computer programmer) thinks a leading zero in a numeral is significant. The Sure, sure, we've taught ourselves how to interpret this crap, but it's still crap.
OMG, it's so non-confusing that in 30 years programming I never figured this out for myself. That's perhaps the most arbitrary thing I've ever heard. Why not "h"? |
Excuse me, that's not true. I should have said "isn't an infix operator in any language I know". |
Now you know one: BASIC! ;-) |
I'm not particularly fond of What I would like is something like |
Frankly the number of things you never figured out in 30 years of programming is the puzzling and most arbitrary thing ;)
My guess is that the 'h' is silent? The http://en.wikipedia.org/wiki/Hexadecimal#Written_representation lists a lot of representations. Most use OK, so you don't like the leading |
I like the idea of stating an arbitrary base, but the |
Haha, is it possible I never used integer division in my years of BASIC programming? :-)
Hahahaha, true—in France at least :-) So then this syntax is totally natural and intuitive to native speakers of French. Is that an argument for or against?
This to me makes far more sense and is essentially similar to my proposal, except I think
The second criticism also applies to
But the FTR, I have always found the JavaScript convention of |
My problem with something like Dare I suggest another alternative: |
Well, perhaps, but it's surely harder to gloss over an infix And, FTR, I don't see any glossing over the
While it's clear that this works well enough for the cases we've identified, and might feel a little more familiar for many programmers, it wouldn't let people express numbers in any other base. Now, of course I recognize that bases other than 10, 2, and 16 almost never occur, but it still feels strange to choose something deliberately limited to four discrete cases when there are a couple of at least arguably-as-good options that cover any natural number. |
[Warning: this comment was written with humour in mind, read it as such] Guys, the insanity has to stop right there. Gavin should just be banned from having a say in this matter. After all, he admitted to being a complete stranger to hexadecimal literals and he's likely never used one in his life. Certain domains come with jargon: special language used to describe special things. This is why there are a hundred ways to call a rubber tube, depending on its size, use and what flows inside. People familiar with a domain are familiar with jargon. People familiar with binary shit (streams, bytes, bits, bitsets, bitmasks, bitwise operations, CPUs, registers, character sets, encodings, protocols, network and all that shit) know precisely what the prefixes Those people would probably understand what Now, Gavin, never having used binary shit in his life, decided to reallocate the bitwise operators to sets. I find this personally insane, but I figured that if nobody else found it insane that must be me. Besides, I don't really mind calling methods on But Not only that, but are we seriously going to support base-52 numbers? How the hell do you write one after the initial
I hope this is a joke, right? Because how would you even guess the base if it's spelled inside its own base? What would Hell, I'm pretty sure that in math people never use non-decimal numbers. Whereas people dealing with binary shit? All the fucking time. Every day. Seriously! So if you go with something that isn't Now, I already said I found the Now, to get back to what bases we support, let's admit that non-decimal literals are only for binary shit, and so let's only support literals for binary, octal and hexadecimal. Hell if we want to get rid of octal and we never have to use it in our APIs (though using the interop/Java APIs will become slightly harder), then whatever: let's get rid of octal and support only binary and hexa. For all other bases, let's add a method Now, having said that, we already have an escape sequence and it's Please let's stop being crazy here, and please Gavin let the binary shit to people that deal with binary ;) |
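The split argued for here (literal syntax for binary and hex, a plain method for every other base) can be sketched as follows. This is Python rather than Ceylon, chosen only because its built-in `int(text, base)` already behaves this way. The helper name `parse_in_base` is hypothetical and is not the method proposed in the comment above.

```python
def parse_in_base(text: str, base: int) -> int:
    """Parse digits written in an arbitrary base (2 to 36) into an integer.

    Hypothetical helper for illustration; it simply wraps Python's int().
    """
    if not 2 <= base <= 36:
        # Digits 0-9 plus letters a-z give at most 36 distinct symbols,
        # so something like base 52 would need its own digit alphabet.
        raise ValueError(f"unsupported base: {base}")
    return int(text, base)


# Literal prefixes cover the common cases...
assert 0xFF == 0b1111_1111 == 0o377 == 255
# ...and the function covers the rare ones.
assert parse_in_base("zz", 36) == 1295
assert parse_in_base("120", 3) == 15
```

The design point is that rare bases cost one function call instead of new grammar, while the bases that show up in real specs keep a short literal form.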
Damn I've already been bitten by lack of bitwise ops yesterday while trying to solve a Project Euler problem. The bit about having to parse 0xff anyway just to give an error is so true... |
Stef nailed it. And this thread is absolutely hilarious, I love it :-) |
@FroMage Well, that's a nice enough rant, but I'm a little confused as to how I'm supposed to take it. If it's just a rant, well, funny, well done, and let's get back to the discussion of what is the best format for numeric literals in Ceylon. But if it's more than humor, and I'm supposed to take seriously the points contained in it, then I suppose I would need to respond. Excuse me if that's not the idea, but here goes:
In fact I use them all the time, for the same purpose that the overwhelming majority of developers using a high-level language use them: to represent colors using RGB. This is likely to be the most common use of hex literals in Ceylon, since Ceylon is a very high-level language, not really intended for pushing bits around. If you're planning on doing a lot of bit-pushing, you need a language like C, C++, Rust, whatever, which gives you proper direct access to memory, something we don't have in a language like Java, C#, Ceylon, JS, Smalltalk, Ruby, etc. Horses for courses.
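As a small illustration of the RGB use case, here is a sketch in Python (not Ceylon) of why hex suits colors: each pair of hex digits is exactly one channel. The function name `rgb_channels` and the constants are made up for this example.

```python
def rgb_channels(color: int) -> tuple:
    """Split a 0xRRGGBB value into its (red, green, blue) components."""
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return red, green, blue


ORANGE = 0xFF8000          # reads as FF / 80 / 00 at a glance
ORANGE_DECIMAL = 16744448  # the same value, with the channels no longer visible

assert ORANGE == ORANGE_DECIMAL
assert rgb_channels(ORANGE) == (255, 128, 0)
```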
This is a great argument, nicely expressed, that you should save up and keep for some other argument about some totally different topic. In this particular discussion it's like a poor lost 3 year old wandering around asking for its mummy. The problem is that, as proven by the wikipedia page you linked to above, there is absolutely no standard universally accepted notation or jargon in this area, and therefore we're forced to use our brains and make a choice for ourselves. If we're looking for what the real hardcore bit-pushers use, then we're talking assembly and according to wikipedia that would be a postfix On the other hand, if we want Ceylon to be like other high-level languages, there are a heap of precedents to choose between, including prefix Now, of course there is a strong tradition behind prefix But if it is indeed not the case that this is a "just or almost as good" syntax, then I think we should choose something better.
Eh? I think we have a very well-known and well-defined ordering of the latin alphabet. I actually remember reciting it out loud in grade 1!
It wasn't a joke at all.
See that's the fucking problem here. If you were asking for "just" hex literals, which are arguably somewhat general purpose, this would be something I could pretty easily rationalize. But no, you want to bloat out the numeric literal format with three (3) extra special-purpose thingys which are designed for use in a domain that Ceylon is not even an appropriate language for. I tried to rationalize the awful special-caseyness of this to myself by trying to generalize it to one slightly less special-purposey thing, and you ridicule the idea. Fine. Sometimes you can go too far in trying to abstract something, making it deserving of ridicule. So fine, so now let me turn this around: if you can justify adding three new special purpose things here (including a special separate feature for such an incredibly, incredibly rare and endangered creature as an octal numeral), then there are about 10 new types of literal that I have way, way more justification for adding. Things that developers would use orders of magnitude more often than a fucking octal literal: dates, times, URIs, module version numbers, regexes, cron patterns, etc. So Stef, where do we draw the line? I understand that, fancying yourself a hairy-chested bit-pusher, you would find octal literals to add a little extra convenience. Well, I would personally find date and time literals extremely convenient, oh and then cron patterns would fit in very nicely with that. Shall we add them too?
FTR, it doesn't mean anything. It's a syntax error. |
Alright, stop trying to have it both ways: do you accept that there is justification for breaking the C tradition, or don't you? If you do accept it, which is the impression I get from this passage, then what's apparent is this: that while I don't deny for a second the claim that I'm totally batshitcrazy, apparently my batshitcraziness isn't actually relevant to the discussion at hand, and we are all looking for a reasonable format for non-decimal numeric literals, that doesn't necessarily follow the tradition of C. If that's the case, then chest-thumpy rants, while great for showing off all that bit-pusher chest hair, don't actually move the discussion forward. Therefore I ask you to look at it from my perspective: while you may not share my aversion for bullet-pointy lists of special-case language features, you surely recognize that the language definition would get completely out of hand if I added a new special-purpose syntax every time everyone wanted a minor convenience for some special usecase? |
The only thing I find strange is choosing something like NB: I don't mind breaking C-tradition here, even though I think it's the first thing people will try when they want to try to write a hex number. |
With respect to bit operations, I think you underestimate how many binary protocols are written in Java (or other high level languages), Gavin, this is not something you can just wish away. And I think it's much more important to have those operations perform as efficiently as possible (within the limits of the language) than set operations that will most of the time be much more expensive. So actually I think the set operations are pretty cool and I hope you'll be right and people will come to love them, but I also think we need performant bit operations (so not only as method calls on a Byte class for example). |
Fine, I thought it was nice to get the same number of characters as
This is certainly a reasonable argument.
Sure, but I highly doubt that "write a hex number" is the first thing most people will try to do in Ceylon.
Sure, but why would we rewrite this stuff in Ceylon?
Certainly, but just because something is represented as a method call at the language level doesn't mean the compiler can't optimize it. |
It's not always rewriting, the moment we have a |
So, after some reflection, what I think we should do here is just stick with my original plan for this stuff. If it's important enough to have support for hex/binary literals in Ceylon 1.0, then we should simply plan to add some level of support for single-quoted literals in 1.0. Specifically, I think we should support the following syntax:
Where
The argument literal would be validated against the regex at compile time. Of course, anyone who wants something less verbose can just use an |
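The "validate the literal text against a regex, then convert it" idea can be sketched like this. It is Python, so the check runs at call time rather than at compile time as proposed above, and the registry, the patterns, and the name `parse_literal` are all assumptions made for illustration. They are not the syntax from the comment.

```python
import re

# Hypothetical registry: each literal kind pairs a validation pattern with a
# conversion function. Names and patterns are assumptions for illustration.
LITERAL_KINDS = {
    "hex": (re.compile(r"[0-9a-fA-F]+"), lambda s: int(s, 16)),
    "bin": (re.compile(r"[01]+"), lambda s: int(s, 2)),
}


def parse_literal(kind: str, text: str) -> int:
    """Validate text against the pattern for its kind, then convert it."""
    pattern, convert = LITERAL_KINDS[kind]
    if pattern.fullmatch(text) is None:
        # A compiler would report this at the literal's source location.
        raise ValueError(f"{text!r} is not a valid {kind} literal")
    return convert(text)


assert parse_literal("hex", "ff") == 255
assert parse_literal("bin", "1011") == 11
```

A compiler that knows the pattern could also fold the call into a plain integer constant, which is the optimization discussed elsewhere in the thread.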
I hope you're not selling down Ceylon as a sort of high-level language which is only good for mental masturbation? I'm pretty sure Java and C# are used in plenty of places where they can do low-level stuff like binary operations, and they do it well and as fast as lower-level C stuff. This is why Java has added things like memory-mapped IO and non-blocking IO with I sure hope Ceylon will excel in the same area, and frankly I don't see why not. At least, I see no good reason why we should declare it unfit for that.
Strawman: I'm not saying it should be What I am saying is that our parser will have to recognize this to be friendly, to help people discover our own syntax. Do you agree with that?
WRONG! ;) I asked about
Wrong argument, and yes we will support those literals because they are also a problem worth solving, and the single-quoted literal thing is a great idea. But not supporting hex and binary numbers is a regression compared to C and Java. Perhaps we're OK with that, but I haven't seen a good reason thrown around yet why.
OK, that's confusing to me. Intuitively I would have used Look, I already admitted we can abstract away the single use I know of octal literals in Ceylon so I'm pretty sure we don't need them. Hexa and binary we should support. I don't care at all if it's That's where I draw the line: a trivial feature to implement and not a can of worms. For the worms, let's use the single-quoted literals. |
So what, you're the only one allowed to diatribe here? ;)
Come on. Binary and hex literals can hardly be called either of those things. You know it. |
I suppose I could agree to that, though I find the syntax confusing, for lack of parentheses around the function call… And we could optimise it properly. I find it ironic that you would push the regex syntax (which I love) while you told me many times how much you hated it. I still don't see why you think that numeric exponents are more important than hexa and binary literals, although I'm pretty sure that you decided they were to you, but I'm ready to let this one pass because I don't care enough. What I'm convinced about though, is that if we don't support the traditional notation we must parse it and give appropriate error messages and quick-fixes. |
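The "parse the traditional notation anyway so we can give a good error" point could look roughly like the sketch below. It is Python, not compiler code, and it assumes `#` for hex and `$` for binary as the replacement notation purely for the sake of the example. The function name `suggest_fix` and the message wording are hypothetical.

```python
import re
from typing import Optional

# C-style literals the lexer would recognise only in order to reject them.
C_STYLE_HEX = re.compile(r"0[xX]([0-9a-fA-F]+)")
C_STYLE_BIN = re.compile(r"0[bB]([01]+)")


def suggest_fix(token: str) -> Optional[str]:
    """Return a quick-fix message for a C-style literal, or None otherwise."""
    # Assumed target notation for this sketch: '#' for hex, '$' for binary.
    for pattern, prefix in ((C_STYLE_HEX, "#"), (C_STYLE_BIN, "$")):
        match = pattern.fullmatch(token)
        if match is not None:
            digits = match.group(1)
            return f"'{token}' is not a valid literal; did you mean '{prefix}{digits}'?"
    return None


assert suggest_fix("255") is None
assert suggest_fix("0xff") == "'0xff' is not a valid literal; did you mean '#ff'?"
```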
I find
Sold!
Sold!
I don't see why, unless we allow non-hex numbers in there. BTW, we do need the braces, but we could have made the number of digits free from 1-8. So that |
Strange. Now that someone else is proposing this, I actually find it quite significantly non-ugly. It has a very decent chance of growing on me. Go figure.
OK, phew!
So I don't think anyone is going to want to express unicode characters in binary. Is it likely that they would want to express them in decimal? Oh, here's a good argument: eventually it would be nice to do like python and support unicode character names, and the following syntax would be really natural for that:
|
Well, is the name a unique ID? Do you have a list of mappings? |
Yes.
In the unicode spec. Unfortunately I don't have it as a text file we can distribute with the language module :-( |
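For reference, this is roughly how the Python feature mentioned above behaves. Python ships the Unicode name table in its standard `unicodedata` module, which is the kind of mapping the language module would need to bundle. Nothing here is Ceylon syntax.

```python
import unicodedata

# Name -> character, using the Unicode database bundled with Python.
e_acute = unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE")

# Character -> name, the reverse direction.
name = unicodedata.name("\u00e9")

# Python's own escape syntax for the same lookup.
same = "\N{LATIN SMALL LETTER E WITH ACUTE}"

assert e_acute == same == "\u00e9"
assert name == "LATIN SMALL LETTER E WITH ACUTE"
```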
That use of BTW: On a German keyboard |
Perhaps there's support for that in Java and JavaScript? It's interesting, though. So we'd have |
This is SO true on QWERTY too. Can we stick backticks to single-quoted literals and take back the single quote for character literals? It hurt my hand so much typing those backticks. |
That's presuming that character literals will be used more often than single-quoted literals in Ceylon. And I'm absolutely not sure that that will be the case! |
Perhaps you're right. But my HTTP, JSON and URI parsers disagree with you ;) |
I have no problem with also allowing decimals. I don't believe that unicode character names start with digits.
100% Agreed! |
Mainly because I like to reserve characters in case we ever want them in the future: Instead of |
The thinking here was that $ is not a very useful character for anything important because it's hard for some people to type, and because it's hard to think of noncryptic uses for it. So we're not spending a very valuable thing here. On the other hand, for cases where binary literals really are useful, $1011 simply looks less ugly and less complex than 2#1011.
|
@RossTate Personally I never really liked that notation, it's "octal" but we use a 7, it's decimal but we use a 9. Oh here we have an M... so what's that again? (counting under his breath A1, B2, .... M16, aah ok so it's base 25!) |
Hmm, so how about |
So my impression is that, except for hex and decimal (and less often binary), these are not really used to encode numbers. Rather they are used essentially as a DSL to encode a list of cases. Thus I imagine the programmer having some table next to them or in their head saying Regardless of whether you'd rather do Shoot, if this is a common use, then you could have |
change format of unicode escape to \{#XXXX}
I have implemented support for I don't want to hear a single peep out of you guys or I swear I'll roll it back! |
Peep? |
Great, so can we get rid of the fugly hex and bin functions to handle those literals in the backends? |
Yes. Please. |
Stuff like |
@chochos I figured groups of 4 digits made more sense, but is groups of 2 better? |
Ah I didn't test with groups of 4 digits. It was just some arbitrary test; I suppose for binary it's also 4 or 8 digits, right? |
It is groups of 4 digits for both. |
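The grouping rule can be sketched as a validation step before conversion. This is Python and only an approximation: it assumes a literal is either ungrouped or has a leading group of 1 to 4 digits followed by underscore-separated groups of exactly 4, which may not match the implemented rule in every detail. The name `parse_grouped` is hypothetical.

```python
import re

# Assumed rule: either no separators at all, or a leading group of 1 to 4
# digits followed by underscore-separated groups of exactly 4 digits.
HEX_DIGITS = re.compile(r"[0-9a-fA-F]+|[0-9a-fA-F]{1,4}(?:_[0-9a-fA-F]{4})+")
BIN_DIGITS = re.compile(r"[01]+|[01]{1,4}(?:_[01]{4})+")


def parse_grouped(digits: str, base: int) -> int:
    """Check the digit grouping, then convert the digits in base 2 or 16."""
    pattern = {16: HEX_DIGITS, 2: BIN_DIGITS}[base]
    if pattern.fullmatch(digits) is None:
        raise ValueError(f"badly grouped base-{base} digits: {digits!r}")
    return int(digits.replace("_", ""), base)


assert parse_grouped("FFFF_FFFF", 16) == 0xFFFFFFFF
assert parse_grouped("1010_1010", 2) == 170
```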
We can live without them, but damn that makes the code look silly. Most specs that deal with binary are defined in terms of either hexa, octal or binary numbers, which means that the code we write in Ceylon would need to have those numbers translated into decimal (first possible source of programmer error), with the original number (hexa, octal or binary) put in comments (which looks clumsy).
Case in point: UTF-8 decoding:
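The snippet from the original post is not reproduced here. As a stand-in, the following Python sketch (not Ceylon, and not the author's code) decodes a two-byte UTF-8 sequence twice: once with masks written in binary, once with the same masks translated to decimal. The function names are made up for the comparison.

```python
def decode_two_byte(b1: int, b2: int) -> int:
    """Decode a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point."""
    # Masks in binary mirror the bit layout given in the UTF-8 spec.
    if (b1 & 0b1110_0000) != 0b1100_0000:
        raise ValueError("not a 2-byte lead byte")
    if (b2 & 0b1100_0000) != 0b1000_0000:
        raise ValueError("not a continuation byte")
    return ((b1 & 0b0001_1111) << 6) | (b2 & 0b0011_1111)


def decode_two_byte_decimal(b1: int, b2: int) -> int:
    """Same logic with decimal masks; each one needs a comment to be readable."""
    if (b1 & 224) != 192:  # 0b1110_0000, 0b1100_0000
        raise ValueError("not a 2-byte lead byte")
    if (b2 & 192) != 128:  # 0b1100_0000, 0b1000_0000
        raise ValueError("not a continuation byte")
    return ((b1 & 31) << 6) | (b2 & 63)  # 0b0001_1111, 0b0011_1111


# U+00E9 is encoded in UTF-8 as the bytes C3 A9.
assert decode_two_byte(0xC3, 0xA9) == 0xE9
assert decode_two_byte_decimal(0xC3, 0xA9) == 0xE9
```

Both versions compute the same result. The second is the "decimal plus comment" style the post describes as clumsy and error-prone.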
Now, the syntax in Java is `0x` prefix for hexa, `0` prefix for octal and `0b` prefix for binary. I find the `0` prefix error-prone (if traditional) so I suggest we go for `0o` (little "o" for Octal). I guess I'd also be fine with `\x`, `\o` and `\b`.
The other question for those literals is that of postfix quantifiers and exponents. I just don't think they apply for those numbers, but hey, open to suggestions.
WDYT?