Literal strings (without interpolation) #11567

Closed
wavexx opened this issue Jun 3, 2015 · 25 comments
Labels
strings "Strings!"

Comments

@wavexx
Contributor

wavexx commented Jun 3, 2015

Am I wrong, or is there currently no way/macro in Base to enter a string without interpolation? I see @b_str, @r_str and @v_str.

I would definitely add some standard macro for that. Maybe @l_str for literal?
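
For illustration, here is a minimal sketch of what such a macro could look like. The name `l_str` is the one proposed above and is hypothetical, not something defined in Base. It relies on the fact that a string macro receives the text between the quotes without interpolation or escape processing. (Current Julia ships a `raw"..."` literal in Base that behaves similarly.)

```julia
# Hypothetical macro name taken from the proposal above; not part of Base.
# A string macro receives the literal text unprocessed, so returning it
# unchanged gives a string with no interpolation and no escape handling.
macro l_str(s)
    s
end

name = "world"
interpolated = "hello $name\n"   # interpolation and escapes are processed
literal = l"hello $name\n"       # `$name` and `\n` are kept verbatim

println(interpolated)
println(literal)
```

A `"` inside the literal still has to be escaped as `\"`, since it would otherwise end the string.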

@malmaud
Contributor

malmaud commented Jun 3, 2015

While we're at it, I wouldn't mind

macro u_str(s) UTF8String(s) end

@ScottPJones
Contributor

👍 to both ideas...

@cdsousa
Contributor

cdsousa commented Jun 4, 2015

@wavexx
Contributor Author

wavexx commented Jun 4, 2015

Yes, I saw that post. More than one package came up with the same solution, which is why something in base is warranted. In one of my packages (poyglot.jl), using inline Perl code is quite tedious because of that.

I would avoid "R" if possible, even though it would feel familiar from Python, since "r" is already used for regex. "l" for literal sounds good to me.

On top of that, l_str would allow us to "fix" #5135 the proper way: if you want to preserve whitespace in a multiline block, a literal string sounds perfect.

Although I see the point, could you elaborate with some examples on where u_str would be useful?

@ScottPJones
Contributor

I'd also like something that isn't totally raw, but doesn't do interpolation...
(i.e. closer to C/C++/Java/Javascript... string literals)

@wavexx
Contributor Author

wavexx commented Jun 4, 2015

You mean by parsing only backslash sequences? Should we call it a "quoted" string? (q"")?

" -> interpolated/backslash
""" -> multiline/interpolated/backslash/de-indent
l" -> raw/raw multiline (no interpolation, no backslash, no de-indent)
q" -> backslashes only?
q""" -> what about de-indent in this one? (I would say de-indent is pretty useful generally for multiline blocks - this is something I hate in Perl which doesn't do it).
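
The "backslashes only" variant from the list above can be sketched as follows. The `q` prefix is hypothetical and not part of Base; the sketch assumes `Base.unescape_string` is an acceptable stand-in for the escape handling of ordinary string literals, and it does not address the de-indent question.

```julia
# Hypothetical `q` prefix from the list above; not part of Base.
# Base.unescape_string turns backslash sequences such as \n and \t into the
# characters they denote. `$` is not special to it, so it stays literal.
macro q_str(s)
    unescape_string(s)
end

quoted = q"cost:\t$5\n"   # real tab and newline, literal dollar sign
print(quoted)
```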

@ScottPJones
Contributor

Yes, you've got it exactly! 👍

@wavexx
Contributor Author

wavexx commented Jun 4, 2015

I can take a stab at this; it should be pretty trivial. But some feedback from @StefanKarpinski @pao (from issue #5135, #5200) would be nice.

IMHO we should keep prefixes to a minimum, but q/l seem pretty useful to me. I'm still not convinced about u for unicode (python went down that road already). Some arguments would be nice to hear.

@ScottPJones
Contributor

The C/C++ standard is: u8 for UTF-8, u for UTF-16, and U for UTF-32, and L for wchar_t (either UTF-16 or UTF-32). It might be nice if Julia had all of these, and maybe an a for ASCII (so that it could be checked to see if it really was just an ASCII string)... it would help interfacing with C a lot.
I would probably make these interpolation free...

@pao
Member

pao commented Jun 4, 2015

(how'd I get involved, I was just acting as the collective unconscious!)

But yeah, a true "do what I say, not what I mean" string literal macro seems almost reasonable to have. Problem: we don't have arbitrary string delimiters, so you still have to escape the " character.

@ScottPJones
Contributor

I don't really need the "totally" raw format, but I think a set of string macros compatible with the C/C++ definitions, without interpolation, would be useful, and would also help type stability.
With a plain "string", you can't tell whether it will be an ASCIIString or a UTF8String... If you had these prefixes, you could have:
a"string" (get an error if not valid ASCII)
u8"string" (comes back as UTF8String, even if it doesn't have any characters > 0x7f)
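
As a rough sketch of the ASCII-checking prefix described above: the `a` prefix is hypothetical and not part of Base, and the sketch assumes current Julia, where `Base.ascii` returns its argument if it is pure ASCII and throws an `ArgumentError` otherwise. Like the other string macros, it performs no interpolation.

```julia
# Hypothetical `a` prefix from the comment above; not part of Base.
# Base.ascii returns the string if it is pure ASCII and throws an
# ArgumentError otherwise, so the check runs when the macro is expanded.
macro a_str(s)
    ascii(s)
end

ok = a"plain text"

# The same check applied at run time to a string that is not pure ASCII:
rejected = try
    ascii("naïve")
    false
catch err
    err isa ArgumentError
end
```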

@wavexx
Contributor Author

wavexx commented Jun 4, 2015

On 06/04/2015 05:43 PM, pao wrote:

> (how'd I get involved, I was just acting as the collective
> unconscious!)

Sorry :), you made pretty good points in other issues.

> But yeah, a true "do what I say, not what I mean" string literal
> macro seems almost reasonable to have. Problem: we don't have
> arbitrary string delimiters, so you still have to escape the "
> character.

True, but if you have many of them, l""" would still be a pretty good
compromise.

I have mixed feelings about arbitrary delimiters (mostly on the negative
side).

@ScottPJones
Contributor

I also dislike arbitrary delimiters...
Oh, and I just thought of something... for the C/C++ style macros, I think you could still have interpolation, just with a different syntax, i.e. \(...), which is used in other languages for interpolation... since \ is already an escape, that sequence won't normally ever be used, since ( doesn't need escaping.

@cdsousa
Contributor

cdsousa commented Jun 4, 2015

We can also have postfix flags, as in regexes http://julia.readthedocs.org/en/latest/manual/strings/#regular-expressions, to further set the string behavior...

@kmsquire
Member

kmsquire commented Jun 5, 2015

Relevant previous discussion: https://groups.google.com/d/topic/julia-users/956hxyU03hc/discussion

@JeffBezanson
Member

I always found it really obscure to have lots of different string prefixes. It's hard to remember what they mean, and nearly impossible to guess. Some could be made quite clear, e.g. ascii"str" and utf8"str". I think the extra keystrokes are worth it there.
String interpolation is just unfortunate. As some of you may not yet know, it's probably the only feature of julia I really dislike.

@ScottPJones
Contributor

OK, I could live with ascii, utf8, utf16, utf32... the only advantage in my mind for u8, u, and U was that they match C and C++.

I was really surprised when Stefan said that recently... my main beef with it is the syntax... I'd not be so unhappy with it if 1) it were limited to strings with a particular prefix, and 2) it used the syntax \(...)
so as not to "eat" a character that doesn't need to be quoted in most all languages...
What are your objections to it?

@StefanKarpinski
Member

Don't get him started. Also, @JeffBezanson, I'm pretty sure you don't like multiline comments either.

@wavexx
Contributor Author

wavexx commented Jun 5, 2015

On 06/05/2015 05:36 AM, Scott P. Jones wrote:

> OK, I could live with ascii, utf8, utf16, utf32... the only
> advantage in my mind for u8, u, and U was that they match C and
> C++.

I would be in favor of short identifiers, if these happen to be
required. Long identifiers are better suited for non-base code IMHO.
That assumes you only have a few in base, though.

> I was really surprised when Stefan said that recently... my main beef
> with it is the syntax... I'd not be so unhappy with it if 1) it were
> limited to strings with a particular prefix, and 2) it used the
> syntax \(...) so as not to "eat" a character that doesn't need to
> be quoted in most all languages... What are your objections to it?

I don't have anything for or against $ or \(). \() would clash
less, but depending on how you see it, $var is very convenient for quick
string concatenation like "$a/$b/$c" vs "\(a)/\(b)/\(c)" (at this point
I would just use string()).
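
To make the comparison concrete, here is a small sketch of the same concatenation written with `$` interpolation and with plain function calls. The variable names and values are made up for illustration.

```julia
a, b, c = "usr", "local", "bin"

with_interpolation = "$a/$b/$c"              # interpolation inside the literal
with_string = string(a, "/", b, "/", c)      # explicit concatenation
with_join = join((a, b, c), "/")             # join with a separator

println(with_interpolation)
println(with_string)
println(with_join)
```

All three produce the same string; the difference is only in how much of the logic lives inside the literal.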

@JeffBezanson
Member

Multiline comments are nowhere near the problem that string interpolation is. There is no comparison. Code just doesn't belong in strings.

@wavexx
Contributor Author

wavexx commented Jun 5, 2015

On 06/04/2015 07:01 PM, Cristóvão Duarte Sousa wrote:

> We can also have postfix flags, as in regexes
> http://julia.readthedocs.org/en/latest/manual/strings/#regular-expressions,
> to further set the string behavior...

Maybe this could actually be a better place to specify internal encoding?

l'str'u8 would request an utf8 literal?

I actually like postfix flags better ...

@ScottPJones
Contributor

How did string interpolation ever get into the language, if you dislike it so much? Did the rest of the "Gang of Four" gang up on you? ☺

@diegozea
Contributor

I ran into this problem while writing a multiline string with Perl code inside. For one-line strings, I can use ascii(b"...") to avoid the interpolation, but I can't do that for multiline strings because @b_mstr is not defined.

[image: perl_line]

@ivarne
Member

ivarne commented Jul 25, 2015

Luckily, the @*_mstr macros are gone in 0.4, in favor of custom parser behavior.
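
As a sketch of what this means in practice: with the parser handling triple quotes, one `*_str` macro serves both the single-line and the multiline form. The macro name `l_str` below is the hypothetical one proposed earlier in the thread, and the Perl snippet is made up for illustration.

```julia
# Hypothetical macro name from earlier in the thread; not part of Base.
# No separate `_mstr` macro is defined: the triple-quoted form calls the
# same `l_str` macro.
macro l_str(s)
    s
end

perl_code = l"""
my $x = 1;
print "value: $x\n";
"""

# `$x` and `\n` reach Perl untouched; single `"` characters need no escaping
# inside the triple-quoted form.
print(perl_code)
```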

@tkelman
Contributor

tkelman commented Jan 26, 2017

closed by #19900

@tkelman tkelman closed this as completed Jan 26, 2017