Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string macros #47

Merged
merged 4 commits into from
Oct 18, 2022
Merged

Add string macros #47

merged 4 commits into from
Oct 18, 2022

Conversation

mkitti
Copy link
Contributor

@mkitti mkitti commented Oct 12, 2022

This creates a string macro to aid in inline string construction.

julia> using InlineStrings

julia> inline"Hello world"
"Hello world"

julia> typeof(ans)
String15

julia> inline"Hello world"31
"Hello world"

julia> typeof(ans)
String31

julia> @allocated inline"This is a string macro"
0

julia> @allocated inline"This is a string macro"31
0

julia> @allocated InlineString("This is not a string macro")
48

julia> @allocated String31("This is not a string macro")
0

julia> inline"String interpolation does not work: $x"
"String interpolation does not work: \$x"

@quinnj
Copy link
Member

quinnj commented Oct 12, 2022

Hmmmm, I think I'd prefer lowercase? inline"this is a string"31? Thoughts @nickrobinson251/@bkamins?

@mkitti
Copy link
Contributor Author

mkitti commented Oct 12, 2022

I do not think these test failures are related to this change.

@nickrobinson251
Copy link
Collaborator

Thank you for contributing! Glad to have another person interested in the package 😄

I'm just pretty much against macros where avoidable so I should probably defer to others on whether we add this or not (i'd tentatively be against, but it's not much code to maintain and perhaps some users like macros so 🤷)

What's the benefit of the string macro in this case?

I tend to think of macros as mostly providing nicer syntax.
e.g. r"\w+" at least saves letters over Regex("\w+"), likewise tz"UTC" over TimeZone("UTC").
But we already have e.g. String7("foobar") which is shorter (and seems much clearer / less magic to me) than inline"foobar"7. I suppose in the case where we pick the size for you, inline"x" is a bit shorter than InlineString("x")...

If we do add this, i agree with Jacob on preferring the lowercase spelling :)

@bkamins
Copy link

bkamins commented Oct 12, 2022

I prefer lowercase spelling.

However, the major question is in what context do you expect to use these macros in practice. My experience is that these strings are useful if one works with collections of them (usually large collections). In such cases the key thing for performance is having uniform type of strings in the whole collection (I raise this point as this is not guaranteed by this macro - and cannot be).

In summary - could you give some practical example when you think that having such macro would be handy?

@quinnj
Copy link
Member

quinnj commented Oct 12, 2022

I tend to think of macros as mostly providing nicer syntax.

The other benefit of macros is that you can do work at macro-expansion-time instead of at runtime, which is another important benefit of the Regex example you provided. i.e. we compile the regex expression at macro-expansion-time once instead of every time we encounter the regex at runtime.

In a similar vein, but as far as I can tell not implemented as such here, you could do the InlineString construction at macro-expansion-time instead of doing it at runtime. To do that, we'd want to do a full InlineString(str) and then inject the result of that into the quote block. But, you have to be careful with interpolated strings and such because those rely on runtime parameters and such.

@mkitti
Copy link
Contributor Author

mkitti commented Oct 13, 2022

I updated the macros so they are now lowercase and more is done while building the expression.

@bkamins
Copy link

bkamins commented Oct 13, 2022

@mkitti - my question about use-case was important (at least for me). Whether the design you propose is OK will depend on the intended usage scenarios. What are the situations where you assume you would use these macros?

@mkitti
Copy link
Contributor Author

mkitti commented Oct 13, 2022

What are the situations where you assume you would use these macros?

Inline string literals

The way I view this is a way to create "inline string" literals that are interpreted as raw strings. Currently, one has to first construct a String and then pass it to an InlineStrings constructor. These macros allow for the direct construction of inline strings without String intermediates.

julia> using InlineStrings

julia> A = String15[]
String15[]

julia> push!(A, inline"Hello world"15)
1-element Vector{String15}:
 "Hello world"

julia> push!(A, inline"Bye"15)
2-element Vector{String15}:
 "Hello world"
 "Bye"

julia> push!(A, inline"Hasta Luego"15)
3-element Vector{String15}:
 "Hello world"
 "Bye"
 "Hasta Luego"

One practical benefit of this is avoiding unnecessary allocation during construction.

julia> @allocated inline"This is a string macro"
0

julia> @allocated InlineString("This is not a string macro")
48

Consider how the two forms differ in compilation:

julia> f() = inline"This is a string. Hello!"
f (generic function with 1 method)

julia> g() = String31("This is a string. Hello!")
g (generic function with 1 method)

julia> @code_llvm f()
;  @ REPL[30]:1 within `f`
; Function Attrs: uwtable
define void @julia_f_731(i256* noalias nocapture sret(i256) %0) #0 {
top:
  store i256 38178759162904981154304547648621834361872905465358515510743786238036441825304, i256* %0, align 8
  ret void
}

julia> @code_llvm g()
;  @ REPL[31]:1 within `g`
; Function Attrs: uwtable
define void @julia_g_733(i256* noalias nocapture sret(i256) %0) #0 {
top:
  %1 = alloca i256, align 8
  call void @j_String31_735(i256* noalias nocapture nonnull sret(i256) %1, {}* inttoptr (i64 7549763600 to {}*)) #0
  %2 = load i256, i256* %1, align 8
  store i256 %2, i256* %0, align 8
  ret void
}

This is not merely superficial cosmetics. There are real practical and tangible implications of using these macros.

Homoiconicity

The second consideration, not offered in this pull request, is changing how InlineStrings are displayed (as opposed to printed).

At the moment inline strings are displayed exactly like their String counterparts. One cannot tell the type of string from the display below.

julia> String15("Hello world")
"Hello world"

It would be clearer that the result is an inline string if this were displayed as if it was a literal.

julia> String15("Hello world")
inline"Hello world"15

This would make the display of the inline string types homoiconic.

Regular expression strings provide a precedence for this.

julia> r"Hello"
r"Hello"

@nickrobinson251
Copy link
Collaborator

On the display change idea, i wouldn't be in favour of changing the default "pretty printing" which shows inline strings the same as regular strings, but i'd support repr(s) showing how an inline string s would actually be constructed, like #52

Perhaps we could discuss that idea further in an issue or on #52 to not distract from this PR :)

@bkamins
Copy link

bkamins commented Oct 13, 2022

There are real practical and tangible implications of using these macros.

This is true. Sorry for not being precise. My question was in what cases these tangible benefits matter. I.e. do we have use cases where these savings matter? It would have to be cases where there are millions of such literals (most likely such a literal is defined in a function that is called millions of times). Do they happen in practice?

Thinking of this for example using a pattern:

f1(x) = x == String7("abcdefg")
f2(x) = x == inline"abcdefg"

And here f2 will be indeed faster than f1, so it makes sense to have it.

changing how InlineStrings are displayed

This is something that indeed I sometimes users want. However, if we went for this then I would feel that inline"Hello world"15 should mean 15 exactly not 15 or more. I understand why in your macro you went for this widening, but the question - again - is if it is needed. In other words in what use cases 15 or more is better than 15 exactly. For example if the use-case is pushing to a type stable container then 15 exactly is I think a better design.

@mkitti
Copy link
Contributor Author

mkitti commented Oct 13, 2022

Do they happen in practice?

A practical use case is in situations where allocations are not allowed at all. For example, in the use of GPUCompiler.jl on GPUs or StaticCompiler.jl / StaticTools.jl on embedded hardware we might to strictly avoid allocations due to the lack of a garbage collector. Despite inline strings themselves not requiring allocations, the current constructors require allocation.

Here we're trying to enable applications which may be quite difficult with the current constructors.

This is something that indeed I sometimes users want. However, if we went for this then I would feel that inline"Hello world"15 should mean 15 exactly not 15 or more. I understand why in your macro you went for this widening, but the question - again - is if it is needed. In other words in what use cases 15 or more is better than 15 exactly. For example if the use-case is pushing to a type stable container then 15 exactly is I think a better design.

In the case of display, it does mean 15 exactly. The limitation in this package is that only values of 2^N-1 where N a nonnegative integer are valid constructions. For other values, we cannot guarantee it will be exactly that value. However, it should consistently round up to the same number, the nearest 2^N-1 value greater than the given value. That logic is implemented by InlineStringType.

julia> using InlineStrings

julia> inline"abcdefg"11
"abcdefg"

julia> typeof(ans)
String15

julia> inline"hello"11
"hello"

julia> typeof(ans)
String15

@bkamins
Copy link

bkamins commented Oct 14, 2022

Ah - now I have looked at implementation. Sorry for not checking this earlier. So you guarantee type stability here when you allow passing a number.
So I am in favor of adding these macros.

What I would discuss though if it were not better to just define macros: inline1, inline3, inline7, inline15, inline31, inline63, inline127, and inline255 instead of allowing any number - which can be confusing. (+ inline macro that would be flexible)

See even at our discussion - when I was saying "exactly" I meant that for a single number passed it is always the same InlineStringX type would be picked (which is what you propose to do and I was not precise enough) while you understood I mean "exactly that number of codeunits".

@mkitti
Copy link
Contributor Author

mkitti commented Oct 14, 2022

In that case, perhaps inline"<str>"N should error if N is not 2^n-1? Then we only need one macro.

@bkamins
Copy link

bkamins commented Oct 14, 2022

We could. Let us wait for others to comment as this is a matter of style.
I felt that having a suffix after a string (especially if the string were long) is harder to read and having several more macros would not that big problem.

Thank you for submitting this PR.

@quinnj
Copy link
Member

quinnj commented Oct 17, 2022

yeah, I agree on preferring the inline3/inline5/etc names instead of the trailing number args. As mentioned, these numbers are fixed, so it makes sense to me to just bake them into the macro names. I'm also fine adding an inline"..." form that would figure out the size automatically so you don't have to be counting characters all the time.

@mkitti
Copy link
Contributor Author

mkitti commented Oct 17, 2022

I made individual macros for 2^N-1

julia> inline1"a"|> typeof
String1

julia> inline3"a"|> typeof
String3

julia> inline7"a"|> typeof
String7

julia> inline15"a"|> typeof
String15

julia> inline31"a"|> typeof
String31

julia> inline63"a"|> typeof
String63

julia> inline127"a"|> typeof
String127

julia> inline255"a"|> typeof
String255

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nickrobinson251
Copy link
Collaborator

can we drop the inline"str"N macros (with the N suffix)?

@mkitti
Copy link
Contributor Author

mkitti commented Oct 17, 2022

Is it weird that the canonical type is called String15 but the macro is inline15?

@quinnj
Copy link
Member

quinnj commented Oct 17, 2022

can we drop the inline"str"N macros (with the N suffix)?

Hmmm, yeah, it doesn't seem like this form is as useful now that we have inline15"...". Technically, the N form would allow you to do inline"abc"10, but I'm not sure that's useful at all? i.e. you'd still end up with an InlineString15, but you have to statically specify 10 to get it to pick InlineString15. So I'm in favor of removing.

Is it weird that the canonical type is called String15 but the macro is inline15?

Yeah, we could consider naming these string15"...", but then we have the inconsistency of having inline"...", which I personally believe will be the most commonly used of all these. So for that reason, I think I prefer staying with inline15"...".

@nickrobinson251
Copy link
Collaborator

i also prefer inline" as the name, and even tend to use the more vebose InlineString names, as it allows grepping for "inline" in a codebase which i've found useful more than once.

Thanks again for the PR -- now that things are done at macro-expansion time and can help avoid allocations i think this is a really nice addition!

@@ -277,6 +295,28 @@ end
InlineString(x::InlineString) = x
InlineString(x::AbstractString)::InlineStringTypes = (InlineStringType(ncodeunits(x)))(x)

"""
inline"string"
inline"string"N
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also not add inline"string"N. If it is really needed we can add it later, but I do not feel it is useful given the other definitions we add.

@@ -277,6 +295,28 @@ end
InlineString(x::InlineString) = x
InlineString(x::AbstractString)::InlineStringTypes = (InlineStringType(ncodeunits(x)))(x)

"""
inline"string"
inline"string"N
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also not add inline"string"N. If it is really needed we can add it later, but I do not feel it is useful given the other definitions we add.

@mkitti
Copy link
Contributor Author

mkitti commented Oct 17, 2022

The N suffix makes it a lot easier to build a second layer macros or metaprogramming on top of this. Consider the following:

julia> x = 15
15

julia> quote
           @inline_str "Hello World" $x
       end
quote
    #= REPL[35]:2 =#
    inline"Hello World"15
end

julia> eval(ans)
"Hello World"

julia> typeof(ans)
String15

The only alternative I know of is to build the macro call expression:

julia> x = 15
15

julia> e = Expr(:macrocall, Symbol("@inline$(x)_str"), LineNumberNode(0), "Hello world")
:(inline15"Hello world")

julia> eval(e)
"Hello world"

julia> typeof(ans)
String15

@bkamins
Copy link

bkamins commented Oct 17, 2022

Yes, but the question is when would one need them in practice (as I have commented - these macros can be added later if they are needed). However, if you indeed need then now then I think they can be added.

@mkitti
Copy link
Contributor Author

mkitti commented Oct 18, 2022

I suppose we should export all of the new macro forms. edit: Already done.

@quinnj quinnj merged commit b9c58bb into JuliaStrings:main Oct 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants