Subtle changes to `sprintf` in 6.e

With RakuAST getting closer to being the default backend, it was time
for me to revisit the work I did on `sprintf` about 3 years ago.  That
work was halted because synthetically generated code could not be used
in precomp files, which is now fixed, thanks to @ugexe++.

## Background
Perhaps some background first: the current (legacy) implementation of
`sprintf` actually lives in NQP and has a number of known issues:

1. slow

Because *every* call to `sprintf` takes the format string, parses it
using the grammar in NQP and produces a string from the result of that
parse.

2. errors

Several nooks and crannies produce incorrect strings, e.g.
`sprintf("%#-08.2f",0)` produces `"0000.000"` whereas it should produce
`"00000.00"`.  Or `sprintf("%#o",-64)` produces `"0-100"`.

3. unimplemented features

Some more obscure features are not implemented, such as `#` in `%f`
(show decimal point even if the precision is 0), or `%F` (uppercase
Inf and NaN).

4. Native `num` based

The NQP implementation is based on native `num`s and is thus limited in
its precision.
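
To illustrate the kind of precision limit meant here, a minimal sketch that is independent of `sprintf` itself: an `Int` that takes a round trip through `Num` does not necessarily come back unchanged.  The variable name is only for illustration.

```raku
# An Int has arbitrary precision; a Num is a 64-bit floating point value.
my $big = 2**64 + 1;
say $big;                    # 18446744073709551617

# Converting to Num drops the low-order digits that do not fit.
say $big.Num.Int == $big;    # False
```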

## Re-imagine

So with all of that known now (well, actually I only knew about 1.) it
felt like an excellent opportunity to create a new implementation using
RakuAST.

The big difference with the legacy implementation is that the RakuAST
implementation only runs the `sprintf` grammar **once** and from that
creates a `Callable`.  A simple example: the format string `"foo%xbar"`
creates:
```raku
-> Int() $a! {
    ("foo", $a ?? $a.base(16).lc !! "0", "bar").join
}
```

The advantage to this approach is that once the `Callable` has been
created, it is a piece of code that will be executed (and runtime
optimized) like any other Raku code.  And in principle, the above
format could be converted to a `Callable` at compile time, because
there are no dynamic parts in the format string (which would happen
at the optimize stage in RakuAST, which is still quite underdeveloped
at the moment).
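
The "parse once, then reuse the `Callable`" idea can be sketched in plain Raku.  This is a toy under stated assumptions, not the RakuAST implementation: `compile-format`, `%compiled` and `$compilations` are hypothetical names, only `%s` is supported, and a simple `split` stands in for running the grammar.

```raku
# Hypothetical sketch, NOT the actual implementation: handles only %s.
my %compiled;          # format string => generated Callable
my $compilations = 0;  # how often a format string had to be processed

sub compile-format(Str:D $format) {
    unless %compiled{$format}:exists {
        ++$compilations;
        # Done once per format: the literal pieces around each %s.
        my @literals = $format.split('%s');
        %compiled{$format} = sub (*@args) {
            my $out = @literals[0];
            $out ~= @args[$_ - 1] ~ @literals[$_] for 1 ..^ @literals.elems;
            $out
        }
    }
    %compiled{$format}
}

my &format-zip = compile-format("zip%s");
say format-zip("foo");                 # zipfoo

compile-format("zip%s") for ^1000;     # same format: nothing new to compile
compile-format("zip$_") for ^1000;     # 1000 different formats
say $compilations;                     # 1001
```

The counter shows why the approach pays off for a static format string, and also why it does not when the format string itself keeps changing.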

Anyways, [I blogged about it at the time](https://dev.to/lizmat/moving-printf-formats-forward-1m3p).

## Issues / Worries

1. %s vs $foo

Parsing a grammar and turning a format string into executable code
is quite a bit more involved than parsing a grammar to create a string.
So an old optimization for just interpolating a string was to use a
double quoted string, instead of using `%s` in the format and passing
the string as an argument.

This optimization is still valid (to an extent) **if** the content of
the variable doesn't change.  However, if the content of that variable
does change, creating a `Callable` for each new value can become a
"burden".  Some timings:
```
$ time raku -e 'use v6.d; my $a = "foo"; Nil for ^100000'
real    0.10s
$ time raku -e 'use v6.d; my $a = "foo"; sprintf("zip%s",$a) for ^100000'
real    1.92s
$ time raku -e 'use v6.d; my $a = "foo"; sprintf("zip$a") for ^100000'
real    0.81s
$ time raku -e 'use v6.d; sprintf("zip$_") for ^100000'
real    0.81s
$ time raku -e 'use v6.*; my $a = "foo"; sprintf("zip%s",$a) for ^100000'
real    0.19s
$ time raku -e 'use v6.*; my $a = "foo"; sprintf("zip$a") for ^100000'
real    0.16s
$ time raku -e 'use v6.*; sprintf("zip$_") for ^100000'
real    59.65s
```
This shows that the new `sprintf` implementation, even *without* compile
time caching of the `Callable`, is roughly 10x to 20x as fast as the
old implementation once the 0.10s loop overhead is subtracted,
**except** when the format string is not static, in which case the new
implementation is about 80x **slower**.

So I feel that might become a *gotcha* for some code in the ecosystem,
when people are expecting to have things go faster, but in fact would
only see a *very significant* slowdown.

> In RakuAST at CHECK time, it should be possible to issue a "worry"
> for the case when a `sprintf` is being called with a dynamic string.
> However, as seen above, if the variable in there is not changing its
> value often, it may well be a valid optimization.  So YMMV.
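
A minimal sketch of the two styles side by side, assuming nothing beyond the `sprintf` calls already used in the timings (the `@names` list is made up for illustration).  Both produce the same strings; the difference is only in whether the format string stays the same between calls.

```raku
my @names = <foo bar baz>;

# Static format string: the same format on every iteration.
my @static  = @names.map: { sprintf("zip%s", $_) };

# Dynamic format string: a different format on every iteration.
my @dynamic = @names.map: { sprintf("zip$_") };

say @static.join(",");     # zipfoo,zipbar,zipbaz
say @dynamic.join(",");    # zipfoo,zipbar,zipbaz
```

Note that the interpolated form also treats any `%` inside the value as part of the format, so the two styles are only interchangeable for values that contain no `%`.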

2. Stricter binding

As seen in the generated code example for `"foo%xbar"`, the argument
uses coercion to create the `Int` value that is needed to convert it
to a hex representation.
```raku
-> Int() $a! {
    ("foo", $a ?? $a.base(16).lc !! "0", "bar").join
}
```
However, this creates a difference in behaviour for type objects
(a difference that is actually "enshrined" in some spectests):
```
$ raku -e 'use v6.d; dd sprintf("foo%x",Num)'
Use of uninitialized value of type Num in numeric context
  in block <unit> at -e line 1
"foo0"
$ raku -e 'use v6.*; dd sprintf("foo%x",Num)'
Cannot create an Int from a 'Num' type object
  in block <unit> at -e line 1
```
The stricter coercion rules make passing the `Num` type object an
execution error, because `Num.Int` is an execution error.

One could argue that this is an improvement, as in the legacy
implementation it would just create a warning, and produce `"0"`
and thus hide a problem in the input data.

On the other hand one could argue that this is a regression.

> If we're going to consider this a regression, then additional
> checking code would have to be inserted, which would slow down
> the general execution of `sprintf`.  And *if* we're going to
> do that, we would need to decide whether this is going to apply
> to the `sprintf` function only, or also to the new `q:o//`
> adverb in quoting.
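
Whichever way that decision goes, calling code can make the fallback explicit so that it does not depend on how `sprintf` treats type objects.  A minimal sketch, where `$value` is a hypothetical variable standing in for missing input data:

```raku
my $value = Num;    # hypothetical: a type object instead of a real number

# Supply the default at the call site instead of relying on sprintf.
say sprintf("foo%x", $value // 0);    # foo0
say sprintf("foo%x", 255);            # fooff
```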

## Suggestions

I think I covered all possible issues with regards to `sprintf`
in 6.e.  But I probably missed a few.  So please comment!

