New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding formatted string output #108

Closed
NOP0 opened this Issue Nov 25, 2017 · 14 comments

Comments

Projects
None yet
4 participants
@NOP0

NOP0 commented Nov 25, 2017

Gitter discussion suggests that possible solution could be printf macro.

@hellerve

This comment has been minimized.

Contributor

hellerve commented Nov 30, 2017

I have a proposal for this in a feature branch that works with interfaces. Let me describe it and see whether it does what we need, is okay to look at, and makes sense.

It registers an interface format of type (λ [String a] String) that specializes on the second argument. A motivating example:

> (Float.format @"%.4d" 10.0f)
=> "10.0000"
> (Int.format @"%x" 255)
=> "ff"

Why do we need to define this interface for any type when it is only a thin wrapper around snprintf? Well, I’m glad you asked. This is needed to preserve type hygiene; otherwise we’d have to introduce some sort of catch-all type into the interop, and that doesn’t sound like a great idea.

There are some open questions and concerns from my side (feel free to add to the list):

  • is this good enough? While I feel like this interface is nice to have, it might become tedious to print more complicated expressions. This can be partially solved by a simple macro, if we made, say, an interface like this: (fmt "%x" 255 " %.4d" 100.0f). I didn’t add that macro, because I couldn’t settle on a good API.
  • having user-defined strings in *printf functions is a common security hole. We’ll have to make sure that we document that having a potential attacker define the format string is insecure.
  • what about compound types? Since there is no ready-made solution from *printf for structs, I’m not sure how to best design such an interface. Keep in mind that we already have str, and we are able to format the properties of a type by themselves, but maybe a syntax for structs would be interesting? That’s kind of a moonshot, though, I think, because it would complicate the implementation a lot.

I think that’s all!

Cheers

@jiegec

This comment has been minimized.

jiegec commented Nov 30, 2017

I think we should add variadic function first?

@hellerve

This comment has been minimized.

Contributor

hellerve commented Nov 30, 2017

The problem is that in this case that wouldn’t help, because printf is not only variadic, but also has a list of catch-all types as parameters. Unless I’m missing something profound, there is no valid signature for printf that doesn’t use the varargs ellipsis.

@jiegec

This comment has been minimized.

jiegec commented Nov 30, 2017

What about implement a printf-like macro that convert to (concat (Int.fmt "%d" a) (Char.fmt "%c" b))? Macro can have varargs (and maybe compile-time type check, which some c/c++ compilers already does and shows a warning when type mismatched).

@hellerve

This comment has been minimized.

Contributor

hellerve commented Nov 30, 2017

I think we talked about that idea on the chat! I do agree that this sounds good, but have one major concern:

It would require introspection and parsing of the string, which is not currently possible inside the macro. We could do it at runtime, but that would be a terribly costly idea. Why would we need that? Because it needs to split the format strings on the % characters; and it’d also have to take care of the %% special case and such. That’s why I talked about an already-split fmt macro in my first comment. I would also try to optimize it, but bear in mind that it would generate a string for every argument and then merge them, which sounds like a fairly slow operation, even without benchmarking it.

As I said, I’m not sure the format interface solves our problems. But I think it is a step in the right direction.

@jiegec

This comment has been minimized.

jiegec commented Nov 30, 2017

Oh, I didn't join the gitter chat and missed those conversations.

@hellerve

This comment has been minimized.

Contributor

hellerve commented Nov 30, 2017

That’s alright, it’s good that we document it here.

To recap, a macro that analyzes the string would be awesome, but sounds like a lot of work. I’m not opposed to work per se, but I would like to make sure we do not find a simpler version that is satisfying our requirements—which also includes a good, clean, intuitive API of course.

In general, I also think that while it’s important to think of a solid system to implement, it is empowering to realize that the system might change again before 1.0 if we find a better way to do things.

@eriksvedang

This comment has been minimized.

Collaborator

eriksvedang commented Nov 30, 2017

What (dynamic) functions would a macro that analyses the format string need access to? (roughly)

It's probably pretty handy functions in general, so might be worth adding those and give it a try.

@hellerve

This comment has been minimized.

Contributor

hellerve commented Nov 30, 2017

We would have to traverse the string and be able to split and merge it. I can sketch a function that does what we want and see what I’ll need for it.

In general, I’d opt for the functions being consistent for the ones we’ll implement in #94, to reduce confusion, even if the internal implementation will probably be quite different.

@eriksvedang

This comment has been minimized.

Collaborator

eriksvedang commented Nov 30, 2017

Yes agreed. I'll create a module, something like Dynamic.String so they can avoid clashing with the existing ones.

@hellerve

This comment has been minimized.

Contributor

hellerve commented Nov 30, 2017

I’ve attached an obviously untested prototype. This might work, but mostly it highlights which functions we need:

(defdynamic fmt-internal [s args]
  (let [idx (String.index-of s \%)
        len (String.count s)]
    (cond
      (= idx -1) s ; no more splits found, just return string
      (= \% (String.char-at s (inc idx))) ; this is an escaped %
        (list 'String.append
              (String.substring s 0 (+ idx 2))
              (fmt-internal (String.substring (+ idx 2) len)))
      (= 0 (count args)) ; we need to insert something, but have nothing
        (macro-error "error in format string: not enough arguments for format string")
      ; okay, this is the meat:
      ; get the next % after our escaper
      (let [next (String.index-of (String.substring s (inc idx) len) \%)]
        (if (= 0 next)
          (list 'fmt slice (car args))
          (let [offs (+ (inc idx) 1)
                slice (String.substring s 0 offs)]
            (list 'String.append (list 'fmt slice (car args))
                                 (fmt-internal (String.substring s offs len)
                                               (cdr args)))))))))

(defmacro fmt [s :rest args]
  (fmt-internal s args))

The functions we need are thus (substring str from to), (index-of str char), and (char-at str index). One thing about this is problematic: the appending actually happens at runtime—because of the fmt, which might not be evalable at macro expansion time—, thus leading to a fairly high number of appends and possibly tiny strings in memory. I’m also not at all sure about the soundness of the algorithm, but it might work.

I also attached a proof of concept/hopefully faithful transcription in Python for people to run if they want to test the behaviour:

def fmt(s, args):
    idx = s.find("%")

    if idx == -1:
      return s
    if s[idx+1] == "%":
      return s[:idx+2] + fmt(s[idx+2:], args)
    if not args:
      raise Exception("error in format string: not enough arguments for format string")

    nxt = s[idx+1:].find("%")
    if nxt == -1:
      return s + "(arg={})".format(args[0])
    slc = s[:nxt+1+idx]
    return slc + "(arg={})".format(args[0]) + fmt(s[nxt+1+idx:], args[1:])

The args are interspersed differently, of course, but it should work.

@eriksvedang

This comment has been minimized.

Collaborator

eriksvedang commented Nov 30, 2017

Excellent, that's a very good start! I'll try to make this convenient to write using the dynamic String functions. Will write here again when those are in place and works well enough.

@jiegec

This comment has been minimized.

jiegec commented Dec 1, 2017

We can force the format string to be a string literal as in Rust std::format!.

And we can add 'varargs of same type' function like:

// Force the type check on Carp side
// (register concat (\lambda [:rest String] String) (omitting the ref for now)
// (String.concat str1 str2 str3)
// expands to
// (String.concat_internal [str1 str2 str3])
void String_concat_internal(Array args) {
  ...
}
// then (format "%d %s" 6 s) can be expanded to
// (String.append (Int.to_string 6) " " s)

Alternatively, we can use a precompiled version of format string and its arguments as in Rust std::fmt::Arguments and pass it to internal functions to avoid additional costs.

hellerve added a commit to hellerve/Carp that referenced this issue Dec 26, 2017

hellerve added a commit to hellerve/Carp that referenced this issue Dec 30, 2017

@hellerve

This comment has been minimized.

Contributor

hellerve commented Dec 30, 2017

I’ve added #154 which puportedly fixes this. I hope this is what we want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment