Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

option to trim leading spaces #9

Open
joprice opened this issue Aug 20, 2021 · 5 comments
Open

option to trim leading spaces #9

joprice opened this issue Aug 20, 2021 · 5 comments

Comments

@joprice
Copy link

joprice commented Aug 20, 2021

Is your feature request related to a problem? Please describe.

When leading spaces are significant, the text must be fully justified to the left of the file to avoid including them:

   let ssdp_message =
      [%string
        {|M-SEARCH * HTTP/1.1
HOST:$group
MAN:ssdp:discover
MX:10
ST:urn:smartspeaker-audio:service:SpeakerGroup:1
|}]

Describe the solution you'd like

A separate extension or a special character (as the pipe character in multiline strings in scala) could be used to indicate the leading space should be removed

   let ssdp_message =
      [%string.trimmed
        {|M-SEARCH * HTTP/1.1
          HOST:$group
          MAN:ssdp:discover
          MX:10
          ST:urn:smartspeaker-audio:service:SpeakerGroup:1
        |}]

or

   let ssdp_message =
      [%string
        {|M-SEARCH * HTTP/1.1
          |HOST:$group
          |MAN:ssdp:discover
          |MX:10
          |ST:urn:smartspeaker-audio:service:SpeakerGroup:1
        |}]
@ksromanov
Copy link
Contributor

ksromanov commented Aug 21, 2021

Thanks for the suggestion. Yes, it makes sense.This is especially useful for C++ codegeneration like

let emptyTemplateTClass name =
  [%string.padded
     {|template <class T> class $name {
           $name() {}
           ~$name() {}
       };
      |}]
      
 let emptyTemplateTClass2 name =
  [%string.padded
     {|
       template <class T> class $name {
           $name() {}
           ~$name() {}
       };
      |}]
      
 let emptyTemplateTClass3 name =
  [%string.padded
     {|
      |template <class T> class $name {
      |    $name() {}
      |    ~$name() {}
      |};
      |}]
      
 let emptyTemplateTClass4 name =
  [%string.padded
     {|
      |template <class T> class $name {
           $name() {}
           ~$name() {}
       };
      |}]

 let emptyTemplateTClass5 name =
  [%string.padded "
      |template <class T> class $name {
           $name() {}
           ~$name() {}
       };
      "]
 
 let emptyTemplateTClass6 name =
  [%string.padded
     "
       template <class T> class $name {
           $name() {}
           ~$name() {}
       };
      "]

producing

template <class T> class $name {
     $name() {}
     ~$name() {}
};

I am not sure though if .trimmed is a good suffix. May be '.indented'?

Another question is how this would interact with ecosystem. Especially formatters.

@joprice
Copy link
Author

joprice commented Aug 21, 2021

I agree. I didn't like it when I wrote it. string.indented sounds good, also string.strip_leading, string.remove_padding, string.strip_margin. The last one is the terminology scala uses, but more specifically means that there is a character that is used to designate where the margin starts, which means the whitespace can differ per line.

Also, I haven't checked the implementation of the ppx to know if this could happen at compile time, but the scala version I mentioned is actually a runtime implementation: https://github.com/scala/scala/blob/v2.13.6/src/library/scala/collection/StringOps.scala#L765-L784. If these strings are created in a tight loop, maybe you would benefit from the compiler preprocessing them, but if they are created once at startup, then this can just be a library function instead and doesn't have to be in a ppx.

@ksromanov
Copy link
Contributor

I would prefer to use the name without verb, since it makes it lengthy and harder to type (people usually want to type less, if possible). So, string.padded or smth like that?

If it is possible, i.e. there are no blockers, which I do not see right now, it should be compile-time implementation, since it is even easier to implement than run-time.

I am mostly worried about the interaction with the ecosystem, especially formatters, which can rewrite many things and move the leading {|. I would prefer to have a clean simple semantics + minimum of visual noise. Therefore your first variant is better than Scala-like, though the latter is "formatter-proof".

I will think about this sometime next week if you do not mind - we are quite busy now.

@joprice
Copy link
Author

joprice commented Aug 21, 2021

No urgency on my end. It would be a convenient addition but not strictly necessary.

@ksromanov
Copy link
Contributor

We considered several variants of formatting with my colleague @gxmas , so I changed my comment above and included all of them. The ideal seems to be №2 (unfortunately it does not work with ocamlformat).

I have checked the behavior of ocamlformat. It can change the formatting inside the extension point like [%string] and move the first line of the multiline string. However, it can't move the second and subsequent lines of this string, because it changes semantics.

Also, Ocaml lexer does not extract extension point as a whole - see https://github.com/ocaml/ocaml/blob/trunk/parsing/lexer.mll#L553

This means, that Ocaml parser takes the payload as a stream of tokens, so the number of spaces and newlines between these tokens can be arbitrary - https://github.com/ocaml/ocaml/blob/trunk/parsing/parser.mly#L3855

In other words, ocamlformat can (and does for some profiles) legitimately rewrite

[%string 
"   Hello
    wonderful
    world!"]

into

[%string "    Hello
    wonderful
    world!"]

and back.

This means that we are:

  1. Bound to completely ignore the first line of the string (line with "-----" is ignored):
[%string.padded {|-----
    Hello World! |}] == "    Hello World! "
  1. We can either use complete Scala solution (see №3 above), or we can use one extra pipe in the beginning (№4 and №5 above), or use №2, but take into account only the position of the closing pipe (in |}).

@joprice what do you think about these variants?

I'll try to make a proof of concept of №4 within next couple weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants