Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use my textwrap crate for wrapping help texts #845

Merged
merged 3 commits into from
May 29, 2017

Conversation

mgeisler
Copy link
Contributor

@mgeisler mgeisler commented Feb 8, 2017

This PR replaces the wrap_help function with a call to my textwrap crate which does word wrapping using a simpler and faster algorithm. I've extended the 05_ripgrep benchmarks with a benchmark that tests generating the help text -- I figured that ripgrep would be a good example since it has lots of long strings in the help output. Comparing before and after looks like this on my machine:

 name              before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 build_app_long    22,101          21,099               -1,002   -4.53%   x 1.05 
 build_app_short   22,138          21,205                 -933   -4.21%   x 1.04 
 build_help_long   514,265         284,467            -229,798  -44.68%   x 1.81 
 build_help_short  85,720          85,693                  -27   -0.03%   x 1.00 
 parse_clean       23,471          22,859                 -612   -2.61%   x 1.03 
 parse_complex     29,535          28,919                 -616   -2.09%   x 1.02 
 parse_lots        422,815         414,577              -8,238   -1.95%   x 1.02 

I've also added a test for #617 to ensure that the output described there is kept unchanged. That is, textwrap will now by default keep horizontal whitespace when wrapping text and this allows you to do rudimentary formatting in the help text by manually inserting whitespace.


This change is Reviewable

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 91.689% when pulling ba6a6c2 on mgeisler:textwrap into 92919f5 on kbknapp:master.

@kbknapp
Copy link
Member

kbknapp commented Feb 9, 2017

Thanks for taking the time to do this, but I do have a few questions/concerns before this is merged:

  • Does text_wrap compile on Stable 1.11.0? If not, it would be a somewhat breaking change for clap to include it, as it increases the minimum version of Rust. According to claps compatibility policy, this is allowed, but the minor must be bumped (no big deal), but the key peice is as of this writing it must compile on stable 1.13.0 (Stable - 2 releases). If that can't be done, utililzing this feature needs to be done with the unstable cargo feature which includes all previous functionality if that feature is omitted.
  • How does text_wrap handle the whitespace alignment? I haven't dug into the source or documenation yet, so forgive me if it's listed clearly. I want to ensure clap isn't unknowingly taking a step back if we lose whitespace alignment, i.e.
    -f, --flag    some long text that needs to
                  be wrapped. Notice the
                  whitespace after each
                  newline

To be clear, I'm not saying this unmergable, just we need to ensure all the bases are covered first 😄

@mgeisler
Copy link
Contributor Author

mgeisler commented Feb 9, 2017

Hi Kevin, thanks for the comments!

Does text_wrap compile on Stable 1.11.0?

I'm very new to Rust so I didn't make any conscious choice about not supporting that version :-) Looking at this a bit indicates that it's the quote crate that is causing problems on Rust 1.11.0 -- it's being pulled in via the hyphenation crate.

I've tested with hyphenation version 0.5.0 and that version compiles all the way back to Rust 1.8.0. So I think I'll simply use that version -- the results seems to be the same, except that performance is worse (this affects only the words for which hyphenation is attempted, i.e., the first word that overflow a line). I didn't do anything to enable hyphenation in clap-rs -- but it could be a cool option to have, I think.

How does text_wrap handle the whitespace alignment? I haven't dug into the source or documenation yet, so forgive me if it's listed clearly.

It probably isn't listed very clearly... :-) It is very simple currently: it splits the string on whitespace and proceeds to add each word into lines, one by one. So

"foo    bar  \t  baz"

is treated just like "foo bar baz" with regard to wrapping.

From your example, I think you might be talking about an initial/subsequent indentation, though -- the string of whitespace on the second to last line? That kind of formatting is not supported yet, but I would like to have it: mgeisler/textwrap#26.

It doesn't seem to be necessary, though, since I'm just replacing the wrap_help function which also didn't do anything with indentation. When textwrap can do nice indented wrapping, then I think one can simplify the clap-rs code further by letting textwrap format an option completely.

@mgeisler
Copy link
Contributor Author

mgeisler commented Feb 9, 2017

Because all whitespace (including tabs, newlines, etc) is treated as a single space, #617 will reintroduced if this PR were to be merged -- as mentioned, I added a test for the issue and the test passes before and fails after this PR.

@kbknapp
Copy link
Member

kbknapp commented Feb 9, 2017

Ah ok, I understand now, I was under the impression textwrap would be handling the indentation as well. It makes no difference to me if it's handled in textwrap or clap so long as it's handled performantly 😉

I didn't do anything to enable hyphenation in clap-rs -- but it could be a cool option to have, I think.

Perhaps. I have mixed feelings on hyphenation, it's subjective but I find it harder to read when not required. A good use for it, IMO would be when the terminal size is small, or there isn't much room for wrapping on whole word boundaries. I would be good with hyphenation happening in those cases, but I think enabling it wholesale would be a bit much.

Also, if hyphenation isn't enabled, it'd nice to be able opt-out of pulling in related deps that aren't used. I don't know if this can be done in textwraps setup via cargo features, or not? Similar to how clap lets you opt out of color or wrap_help or suggestions to avoid pulling in deps frivilously.

[..] #617

I would like to maintain this abiltiy, so it'll be a blocker until textwrap can handle extra whitespace. I'm glad to hear it's on your mind though!


One last thing I forgot ask beforehand, does textwrap handled unicode boundaries? I've had issues before with homegrown wrapping solutions cutting unicode graphemes in half (current clap handles them correctly AFAIK).

@kbknapp
Copy link
Member

kbknapp commented Feb 9, 2017

Once this PR reaches a mergable state I'd like to do some comparison benchmarks too for reference

@mgeisler
Copy link
Contributor Author

mgeisler commented Feb 9, 2017

A good use for it, IMO would be when the terminal size is small, or there isn't much room for wrapping on whole word boundaries. I would be good with hyphenation happening in those cases, but I think enabling it wholesale would be a bit much.

Yeah, that makes a lot of sense! The user would have to tell clap which hyphenation patterns to use in any case, so I don't think it should be enabled unconditionally.

I don't know if this can be done in textwraps setup via cargo features, or not?

Nope, there's no support for this at the moment -- but I think you're right that it would make sense to add such a feature flag. The hyphenation dependency is rather large and pulls in many other dependencies.

I would like to maintain this abiltiy, so it'll be a blocker until textwrap can handle extra whitespace. I'm glad to hear it's on your mind though!

Very well, I have some ideas for how this could be supported -- I'll let you know when it's there. Feel free to close/park the PR until then.

One last thing I forgot ask beforehand, does textwrap handled unicode boundaries?

Yes, it should handle that fine -- mostly due to how it simply iterates over words from split_whitespace(). The word length is computed with the unicode_width crate, just like the old wrap_help function did.

I saw that there is a unicode_segmentation crate, but the way it splits things into "words" doesn't really match what I would expect -- some words can be adjacent in the input text. I don't know how it fares on languages like Japanese or Chinese, though -- with no whitespace, it will treat the entire text as one big word.

The next version of textwrap will have an option to break words longer than the available line width -- that might work okay for scripts where there are no whitespace between the characters.

@homu
Copy link
Contributor

homu commented Feb 21, 2017

☔ The latest upstream changes (presumably #867) made this pull request unmergeable. Please resolve the merge conflicts.

@mgeisler
Copy link
Contributor Author

I've fixed mgeisler/textwrap#36, so one can now use textwrap without the automatic hyphenation. I'll see about the indentation issue next and then resolve the merge conflicts here.

@mgeisler mgeisler force-pushed the textwrap branch 4 times, most recently from 3ff065d to 2478090 Compare May 22, 2017 22:29
@mgeisler mgeisler changed the title RFC: Use my textwrap crate for wrapping help texts Use my textwrap crate for wrapping help texts May 22, 2017
@mgeisler
Copy link
Contributor Author

Hi @kbknapp, I've finally updated this PR :-) With the newly released version 0.6.0 of textwrap, whitespace is new kept by default. This means the output in #617 is preserved.

I've also added a benchmark which shows that the new code is faster -- generating the large and complicated help text for ripgrep is now apparently about 45% faster than before.

In my Travis, I have tested textwrap back to Rust 1.8.0 and it works there when the optional hyphenation feature is not enabled: https://travis-ci.org/mgeisler/textwrap/.

I've updated the PR description above to match the new state. I took out the old cleanup commits from this PR -- I hope to submit them afterwards if this gets merged. Please let me know if there's anything missing!

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 92.121% when pulling 2478090 on mgeisler:textwrap into 2923515 on kbknapp:master.

@mgeisler
Copy link
Contributor Author

I see a failed job in Travis -- the job that tests with Rust 1.11.0. I believe that is because of
this change in the unicode-segmentation crate: unicode-rs/unicode-segmentation@aaf9da4#diff-dd3463da52dddbc4cbf510ba1b8c1fd2R715 which introduced the ? operator. That commit is part of the just released unicode-segmentation 1.2.0: https://crates.io/crates/unicode-segmentation/1.2.0.

I think that build failure is unrelated to my change :-) The ? operator became stable in Rust 1.13.0, so perhaps the minimum version could be bumped to that? Alternatively, one could change the dependency on unicode-segmentation to be <1.2.0 or something like that. Thoughts?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 92.121% when pulling 2478090 on mgeisler:textwrap into 2923515 on kbknapp:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.3%) to 92.121% when pulling bf0b157 on mgeisler:textwrap into 2923515 on kbknapp:master.

homu added a commit that referenced this pull request May 28, 2017
…r=kbknapp

cargo: restrict unicode-segmentation to ~1.1.0

Version 1.2.0 of unicode-segmentation adds code that use the "?"
operator, which in turn requires Rust 1.13.0. However, clap currently
still works with Rust 1.11.0 and this caused build failures:

  https://travis-ci.org/kbknapp/clap-rs/jobs/235010822

The changes since 1.1.0 seem to be related cursors/iterators and I
think clap can work fine without them.

This was found as part of #845. See also unicode-rs/unicode-segmentation#26.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/kbknapp/clap-rs/967)
<!-- Reviewable:end -->
@homu
Copy link
Contributor

homu commented May 28, 2017

☔ The latest upstream changes (presumably dd4c41e) made this pull request unmergeable. Please resolve the merge conflicts.

The ripgrep benchmarks work with lots of options and have really long
help texts. This makes them a good fit for judging the overall time it
takes to construct and format the help text.
This adds a test for the issue clap-rs#617 about keeping whitespace intact in
manually aligned text.
The textwrap crate uses a simpler linear-time algorithm for wrapping
the text. The current algorithm in wrap_help uses several O(n) calls
to String::insert and String::remove, which makes it potentially
quadratic in complexity.

Comparing the 05_ripgrep benchmark at commits textwrap~2 and textwrap
gives this result on my machine:

 name              before ns/iter  after ns/iter  diff ns/iter   diff %
 build_app_long    22,101          21,099               -1,002   -4.53%
 build_app_short   22,138          21,205                 -933   -4.21%
 build_help_long   514,265         284,467            -229,798  -44.68%
 build_help_short  85,720          85,693                  -27   -0.03%
 parse_clean       23,471          22,859                 -612   -2.61%
 parse_complex     29,535          28,919                 -616   -2.09%
 parse_lots        422,815         414,577              -8,238   -1.95%

As part of this commit, the wrapping_newline_chars test was updated.
The old algorithm had a subtle bug where it would break lines too
early. That is, it wrapped the text like

    ARGS:
        <mode>    x, max, maximum   20 characters, contains
                  symbols.
                  l, long           Copy-friendly,
                  14 characters, contains symbols.
                  m, med, medium    Copy-friendly, 8
                  characters, contains symbols.";

when it should really have wrapped it like

    ARGS:
        <mode>    x, max, maximum   20 characters, contains
                  symbols.
                  l, long           Copy-friendly, 14
                  characters, contains symbols.
                  m, med, medium    Copy-friendly, 8
                  characters, contains symbols.";

Notice how the word "14" was incorrectly moved to the next line. There
is clearly room for the word on the line with the "l, long" option
since there is room for "contains" just above it.

I'm not sure why this is, but the algorithm in textwrap handles this
case correctly.
@coveralls
Copy link

coveralls commented May 29, 2017

Coverage Status

Changes Unknown when pulling 12282e1 on mgeisler:textwrap into ** on kbknapp:master**.

@mgeisler
Copy link
Contributor Author

Yay, with the unicode-segmentation version fixed, the build passed.

@kbknapp
Copy link
Member

kbknapp commented May 29, 2017

Nice! Thanks for sticking with this and putting in all the work 😄 ❤️

@homu r+

@homu
Copy link
Contributor

homu commented May 29, 2017

📌 Commit 12282e1 has been approved by kbknapp

homu added a commit that referenced this pull request May 29, 2017
Use my textwrap crate for wrapping help texts

This PR replaces the `wrap_help` function with a call to my [textwrap][1] crate which does word wrapping using a simpler and faster algorithm. I've extended the `05_ripgrep` benchmarks with a benchmark that tests generating the help text -- I figured that ripgrep would be a good example since it has lots of long strings in the help output. Comparing before and after looks like this on my machine:

```
 name              before ns/iter  after ns/iter  diff ns/iter   diff %  speedup
 build_app_long    22,101          21,099               -1,002   -4.53%   x 1.05
 build_app_short   22,138          21,205                 -933   -4.21%   x 1.04
 build_help_long   514,265         284,467            -229,798  -44.68%   x 1.81
 build_help_short  85,720          85,693                  -27   -0.03%   x 1.00
 parse_clean       23,471          22,859                 -612   -2.61%   x 1.03
 parse_complex     29,535          28,919                 -616   -2.09%   x 1.02
 parse_lots        422,815         414,577              -8,238   -1.95%   x 1.02
```

I've also added a test for #617 to ensure that the output described there is kept unchanged. That is, textwrap will now by default keep horizontal whitespace when wrapping text and this allows you to do rudimentary formatting in the help text by manually inserting whitespace.

[1]: https://crates.io/crates/textwrap

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/kbknapp/clap-rs/845)
<!-- Reviewable:end -->
@homu
Copy link
Contributor

homu commented May 29, 2017

⌛ Testing commit 12282e1 with merge 2f4d3fc...

@mgeisler
Copy link
Contributor Author

Nice! Thanks for sticking with this and putting in all the work 😄 ❤️

My pleasure, it's been a great learning experience!

@kbknapp kbknapp merged commit b93870c into clap-rs:master May 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants