Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor formatting code #189

Merged
merged 8 commits into from Sep 12, 2021

Conversation

niklasmohrin
Copy link
Collaborator

See this discussion.

This aims to introduce some abstractions where needed and refactor code that has accumulated complexity over time

@niklasmohrin niklasmohrin force-pushed the simplify_highlight_commands branch 2 times, most recently from e406d4d to 8e8da2b Compare May 10, 2021 21:50
@niklasmohrin
Copy link
Collaborator Author

@dbrgn I bumped MSRV to 1.52 here because I wanted to use str::split_once. Like I said, I don't really see why we would have to support older versions of the compiler. Are you okay with that or do you see any problems?

@niklasmohrin
Copy link
Collaborator Author

Tests are complaining about highlighting, but I think it looks fine. I checked the binary output and it seems that the current implementation outputs more color modifiers than what I refactored it to (left / "stable.out" is current impl, right / "master.out" is this PR)

image

$ du *.out --bytes
1471    master.out
1542    stable.out

Can someone take a look that knows exactly what is important for the coloring? It looks just fine to me 😄 @SimplyDanny

@niklasmohrin niklasmohrin force-pushed the simplify_highlight_commands branch 2 times, most recently from ac52725 to e4c23eb Compare May 10, 2021 22:44
@niklasmohrin niklasmohrin requested a review from dbrgn May 10, 2021 22:44
@niklasmohrin niklasmohrin force-pushed the simplify_highlight_commands branch 2 times, most recently from c395989 to f187714 Compare May 10, 2021 22:51
Copy link
Contributor

@SimplyDanny SimplyDanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The colorization looks fine for what I have checked so far. I guess, the reference files can just be updated.

Some suggestions:

src/formatter.rs Outdated Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@SimplyDanny SimplyDanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some more ideas ...

src/formatter.rs Outdated Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
@niklasmohrin niklasmohrin force-pushed the simplify_highlight_commands branch 2 times, most recently from fd2a39f to 8a570f2 Compare May 21, 2021 10:50
@niklasmohrin niklasmohrin marked this pull request as ready for review June 2, 2021 20:40
@niklasmohrin
Copy link
Collaborator Author

@dbrgn @SimplyDanny I think I managed to include everything I had in mind, your turn :)

@niklasmohrin
Copy link
Collaborator Author

@dbrgn In case you feel like merging this PR, please make sure not to squash the history, so that we can keep the documentation I put in the commits

Copy link
Owner

@dbrgn dbrgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a clear improvement.

However, with all the ad-hoc parsing in multiple passes (first we detect line types in the tokenizer, then we detect variables in highlihgt_code, then we detect the code name in highlight_code_segment I wonder if in the long term it would be better if the tokenizer (now line iterator) were more powerful and could emit more fine-grained tokens on the first pass. We would need up to len(command_name) lookahead, but that's not an issue at all. This way, we would not need to differentiate between LineType tokens and HighlightingSnippet anymore (they are kind of similar if you give up the line based approach that I used initially for simplicity).

The main reason for splitting tokenization and output formatting in parsers is that the processing of the input data is separated from the formatting of the output. Right now we process input data both in the tokenizer and in the formatter, which isn't really the cleanest approach.

(However, that was just an observation and does not affect this PR.)

@dbrgn In case you feel like merging this PR, please make sure not to squash the history, so that we can keep the documentation I put in the commits

Sure, which such clean commit descriptions there's no need for squashing :) If you want, you could squash the commit "Remove unneeded imports in main.rs" into the first commit ("Add FindFrom extension trait to str to start searching at a byte offset") which added those imports in the first place.

src/formatter.rs Outdated Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
src/line_iterator.rs Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
src/formatter.rs Show resolved Hide resolved
src/formatter.rs Outdated Show resolved Hide resolved
@dbrgn
Copy link
Owner

dbrgn commented Sep 7, 2021

I think only the renamings and a rebase against master are left until we can merge this!

@dbrgn dbrgn added the waiting-for-author The PR needs an update before it can be considered for merging. label Sep 7, 2021
…offset.

This also moves the contents of `dedup.rs` into a shared module `extensions`.
This includes
- adding the `HighlightingSnippet` enum as a common ground for
  highlighting and printing code to communicate
- decomposing `print_lines` accordingly
- clearing up `highlight_code_segment` (previously `highlight_command`)
- adding unit tests (now that they can reason about
  `HighlightingSnippet`s instead of having to output on the integration
  test level
This also replaces the `next_token` method with the `next` method from
the `Iterator` trait.
This moves `print_page` from `main.rs` and `print_snippet` from
`formatter.rs` into a new file `output.rs`. To decompose `print_snippet`
from `print_lines`, the latter now takes the `yield_snippet` callable as
an argument (similar to how the helper methods got a hold of it). In
order for this to work, callers have to provide a function that is
generic over all possible snippet lifetimes.
As someone reading this file for the first time, I would want to see the
only public and most general function first and find the specifics
further down instead of having to look for the "module entry" first.
Although this did not fail on old master, it is still an important case
to test. Actually, this test doesn't even break when introducing some
errors that fail i18n unit tests, unless the command name itself is
non-ascii.
- Rename: `yield_snippet` => `process_snippet`
- Rename: `HighlightingSnippet` => `PageSnippet`
@niklasmohrin niklasmohrin added ready-for-review The PR can be reviewed and is waiting for the maintainers. and removed waiting-for-author The PR needs an update before it can be considered for merging. labels Sep 8, 2021
@dbrgn dbrgn added this to the v1.5.0 milestone Sep 12, 2021
Copy link
Owner

@dbrgn dbrgn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thank you!

@dbrgn dbrgn merged commit a811788 into dbrgn:master Sep 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-review The PR can be reviewed and is waiting for the maintainers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants