New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor formatting code #189
Conversation
e406d4d
to
8e8da2b
Compare
@dbrgn I bumped MSRV to 1.52 here because I wanted to use |
Tests are complaining about highlighting, but I think it looks fine. I checked the binary output and it seems that the current implementation outputs more color modifiers than what I refactored it to (left / "stable.out" is current impl, right / "master.out" is this PR) $ du *.out --bytes
1471 master.out
1542 stable.out Can someone take a look that knows exactly what is important for the coloring? It looks just fine to me 😄 @SimplyDanny |
ac52725
to
e4c23eb
Compare
c395989
to
f187714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The colorization looks fine for what I have checked so far. I guess, the reference files can just be updated.
Some suggestions:
0b30256
to
de5f461
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some more ideas ...
fd2a39f
to
8a570f2
Compare
e323830
to
25cde36
Compare
@dbrgn @SimplyDanny I think I managed to include everything I had in mind, your turn :) |
@dbrgn In case you feel like merging this PR, please make sure not to squash the history, so that we can keep the documentation I put in the commits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a clear improvement.
However, with all the ad-hoc parsing in multiple passes (first we detect line types in the tokenizer, then we detect variables in highlihgt_code
, then we detect the code name in highlight_code_segment
I wonder if in the long term it would be better if the tokenizer (now line iterator) were more powerful and could emit more fine-grained tokens on the first pass. We would need up to len(command_name)
lookahead, but that's not an issue at all. This way, we would not need to differentiate between LineType
tokens and HighlightingSnippet
anymore (they are kind of similar if you give up the line based approach that I used initially for simplicity).
The main reason for splitting tokenization and output formatting in parsers is that the processing of the input data is separated from the formatting of the output. Right now we process input data both in the tokenizer and in the formatter, which isn't really the cleanest approach.
(However, that was just an observation and does not affect this PR.)
@dbrgn In case you feel like merging this PR, please make sure not to squash the history, so that we can keep the documentation I put in the commits
Sure, which such clean commit descriptions there's no need for squashing :) If you want, you could squash the commit "Remove unneeded imports in main.rs" into the first commit ("Add FindFrom extension trait to str to start searching at a byte offset") which added those imports in the first place.
5e679f2
to
2515abb
Compare
2515abb
to
df100dd
Compare
I think only the renamings and a rebase against |
…offset. This also moves the contents of `dedup.rs` into a shared module `extensions`.
This includes - adding the `HighlightingSnippet` enum as a common ground for highlighting and printing code to communicate - decomposing `print_lines` accordingly - clearing up `highlight_code_segment` (previously `highlight_command`) - adding unit tests (now that they can reason about `HighlightingSnippet`s instead of having to output on the integration test level
This also replaces the `next_token` method with the `next` method from the `Iterator` trait.
This moves `print_page` from `main.rs` and `print_snippet` from `formatter.rs` into a new file `output.rs`. To decompose `print_snippet` from `print_lines`, the latter now takes the `yield_snippet` callable as an argument (similar to how the helper methods got a hold of it). In order for this to work, callers have to provide a function that is generic over all possible snippet lifetimes.
As someone reading this file for the first time, I would want to see the only public and most general function first and find the specifics further down instead of having to look for the "module entry" first.
Although this did not fail on old master, it is still an important case to test. Actually, this test doesn't even break when introducing some errors that fail i18n unit tests, unless the command name itself is non-ascii.
- Rename: `yield_snippet` => `process_snippet` - Rename: `HighlightingSnippet` => `PageSnippet`
df100dd
to
3ac087a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thank you!
See this discussion.
This aims to introduce some abstractions where needed and refactor code that has accumulated complexity over time