Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make use of bat as a library #116

Closed
dandavison opened this issue Mar 22, 2020 · 12 comments
Closed

Make use of bat as a library #116

dandavison opened this issue Mar 22, 2020 · 12 comments

Comments

@dandavison
Copy link
Owner

dandavison commented Mar 22, 2020

See https://github.com/sharkdp/bat/releases/tag/v0.13.0

Currently, delta vendors some bat code, e.g. related to paging and custom themes/syntax definitions.

@dandavison dandavison changed the title Make use of bat library where possible instead of vendoring bat code Make use of bat as a library Mar 22, 2020
@MarcoIeni
Copy link
Contributor

I will try to work on this in my spare time. Let's see what I can do :)

@dandavison
Copy link
Owner Author

Hi @MarcoIeni, that sounds great!

@MarcoIeni
Copy link
Contributor

Started with some clippy warnings fix. See #380

Anyway I think that the best strategy might be to keep the bat/ directory and incrementally substitute parts of its code with the bat external library.

My first approach was to remove the bat/ directory entirely and try to replace it with the bat library, but I failed because the library doesn't do everything.
In fact, there will be some code that will remain in the bat/ directory, because the bat library doesn't expose everything (for example the function bat::assets::list_languages).

So we will have both use crate::bat (the bat/ dir) and use bat (the actual bat library added in cargo.toml) mixed in the code. I will try to keep things clean as much as possible. In the future we might think to remane the bat/ dir to something like bat_utils to avoid confusion.

@dandavison
Copy link
Owner Author

dandavison commented Nov 5, 2020

Yes, that all makes sense. I was aware that some things aren't exposed, and also that there are some delta-specific changes in src/bat/output.rs that may make it difficult to avoid code duplication there?

bat_utils sounds like a good name for that directory.

@MarcoIeni
Copy link
Contributor

MarcoIeni commented Nov 7, 2020

From what I understood this is the function that prints to output:

delta/src/paint.rs

Lines 438 to 442 in b149425

pub fn emit(&mut self) -> std::io::Result<()> {
write!(self.writer, "{}", self.output_buffer)?;
self.output_buffer.clear();
Ok(())
}

So here I should call the bat print() function as shown in the docs example.
Instead of self.writer we should have a bat::PrettyPrinter somehow.

@dandavison
Copy link
Owner Author

dandavison commented Nov 8, 2020

Hi @MarcoIeni, I'm worried that this ticket is misleading and/or that I've been slow to give input! Delta cannot use bat for syntax highlighting. The places where I think delta may be able to use bat are related to (a) handling of the sublime syntax and theme binary assets that the bat project extremely helpfully makes available, and (b) handling of the pager process that delta, just like bat, spawns to receive its output. So that is the code that is vendored under the bat/ directory, that originated as a licensed direct copy from bat's source, and which has diverged slightly since. I am not sure whether the bat library currently exposes the necessary functionality for these; but if not, that wouldn't be terrible as it's a fairly small part of the code base.

The reason I say delta cannot use bat for syntax highlighting is this: the fundamental task that delta has to do when it is colorizing a diff hunk is

  1. Take in a diff hunk in some language
  2. For the plus and minus lines, compute annotations of the string representing the foreground color syntax highlighting (using the syntect library, which bat also uses)
  3. For the plus and minus lines, compute annotations of the string representing the background colors that delta uses to represent plus/minus/emph etc
  4. Interleave those streams of annotations to form strings with the appropriate sequence of foreground and background colors
  5. Convert this data structure to ANSI escape sequences for printing, possibly with line numbers, or in side-by-side format.

So of course, (2), (3), and (4) all has to be done at the level of data structures representing color-annotated strings; the similarity with bat here is that bat uses syntect, and delta also does at phase (2).

@MarcoIeni
Copy link
Contributor

The original ticket was not very detailed, but you are not slow to give input, in fact you are really fast :)

Anyway it was a good opportunity to explore the both bat and delta source code :)

Now it's clearer. I will try to extract the parts of code exposed by the bat library, even if I don't think they are a lot.

@MarcoIeni
Copy link
Contributor

This is the maximum I achieved to extract in a clean way: MarcoIeni@1b9c6e3

I am not doing a PR because it's too little.

Somehow you can extract how the syntax_set is evaluated, but it would be an hack.

pub syntax_set: SyntaxSet,

I think that the right thing to do here is to extract the common code from the bat project and create another library, that both bat and delta will use.
For example, bat could have this library in their repo, like ripgrep does with ignore and all their other crates.

What do you think? If you agree we can create an issue in the bat project.

@dandavison
Copy link
Owner Author

I think that the right thing to do here is to extract the common code from the bat project and create another library, that both bat and delta will use. For example, bat could have this library in their repo, like ripgrep does with ignore and all their other crates.

I think I'm not quite understanding - why create separate crates rather than just exposing public functions in the existing bat crate?

What do you think?

The binary assets in particular, along with associated code for interacting with them, do seem like a resource which could be valuable to multiple projects. Can/should a rust crate contain large amounts of data like that?

Of course, everything bat exposes publicly is a source of future maintenance work for bat developers, and restricts their ability to make backwards incompatible changes. And delta's paging code has already diverged slightly so might not be easy to abstract. So there's a question about whether it is worth it, or whether it's fine for paging and binary asset interaction to be implemented independently (albeit from a common starting point) in delta.

@MarcoIeni
Copy link
Contributor

MarcoIeni commented Nov 8, 2020

I think I'm not quite understanding - why create separate crates rather than just exposing public functions in the existing bat crate?

Because you separate concerns:

  • bat users who just want to print formatted output will not be confused by all those other functionalities. If a library has a smaller interface is better.
  • delta will not depend on the whole bat library, but just on a smaller library, so compilation will be faster for example.

Can/should a rust crate contain large amounts of data like that?

I think that bat already does it

And delta's paging code has already diverged slightly so might not be easy to abstract.

We should identify common code and abstract only that part.
It may be worth it if the library has a useful purpose, because if we need it, maybe it would be useful to other rust projects, too.
Furthermore we avoid code duplication, so if bat or delta fixes a bug or add a functionality, than the other project benefits from it.

Edit: Of course you have a better understanding on this project, so it's up to you to decide if there is enough common code among the two libraries that may be useful to some purpose, in order to extract it to another crate :)

@dandavison
Copy link
Owner Author

In the future we might think to rename the bat/ dir to something like bat_utils to avoid confusion.

Thanks for that suggestion, I've just done that in master e527a69

@dandavison
Copy link
Owner Author

We do this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants