Created flat tokens as additional output #53

Merged
merged 36 commits into from
Apr 16, 2022

douweschulte
Collaborator

Here I created a simple, flat, token-based output that can be used for syntax colouring and better change diffs. In some cases the output is not very nice yet, especially for generics. And sometimes the ID comes peeking out, in structs in enum variants. The code organisation/API is of course open for discussion.
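
As a rough illustration of how a flat token stream could drive syntax colouring, here is a minimal sketch. The Token enum below is a trimmed-down stand-in that reuses a few variant names from this PR; render_plain, render_coloured, and the colour palette are assumptions made for illustration, not the crate's API.

```rust
/// Hypothetical, trimmed-down stand-in for the `tokens::Token` enum in this PR.
#[derive(Clone, Debug, PartialEq)]
pub enum Token {
    Keyword(String),
    Qualifier(String),
    Identifier(String),
    Function(String),
    Type(String),
    Symbol(String),
    Whitespace,
}

impl Token {
    /// Text of the token as it appears in the rendered item.
    pub fn text(&self) -> &str {
        match self {
            Token::Keyword(s)
            | Token::Qualifier(s)
            | Token::Identifier(s)
            | Token::Function(s)
            | Token::Type(s)
            | Token::Symbol(s) => s.as_str(),
            Token::Whitespace => " ",
        }
    }
}

/// Pick an ANSI colour code per token kind (the palette is arbitrary).
fn ansi_code(token: &Token) -> Option<&'static str> {
    match token {
        Token::Keyword(_) | Token::Qualifier(_) => Some("34"), // blue
        Token::Function(_) => Some("33"),                      // yellow
        Token::Type(_) => Some("36"),                          // cyan
        Token::Identifier(_) | Token::Symbol(_) | Token::Whitespace => None,
    }
}

/// Render a flat token slice as plain text.
pub fn render_plain(tokens: &[Token]) -> String {
    tokens.iter().map(Token::text).collect()
}

/// Render a flat token slice with ANSI colour escapes around coloured tokens.
pub fn render_coloured(tokens: &[Token]) -> String {
    let mut out = String::new();
    for token in tokens {
        match ansi_code(token) {
            Some(code) => {
                out.push_str("\x1b[");
                out.push_str(code);
                out.push('m');
                out.push_str(token.text());
                out.push_str("\x1b[0m");
            }
            None => out.push_str(token.text()),
        }
    }
    out
}
```

Because the stream is flat, a consumer only needs one pass over the tokens and one match on the token kind; no knowledge of the item's nesting is required.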

Cargo.toml (outdated, resolved)
@douweschulte
Collaborator Author

I would say it is fine to look over the code, especially if you find major architectural improvement points it is nice to know that now. Otherwise I would say that the flat tokens (tokens::Token and tokens::TokenStream) are pretty much final. The rendering code needs to support more input and I am thinking of a refactor into something like the D you made, instead of a bunch of functions. But the diff token output is still pretty unstable; I am still changing major things in it.

@douweschulte
Collaborator Author

I would say the code that renders into tokens is pretty much done. It now supports all exposed types from rustdoc_types::ItemEnum. I tried to make something useful for all of them, but if you find something that can be expanded/changed please tell me. I struggled a bit with the API members I could not easily test (like Impl, ProcMacro and some others), so if you have some tests which expose those that would be really helpful. Otherwise it would maybe be wise to create some test crate which exposes all types of API members?

Owner

@Enselic Enselic left a comment

After getting more acquainted with your code, I've got the following high-level design question that I would like to understand before I dive deeper into the code.

src/item_iterator.rs (outdated, resolved)
src/intermediate_public_item.rs (resolved)
@Enselic
Owner

Enselic commented Apr 4, 2022

Otherwise it would maybe be wise to create some test crate which exposes all types of api members?

That sounds like exactly what the new comprehensive_api in-repo test crate is supposed to accomplish. Run this after doing a git fetch of the latest code to see if that is what you are after:

cargo doc --manifest-path ./tests/crates/comprehensive_api/Cargo.toml --open

I plan on replacing thiserror with another in-repo test crate that has a proc-macro, because those apparently must live in a special crate. But apart from that the API in that crate should be pretty broad.

@douweschulte douweschulte marked this pull request as ready for review April 4, 2022 19:20
Owner

@Enselic Enselic left a comment

This looks really good on a high level I think! Good job! So now let's move down one level of detail, which means looking at the public API of public_items. Let's do a diff (The feeling of meta when doing this never gets tiresome for me):

Output of `cargo public-items --diff-git-checkouts main main-douweschulte`:
Removed items from the public API
=================================
pub fn public_items::PublicItem::ne(&self, other: &PublicItem) -> bool

Changed items in the public API
===============================
pub fn public_items::PublicItem::cmp(&self, other: &PublicItem) -> $crate::cmp::Ordering
pub fn public_items::PublicItem::cmp(&self, other: &Self) -> std::cmp::Ordering
pub fn public_items::PublicItem::eq(&self, other: &PublicItem) -> bool
pub fn public_items::PublicItem::eq(&self, other: &Self) -> bool
pub fn public_items::PublicItem::partial_cmp(&self, other: &PublicItem) -> $crate::option::Option<$crate::cmp::Ordering>
pub fn public_items::PublicItem::partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering>

Added items to the public API
=============================
pub enum public_items::tokens::ChangedTokenStream
pub enum public_items::tokens::Token
pub enum variant public_items::tokens::ChangedTokenStream::Changed
pub enum variant public_items::tokens::ChangedTokenStream::Same(TokenStream)
pub enum variant public_items::tokens::Token::Function(String)
pub enum variant public_items::tokens::Token::Generic(String)
pub enum variant public_items::tokens::Token::Identifier(String)
pub enum variant public_items::tokens::Token::Keyword(String)
pub enum variant public_items::tokens::Token::Kind(String)
pub enum variant public_items::tokens::Token::Lifetime(String)
pub enum variant public_items::tokens::Token::Primitive(String)
pub enum variant public_items::tokens::Token::Qualifier(String)
pub enum variant public_items::tokens::Token::Self_(String)
pub enum variant public_items::tokens::Token::Symbol(String)
pub enum variant public_items::tokens::Token::Type(String)
pub enum variant public_items::tokens::Token::Whitespace
pub fn public_items::diff::ChangedPublicItem::changed_tokens(&self) -> Vec<ChangedTokenStream>
pub fn public_items::diff::ChangedPublicItem::clone(&self) -> ChangedPublicItem
pub fn public_items::diff::PublicItemsDiff::clone(&self) -> PublicItemsDiff
pub fn public_items::tokens::ChangedTokenStream::clone(&self) -> ChangedTokenStream
pub fn public_items::tokens::ChangedTokenStream::eq(&self, other: &ChangedTokenStream) -> bool
pub fn public_items::tokens::ChangedTokenStream::fmt(&self, f: &mut $crate::fmt::Formatter<'_>) -> $crate::fmt::Result
pub fn public_items::tokens::ChangedTokenStream::ne(&self, other: &ChangedTokenStream) -> bool
pub fn public_items::tokens::Token::clone(&self) -> Token
pub fn public_items::tokens::Token::eq(&self, other: &Token) -> bool
pub fn public_items::tokens::Token::fmt(&self, f: &mut $crate::fmt::Formatter<'_>) -> $crate::fmt::Result
pub fn public_items::tokens::Token::function(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::generic(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::identifier(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::keyword(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::kind(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::len(&self) -> usize
pub fn public_items::tokens::Token::lifetime(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::ne(&self, other: &Token) -> bool
pub fn public_items::tokens::Token::primitive(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::qualifier(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::self_(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::symbol(text: impl Into<String>) -> Self
pub fn public_items::tokens::Token::text(&self) -> &str
pub fn public_items::tokens::Token::type_(text: impl Into<String>) -> Self
pub fn public_items::tokens::TokenStream::clone(&self) -> TokenStream
pub fn public_items::tokens::TokenStream::default() -> TokenStream
pub fn public_items::tokens::TokenStream::eq(&self, other: &TokenStream) -> bool
pub fn public_items::tokens::TokenStream::extend(&mut self, tokens: impl Into<Self>)
pub fn public_items::tokens::TokenStream::fmt(&self, f: &mut $crate::fmt::Formatter<'_>) -> $crate::fmt::Result
pub fn public_items::tokens::TokenStream::fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result
pub fn public_items::tokens::TokenStream::from(token: Token) -> Self
pub fn public_items::tokens::TokenStream::from(tokens: &[Token]) -> Self
pub fn public_items::tokens::TokenStream::from(tokens: Vec<Token>) -> Self
pub fn public_items::tokens::TokenStream::is_empty(&self) -> bool
pub fn public_items::tokens::TokenStream::len(&self) -> usize
pub fn public_items::tokens::TokenStream::ne(&self, other: &TokenStream) -> bool
pub fn public_items::tokens::TokenStream::push(&mut self, token: Token)
pub fn public_items::tokens::TokenStream::remove_from_back(&mut self, len: usize)
pub fn public_items::tokens::TokenStream::tokens(&self) -> impl Iterator<Item = &Token> + '_
pub fn public_items::tokens::TokenStream::tokens_len(&self) -> usize
pub macro public_items::ws!
pub mod public_items::tokens
pub struct public_items::tokens::TokenStream
pub struct field public_items::PublicItem::tokens: tokens::TokenStream
pub struct field public_items::tokens::ChangedTokenStream::Changed::inserted: TokenStream
pub struct field public_items::tokens::ChangedTokenStream::Changed::removed: TokenStream
pub struct field public_items::tokens::TokenStream::tokens: Vec<Token>

Disregarding the parts of the API that are related to diffing (see separate comment about that), one high-level comment I have is that I think it would be good if we could remove all API that allows clients to create tokens themselves. Having such an API vastly complicates upholding backwards compatibility. I think it is good to be as conservative as possible when it comes to the public API, because one of the biggest problems we can get long-term is to get locked in to a public API that we are not happy with because it prevents us from improving the library further.

This is obviously just my thoughts, and I am very open to being convinced otherwise. But I don't really see a need for clients to create tokens themselves. But you do, perhaps?

src/diff.rs (outdated, resolved)
@douweschulte
Collaborator Author

Yes, I do agree that we need to remove all token creation API. I will scrutinize the API diff and remove all code related to that.

@douweschulte
Collaborator Author

I made everything that is not necessary outside this crate pub(crate). I used the TokenStream::extend function in the crate-public-api to pad out the stream with whitespace, so I could also write a helper method for padding instead. Downstream users will still be able to create tokens, but that is needed for nice matching, so I would not worry about that. If wanted, a private PhantomData field could be added to TokenStream to keep downstream users from being able to create an instance.
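
The private-field idea could look roughly like the sketch below. This is only an illustration under assumptions: the Token enum is cut down to two variants, and the _private field, new, push, and render_name are hypothetical names, not what the PR contains.

```rust
pub mod tokens {
    use std::marker::PhantomData;

    /// Enum variants are always public, so clients can still build and match
    /// on individual tokens.
    #[derive(Clone, Debug, PartialEq)]
    pub enum Token {
        Identifier(String),
        Whitespace,
    }

    #[derive(Clone, Debug, PartialEq)]
    pub struct TokenStream {
        /// Readable by clients.
        pub tokens: Vec<Token>,
        /// Hypothetical private marker field: code outside this module cannot
        /// name it, so it cannot write a `TokenStream { .. }` literal.
        _private: PhantomData<()>,
    }

    impl TokenStream {
        /// Constructor that is only visible inside the crate.
        pub(crate) fn new() -> Self {
            TokenStream {
                tokens: Vec::new(),
                _private: PhantomData,
            }
        }

        /// Mutation is crate-internal as well.
        pub(crate) fn push(&mut self, token: Token) {
            self.tokens.push(token);
        }

        /// Read-only accessors stay public.
        pub fn len(&self) -> usize {
            self.tokens.len()
        }

        pub fn is_empty(&self) -> bool {
            self.tokens.is_empty()
        }
    }
}

/// Crate-internal rendering code can still build streams.
pub fn render_name(name: &str) -> tokens::TokenStream {
    let mut stream = tokens::TokenStream::new();
    stream.push(tokens::Token::Identifier(name.to_owned()));
    stream
}
```

Note that this only works if no public constructor remains: a public Default or From implementation would still let downstream code create a stream.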

Note the missing PublicItem::ne will fall back to the default implementation from the trait which is fine in this case.
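
A small sketch of that fallback, assuming a simplified PublicItem with a single hypothetical rendered field: when only eq is written by hand, PartialEq::ne comes from the trait's provided method, which negates eq.

```rust
/// Hypothetical, simplified stand-in for `PublicItem`.
#[derive(Clone, Debug)]
pub struct PublicItem {
    pub rendered: String,
}

impl PartialEq for PublicItem {
    fn eq(&self, other: &Self) -> bool {
        self.rendered == other.rendered
    }
    // No `ne` here: the trait's provided method returns `!self.eq(other)`.
}

/// Convenience constructor used only for this illustration.
pub fn item(rendered: &str) -> PublicItem {
    PublicItem {
        rendered: rendered.to_owned(),
    }
}
```

A derived PartialEq at the time emitted its own ne, which is why the method shows up as removed in the diff once the impl is written by hand, even though `a != b` keeps working.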

cargo public-items --diff-git-checkouts upstream/main main
Removed items from the public API
=================================
-pub fn public_items::PublicItem::ne(&self, other: &PublicItem) -> bool

Changed items in the public API
===============================
- pub fn public_items::PublicItem::cmp(&self, other: &PublicItem) -> crate::cmp::Ordering
+                                       self          Self           std
- pub fn public_items::PublicItem::eq(&self, other: &PublicItem) -> bool
+                                      self          Self
- pub fn public_items::PublicItem::partial_cmp(&self, other: &PublicItem) -> crate::option::Option<crate::cmp::Ordering>
+                                               self          Self                                 std

Added items to the public API
=============================
+pub struct field public_items::PublicItem::tokens: TokenStream
+pub fn public_items::diff::ChangedPublicItem::clone(&self) -> ChangedPublicItem
+pub fn public_items::diff::PublicItemsDiff::clone(&self) -> PublicItemsDiff
+pub mod public_items::tokens
+pub enum public_items::tokens::Token
+pub enum variant public_items::tokens::Token::Function(String)
+pub enum variant public_items::tokens::Token::Generic(String)
+pub enum variant public_items::tokens::Token::Identifier(String)
+pub enum variant public_items::tokens::Token::Keyword(String)
+pub enum variant public_items::tokens::Token::Kind(String)
+pub enum variant public_items::tokens::Token::Lifetime(String)
+pub enum variant public_items::tokens::Token::Primitive(String)
+pub enum variant public_items::tokens::Token::Qualifier(String)
+pub enum variant public_items::tokens::Token::Self_(String)
+pub enum variant public_items::tokens::Token::Symbol(String)
+pub enum variant public_items::tokens::Token::Type(String)
+pub enum variant public_items::tokens::Token::Whitespace
+pub fn public_items::tokens::Token::clone(&self) -> Token
+pub fn public_items::tokens::Token::eq(&self, other: &Token) -> bool
+pub fn public_items::tokens::Token::fmt(&self, f: &mut crate::fmt::Formatter<'_>) -> crate::fmt::Result
+pub fn public_items::tokens::Token::len(&self) -> usize
+pub fn public_items::tokens::Token::ne(&self, other: &Token) -> bool
+pub fn public_items::tokens::Token::text(&self) -> &str
+pub struct public_items::tokens::TokenStream
+pub fn public_items::tokens::TokenStream::clone(&self) -> TokenStream
+pub fn public_items::tokens::TokenStream::default() -> TokenStream
+pub fn public_items::tokens::TokenStream::eq(&self, other: &TokenStream) -> bool
+pub fn public_items::tokens::TokenStream::extend(&mut self, tokens: impl Into<Self>)
+pub fn public_items::tokens::TokenStream::fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result
+pub fn public_items::tokens::TokenStream::fmt(&self, f: &mut crate::fmt::Formatter<'_>) -> crate::fmt::Result
+pub fn public_items::tokens::TokenStream::from(token: Token) -> Self
+pub fn public_items::tokens::TokenStream::from(tokens: Vec<Token>) -> Self
+pub fn public_items::tokens::TokenStream::from(tokens: &[Token]) -> Self
+pub fn public_items::tokens::TokenStream::is_empty(&self) -> bool
+pub fn public_items::tokens::TokenStream::len(&self) -> usize
+pub fn public_items::tokens::TokenStream::ne(&self, other: &TokenStream) -> bool
+pub fn public_items::tokens::TokenStream::tokens(&self) -> impl Iterator<Item = &Token> + '_
+pub struct field public_items::tokens::TokenStream::tokens: Vec<Token>
+pub fn public_items::tokens::TokenStream::tokens_len(&self) -> usize
+pub macro public_items::ws!

Besides that I created a single commit which removes all code related to token diffing. I am not really sure how well git will handle stuff like this if now both PRs are merged. If you know a better way to handle this please let me know.

@douweschulte
Collaborator Author

All tests now pass except for the two that take in the output from a different file. These fail for some reason while the diff between both outputs (according to the test helper) is nothing.

Owner

@Enselic Enselic left a comment

I won't be able to go through it all at once, so here are a couple of early comments. I plan on leaving more comments later as I go along doing the review.

README.md (outdated, resolved)
src/tokens.rs (resolved)
Owner

@Enselic Enselic left a comment

I think tests are failing a bit too much for me to do a very detailed review. Do you think you could make all tests pass before the next review round? Let me know if you think that would be too much work. Again, no stress, it is fine if it takes you a while.

I feel like maybe we can simplify render.rs a bit? But before all tests pass there is no point in trying to figure out how, because before tests pass we do not know under what constraints we can refactor.

src/item_iterator.rs (outdated, resolved)
src/tokens.rs (outdated, resolved)
Co-authored-by: Martin Nordholts <enselic@gmail.com>
@douweschulte
Collaborator Author

douweschulte commented Apr 8, 2022

I thought I got all tests to pass, but I missed all the nice ones. Because two were randomly crashing in bin_lib I never got to the nice ones in lib_test. I fixed all except some in print_public_items, print_public_items_with_blanket_implementations, and comprehensive_api for which I need to do some more work on the generics rendering.

Points to go over:

  • For structs I render them as a comma separated list of all item names, do you agree to handle it this way?
  • I deliberately deleted the $ from $crate because that is not what a user would see in their own IDE and replaced it with a Token that is different from the standard, so it retains the information in the TokenStream.

And I do agree to keep the sort order out of this PR otherwise the output from the tests is just unusable.

[Image: the diff, with only the changed lines highlighted]

@Enselic
Owner

Enselic commented Apr 9, 2022

  • For structs I render them as a comma separated list of all item names, do you agree to handle it this way?

I'm not a fan of that to be honest, because it will make diffing more noisy. In general I like diffs to be on an item level, ideally also when doing simple text-based diffs. I am open to changing that later, but if it is OK with you I would like to not do it in this PR. Also of importance: cargo doc HTML does not display structs with their fields like that, and I consider it quite important that our output is the same as what cargo doc has, so that users feel familiar with the output. (I do know that tuple structs already show fields as part of the name, but that's what cargo doc also does.)
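
To make the item-level argument concrete, here is a minimal sketch of a line-per-item diff. It assumes each public item, including each struct field, is rendered as its own line; diff_items and ItemDiff are hypothetical names, not the crate's diff API.

```rust
use std::collections::BTreeSet;

/// Result of comparing two rendered public APIs item by item.
#[derive(Debug, PartialEq)]
pub struct ItemDiff {
    pub removed: Vec<String>,
    pub added: Vec<String>,
}

/// Treat every rendered line as one item and report set differences.
pub fn diff_items(old: &[&str], new: &[&str]) -> ItemDiff {
    let old: BTreeSet<&str> = old.iter().copied().collect();
    let new: BTreeSet<&str> = new.iter().copied().collect();
    ItemDiff {
        // Items only present in the old API.
        removed: old.difference(&new).map(|s| (*s).to_owned()).collect(),
        // Items only present in the new API.
        added: new.difference(&old).map(|s| (*s).to_owned()).collect(),
    }
}
```

With fields listed as separate items, changing one field touches only that field's line. If the struct line carried a comma-separated list of its fields, the same change would also alter the struct line itself.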

  • I deliberately deleted the $ from $crate because that is not what a user would see in their own IDE and replaced it with a Token that is different from the standard, so it retains the information in the TokenStream.

Our "gold standard" is cargo doc HTML output, and it turns out they do not render $crate. So long term we should render it the same. Short term however, I'm not entirely comfortable changing it just slightly. I am open to discussing and maybe changing that later, but for this PR I think it is good if we keep output the same as before, unless it is too much work.

Apart from the above it looks like it's only generics that indeed needs some tweaking.

Great work so far! Looking forward to your next update!

(I also changed repo settings now so that CI should trigger for you automatically. There's also scripts/run-ci-locally.sh to run CI locally, but you probably remember that.)

@@ -1,2 +1,2 @@
 pub mod thiserror
-pub proc macro thiserror::Error!
+pub proc macro thiserror::#[derive(Error)]
Owner

@Enselic Enselic Apr 9, 2022

Sorry, I created a conflict with you here with #68

I try to not create conflicts but forgot to double-check this one

Fortunately it should be easy for you to resolve; change from

pub proc macro comprehensive_api_proc_macro::SimpleDeriveMacro!

to

pub proc macro comprehensive_api_proc_macro::#[derive(SimpleDeriveMacro)]

in ./tests/expected_output/comprehensive_api_proc_macro.txt

Let me know if you would like me try to push a commit to this PR that resolves the conflict.

(The reason I work towards removing all .json from the repo is that that makes it easier to rename the lib to public-api later.)

Owner

(The reason I work towards removing all .json from the repo is that that makes it easier to rename the lib to public-api later.)

Just a note (also mentioned in a separate comment): I now feel done with a big restructuring of how tests are run. Mainly, the tests stop using public_items itself, because that becomes a real pain when we rename the lib to public-api. The PR is at #70, but I will not merge it before this PR is merged, to avoid causing more trouble for you. Again, I am very open to resolving the thiserror-1.0.30.txt conflict for you, if you wish. It is not fair for you to solve problems caused by me :)

@@ -6,7 +6,27 @@ fn print_public_items() {
let mut cmd = Command::cargo_bin("public_items").unwrap();
cmd.arg("./tests/rustdoc_json/public_items-v0.4.0.json");
cmd.assert()
.stdout(include_str!("./expected_output/public_items-v0.4.0.txt"))
Owner

@Enselic Enselic Apr 11, 2022

Just a bit of repetition here: It would be great if you could avoid changing "expected output" (here for example). Not only to avoid conflicts with #70 (which I will wait with merging until this PR is merged though), but also because it is good practice in and of itself when doing big changes.

Edit: I am fine with changing Error! to #[derive(Error)] though, because that is such a good change.

@Enselic
Owner

Enselic commented Apr 15, 2022

@douweschulte Hi! I developed a commit on top of this PR that makes all tests pass and that takes care of my own code review comments. Is it OK if I push it to this PR and then merge this PR? We can then of course iterate on the code further, but it is much nicer to do that in small increments in small PRs rather than have one big PR open :)

(Note that the commit does a "revert" of changes to expected output. To diff expected output, it is better to diff against v0.8.0 than towards the parent of my commit.)

I hope I don't come across as "stealing" your work. Or cause other problems for you. Just trying to be helpful 🙂 This was a great way for me to get some hands on experience with your new code though. Please let me know if you think I am doing anything wrong here!

If you want to take my code through some code review rounds that is perfectly fine of course. And if you want me to merge my code with in-progress code you have locally, that is also perfectly fine. Or discard my code entirely if you were almost done locally. Just let me know whatever your preferences are.

@douweschulte
Collaborator Author

Nice work! I do agree that small PRs are better here once this beast has been merged. I think it would be fine to merge and gather up all the small leads in separate PRs. I was planning on doing this work this weekend, but the fact that you built it is a pleasant surprise. With this huge beast of a PR it was getting a bit hard to keep every detail fully in mind all the time. And I got a surprise load of work to do in other projects this week.

So I would propose to merge this, I will then gather all loose ends from my side and open issues/PRs respectively to continue our discussion there. As I said I got some work to do on other projects as well so I maybe will be a bit less active for a while, but I want to continue work here as well.

@Enselic
Owner

Enselic commented Apr 16, 2022

@douweschulte Great! I will merge this PR then, and shortly after I will also merge #70. Stay tuned :)

@Enselic Enselic merged commit fdf7acb into Enselic:main Apr 16, 2022