Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse impl blocks, type and const declarations #28

Merged
merged 15 commits into from Aug 15, 2022

Conversation

Bromeon
Copy link
Collaborator

@Bromeon Bromeon commented Jun 28, 2022

This pull request takes #27 as a base.

Supports parsing of two impl block schemes:

  • Inherent: impl Type { ... }
  • Trait: impl Trait for Type { ... }

Each block can contain members of three different categories:

  • functions
  • constants
  • associated types

Examples:

// Inherent impl
impl<'a, T: Clone, const N: i8> MyStruct<'a, T, N> {}

// Trait impl
impl MyTrait for MyStruct {
    pub type MyType = std::string::String;

    fn new(i: i32, b: bool) -> Self {
        Self {}
    }

    #[attr]
    const fn set_value(&mut self, s: String) {}

    const CONSTANT: i8 = 24 + 7;
}

There are a few open questions on my side:

  1. I plan to also add support for traits. Regarding some of the new API types I added, there's the option to reuse them (forcing users to unwrap options) or to make dedicated, type-safe separate ones for trait and impl contexts. What is your preference? This would also affect the naming (e.g. I'm not happy with TypeDefinition).
    Examples:

    • Trait: const CONSTANT: i8;
      Impl: const CONSTANT: i8 = value;
      either we have 1 type representing both, with optional = and value, or 2 separate types.
    • Trait: type AssocTy: Clone; -- can have bounds, but (at the moment) no default type initializers
      Impl: type AssocTy = MyType; -- must have type initializers, no bounds
  2. I was thinking to add GenericArg/GenericArgList to represent <'a, T, U, N> -- in addition to GenericParam/GenericParamList for <'a: 'static, T, U: Clone, const N: usize>. Would that make sense?
    If yes, we could probably remove InlineGenericArgs and directly allow to convert from parameters (unless you say we can't afford that copy).

  3. Is there a reason why the Debug impl for GenericParam does not include the tk_prefix? Might be good to check if it's a lifetime, a const or a type.

    venial/src/types.rs

    Lines 486 to 493 in b490dd7

    impl std::fmt::Debug for GenericParam {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
    let mut f = f.debug_struct("GenericParam");
    f.field("name", &self.name.to_string());
    f.field("bound", &self.bound);
    f.finish()
    }
    }

  4. There might be some room to make the naming more consistent (e.g. FunctionParameter vs. GenericParam) as well as other smaller refactorings, but it probably makes more sense to work out a proposal to parse trait first 🙂

@Bromeon
Copy link
Collaborator Author

Bromeon commented Jun 30, 2022

I realized that the previous structures couldn't represent some syntax; concretely:

  • Ident is not enough for the Trait and Self parts in impl Trait for Self.
    These can be arbitrarily long type paths, or even non-ident things such as ().
  • GenericParam cannot support anything beyond simple idents, while generic arguments can be arbitrarily complex types.

So I just used TyExpr. Since that just leaves verbatim tokens, I added a convenience function TyExpr::as_path() that can (partially) parse path::to::Type<other::Arg, 32, 'a>. We don't yet have a type representing generic arguments, so the argument list is left verbatim.

After some feedback I can also clean up the commits a bit.

@Bromeon Bromeon force-pushed the feature/impl branch 3 times, most recently from bd0916f to c058207 Compare June 30, 2022 22:40
};

let impl_decl = parse_declaration_checked(expr);
assert_debug_snapshot!(impl_decl);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using assert_quote_snapshot ? (defined higher in the file)

src/types.rs Outdated Show resolved Hide resolved
src/types.rs Outdated Show resolved Hide resolved
src/parse_impl.rs Outdated Show resolved Hide resolved
src/tests.rs Show resolved Hide resolved
@PoignardAzur
Copy link
Owner

I plan to also add support for traits. Regarding some of the new API types I added, there's the option to reuse them (forcing users to unwrap options)

I'd lean towards re-using them.

Note that venial aims to cover anything that is syntactically valid, even if it's not semantically valid. For instance, this compiles:

#[cfg(FALSE)]
const X: i32;

I was thinking to add GenericArg/GenericArgList to represent <'a, T, U, N> -- in addition to GenericParam/GenericParamList for <'a: 'static, T, U: Clone, const N: usize>. Would that make sense?
If yes, we could probably remove InlineGenericArgs and directly allow to convert from parameters (unless you say we can't afford that copy).

I guess? I haven't really considered it.

I think we'd want names that differentiate between the two less ambiguously. I like InlineGenericArgs, but I'm open to suggestions.

Is there a reason why the Debug impl for GenericParam does not include the tk_prefix? Might be good to check if it's a lifetime, a const or a type.

I don't remember. Probably not.

Supports parsing of two impl block schemes:
* impl Type { ... }
* impl Trait for Type { ... }

Each block can contain members of three different categories:
* functions
* constants
* associated types
@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 2, 2022

@PoignardAzur Sorry for the delay. In my initial implementation, a lot of stuff was "half-baked". By doing things a bit more properly, I realized that several things were missing and took the chance to add quite a bit of functionality in the process, while also addressing your feedback 😬 I apologize if this PR got a bit too big.

New Features in this PR:

  • Top-level const, type and impl declarations
  • const, type and fn declarations inside impl blocks
  • Path parsing, new types Path and PathSegment
    • TyExpr::as_path() lazily constructs a path from a type when needed
  • Generic argument parsing, new types GenericArg + GenericArgList
    • InlineGenericArgs::to_owned_args() to convert to GenericArgList
  • ValueExpr type that behaves like TyExpr
    • I'm not sure if this is needed. There are scenarios like generic arguments, where I cannot differentiate between the two, since venial doesn't do expression parsing. Should we merge them (under which name?) or just use TyExpr everywhere?

Also added tests for everything.

I left the commits split, so it's easier for you to follow the changes. I can squash them to only a handful if you prefer.

@Bromeon Bromeon marked this pull request as ready for review August 2, 2022 19:32
@Bromeon Bromeon changed the title Parse impl blocks Parse impl blocks, type and const declarations Aug 2, 2022
@PoignardAzur
Copy link
Owner

Will take a look soon.

@PoignardAzur
Copy link
Owner

PoignardAzur commented Aug 7, 2022

FYI, I'm going to vacation on wednesday (well, kind of, but the gist is I'll have less time for open source) but I'd like to review this before.

Ping me tomorrow if I haven't gone back to you by then.

@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 8, 2022

@PoignardAzur the day is not over yet, but here's a ping, to not push it too far into the evening 😉

Copy link
Owner

@PoignardAzur PoignardAzur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this all seems like very high-quality code.

The biggest problems are the incomplete parsers for type and const declarations, but it's always better than no implementation at all, so I wouldn't mind merging it now.

Do you want me to give you write access to the repository? Since I don't intend to maintain the project in the coming months, you might want a tighter feedback loop. And while I made a lot of small nitpicks, overall the quality of your contributions seems high enough I don't need to double-check everything.

src/parse_fn.rs Outdated
/// Panics when the following tokens do not constitute a function definition, with one exception:
/// when `const` is followed by an identifier which is not `fn`, then `None` is returned. This is to
/// allow fallback to a constant declaration (both can begin with the `const` token).
pub(crate) fn try_consume_fn(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be called consume_fn.

Overall, the naming convention so far has been that consume_xxx functions in venial return None if the syntax item is absent, and panic if the beginning of the syntax item is present but the expect continuation isn't.

Eg: consume_where_clause will return None if it doesn't find a where token, but will panic if it gets where "hello world".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/parse_impl.rs Outdated Show resolved Hide resolved
}
}

pub(crate) fn parse_impl_members(token_group: Group) -> (Vec<ImplMember>, GroupSpan) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This misses inner attributes, eg:

impl Foobar {
    #![hello]
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented.

Also renamed tk_hashbang to tk_hash, which is more correct.

src/parse_type.rs Outdated Show resolved Hide resolved
src/parse_type.rs Outdated Show resolved Hide resolved
src/parse_utils.rs Outdated Show resolved Hide resolved
src/tests.rs Outdated
Comment on lines 829 to 836
fn parse_fn_mut_param() {
let func = parse_declaration(quote! {
fn prototype(a: i32, mut b: f32) -> String;
})
.unwrap();

assert_debug_snapshot!(func);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick: I'd rather these tests be a bit more consistent with existing tests:

  • Use assert_quote_snapshot
  • Use parse_declaration_checked
  • Use quote! with parentheses
  • Add a consume_generic_args_checked function that does the same thing as parse_declaration_checked

Though none of these are blockers, it's more for the code aesthetic ^^

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, except assert_quote_snapshot, here I wasn't sure what exactly you meant.

This is already part of parse_declaration_checked, no? assert_debug_snapshot seems more meaningful for the "data model" representation (and not the token one). Unless this is very important, I'd prefer if this one could be postponed, otherwise this PR is never-ending 🙂

Comment on lines 49 to 58
let tk_equals = match tokens.next() {
Some(TokenTree::Punct(punct)) if punct.as_char() == '=' => punct,
_ => panic!("cannot parse constant"),
};

let value_tokens = consume_stuff_until(
tokens,
|tt| matches!(tt, TokenTree::Punct(punct) if punct.as_char() == ';'),
true,
);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constants without = xxx are syntactically valid and should be accepted.

Copy link
Collaborator Author

@Bromeon Bromeon Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented, the = and value are now optional:

pub struct Constant {
    pub attributes: Vec<Attribute>,
    pub vis_marker: Option<VisMarker>,
    pub tk_const: Ident,
    pub name: Ident,
    pub tk_colon: Punct,
    pub ty: TyExpr,
    /// The '=' sign is optional; constants without initializer are syntactically valid.
    pub tk_equals: Option<Punct>,
    /// The initializer value is optional; constants without initializer are syntactically valid.
    pub initializer: Option<ValueExpr>,
    pub tk_semicolon: Punct,
}

Also added a test for this case.

Comment on lines 80 to 84
pub(crate) fn consume_assoc_ty(
tokens: &mut TokenIter,
attributes: Vec<Attribute>,
vis_marker: Option<VisMarker>,
) -> TyDefinition {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called consume_type_definition. If called from the top level, it might parse a type alias, not an associated type.

(Also, for consistency, it should arguably return Option)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed and optionalized 🙂

Comment on lines +95 to +98
let tk_equals = match tokens.next() {
Some(TokenTree::Punct(punct)) if punct.as_char() == '=' => punct,
_ => panic!("cannot parse associated type"),
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that every type definition is of the form type Foo = some::Stuff;. But:

  • Type definitions may have bounds (eg type Foo: Clone;)
  • Type definitions may have generic parameters (eg type Item<Foo> = Bar;)
  • Type definitions may have where-clauses (eg type Foo where T: Debug = Bar;)
  • Type definitions are not required to have a type (eg type Foo;)

See the full definition there: https://doc.rust-lang.org/reference/items/type-aliases.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I'd like to defer this to a follow-up PR about traits, then we could have one TyDefinition which supports both.

@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 8, 2022

The biggest problems are the incomplete parsers for type and const declarations, but it's always better than no implementation at all, so I wouldn't mind merging it now.

I agree here, but maybe I should highlight missing parts (e.g. as // TODO) so they could potentially be added later (or at least they document the limitations). I can do this for some of the inline code comments, which require bigger changes and would further delay this PR.

Do you want me to give you write access to the repository? Since I don't intend to maintain the project in the coming months, you might want a tighter feedback loop. And while I made a lot of small nitpicks, overall the quality of your contributions seems high enough I don't need to double-check everything.

Thanks a lot for the compliment and the trust, it's appreciated 🙂

Write access might be nice for quicker integration, thanks for offering. Even though, I have to say your feedback in PRs has also been very valuable; I still need to understand some of the conventions and practices of venial. I don't know if you plan on reviewing things after some months, but I guess you could also directly clean up the main branch whenever you see fit (for smaller nitpicks, it's often faster than expressing your thoughts into words, letting me implement them and reviewing them again).

Maybe more generally, would that be more of a hiatus for a couple of months, or are you generally not sure about long-term plans? Are there more people involved or potentially interested in venial's maintenance? With godot-rust and a couple other open-source projects, I could probably implement some things every now and then, but realistically I don't see how I can spend enough time to help review PRs etc. I do see a lot of potential for a lightweight parser though, and know some people who think similarly.

@PoignardAzur
Copy link
Owner

I agree here, but maybe I should highlight missing parts (e.g. as // TODO) so they could potentially be added later (or at least they document the limitations).

Yeah, that's good practice.

I don't know if you plan on reviewing things after some months,

I still plan on reviewing incoming PRs (there's few enough it's not exactly a drag).

but I guess you could also directly clean up the main branch whenever you see fit (for smaller nitpicks, it's often faster than expressing your thoughts into words, letting me implement them and reviewing them again).

That's the idea. 😁

Maybe more generally, would that be more of a hiatus for a couple of months, or are you generally not sure about long-term plans?

I don't really have long-term plans for venial. As I've said elsewhere, the initial benchmarks have been somewhat disappointing (it's faster than syn, but not so much that I'd expect mass adoption). I mostly see the project as an informative experiment.

My plan is to make a 1.0 release at some point this year, with an attached post-mortem, and then I'll only fix issues and merge PRs.

Are there more people involved or potentially interested in venial's maintenance? With godot-rust and a couple other open-source projects, I could probably implement some things every now and then, but realistically I don't see how I can spend enough time to help review PRs etc. I do see a lot of potential for a lightweight parser though, and know some people who think similarly.

I'm not trying to dump the project in your lap ^^.

There aren't too many people involved in the project (as you can see from the commit history). If you know people who'd be interested in getting involved, sure, that would be helpful, but I don't really expect it.

Again, I don't plan on archiving the project. If people submit PRs, I'll review them. I just don't plan to have a very active role in the future.

(And honestly, given this was only side project I tried to do in a week to distract myself from Druid, I'm pretty impressed with how far it has come)

@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 9, 2022

I'm not trying to dump the project in your lap ^^.

No worries, that's not how I understood it, just wanted to make sure you're not disappointed if my contributions decrease in frequency over time 😉

I don't really have long-term plans for venial. As I've said elsewhere, the initial benchmarks have been somewhat disappointing (it's faster than syn, but not so much that I'd expect mass adoption). I mostly see the project as an informative experiment. [...] (And honestly, given this was only side project I tried to do in a week to distract myself from Druid, I'm pretty impressed with how far it has come)

In godot-rust, syn is a major compile time offender (total compile time 206s fresh, of which syn is 18%):

grafik

Now I won't rewrite that all to venial, so it's hard to tell how much difference there really is. But what I am going to try is use venial for the next version of godot-rust (GDExtension binding) and see how far I can push it. I'm a huge fan of lightweight libraries -- could be that I won't reach my goals in the end, but I think venial deserves to get a chance in a larger project 🚀

syn is an extremely powerful and versatile library, but two things bug me in particular:

  1. Libraries tend to be overly liberal in using it and enabling many crate features, so a dependency tree often has almost all features enabled. This in turn means that even libraries that only use the very basic syn configuration pay for the complex parsing.
  2. Most of the times, what's needed is parsing "headers" (attributes, struct fields, function signatures, etc) and not function bodies or expressions. syn however parses everything (see above point). I can imagine that the extra parsing time and all the intermediate allocations to store symbols can add up quite a bit over many crates.

A syn 2.0 could possibly address 1) by modularizing the crates, and 2) by providing lazy evaluation for rarely used items. But this is a mostly speculative idea without benchmarking. venial is a good trade-off here.

@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 9, 2022

I addressed your feedback:

  • improved consistency in naming and tests
  • support for const C: i32;
  • support for #![inner_attr] in impl blocks
  • added TODOs for limitations

The TyDefinition I could extend in another PR, which would add support for traits.

Another thought -- once venial supports a lot more cases than just basic structs and some functions, should we advertise it a bit (Twitter, Reddit etc.)? Maybe after a 0.5 release or so...

Also renames tk_hashbang to tk_hash (the `#` token).
@PoignardAzur
Copy link
Owner

Another thought -- once venial supports a lot more cases than just basic structs and some functions, should we advertise it a bit (Twitter, Reddit etc.)? Maybe after a 0.5 release or so...

Yeah, that's definitely the plan!

@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 13, 2022

Do you think we can merge this PR as it is now? 🙂

@PoignardAzur PoignardAzur merged commit 24bdfe7 into PoignardAzur:main Aug 15, 2022
@PoignardAzur
Copy link
Owner

Done.

@PoignardAzur
Copy link
Owner

That comment you added makes me think it might be worth adding an ARCHITECTURE.md file to the project.

@Bromeon Bromeon deleted the feature/impl branch August 15, 2022 16:53
@Bromeon
Copy link
Collaborator Author

Bromeon commented Aug 15, 2022

That sounds like a good idea! I haven't seen the ARCHITECTURE.md convention before, maybe CONTRIBUTING.md is more common as that's the file that contributors will likely read second (after the ReadMe)? Or a link to that file would also work.

@PoignardAzur
Copy link
Owner

I haven't seen it in the wild too much (or at all?), I'm referring to the kind of documentation described in this post: https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html

I see it as being a bit different from CONTRIBUTING.md (one is more of a checklist, the other is a high-level view of the project) but it's not that important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants