Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete rewrite of the generator #344

Open
MaikKlein opened this issue Dec 13, 2020 · 16 comments
Open

Complete rewrite of the generator #344

MaikKlein opened this issue Dec 13, 2020 · 16 comments

Comments

@MaikKlein
Copy link
Member

The current design of the generator has become almost unmaintainable. There have been lots of changes to vk.xml and how we use it in ash in the past 5 years.

I started a very early WIP branch a couple of months ago, but haven't had much time to work on it so far. But I now have more time to spare and hopefully I can finish this up in the next couple of months.

I am going to list some of the changes/additions I have been thinking about:

  • Fully switch to vk-parse, don't use the outdated vkxml anymore

  • Don't use syn and quote anymore. They don't really have many benefits for pure codegen. Instead we will write directly to the files ourselves.

  • The generator needs to be easily understandable and documented. Need to make it easier for people to contribute.

  • Feature flags all the things: ash gets slower to compile every release.

    • Feature flag for every extension: By default everything will be active. But you can choose to opt-in into only the extensions are you want to use.
    • Feature flag for builders: While builders are nice, not everyone is using them and they add quite a bit of build time.
    • Feature flag provisional apis: Some extensions in vulkan are experimental, instead of generating those by default, you will have to opt-in into those
    • (Possibly?) Feature flags for the high level interface. For people that only want the low level bindings. Otherwise we still might split ash into ash and ash-sys.
  • Generate everything: Currently only the low level part is generated. Every high level function is written by hand. Instead we will now generate 95% and only handwrite functions that are particularly difficult to generate correctly.

    • Extensions: We still can not automatically generate high level extensions by default because the output has to be verified by a human. Otherwise it would be too easy to run into breaking changes.
      • To add a new extension: It should be as easy as to add the name of the extension to a list, re-run the generator, verify the output, and potentially write some custom overrides for some functions.
    • Feature flags would have to be generated as well. (in .rs and .toml)
    • Make it easy to manually override functions that are hard to generate.
  • No build.rs: I have been thinking about this for a while. Initially I wanted to move the generation into a build.rs file, but has some limitations. We can't autogenerate the feature flags for extensions, and it will introduce dev-depencies. I'd rather have ash remain a 0 dependency library for now.

  • No huge files anymore: We need to split everything into multiple files. Github already can't show the diffs anymore for some files, and rustc/rustdoc struggle as well with files that are 50k loc.

  • ...

@MarijnS95
Copy link
Collaborator

@MaikKlein That'd be a great improvement, especially ditching vkxml that has now been unmaintained for quite some time and with a questionable conversion layer from vk-parse to vkxml format.

In particular pointers have bothered me. As explained in #341 (comment) I've deduplicated the type conversion system for safe functions (builders) and their unsafe struct/FFI counterpart in master...Traverse-Research:simplify. Pointers for dynamic arrays denote the same level as the array type, whereas the add up for static arrays.

I'm curious to the thought behind your choice to drop quote though. Having worked on GTK's gir very recently I've been in the process of converting parts to quote because it is much more readable than format strings and \n + \t littered all around codegen files. That latter problem is easily solved by removing them altogether, the generated files will be ran through rustfmt anyway. However, having the ability to write code-blocks effectively in-line, with syntax highlighting and easy expansion of optionals and arrays (not on members though) seems hugely beneficial to me.
In addition things written to a file can't be modified anymore, so you have to be really cautious when to write to a file and when to return the code as String instead (this is a problem in gir utility functions where both exist). With quote that thought process is gone.

I had also already performed the conversion to syn/quote/proc_macro v1: master...Traverse-Research:quote1.

Both of these branches have not been submitted yet to not take away review from KHR_ray_tracing and its preparation PR. Not sure if I should submit them now that those made it in, perhaps that might help when reworking existing parts of the codebase into the new, refactored generator?

@filnet
Copy link
Contributor

filnet commented Dec 14, 2020

If possible, please address #305 during the rewrite.

@MarijnS95
Copy link
Collaborator

@MaikKlein Perhaps something to take into account with the new generator: have you thought about what to do when multiple struct fields share the same length field? Right now the builder overwrites it which can either yield unexpected results or out of bounds reads. A good example:

I extended our use of SubmitInfo to take more wait_semaphores, but forgot to pass a longer array to wait_dst_stage_mask. "Fortunately" our ordering set wait_dst_stage_mask after wait_semaphores, overwriting the length of >1 with a length of 1, leaving me wondering why not all semaphores were waited for (in turn resulting in validation layer warnings at least). If it were the other way around we probably got a validation layer for the garbage that would have been read out of bounds of wait_dst_stage_mask.

There seem to be a couple ways to make this safer:

  • Combine all arguments sharing the same length in a single builder call, to assert (or at least warn on) length mismatches. Downsides:
    • This makes it harder to distinguish individual members (ie. when looking at autocomplete);
    • Impossible or annoying to leave some members optionally empty.
  • Keep track of all separate lengths in the builder struct;
  • If the length field isn't 0 (some other array must have been set before), ensure the current array is of the same length;
    • This way fields cannot be re-set (to a different length).
  • Or is this a level of safety we don't want to provide/enforce in Ash, being an intermediate "not really safe" but also "not really unsafe" layer?

@cheako
Copy link

cheako commented Feb 3, 2022

get_device_proc_addr() feels like it should be callable from an ash::Device.

@aloucks
Copy link
Contributor

aloucks commented Feb 3, 2022

I really like how errupt is laid out. It would be nice if ash considered structuring the code in a similar way.

I thought about switching to it at one point but ran into a few things that I didn't like (and I think ash should avoid or keep in mind)

  1. I find gitlab much less ideal than github from a contribution and community perspective
  2. The generator did not work on Window (at least at the time of when I was investigating it)
  3. There were no unit tests
  4. There was no CI at all

@Friz64
Copy link
Contributor

Friz64 commented Feb 4, 2022

Just for clarity's sake, generator Windows support and CI support was since addressed.

@Ralith
Copy link
Collaborator

Ralith commented Feb 4, 2022

We've discussed merging with erupt in the past, and trying to assemble the best of both worlds. A detailed writeup of where the libs currently differ, and proposals for which approach to retain, would be a great place for that effort to kick off.

@Friz64
Copy link
Contributor

Friz64 commented May 10, 2022

I put together a roadmap on where erupt currently stands in regards to the ash merge. https://gitlab.com/Friz64/erupt/-/issues/56

@cheako
Copy link

cheako commented May 10, 2022

There is also higher level APIs that implement auto-destroy or treating CommandBuffers with different Queue properties as being separate rust types, to guard against calling unsupported cmd functions at compile time.

@MarijnS95
Copy link
Collaborator

@Friz64 That's great, looking forward to collaborating on a unified, smooth Vulkan wrapper experience! I'll make sure to be around and help out as much as I can.

For starters we should lay out the differences between both our generators and the code generated from it. Combine that with current desires for the codebase and we should have a nice plan forward, mainly driven by the resulting user-visible API.

What do you think is the best way to cook this up: a long GH discussion thread, a collaborative design document or something else entirely? I could start naming some random points but don't think that's very useful without structure.

@Friz64
Copy link
Contributor

Friz64 commented May 10, 2022

looking forward to collaborating on a unified, smooth Vulkan wrapper experience

❤️

What do you think is the best way to cook this up: a long GH discussion thread, a collaborative design document or something else entirely? I could start naming some random points but don't think that's very useful without structure.

My initial thought was using a GitHub discussion where you can throw in comments for any differences that need to be addressed and people can discuss what to do in the replies of each comment.

@Friz64
Copy link
Contributor

Friz64 commented Sep 24, 2022

I really dislike how I announced this over 4 months ago, and then promptly disappeared. I do want to get this done, but it hasn't worked with the current approach. I'm not able to exactly pinpoint what went wrong, but I think it's the merge being too daunting for me, especially the fear that I'd have to rework big parts of the existing erupt generator codebase, and that being a ton of annoying work very hard to finish.

So, I want to try to start work on the new generator now, and discuss design choices as they come up. This doesn't mean we have to throw old the entirety of the cold code, there's certainly still ideas and snippets that can be reused. If this plan sounds good, i'll start a "Design of the rewrite" discussion and (hopefully be able to) get to work.

@Ralith
Copy link
Collaborator

Ralith commented Sep 24, 2022

Sounds good to me, thanks!

@Friz64
Copy link
Contributor

Friz64 commented Sep 24, 2022

The discussion: #662

@MarijnS95
Copy link
Collaborator

Same here, this topic crossed my mind a few times yet have too many responsibilities on my mind. Alas, I don't think this is a thing anyone should be doing all alone.

It doesn't seem like merging would work. If anything I'd start with the generator that's in the best shape - erupt - and progress from there. Since we're merging into ash (which currently has the largest userbase) we should aim for some resemblance of the current API and have a proper discussion when/where to diverge?
We can integrate the erupt history into that of ash (in a separate folder), delete all its generated code, and start changing it with a PR-based workflow from there, so that everyone can work on it at their own pace and contribute smaller or larger chunks? Unless you @Friz64 rather think it is best to start anew?

In the end I prefer to set some goals targeting the outcome of the generator: where do ash and erupt differ, and what can they learn from each other? Choose the better of the two, add a list of improvements either side still wishes to make for an API and we at least have something to work off of.

For the generator itself I mostly wish that ash's generator would have been split in a clearer "analysis" and "codegen" part. Analysis ties all the information together (i.e. setting up links between the types/functions and the extension(s) providing them, flatten the aliases, propagate references for lifetimes needed in nested struct-pointers, etc), and codegen simply reads out pre-made datastructures and spits them out to a file.

One thing to keep in mind is that I recently found @jdtatz overhauling the entire generator to finally switch to vk-parse so that we can use newer fields, instead of relying on the unmaintained/useless vkxml (with the necessary support in krolli/vk-parse#26). Will we still want the generator to use vk-parse, or roll its own parser inside ash?

@Friz64
Copy link
Contributor

Friz64 commented Sep 24, 2022

Unless you @Friz64 rather think it is best to start anew?

Yes, I think it's easier to start anew, there's less old code that interferes with new requirements. And, as stated previously, it's not like we can't still use it to some degree.

For the generator itself I mostly wish that ash's generator would have been split in a clearer "analysis" and "codegen" part. Analysis ties all the information together (i.e. setting up links between the types/functions and the extension(s) providing them, flatten the aliases, propagate references for lifetimes needed in nested struct-pointers, etc), and codegen simply reads out pre-made datastructures and spits them out to a file.

I agree that the new generator should use this design.

Will we still want the generator to use vk-parse, or roll its own parser inside ash?

I liked not having to use external crates for any Vulkan specific business in erupt, it allowed me to be more flexible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants