Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow packwerk to scan for packages inside of engines #216

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

AndrewSwerlick
Copy link

What are you trying to accomplish?

I'm trying to add support to packwerk for more complex repository structures, particularly mono-repos where multiple apps live in the same repo, and share code through engines that are defined in top level folders. Currently, packwerk cannot detect violations where an app is inappropriately using code defined in a package within an engine, because packwerk cannot discover engines that aren't defined below the Rails.root of the app.

What approach did you choose and why?

To support this, I modified PackageSet so it also looks at Rails.application.railties to pull a list of engines, and their root directories, and include those in the paths scanned for packages. I also dropped the restrictions on load paths for constant resolution inApplicationLoadPaths. The result is that packwerk can now find packages defined in these engines, and resolve constants in these engines.

With this setup, packwerk still only scans files in the actual app and it's subdirectories for violations, but now it can resolve constants in the engines, and attach them to packages in the engine.

What should reviewers focus on?

I'm not confident in the approach I took to mocking and testing things to support the new tests I wrote, so particular focus there would be helpful. Also I'm curious if there are implications to the changes to ApplicationLoadPaths that I'm not considering.

Type of Change

  • Bugfix
  • New feature
  • Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Additional Release Notes

  • Breaking change (fix or feature that would cause existing functionality to change)

Include any notes here to include in the release description. For example, if you selected "breaking change" above, leave notes on how users can transition to this version.

If no additional notes are necessary, delete this section or leave it unchanged.

Checklist

  • I have updated the documentation accordingly. Do we want documentation changes for this, to call it out as a specific use case?
  • I have added tests to cover my changes.
  • It is safe to rollback this change.

Copy link
Contributor

@alexevanczuk alexevanczuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this on Gusto's codebase and it didn't produce any change to the number of inspected files or the time to run bin/packwerk check.
bin/packwerk update-deprecations also produces no diff.

It's my understanding based on this and based on the principles of this change that this should continue to work as normal for systems whose application structure does not have multiple rails app and engines at the root.

Before

time bin/packwerk check
Running via Spring preloader in process 35737
📦 Packwerk is inspecting 40922 files
# ... 
📦 Finished in 18.11 seconds

No offenses detected
No stale violations detected

real    0m21.756s

After

Running via Spring preloader in process 70028
📦 Packwerk is inspecting 40922 files
# ...
📦 Finished in 26.05 seconds

No offenses detected
No stale violations detected

real    0m29.417s

The good news is this didn't produce changes to the number of files inspected or any changes to violations.

One thing I did notice is that it appears to increase runtime significantly (~30% increase in runtime). I didn't get to investigating this more thoroughly, but I wonder if it's because there is a lot files to glob out and scan? I'm really surprised it would add 8 seconds like this though, but it appears to be happening consistently across runs.

I'd love to dig in a bit deeper on the performance implications. Separately, I realize that this is actually a behavioral change for user, and in theory could be a breaking change. I wonder if (A) to make it non-breaking and (B) unblock you to use this in your application without us digging into the performance implications we can have this be off by default and able to be turned on with a configuration in packwerk.yml. I imagine something like include_external_rails_engines (there is probably a better name here). I use "external" to distinguish setups that have unbuilt rails engines located within the rails app.


all_paths
.transform_keys { |path| Pathname.new(path).expand_path }
.select { |path| path.fnmatch(rails_root_match.to_s) } # path needs to be in application directory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this served a purpose before. I suppose that for the most part, autoload paths are always going to be under rails root, which is why this may have been redundant by that implicit fact. I don't know if you can configure Rails to identify autoload paths outside of the application directory. You probably can, but I don't know why we wouldn't want packwerk to consider those, so at the surface this seems fine to me, especially since it doesn't change how things work in Gusto's configuration of packwerk (and as long as it doesn't for Shopify's either).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we introduced this line to keep packwerk from inferring constants to be defined in gems. Not sure it's still necessary as we might now be doing that filtering somewhere else.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we introduced this line to keep packwerk from inferring constants to be defined in gems

I think that filtering is also happening with the line right below it, but I'll double check testing against our mono repo.

That being said, I do wonder if maybe changing the behavior so packwerk does infer constants defined in gems could be a feature not a bug? I can imagine a use case where a gem author publishes a rails engine, with packages defined inside of it, and they intend for that to message to consumers what the public API of the gem is, vs the internals you shouldn't be relying on. I'm not sure if this use case has been explicitly considered and rejected by packwerk maintainers, or just hadn't been considered yet, and ya'll chose to filter out constants from gems for performance or other reasons.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that filtering is also happening with the line right below it, but I'll double check testing against our mono repo.

Just closing this loop, testing this function with our own repo, which has both locally defined engines, and engines from ruby gems via bundler, I'm only seeing our locally defined engine paths show show up

@AndrewSwerlick
Copy link
Author

AndrewSwerlick commented Aug 4, 2022

@alexevanczuk

One thing I did notice is that it appears to increase runtime significantly (~30% increase in runtime). I didn't get to investigating this more thoroughly, but I wonder if it's because there is a lot files to glob out and scan? I'm really surprised it would add 8 seconds like this though, but it appears to be happening consistently across runs.

I have some suspicions as to where this is coming from. I suspect I'm generating effectively overlapping globs, and ruby is scanning some directories multiple times as a result. There's probably a way to filter out those.

I'd love to dig in a bit deeper on the performance implications. Separately, I realize that this is actually a behavioral change for user, and in theory could be a breaking change. I wonder if (A) to make it non-breaking and (B) unblock you to use this in your application without us digging into the performance implications we can have this be off by default and able to be turned on with a configuration in packwerk.yml. I imagine something like include_external_rails_engines (there is probably a better name here). I use "external" to distinguish setups that have unbuilt rails engines located within the rails app.

I'm cool with a new configuration key. I'll add it to PR shortly

@exterm
Copy link
Contributor

exterm commented Aug 8, 2022

maybe changing the behavior so packwerk does infer constants defined in gems could be a feature not a bug?

We'd have to come up with a convention for non-application package names. Since package name and path from application root are the same in packwerk, we don't have a convention for the name of a package that lives outside of the application root.

E.g. the package components/platform/thingamajig would always live at the path $RAILS_ROOT/components/platform/thingamajig.

@alexevanczuk
Copy link
Contributor

maybe changing the behavior so packwerk does infer constants defined in gems could be a feature not a bug?

We'd have to come up with a convention for non-application package names. Since package name and path from application root are the same in packwerk, we don't have a convention for the name of a package that lives outside of the application root.

E.g. the package components/platform/thingamajig would always live at the path $RAILS_ROOT/components/platform/thingamajig.

I'd love it if packwerk could be used to detect gem dependencies someday. I was excited about this because it could help folks keep their dependencies on other gems down, which ultimately makes packwerk more valuable from the perspective of a tool to help folks extract a system into another service (i.e. you need to explicitly list all your gem dependencies to do so).

That being said as @exterm said we have a handful of questions to answer, but could be cool for us to start with a design brainstorm in a discussion somewhere.

@AndrewSwerlick
Copy link
Author

Just a heads up, the new configuration key is a little trickier than I realized, just because building a package PackageSet relies on self methods that don't have access to the Configuration object, so I'm ending up with a lot of messy parameter passing.

At this point I think it's probably worth it to refactor these self methods into a new PackagePaths class, which takes in a full configuration object and ends up replacing parts of PackageSet.load_all_from and PackageSet.package_paths. Otherwise this new config key has to get passed down as a dedicated argument to a few different methods that already have 3 arguments. I'm going to plan to do that, and then add support for this key.

@alexevanczuk
Copy link
Contributor

I think that sounds reasonable to me @AndrewSwerlick , though may have some more feedback when I see the diff. Really appreciate you continuing to push through on this, and please let me know if you feel stuck or frustrated with anything and I'd be happy to pair with you again on zoom.

@AndrewSwerlick AndrewSwerlick force-pushed the packages-anywhere branch 2 times, most recently from 8219642 to fa46a24 Compare August 11, 2022 02:31
@AndrewSwerlick
Copy link
Author

@alexevanczuk I've added the configuration setting. It's made the PR a little more gnarly, so I cleaned up the history so you can see the refactoring changes in one commit, and then the introducing of the new capabilities in the second.

I still need to update the documentation to check off all the boxes.

@shageman
Copy link
Contributor

I have a suspicion there will be some interaction between our two PRs: #218

Copy link
Contributor

@alexevanczuk alexevanczuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this seems sensible to me, and at this point mostly wanting @rafaelfranca 's take on it.

There has also recently been some other client requests around changes in the way packwerk identifies and pulls in load paths: #218

@rafaelfranca it would be great to find some time (over slack or zoom) to chat more about the type of ways clients might want packwerk to find and pull in load paths and see if there's a way we can systematically enable some more flexibility!

@@ -1,4 +1,4 @@
# typed: true
# typed: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why'd we need to downgrade the typed-ness of this file? Could we keep it at true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay to use T.unsafe(...) to wrap something that is challenging to type check if it can keep us at a higher typed level, although better to type check it if we can.

case template
when :minimal
set_load_paths_for_minimal_template
when :skeleton
set_load_paths_for_skeleton_template
when :external_packages
set_load_paths_for_external_packages_template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This + the helper methods looks interesting -- I'm not too familiar with the Rails happy path for setting and knowing about Rails engines. This looks fine but curious if @rafaelfranca has any feedback on setting up an engine stub for test.

@AndrewSwerlick
Copy link
Author

@alexevanczuk I think I addressed all the feedback. I know there are some other ongoing discussion, so let me know if there's anything else I can do related to this PR.

@AndrewSwerlick
Copy link
Author

Hey @alexevanczuk, I just wanted to bump this and see if it's still something we can explore getting merged in. I know there's been some conversations in other PRs that may make some of this obsolete, so I'm open to making changes as necessary, but wanted to check in and see where you all stood.

Copy link
Contributor

@alexevanczuk alexevanczuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me @AndrewSwerlick. @rafaelfranca let me know if you have any concerns prior to merging and releasing

@alexevanczuk
Copy link
Contributor

@rafaelfranca do you happen to know why CI is not running for this PR?

@alexevanczuk
Copy link
Contributor

@AndrewSwerlick Could you try pushing another commit to this branch (and then perhaps reverting it) to see if it kicks off CI?

@alexevanczuk
Copy link
Contributor

@gmcgibbon Would you be able to look at this one too?

def engine_paths_to_scan
bundle_path_match = Bundler.bundle_path.join("**")

Rails.application.railties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't depend on railties in the gemspec so I'd rather not depend on rails constants here. I think this also assumes all local engines are packages. Couldn't you just glob for package.yml files instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't depend on railties in the gemspec so I'd rather not depend on rails constants here

I'll address this in your second comment below about paths because I think they're kind of related

I think this also assumes all local engines are packages

That's not exactly accurate. It assumes any local engine might contain packages, but not that the engine itself is a package. The engine doesn't have to contain packages either. It's just adding all the paths to the list of paths that later get scanned for via glob for a package.yml file.

:custom_associations,
:config_path,
:cache_directory,
:packages_outside_of_app_dir_enabled,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of this name. Maybe instead we could specify a root path (or list of paths) to scan for packages from?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did create a prototype that did something like that and proposed a potential solution in #206. It wasn't exactly the same, I was proposing creating load path aliases instead, but the general feedback as I understood it was that we didn't want to expand to the permanent configuration api.

The idea behind this configuration key is that it's meant to be a kind of feature flag, that would be replaced by an opt-out, and then a full removal after this change had been out in the wild and safely tested by a few people.

If we want to expand the permanent configuration api, then a key like external_package_paths work for our particular use case, but it has some downsides in that it could lead to confusion. Ultimately, these external package paths will only work if they're file structure follows zeitwerk. I'm not sure how easy it is to validate that in advance, or provide useful warning messages to the users as to why package rules aren't being enforced.

This is ultimately the advantage of railsties based approach, in that the engines are guaranteed to follow that structure, so it allows for a zero config, reliable approach, at the downside of creating another dependency

I didn't realize rails wasn't already an explicit dependency though, so I agree if we stick to a rails tie approach, the proper rails gem should be added to the gemspec.


def create_new_engine_at_path(path)
Class.new(Rails::Engine) do
T.unsafe(self).define_method(:root) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use T.bind(self, Class) above this line so Sorbet understands the block is a Class. I think this might also be fixed if we just upgrade Sorbet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Moves arguments passed into the self methods on `PackageSet` into a
new `PackagePaths` object. This makes it easier to add new arguments
without alot of parameter drilling. This will support the next commit
where we add the ability to conditionally find packages outside of the
app dir
This commit modifies packwerk so that it can scan for packages defined
in any engines loaded in a given rails application. This allows
support for more complex repository structures where the rails app is
not at the root of the repository, and where that app depends on
engines defined in sibiling directories.

To support this, I modified `PackageSet` so it also looks at
`Rails.application.railties` to pull a list of engines, and their root
directories, and include those in the paths scanned for packages. I
also dropped the restrictions on load paths for constant resolution in
`ApplicationLoadPaths`. The result is that packwerk can now find
packages defined in these engines, and resolve constants in these
engines.

With this setup, packwerk still only scans files in the actual app and
it's subdirectories for violations, but now it can resolve constants
in the engines, and attach them to packages in the engine.

For testing, I created a new fixture which models this sort of
directory structure, and mocks an app with a rails engine.
Fixed some typing issues by using unsafe where necessary
@rafaelfranca
Copy link
Member

Currently, packwerk cannot detect violations where an app is inappropriately using code defined in a package within an engine, because packwerk cannot discover engines that aren't defined below the Rails.root of the app.

I'm not sure exactly why this is needed because what you are describing here is how we use packwerk and it already work for us without modification. Do you have an example of what doesn't work?

@AndrewSwerlick
Copy link
Author

AndrewSwerlick commented Nov 18, 2023

I'm not sure exactly why this is needed because what you are describing here is how we use packwerk and it already work for us without modification. Do you have an example of what doesn't work?

Sorry I didn't mean to generate noise on this. We had an internal hackathon recently exploring trying to get packwerk established in our ecosystem again, and I brought my fork back up to date to explore some approaches using it. I'll answer your question, but before I do, it's worth highlighting that I think I'm going to close out this PR because going forward we have a plan that will allow us to avoid some of the pain points that prompted this in the first place.

The first thing that will probably help is if I rewrite the piece you quoted. It should really read

Currently, packwerk cannot detect violations where an app is inappropriately using code defined in a package within an engine in an engine defined inside a symlink'd, path based gem, because packwerk cannot discover engines that aren't defined below the Rails.root of the app.

The specifics of our mono repo are such that we have a structure where we have a series of gems and apps as siblings at the root. The gems contain engines. The apps have Gemfiles that include the gems based on their symlink'd paths. Visually it looks something like this

- sales_app
  - local_gems
    - sales_core # symlink to ./sales_core  
- sales_core # a gem
- sales_admin
  - local_gems
    - sales_core # symlink to ./sales_core  

This creates two problems based on my testing.

  1. When it comes to constant resolution, Packwerk excludes everything not under Rails.root, based on the files absolute paths. Because of our use of symlinks, any files in the local gems get excluded and any constants defined in them get treated as unresolved and ignored.
  2. It also doesn't scan those engines for packages for basically the same reason

This PR fixes that problem by ensuring the engines load paths (which are absolute path values) are also scanned for both package and constant resolution.

All that being said, even with this PR we had some challenges trying to use some of the other community tools (ie https://github.com/rubyatscale/packs, https://github.com/rubyatscale/danger-packwerk) because of this repo structure, and some other issues. As a result, we decided to temporarily move forward with the experimental rust based parser in https://github.com/alexevanczuk/packs, which dodges many of these problems but not relying on zietwerk conventions. We're revisiting our unique mono repo structure for a number of unrelated reasons, and in the future may end up with something that plays better with packwerk itself out of the box. So we don't really have use for these changes anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants