New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Part 3: Use webpack content hash instead of rails asset digests #30901
Conversation
f2d8cfc
to
5eea97d
Compare
04a8591
to
f8e152b
Compare
59454da
to
a4b1510
Compare
@islemaster and @wjordan , can you please provide a review of the overall design linked in the PR description? CC @Hamms and @joshlory in case they have input. |
reviewing tip: when taking a first look at the |
@@ -54,7 +54,7 @@ Rerun `assets:precompile` to regenerate new assets and try again." | |||
# updated if there are any changes to Sprockets processors. | |||
module NoDoubleDigest | |||
def digest_path | |||
logical_path.match?(/wp\h{32}/) ? logical_path : super | |||
logical_path.match?(/wp\h{20}/) ? logical_path : super |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjordan , I am particularly interested in your opinion on this aspect of the plan, where we tell sprockets to skip digesting for assets which already have webpack content hashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct to me. If we're going to split this responsibility between Webpack and Rails Asset Pipeline, having a clear name like webpack_asset_path
helps a lot.
apps/Gruntfile.js
Outdated
// Ideally, the target file would have the .min.js suffix in | ||
// production mode. This could be accomplished by nesting these files | ||
// within a minifiable-lib directory in the output package so that the | ||
// manifest plugin could do special processing on these files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ideal? I thought it was a good thing that we were simplifying down to one way to reference a "built" bundle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this comment wasn't very clear. The proposed future change would be to change the name of the file in the output bundle so that the names of minified files have .min in them, not to include more copies of the file.
In production mode today we have something like this in the manifest:
"js/droplet/droplet-full.js": "/assets/js/droplet/droplet-fullwp0573b060df80c10bff8ba38a881f31a0.js"
if we took these extra steps, we could have something with .min
in the path value like this:
"js/droplet/droplet-full.js": "/assets/js/droplet/droplet-full.minwp0573b060df80c10bff8ba38a881f31a0.js"
This is slightly better because you can see that you're getting a minified resource when you look at what your browser is downloading, whereas with the currently proposed approach a developer may say "why am I getting non-minified droplet?" and spend some time investigating before discovering these comments.
@@ -54,7 +54,7 @@ Rerun `assets:precompile` to regenerate new assets and try again." | |||
# updated if there are any changes to Sprockets processors. | |||
module NoDoubleDigest | |||
def digest_path | |||
logical_path.match?(/wp\h{32}/) ? logical_path : super | |||
logical_path.match?(/wp\h{20}/) ? logical_path : super |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - does this logic depend on Webpack generating exactly 20-character hashes? How does this interact with the 32-character hashes we drop after copy-webpack-plugin, above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I glossed over something here. The regex on this line matches either a 32- or 20-character hash, since all it checks for is wp
followed by at least 20 hex characters. I will add a comment stating this.
The [contenthash]
in webpack's output.filename results in a 20-character hash, which is why this line had to be changed:
Lines 329 to 335 in a4b1510
const hash = minify && !debugMinify ? 'wp[contenthash]' : ''; | |
var config = _.extend({}, baseConfig, { | |
output: { | |
path: outputDir, | |
publicPath: '/assets/js/', | |
filename: `[name]${hash}${suffix}` |
strangely, the [contenthash]
placeholder consumed by copy-webpack-plugin
gives a 32-character hash:
code-dot-org/apps/Gruntfile.js
Lines 886 to 893 in a4b1510
new CopyPlugin([ | |
{ | |
from: 'build/locales', | |
to: minify | |
? '[path]/[name]wp[contenthash].[ext]' | |
: '[path]/[name].[ext]', | |
toType: 'template' | |
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these references be added to the comment so this knowledge doesn't get lost in the future? Without this reference I don't think it'll be obvious why it's a '20 or 32 character string' rather than one or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
2eacc18
to
9b02582
Compare
c86406c
to
707cd06
Compare
PTAL, @islemaster. After rebasing (apologies) onto Parts 1 & 2, I have done some local verification which is now listed in the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, I just added a couple comments and clarifying questions, mostly around the documentation.
Overall I'm still a bit overwhelmed by the complexity of the two overlapping asset-compilation pipelines we've mashed together, and I'm not entirely sure I understand/remember how every piece fits together any longer, but this seems to be a great incremental step in the direction of simplifying them and also unblocks what could be huge performance optimizations as well.
apps/webpack.js
Outdated
const suffix = minify && !debugMinify ? '.min.js' : '.js'; | ||
const suffix = minify ? '.min.js' : '.js'; | ||
// This generates a 20-hex-character hash. | ||
const hash = minify ? 'wp[contenthash]' : ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified by just adding the hash to suffix
rather than a second variable, e.g., const suffix = minify ? 'wp[contenthash].min.js' : '.js'
apps/docs/build.md
Outdated
@@ -172,49 +172,75 @@ different configurations get run. Here is the approximate flow of commands/files | |||
|
|||
The rails asset pipeline can be configured to do bundling and minification, | |||
but it is a significantly less powerful tool than webpack. As a result, we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're doing an updating pass on this documentation text: The above comment was written 3 years ago by someone no longer working on this project, and it's unclear what 'significantly less powerful tool' means or whether it's still relevant, though it seems to hint at some basis for the current architecture. I'd suggest either removing the line or updating it to refer to something more specific and unambiguous in the current design.
apps/docs/build.md
Outdated
only use the rails asset pipeline to add "digests" to the few assets which | ||
do not already contain hashes generated by webpack, and to retain each | ||
hashed/digested asset for a few deploys so that links to them do not break | ||
immediately after each deploy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions about this line:
-
What are the 'few assets which do not already contain hashes generated by webpack'? (I see
application.js
/application.css
are listed in the paragraph below, but also 'a few other'.) Are these a small handful of 'legacy' assets that are in the process of being migrated over fully (so that this number will eventually be zero), or will there always be a 'few' assets in this state moving forward? Are they a certain category of asset that can be specified more specifically? (Edit: I see this is described in a bit more detail in the 'Future work' in the design doc- perhaps worth referencing that here, but not necessary, at least I understand it a bit better.) -
Does this line mean that all hashed/digested assets are retained for a few deploys by the rails asset pipeline, or only the 'few assets' will be retained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The "few assets which do not already contain hashes generated by webpack" are the union of:
a. The 85 calls tojavascript_include_tag
,stylesheet_link_tag
orasset_path
which do not point toapplication.js
orapplication.css
(as described in Future work).
b.application.js
andapplication.css
(a) will be relatively easy to move into webpack.
(b) will be hard to change any time soon, so I'm advocating that we leave application.js
and application.css
as the only two things not being referenced via webpack_asset_path
once follow-up work is complete.
- all hashed/digested assets are retained for a few deploys by the rails asset pipeline.
apps/docs/build.md
Outdated
When loading a bundle in dashboard or pegasus in production via the | ||
`webpack_asset_path` helper, the helper will look in the webpack manifest to | ||
determine the full path of the asset to load, including the content hash. | ||
This method is preferred over those methods listed in the previous paragraph. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that this method is interchangeable with the previous methods? Is it preferred because the previous methods are deprecated and will be eventually removed? Does it depend on where and how the asset is being stored / generated?
Also, is there a difference between 'bundle' and 'asset'? They are used to refer to the same thing in this paragraph. Is an image, for example, both a 'bundle' and an 'asset', or is it an 'asset' but not a 'bundle'? Does this preferred approach only work for 'bundles', or for all 'assets'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
webpack_asset_path
is not interchangeable with previous methods. Previous methods (javascript_include_tag
, stylesheet_link_tag
and asset_path
) are deprecated, except for references to application.js / application.css.
a "bundle" is a file which was output specifically by webpack as defined here: https://webpack.js.org/concepts/#output
the rails asset pipeline seems to refer to both its inputs and its outputs as "assets", and this is how I'd recommend that we use this term.
Please let me know if that clarifies things for you. I will go through and make another pass of this document to follow (and perhaps define) this particular part of the terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that helps quite a bit.
That said, maybe it's because I'm not directly familiar with how we currently use Webpack loaders, but how we're using the concept of 'bundle' (more specifically, what kinds of content we imply by our use of the term) still isn't quite clear to me. In earlier JS-pipeline frameworks, a 'bundle' was strictly a set of compiled JavaScript sources that would be executed as a self-contained unit, through a well-defined interface of entry-module exports. Now with lazy-loading, code-splitting, and Webpack's whole giant ecosystem of loader plugins, a Webpack 'bundle' can contain pretty much any kind of content you want, but sometimes it can still mean 'mostly javascript, with a sprinkle of related content'.
I'd like a bit more clarity on what kinds of content we reasonably expect to load in webpack bundles (if it's more than strictly JS sources) - especially if this pipeline will be our 'preferred' approach for incorporating assets into our application moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the reference to the term "bundle" from this part of my added comments. there are a few uses of this term earlier on the page, but I'd like to postpone tackling updates to those section because our bundling strategy is likely to change in the course of the upcoming optimizations.
@@ -54,7 +54,7 @@ Rerun `assets:precompile` to regenerate new assets and try again." | |||
# updated if there are any changes to Sprockets processors. | |||
module NoDoubleDigest | |||
def digest_path | |||
logical_path.match?(/wp\h{32}/) ? logical_path : super | |||
logical_path.match?(/wp\h{20}/) ? logical_path : super |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these references be added to the comment so this knowledge doesn't get lost in the future? Without this reference I don't think it'll be obvious why it's a '20 or 32 character string' rather than one or the other.
Thank you for the detailed review, @wjordan ! I will make another pass through the comments and docs to make sure the answers to your questions in this last round of feedback are clarified, without having to dig up this PR to understand what's going on. i agree it is confusing to use two asset managing tools (webpack plus rails / sprockets). one follow-up steps described in future work are completed, we'll essentially be using the rails asset pipeline only for (1) application.js/css, and (2) to retain copies of hashed/digested assets for a few deploys. If you think it might simplify things to find a different solution for reason (2) (including possibly implementing it ourselves), we might be able to simplify things a bit further by not passing the webpack output files through rails asset pipeline at all. if we get to that point, each asset would at least only get handled by one asset management tool. food for thought. |
I'm not sure what happened to this comment on assets_sync.rake, but I've added a brief reference to Gruntfile.js to see when each content hash length might be added. |
Thanks again for your detailed feedback, Will. I've updated comments to help clarify the answers to your questions. Please let me know if you see areas which require further clarification. |
The updated comments all look great and I think I have a much clearer understanding of the system now, thanks for your patience! 👍 |
Background
See webpack content hashing design doc
Description of this PR
There are 4 major categories of work in this PR:
enable content hashing in webpack entry points
d61c30d make webpack start adding content hashes to js output filenames in production mode
add content hashes to files from apps/i18n and apps/lib
change the way we process files via grunt so that webpack can add content hashes to them via
copy-webpack-plugin
. Libs fall into 1 of 2 categories:some libs (
bramble
,piskel
andace
) must not use hashed filenames because these libs reference their own files by filename. these continue to be copied directly into the apps package atbuild/package/js/
.some libs contain both minified and unminified copies of each js file. This PR stops including both in the output bundle and only includes one, depending on whether we are in production / minification mode or not. The implementaton of this is explained here:
code-dot-org/apps/Gruntfile.js
Lines 892 to 896 in dd19f04
start using webpack_asset_path
webpack_asset_path
replaces all instances ofminifiable_asset_path
,asset_path
, andjavascript_include_tag
which point to files in thejs/
subdirectory of the apps package. This step is simple in theory but touches many files.Some assets are still referenced using
asset_path
. This is addressed in the Future Work section of the design doc.avoid double-hashing
As explained in the design doc, we must not add a rails asset digest to assets which already have a webpack content hash. 7a56fdf and a4b1510 show the code changes relevant to this. The following existing comments help explain how this mechanism works:
code-dot-org/dashboard/lib/tasks/asset_sync.rake
Lines 47 to 60 in a4b1510
code-dot-org/apps/webpack.js
Lines 124 to 129 in a4b1510
Documentation
see updates to apps/docs/build.md at: https://github.com/code-dot-org/code-dot-org/pull/30901/files#diff-1c0b3456c9097c37376217f075c73680
Verification
yarn build:dist test:unit
passes locallyyarn build:dist
andrake assets:precompile