Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Part 3: Use webpack content hash instead of rails asset digests #30901

Merged
merged 29 commits into from Oct 6, 2019

Conversation

davidsbailey
Copy link
Member

@davidsbailey davidsbailey commented Sep 23, 2019

Background

See webpack content hashing design doc

Description of this PR

There are 4 major categories of work in this PR:

enable content hashing in webpack entry points

d61c30d make webpack start adding content hashes to js output filenames in production mode

add content hashes to files from apps/i18n and apps/lib

change the way we process files via grunt so that webpack can add content hashes to them via copy-webpack-plugin. Libs fall into 1 of 2 categories:

  • some libs (bramble, piskel and ace) must not use hashed filenames because these libs reference their own files by filename. these continue to be copied directly into the apps package at build/package/js/.

  • some libs contain both minified and unminified copies of each js file. This PR stops including both in the output bundle and only includes one, depending on whether we are in production / minification mode or not. The implementaton of this is explained here:

    // Libraries in this directory are assumed to have .js and .min.js
    // copies of each source file. In development mode, copy only foo.js.
    // In production mode, copy only foo.min.js and rename it to foo.js.
    // This allows the manifest to contain a single mapping from foo.js
    // to a target file with the correct contents given the mode.

start using webpack_asset_path

webpack_asset_path replaces all instances of minifiable_asset_path, asset_path, and javascript_include_tag which point to files in the js/ subdirectory of the apps package. This step is simple in theory but touches many files.

Some assets are still referenced using asset_path. This is addressed in the Future Work section of the design doc.

avoid double-hashing

As explained in the design doc, we must not add a rails asset digest to assets which already have a webpack content hash. 7a56fdf and a4b1510 show the code changes relevant to this. The following existing comments help explain how this mechanism works:

  • # Patch Sprockets to skip digesting already-digested webpack ('wp') assets.
    # Webpack adds its own hash to imported image files and doesn't have any
    # knowledge of Sprockets digests, so the Sprockets processed-asset digest
    # path should equal the logical path for these assets.
    #
    # This means that the digest for these assets is based on the webpack content
    # and not the Sprockets-processed output, so the wp-digest will not get
    # updated if there are any changes to Sprockets processors.
    module NoDoubleDigest
    def digest_path
    logical_path.match?(/wp\h{20}/) ? logical_path : super
    end
    end
    Sprockets::Asset.prepend NoDoubleDigest

  • // note that in the name template given below, a dash prefixing
    // the hash is explicitly avoided. If rails tries to serve
    // this file when asset digests are turned off, it will return a
    // 404 because it thinks the hash is a digest and it won't
    // be able to find the file without the hash. :( :(
    loader: 'url-loader?limit=1024&name=[name]wp[hash].[ext]'

Documentation

see updates to apps/docs/build.md at: https://github.com/code-dot-org/code-dot-org/pull/30901/files#diff-1c0b3456c9097c37376217f075c73680

Verification

  • Drone runs unit + ui tests without minified js or asset precompilation.
  • yarn build:dist test:unit passes locally
  • the follow load locally without errors after yarn build:dist and rake assets:precompile
    • gamelab code + animations tab (p5, piskel)
    • weblab (bramble)
    • artist (blockly)
    • /home (with cookieBanner)
    • /learn (pegasus, with cookieBanner)

@davidsbailey davidsbailey changed the base branch from staging to remove-unminify-plugin September 23, 2019 16:59
@davidsbailey davidsbailey changed the base branch from remove-unminify-plugin to staging September 23, 2019 21:03
@davidsbailey davidsbailey force-pushed the content-hash branch 4 times, most recently from 04a8591 to f8e152b Compare September 24, 2019 20:25
@davidsbailey davidsbailey changed the base branch from staging to generate-webpack-manifest September 24, 2019 20:33
@davidsbailey davidsbailey changed the title Content hash Part 3: Use webpack content hash instead of rails asset digests Sep 24, 2019
@davidsbailey davidsbailey force-pushed the content-hash branch 2 times, most recently from 59454da to a4b1510 Compare September 25, 2019 18:47
@davidsbailey davidsbailey marked this pull request as ready for review September 25, 2019 19:28
@davidsbailey
Copy link
Member Author

@islemaster and @wjordan , can you please provide a review of the overall design linked in the PR description? CC @Hamms and @joshlory in case they have input.

@davidsbailey
Copy link
Member Author

davidsbailey commented Sep 25, 2019

reviewing tip: when taking a first look at the Files changed for this PR, go to the file filter... and unselect .haml.

@@ -54,7 +54,7 @@ Rerun `assets:precompile` to regenerate new assets and try again."
# updated if there are any changes to Sprockets processors.
module NoDoubleDigest
def digest_path
logical_path.match?(/wp\h{32}/) ? logical_path : super
logical_path.match?(/wp\h{20}/) ? logical_path : super
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wjordan , I am particularly interested in your opinion on this aspect of the plan, where we tell sprockets to skip digesting for assets which already have webpack content hashes.

@davidsbailey davidsbailey changed the base branch from generate-webpack-manifest to staging September 25, 2019 19:37
@davidsbailey davidsbailey changed the base branch from staging to generate-webpack-manifest September 25, 2019 19:38
Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me. If we're going to split this responsibility between Webpack and Rails Asset Pipeline, having a clear name like webpack_asset_path helps a lot.

// Ideally, the target file would have the .min.js suffix in
// production mode. This could be accomplished by nesting these files
// within a minifiable-lib directory in the output package so that the
// manifest plugin could do special processing on these files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ideal? I thought it was a good thing that we were simplifying down to one way to reference a "built" bundle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this comment wasn't very clear. The proposed future change would be to change the name of the file in the output bundle so that the names of minified files have .min in them, not to include more copies of the file.

In production mode today we have something like this in the manifest:

  "js/droplet/droplet-full.js": "/assets/js/droplet/droplet-fullwp0573b060df80c10bff8ba38a881f31a0.js"

if we took these extra steps, we could have something with .min in the path value like this:

  "js/droplet/droplet-full.js": "/assets/js/droplet/droplet-full.minwp0573b060df80c10bff8ba38a881f31a0.js"

This is slightly better because you can see that you're getting a minified resource when you look at what your browser is downloading, whereas with the currently proposed approach a developer may say "why am I getting non-minified droplet?" and spend some time investigating before discovering these comments.

@@ -54,7 +54,7 @@ Rerun `assets:precompile` to regenerate new assets and try again."
# updated if there are any changes to Sprockets processors.
module NoDoubleDigest
def digest_path
logical_path.match?(/wp\h{32}/) ? logical_path : super
logical_path.match?(/wp\h{20}/) ? logical_path : super
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - does this logic depend on Webpack generating exactly 20-character hashes? How does this interact with the 32-character hashes we drop after copy-webpack-plugin, above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I glossed over something here. The regex on this line matches either a 32- or 20-character hash, since all it checks for is wp followed by at least 20 hex characters. I will add a comment stating this.

The [contenthash] in webpack's output.filename results in a 20-character hash, which is why this line had to be changed:

const hash = minify && !debugMinify ? 'wp[contenthash]' : '';
var config = _.extend({}, baseConfig, {
output: {
path: outputDir,
publicPath: '/assets/js/',
filename: `[name]${hash}${suffix}`

strangely, the [contenthash] placeholder consumed by copy-webpack-plugin gives a 32-character hash:

new CopyPlugin([
{
from: 'build/locales',
to: minify
? '[path]/[name]wp[contenthash].[ext]'
: '[path]/[name].[ext]',
toType: 'template'
},

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these references be added to the comment so this knowledge doesn't get lost in the future? Without this reference I don't think it'll be obvious why it's a '20 or 32 character string' rather than one or the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

@davidsbailey davidsbailey changed the base branch from generate-webpack-manifest to staging September 26, 2019 19:01
@davidsbailey
Copy link
Member Author

PTAL, @islemaster. After rebasing (apologies) onto Parts 1 & 2, I have done some local verification which is now listed in the PR description.

Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mind_blown_2

Copy link
Contributor

@wjordan wjordan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, I just added a couple comments and clarifying questions, mostly around the documentation.

Overall I'm still a bit overwhelmed by the complexity of the two overlapping asset-compilation pipelines we've mashed together, and I'm not entirely sure I understand/remember how every piece fits together any longer, but this seems to be a great incremental step in the direction of simplifying them and also unblocks what could be huge performance optimizations as well.

apps/webpack.js Outdated
const suffix = minify && !debugMinify ? '.min.js' : '.js';
const suffix = minify ? '.min.js' : '.js';
// This generates a 20-hex-character hash.
const hash = minify ? 'wp[contenthash]' : '';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified by just adding the hash to suffix rather than a second variable, e.g., const suffix = minify ? 'wp[contenthash].min.js' : '.js'

@@ -172,49 +172,75 @@ different configurations get run. Here is the approximate flow of commands/files

The rails asset pipeline can be configured to do bundling and minification,
but it is a significantly less powerful tool than webpack. As a result, we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're doing an updating pass on this documentation text: The above comment was written 3 years ago by someone no longer working on this project, and it's unclear what 'significantly less powerful tool' means or whether it's still relevant, though it seems to hint at some basis for the current architecture. I'd suggest either removing the line or updating it to refer to something more specific and unambiguous in the current design.

only use the rails asset pipeline to add "digests" to the few assets which
do not already contain hashes generated by webpack, and to retain each
hashed/digested asset for a few deploys so that links to them do not break
immediately after each deploy.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions about this line:

  1. What are the 'few assets which do not already contain hashes generated by webpack'? (I see application.js / application.css are listed in the paragraph below, but also 'a few other'.) Are these a small handful of 'legacy' assets that are in the process of being migrated over fully (so that this number will eventually be zero), or will there always be a 'few' assets in this state moving forward? Are they a certain category of asset that can be specified more specifically? (Edit: I see this is described in a bit more detail in the 'Future work' in the design doc- perhaps worth referencing that here, but not necessary, at least I understand it a bit better.)

  2. Does this line mean that all hashed/digested assets are retained for a few deploys by the rails asset pipeline, or only the 'few assets' will be retained?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The "few assets which do not already contain hashes generated by webpack" are the union of:
    a. The 85 calls to javascript_include_tag, stylesheet_link_tag or asset_path which do not point to application.js or application.css (as described in Future work).
    b. application.js and application.css

(a) will be relatively easy to move into webpack.

(b) will be hard to change any time soon, so I'm advocating that we leave application.js and application.css as the only two things not being referenced via webpack_asset_path once follow-up work is complete.

  1. all hashed/digested assets are retained for a few deploys by the rails asset pipeline.

When loading a bundle in dashboard or pegasus in production via the
`webpack_asset_path` helper, the helper will look in the webpack manifest to
determine the full path of the asset to load, including the content hash.
This method is preferred over those methods listed in the previous paragraph.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that this method is interchangeable with the previous methods? Is it preferred because the previous methods are deprecated and will be eventually removed? Does it depend on where and how the asset is being stored / generated?

Also, is there a difference between 'bundle' and 'asset'? They are used to refer to the same thing in this paragraph. Is an image, for example, both a 'bundle' and an 'asset', or is it an 'asset' but not a 'bundle'? Does this preferred approach only work for 'bundles', or for all 'assets'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webpack_asset_path is not interchangeable with previous methods. Previous methods (javascript_include_tag, stylesheet_link_tag and asset_path) are deprecated, except for references to application.js / application.css.

a "bundle" is a file which was output specifically by webpack as defined here: https://webpack.js.org/concepts/#output

the rails asset pipeline seems to refer to both its inputs and its outputs as "assets", and this is how I'd recommend that we use this term.

Please let me know if that clarifies things for you. I will go through and make another pass of this document to follow (and perhaps define) this particular part of the terminology.

Copy link
Contributor

@wjordan wjordan Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that helps quite a bit.

That said, maybe it's because I'm not directly familiar with how we currently use Webpack loaders, but how we're using the concept of 'bundle' (more specifically, what kinds of content we imply by our use of the term) still isn't quite clear to me. In earlier JS-pipeline frameworks, a 'bundle' was strictly a set of compiled JavaScript sources that would be executed as a self-contained unit, through a well-defined interface of entry-module exports. Now with lazy-loading, code-splitting, and Webpack's whole giant ecosystem of loader plugins, a Webpack 'bundle' can contain pretty much any kind of content you want, but sometimes it can still mean 'mostly javascript, with a sprinkle of related content'.

I'd like a bit more clarity on what kinds of content we reasonably expect to load in webpack bundles (if it's more than strictly JS sources) - especially if this pipeline will be our 'preferred' approach for incorporating assets into our application moving forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the reference to the term "bundle" from this part of my added comments. there are a few uses of this term earlier on the page, but I'd like to postpone tackling updates to those section because our bundling strategy is likely to change in the course of the upcoming optimizations.

@@ -54,7 +54,7 @@ Rerun `assets:precompile` to regenerate new assets and try again."
# updated if there are any changes to Sprockets processors.
module NoDoubleDigest
def digest_path
logical_path.match?(/wp\h{32}/) ? logical_path : super
logical_path.match?(/wp\h{20}/) ? logical_path : super
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these references be added to the comment so this knowledge doesn't get lost in the future? Without this reference I don't think it'll be obvious why it's a '20 or 32 character string' rather than one or the other.

@davidsbailey
Copy link
Member Author

Overall I'm still a bit overwhelmed by the complexity of the two overlapping asset-compilation pipelines we've mashed together, and I'm not entirely sure I understand/remember how every piece fits together any longer, but this seems to be a great incremental step in the direction of simplifying them and also unblocks what could be huge performance optimizations as well.

Thank you for the detailed review, @wjordan ! I will make another pass through the comments and docs to make sure the answers to your questions in this last round of feedback are clarified, without having to dig up this PR to understand what's going on.

i agree it is confusing to use two asset managing tools (webpack plus rails / sprockets). one follow-up steps described in future work are completed, we'll essentially be using the rails asset pipeline only for (1) application.js/css, and (2) to retain copies of hashed/digested assets for a few deploys. If you think it might simplify things to find a different solution for reason (2) (including possibly implementing it ourselves), we might be able to simplify things a bit further by not passing the webpack output files through rails asset pipeline at all. if we get to that point, each asset would at least only get handled by one asset management tool. food for thought.

@davidsbailey
Copy link
Member Author

Could these references be added to the comment so this knowledge doesn't get lost in the future? Without this reference I don't think it'll be obvious why it's a '20 or 32 character string' rather than one or the other.

I'm not sure what happened to this comment on assets_sync.rake, but I've added a brief reference to Gruntfile.js to see when each content hash length might be added.

@davidsbailey
Copy link
Member Author

Thanks again for your detailed feedback, Will. I've updated comments to help clarify the answers to your questions. Please let me know if you see areas which require further clarification.

@wjordan
Copy link
Contributor

wjordan commented Oct 1, 2019

The updated comments all look great and I think I have a much clearer understanding of the system now, thanks for your patience! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants