Skip to content
This repository has been archived by the owner on Mar 6, 2024. It is now read-only.

Refactor (another one) #47

Merged
merged 52 commits into from
Aug 22, 2018
Merged

Conversation

RReverser
Copy link
Contributor

Primary changes:

  • Upgraded dependencies (in particular, those with vulnerabilities).
  • Fixed code to match new xo lints after upgrade (const and let instead of var, arrow functions where possible, destructuring etc.).
  • Autoformatted code with Prettier (now an option in xo).
  • Added automatic lint and formatting of staged files on commit with husky.
  • Replaced Bluebird with native async functions
  • Added filenames to UglifyJS and PostCSS invocations for better error reporting (previously URL wasn't reported so it was hard to figure out where error occurs).
  • Added helper for consistent URL parsing in CSS (previously URLs for imports and those in properties were parsed differently, one with custom regex and another with PostCSS value parser).
  • Preserving any data URIs and not just those encoded as base64.
  • Added reporting of UglifyJS warnings (sometimes useful for debugging, corresponds to warning level if set via CLI).
  • Simplified external collapser functions to avoid repeating URLs and log message code.
  • Switched to rely on Content-Type provided by the server instead of trying to guess it with mmmagic - the latter previously led to various detection bugs even when server served the image correctly, and generally guessing mime types is more dangerous (as seen in history of X-Content-Type-Options: nosniff).
  • Added printing of final HTML when collapsing with a CLI tool so that it can be saved for inspection.
  • Switched logs to process.stderr instead of process.stdout so that they don't get mixed with the content.
  • Added graceful recovery from UglifyJS errors (just because UglifyJS can't parse something, it doesn't mean that that JS is necessary invalid, and so it's better to log it but move on with collapsing).
  • Improved HTTP resiliency by adding limit to parallel requests (for now hardcoded to 8), request retries (hardcoded to 5) and HTTP cache to avoid repeated fetches.
  • Switched to parse5-html-rewriting-stream for HTML parsing. PostHTML uses htmlparser2 which has worse compliance with spec, and, in particular, doesn't decode/encode text and attributes, which currently means 1) collapsify would try to use raw attributes for validation against forbidden regex, but then decode them before sending request (this might lead to fetching forbidden resources) and 2) when constructing data-URI, potentially invalid attribute characters would not get escaped (this might lead to XSS on collapsed pages). Also this allows to process HTML in a streaming fashion, modifying only the parts we're actually interested in.

 - Add limit of maximum 8 parallel requests.
 - Add 5 retries in case of request failure.
 - Add cache to use while fetching the page.
parse5 has much better compliance with spec than htmlparser2 used in PostHTML, performs attribute decoding as per spec and allows to granularly rewrite only parts of HTML we're interesting in while preserving the surrounding formatting.
}

function collapse(buf) {
return base64Utils.encode(buf);
function collapse(body, opts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the collapse functions should be asynchronous for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We anyway await them, which supports both sync and async versions with no differences, but yeah, I guess I could do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

.use(require('../plugins/posthtml-flatten-style')(opts))
.use(require('../plugins/posthtml-flatten-script')(opts))
.process(String(buf));
function collapse(body, opts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a Promise, but it's not entirely clear right now. Mind also making this async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


return opts.fetch(url).then(collapse);
async function external({fetch, resourceLocation: url}) {
const {contentType, body} = await fetch(url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Servers don't always return content-types browsers accept for dataURIs. It might be better to prefer a list for common file extensions, and fall back to the content-type if unknown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I definitely don't want to "guess" content-type again, it's a very insecure practice, and there is a reason browsers don't do that. We shouldn't change how it treats the content just by server-side extensions - if real browser wouldn't show the image on original HTML because of incorrect Content-Type, it shouldn't do that on the resulting HTML either.

Servers don't always return content-types browsers accept for dataURIs.

Can you provide an example of content-type that would work on a network response but not dataURI (after removing spaces)? From what I've seen, anything should work; moreover, as per spec, it's optional and even data URIs without any content-type like data:,123 are perfectly valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely not saying we should guess, that was a mistake that was going to be the first thing I removed if I ever touched this again.

I'll have to dig through my archives to see if I can find the cases I found before.

@@ -0,0 +1,46 @@
'use strict';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to eventually see this with snapshot tests instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not part of this PR :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intended as "nitpick"/":heart:"

bin/cli.js Outdated
digits: 2
})
);
process.stdout.write(output);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was avoided because the output can get rather large. If you must, accept a flag to enable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually just redirect output to a file, but yeah, flag sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -0,0 +1,76 @@
'use strict';
const {resolve} = require('url');
const logger = require('bole')('posthtml-flatten-script');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your logger name isn't matching the rest of collapsify.

Copy link
Contributor Author

@RReverser RReverser Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That name was in posthtml-flatten-script and was just moved over here, but I agree it should've been consistent (or at least renamed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (decided to log to collapsers:html group).

url = he.decode(url);
const client = got.extend({
headers: {
'user-agent': `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10; rv:33.0) Gecko/20100101 Firefox/33.0 Collapsify/${VERSION} node/${
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Firefox 33 is old now. Might be worth updating.

@RReverser
Copy link
Contributor Author

Resolved comments (except for data URI) and bumped package version.

@RReverser

This comment has been minimized.

@@ -11,7 +11,8 @@
"contributors": [
"Christopher Joel <chris@scriptolo.gy> (http://scriptolo.gy)",
"Terin Stock <terinjokes@gmail.com> (http://terinstock.com)",
"Andrew Galloni <andrew@cloudflare.com>"
"Andrew Galloni <andrew@cloudflare.com>",
"Ingvar Stepanyan <me@rreverser.com> (https://rreverser.com)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so sorry.

@RReverser RReverser merged commit d0c515a into cloudflare:master Aug 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants