Using https://github.com/sgilroy/async-await-codemod to upgrade codebase.
It's now required for correct URL resolution.
- Add a limit of at most 8 parallel requests.
- Add 5 retries in case of request failure.
- Add a cache to use while fetching the page.
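A minimal sketch of how those three pieces could fit together, not the code from this PR. `createLimitedFetch` and its option names are hypothetical, and `fetchFn` stands for any function that takes a URL and returns a promise.

```javascript
'use strict';

// Hypothetical helper (not from this PR): wraps `fetchFn(url) -> Promise`
// with a concurrency cap, retries and an in-memory cache.
function createLimitedFetch(fetchFn, {concurrency = 8, retries = 5} = {}) {
  const cache = new Map(); // url -> promise of the response
  const waiters = []; // resolvers of callers waiting for a free slot
  let active = 0;

  function acquire() {
    if (active < concurrency) {
      active++;
      return Promise.resolve();
    }
    return new Promise(resolve => waiters.push(resolve));
  }

  function release() {
    const next = waiters.shift();
    if (next) {
      next(); // hand the slot straight to the next waiter
    } else {
      active--;
    }
  }

  async function attempt(url) {
    await acquire();
    try {
      return await fetchFn(url);
    } finally {
      release();
    }
  }

  async function fetchWithRetries(url) {
    let lastError;
    // One initial attempt plus up to `retries` further attempts.
    for (let i = 0; i <= retries; i++) {
      try {
        return await attempt(url);
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  }

  return function (url) {
    if (!cache.has(url)) {
      const pending = fetchWithRetries(url);
      // Forget failed entries so that a later call can try again.
      pending.catch(() => cache.delete(url));
      cache.set(url, pending);
    }
    return cache.get(url);
  };
}
```

Caching the promise rather than the resolved value means that two concurrent requests for the same URL share a single network call.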
parse5 has much better spec compliance than the htmlparser2 parser used in PostHTML, performs attribute decoding as per spec, and allows us to granularly rewrite only the parts of HTML we're interested in while preserving the surrounding formatting.
lib/collapsers/binary.js
Outdated
}

function collapse(buf) {
return base64Utils.encode(buf);
function collapse(body, opts) {
All of the `collapse` functions should be asynchronous for consistency.
We `await` them anyway, which supports both sync and async versions with no difference, but yeah, I guess I could do it.
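To illustrate the point about `await`, here is a small sketch. The two collapsers and `run` are hypothetical stand-ins, not the functions from this PR.

```javascript
'use strict';

// Hypothetical synchronous collapser.
function collapseSync(body) {
  return body.toString('base64');
}

// Hypothetical asynchronous collapser with the same result.
async function collapseAsync(body) {
  return body.toString('base64');
}

async function run(collapse, body) {
  // `await` wraps a non-promise value in a resolved promise, so the caller
  // does not need to know which kind of function it was given.
  return await collapse(body);
}
```

Either function can be passed to `run`. Declaring every collapser `async` mainly makes the contract visible in the signature.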
Fixed.
lib/collapsers/html.js
Outdated
.use(require('../plugins/posthtml-flatten-style')(opts))
.use(require('../plugins/posthtml-flatten-script')(opts))
.process(String(buf));
function collapse(body, opts) {
This returns a Promise, but it's not entirely clear right now. Mind also making this `async`?
Ok.
Fixed.
return opts.fetch(url).then(collapse);
async function external({fetch, resourceLocation: url}) {
const {contentType, body} = await fetch(url);
Servers don't always return content-types browsers accept for dataURIs. It might be better to prefer a list for common file extensions, and fall back to the content-type if unknown.
No, I definitely don't want to "guess" the content-type again; it's a very insecure practice, and there is a reason browsers don't do that. We shouldn't change how the content is treated just because of server-side extensions - if a real browser wouldn't show the image on the original HTML because of an incorrect Content-Type, the resulting HTML shouldn't show it either.
> Servers don't always return content-types browsers accept for dataURIs.

Can you provide an example of a content-type that would work on a network response but not in a data URI (after removing spaces)? From what I've seen, anything should work; moreover, as per spec, it's optional, and even data URIs without any content-type like `data:,123` are perfectly valid.
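For reference, a minimal sketch of building a data URI from the server-provided content-type, assuming `body` is a `Buffer`. `toDataURI` is a hypothetical name and this is not the PR's implementation.

```javascript
'use strict';

// Hypothetical helper: builds a base64 data URI from a fetched response.
function toDataURI({contentType, body}) {
  // Assumption: whitespace is stripped from the media type, as mentioned
  // above. A missing content-type simply yields an empty media type, which
  // the data URI syntax permits.
  const mediaType = (contentType || '').replace(/\s+/g, '');
  return `data:${mediaType};base64,${body.toString('base64')}`;
}
```

For example, `toDataURI({contentType: 'text/plain; charset=utf-8', body: Buffer.from('123')})` gives `data:text/plain;charset=utf-8;base64,MTIz`.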
I'm definitely not saying we should guess, that was a mistake that was going to be the first thing I removed if I ever touched this again.
I'll have to dig through my archives to see if I can find the cases I found before.
@@ -0,0 +1,46 @@
'use strict';
I'd love to eventually see this with snapshot tests instead.
Definitely not part of this PR :)
This was intended as "nitpick"/":heart:"
bin/cli.js
Outdated
digits: 2
})
);
process.stdout.write(output);
This was avoided because the output can get rather large. If you must, accept a flag to enable.
I usually just redirect output to a file, but yeah, a flag sounds good to me.
Fixed.
lib/plugins/parse5-flatten-script.js
Outdated
@@ -0,0 +1,76 @@
'use strict';
const {resolve} = require('url');
const logger = require('bole')('posthtml-flatten-script');
Your logger name isn't matching the rest of collapsify.
That name was in posthtml-flatten-script and was just moved over here, but I agree it should've been consistent (or at least renamed).
Fixed (decided to log to collapsers:html group).
lib/utils/httpclient.js
Outdated
url = he.decode(url);
const client = got.extend({
headers: {
'user-agent': `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10; rv:33.0) Gecko/20100101 Firefox/33.0 Collapsify/${VERSION} node/${
nitpick: Firefox 33 is old now. Might be worth updating.
Resolved comments (except for data URI) and bumped package version.
We can just read from package.json directly.
3abf72d to f03dbb2
@@ -11,7 +11,8 @@
"contributors": [
"Christopher Joel <chris@scriptolo.gy> (http://scriptolo.gy)",
"Terin Stock <terinjokes@gmail.com> (http://terinstock.com)",
"Andrew Galloni <andrew@cloudflare.com>"
"Andrew Galloni <andrew@cloudflare.com>",
"Ingvar Stepanyan <me@rreverser.com> (https://rreverser.com)"
I'm so sorry.
Primary changes:
- [...] `xo` lints after upgrade (`const` and `let` instead of `var`, arrow functions where possible, destructuring etc.).
- [...] (`xo`).
- [...] `husky`.
- [...] `Bluebird` with native `async` functions.
- [...] `external` collapser functions to avoid repeating URLs and log message code.
- [...] `Content-Type` provided by the server instead of trying to guess it with `mmmagic` - the latter previously led to various detection bugs even when the server served the image correctly, and generally guessing mime types is more dangerous (as seen in the history of `X-Content-Type-Options: nosniff`).
- [...] `process.stderr` instead of `process.stdout` so that they don't get mixed with the content.
- [...] `parse5-html-rewriting-stream` for HTML parsing. PostHTML uses `htmlparser2`, which has worse compliance with the spec and, in particular, doesn't decode/encode text and attributes. This currently means 1) collapsify would try to use raw attributes for validation against the forbidden regex, but then decode them before sending the request (this might lead to fetching forbidden resources), and 2) when constructing a data URI, potentially invalid attribute characters would not get escaped (this might lead to XSS on collapsed pages). This also allows processing HTML in a streaming fashion, modifying only the parts we're actually interested in.
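To make the escaping concern in the last item concrete, here is a minimal sketch. `escapeAttribute` and `renderImg` are hypothetical helpers for illustration only. A spec-compliant serializer is expected to handle this itself, and this sketch covers just `&` and `"` inside double-quoted attributes.

```javascript
'use strict';

// Hypothetical helper: minimal escaping for a double-quoted HTML attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: embeds a (possibly attacker-influenced) URI in markup.
function renderImg(uri) {
  return `<img src="${escapeAttribute(uri)}">`;
}
```

Without the escaping step, a value such as `x" onerror="alert(1)` would close the `src` attribute early and inject a new one. With it, the quotes are emitted as `&quot;` and stay inside the attribute value.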