
No, Q is not the most popular physical CSS unit of length #513

Closed
rviscomi opened this issue Nov 18, 2019 · 6 comments · Fixed by #515

rviscomi commented Nov 18, 2019

Regarding https://almanac.httparchive.org/en/2019/css#units, @zcorpan asked on Twitter:

> What is using the Q unit? Is it Japanese sites, or some common third-party CSS?

I started looking into this, and I'm seeing many cases where Q was detected in stylesheets containing base64-encoded data. I think these are false positives triggered by a weak regex I wrote for 02_07.sql:

```javascript
// https://developer.mozilla.org/en-US/docs/Web/CSS/length
// Map each length unit to a regex that matches a digit immediately followed
// by the unit and a word boundary, e.g. "250px" or "1.6Q".
const units = new Map([
  'cap', 'ch', 'em', 'ex', 'ic', 'lh', 'rem',
  'rlh', 'vh', 'vw', 'vi', 'vb', 'vmin', 'vmax',
  'px', 'cm', 'mm', 'Q', 'in', 'pc', 'pt',
].map(u => [u, new RegExp(`\\d${u}\\b`)]));
```
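To see why base64 data trips this pattern, here is a small runnable sketch that applies the same "digit, then unit, then word boundary" regex to a normal value and to a made-up data URI fragment. The `detectUnits` helper and the sample strings are illustrative assumptions, not part of the original query.

```javascript
// Same pattern as the query: a digit, the unit, then a word boundary.
// Reproduced here so the snippet runs on its own.
const units = new Map([
  'cap', 'ch', 'em', 'ex', 'ic', 'lh', 'rem',
  'rlh', 'vh', 'vw', 'vi', 'vb', 'vmin', 'vmax',
  'px', 'cm', 'mm', 'Q', 'in', 'pc', 'pt',
].map(u => [u, new RegExp(`\\d${u}\\b`)]));

// Hypothetical helper: every unit whose regex matches anywhere in the value.
function detectUnits(value) {
  return [...units].filter(([, re]) => re.test(value)).map(([u]) => u);
}

console.log(detectUnits('1.6Q !important'));                      // [ 'Q' ]
// Made-up base64 fragment: "9Q" followed by "+" also satisfies \dQ\b.
console.log(detectUnits('url(data:image/png;base64,AAAA9Q+zz)')); // [ 'Q' ]
```

Both calls report Q, even though the second string contains no CSS length at all.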

I reran the query with one small change: ignore any value longer than 20 characters. Most values should be short, like `250px` or `1.6Q !important`; values longer than that are probably mostly base64 junk. It's just a heuristic, but it seems to have made a big difference:

[Image: unit usage results after rerunning the query]

Q is now much further down the list, where we'd expect it, and the major units are mostly unchanged.
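The 20-character cutoff can be sketched as a filter placed in front of the unit regexes. This is a minimal illustration assuming a plain array of declaration values as input; `countUnits`, `MAX_VALUE_LENGTH`, and the shortened unit list are hypothetical names chosen for brevity, not the actual query.

```javascript
// Hypothetical sketch of the heuristic: skip values longer than 20 characters,
// assuming those are mostly base64 (e.g. data URIs) rather than lengths.
const MAX_VALUE_LENGTH = 20;

// Shortened unit list for brevity; same digit + unit + word-boundary pattern.
const unitRegexes = new Map(
  ['px', 'em', 'rem', 'Q'].map(u => [u, new RegExp(`\\d${u}\\b`)])
);

// Counts, per unit, how many (short) values mention it at least once.
function countUnits(values) {
  const counts = new Map();
  for (const value of values) {
    if (value.length > MAX_VALUE_LENGTH) continue; // likely base64 noise
    for (const [unit, re] of unitRegexes) {
      if (re.test(value)) counts.set(unit, (counts.get(unit) ?? 0) + 1);
    }
  }
  return counts;
}

const counts = countUnits([
  '250px',
  '1.6Q !important',
  'url(data:image/png;base64,AAAA9Q+zz)', // longer than 20 chars: skipped
]);
console.log(counts); // Map(2) { 'px' => 1, 'Q' => 1 }
```

The trade-off is that a legitimate long value (say, a multi-part `box-shadow`) is dropped too, which is why this is only a heuristic.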

I'll coordinate with @argyleink and @una to rewrite this section with the revised data.

rviscomi added this to the Après Ski milestone Nov 18, 2019
rviscomi added this to TODO in Web Almanac via automation Nov 18, 2019
una commented Nov 18, 2019

Oh wow! Yes, good catch there, @rviscomi.


zcorpan commented Nov 18, 2019

Nice catch!

Does the CSS parser you use only give you a string for the whole value? A more robust approach could be a CSS parser that gives you a list of component values, so you can iterate over each component value and, if it's a `<dimension-token>`, ask for its unit. That would probably also be slower to run, though.
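As a rough illustration of the component-value idea, the sketch below scans a value roughly the way a CSS tokenizer does and reports the unit of each dimension. It is a hand-written simplification (no escapes, no comments) and not the API of any particular parser; `dimensionUnits` is a hypothetical name. Because an unquoted `url(...)` is consumed as a single token, base64 inside a data URI is never inspected.

```javascript
// Hypothetical sketch (not a real parser's API): collect the unit of every
// dimension in a value. Simplified: no escapes or comments; strings and
// unquoted url(...) are skipped as single tokens.
const isDigit = c => c >= '0' && c <= '9';
const isNameStart = c => /^[A-Za-z_\u0080-\uFFFF]$/.test(c);
const isName = c => isNameStart(c) || isDigit(c) || c === '-';
const URL_START = /url\(\s*[^'"\s)]/iy; // sticky: must match exactly at lastIndex

function dimensionUnits(value) {
  const units = [];
  let i = 0;
  const at = k => value[i + k] ?? '';
  const startsNumber = () =>
    isDigit(at(0)) || (at(0) === '.' && isDigit(at(1))) ||
    ((at(0) === '+' || at(0) === '-') &&
      (isDigit(at(1)) || (at(1) === '.' && isDigit(at(2)))));

  while (i < value.length) {
    const c = at(0);
    URL_START.lastIndex = i;
    if (c === '"' || c === "'") {
      // String token: jump past the matching quote.
      const end = value.indexOf(c, i + 1);
      i = end === -1 ? value.length : end + 1;
    } else if (URL_START.test(value)) {
      // Unquoted url(...) is one url token: jump past the ')'.
      const end = value.indexOf(')', i);
      i = end === -1 ? value.length : end + 1;
    } else if (startsNumber()) {
      // Numeric token: optional sign, integer part, fraction, exponent.
      if (at(0) === '+' || at(0) === '-') i++;
      while (isDigit(at(0))) i++;
      if (at(0) === '.' && isDigit(at(1))) { i++; while (isDigit(at(0))) i++; }
      if ((at(0) === 'e' || at(0) === 'E') &&
          (isDigit(at(1)) || ((at(1) === '+' || at(1) === '-') && isDigit(at(2))))) {
        i += 2;
        while (isDigit(at(0))) i++;
      }
      // A name right after the number makes it a dimension; the name is the unit.
      if (isNameStart(at(0)) || (at(0) === '-' && (isNameStart(at(1)) || at(1) === '-'))) {
        const start = i;
        while (isName(at(0))) i++;
        units.push(value.slice(start, i));
      }
    } else if (c === '#' || isName(c)) {
      // Ident or hash (e.g. translate3d, #fff0): consume the whole name so
      // digits inside it are not mistaken for numbers.
      i++;
      while (isName(at(0))) i++;
    } else {
      i++; // Any other single character (whitespace, comma, parens, ...).
    }
  }
  return units;
}

console.log(dimensionUnits('url(data:image/png;base64,AAAA9Q+zz) 1.6Q')); // [ 'Q' ]
console.log(dimensionUnits('translate3d(1px, 2em, 0)'));                  // [ 'px', 'em' ]
```

Consuming names like `translate3d` as a whole is exactly the context a single regex over the raw string lacks.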

It looks like the query only picks up the first length when a value contains several, right? That could miscount declarations like `margin: 0px 1cm`, though maybe that's so uncommon that it wouldn't change the outcome?
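If each unit is checked with a one-shot `RegExp.prototype.test()` (an assumption, since the counting step of the query isn't shown here), a value contributes at most one hit per unit. A global regex with `String.prototype.matchAll()` counts every occurrence instead; `countAllLengths` is a hypothetical helper for illustration.

```javascript
// Hypothetical helper: count every occurrence of each unit in one value,
// using a global regex with matchAll() rather than a one-shot test().
function countAllLengths(value, units = ['px', 'cm', 'em', 'Q']) {
  const counts = {};
  for (const u of units) {
    const n = [...value.matchAll(new RegExp(`\\d${u}\\b`, 'g'))].length;
    if (n > 0) counts[u] = n;
  }
  return counts;
}

console.log(/\dpx\b/.test('0px 1px 2cm'));  // true: one hit for the whole value
console.log(countAllLengths('0px 1px 2cm')); // { px: 2, cm: 1 }
```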

It would also be interesting to know about mistyped units and non-standard units (like `mozmm` or `__qem`).
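Once units come from dimension tokens, spotting typos and non-standard units reduces to a set lookup. This sketch assumes the units have already been extracted; `classifyUnits` is hypothetical, and because the set only holds the length units from the query's list, valid non-length units such as `deg` or `ms` would also land in `unknown` unless the list is extended. CSS units are ASCII case-insensitive, so each unit is lowercased before the lookup.

```javascript
// Length units from the query's list, lowercased (CSS units are ASCII
// case-insensitive, so "1PX" is px and "1q" is Q).
const LENGTH_UNITS = new Set([
  'cap', 'ch', 'em', 'ex', 'ic', 'lh', 'rem', 'rlh', 'vh', 'vw', 'vi', 'vb',
  'vmin', 'vmax', 'px', 'cm', 'mm', 'q', 'in', 'pc', 'pt',
]);

// Hypothetical helper: tally known length units separately from everything
// else (typos like "pxx", vendor units like "mozmm", hacks like "__qem").
function classifyUnits(units) {
  const known = new Map();
  const unknown = new Map();
  for (const unit of units) {
    const key = unit.toLowerCase();
    const bucket = LENGTH_UNITS.has(key) ? known : unknown;
    bucket.set(key, (bucket.get(key) ?? 0) + 1);
  }
  return { known, unknown };
}

const { known, unknown } = classifyUnits(['px', 'PX', 'Q', 'pxx', 'mozmm', '__qem']);
console.log(known);   // Map(2) { 'px' => 2, 'q' => 1 }
console.log(unknown); // Map(3) { 'pxx' => 1, 'mozmm' => 1, '__qem' => 1 }
```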


rviscomi commented Nov 18, 2019

> Does the CSS parser you use only give you a string for the whole value? A more robust approach could be a CSS parser that gives you a list of component values, so you can iterate over each component value and, if it's a `<dimension-token>`, ask for its unit. That would probably also be slower to run, though.

Yeah, it's only a string value. More granular tokenization would be super useful, but this parser doesn't support it.

> It would also be interesting to know about mistyped units and non-standard units (like `mozmm` or `__qem`).

Yes! A value tokenizer would be a big help for questions like this.

> It looks like the query only picks up the first length when a value contains several, right? That could miscount declarations like `margin: 0px 1cm`, though maybe that's so uncommon that it wouldn't change the outcome?

You're right. I don't think it would be significant.


zcorpan commented Nov 18, 2019

OK, thanks. https://github.com/tabatkins/parse-css could maybe be used as a wholesale drop-in replacement next year, or just for tokenizing values (with `parseAListOfComponentValues()`, or maybe just `tokenize()`). cc @tabatkins

Web Almanac automation moved this from TODO to Done Nov 18, 2019

rviscomi commented Nov 18, 2019


tabatkins commented Nov 18, 2019

> OK, thanks. https://github.com/tabatkins/parse-css could maybe be used as a wholesale drop-in replacement next year, or just for tokenizing values (with `parseAListOfComponentValues()`, or maybe just `tokenize()`). cc @tabatkins

Just a straight `tokenize()` should work great; `parseAListOfComponentValues()` will slurp up the contents of functions and blocks (in particular, the stuff between `{}` in a style block) and require you to tree-walk to find all the dimension tokens anyway. You just want the straight list of tokens.

(And from what I've heard, tokenize() is pretty dang fast, faster than regex-based methods that people had been using before switching to my library.)
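The difference between a flat token list and nested component values can be sketched with made-up node shapes (these are NOT parse-css's actual token classes, just an assumption for illustration): the flat stream needs only a filter, while the nested form hides dimension tokens inside functions and blocks and needs a recursive walk.

```javascript
// Hypothetical node shapes (not parse-css's classes):
//   { type: 'dimension', value, unit }
//   { type: 'function' | 'block', value: [...children] }
//   anything else is some other token.

// Flat token stream: one filter finds every unit.
const unitsFromTokens = tokens =>
  tokens.filter(t => t.type === 'dimension').map(t => t.unit);

// Nested component values: recurse into any node whose value is a child list.
function unitsFromComponentValues(nodes, out = []) {
  for (const node of nodes) {
    if (node.type === 'dimension') out.push(node.unit);
    else if (Array.isArray(node.value)) unitsFromComponentValues(node.value, out);
  }
  return out;
}

// "calc(1px + 2em) 3Q" in both forms (whitespace tokens omitted).
const flat = [
  { type: 'function-start', name: 'calc' },
  { type: 'dimension', value: 1, unit: 'px' },
  { type: 'delim', value: '+' },
  { type: 'dimension', value: 2, unit: 'em' },
  { type: 'close-paren' },
  { type: 'dimension', value: 3, unit: 'Q' },
];
const tree = [
  { type: 'function', name: 'calc', value: [
    { type: 'dimension', value: 1, unit: 'px' },
    { type: 'delim', value: '+' },
    { type: 'dimension', value: 2, unit: 'em' },
  ] },
  { type: 'dimension', value: 3, unit: 'Q' },
];

console.log(unitsFromTokens(flat));          // [ 'px', 'em', 'Q' ]
console.log(unitsFromComponentValues(tree)); // [ 'px', 'em', 'Q' ]
```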

4 participants