Group CSS features #1519

nzakas · 2025-04-08T18:54:30Z

One of the valuable parts of the mdn-data package is how it separates CSS features into different categories:

at-rules
functions
properties
selectors
syntaxes
types
units

In the current webref package, it's just a collection of objects that we then need to dig into to figure out what types are contained within. It would be helpful if the categories could be exposed at the top level of the package and list every entry for that category regardless of spec.
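To make the request concrete, here is a minimal sketch of the grouping being asked for. The `extracts` shape, the category keys, and the `group_by_category` helper are all assumptions made for illustration; they do not reflect the actual layout of the webref package.

```python
from collections import defaultdict

# Hypothetical per-spec extracts. The real webref data is richer and its
# field names may differ; this shape is an assumption for illustration.
extracts = {
    "css-values": {
        "functions": [{"name": "abs()"}],
        "types": [{"name": "<length>"}],
    },
    "css-color": {
        "properties": [{"name": "color"}],
        "functions": [{"name": "rgb()"}],
    },
}


def group_by_category(extracts):
    """Return {category: sorted list of (name, spec)} across all specs."""
    grouped = defaultdict(list)
    for spec, categories in extracts.items():
        for category, entries in categories.items():
            for entry in entries:
                grouped[category].append((entry["name"], spec))
    return {category: sorted(items) for category, items in grouped.items()}


grouped = group_by_category(extracts)
```

With a view like `grouped`, a consumer could read `grouped["functions"]` directly instead of walking every spec extract.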


tidoust · 2025-04-09T10:23:51Z

I note the current @webref/css package already separates at the root level between:

at-rules
properties
selectors
and "values", which is a mixed bag of things.

The mixed bag of things exists because CSS specs do not really distinguish between other types when they define concepts. There is a notion of function but the specs do not necessarily use that consistently. That ambiguity seems to appear in mdn-data too. For example, the abs() function appears both as a "function" and as a "syntax" in mdn-data.

CSS specs also use a "type" definition type, which could perhaps be used to populate a related category. There seem to be many more type definitions in specs than what mdn-data currently lists as types. For example, line-color-list, linear-color-stop, ident-token are all type definitions from a spec perspective. If they are left out of the list on purpose, is there a way to distinguish between types?

CSS specs define units as value definitions that are for something. It may be relatively easy to assemble the list of units automatically with a short list of underlying types. For example, looking at all values defined for <angle>, <length> and a few others.
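A minimal sketch of that idea, assuming each value definition records which type it is defined for. The `for` field name, the sample data, and the `collect_units` helper are hypothetical, and the set of underlying types would need to be extended with the "few others".

```python
# Assumed short list of underlying types whose values are units.
# Only the two types named above are included here.
UNIT_TYPES = {"<angle>", "<length>"}

# Hypothetical value definitions: "for" names the type or property that
# the value is defined for. Field names are assumptions.
value_definitions = [
    {"name": "deg", "for": "<angle>"},
    {"name": "px", "for": "<length>"},
    {"name": "em", "for": "<length>"},
    {"name": "auto", "for": "width"},  # not a unit: defined for a property
]


def collect_units(definitions, unit_types=UNIT_TYPES):
    """Return the sorted names of values defined for one of the unit types."""
    return sorted({d["name"] for d in definitions if d["for"] in unit_types})


units = collect_units(value_definitions)
```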

Essentially, the question is: can CSS features be categorized automatically? If not, what amount of manual data would need to be maintained?

nzakas · 2025-04-09T17:49:08Z

Thanks for the response. A follow-up question: assuming everyone wants webref packages to be as useful as possible, is there a reason the specs themselves can't be updated to encode this information where appropriate?

tidoust · 2025-04-10T10:01:02Z

No reason in theory. On top of trying to reduce the amount of work needed to maintain Webref, we also restrict the amount of data that needs to be manually injected in Webref to a bare minimum, as a way to push fixes and improvements back to the underlying specs.

In practice there are ~120 CSS specs at various levels of maturity and activity, with dozens of editors and >3800 open issues. We already maintain a few patches in Webref for things that need fixing in CSS specs to get consistent data (these patches link back to issues raised against the specs). If most CSS specs need to be updated to provide additional semantics, that's likely going to require elbow grease both to convince CSS WG participants that the effort is worth prioritizing and to help with the actual updates. That's also why I'm trying to assess whether missing categories can already be determined automatically from available information.

nzakas · 2025-04-10T15:37:16Z

Ah gotcha, thanks for explaining. 👍

tidoust · 2025-04-28T15:27:17Z

I explored the differences between MDN data and Webref a bit. See the underlying code in tidoust/mdn-webref, along with the results:

The webref.json file, which could represent what we may want to end up with in Webref to ease consumption of data.
The report, which highlights differences between the two projects.

As far as I can tell, missing data in Webref is mostly stuff that is non standard or that has been obsoleted, but that is still present in MDN data (and sometimes documented on MDN). I do not know to what extent that data is a must have in Webref. There's more data missing in MDN data, perhaps because the underlying features are more recent and not yet documented.

There may be a few cases where data needs to be slightly improved in specs so that it can start appearing in Webref. One example is <general-enclosed> which is currently defined in a <pre> tag without any class, skipped by the crawler as too generic. That seems easily fixable.

I still do not understand what syntaxes are meant to encompass. I managed to cover most of them by assembling functions and types, but that also creates hundreds of syntaxes that are not accounted for in MDN data. Are syntaxes used in practice? How?

(On top of the features themselves, I note that the grouping information in MDN data does not exist in Webref. That grouping seems more specific to MDN though. Same thing for links to MDN pages).

nzakas · 2025-04-28T15:51:04Z

Syntaxes are used in CSSTree to enable validation:
https://github.com/csstree/csstree/blob/9558ba790daeda2b24935838bf89990699ece66e/lib/data.js#L7

Basically, the parser creates an AST and the lexer validates the AST against these syntax definitions.

tidoust · 2025-05-01T10:20:27Z

Thanks @nzakas. I had not realized that entries in the "types" category in MDN data do not have a syntax key and that the "syntaxes" category collects that information. I'm not sure why functions are listed under the "syntaxes" category too, as that seems to duplicate the information already present in the functions.json file. All in all, I think the "syntaxes" category can be assembled by merging the "functions" and "types" categories, provided entries there do have a syntax key of course.
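A minimal sketch of that merge, assuming "functions" and "types" are arrays of entries with a `name` and an optional `syntax` key. The sample entries and the `build_syntaxes` helper are illustrative assumptions, not the actual Webref or MDN data structures.

```python
# Illustrative entries only. Field names are assumptions.
functions = [
    {"name": "abs()", "syntax": "abs( <calc-sum> )"},
]
types = [
    {"name": "<alpha-value>", "syntax": "<number> | <percentage>"},
    {"name": "<length>"},  # no syntax key: left out of "syntaxes"
]


def build_syntaxes(functions, types):
    """Merge functions and types into a name -> syntax mapping.

    Entries without a "syntax" key are skipped.
    """
    syntaxes = {}
    for entry in functions + types:
        if "syntax" in entry:
            syntaxes[entry["name"]] = entry["syntax"]
    return syntaxes


syntaxes = build_syntaxes(functions, types)
```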

That initial exploration suggests that the categorization itself can be done automatically, with straightforward reasons that explain why some data is missing in Webref. That's a good first result!

I'll now look into actual syntax values to understand where and why Webref differs from MDN data. I somewhat expect to find more substantive differences: if I understand things correctly, MDN data syntaxes are manually curated to match reality in the main browsers, while Webref data is meant to be a view of what the latest spec drafts currently define, regardless of what browsers support. When specs lag behind implementations, they need fixing, and knowing about the problem creates a good feedback loop. When specs are more recent than implementations, it may be challenging to select the right syntax automatically. Anyway, let's find out ;)

nzakas · 2025-05-01T15:08:43Z

Thanks for the update and all of your work on this. 🙏

This adds a cssmerge post-processor at the crawl level that consolidates definitions from CSS extracts into a single file. The CSS entries in the resulting file are de-duplicated. The module starts with a number of comments that detail the approach.

The intent is to create a view similar to that offered in MDN data to address w3c/webref#1519. Comments at the top of the module also detail the main differences between the two views. The main one is structural: we tend to use arrays in Webref so this code also produces arrays, while MDN data uses indexed objects. It's easy to build an indexed view from an array though.

Consolidation focuses on syntaxes and the code does not attempt to compute the `units` list that exists in MDN data. It is easy to build that list by looking at values of a handful of CSS properties. We could add it later on to the consolidated file if that proves needed.

A couple of remaining open questions:

1. This creates a `css.json` file, similar to the `events.json` file that we create when we post-process events. That's nice but it would be good to ship that file in `@webref/css` in the end. That package already has a `CSS.json` file at the root. How/Where to package the `css.json` file?
2. To de-duplicate, the most recent definition is used. That's the only approach I can think of that can be automated. Consumers of MDN data may need a more nuanced approach where the syntax better matches what core browsers currently support. We can perhaps refine the process later on if needed with some sort of patching mechanism (but my hunch is that this will remain a manual process).
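The "most recent definition wins" rule and the indexed view described above can be sketched as follows. This is not the cssmerge implementation: the `recency` field is a hypothetical stand-in for however recency is actually determined, and the entries, syntax strings, and helper names are placeholders.

```python
# Hypothetical consolidated entries. "recency" is an assumed sortable
# rank (higher means more recent), and the syntax strings are placeholders.
entries = [
    {"name": "color", "syntax": "older syntax", "recency": 3},
    {"name": "color", "syntax": "newer syntax", "recency": 4},
    {"name": "opacity", "syntax": "only syntax", "recency": 4},
]


def dedupe_latest(entries):
    """Keep one entry per name, preferring the highest "recency"."""
    latest = {}
    for entry in entries:
        current = latest.get(entry["name"])
        if current is None or entry["recency"] > current["recency"]:
            latest[entry["name"]] = entry
    return sorted(latest.values(), key=lambda e: e["name"])


def index_by_name(entries):
    """Build an MDN-data-like indexed view from an array of entries."""
    return {entry["name"]: entry for entry in entries}


deduped = dedupe_latest(entries)
indexed = index_by_name(deduped)
```

Keeping the array as the primary form and deriving the index on demand matches the structural difference noted above: consumers who prefer indexed objects can build them in one pass.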

tidoust mentioned this issue May 30, 2025

Add cssmerge post-processor to consolidate CSS extracts w3c/reffy#1849

Draft

3 tasks
