Group CSS features #1519

Open
nzakas opened this issue Apr 8, 2025 · 8 comments
Comments

@nzakas

nzakas commented Apr 8, 2025

One of the valuable parts of the mdn-data package is how it separates CSS features into different categories:

  • At-rules
  • functions
  • properties
  • selectors
  • syntaxes
  • types
  • units

In the current webref package, it's just a collection of objects that we then need to dig into to figure out what types are contained within. It would be helpful if the categories could be exposed at the top level of the package and list every entry for that category regardless of spec.
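A minimal sketch of the kind of top-level view being asked for, assuming a hypothetical per-spec extract shape (`SpecExtract`) and a hypothetical helper (`listCategory`); the actual data shipped in @webref/css may be structured differently:

```typescript
// Hypothetical shape of per-spec extracts; the real @webref/css data may differ.
type Feature = { name: string };
type SpecExtract = {
  atrules: Feature[];
  properties: Feature[];
  selectors: Feature[];
};
type Category = keyof SpecExtract;
type TaggedFeature = { name: string; spec: string };

// Collect every entry of one category across all specs,
// tagging each entry with the spec it came from.
function listCategory(
  extracts: Record<string, SpecExtract>,
  category: Category
): TaggedFeature[] {
  const result: TaggedFeature[] = [];
  for (const spec of Object.keys(extracts)) {
    const extract = extracts[spec];
    if (extract === undefined) continue;
    for (const feature of extract[category]) {
      result.push({ name: feature.name, spec: spec });
    }
  }
  return result;
}

// Made-up sample data, for illustration only.
const sampleExtracts: Record<string, SpecExtract> = {
  "css-color": { atrules: [], properties: [{ name: "color" }], selectors: [] },
  "css-fonts": {
    atrules: [{ name: "@font-face" }],
    properties: [{ name: "font-family" }],
    selectors: [],
  },
};

const allProperties = listCategory(sampleExtracts, "properties");
console.log(allProperties.map((p) => p.name + " (" + p.spec + ")"));
```

With a helper like this, a consumer would ask for "all properties" or "all at-rules" without having to know which spec defines them.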

@tidoust
Member

tidoust commented Apr 9, 2025

I note the current @webref/css package already separates at the root level between:

  • at-rules
  • properties
  • selectors
  • and "values", which is a mixed bag of things.

The mixed bag of things exists because CSS specs do not really distinguish between other types when they define concepts. There is a notion of function but the specs do not necessarily use that consistently. That ambiguity seems to appear in mdn-data too. For example, the abs() function appears both as a "function" and as a "syntax" in mdn-data.

CSS specs also have a dedicated definition type for types, which could perhaps be used to populate a related category. There seem to be many more type definitions in specs than what mdn-data currently lists as types. For example, line-color-list, linear-color-stop, ident-token are all type definitions from a spec perspective. If they are left out of the list on purpose, is there a way to distinguish between types?

CSS specs define units as value definitions that are for something. It may be relatively easy to assemble the list of units automatically with a short list of underlying types. For example, looking at all values defined for <angle>, <length> and a few others.
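A rough sketch of that idea, assuming a hypothetical flat `ValueDefinition` shape where each value lists the types it is defined for. The `UNIT_TYPES` list is also an assumption: only `<angle>` and `<length>` are named above, the other entries are guesses at the "few others".

```typescript
// Hypothetical shape: a value definition and the type(s) it is defined "for".
type ValueDefinition = { name: string; for: string[] };

// Assumed short list of underlying types whose values are units.
// Only <angle> and <length> come from the discussion; the rest are guesses.
const UNIT_TYPES = ["<angle>", "<length>", "<time>", "<frequency>", "<resolution>"];

// Return the sorted, de-duplicated names of all values defined for a unit type.
function collectUnits(values: ValueDefinition[], unitTypes: string[]): string[] {
  const units: string[] = [];
  for (const value of values) {
    const isUnit = value.for.some((type) => unitTypes.indexOf(type) !== -1);
    if (isUnit && units.indexOf(value.name) === -1) {
      units.push(value.name);
    }
  }
  return units.sort();
}

// Made-up sample data, for illustration only.
const sampleValues: ValueDefinition[] = [
  { name: "deg", for: ["<angle>"] },
  { name: "px", for: ["<length>"] },
  { name: "auto", for: ["width"] }, // defined for a property, so not a unit
  { name: "px", for: ["<length>"] }, // same unit seen in another spec
];

const units = collectUnits(sampleValues, UNIT_TYPES);
console.log(units);
```

The only manually maintained data in this approach would be the short list of underlying types.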

Essentially, the question is: can CSS features be categorized automatically? If not, what amount of manual data would need to be maintained?

@nzakas
Author

nzakas commented Apr 9, 2025

Thanks for the response. A follow-up question: assuming everyone wants webref packages to be as useful as possible, is there a reason the specs themselves can't be updated to encode this information where appropriate?

@tidoust
Member

tidoust commented Apr 10, 2025

No reason in theory and, on top of trying to reduce the amount of work needed to maintain Webref, we also restrict the amount of data that needs to be manually injected in Webref to a bare minimum as a way to push fixes and improvements back to the underlying specs.

In practice there are ~120 CSS specs at various levels of maturity and activity, with dozens of editors and >3800 open issues. We already maintain a few patches in Webref for things that need fixing in CSS specs to get consistent data (these patches link back to issues raised against the specs). If most CSS specs need to be updated to provide additional semantics, that's likely going to require elbow grease both to convince CSS WG participants that the effort is worth prioritizing and to help with the actual updates. That's also why I'm trying to assess whether missing categories can already be determined automatically from available information.

@nzakas
Author

nzakas commented Apr 10, 2025

Ah gotcha, thanks for explaining. 👍

@tidoust
Member

tidoust commented Apr 28, 2025

I explored the differences between MDN data and Webref a bit. See the underlying code in tidoust/mdn-webref, along with the results:

  1. The webref.json file, which could represent what we may want to end up with in Webref to ease consumption of data.
  2. The report, which highlights differences between the two projects.

As far as I can tell, missing data in Webref is mostly stuff that is non standard or that has been obsoleted, but that is still present in MDN data (and sometimes documented on MDN). I do not know to what extent that data is a must have in Webref. There's more data missing in MDN data, perhaps because the underlying features are more recent and not yet documented.

There may be a few cases where data needs to be slightly improved in specs so that it can start appearing in Webref. One example is <general-enclosed> which is currently defined in a <pre> tag without any class, skipped by the crawler as too generic. That seems easily fixable.

I still do not understand what syntaxes are meant to encompass. I managed to cover most of them by assembling functions and types, but that also creates hundreds of syntaxes that are not accounted for in MDN data. Are syntaxes used in practice? How?

(On top of the features themselves, I note that the grouping information in MDN data does not exist in Webref. That grouping seems more specific to MDN though. Same thing for links to MDN pages).

@nzakas
Author

nzakas commented Apr 28, 2025

Syntaxes are used in CSSTree to enable validation:
https://github.com/csstree/csstree/blob/9558ba790daeda2b24935838bf89990699ece66e/lib/data.js#L7

Basically, the parser creates an AST and the lexer validates the AST against these syntax definitions.

@tidoust
Member

tidoust commented May 1, 2025

Thanks @nzakas. I had not realized that entries in the "types" category in MDN data do not have a syntax key and that the "syntaxes" category collects that information. I'm not sure why functions are listed under the "syntaxes" category too, as that seems to duplicate the information already present in the functions.json file. All in all, I think the "syntaxes" category can be assembled by merging the "functions" and "types" categories, provided entries there do have a syntax key of course.
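As an illustration of that merge, here is a minimal sketch. The `Entry` shape and the `buildSyntaxes` helper are hypothetical, and the sample names and syntax strings are only illustrative; they are not copied from MDN data or Webref.

```typescript
// Hypothetical entry shape; `syntax` is optional because not every entry has one.
type Entry = { name: string; syntax?: string };

// Merge functions and types into a single name -> syntax view,
// skipping entries that do not carry a syntax.
function buildSyntaxes(functions: Entry[], types: Entry[]): Record<string, string> {
  const syntaxes: Record<string, string> = {};
  for (const entry of functions.concat(types)) {
    if (entry.syntax !== undefined) {
      syntaxes[entry.name] = entry.syntax;
    }
  }
  return syntaxes;
}

// Illustrative sample data only.
const sampleFunctions: Entry[] = [{ name: "abs()", syntax: "abs( <calc-sum> )" }];
const sampleTypes: Entry[] = [
  { name: "<angle-percentage>", syntax: "<angle> | <percentage>" },
  { name: "<ident-token>" }, // no syntax key, so it is skipped
];

const syntaxes = buildSyntaxes(sampleFunctions, sampleTypes);
console.log(Object.keys(syntaxes));
```

If a function and a type ever shared a name, the later entry would silently win in this sketch; a real implementation would probably want to flag that case.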

That initial exploration suggests that the categorization itself can be done automatically, with straightforward reasons that explain why some data is missing in Webref. That's a good first result!

I'll now look into actual syntax values to understand where and why Webref differs from MDN data. I somewhat expect to find more substantive differences: if I understand things correctly, MDN data syntaxes are manually curated to match reality in the main browsers, while Webref data is meant to be a view of what the latest spec drafts currently define, regardless of what browsers support. When specs lag behind implementations, they need fixing, and knowing about the problem creates a good feedback loop. When specs are more recent than implementations, it may be challenging to select the right syntax automatically. Anyway, let's find out ;)

@nzakas
Author

nzakas commented May 1, 2025

Thanks for the update and all of your work on this. 🙏

tidoust added a commit to w3c/reffy that referenced this issue May 30, 2025
This adds a cssmerge post-processor at the crawl level that consolidates
definitions from CSS extracts into a single file. The CSS entries in the
resulting file are de-duplicated. The module starts with a number of comments
that detail the approach.

The intent is to create a view similar to that offered in MDN data to address
w3c/webref#1519

Comments at the top of the module also detail the main differences between the
two views. Main one is structural: we tend to use arrays in Webref so this code
also produces arrays, while MDN data uses indexed objects. It's easy to build
an indexed view from an array though.

Consolidation focuses on syntaxes and the code does not attempt to compute the
`units` list that exists in MDN data. It is easy to build that list by looking
at values of a handful of CSS properties. We could add it later on to the
consolidated file if that proves needed.

A couple of remaining open questions:
1. This creates a `css.json` file, similar to the `events.json` file that we
create when we post-process events. That's nice but it would be good to ship
that file in `@webref/css` in the end. That package already has a `CSS.json`
file at the root. How/Where to package the `css.json` file?
2. To de-duplicate, the most recent definition is used. That's the only
approach I can think of that can be automated. Consumers of MDN data may need
a more nuanced approach where the syntax better matches what core browsers
currently support. We can perhaps refine the process later on if needed with
some sort of patching mechanism (but my hunch is that this will remain a manual
process).
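
The two points above (de-duplicating on the most recent definition, and building an indexed view from an array) can be sketched as follows. This is not the cssmerge code: the `Definition` shape is hypothetical, and a numeric `level` field stands in for whatever notion of "most recent" the post-processor actually uses.

```typescript
// Hypothetical shape: `level` stands in for how recent the defining spec is.
type Definition = { name: string; syntax: string; spec: string; level: number };

// Keep one definition per name: the one from the most recent spec level.
function dedupe(definitions: Definition[]): Definition[] {
  const latest: Record<string, Definition> = {};
  for (const def of definitions) {
    const current = latest[def.name];
    if (current === undefined || def.level > current.level) {
      latest[def.name] = def;
    }
  }
  const result: Definition[] = [];
  for (const name of Object.keys(latest)) {
    const def = latest[name];
    if (def !== undefined) result.push(def);
  }
  return result;
}

// Build an indexed (MDN data style) view from an array.
function indexByName(definitions: Definition[]): Record<string, Definition> {
  const index: Record<string, Definition> = {};
  for (const def of definitions) {
    index[def.name] = def;
  }
  return index;
}

// Made-up sample data; the syntax strings are placeholders.
const sampleDefinitions: Definition[] = [
  { name: "<color>", syntax: "older definition", spec: "css-color-4", level: 4 },
  { name: "<color>", syntax: "newer definition", spec: "css-color-5", level: 5 },
  { name: "<angle>", syntax: "only definition", spec: "css-values-4", level: 4 },
];

const merged = dedupe(sampleDefinitions);
const byName = indexByName(merged);
console.log(Object.keys(byName));
```

A patching mechanism like the one mentioned in the second open question could then be applied on top of `merged` before indexing.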