Description
Use case: when comparing technologies within the same category, it can be useful to know how they all compare to some kind of category-level aggregation over all pages within the category.
The blue line represents an aggregation of all pages within the CMS category, so a user can see how it compares to specific technologies within that category. It could also be possible to compare entire categories.
The technical implementation could look something like this:
- update the technologies table schema to include a field indicating whether the row pertains to a technology or a category aggregation
  - all dimensions supported: rank, client, geo
- backfill all historical data
- provide a param in the API endpoints to distinguish between the two, only returning data for the selected aggregation type (default: technology)
- add categories to the UI, similar to the special "ALL" technology
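As a rough illustration of the API parameter described above, here is a minimal Python sketch. The `type` field name, its two values, the `aggregation` parameter, and the `select_rows` helper are all assumptions made for this example, not the existing API.

```python
# Hypothetical rows; the "type" field and its values are assumptions.
ROWS = [
    {"date": "2024-08-01", "type": "technology", "geo": "ALL", "rank": "ALL",
     "category": "CMS", "app": "WordPress", "client": "desktop"},
    {"date": "2024-08-01", "type": "category", "geo": "ALL", "rank": "ALL",
     "category": "CMS", "app": "All CMSs", "client": "desktop"},
]

VALID_AGGREGATIONS = ("technology", "category")


def select_rows(rows, aggregation="technology", **filters):
    """Return only rows of the requested aggregation type.

    Rows without a "type" field are treated as technology rows, so data
    written before the schema change keeps its current meaning.
    """
    if aggregation not in VALID_AGGREGATIONS:
        raise ValueError(f"unknown aggregation type: {aggregation!r}")
    return [
        row for row in rows
        if row.get("type", "technology") == aggregation
        and all(row.get(key) == value for key, value in filters.items())
    ]
```

Calling `select_rows(ROWS)` keeps today's behavior and returns only technology rows, while `select_rows(ROWS, aggregation="category", category="CMS")` returns the category-level row.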
In terms of the schema changes, we currently have the following fields:
- date (2024-08-01)
- geo (ALL)
- rank (ALL)
- category (CMS)
- app (WordPress)
- client (desktop)
- [stats] where each field is aggregated over the set of pages that use WordPress for the given dimensions
The updated schema would look something like this for the CMS-level aggregation:
- date (2024-08-01)
- type (category)
- geo (ALL)
- rank (ALL)
- category (CMS)
- app (All CMSs)
- client (desktop)
- [stats] where each field is aggregated over the set of pages that use one or more CMSs for the given dimensions
Calculating category-level data based on technology-level aggregations won't work because percentiles cannot accurately be aggregated together. At best we'd be able to do a weighted average of the medians, but this would also not solve the issue of deduplicating origins that appear multiple times in a category because they use multiple technologies. For example, jQuery UI is always used with jQuery within the JS libraries category, but those websites would be counted twice. So the implementation would need to process the raw origin-level data.
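The difference can be seen in a small Python sketch. The origins, technologies, and metric values below are made up for illustration, and the helper names are hypothetical. It compares a weighted average of per-technology medians with a median computed over deduplicated origins.

```python
from statistics import median

# Hypothetical origin-level rows: (origin, technology, metric value).
# Origins a and b use both jQuery and jQuery UI, so they appear twice.
ROWS = [
    ("a.example", "jQuery", 100),
    ("a.example", "jQuery UI", 100),
    ("b.example", "jQuery", 200),
    ("b.example", "jQuery UI", 200),
    ("c.example", "React", 900),
    ("d.example", "React", 1000),
    ("e.example", "React", 1100),
]


def weighted_average_of_medians(rows):
    """Approximate the category from technology-level aggregates only."""
    by_tech = {}
    for _origin, tech, value in rows:
        by_tech.setdefault(tech, []).append(value)
    total = sum(len(values) for values in by_tech.values())
    return sum(median(values) * len(values) for values in by_tech.values()) / total


def category_median(rows):
    """Compute the median from raw data, counting each origin once."""
    by_origin = {}
    for origin, _tech, value in rows:
        by_origin[origin] = value
    return median(by_origin.values())


print(len(ROWS), len({origin for origin, _, _ in ROWS}))  # 7 rows, 5 distinct origins
print(weighted_average_of_medians(ROWS))
print(category_median(ROWS))
```

With this data the technology-level approach counts 7 pages where there are only 5 distinct origins, and its weighted average (about 514) is far from the deduplicated median of 900.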
