Tech report: category-level aggregations #928

Use case: when comparing technologies within the same category, it is useful to see how each one compares to a category-level aggregation over all pages in that category.

Mockup:
<img width="1093" alt="image" src="https://github.com/user-attachments/assets/08f75ca2-d6d9-4958-97e5-eca31d80a640">

The blue line represents an aggregation of all pages within the CMS category, so a user can see how it compares to specific technologies within that category. It could also be possible to compare entire categories.

The technical implementation could look something like this:

- update the `technologies` table schema to include a field indicating whether the row pertains to a technology or a category aggregation
  - all dimensions supported: rank, client, geo
  - backfill all historical data
- provide a param in the API endpoints to distinguish between the two, only returning data for the selected aggregation type (default: technology)
- add categories to the UI, similar to the special "ALL" technology
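The API parameter in the second step could behave roughly like the sketch below. The function name `select_rows`, the parameter name `aggregation_type`, and the in-memory row dicts are all assumptions made for illustration; they are not the actual API or storage layer. Treating rows without a `type` value as technology rows is likewise an assumption.

```python
# Hypothetical sketch: filter rows by aggregation type, defaulting to "technology".
VALID_TYPES = ("technology", "category")


def select_rows(rows, aggregation_type="technology"):
    """Return only the rows matching the requested aggregation type."""
    if aggregation_type not in VALID_TYPES:
        raise ValueError(f"unknown aggregation type: {aggregation_type!r}")
    # Assumption: rows without a "type" value are technology-level rows.
    return [r for r in rows if r.get("type", "technology") == aggregation_type]


rows = [
    {"type": "technology", "category": "CMS", "app": "WordPress"},
    {"type": "category", "category": "CMS", "app": "All CMSs"},
    {"category": "CMS", "app": "Drupal"},  # no "type" field
]

technology_rows = select_rows(rows)            # default: technology
category_rows = select_rows(rows, "category")  # category aggregations only
```

With the default, existing callers keep receiving technology-level rows only, so adding the field would not change their results.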

In terms of the schema changes, we currently have the following fields:

- date (2024-08-01)
- geo (ALL)
- rank (ALL)
- category (CMS)
- app (WordPress)
- client (desktop)
- [stats] where each field is aggregated over the set of pages that use WordPress for the given dimensions

The updated schema would look something like this for the CMS-level aggregation:

- date (2024-08-01)
- **type** (category)
- geo (ALL)
- rank (ALL)
- category (CMS)
- app (All CMSs)
- client (desktop)
- [stats] where each field is aggregated over the set of pages that use one or more CMSs for the given dimensions
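As a rough illustration of the two row shapes, here is a minimal sketch using a dataclass. The class name `TechReportRow` is hypothetical, and the stats are left empty because their field names are not listed here.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class TechReportRow:
    """Hypothetical in-memory shape of one row; not the real table definition."""
    date: str
    geo: str
    rank: str
    category: str
    app: str
    client: str
    # Proposed new field: distinguishes technology rows from category rows.
    type: Literal["technology", "category"] = "technology"
    # Stats omitted: their names are not specified in this issue.
    stats: dict = field(default_factory=dict)


# Current shape: aggregated over pages that use WordPress.
wordpress_row = TechReportRow(
    date="2024-08-01", geo="ALL", rank="ALL",
    category="CMS", app="WordPress", client="desktop",
)

# Proposed shape: aggregated over pages that use one or more CMSs.
cms_row = TechReportRow(
    date="2024-08-01", geo="ALL", rank="ALL",
    category="CMS", app="All CMSs", client="desktop", type="category",
)
```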

Calculating category-level data from the technology-level aggregations won't work because percentiles cannot be accurately combined. At best we'd be able to take a weighted average of the medians, but that still would not deduplicate origins that appear multiple times in a category because they use multiple technologies. For example, jQuery UI is always used with jQuery within the JS libraries category, so those websites would be counted twice. The implementation would therefore need to process the raw origin-level data.
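A small sketch of the problem, using made-up numbers: the origins, technologies, and metric values below are invented purely for illustration and are not real data. It compares a weighted average of per-technology medians with a median computed over deduplicated origins.

```python
from statistics import median

# Made-up origin-level rows: (origin, technology, metric value).
# The metric belongs to the origin, so it repeats for each technology it uses.
rows = [
    ("a.example", "jQuery", 100),
    ("a.example", "jQuery UI", 100),
    ("b.example", "jQuery", 200),
    ("b.example", "jQuery UI", 200),
    ("c.example", "React", 900),
    ("d.example", "React", 1000),
    ("e.example", "React", 1100),
]


def weighted_average_of_medians(rows):
    """Combine per-technology medians, weighted by each technology's row count."""
    groups = {}
    for _origin, tech, value in rows:
        groups.setdefault(tech, []).append(value)
    total = sum(len(values) for values in groups.values())
    return sum(median(values) * len(values) for values in groups.values()) / total


def category_median(rows):
    """Median over distinct origins, so each origin is counted once."""
    by_origin = {origin: value for origin, _tech, value in rows}
    return median(by_origin.values())


naive_count = len(rows)                       # 7: jQuery UI origins counted twice
distinct_count = len({r[0] for r in rows})    # 5 distinct origins
approximate = weighted_average_of_medians(rows)  # 3600 / 7, about 514.3
exact = category_median(rows)                    # 900
```

In this toy data the two origins that use both jQuery and jQuery UI are counted twice, which pulls the weighted average well below the median of the five distinct origins.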
