W3C mailman ingress fails due to HTML change #600

@laurenmarietta

Description

I was trying to run collect-mail on the provided list of W3C mailing lists and was confused as to why almost all of the URLs weren't returning files.

It seems that, sometime between 2020 and now, W3C subtly changed the HTML of its mailing list archives. In this Internet Archive page, the messages used to be listed in a div with class=messages-list, but today's version of the same URL lists them in a main element with class=messages-list. Because the W3CMailList class explicitly looks for email records under div.messages-list, the scans are coming up empty.

Not sure if it's as simple a fix as replacing div with main or if there are more complicated things to consider here!
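
To illustrate the idea, here is a minimal sketch of a lookup that matches on the class alone, so it finds the records whether the container is a div or a main. It uses only the standard library and is not the actual W3CMailList code. The `MessagesListFinder` and `message_links` names and the HTML snippets are hypothetical, the real archive markup may differ, and the depth tracking assumes every non-void tag is explicitly closed.

```python
from html.parser import HTMLParser


class MessagesListFinder(HTMLParser):
    """Collect hrefs inside any element whose class includes 'messages-list'."""

    # Void elements never get a closing tag, so they must not change the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside the container (0 means outside)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            if tag == "a" and attrs.get("href"):
                self.hrefs.append(attrs["href"])
        elif "messages-list" in (attrs.get("class") or "").split():
            # Match on the class only, regardless of the tag name.
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1


def message_links(html):
    """Return the link targets found inside the messages-list container."""
    finder = MessagesListFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


# Hypothetical snippets standing in for the 2020 and current archive pages.
old_page = '<div class="messages-list"><ul><li><a href="0001.html">A</a></li></ul></div>'
new_page = '<main class="messages-list"><ul><li><a href="0001.html">A</a></li></ul></main>'
print(message_links(old_page), message_links(new_page))
```

If the scraper is built on Beautiful Soup (an assumption on my part), the same idea would be a class-only CSS selector such as `soup.select_one(".messages-list")` in place of a lookup tied to div.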
