ECAU: Generic provider #305

ROpdebee · 2021-12-17T19:37:08Z

Could be useful for random "Purchase for mail order" URLs. Have it fetch the image found in the og:image meta property, if it exists.
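A minimal sketch of the og:image lookup described above, using only Python's standard library. The names `OgImageParser` and `find_og_image` are hypothetical and are not part of ECAU; an actual provider would also need to fetch the page and validate the URL it finds.

```python
from html.parser import HTMLParser
from typing import Optional


class OgImageParser(HTMLParser):
    """Remembers the first og:image URL seen in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only the first match is kept.
        if tag != "meta" or self.og_image is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image" and attr_map.get("content"):
            self.og_image = attr_map["content"]


def find_og_image(html: str) -> Optional[str]:
    """Return the og:image URL of the page, or None if it has none."""
    parser = OgImageParser()
    parser.feed(html)
    return parser.og_image
```

Returning `None` when the property is missing lets a caller treat the page as unsupported instead of guessing at an image.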


ROpdebee · 2022-01-24T13:51:45Z

On second thought, I'm not sure whether this is a good idea. It'd rely on heuristics, which can be finicky, and the image displayed on "Purchase for mail order" pages might not accurately represent a physical release. Also, the generic provider might appear to work for sites which really need a special-purpose provider because they have multiple images (e.g. DatPiff would work with the og:image property, but also offers a back cover which would be missed by a generic provider).

ROpdebee · 2023-04-23T16:10:17Z

Another possibility which seems to work OK on eBay:

1. Find the "semantic center" of the page (e.g., the <h1> element).
2. Find all <img> elements and rank them according to some tree distance metric to the semantic center.
3. Find the highest ranked image whose dimensions are above some arbitrary limit (e.g. 250x250 or 500x500). Probably need to maximise the images beforehand.
4. (Possibly: Expand the search while the distance doesn't increase too much, with a threshold based on the average distance between images and the semantic center, like 50% of the average, or some more statistically sound threshold taking standard deviation and quartiles into account.)
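The steps above can be sketched roughly as follows. This is only an illustration under several assumptions: `Node` is a hypothetical stand-in for a DOM element, the distance used is the plain number of edges between image and `<h1>` via their deepest common ancestor (just one possible metric), the 250-pixel limit is taken from the example above, and the optional expansion step is left out.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    width: int = 0
    height: int = 0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Optional[Node]) -> List[Node]:
    """The node itself first, the root last."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def tree_distance(a: Node, b: Node) -> int:
    """Edges on the path from a to b through their deepest common ancestor."""
    steps_from_a = {id(n): steps for steps, n in enumerate(path_to_root(a))}
    for steps_from_b, n in enumerate(path_to_root(b)):
        if id(n) in steps_from_a:
            return steps_from_a[id(n)] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def walk(node: Node) -> Iterator[Node]:
    yield node
    for child in node.children:
        yield from walk(child)


def pick_image(root: Node, min_side: int = 250) -> Optional[Node]:
    nodes = list(walk(root))
    # Step 1: the semantic center is assumed to be the first <h1>.
    center = next((n for n in nodes if n.tag == "h1"), None)
    if center is None:
        return None
    # Step 3's size limit is applied first; that picks the same image as
    # ranking everything and then skipping the ones that are too small.
    candidates = [n for n in nodes
                  if n.tag == "img" and min(n.width, n.height) >= min_side]
    # Step 2: the closest remaining image wins.
    return min(candidates, key=lambda img: tree_distance(img, center), default=None)


# A made-up page: a wide logo in the header, a thumbnail and a cover near the <h1>.
body = Node("body")
logo = body.add(Node("header")).add(Node("img", 1200, 300))
main = body.add(Node("main"))
main.add(Node("h1"))
gallery = main.add(Node("div"))
thumb = gallery.add(Node("img", 64, 64))
cover = gallery.add(Node("img", 600, 600))

best = pick_image(body)  # the cover: the thumbnail is too small, the logo is farther away
```

On this made-up tree the logo has the most pixels, but it sits farther from the `<h1>` than the cover does, so the distance ranking skips it.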

For the tree distance metric, there are a few options that I've tested and all of them seem like they'd work:

1. Maximise the depth of the deepest common ancestor.
2. Minimise (depth between image and common ancestor) + (depth between h1 and common ancestor) - (depth of the common ancestor).
3. Minimise (steps between image and common ancestor) + (steps between h1 and common ancestor) - (depth of the common ancestor), with steps calculated as the depth between element and ancestor plus the number of predecessor nodes at each level.
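As a rough illustration of the first two metrics, here is a sketch over a made-up page described by a child-to-parent map. The node names and the `PARENT` table are invented for the example, and the third metric is not covered because its step count can be read in more than one way.

```python
from typing import Dict, List, Optional

# Hypothetical page layout: each node name maps to its parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "body": None,
    "header": "body",
    "logo": "header",
    "main": "body",
    "h1": "main",
    "gallery": "main",
    "cover": "gallery",
}


def ancestors(node: str) -> List[str]:
    """The node itself first, the root last."""
    chain: List[str] = []
    current: Optional[str] = node
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def depth(node: str) -> int:
    return len(ancestors(node)) - 1


def deepest_common_ancestor(a: str, b: str) -> str:
    seen = set(ancestors(a))
    # Walking up from b, the first shared node is the deepest one.
    return next(n for n in ancestors(b) if n in seen)


def metric_ancestor_depth(img: str, h1: str) -> int:
    """Metric 1: depth of the deepest common ancestor (higher is better)."""
    return depth(deepest_common_ancestor(img, h1))


def metric_path_minus_depth(img: str, h1: str) -> int:
    """Metric 2: both depths below the ancestor, minus the ancestor's depth (lower is better)."""
    d = depth(deepest_common_ancestor(img, h1))
    return (depth(img) - d) + (depth(h1) - d) - d
```

On this layout both metrics prefer `cover` over `logo`: the cover shares the deeper ancestor `main` with the `<h1>`, while the logo only shares `body`.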

Another possibility might be to minimise the total number of nodes "between" the image and the h1 (including the size of any subtree that's between them).
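One possible reading of this "nodes between" idea is to number the nodes in document order and take the gap between the two positions, which automatically counts every node of any subtree lying between them. The sketch below assumes that reading; the `CHILDREN` table and the node names are made up for the example.

```python
from typing import Dict, List

# Hypothetical page layout: each node name maps to its children in document order.
CHILDREN: Dict[str, List[str]] = {
    "body": ["header", "main"],
    "header": ["logo", "nav"],
    "nav": ["link1", "link2", "link3"],
    "main": ["h1", "gallery"],
    "gallery": ["cover"],
}


def document_order(root: str) -> List[str]:
    """Pre-order traversal, i.e. the order in which the tags open in the markup."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(CHILDREN.get(node, [])))
    return order


def nodes_between(a: str, b: str, root: str = "body") -> int:
    """Number of nodes strictly between a and b in document order."""
    index = {name: i for i, name in enumerate(document_order(root))}
    return abs(index[a] - index[b]) - 1
```

Here the navigation subtree sits between the logo and the `<h1>`, so the logo ends up farther away than the cover even though both are only a few levels below `body`.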

At least for eBay, which I've been experimenting on, simply taking the largest image won't work, since that'll select the logo.

ROpdebee added mb_enhanced_cover_art_uploads new provider labels Dec 17, 2021

ROpdebee assigned ROpdebee and unassigned ROpdebee Jan 24, 2022

ROpdebee mentioned this issue Apr 23, 2023

support Bandcamp pages with custom (non-Bandcamp) domain #637

Open
