Add ArtiFinder discovered-artifact integration (website) #20
Open
vahldiek wants to merge 13 commits into
Surface ArtiFinder-discovered artifact links (from the pipeline's new artifinder stage) across the site. These links are not manually verified, carry no badges, and are excluded from all AE statistics/scores.
- search + profile: render an 'Artifinder' provenance marker (logo + tooltip) next to discovered links; add #artifinder search keyword
- new /artifinder.html discovery-statistics page (ECharts) with per-year and per-conference discovery counts and rate over time; linked from nav
- new reprodb-artifinder.css/js + ArtiFinder logo asset
- methodology: dedicated 'ArtiFinder-Discovered Artifacts' section + data source
- about (Contribute): link to ArtiFinder-Data
- regenerated data (pipeline): artifinder.json, _data/artifinder_*.yml, artifacts.json + search_data.json gain artifinder_urls
Remove assets/data/artifinder.json (redundant republish of ArtiFinder-Data) and _build/artifinder_matched_urls.json (no longer produced; repo_stats now reads matched GitHub links from artifacts.json). No page referenced them; the discovery page uses the _data/artifinder_*.yml aggregates.
The afConfChart bar chart already shows discovered vs. matched per conference.
search_data.json now also contains discovered artifacts whose papers never went through AE (marked, no badges): 3076 AE + 2770 ArtiFinder-only = 5846.
Author and institution profile pages now list ArtiFinder-discovered (non-AE) papers, marked with the Artifinder sign and shown in a distinct indigo colour (italic title). The author timeline chart gains a separate 'ArtiFinder (discovered)' series. These are view-only and never affect scores/stats. Adds assets/data/artifinder_authors.json.
In profile artifact tables, the paper title now links to the originally collected AE artifact URL (getArtifactUrl); the ArtiFinder-discovered link is shown as a separate clickable Artifinder-marked link (afLink) rather than taking over the title. Purely-discovered (non-AE) rows still link the title to the discovered artifact.
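The link rule described above can be sketched as follows. This is a minimal illustration in Python rather than the site's actual JavaScript: `row_links` and the `artifact_url` field are hypothetical names, only `artifinder_urls` comes from this PR, and whether a purely-discovered row also shows a separate marked link is an assumption (here it does not).

```python
def row_links(row):
    """Return (title_url, discovered_url) for one profile table row.

    Hypothetical helper: 'artifact_url' is an assumed field name for the
    originally collected AE artifact URL; 'artifinder_urls' is the field
    this PR adds.
    """
    ae_url = row.get("artifact_url")
    discovered = row.get("artifinder_urls") or []
    first_discovered = discovered[0] if discovered else None
    if ae_url:
        # AE row: the title keeps the AE URL and the discovered link,
        # if any, is rendered as a separate marked link.
        return ae_url, first_discovered
    # Purely discovered (non-AE) row: the title links to the discovered artifact.
    return first_discovered, None
```

The point of the split is that a discovered link never replaces the manually collected AE link; it only appears alongside it.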
repo_stats re-run now counts GitHub repos ArtiFinder matched to AE papers (~180 net-new), via _inject_artifinder_urls reading artifacts.json.
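A rough sketch of how net-new GitHub repos might be collected from `artifinder_urls`. It is not the pipeline's `_inject_artifinder_urls`; the helper names, the `owner/repo` key format, and the shape of the `artifacts` list are assumptions made for illustration.

```python
from urllib.parse import urlparse


def github_repo(url):
    """Return 'owner/repo' in lowercase for a GitHub URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
    return f"{parts[0]}/{repo}".lower()


def net_new_repos(artifacts, known_repos):
    """Collect GitHub repos found in artifinder_urls that are not already known.

    'artifacts' is assumed to be the parsed artifacts.json list and
    'known_repos' a set of 'owner/repo' keys already counted by repo_stats.
    """
    found = set()
    for entry in artifacts:
        for url in entry.get("artifinder_urls") or []:
            repo = github_repo(url)
            if repo and repo not in known_repos:
                found.add(repo)
    return found
```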
- new conference x year discovery heatmap (afHeatmap)
- rename 'Year Range' card to 'Years Included'
- caption now shows the ArtiFinder *data* date (data_updated), not pipeline run
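For readers unfamiliar with heatmap input, the following sketch shows one way per-paper rows could be folded into the `[x, y, value]` triples an ECharts heatmap series accepts on category axes. The `conference` and `year` field names and the `heatmap_series` helper are assumptions, not the code behind afHeatmap.

```python
from collections import Counter


def heatmap_series(rows):
    """Build (years, conferences, data) for a conference x year heatmap.

    Each data item is [year_index, conference_index, count]; cells with a
    zero count are omitted. Field names are assumed for illustration.
    """
    counts = Counter((r["conference"], r["year"]) for r in rows)
    conferences = sorted({c for c, _ in counts})
    years = sorted({y for _, y in counts})
    data = [
        [years.index(y), conferences.index(c), counts[(c, y)]]
        for c in conferences
        for y in years
        if counts[(c, y)]
    ]
    return years, conferences, data
```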
Cross-references the pipeline helper and carries the shared test vector so the author-index normalisation stays byte-identical across Python/JS.
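To illustrate what a shared test vector buys, here is a sketch of a normalisation function checked against fixed input/output pairs. The normalisation rules (NFKD, strip combining marks, lowercase, collapse whitespace) and the sample names are assumptions; the actual pipeline helper and its vector may differ.

```python
import unicodedata


def normalize_author(name):
    """Hypothetical author-key normalisation: strip accents, lowercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


# Made-up sample pairs; the same list would be kept verbatim next to the JS port.
TEST_VECTOR = [
    ("José  García", "jose garcia"),
    ("  ADA LOVELACE ", "ada lovelace"),
]


def check_vector(normalize, vector=TEST_VECTOR):
    """Return the entries whose UTF-8 bytes differ from the expected output."""
    return [
        (raw, normalize(raw), want)
        for raw, want in vector
        if normalize(raw).encode("utf-8") != want.encode("utf-8")
    ]
```

Running the same pairs through both implementations and requiring an empty mismatch list is what keeps the two index keys from drifting apart.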
Regenerated: 3076 ae + 2770 artifinder-only rows now carry an explicit source.
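A small sketch of what an explicit source tag could look like when the two row sets are combined. The `source` values `"ae"` and `"artifinder"` and both helper names are assumptions; the commit only states that rows now carry a source.

```python
from collections import Counter


def tag_sources(ae_rows, discovered_rows):
    """Return one combined list in which every row carries a 'source' key."""
    tagged = [dict(r, source="ae") for r in ae_rows]
    tagged += [dict(r, source="artifinder") for r in discovered_rows]
    return tagged


def count_by_source(rows):
    """Count rows per source so the totals can be checked against the commit message."""
    return Counter(r["source"] for r in rows)
```

With the figures reported above, such a count would be expected to show 3076 and 2770 rows, summing to 5846.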
21 discovered links now attach to their AE artifact (fuzzy title fallback): artifacts.json gains artifinder_urls, fewer artifinder-only search/profile rows, updated discovery aggregates.
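One way a fuzzy title fallback can work is sketched below, using the standard library's `difflib`. The source does not say which similarity measure or threshold the pipeline uses, so the `0.9` cutoff, the `match_title` helper, and the sample titles in the usage note are all assumptions.

```python
import difflib


def _norm(title):
    """Lowercase and collapse whitespace before comparing titles."""
    return " ".join(title.lower().split())


def match_title(discovered_title, ae_titles, cutoff=0.9):
    """Return the AE title matching a discovered paper title, or None.

    Exact match on the normalised title is tried first; only if that fails
    does the fuzzy fallback (difflib similarity ratio >= cutoff) run.
    """
    index = {_norm(t): t for t in ae_titles}
    key = _norm(discovered_title)
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None
```

For example, with made-up AE titles `["Fast Paths for Secure Enclaves", "A Study of Kernel Fuzzing"]`, a discovered title with a trailing period would still attach to the first entry, while an unrelated title returns `None`. A high cutoff keeps false attachments rare at the cost of missing heavily reworded titles.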
All charts now share one palette: indigo (#4a5aa8) for the ArtiFinder/discovered measure (bars, discovery-rate line+area, heatmap) and orange for the matched-to-AE overlap. Fixes the discovery-rate line which used the site's security red.
Summary
Surfaces ArtiFinder-discovered artifact links across the website. These come from the pipeline's new artifinder stage (companion PR below). ArtiFinder scrapes papers directly to find artifact links; these links are not manually verified, carry no badges, and are excluded from all AE statistics and scores.
Preview: https://vahldiek.github.io/reprodb.github.io/artifinder.html
Changes
- #artifinder search keyword.
- /artifinder.html page: an overview-style discovery-statistics page (ECharts) showing discovered artifacts per year (matched-to-AE vs. not), discovery rate over time, and a per-conference breakdown. Linked from the Ranking nav group.
- reprodb-artifinder.css, reprodb-artifinder.js, and an ArtiFinder logo SVG.
- _data/artifinder_{summary,by_year,by_conference}.yml; artifacts.json and search_data.json gain artifinder_urls (no other rows changed). The raw discovered links are not republished; they stay in the upstream ArtiFinder-Data repo.
Emphasis on AE vs. ArtiFinder
Per policy, ArtiFinder figures are always distinguished from AE results and are reported separately (the dedicated page + methodology). All existing statistics continue to reflect AE-evaluated artifacts only.
Companion PR
Pipeline changes (loader, matching, new stage, schema bump): ReproDB/reprodb-pipeline#17
Notes for reviewers
- artifinder + search_data stages (may need the data-update label for the immutability check).
- /artifinder.html renders all charts and the per-conference table.
CI note
- ReproDB/data-schemas@main; it turns green once data-schemas#3 ("Add artifinder_urls to schemas (bundle v0.3.0)") is merged.
- data-update label to allow it.