feat: [lungmap] add lungmap projects to google datasets catalog (#4808) #4832

Merged: NoopDog merged 3 commits into main from fran/4808-lungmap-google-datasets-jsonld on May 22, 2026
Conversation

@frano-m (Contributor) commented May 13, 2026

Summary

Adds Schema.org Dataset JSON-LD to LungMAP project detail pages so Google Dataset Search can index them. Completes the three-consumer rollout alongside #4806 (HCA) and #4807 (AnVIL).

LungMAP shares the HCA Azul backend, so this PR refactors the HCA builder into a shared parameterized core (buildProjectJsonLd) and adds a thin LungMAP wrapper supplying its own catalog identity.
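
The split between a shared core and a thin wrapper might look roughly like the sketch below. It is illustrative only: `ProjectLike`, the `/projects/<id>` URL pattern, and the exact fields emitted are assumptions, not the contents of `projectDataset.ts`. Only the names `buildProjectJsonLd`, `ProjectCatalogOptions`, `catalogName`, and `descriptionFallbackSuffix`, and the LungMAP values, come from this PR description.

```typescript
// Hypothetical, simplified stand-in for the Azul project response.
// The real ProjectResponse shape is not shown in this PR description.
interface ProjectLike {
  projectDescription?: string;
  projectId: string;
  projectTitle: string;
}

// Catalog identity options named in the PR; any further fields are unknown.
interface ProjectCatalogOptions {
  catalogName: string;
  descriptionFallbackSuffix: string;
}

type JsonLd = Record<string, unknown>;

// Shared core: the mapping is the same for every catalog, and only the
// identity passed in `options` differs between consumers.
function buildProjectJsonLd(
  data: ProjectLike,
  browserURL: string,
  options: ProjectCatalogOptions
): JsonLd {
  return {
    "@context": "https://schema.org",
    "@type": "Dataset",
    // Assumed fallback: title plus catalog suffix when no description exists.
    description:
      data.projectDescription ||
      `${data.projectTitle} ${options.descriptionFallbackSuffix}`,
    identifier: data.projectId,
    includedInDataCatalog: {
      "@type": "DataCatalog",
      name: options.catalogName,
    },
    isAccessibleForFree: true,
    name: data.projectTitle,
    // Assumed URL pattern, for illustration only.
    url: `${browserURL}/projects/${data.projectId}`,
  };
}

// Thin LungMAP wrapper: supplies catalog identity and nothing else.
const LUNGMAP_OPTIONS: ProjectCatalogOptions = {
  catalogName: "LungMAP Data Explorer",
  descriptionFallbackSuffix: "LungMAP Data Explorer project.",
};

function buildLungmapProjectJsonLd(
  data: ProjectLike,
  browserURL: string
): JsonLd {
  return buildProjectJsonLd(data, browserURL, LUNGMAP_OPTIONS);
}
```

Keeping the wrapper free of mapping logic is what lets the existing HCA tests cover the core while each catalog only needs tests for its own identity.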

Refactor:

  • NEW app/utils/schemaOrg/projectDataset.ts — shared buildProjectJsonLd(data, browserURL, options) core extracted from hcaProjectDataset.ts (lift-and-rename; logic unchanged). Defines ProjectCatalogOptions for catalog identity (catalogName, descriptionFallbackSuffix).
  • app/utils/schemaOrg/hcaProjectDataset.ts — reduced to a 27-line wrapper passing HCA catalog config.

LungMAP:

  • NEW app/utils/schemaOrg/lungmapProjectDataset.ts — 27-line wrapper passing LungMAP catalog config.
  • NEW __tests__/utils/schemaOrg/lungmapProjectDataset.test.ts — 3 tests verifying LungMAP catalog identity + URL pattern + description padding (the shared core is covered by the 14 existing HCA tests).
  • pages/[entityListType]/[...params].tsx — added isLungMap = siteConfig.appTitle?.includes("LungMAP") guard and mount via the generic renderJsonLd helper introduced in [AnVIL DX] Add AnVIL datasets to Google Datasets catalog #4807.
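
A rough sketch of the guard-and-mount step follows. The `includes("LungMAP")` check mirrors the guard quoted above. `SiteConfigLike` and `renderJsonLdScript` are hypothetical stand-ins: the real `renderJsonLd` helper from #4807 is not shown here and presumably returns a React element rather than a string.

```typescript
// Hypothetical minimal view of the site config; only appTitle is used.
interface SiteConfigLike {
  appTitle?: string;
}

// Mirrors the guard quoted in the PR: true only when the app title
// mentions LungMAP. A missing title is treated as "not LungMAP".
function isLungMapSite(siteConfig: SiteConfigLike): boolean {
  return siteConfig.appTitle?.includes("LungMAP") ?? false;
}

// Hypothetical string-based stand-in for the mount helper. Escaping "<"
// keeps a literal "</script>" inside the data from closing the element.
function renderJsonLdScript(jsonLd: Record<string, unknown>): string {
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Emit markup only for LungMAP builds; other sites get an empty string.
function maybeRenderLungmapJsonLd(
  siteConfig: SiteConfigLike,
  jsonLd: Record<string, unknown>
): string {
  return isLungMapSite(siteConfig) ? renderJsonLdScript(jsonLd) : "";
}
```

Matching on the app title is a string-based check, so it depends on every LungMAP site config keeping "LungMAP" in `appTitle`.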

Closes #4808. Stacked on #4831 (AnVIL PR), which is stacked on #4829 (HCA PR). Once #4829 and #4831 merge, rebase this PR's base to main.

Ticket scope audit (MVP)

| Field | Status |
| --- | --- |
| name, description (required) | |
| identifier, url, sameAs, includedInDataCatalog, isAccessibleForFree, keywords, creator, citation | ✅ (inherited from shared core; same field coverage as HCA) |
| funder, license, distribution, measurementTechnique, variableMeasured | ⏸ deferred per the HCA PR's deferral list |

LungMAP-specific differences from HCA: catalogName = "LungMAP Data Explorer", padding suffix "LungMAP Data Explorer project.". Every other mapping is identical because LungMAP uses HCA's ProjectResponse shape.
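
The padding suffix exists because Google's Dataset structured-data guidelines ask for a description of at least 50 characters. One way the padding could work is sketched below. `padDescription` is a hypothetical name and the fallback order is an assumption; the shared core's actual logic is not reproduced in this PR description.

```typescript
// Minimum description length from Google's Dataset guidelines.
const MIN_DESCRIPTION_LENGTH = 50;

// Hypothetical padding helper: returns the description unchanged when it
// is long enough; otherwise leads with whatever text exists (or the
// entity name when there is none) and appends the catalog suffix.
function padDescription(
  description: string | undefined,
  entityName: string,
  suffix: string
): string {
  const text = (description ?? "").trim();
  if (text.length >= MIN_DESCRIPTION_LENGTH) return text;
  const lead = text.length > 0 ? text : entityName.trim();
  // Note: the result can still be under 50 characters when both the
  // lead and the suffix are short.
  return `${lead} ${suffix}`.trim();
}

// Example call using the LungMAP suffix quoted above.
const padded = padDescription(
  undefined,
  "Lung atlas",
  "LungMAP Data Explorer project."
);
```

In this sketch a short suffix cannot guarantee the minimum by itself, so the length of the entity name matters for projects without a usable description.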

Test plan

  • npx tsc --noEmit passes
  • npm run lint, npm run check-format pass
  • npx jest __tests__/utils/schemaOrg — 28/28 tests pass (14 HCA + 11 AnVIL + 3 LungMAP)
  • npm run build-dev:lungmap — 4/10 project detail pages emit JSON-LD with "name":"LungMAP Data Explorer" catalog (remainder are sub-tab routes where processEntityProps short-circuits — same gating pattern as HCA/AnVIL)
  • npm run build-ma-dev:hca-dcp — HCA still 110/116 (no regression from the core extraction)
  • npm run build:anvil-cmg — AnVIL still 375/422 (no regression)
  • npm run build-dev:anvil-catalog — clean, no JSON-LD (correctly gated)
  • Validate output against Google's Rich Results Test and Schema Markup Validator after deploy
  • Request indexing via Google Search Console post-merge

🤖 Generated with Claude Code

@frano-m frano-m changed the base branch from fran/4807-anvil-dx-google-datasets-jsonld to main May 13, 2026 12:29
@frano-m frano-m force-pushed the fran/4808-lungmap-google-datasets-jsonld branch from 73625f0 to 778963c Compare May 22, 2026 06:18
@frano-m frano-m requested a review from Copilot May 22, 2026 06:21

Copilot AI left a comment


Pull request overview

Adds Schema.org Dataset JSON-LD for LungMAP project detail pages so Google Dataset Search can index LungMAP projects, reusing the existing HCA/Azul project mapping via a shared, parameterized builder.

Changes:

  • Extracted the HCA project JSON-LD builder into a shared buildProjectJsonLd(..., options) core with per-catalog identity options.
  • Added a LungMAP-specific wrapper builder + unit tests asserting LungMAP catalog identity, URL shape, and description padding.
  • Wired LungMAP JSON-LD rendering into the entity detail page behind a LungMAP app-title guard.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pages/[entityListType]/[...params].tsx | Adds LungMAP guard and mounts JSON-LD builder for project detail pages. |
| app/utils/schemaOrg/projectDataset.ts | New shared core for HCA-style project → schema.org Dataset JSON-LD mapping. |
| app/utils/schemaOrg/lungmapProjectDataset.ts | LungMAP wrapper supplying LungMAP catalog identity/options to the shared core. |
| app/utils/schemaOrg/hcaProjectDataset.ts | Refactors HCA builder into a thin wrapper around the shared core. |
| __tests__/utils/schemaOrg/lungmapProjectDataset.test.ts | Adds LungMAP-specific unit tests for catalog identity and padding behavior. |


Comment thread app/utils/schemaOrg/projectDataset.ts
frano-m and others added 2 commits May 22, 2026 16:31
…er (#4808)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…4808)

- AnVIL suffix expanded to spell out NHGRI Analysis Visualization and Informatics Lab-space
- HCA renamed catalog to "Human Cell Atlas Data Explorer", suffix matches
- LungMAP suffix uses "A project in the LungMAP Data Explorer."
- Update buildDescription jsdoc to reflect that the entity name's length
  carries the 50-char minimum in practice (suffix alone no longer self-sufficient)
- Update test expectations accordingly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@frano-m frano-m marked this pull request as ready for review May 22, 2026 06:48
@NoopDog NoopDog merged commit 646bea7 into main May 22, 2026
3 checks passed
@frano-m frano-m deleted the fran/4808-lungmap-google-datasets-jsonld branch May 22, 2026 06:57
Development

Successfully merging this pull request may close these issues.

[LungMAP] Add LungMAP projects to Google Datasets catalog

3 participants