change http back to https to make mlcroissant validate happy #12333

Merged: stevenwinship merged 1 commit into develop from 12014-valid-croissant on Apr 16, 2026
pdurbin (Member) commented Apr 15, 2026

What this PR does / why we need it:

This commit reverts 9e619b6 from #12214. I made that change to mirror a change in the Croissant 1.1 spec, but there's a mismatch between the spec change and `mlcroissant validate`. I opened this issue upstream: mlcommons/croissant#1018

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:

Run this before and after this change (run `pip install -r requirements.txt` first):

cd src/test/resources/croissant && ./validate.sh

The "before" output should show an error like this for every single Croissant file:

E0415 16:02:49.695261 8656855360 validate.py:55] Found the following 1 error(s) during the validation:
  -  The current JSON-LD doesn't extend https://schema.org/Dataset.
...
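
As a rough pre-check before running the validator, a sketch like the following lists JSON files that still contain the `http://schema.org` form. The function name `find_http_schema_org` is hypothetical and not part of this PR. That the reverted commit switched schema.org IRIs from https to http is an assumption inferred from the PR title and the error message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the paths of JSON files
# under a directory that mention the http:// form of the schema.org namespace.
# Assumption: the validator expects the https:// form, as the error suggests.
find_http_schema_org() {
    dir="$1"
    # -l prints only file names; -F treats the pattern as a fixed string,
    # so "https://schema.org" does not match "http://schema.org".
    find "$dir" -name '*.json' -exec grep -lF 'http://schema.org' {} +
}

# Example (path taken from the testing instructions in this PR):
#   find_http_schema_org src/test/resources/croissant
```

An empty result only means the literal string is absent; it is not a substitute for running `mlcroissant validate`.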

The "after" output should look something like this: no errors for non-drafts. For drafts, the errors and warnings below are expected and match the output from before #12214 (Croissant 1.1) was merged:

% ./validate.sh
testing cars/expected/cars-croissant.json
I0415 17:58:41.688423 8656855360 validate.py:53] Done.
testing cars/expected/cars-croissantSlim.json
I0415 17:58:42.074821 8656855360 validate.py:53] Done.
testing draft/expected/draft-croissant.json
E0415 17:58:42.470159 8656855360 validate.py:55] Found the following 1 error(s) during the validation:
  -  [Metadata(Draft Dataset)] ValueError({'Dates or DateTimes should follow the [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601). Got '})
Found the following 2 warning(s) during the validation:
  -  [Metadata(Draft Dataset)] Property "https://schema.org/datePublished" is recommended, but does not exist.
  -  [Metadata(Draft Dataset)] WarningException("Version doesn't follow MAJOR.MINOR.PATCH: DRAFT. For more information refer to: https://semver.org/spec/v2.0.0.html")
testing draft/expected/draft-croissantSlim.json
E0415 17:58:42.853607 8656855360 validate.py:55] Found the following 1 error(s) during the validation:
  -  [Metadata(Draft Dataset)] ValueError({'Dates or DateTimes should follow the [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601). Got '})
Found the following 2 warning(s) during the validation:
  -  [Metadata(Draft Dataset)] Property "https://schema.org/datePublished" is recommended, but does not exist.
  -  [Metadata(Draft Dataset)] WarningException("Version doesn't follow MAJOR.MINOR.PATCH: DRAFT. For more information refer to: https://semver.org/spec/v2.0.0.html")
testing junk/expected/junk-croissant.json
I0415 17:58:43.250323 8656855360 validate.py:53] Done.
...
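
The "no errors for non-drafts" expectation can also be checked mechanically. The sketch below reads output in the format shown above and prints each tested file that was followed by an error summary line. The function name `files_with_errors` is hypothetical, and the patterns are based only on the sample output in this PR, so other versions of the tool may format their logs differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): read validate.sh output on stdin
# and print every tested file for which an error summary was reported.
files_with_errors() {
    awk '
        # Remember the file named on the most recent "testing <file>" line.
        /^testing / { current = $2 }
        # An error summary line belongs to the most recently tested file.
        /Found the following [0-9]+ error\(s\)/ { print current }
    '
}

# Example, assuming the validator logs to stderr (hence 2>&1):
#   ./validate.sh 2>&1 | files_with_errors | grep -v '^draft/'
```

With the example pipeline, any path that survives the `grep -v '^draft/'` filter would be a non-draft file reporting an error.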

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No.

Is there a release notes update needed for this change?:

No. I looked at https://github.com/IQSS/dataverse/blob/12014-valid-croissant/doc/release-notes/12014-croissant-1.1.md from the last PR (#12214) and it's still accurate.

Additional documentation:

I updated the API changelog. You can preview the change at https://dataverse-guide--12333.org.readthedocs.build/en/12333/api/changelog.html

This commit reverts 9e619b6 from #12214. We were following the
Croissant 1.1 spec but there's a mismatch between it and
`mlcroissant validate`. We've opened this issue upstream:
mlcommons/croissant#1018
@pdurbin pdurbin added this to the 6.11 milestone Apr 15, 2026
@pdurbin pdurbin added the Size: 0.5 (A percentage of a sprint. 0.35 hours), Croissant (Croissant and Kaggle related work), and Project: Trusted Data labels Apr 15, 2026
@pdurbin pdurbin moved this to Ready for Review ⏩ in IQSS Dataverse Project Apr 15, 2026
coveralls commented:

Coverage Status

coverage: 24.873% (remained the same), 12014-valid-croissant into develop

github-actions commented:

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:12014-valid-croissant
ghcr.io/gdcc/configbaker:12014-valid-croissant

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

for i in */expected/*.json; do
  echo "testing $i"
  mlcroissant validate --jsonld "$i"
done
pdurbin (Member Author) commented:

I'm just putting this at the bottom of the PR... https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-12333/1/consoleFull shows a failure to deploy the EC2 instance so I kicked off another run: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-12333/2/

@landreev landreev self-assigned this Apr 16, 2026
landreev (Contributor) left a comment:

Looks good.
It's an impressive number of files that needed to be changed to accommodate the fix (which is a function of how much test coverage you have for Croissant, so I mean it as a good thing, obvs).

@github-project-automation github-project-automation Bot moved this from Ready for Review ⏩ to Ready for QA ⏩ in IQSS Dataverse Project Apr 16, 2026
@landreev landreev removed their assignment Apr 16, 2026
@stevenwinship stevenwinship self-assigned this Apr 16, 2026
@stevenwinship stevenwinship moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project Apr 16, 2026
@stevenwinship stevenwinship merged commit 7edf3a0 into develop Apr 16, 2026
22 checks passed
@github-project-automation github-project-automation Bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Apr 16, 2026
@stevenwinship stevenwinship deleted the 12014-valid-croissant branch April 16, 2026 20:38
@stevenwinship stevenwinship removed their assignment Apr 16, 2026
@stevenwinship stevenwinship moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Apr 20, 2026
Labels

Croissant (Croissant and Kaggle related work), Project: Trusted Data, Size: 0.5 (A percentage of a sprint. 0.35 hours)

Projects

Status: Done 🧹

4 participants