Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate deterministic ids when formatting notebooks #9359

Merged
merged 7 commits into from Jan 4, 2024
Merged

Conversation

zanieb
Copy link
Member

@zanieb zanieb commented Jan 2, 2024

When formatting notebooks, we populate the id field for cells that do not have one. Previously, we generated a UUID v4 which resulted in non-deterministic formatting. Here, we generate the UUID from a seeded random number generator instead of using true randomness. For example, here are the first five ids it would generate:

7fb27b94-1602-401d-9154-2211134fc71a
acae54e3-7e7d-407b-bb7b-55eff062a284
9a63283c-baf0-4dbc-ab1f-6479b197f3a8
8dd0d809-2fe7-4a7c-9628-1538738b07e2
72eea511-9410-473a-a328-ad9291626812

We also add a check that an id is not present in another cell to prevent accidental introduction of duplicate ids.

The specification is lax, and we could just use incrementing integers e.g. 0, 1, ... but I have a minor preference for retaining the UUID format. Some discussion here — I'm happy to go either way though.

Discovered via #9293

Copy link

github-actions bot commented Jan 2, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@zanieb zanieb marked this pull request as ready for review January 3, 2024 03:38
if raw_notebook.nbformat == 4 && raw_notebook.nbformat_minor >= 5 {
// We use a mock random number generator to generate deterministic uuids
let mut rng = rand::rngs::mock::StepRng::new(0, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a seeded number generator rather than the StepRng one, so that the IDs are deterministic but look random rather than structured as they do now? Or was this the only option for deterministic UUIDs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's another option for a seeded number generator, this one was just the most obvious way I saw.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we can use the StdRng seeded with 0

7fb27b94-1602-401d-9154-2211134fc71a
acae54e3-7e7d-407b-bb7b-55eff062a284
9a63283c-baf0-4dbc-ab1f-6479b197f3a8
8dd0d809-2fe7-4a7c-9628-1538738b07e2
72eea511-9410-473a-a328-ad9291626812

@zanieb zanieb added the formatter Related to the formatter label Jan 3, 2024
Copy link
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go with natural numbers for simplicity but it doesn't matter much since the users shouldn't see that id anyway.

@zanieb zanieb merged commit aaa0097 into main Jan 4, 2024
17 checks passed
@zanieb zanieb deleted the zb/notebook-id-fmt branch January 4, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants