Allow disambiguating duplicated note model UUIDs in collection #155

Merged
merged 2 commits into from
Dec 11, 2021


@aplaice (Collaborator) commented Dec 5, 2021

The issue of note model UUIDs being cloned along with the note models themselves has been fixed (#136). However, many users' collections will already contain such duplicated note model UUIDs. Also, if a user clones a note type on another platform, then the UUID will still be duplicated.

The disambiguation is run before export and snapshot, since that's when it's most needed to avoid broken `deck.json`s.
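One way to make sure disambiguation always runs before both export and snapshot is to wrap those entry points. The decorator below is a hypothetical sketch: `disambiguate_first`, `export_deck` and `snapshot` are illustrative names, not CrowdAnki's actual functions or hooks, and the lambdas stand in for a call such as `disambiguate_note_model_uuids` on the current collection.

```python
from functools import wraps
from typing import Callable, TypeVar

T = TypeVar("T")


def disambiguate_first(disambiguate: Callable[[], None]):
    """Hypothetical helper: run `disambiguate` before the wrapped action."""
    def decorator(action: Callable[..., T]) -> Callable[..., T]:
        @wraps(action)
        def wrapper(*args, **kwargs) -> T:
            disambiguate()  # fix duplicated UUIDs before anything is written out
            return action(*args, **kwargs)
        return wrapper
    return decorator


calls = []


@disambiguate_first(lambda: calls.append("disambiguate"))
def export_deck(path: str) -> str:
    calls.append("export")
    return path


@disambiguate_first(lambda: calls.append("disambiguate"))
def snapshot() -> None:
    calls.append("snapshot")


export_deck("out/deck.json")
snapshot()
print(calls)  # ['disambiguate', 'export', 'disambiguate', 'snapshot']
```

Wrapping the entry points, rather than calling the disambiguation inside each exporter, keeps the "run before export/snapshot" rule in one place, so it would be easy to move it elsewhere (e.g. after syncing) later.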

In the long run, we could perhaps switch to running the code only after syncing (since our "attack surface" would then be the cloning of note models on other platforms).

Running `disambiguate_note_model_uuids` takes 10 ms on a collection with 20 note types (without duplicates), which is IMO an acceptable overhead.

The file `disambiguate_uuids.py` could also contain the (far lower priority) disambiguation of deck config UUIDs (see #135).

I've chosen to set a new UUID immediately, rather than just removing the old one and letting it be regenerated when it's needed (possibly during the export/snapshot that immediately follows). Both approaches would work equally well, but IMO setting it immediately is more comprehensible to the end user: the new and old UUIDs are both listed together, and the user can inspect their `deck.json`s.


The key question is how to determine which note model is "the original". I think that using the note type id (`notetypes.id`) is a sufficiently good proxy.
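A minimal sketch of the disambiguation step combining these two decisions: in each group of note models sharing a UUID, the model with the smallest id keeps the UUID, and every other model in the group gets a fresh UUID straight away. This is not the code from `disambiguate_uuids.py`; note models are plain dicts here, and the `crowdanki_uuid` key, the use of `uuid4`, and the returned `(name, old_uuid, new_uuid)` tuples are assumptions made for illustration.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: the UUID is stored under this key in each note model dict
UUID_KEY = "crowdanki_uuid"


def disambiguate_note_model_uuids(models: List[dict]) -> List[Tuple[str, str, str]]:
    """Give every note model that shares its UUID with an older model a new UUID.

    Within each group of models sharing a UUID, the model with the smallest
    ``id`` is treated as the original and keeps its UUID. Returns
    (name, old_uuid, new_uuid) for every change, so that the old and new
    UUIDs can be reported to the user side by side.
    """
    by_uuid: Dict[str, List[dict]] = defaultdict(list)
    for model in models:
        if model.get(UUID_KEY):  # models without a UUID cannot clash
            by_uuid[model[UUID_KEY]].append(model)

    changes: List[Tuple[str, str, str]] = []
    for old_uuid, group in by_uuid.items():
        # Smallest id = oldest note type = presumed original, so skip it
        for clone in sorted(group, key=lambda m: m["id"])[1:]:
            new_uuid = str(uuid.uuid4())
            clone[UUID_KEY] = new_uuid  # set immediately instead of deleting it
            changes.append((clone["name"], old_uuid, new_uuid))
    return changes


example = [
    {"id": 1638700000000, "name": "Basic-1a2b3 (clone)", UUID_KEY: "uuid-basic"},
    {"id": 1342697561419, "name": "Basic", UUID_KEY: "uuid-basic"},
    {"id": 1342697561420, "name": "Cloze", UUID_KEY: "uuid-cloze"},
]
for name, old, new in disambiguate_note_model_uuids(example):
    print(f"{name}: {old} -> {new}")
```

Only the clone is reported and re-keyed; "Basic" keeps `uuid-basic` because its id is smaller, and "Cloze" is untouched because its UUID is unique.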

Details

When a new note model is created in Anki (e.g. via cloning), the id of the new note type is the current time in milliseconds since the epoch, unless that id is already taken, in which case it's that time + 1. Either way, the id of any newer note model will be greater than that of any older note model. AFAICT AnkiDroid uses the same Rust backend, so its behaviour should be the same. AnkiMobile generally reuses Anki's code, so it should also behave the same way.
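The id scheme described here can be sketched as follows. This illustrates the described behaviour and is not Anki's actual implementation: `new_notetype_id` is a hypothetical helper, and it keeps bumping the candidate by one until it is free, which covers the "that time + 1" case.

```python
import time
from typing import Optional, Set


def new_notetype_id(existing_ids: Set[int], now_ms: Optional[int] = None) -> int:
    """Hypothetical sketch: id = Unix time in ms, bumped by 1 while taken."""
    candidate = now_ms if now_ms is not None else time.time_ns() // 1_000_000
    while candidate in existing_ids:
        candidate += 1
    existing_ids.add(candidate)
    return candidate


ids: Set[int] = set()
original = new_notetype_id(ids, now_ms=1_638_700_000_000)  # December 2021
clone = new_notetype_id(ids, now_ms=1_638_700_000_000)     # same millisecond
later_clone = new_notetype_id(ids)                          # real clock, later
print(original < clone < later_clone)  # True
```

The ordering relies on the clock never going backwards; a device with a skewed clock is one way a clone could, in principle, end up with a smaller id than its original.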

(I haven't dug into the old Python version of Anki's code, but from what I've read online, it's "always" been the case that the note type id should be the Unix time (in ms). Looking at some of my own old (built-in) note types, their ids correspond to the time when I started using Anki, further supporting this hypothesis.)

AFAICT, based on skimming the code and testing, updating an existing note type (even changing the number of fields and the number of cards) does not change its id (which feels logical).

People might also generate model ids outside Anki, for instance using genanki. Genanki's recommendation is to generate a random integer between 2^31 and 2^32. Since all Unix times (in ms) from this millennium are greater than 2^32, then, assuming people followed genanki's recommendation, the ids of any genanki note models will be smaller than the ids of any note models created from them (by cloning) in Anki. So again, the ordering of ids maps well onto newness.

There are some special cases, for instance upgrading a collection that had a note model with id 0, but here too the newly generated id is less than 2^32, so it will also be smaller than any time-based id.
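A quick arithmetic check of the argument in the last two paragraphs, taking the genanki range [2^31, 2^32) as quoted above; the same upper bound covers the id-0 upgrade case, whose generated id is also below 2^32. `clone_id` is an arbitrary example of a time-based id, not a value from a real collection.

```python
import random
from datetime import datetime, timezone

# Range recommended for genanki model ids, as quoted above: [2^31, 2^32)
GENANKI_LOW, GENANKI_HIGH = 2 ** 31, 2 ** 32

# Unix time in ms at 2000-01-01 UTC, i.e. no later than the start of this millennium
millennium_ms = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)
print(millennium_ms)                 # 946684800000
print(millennium_ms > GENANKI_HIGH)  # True

# Millisecond timestamps already passed 2^32 about 50 days after the epoch
print(round(GENANKI_HIGH / 1000 / 86400, 1))  # 49.7

# Hence sorting by id puts a genanki-style model before any Anki clone of it
genanki_id = random.randrange(GENANKI_LOW, GENANKI_HIGH)
clone_id = 1_638_700_000_000  # a time-based id from December 2021
assert sorted([clone_id, genanki_id])[0] == genanki_id
```

In other words, the margin is huge: any time-based id created since early 1970 exceeds every id in the genanki range.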

Summary

Overall, while it's in principle possible that the cloned copy of a note model will have a smaller id than the original note model, it seems highly unlikely.

@aplaice aplaice merged commit 5f79ca9 into Stvad:master Dec 11, 2021
@aplaice aplaice deleted the disambiguate_note_model_fix branch December 11, 2021 13:07