refactor import/export code #1018
I did work on some SpecialFields PRs, and knowing that you intended to refactor the export/import logic soon, I took some notes. While SpecialFields makes the importing algorithm much more powerful, I don't think the abstractions it provides are necessarily easy to understand. So the following are some ideas of how to better abstract the functionality SpecialFields provides:
Changing Note Type upon import
Private Fields:
Private Tags:
Update even if note type or note has older
Happy to answer any questions regarding this. @hgiesel has a great outline here
Thanks Henrik, that looks really helpful. The scheduler work will likely come first (I've already done some initial work on it), but I will come back to this.
Also for reference: https://forums.ankiweb.net/t/importing-rewrite-note-sort-order/13449
Leading/trailing whitespace causing issues: https://forums.ankiweb.net/t/import-export-causes-unwanted-duplicates/14060
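One way an importer could avoid these spurious duplicates is to normalise the first field before comparing it with existing notes. The sketch below uses hypothetical helpers, not Anki's code. It assumes duplicates are matched on the first field only, and that the field is plain text. HTML entities such as `&nbsp;` would need separate handling.

```python
def normalize_first_field(text: str) -> str:
    """Hypothetical normalisation applied before duplicate matching.

    str.strip() with no arguments removes all leading/trailing whitespace,
    including tabs, newlines and non-breaking spaces (U+00A0).
    """
    return text.strip()


def find_duplicates(existing: list[str], incoming: list[str]) -> list[str]:
    """Return the incoming first-field values that already exist once normalised."""
    seen = {normalize_first_field(value) for value in existing}
    return [value for value in incoming if normalize_first_field(value) in seen]


print(find_duplicates(["mitochondria"], ["  mitochondria ", "ribosome"]))
```

Without the normalisation step, `"  mitochondria "` would not match the existing note and would be imported as a new one.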
Reminder to add checks for invalid card IDs: https://forums.ankiweb.net/t/issue-with-huge-card-ids/12165
Just a brief update on this: Rumo has implemented .colpkg import/export, and is currently looking into apkg exporting. The plan is to approach the update in two steps:
While we work on the two steps, the old Python code can be kept around, so that we don't break existing workflows using the SpecialFields add-on.
New apkg and csv import/export implementations are now in main, with functionality roughly equivalent to the old Python code. Now we can turn our attention to the new functionality suggested above. @hgiesel gave us a great summary. Some thoughts/discussion starters on the various features:
Private fields: Henrik's suggested approach sounds good here; we could use a checkbox like "[x] Exclude in shared decks" in the Fields dialog.
Private tags:
Private vs public export: I quite like this idea; we already use the scheduling option to control whether flags should be stripped, and the two use cases are quite different. @AnKingMed, could you explain in more detail which of the lumped settings you want to be able to control independently, and why?
Updates when notetype has changed: This is probably the hardest part. I think it gets even more complicated if we try to allow the user to merge fields, or worry about preserving the content of fields that no longer exist. Brainstorming here: how does the following sound?
Private fields:
Private tags:
Private vs public export:
Note that the special fields add-on is just for importing. Ideally, exporting would also be considered, and users could choose not to export certain fields. We've also recently made an add-on for "export single tag for sharing" so that users can export just a single tag and strip all the other tags out. This is really useful when you tag cards by lecture: you export just that tag, and the person you share with can then import that file with "all fields are special" and "combine tagging", so that it essentially imports only that one tag without affecting any field content or any of the existing tags.
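The single-tag sharing workflow described above boils down to two steps. First, the exporter drops every tag except the shared one. Second, the importer keeps the existing field content and merges the incoming tags into the existing ones.

Below is a minimal sketch using a hypothetical `Note` dataclass, which is not Anki's note class. It makes two assumptions. "All fields are special" is taken to mean local fields are never overwritten. "Combine tagging" is taken to mean the union of the two tag sets.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    # Hypothetical stand-in for a note; guid identifies the same note across collections.
    guid: str
    fields: list[str]
    tags: set[str] = field(default_factory=set)


def export_single_tag(note: Note, tag: str) -> Note:
    """Exporter side: keep only the shared tag (exact, case-sensitive match for simplicity)."""
    return Note(note.guid, list(note.fields), {t for t in note.tags if t == tag})


def import_tags_only(existing: Note, incoming: Note) -> Note:
    """Importer side: leave field content untouched and combine the tag sets."""
    if existing.guid != incoming.guid:
        raise ValueError("notes must match by guid")
    return Note(existing.guid, list(existing.fields), existing.tags | incoming.tags)


mine = Note("abc", ["Q", "A (my edits)"], {"Step1", "MyTag"})
theirs = Note("abc", ["Q", "A"], {"Step1", "Lecture05", "OtherTag"})
merged = import_tags_only(mine, export_single_tag(theirs, "Lecture05"))
print(merged.fields, sorted(merged.tags))
```

The local edits to the fields survive, the local tag `MyTag` is kept, and only `Lecture05` arrives from the other user.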
Private tags: I think you suggested making that list of prefixes configurable, right? I'd argue against that. Tags are used across decks and notetypes, so having this configurable could make sharing more complicated than it needs to be. Example:
Instead, what do you think about not making it configurable, but using a prefix or infix of
E.g. this subtree would show the tags Any repeated
Private vs public export:
There are times when user A has 100 tags and user B has the same 100 tags (using the same shared deck). User B chooses to alter some of those 100 tags, but also adds a new tag. They want to share that new tag with user A, but aren't ready to share the other changes yet. In this case, there needs to be a way to share just that single tag without updating the other tags.
As for the deck description, there have been many instances, when sharing updates with each other to compile them, where I've needed to protect the deck description (i.e. I as the deck creator had updated it, but I was importing files from other people for the new update and they had the old description).
99% of people need just a couple of functions; the other functions are for the deck creators and maintainers. Although significantly fewer people use those functions, they're absolutely necessary. I've used every combination of the settings I outlined above in creating and maintaining decks and have a few years of experience with it :)
It's a good point, and having a hard-coded list of private tags would allow us to simplify the UI. But if deck authors want asymmetric handling of private tags (eg excluding some on import that they'd like included on export), I'm not sure such an approach would work (unless they renamed the tag prior to import/export, which is impractical). I presume that's what you are doing with the MissedQ tag @AnKingMed?
I think we may have had our wires crossed here. I was asking whether a single "for public/for private" checkbox could work on deck export. All the options you're describing seem to be related to importing, which is a separate matter.
@dae the MissedQ tag is just my personal tag that I use when I miss a practice question and decide a flashcard is relevant to it. Sorry, I didn't understand this was just about exporting; however, it's probably still worth considering the whole picture when addressing exports and imports. I think the ideal way to handle exporting tags is to select exactly which tags you want to export. For example, a dialog that has "select all" or "select none" and then you can choose
I'd prefer to avoid a list of all tags if possible, as it adds a fair bit of extra implementation complexity. I'm still trying to understand the different use cases that are covered here. Does the following sound correct? Are there other cases I've missed?
Could 2b) perhaps be handled when importing, instead of when exporting? E.g. if the importer allowed you to import only specific tags? You could accomplish 2a) by searching in the browse screen, then exporting from there.
2b is definitely an importing issue. There are also instances, however, where I want to export multiple tags. The solution right now would be to export a single tag multiple times and then import each file into the new profile. I think the list of tags is ideal, if at all possible, if you're going for the best solution.
Does the single-tag filtering need to be done on the exporting side though? Couldn't the filtering be done on the importing side, if the importing screen provided a way to include or exclude specific tags?
I think that's much more appropriate on the exporting side. I generally share tags with others, and I only want to share certain ones. It would be an extra step to have to share everything and then tell them what to do.
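Whether it runs on export, as preferred here, or on import, the filtering step itself is the same. The sketch below shows a hypothetical helper taking an include list, for example the tags picked in a "select all / select none" style dialog.

It makes two further assumptions:
- `::` is the hierarchy separator, as in Anki's hierarchical tags, so selecting a parent also keeps its children.
- Tags are compared case-insensitively.

```python
def filter_tags(tags: set[str], selected: set[str]) -> set[str]:
    """Keep only tags that were selected, or whose ancestor was selected.

    Comparison is case-insensitive, and "::" separates hierarchy levels.
    """
    wanted = {s.lower() for s in selected}

    def is_selected(tag: str) -> bool:
        parts = tag.lower().split("::")
        # Check the tag itself and each of its ancestors: "a", "a::b", "a::b::c", ...
        return any("::".join(parts[: i + 1]) in wanted for i in range(len(parts)))

    return {t for t in tags if is_selected(t)}


note_tags = {"Lectures::Cardio::L05", "Lectures::Renal::L02", "MissedQ"}
print(sorted(filter_tags(note_tags, {"lectures::cardio"})))
```

Selecting a child does not keep its parent. For example, selecting `A::B::C` drops a tag `A::B`.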
Is the goal really to support all the use cases mentioned here? Call me Bernie Sanders, but I think we should focus on the 99% who'd have a tough time figuring out the correct settings for their use case.
Changing Notetypes
I think you're right that embedding a change-notetype feature is not the right way, @dae. It would be quite powerful, but easy to mess up and would still not solve all the problems.
This would mean that subsequent updates would continue to cause conflicts. Let's say you have a notetype with id 1 and schema A, and the import contains one that also has id 1, but schema B. Then a notetype with schema B and id 2 is added to your collection. Then you do another import with the same notetype, but there's still no notetype with matching id and schema, so a third notetype (schema B, id 3) is added. What about merging the notetypes? The new notetype would contain the union of fields and card types. The user could decide whether card types with the same name but different templates should be updated or preserved.
Private vs. Public
@hgiesel's proposals sound good to me.
If there's only one toggle for importing/exporting private fields or not, it seems reasonable to me to treat private tags the same.
I disagree with this. In these instances, the 99% strongly rely on the 1% making high-quality decks, and focusing energy on high-quality deck-making utilities is important.
You both have valid points. Deck authors do bring a lot of value to the ecosystem, so we can't ignore their needs. At the same time, there is a higher bar for entry into the core code than in add-ons, and our goal here is to identify the important features and try to add them in a way that is maintainable, and fits in with the rest of the UI (the current special fields add-on UI is not particularly understandable).
@AnKingMed could you give us some "user stories" about why you're exporting individual tags, so we better understand your workflow? What sort of information do these individual tags represent, and why do you need to share them with others?
@RumovZ re using JSON for this: that is a good point, and JSON may well be more appropriate here. With colpkg/apkg/csv importing mostly done at this point, and fallback to the old importers with the new setting enabled, I feel like we're in a good place now. So long as we continue to support the legacy codepaths, there's no need to rush to add these features to our apkg importer, and perhaps it's worth pursuing the JSON importer further first.
While features focusing on shuffling data between collaborating deck authors may make more sense for the JSON exporter, some of the other proposals like private fields and tags probably still make sense for the apkg exporter, as they would be useful in the more common case of deck author distributing changes to end users. But presumably in that use case, we don't need to worry about things like single-tag imports.
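For the apkg exporter, a "for sharing" export could strip both kinds of private data under a single toggle. This is a hypothetical sketch with three assumptions:
- The per-field `exclude_in_shared` flag stands in for the proposed "Exclude in shared decks" checkbox.
- Private tags are passed in as a plain set, since the thread has not settled on how they are marked.
- Private fields are emptied rather than removed from the notetype.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    name: str
    exclude_in_shared: bool = False  # hypothetical "Exclude in shared decks" flag


def prepare_for_export(field_defs: list[FieldDef], values: list[str], tags: set[str],
                       private_tags: set[str], for_sharing: bool) -> tuple[list[str], set[str]]:
    """Return the field values and tags that would be written to the package."""
    if not for_sharing:
        # Private export (e.g. a personal backup): keep everything.
        return list(values), set(tags)
    # Empty private fields so the notetype's field count stays unchanged.
    shared_values = ["" if f.exclude_in_shared else v for f, v in zip(field_defs, values)]
    return shared_values, tags - private_tags


defs = [FieldDef("Front"), FieldDef("Back"), FieldDef("My Notes", exclude_in_shared=True)]
print(prepare_for_export(defs, ["Q", "A", "mnemonic"], {"Renal", "MissedQ"}, {"MissedQ"}, True))
```

Emptying the field keeps the notetype schema identical on both sides. Dropping the field instead would change the schema, which runs into the notetype-conflict issues discussed in this thread.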
Yep, good point. The way the original import code addressed this case is to add 1 to the old notetype id when adding a new one, and keep doing so until it found a free id or found a notetype with a matching schema.
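The id-bumping behaviour described here can be sketched as follows. This is an illustrative reconstruction from the description above, not the original importer's code. Notetypes are modelled as a hypothetical dict mapping each id to a schema fingerprint.

```python
def resolve_notetype_id(incoming_id: int, incoming_schema: str,
                        existing: dict[int, str]) -> tuple[int, bool]:
    """Return (id to use, whether an existing notetype with that id can be reused).

    Start at the incoming id and step upwards by 1. Stop at the first free id,
    where a new notetype must be added, or at a notetype with a matching schema.
    """
    candidate = incoming_id
    while candidate in existing:
        if existing[candidate] == incoming_schema:
            return candidate, True
        candidate += 1
    return candidate, False


collection = {1: "schema A"}
print(resolve_notetype_id(1, "schema B", collection))  # (2, False): add a copy at id 2
collection[2] = "schema B"
print(resolve_notetype_id(1, "schema B", collection))  # (2, True): reuse the copy at id 2
```

Under this rule, a repeat import of the schema-B notetype finds the copy at id 2 instead of adding yet another one. The cost is a linear probe through neighbouring ids.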
That feels more complex than the approach above - WDYT?
Re: private tags, a few thoughts:
I'm not saying we should ignore deck creators, just that we should address their needs in a way that doesn't worsen the experience of the average user, e.g. by putting highly specialised features in an add-on or in a part of Anki that's only for advanced usage anyway (JSON export).
Changing Notetypes
Sure, but it's also much more useful. Say you add a private field to a notetype from a shared deck. Then you receive an update for its notes. With Preserve you'd miss out on the updates; with Update you'd lose your private field data. Merge would offer the best of both worlds. The user would only need to tell Anki whether to update the card templates.
Private Tags
A separate node would better fit the hereditary nature of marking tags as private, but it would introduce the concept of meta tags. No idea which would be less complex to implement in the end.
Hmm, then
Now that I think about it, I see some more issues with both a prefix and a parent tag:
Maybe adding a dedicated flag to the tag config would be better after all?
With these big premade decks, often one student will go through a lecture and tag all relevant notes. They can then share that individual tag, and the other user can import just that tag. User #2 may, however, have changed some of the premade deck tags, and importing all of User #1's tags into their database would either make it really messy or overwrite all of those changes.
Good point about the private field case. Would this work in other cases like renames though? For example, a deck author publishes a deck with "field 1" and "field 2", which the user imports. Then they rename the fields to "field a" and "field b", and reshare the deck with a modified template. If we were to take a union of the notetypes, you'd end up with two copies of all the field content, and two templates, one showing field 1/2 and one showing field a/b. I suspect this is the most common workflow, where users consume shared decks and want to fetch new versions from the deck author, without maintaining any local changes.
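The rename concern can be made concrete with a naive union merge of two notetypes' field lists. This is a hypothetical helper that matches fields by name only, which is itself an assumption.

```python
def union_fields(local: list[str], incoming: list[str]) -> list[str]:
    """Naive merge: keep the local field order, then append incoming fields not present locally."""
    merged = list(local)
    for name in incoming:
        if name not in merged:
            merged.append(name)
    return merged


# A private field added locally survives, and the upstream fields are kept:
print(union_fields(["Front", "Back", "My Notes"], ["Front", "Back"]))
# ...but an upstream rename produces a second copy of every renamed field:
print(union_fields(["field 1", "field 2"], ["field a", "field b"]))
```

The second case ends up with four fields, so each note carries the old and new content side by side. This is the duplication described above.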
My thinking is that #foo and foo are two distinct tags, just as foo and Foo are two distinct symbols/variables in most programming languages. The Go language uses case to denote public/private methods, so a type can have a private "foo()" and a public "Foo()" at the same time if it wants. All our code would have to do is check if a tag started with a # (or optionally, its ancestor) to decide if it's private. The browser could optionally add a (private) label to such tags to make the meaning of the symbol more clear, but I do not think it should hide the # symbol, as that would be confusing if there also happened to be an unprefixed tag of the same name.
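The check described here is cheap. The sketch below rests on two assumptions. First, `#` is the private marker, which is the prefix under discussion. Second, `::` is the hierarchy separator. "A tag starting with #" is interpreted as its own name component starting with `#`, and a parameter switches on the optional ancestor rule.

```python
def is_private(tag: str, check_ancestors: bool = True) -> bool:
    """A tag is private if its own name starts with '#' or, optionally, any ancestor's does."""
    components = tag.split("::")
    if not check_ancestors:
        return components[-1].startswith("#")
    return any(part.startswith("#") for part in components)


print(is_private("#MissedQ"))                                      # True
print(is_private("Step1::#draft::cardio"))                         # True: an ancestor is private
print(is_private("Step1::#draft::cardio", check_ancestors=False))  # False
print(is_private("MissedQ"))                                       # False
```

Because privacy is encoded in the tag string itself, `#foo` and `foo` remain distinct tags. Toggling privacy means renaming the tag, which in turn modifies the notes that carry it.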
True, but I'm not sure that's a big deal for tags that the user would permanently want private. It also might play better with mtime-based update checks - consider the case where a note with mtime 123 and tag foo is exported, and imported into a user's collection. If it had been marked as private separately, and the user toggled it to public and re-exported, the importing side would not see any mtime change. For ephemeral exclusions it would be awkward, but perhaps we could handle such cases a different way?
No, it's meant as an alternative beside Preserve and Update, or possibly as a replacement for Preserve.
Yes, the suggestion to have the GUI strip the prefix seemed reasonable to me at first, but I've realised the problems you describe.
I gather the problem is that the tags table lacks an mtime column?
Sorry, I couldn't locate this - could you quote or restate the issue here? Wouldn't it be as simple as the following?
maybe_update_note() compares note modification times. If privateness were a property stored in the tags table, toggling it would not update the mtime on notes that use the tag, and thus the change in available tags on import would not be known to the importer, as the mtime would still match.
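The problem can be illustrated with a simplified model of an mtime-gated update. The rule used by `maybe_update_note()` is paraphrased here as "apply the incoming note only if it is newer". The dict-based notes and both helpers are hypothetical stand-ins, not the importer's code.

```python
def mtime_gated_update(existing: dict, incoming: dict) -> bool:
    """Simplified stand-in: apply the incoming note only if its mtime is strictly newer."""
    if incoming["mtime"] > existing["mtime"]:
        existing.update(incoming)
        return True
    return False


def export_note(note: dict, private_tags: set) -> dict:
    """Copy a note for export, dropping private tags but keeping its mtime."""
    return {**note, "tags": [t for t in note["tags"] if t not in private_tags]}


source = {"mtime": 123, "tags": ["foo"]}
# First export while "foo" is marked private; the user imports this copy.
target = export_note(source, private_tags={"foo"})
# "foo" is toggled to public. Only the tags table changes, so the note's mtime stays 123.
second_export = export_note(source, private_tags=set())
print(mtime_gated_update(target, second_export), target["tags"])  # False []
```

The re-export now carries `foo`, but the importer sees the same mtime and skips it.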
Very old Anki versions stored the tags in a separate table, instead of embedding them in a note. I seem to recall it not performing well at the time, as the tags table could grow into the millions of entries. That may have been poor table design though; I do not recall the details off the top of my head.
Recently I've been working a bit on the AnkiDroid code to see if I can give them a bit of a push towards moving to the new backend. Until they've fully bought into it and can update to new versions promptly, we're somewhat limited in the breaking changes we can make, as I don't want clients to have to go through expensive upgrade/downgrade steps just to talk to each other.
My bad! I was referring to the first answer here, but upon reading it again, I realise it doesn't say what I thought it did.
Wouldn't we just update the tags separately from the notes, like we currently do with most other objects?
I understand, but do you think this may happen in the long term? I wonder if I should think of tags more like full Anki objects or rather plain strings with some additional related metadata.
It will require rewriting the majority of our tag handling code, so the gains we get from doing so would need to be fairly large to justify the effort. Operations that modify the text of tags or the hierarchy would become faster, but accessing the tags of a note would require looking them up in the DB and potentially caching the results. The syncing protocol would need adjusting, and importing would need to deal with the tags as strings, as the tag ids in the source and target decks may not match. I wonder whether it's worth the effort.
Thanks for explaining, I finally understand. This would indeed make updating a bit more cumbersome. Currently, an exported note is an exact clone of its source note, so it rightfully has the same mtime. With privateness, we create a divergent ad-hoc note, so the mtime may or may not make sense. Maybe we need to invariably bump the mtime on partially exported notes?
You're right, it would be a problem there too. Bumping the mtime each time could cause newer changes to be overwritten with older ones. One alternative I can think of is to check for source.mtime >= target.mtime instead of source.mtime > target.mtime, ie updating in the matching mtime case. Ideally we'd only add them to the list of updated notes if the tag or field content had actually changed, so we'd need to compare the two in the identical case before applying/counting the changes.
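The alternative proposed here is to update on equal mtimes, but only count a note as updated when its content actually differs. It could look roughly like the sketch below. The dict-based notes, `update_if_not_older()` and `ImportCounts` are hypothetical, not the importer's real types.

```python
from dataclasses import dataclass


@dataclass
class ImportCounts:
    updated: int = 0
    unchanged: int = 0
    skipped_older: int = 0


def update_if_not_older(existing: dict, incoming: dict, counts: ImportCounts) -> None:
    """Update when source.mtime >= target.mtime, but only apply/count real content changes."""
    if incoming["mtime"] < existing["mtime"]:
        counts.skipped_older += 1  # the local copy is newer; keep it
        return
    same_content = (incoming["fields"] == existing["fields"]
                    and sorted(incoming["tags"]) == sorted(existing["tags"]))
    if same_content:
        counts.unchanged += 1  # identical note: nothing to apply or report
        return
    existing.update(incoming)
    counts.updated += 1


counts = ImportCounts()
note = {"mtime": 123, "fields": ["Q", "A"], "tags": []}
reexport = {"mtime": 123, "fields": ["Q", "A"], "tags": ["foo"]}
update_if_not_older(note, reexport, counts)  # same mtime, new tag: applied
update_if_not_older(note, reexport, counts)  # same mtime, same content: not counted
print(counts)  # ImportCounts(updated=1, unchanged=1, skipped_older=0)
```

Comparing content in the equal-mtime case keeps repeated imports of an identical package from reporting every note as updated.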
Right. I think we should check for identity in the equal-mtime case.
I think this once more shows the inadequacy of using apkgs for updates. It would be great if deck authors could generate a JSON file containing only the diff between their originally shipped deck and their current collection.
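A diff of the kind suggested here could be computed per note, keyed by guid. Below is a minimal sketch under hypothetical assumptions. Each snapshot is a dict from note guid to that note's fields and tags, and the output layout is invented for illustration. Notetype changes, notetype switches and media are ignored.

```python
import json


def diff_decks(shipped: dict[str, dict], current: dict[str, dict]) -> dict:
    """Compare two snapshots keyed by note guid; return added, modified and removed notes."""
    added = {guid: note for guid, note in current.items() if guid not in shipped}
    modified = {guid: note for guid, note in current.items()
                if guid in shipped and shipped[guid] != note}
    removed = sorted(guid for guid in shipped if guid not in current)
    return {"added": added, "modified": modified, "removed": removed}


shipped = {"g1": {"fields": ["Q1", "A1"], "tags": ["Step1"]},
           "g2": {"fields": ["Q2", "A2"], "tags": []}}
current = {"g1": {"fields": ["Q1", "A1 (fixed)"], "tags": ["Step1"]},
           "g3": {"fields": ["Q3", "A3"], "tags": ["New"]}}
print(json.dumps(diff_decks(shipped, current), indent=2))
```

Unchanged notes (none here) are omitted, which is what would keep such update files small. As noted below, applying the diff still assumes the target collection started from a similar base.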
I suspect JSON imports will improve on some issues and introduce others, and diff-based updates rely on the target collection having the same or similar base point as the collection that was used to generate the diff. In any case, it sounds like you're more enthusiastic about the JSON side of things at the moment, and we don't currently have a fully-fleshed-out implementation to compare to the existing functionality of that add-on. So maybe it would be worth investing some more time on that side of things first, and then circling back to the apkg side of things later?
Sounds good!
I imagine deck authors would share an apkg initially and then offer JSON-based updates. These would be much smaller in size and less likely to conflict with any changes end users have made. Does this sound like a direction in which you'd like to go? (Of course, I'd try to find some low-hanging fruit first and leave media and so on for a later iteration.)
Agreed on dealing with the media later; one possibility would be to allow a .ankijson file inside an .apkg container instead of an .anki2 (we'd need to decide whether that would be better than a separate extension or not). Deck authors tend to both add new cards and modify existing ones, as well as sometimes modifying the notetype structure or changing notes to a different notetype, so ideally we will handle all of those cases.
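Since an .apkg is a zip archive, the container idea would mostly affect how the importer picks its payload. Here is a sketch using Python's standard `zipfile` module. The `collection.ankijson` entry name and the preference order are purely hypothetical, taken from this proposal rather than any existing format. It checks that name alongside the existing `collection.anki2`/`collection.anki21` entries.

```python
import io
import zipfile
from typing import Optional

# Hypothetical preference order; "collection.ankijson" does not exist today.
PAYLOADS = ["collection.ankijson", "collection.anki21", "collection.anki2"]


def detect_payload(apkg_bytes: bytes) -> Optional[str]:
    """Return the first recognised collection entry in the archive, or None."""
    with zipfile.ZipFile(io.BytesIO(apkg_bytes)) as archive:
        names = set(archive.namelist())
    for name in PAYLOADS:
        if name in names:
            return name
    return None


# Build an in-memory archive to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("collection.ankijson", "{}")
    archive.writestr("media", "{}")
print(detect_payload(buffer.getvalue()))  # collection.ankijson
```

Keeping the .apkg extension would let the existing media handling be reused. A separate extension would make the package type visible before the archive is opened.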
Sounding good. Apologies I haven't been as involved recently; I've been diving into the AnkiDroid code lately, and have also had other non-Anki related things going on.
Repurposing apkgs is an interesting idea.
I'd like to refactor the import/export code soon, moving some or most of it to Rust. This will probably come after the scheduling rework.
When working on this:
reminder: special fields when importing #507