How does Anki Miner choose what word to mine? #9

StyraxBenzoin · 2026-05-15T15:25:31Z

StyraxBenzoin
May 15, 2026

If there are multiple instances of a word in the target media, how does it choose which one to mine? Is it just the first instance of occurrence? Or is there some factor that scores sentences based on some metrics such as length, or number of unknown words (closer to i+1 = better)?

0xzerolight
May 15, 2026
Maintainer

Hi, no worries about asking all the questions. I'm happy that you're interested enough to ask them.

So to answer them:

Sentences are chosen by first occurrence. Whichever appears earliest in the subtitles is chosen, with no extra i+1 calculations.
A known word is a word in an Anki note's first field. So Anki Miner checks the first field of all notes, and if the word is in one of them, it is considered known. No consideration of new/young/mature card intervals. This is done to prevent duplicate cards when processing.
For known word searching, it uses the first field of notes, to make sure to cover all note types. So regardless of what the first field is called, it is treated as the expression/word field for searching. Lapis field names are just my personal preference for my own study, and are used as placeholder defaults that can be changed. Anki Miner works with any note type though: in the app settings, users can customise which fields get filled and what those fields are called for their note types.
By default it goes through all your Anki notes during processing. So you can delete and re-mine cards with no issues, it will always check which exist and which don't. There is an optional local database option - but it wouldn't update for deletions so it wouldn't be useful for your use case.
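The selection rules above can be sketched roughly as follows. This is only an illustration, not Anki Miner's actual code: the note dictionaries are assumed to be shaped like the result of AnkiConnect's `notesInfo` action (the thread does not say how notes are read), and the function names and pre-tokenised subtitle lines are hypothetical.

```python
from typing import Iterable


def first_field_values(notes: Iterable[dict]) -> set[str]:
    """Collect the value of each note's first field (order == 0).

    Assumes each note looks like {"fields": {name: {"value": str, "order": int}}}.
    The field name is ignored, so any note type is covered.
    """
    known = set()
    for note in notes:
        for field in note["fields"].values():
            if field["order"] == 0:
                known.add(field["value"].strip())
    return known


def mine_first_occurrences(subtitle_lines, known_words):
    """Map each unknown word to the earliest subtitle line that contains it.

    subtitle_lines is a list of (sentence, words) pairs in subtitle order;
    tokenisation is assumed to have happened already.
    """
    mined = {}
    for sentence, words in subtitle_lines:
        for word in words:
            # Skip words already in Anki and words mined from an earlier line.
            if word not in known_words and word not in mined:
                mined[word] = sentence
    return mined


notes = [
    {"fields": {"Expression": {"value": "猫", "order": 0},
                "Meaning": {"value": "cat", "order": 1}}},
    {"fields": {"Front": {"value": "食べる", "order": 0}}},
]
subtitles = [
    ("猫が魚を食べる", ["猫", "魚", "食べる"]),
    ("魚が泳ぐ", ["魚", "泳ぐ"]),
]
result = mine_first_occurrences(subtitles, first_field_values(notes))
print(result)
```

Because the known-word set is rebuilt from the notes on every run, a deleted card simply drops out of the set and its word can be mined again, which matches the default behaviour described above.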

I hope that answers your questions well, tell me if you have more.

Also, about your use case - how many cards do you have? To hit the AnkiWeb limit, you must have like 50k+ I assume? I'm 1/3 of the way to the 100MB limit and I've got 20k cards. Maybe it's because I've got shorter definitions though, Anki Miner doesn't produce nearly as much text in definitions as Yomitan and other tools.
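As a rough check on that estimate, the figures above can be extrapolated linearly. This assumes storage grows in proportion to card count and that 1 MB is 1000 KB; both are simplifications, and the variable names are only illustrative.

```python
LIMIT_MB = 100            # AnkiWeb limit mentioned above
used_mb = LIMIT_MB / 3    # "1/3 of the way to the limit"
cards = 20_000

mb_per_card = used_mb / cards
cards_at_limit = LIMIT_MB / mb_per_card

print(f"{mb_per_card * 1000:.2f} KB per card")
print(f"about {cards_at_limit:,.0f} cards at the limit")
```

At that card size the limit would be reached at roughly 60k cards, which is in line with the 50k+ guess.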

Thanks for being interested in the project though - I'd appreciate any more improvement ideas or bug reports. And please share it with other interested Japanese learners if you can.

0 replies

StyraxBenzoin · 2026-05-15T18:10:38Z

StyraxBenzoin
May 15, 2026
Author

I hope that answers your questions well, tell me if you have more.

Yes that's great, thank you for taking the time to answer.

Sentences are chosen by first occurrence. Whichever appears earliest in the subtitles is chosen, with no extra i+1 calculations.

Perhaps this could be a future feature? Having an option to target i+1 sentences would truly elevate its practicality! Although it would probably be a lot of work, so I understand if it's too big of an ask.

Also, about your use case - how many cards do you have? To hit the AnkiWeb limit, you must have like 50k+ I assume?

I have about 8.5k Japanese sentence cards (plus 2k Kanji and other unrelated decks) with about 60 MB used in AnkiWeb storage. My notes may be slightly heavier in text size with additional monolingual dictionaries.

I'm not concerned about running out anytime soon, but I just liked the idea of keeping my main profile uncluttered by auto-mined cards, and thought a sentence bank approach seemed like an interesting idea. As great as automation is, there will inevitably be some undesirable ones (e.g. bad screenshot, low context, too abstract, too many unknown words, etc.)

My plan was to pick the good ones that were i+1 and import them to my main profile, then use the Backfill from Yomitan add-on to generate missing data fields and add my preferred monolingual dictionaries (I try to avoid English definitions).

But maybe I'm thinking about it all wrong, and I should just use Anki Miner as is, on a per-episode basis, keep the good ones and backfill, then delete the rest.

What is your preferred workflow?

3 replies

0xzerolight May 15, 2026
Maintainer

@StyraxBenzoin

Perhaps this could be a future feature? Having an option to target i+1 sentences would truly elevate its practicality! Although it would probably be a lot of work, so I understand if it's too big of an ask.

Nothing is too big of an ask :). All useful features deserve to be added. I'm more than happy to listen to any enhancement requests, there are plenty of things I won't think of on my own.
I'll add the i+1 feature to the list of ideas for sure. It seems like a considerable number of people would be interested in it. I'll start work on it soon, as an optional setting with some extra sentence filtering to get only i+1 sentences. Something for the next release for sure.
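For readers unfamiliar with the term, i+1 filtering could look something like the sketch below: keep a sentence only when exactly one of its words is unknown. This is a minimal illustration under the assumption that subtitle lines are already tokenised; the thread does not describe how Anki Miner implements the feature, and `pick_i_plus_one` is a hypothetical name.

```python
def pick_i_plus_one(subtitle_lines, known_words):
    """For each unknown word, pick the earliest line where it is the only unknown.

    subtitle_lines is a list of (sentence, words) pairs in subtitle order.
    """
    picked = {}
    for sentence, words in subtitle_lines:
        unknown = {w for w in words if w not in known_words}
        if len(unknown) == 1:
            (word,) = unknown
            # setdefault keeps the first qualifying sentence for each word.
            picked.setdefault(word, sentence)
    return picked


known = {"猫", "食べる", "が", "を"}
subtitles = [
    ("鳥が空を飛ぶ", ["鳥", "が", "空", "を", "飛ぶ"]),   # three unknowns: skipped
    ("猫が魚を食べる", ["猫", "が", "魚", "を", "食べる"]),  # only 魚 is unknown
    ("猫が飛ぶ", ["猫", "が", "飛ぶ"]),                  # only 飛ぶ is unknown
]
picked = pick_i_plus_one(subtitles, known)
print(picked)
```

Note that with this rule a word such as 鳥 gets no card at all if it never appears in an i+1 sentence, so a real implementation would need to decide whether to fall back to first occurrence or skip the word.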

As great as automation is, there will inevitably be some undesirable ones (e.g. bad screenshot, low context, too abstract, too many unknown words, etc.)

Undesirable cards are (at least in my usage) very rare (sub 1%). I just delete them if I come across them in study. I don't really look over mined cards before studying them - I think Anki Miner is already close to human-quality in card creation, as long as it is set up carefully with preferred settings. Still a lot of new features to add though, this is not a finished project.

My plan was to pick the good ones that were i+1 and import them to my main profile, then use the Backfill from Yomitan add-on to generate missing data fields and add my preferred monolingual dictionaries (I try to avoid English definitions).

I'm adding custom dictionary support beyond just JMDict at the moment. Monolingual dictionaries will be supported very soon - my intention is to make it so that nobody needs to modify cards after they're mined, so that they're made perfectly with all user-preferred things like monolingual definitions. Could you expand on what you Backfill though? Besides monolingual definitions, I mean. I want to make it so that you don't have to do that manually.

What is your preferred workflow?

I don't really bother with i+1 and extensive quality control in my vocab mining.
I believe that as long as the card is fully complete with all info, it's good. I try not to focus on the context of the card either way, that's more of a fallback if I half-forget it.

Ultimately I want to learn Japanese to full fluency, so not understanding parts of a sentence is natural. Having multiple unknowns in a sentence motivates me to figure them out. I think it somewhat stimulates learning even more, especially figuring out how the word is used specifically in a certain sentence.

Anki Miner creates cards for all the unknowns too though, so technically they can all be learned in a single study session together.

But maybe I'm thinking about it all wrong, and I should just use Anki Miner as is, on a per-episode basis, keep the good ones and backfill, then delete the rest.

I don't think there are 'wrong' approaches to studying, it's all a matter of personal preference ultimately. Whatever keeps you at it is best. I do think focusing on the interesting parts is best - too much optimisation causes stress and burnout in my experience.

Please tell me more about your workflow though - my intention is to make Anki Miner a true standalone mining tool (as I mention above). I'd like to hear about anything you do manually - my aim is to provide options to fully automate all users' workflows. Like advanced filtering options to not have to sort for 'good' cards, and to fill everything to user spec to avoid having to 'backfill'.

Please continue to report and suggest things to make it better for your personal workflow. Anything you have to do manually is something that should be improved to help you and other users.

StyraxBenzoin May 16, 2026
Author

I'll add the i+1 feature to the list of ideas for sure. It seems like a considerable number of people would be interested in it. I'll start work on it soon, as an optional setting with some extra sentence filtering to get only i+1 sentences. Something for the next release for sure.

That sounds amazing. Without it, my other idea was to mine everything with Anki Miner and then possibly use Morphman/Ankimorphs to sort them to i+1 cards, but I've never set it up and it seems fairly complex. Definitely not something the average learner does. Building i+1 selection into Anki Miner would lower the barrier to entry a lot.

Could you expand on what you Backfill though?

Using backfill-anki-yomitan, my intention is to populate the remaining Senren notetype fields that Anki Miner does not fill.

My personal backfill preset for Senren notetype

{
  "targets": {
    "word": { "handlebar": "{expression}", "replace": false },
    "reading": { "handlebar": "{reading}", "replace": false },
    "wordAudio": { "handlebar": "{audio}", "replace": false },
    "definition": { "handlebar": "{primary-definition}", "replace": false },
    "glossary": { "handlebar": "{single-glossary-三省堂国語辞典-第七版},{single-glossary-旺文社国語辞典-第十一版-画像無し},{single-glossary-明鏡国語辞典-第二版},{single-glossary-新明解国語辞典-第八版},{single-glossary-デジタル大辞泉},{single-glossary-大辞林-第三版}", "replace": false },
    "pitchAccents": { "handlebar": "{pitch-accents}", "replace": false },
    "pitchPositions": { "handlebar": "{pitch-accent-positions}", "replace": true },
    "pitchCategories": { "handlebar": "{pitch-accent-categories}", "replace": true },
    "frequencies": { "handlebar": "{frequencies}", "replace": true },
    "freqSort": { "handlebar": "{frequency-harmonic-rank}", "replace": true }
  }
}
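To make the intent of the preset concrete, here is a rough sketch of how such a mapping might be applied to a note. It assumes that `"replace": false` means "fill the field only if it is empty" and `"replace": true` means "always overwrite"; that reading is my assumption, not something confirmed in this thread. The `apply_preset` function and the `rendered` lookup (standing in for whatever Yomitan renders for each handlebar) are hypothetical.

```python
import json

# A two-field excerpt of the preset, for brevity.
preset = json.loads("""
{
  "targets": {
    "word": { "handlebar": "{expression}", "replace": false },
    "freqSort": { "handlebar": "{frequency-harmonic-rank}", "replace": true }
  }
}
""")


def apply_preset(note_fields, rendered, targets):
    """Return a copy of note_fields with the preset applied.

    rendered maps a handlebar string to its rendered value (assumed input).
    A field is written when replace is true or the field is currently empty.
    """
    updated = dict(note_fields)
    for field, rule in targets.items():
        value = rendered.get(rule["handlebar"])
        if value is None:
            continue  # nothing was rendered for this handlebar
        if rule["replace"] or not updated.get(field):
            updated[field] = value
    return updated


note = {"word": "猫", "freqSort": "9999"}
rendered = {"{expression}": "ネコ", "{frequency-harmonic-rank}": "1200"}
updated = apply_preset(note, rendered, preset["targets"])
print(updated)
```

Under that assumption, the fields Anki Miner already fills (such as the word itself) are left untouched, while pitch and frequency data are always refreshed.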

I don't really bother with i+1 and extensive quality control in my vocab mining.
I believe that as long as the card is fully complete with all info, it's good. I try not to focus on the context of the card either way, that's more of a fallback if I half-forget it.

That's fair enough. We all have our own philosophies and workflows! As you said, there are no 'wrong' approaches to studying.

Once again thanks for taking the time to reply and for being so receptive and proactive with ideas! :)

0xzerolight May 16, 2026
Maintainer

@StyraxBenzoin

Thanks for the reply.

I sifted through your backfill preset. I think that besides the pitch category label, the v2.4.0 release should cover everything else neatly. The monolingual dictionary usage is the big feature for this release, still working on it. I'll look into adding pitch category labels too though, to fully cover the backfill preset.

i+1 sentences are already implemented as an optional feature, coming in v2.4.0 as well.

Working on those other enhancement ideas too. Thanks again for all the ideas, I appreciate the help :). v2.4.0 should be out soon once I finish the multi-dict support.
