How does Anki Miner choose what word to mine? #9
If there are multiple instances of a word in the target media, how does it choose which one to mine? Is it just the first occurrence? Or is there some factor that scores sentences on metrics such as length or number of unknown words (closer to i+1 = better)? Other questions:
I think my current use case would be to mine with Anki Miner into a different Anki user profile used purely as a sentence bank. Using a separate profile means that if I mine a large volume of cards in batch, I can keep them from counting towards my AnkiWeb sync storage limit. I can then use the Cross Profile Search and Import addon to search for words in the sentence bank and add them to my main profile. In order to do that in a separate profile and have Anki Miner know what is in my known vocabulary, I have exported my vocab from jiten.moe and use that in the blacklist setting. Thanks for making this great program, it's really cool! Sorry for all the questions, I just want to understand it better! :)
Replies: 2 comments 3 replies
Hi, no worries about asking all the questions. I'm happy that you're interested enough to ask them. So to answer them:
I hope that answers your questions well, tell me if you have more. Also, about your use case - how many cards do you have? To hit the AnkiWeb limit, you must have like 50k+, I assume? I'm 1/3 of the way to the 100MB limit and I've got 20k cards. Maybe it's because I've got shorter definitions though; Anki Miner doesn't produce nearly as much text in definitions as Yomitan and other tools. Thanks for being interested in the project though - I'd appreciate any more improvement ideas or bug reports. And if you can, please share it with other interested Japanese learners.
Yes, that's great, thank you for taking the time to answer.
Perhaps this could be a future feature? Having an option to target i+1 sentences would truly elevate its practicality! Although it would probably be a lot of work, so I understand if it's too big of an ask.
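To make the idea concrete, here is a rough sketch of what an i+1 check might look like. It is not something Anki Miner does today; the function names and the pre-tokenized word lists are made up for illustration, and it assumes the known vocabulary is available as a plain set of words.

```python
def unknown_count(words: list[str], known: set[str]) -> int:
    """Count the distinct words in a sentence that are not in the known set."""
    return len({w for w in words if w not in known})


def is_i_plus_one(words: list[str], known: set[str]) -> bool:
    """A sentence is treated as i+1 when exactly one of its words is unknown."""
    return unknown_count(words, known) == 1


# Hypothetical known vocabulary and already-tokenized sentences.
known_vocab = {"猫", "好き"}
one_unknown = ["猫", "魚", "好き"]   # only 魚 is unknown
two_unknown = ["犬", "魚"]           # both words are unknown
```

With something like this, a miner could skip `two_unknown` and keep `one_unknown` as a candidate sentence for 魚.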
I have about 8.5k Japanese sentence cards (plus 2k Kanji and other unrelated decks) with about 60MB used in AnkiWeb storage. My notes may be slightly heavier in text size with additional monolingual dictionaries. I'm not concerned about running out anytime soon, but I just liked the idea of keeping my main profile uncluttered by auto-mined cards, and thought a sentence bank approach seemed like an interesting idea. As great as automation is, there will inevitably be some undesirable ones (e.g. bad screenshot, low context, too abstract, too many unknown words, etc.). My plan was to pick the good ones that were i+1 and import them to my main profile, then use the Backfill from Yomitan addon to generate missing data fields and add my preferred monolingual dictionaries (I try to avoid English definitions). But maybe I'm thinking about it all wrong, and I should just use Anki Miner as is, on a per-episode basis, keep the good ones and backfill, then delete the rest. What is your preferred workflow?
Hi, no worries about asking all the questions. I'm happy that you're interested enough to ask them.
So to answer them:
Sentences are chosen by first occurrence. Whichever sentence containing the word appears earliest in the subtitles is chosen, with no extra i+1 calculations.
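As a rough illustration of first-occurrence selection (a simplified sketch, not Anki Miner's actual code), assume the subtitles have already been split into an ordered list of sentences, each paired with its tokenized words. The function name and input shape here are hypothetical.

```python
from typing import Iterable


def first_occurrence_sentences(
    subtitle_lines: Iterable[tuple[str, list[str]]],
) -> dict[str, str]:
    """Map each word to the earliest subtitle sentence that contains it.

    `subtitle_lines` is a hypothetical, already-tokenized input: an ordered
    sequence of (sentence, words_in_sentence) pairs.
    """
    chosen: dict[str, str] = {}
    for sentence, words in subtitle_lines:
        for word in words:
            # setdefault keeps the first sentence seen and ignores later ones.
            chosen.setdefault(word, sentence)
    return chosen


lines = [
    ("猫が好きです。", ["猫", "好き"]),
    ("犬も好きです。", ["犬", "好き"]),
]
picked = first_occurrence_sentences(lines)
# picked["好き"] is the first sentence, because it appears earlier.
```

No scoring is involved: later sentences containing the same word are simply never considered.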
A known word is a word in an Anki note's first field. So Anki Miner checks the first field of all notes, and if the word is in one of them, it is considered known. No consideration of new/young/mature card intervals. This is done to prevent duplicate cards when processing.
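Roughly, the known-word check could be pictured like this. This is only a sketch under assumptions: it takes notes in the shape returned by AnkiConnect's `notesInfo` action (each field has a `value` and an `order`), treats a word as known on an exact match with the first field, and uses made-up function names. How Anki Miner actually reads notes and compares words may differ.

```python
def known_words_from_notes(notes: list[dict]) -> set[str]:
    """Collect the first-field value of every note, whatever the field is called."""
    known: set[str] = set()
    for note in notes:
        fields = note.get("fields", {})
        if not fields:
            continue
        # The first field is the one with the lowest "order", regardless of name.
        first = min(fields.values(), key=lambda f: f["order"])
        value = first["value"].strip()
        if value:
            known.add(value)
    return known


def is_known(word: str, known: set[str]) -> bool:
    """Assumption: exact match against a first-field value counts as known."""
    return word in known


# Two notes of different note types; the first field has a different name in each.
sample_notes = [
    {"fields": {"Expression": {"value": "猫", "order": 0},
                "Meaning": {"value": "cat", "order": 1}}},
    {"fields": {"Back": {"value": "dog", "order": 1},
                "Front": {"value": "犬", "order": 0}}},
]
known = known_words_from_notes(sample_notes)
```

Here `is_known("猫", known)` and `is_known("犬", known)` are both true, while `is_known("cat", known)` is false, since only first fields are checked and card intervals play no part.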
For known word searching, it uses the first field of notes, to make sure to cover all note types. So regardless of what the first field is called, it is assumed as the express…