My contributions to the Japanese learning community.

MarvNC/JP-Resources

JP Resources

Check out these Yomichan dictionaries

My contributions to the Japanese learning community. For questions and support, please make a thread in the questions forum in TheMoeWay. For suggestions please mention @Marv.

Contribution

Contributions are welcome; feel free to open a pull request. Note that the repo includes a Prettier config file for auto formatting with the Prettier extension.

In addition, the Markdown All in One extension can be used to automatically generate and update a table of contents as well as assist in markdown editing.

Other Resources

Dictionaries

These are absolutely essential.

Special Thanks

Many thanks to:

  • Renji-xD for originally rewriting the handlebar to find a minimum value.
  • KamWithK for developing cool Anki addons to use with this guide.
  • Aquafina-water-bottle for much handlebar wizardry in rewriting the frequency handlebar to be radically better and developing a python script that greatly improves the backfilling process.
  • GrumpyThomas, pj, and aka_baka for some suggestions.
  • Michel for converting the Chinese txt frequency files.

Frequency Dictionaries

I sometimes get asked about what frequency dictionaries to use and the differences between them, so here are a few essential dictionaries I would recommend.

  • JPDB
    • Frequency data scraped from https://jpdb.io in May of 2022. Due to the way the data was scraped, some terms are missing frequencies and the jpdb dictionary itself is limited to terms in JMDict. For example, 経緯 only has an entry for the いきさつ reading so it should not be used as a dictionary for sorting (the more common/correct reading is けいい). The corpus of JPDB is quite good for immersion learners as it covers anime, dramas, light novels, visual novels, and web novels so the frequencies will be relatively accurate to what you're actually reading. This dictionary is notable for displaying the frequencies of kana readings separately, so you can often get a sense of how often a word is written with kanji or not.
  • Innocent Ranked
    • The Innocent Corpus from the Yomichan page but reordered to be sorted by rank. It is based on data from 5000+ novels. A weakness is that it does not differentiate based on reading, so all readings of a term will show the same value.
  • BCCWJ
    • From the publication:
    • The balanced corpus of contemporary written Japanese (BCCWJ) is Japan’s first 100 million words balanced corpus. It consists of three subcorpora (publication subcorpus, library subcorpus, and special-purpose subcorpus) and covers a wide range of text registers including books in general, magazines, newspapers, governmental white papers, best-selling books, an internet bulletin-board, a blog, school textbooks, minutes of the national diet, publicity newsletters of local governments, laws, and poetry verses.

    • It has extremely wide coverage with most terms you'll encounter having an entry in this list even if other frequency lists don't. In addition, it differentiates between readings quite well. Make sure to install the LUW version as it has more terms.
  • CC100
    • Made by the mind behind arujisho, this uses the CC100 dataset which was made by crawling the web. Coverage is very wide, and there is reason behind the way readings are differentiated which is why I use this as my Yomichan sort dictionary.

Sorting Mined Anki Cards by Frequency

When reading and adding cards from the content you're reading, you'll come across a variety of words with varying degrees of usefulness. Especially as a beginner, you'll want to learn the useful words as soon as possible and learn the less useful words later. With this we can sort a backlog of mined cards by frequency using various installed Yomichan frequency lists.

This handlebar for Yomichan will add a {freq} field that will use your installed frequency dictionaries to send a numerical frequency value to Anki depending on the sort option applied, with the default being the (recommended) harmonic mean.

How-To

  • First, create a new field for frequency in your Anki card template. You can name it Frequency or whatever you like.

  • Then in Yomichan options, insert the following handlebars code at the end of the menu in Configure Anki card templates....

    freq Handlebar

    Note: This is the same handlebar that is used in jp-mining-note, but with a different name and with the options included. If you want to use it with jp-mining-note, you can copy the code below and rename freq in the first line to jpmn-frequency-sort, then remove the options section.

    {{#*inline "freq"}}
        {{~! Frequency sort handlebars: v24.01.10.1 ~}}
        {{~! The latest version can be found at https://github.com/MarvNC/JP-Resources#freq-handlebar ~}}
        {{~#scope~}}
            {{~! Options ~}}
            {{~#set "opt-ignored-freq-dict-regex"~}} ^(JLPT.*)|(HSK.*)$ {{~/set~}}
            {{~#set "opt-ignored-freq-value-regex"~}}{{~/set~}}
            {{~#set "opt-keep-freqs-past-first-regex"~}} ^()$ {{~/set~}}
            {{~set "opt-no-freq-default-value" 9999999 ~}}
            {{~set "opt-freq-sorting-method" "harmonic" ~}} {{~! "min", "first", "avg", "harmonic" ~}}
    
            {{~set "opt-grammar-override" true ~}}
            {{~set "opt-grammar-override-value" 0 ~}}
            {{~#set "opt-grammar-override-dict-regex"~}} ^(日本語文法辞典\(全集\)|毎日のんびり日本語教師|JLPT文法解説まとめ|どんなときどう使う 日本語表現文型辞典|絵でわかる日本語)$ {{~/set~}}
            {{~! End of options ~}}
    
            {{~! Do not change the code below unless you know what you are doing. ~}}
            {{~set "result-freq" -1 ~}} {{~! -1 is chosen because no frequency dictionaries should have an entry as -1 ~}}
            {{~set "prev-freq-dict" "" ~}}
            {{~set "t" 1 ~}}
            {{~set "found-grammar-dict" false ~}}
    
            {{~! search for grammar dictionary ~}}
            {{~#each definition.definitions~}}
                {{~#set "rx-match-grammar-dicts" ~}}
                    {{~#regexMatch (get "opt-grammar-override-dict-regex") "u"~}}{{this.dictionary}}{{~/regexMatch~}}
                {{/set~}}
                {{~! rx-match-grammar-dicts is not empty if a grammar dictionary was found ~}}
                {{~#if (op "!==" (get "rx-match-grammar-dicts") "") ~}}
                    {{~set "found-grammar-dict" true ~}}
                {{/if~}}
            {{~/each~}}
    
            {{~! Additional case when "Result grouping mode" is set to "No Grouping"~}}
            {{~#set "rx-match-grammar-dicts" ~}}
                {{~#regexMatch (get "opt-grammar-override-dict-regex") "u"~}}{{this.definition.dictionary}}{{~/regexMatch~}}
            {{/set~}}
            {{~! rx-match-grammar-dicts is not empty if a grammar dictionary was found ~}}
            {{~#if (op "!==" (get "rx-match-grammar-dicts") "") ~}}
                {{~set "found-grammar-dict" true ~}}
            {{/if~}}
    
            {{~#each definition.frequencies~}}
    
                {{~! rx-match-ignored-freq is not empty if ignored <=> rx-match-ignored-freq is empty if not ignored ~}}
                {{~#set "rx-match-ignored-freq" ~}}
                    {{~#regexMatch (get "opt-ignored-freq-dict-regex") "u"~}}{{this.dictionary}}{{~/regexMatch~}}
                {{/set~}}
    
                {{~#set "rx-match-ignored-value" ~}}
                    {{~#regexMatch (get "opt-ignored-freq-value-regex") "u"~}}{{this.frequency}}{{~/regexMatch~}}
                {{/set~}}
                {{~#if (op "&&" (op "===" (get "rx-match-ignored-freq") "") (op "===" (get "rx-match-ignored-value") ""))~}}
    
                    {{~!
                        only uses the 1st frequency of any dictionary.
                        For example, if JPDB lists 440 and 26189㋕, only the first 440 will be used.
                    ~}}
                    {{~set "read-freq" false ~}}
                    {{~#if (op "!==" (get "prev-freq-dict") this.dictionary ) ~}}
                        {{~set "read-freq" true ~}}
                        {{~set "prev-freq-dict" this.dictionary ~}}
                    {{/if~}}
    
                    {{~#if (op "!" (get "read-freq") ) ~}}
                        {{~#set "rx-match-keep-freqs" ~}}
                            {{~#regexMatch (get "opt-keep-freqs-past-first-regex") "u"~}}{{this.dictionary}}{{~/regexMatch~}}
                        {{/set~}}
    
                        {{~! rx-match-keep-freqs is not empty if keep freqs ~}}
                        {{~#if (op "!==" (get "rx-match-keep-freqs") "") ~}}
                            {{~set "read-freq" true ~}}
                        {{/if~}}
                    {{/if~}}
    
                    {{~#if (get "read-freq") ~}}
                        {{~#set "numericFrequencyMatch"}}{{~#regexMatch "\d+" ""}}{{~this.frequency~}}{{/regexMatch~}}{{/set~}}
                        {{~set "f" (op "+" (get "numericFrequencyMatch")) ~}}
                        {{~#if (op "===" (get "opt-freq-sorting-method") "min") ~}}
                            {{~#if
                                (op "||"
                                    (op "===" (get "result-freq") -1)
                                    (op ">" (get "result-freq") (get "f"))
                                )
                            ~}}
                                {{~set "result-freq" (op "+" (get "f")) ~}}
                            {{~/if~}}
    
                        {{~else if (op "===" (get "opt-freq-sorting-method") "first") ~}}
                            {{~#if (op "===" (get "result-freq") -1) ~}}
                                {{~set "result-freq" (get "f") ~}}
                            {{~/if~}}
    
                        {{~else if (op "===" (get "opt-freq-sorting-method") "avg") ~}}
    
                            {{~#if (op "===" (get "result-freq") -1) ~}}
                                {{~set "result-freq" (get "f") ~}}
                            {{~else~}}
                                {{~!
                                    iterative mean formula (to prevent floating point overflow):
                                        $S_{(t+1)} = S_t + \frac{1}{t+1} (x - S_t)$
                                    - example java implementation: https://stackoverflow.com/a/1934266
                                    - proof: https://www.heikohoffmann.de/htmlthesis/node134.html
                                ~}}
                                {{~set "result-freq"
                                    (op "+"
                                        (get "result-freq")
                                        (op "/"
                                            (op "-"
                                                (get "f")
                                                (get "result-freq")
                                            )
                                            (get "t")
                                        )
                                    )
                                }}
                            {{~/if~}}
                            {{~set "t" (op "+" (get "t") 1) ~}}
    
                        {{~else if (op "===" (get "opt-freq-sorting-method") "harmonic") ~}}
                            {{~#if (op ">" (get "f") 0) ~}} {{~! ensures only positive numbers are used ~}}
                                {{~#if (op "===" (get "result-freq") -1) ~}}
                                    {{~set "result-freq" (op "/" 1 (get "f")) ~}}
                                {{~else ~}}
                                    {{~set "result-freq"
                                        (op "+"
                                            (get "result-freq")
                                            (op "/" 1 (get "f"))
                                        )
                                    }}
                                    {{~set "t" (op "+" (get "t") 1) ~}}
                                {{~/if~}}
                            {{~/if~}}
    
                        {{~else if (op "===" (get "opt-freq-sorting-method") "debug") ~}}
    
                            {{ this.dictionary }}: {{ this.frequency }} -> {{ get "f" }} <br>
    
                        {{~else~}}
                            (INVALID opt-freq-sorting-method value)
                        {{~/if~}}
    
                    {{~/if~}}
    
                {{~/if~}}
    
            {{~/each~}}
    
            {{~! (x) >> 0 apparently floors x: https://stackoverflow.com/a/4228528 ~}}
            {{~#if (op "===" (get "result-freq") -1) ~}}
                {{~set "result-freq" (get "opt-no-freq-default-value") ~}}
            {{~ else if (op "===" (get "opt-freq-sorting-method") "avg") ~}}
                {{~set "result-freq"
                    (op ">>" (get "result-freq") 0 )
                ~}}
            {{~ else if (op "===" (get "opt-freq-sorting-method") "harmonic") ~}}
                {{~set "result-freq"
                    (op ">>"
                        (op "*"
                            (op "/" 1 (get "result-freq"))
                            (get "t")
                        )
                        0
                    )
                ~}}
            {{~/if~}}
    
            {{~! override final result if grammar dictionary ~}}
            {{~#if (
                op "&&"
                    (op "===" (get "found-grammar-dict") true)
                    (op "===" (get "opt-grammar-override") true)
                )
            ~}}
                {{~set "result-freq" (get "opt-grammar-override-value") ~}}
            {{/if}}
    
            {{~get "result-freq"~}}
        {{~/scope~}}
    {{/inline}}
  • In Configure Anki card format..., we may need to refresh the card model for the new field to show up.

    • To do this, change the model to something else and change it back.
    • ⚠️This will clear your fields, so take a screenshot to remember what you had.
      • You can try duplicating your card model in Anki and switching to/from that model, so hopefully your card fields will remain.
  • When your frequency field shows up, type in {freq} in its value box to use the handlebar.
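To make the handlebar's logic easier to follow, here is a rough Python sketch of what it computes. It is not the handlebar itself and makes simplifying assumptions: the function name `freq_sort_value` and its parameters are invented for illustration, entries without any digits are skipped, and the ignored-value regex, the keep-past-first option, and the grammar override are left out.

```python
import re

def freq_sort_value(frequencies, method="harmonic",
                    ignored_dict_regex=r"^(JLPT.*)|(HSK.*)$",
                    default=9999999):
    """Return one sort value from (dictionary, frequency string) pairs.

    `frequencies` is assumed to be in the order Yomichan lists them.
    """
    values = []
    prev_dict = None
    for dictionary, raw in frequencies:
        if re.search(ignored_dict_regex, dictionary):
            continue  # this dictionary is ignored
        if dictionary == prev_dict:
            continue  # only the first entry per dictionary is used
        prev_dict = dictionary
        match = re.search(r"\d+", raw)
        if match:  # simplification: entries without digits are skipped
            values.append(int(match.group()))

    if method == "harmonic":
        values = [v for v in values if v > 0]  # only positive numbers
    if not values:
        return default  # no usable frequency was found
    if method == "min":
        return min(values)
    if method == "first":
        return values[0]
    if method == "avg":
        return int(sum(values) / len(values))
    if method == "harmonic":
        return int(len(values) / sum(1 / v for v in values))
    raise ValueError("invalid sorting method: " + method)


# Hypothetical input: two JPDB entries, one CC100 entry, one ignored dictionary.
sample = [("JPDB", "440"), ("JPDB", "26189㋕"), ("CC100", "1000"), ("JLPT_Level", "5")]
print(freq_sort_value(sample))
```

Only 440 and 1000 contribute here: the second JPDB entry is dropped because it is not the first entry for its dictionary, and JLPT_Level matches the ignore regex.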

freq Settings

The default settings within the freq handlebars code should work for most people. However, it can be customized if desired. To access the settings, head back to Yomichan's templates (Yomichan options → Anki → Configure Anki card templates...), and view the lines right below {{#*inline "freq"}}.

Ignoring Frequency Dictionaries
  • By default, dictionaries matching the regex ^(JLPT.*)|(HSK.*)$ (such as JLPT_Level) are ignored. If you want to ignore other dictionaries, edit the opt-ignored-freq-dict-regex variable and join the dictionary names with |. For example, to ignore both JLPT_Level and My amazing frequency dictionary, do the following:

    {{~#set 'opt-ignored-freq-dict-regex'~}} ^(JLPT_Level|My amazing frequency dictionary)$ {{~/set~}}
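If you have many dictionary names to join, a small helper can build the regex for you. The sketch below is only an illustration: `escape_name` and `build_dict_regex` are made-up names, and the escaping rule (a backslash before regex metacharacters) is inferred from how the default grammar regex writes 日本語文法辞典\(全集\).

```python
import re

# Characters with a special meaning inside a regex.
_SPECIAL = re.compile(r"([\\^$.|?*+()\[\]{}])")

def escape_name(name: str) -> str:
    """Put a backslash in front of regex metacharacters in a dictionary name."""
    return _SPECIAL.sub(r"\\\1", name)

def build_dict_regex(names) -> str:
    """Join exact dictionary names into an anchored ^(a|b|c)$ regex."""
    return "^(" + "|".join(escape_name(n) for n in names) + ")$"

regex = build_dict_regex(["JLPT_Level", "My amazing frequency dictionary"])
print(regex)
```

Because the pattern is anchored with ^ and $, it only matches a dictionary whose whole name equals one of the listed names.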
Ignoring Frequency Values
  • By default, any frequency value containing ❌ is ignored as it represents a value that does not appear in JPDB. If you want to ignore other values, edit the opt-ignored-freq-value-regex variable and join the ignored symbols with |. For example, to also ignore entries containing a ⚠ symbol, do the following:

    {{~#set 'opt-ignored-freq-value-regex'~}} ❌|⚠ {{~/set~}}
  • If you do not wish to ignore any values, remove the ❌ symbol from the regex.

    {{~#set 'opt-ignored-freq-value-regex'~}}{{~/set~}}
Default Value For No Frequencies
  • When no frequencies are listed for the expression, the default value given is 9999999. This puts the card at the very end of the queue.

    Some users may prefer setting the default value to 0, as it would be a conscious decision to add a term without a frequency value and you may want to prioritize learning the term immediately. To do this, change the opt-no-freq-default-value variable. For example:

    {{~set 'opt-no-freq-default-value' 0~}}
Default Value For Grammar Dictionaries
  • By default, if you create a card for any grammar point, the frequency will be automatically set to 0. This is because it is very likely that you would want to prioritize reviewing grammar points as much as possible.

    The {freq} handlebars code determines whether a card is a grammar point or not by your installed grammar dictionaries. If the definition within the exported term contains any grammar dictionary, then it is considered as a grammar point. Otherwise, the term is treated like any other term.

    Note: This may incorrectly override the frequency for some terms that might not be considered as a grammar point. For example, 以前 can be used as a standalone word, but is an entry under the 毎日のんびり日本語教師 dictionary. In other words, with this feature enabled, 以前 will have its {freq} value incorrectly overwritten (to 0 by default).

    This incorrect override usually only happens for very common words anyway (JPDB ranks 以前 as 721), so it should not be a big problem.

    The following table summarizes the options related to this.

    | Option | Description |
    | --- | --- |
    | opt-grammar-override | If set to true (default), overrides the resulting frequency with opt-grammar-override-value if at least one dictionary is determined to be a grammar dictionary. Set this variable to false in order to disable the behavior. |
    | opt-grammar-override-value | The exact frequency value used for grammar dictionaries. |
    | opt-grammar-override-dict-regex | The regex used in order to determine if a dictionary is a grammar dictionary. Edit this like any other dict-regex variable, i.e. by concatenating strings with \|. |
Sorting Method
  • The sorting method determines the resulting value of {freq}. By default, the harmonic mean is used. This can be modified by changing opt-freq-sorting-method, e.g.

    {{~set 'opt-freq-sorting-method' 'first'~}}

    The following table shows the available sorting methods. Note that these are case sensitive!

    | Sorting Method | Description |
    | --- | --- |
    | min | Gets the smallest frequency available. |
    | first | Gets the first frequency listed in Yomichan. The order of frequency dictionaries is determined by the Priority column under Yomichan settings → Configure installed and enabled dictionaries.... Dictionaries are sorted from highest to lowest priority. |
    | avg | Gets the average (i.e. the arithmetic mean) of the frequencies. |
    | harmonic | Gets the harmonic mean of the frequencies, which can be thought of as an in-between of min and avg. See below for more details. This is the default value. |
    | debug | Internal mode that shows the dictionaries and frequencies for each dictionary, after being filtered by opt-ignored-freq-dict-regex and opt-keep-freqs-past-first-regex. Useful when testing the aforementioned regexes. |

    The harmonic mean has the following properties that may make it more attractive to use over avg:

    • "The harmonic mean of a list of numbers tends strongly toward the least elements of the list." [1] In other words, a frequency dictionary with an abnormally large value will not greatly affect the resulting value. Conversely, a frequency dictionary with an abnormally small value will affect the resulting value more than avg, but still less so than simply using min.
    • The harmonic mean is always greater than or equal to the minimum number and always less than or equal to the arithmetic mean. [2]

    This makes harmonic ideal for people who want a statistic that takes into account all numbers, but does not arbitrarily deviate due to large outliers (which avg can easily do).
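As a quick numeric check of these properties, the following Python sketch compares the three statistics using the standard library. The rank values are made up for illustration and do not come from any real dictionary.

```python
from statistics import harmonic_mean, mean

def compare(ranks):
    """Return (min, harmonic mean, arithmetic mean) for a list of ranks."""
    return min(ranks), harmonic_mean(ranks), mean(ranks)

# Hypothetical ranks: one dictionary reports an abnormally large value.
large_outlier = compare([500, 700, 90000])
# Hypothetical ranks: one dictionary reports an abnormally small value.
small_outlier = compare([5, 700, 900])

for low, harmonic, average in (large_outlier, small_outlier):
    print(f"min={low}  harmonic={harmonic:.1f}  avg={average:.1f}")
```

In the first case the harmonic mean stays close to the two agreeing dictionaries while the arithmetic mean is dragged up by the outlier. In the second case the harmonic mean is pulled toward the small value, but remains above the minimum.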

Reading Multiple Frequencies from the Same Dictionary
  • Some frequency dictionaries have multiple numbers displayed. Among these dictionaries, there are two ways that these can be stored:

    1. The frequency is stored as one string. For example, with 青空文庫熟語, the frequency is "160 (5406)". Only the first number (160) can be grabbed from this, and any numbers past this cannot be received without hacking the code.

    2. The frequency is stored as multiple strings. For example with JPDB, the frequency for 読む is stored as "440" and "26189 ㋕" (with the latter being read as 26189).

      By default, only the first number (440) will be considered in the sorting method. If you want the sorting method to also consider other numbers (such as 26189), add the desired dictionary to the opt-keep-freqs-past-first-regex variable, similarly to how dictionaries are added to opt-ignored-freq-dict-regex (concatenated with |).

      For example, adding JPDB to the variable will result in the following:

      {{~#set 'opt-keep-freqs-past-first-regex'~}} ^(JPDB)$ {{~/set~}}

      And adding JPDB VN3 万 as well will result in the following:

      {{~#set 'opt-keep-freqs-past-first-regex'~}} ^(JPDB|JPDB VN3万)$ {{~/set~}}
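The effect of this option can be sketched in Python as follows. This is an illustration rather than the handlebar's actual code: `values_used` is a made-up name, and the entries are the examples from this section.

```python
import re

def values_used(entries, keep_past_first_regex=r"^()$"):
    """Return the numbers that would be fed into the sorting method.

    `entries` is a list of (dictionary, frequency string) pairs in display order.
    """
    used = []
    prev_dict = None
    for dictionary, raw in entries:
        is_first = dictionary != prev_dict
        prev_dict = dictionary
        kept = re.search(keep_past_first_regex, dictionary)
        # Later entries of a dictionary count only if its name matches the regex.
        if not is_first and not (kept and kept.group()):
            continue
        number = re.search(r"\d+", raw)  # grabs only the first number in the string
        if number:
            used.append(int(number.group()))
    return used

entries = [("JPDB", "440"), ("JPDB", "26189 ㋕"), ("青空文庫熟語", "160 (5406)")]
print(values_used(entries))
print(values_used(entries, r"^(JPDB)$"))
```

With the default regex only 440 and 160 are used. Adding JPDB to the regex also brings in 26189, while the 5406 inside "160 (5406)" stays out of reach because it is part of a single string.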

Usage

Use the AnkiAutoReorder addon to have your backlog sort automatically on refresh.

  • Enter your search query (search_to_sort) and your sort_field into the addon's config (Tools > Addons > AutoReorder > Config).
    • You can get the appropriate search query by going to the Browse window, then ctrl-clicking your deck name and the "New" card state. The string at the top is the search query to use in the addon settings; it should contain the deck name and is:new.
  • Then reorder your deck by frequency from Tools > Reposition Cards. Remember to do this every day after adding new cards.

I also recommend installing the Advanced Browser addon to display the frequency field in Anki's browse page.

With Advanced Browser installed, right click the column headers at the top of the browse page to select new fields to display.

Alternatively, after installing Advanced Browser, you could sort by the frequency field and press ctrl + a then ctrl + shift + s to select all cards and reorder.

Backfilling Old Cards

If you already have a large backlog of old cards without frequency values, you might need to fill in these values first or they won't be sorted. There are two methods listed below to do exactly that. The command line method runs much faster than the Anki method, but requires some command line knowledge to pull off.

Of course, you could just opt to finish reviewing these cards first instead of backfilling the old cards.

Warning: Make sure to back up your collection before trying either method below.

Differences between the backfill .txt files

  • JPDB.txt - Japanese list from jpdb.io
  • cc100.txt - The CC100 dataset as described in the Frequency Dictionaries section.
  • vnsfreqSTARS.txt and vnsfreq.txt - Japanese frequency lists from visual novels
  • BLCUcoll.txt and BLCUlit.txt - Chinese frequency lists from colloquial and literary text from the BLCU BCC Corpus.
  • SUBTLEX-CH.txt - Chinese frequency list based on movie/TV subtitles from SUBTLEX-CH.

Note that the Japanese ones are selected by default when backfilling via the command line; you will have to use the --freq-lists option to specify other lists.

Backfilling: Command Line (Recommended)
  • Install the latest version of Python if you do not have it already installed. Any Python version 3.8 or above should work.

  • Install AnkiConnect if you do not have it already installed.

  • Open Anki. If you just installed AnkiConnect, make sure to restart Anki so AnkiConnect is properly running.

    • Note that this will not work via WSL due to networking constraints. If you are on Windows you will have to use Command Prompt or PowerShell.
  • Run the following commands:

    git clone "https://github.com/MarvNC/JP-Resources.git"
    cd JP-Resources
    cd frequency
    
    # Linux users might have to use `python3` instead of `python`.
    # Replace "Expression" with the exact field name that contains the word/expression.
    python backfill.py "Expression"

    Here are some more examples on how to use backfill.py:

    # View all possible arguments.
    python backfill.py --help
    
    # Searches for the expression in the field "Word" instead of "Expression"
    # Note that this is case sensitive!
    python backfill.py "Word"
    
    # Sets all expressions without any found frequencies to the default value of '0'.
    python backfill.py "Expression" --default 0
    
    # Uses the field "FrequencySort" instead of the default ("Frequency").
    # This also changes the default query to search an empty `FrequencySort` field.
    python backfill.py "Expression" --freq-field "FrequencySort"
    
    # Uses a custom query instead of the default ("Expression:* Frequency:").
    # Note: For powershell users, you must escape the quotes with an additional backtick:
    #     --query "Frequency: \`"note:My mining note\`""
    python backfill.py "Expression" --query "Frequency: \"note:My mining note\""
    
    # This custom query can be used to override all of your existing frequencies,
    # instead of just backfilling. RUN THIS WITH CAUTION!
    python backfill.py "Expression" --query "\"note:My mining note\""
    
    # Changes the order of which frequency list is used first.
    python backfill.py "Expression" --freq-lists "vnsfreq.txt" "JPDB.txt"
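For readers curious about how a script can talk to Anki at all, here is a minimal sketch of the AnkiConnect request shape. It does not reproduce backfill.py; the helper names are invented for illustration, and the only things taken from this guide are the default field names and the default query. The address is AnkiConnect's default and would differ if you changed its settings.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address

def build_request(action, **params):
    """Build the JSON body AnkiConnect expects (API version 6)."""
    return {"action": action, "version": 6, "params": params}

def invoke(action, **params):
    """Send one request to AnkiConnect. Requires Anki to be running."""
    payload = json.dumps(build_request(action, **params)).encode("utf-8")
    request = urllib.request.Request(ANKI_CONNECT_URL, data=payload)
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]

def default_query(expr_field="Expression", freq_field="Frequency"):
    """Notes with a non-empty expression field and an empty frequency field."""
    return f"{expr_field}:* {freq_field}:"

def update_request(note_id, freq_field, value):
    """Request body that writes one frequency value into one note."""
    return build_request(
        "updateNoteFields",
        note={"id": note_id, "fields": {freq_field: str(value)}},
    )

print(default_query())
print(json.dumps(update_request(1234567890, "Frequency", 440)))
```

With Anki open, `invoke("findNotes", query=default_query())` would return the IDs of the notes that still need a frequency value.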
Backfilling: Within Anki
  • This is a hacky method to backfill your old cards. Again, make sure to back up your collection before attempting this, as it could cause significant lag in Anki. In addition, users of Anki 2.1.50+ should increase their backup interval before attempting the import, as it will take a long time. A backup occurring while you're waiting on Anki to delete cards will just cause more lag.

  • Create a frequency list in .txt format that contains a list of expressions followed by frequency values. You can use the ones I have created here. I recommend downloading the JPDB list as it's the most exhaustive. However, the VN Stars list also fills in some of the gaps that JPDB doesn't cover, so you could import it first, then import JPDB afterward for maximum frequency coverage.

  • In Anki, create a new temporary deck and move your backlogged cards to the new deck, then tag them for later.

    • Search for the backlogged new cards using deck:{deckname} is:new in your card browser, then hit ctrl + a to select them all then ctrl + d to bring up the "Change Deck" menu from which you can create a new deck (named temp or whatever you like) and move them.
    • Select this new deck, then tag them using ctrl + a then ctrl + shift + a to add a new tag, where you can type in something like backlog.
  • With this temporary deck selected, go to File -> Import, then select the txt frequency list. Map the first field to your term/expression field, then the second field to your frequency field. Make sure to enable "Update existing notes when first field matches." Then import it to your temporary deck.

  • This will update your existing notes' frequency values, but it'll also import a LOT of new unneeded cards.

    • Search for your backlogged cards using tag:backlog and then again hit ctrl + a then ctrl + d to move them back to your vocabulary deck. Now we can simply delete the temporary deck along with all the new cards that were added; just make sure you aren't deleting any actual cards first.
  • Finally, you can right click the backlog tag in the sidebar and delete it.

Fitting in Cards Without Frequencies

If you frequently make cards that don't contain frequencies, such as sentence or grammar cards, you won't be able to pull frequencies from dictionaries. If you tag all of these cards specifically, you can use this plugin to generate random frequencies for these cards.
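The idea of a random frequency can be sketched in a few lines of Python. This is not the plugin's code; the function name and the default range are assumptions chosen only to illustrate spreading such cards across the queue.

```python
import random

def random_frequency(low=1, high=30000, rng=None):
    """Pick a random frequency value between low and high (inclusive).

    The range is hypothetical; choose one that matches your own backlog.
    """
    rng = rng or random
    return rng.randint(low, high)

# A seeded generator makes the example repeatable.
seeded = random.Random(0)
values = [random_frequency(rng=seeded) for _ in range(5)]
print(values)
```

A narrower range near the low end would schedule these cards earlier, while a wider range mixes them more evenly with the rest of the backlog.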

Backfilling Stylized Frequencies in JP Mining Note

In the JP Mining Note Anki note type, there is also a FrequenciesStylized field for displaying the values from various frequency dictionaries on the front of the card. Due to the specific formatting requirements of this field, it cannot be backfilled with the above methods. A separate script is provided in the frequencies/frequenciesstylized folder for this purpose.

Warning: As always, back up your entire collection before performing any steps from this section.

Configuring the dictionary list

Before running the script, configure the set of frequency dictionaries to use by editing the dict_names.py file. The default values in this file are shown below:

dict_names = [
    ('JPDB-stylized.txt', 'JPDB'),
    ('../vnsfreq.txt', 'VN Freq'),
    ('JLPT-stylized.txt', 'JLPT')
]

The order of the dictionaries in this list determines the order that the frequencies will appear in the FrequenciesStylized field. Within each entry, the first parameter is the relative filepath to the frequency list, and the second parameter is the display name you want to use for that dictionary.

For example, the above configuration produces the following result for 返事:

If you change the dict_names.py file to:

dict_names = [
    ('../vnsfreq.txt', 'VN Freq'),
    ('JPDB-stylized.txt', 'jpdb'),
    ('JLPT-stylized.txt', 'jlpt')
]

Then the output will list VN Freq first and show the other two dictionary names in lowercase.

Note the ../ in the filepath for the VN Freq dictionary. This script can use any of the frequency lists that are used by backfill.py. However, if there is a stylized version of a frequency list, then it is highly recommended that you use that one, rather than the simpler version. This is because the stylized version includes additional formatting, such as JPDB's ㋕ marker for kana frequencies.

Stylized versions of frequency lists also include the reading for each word, so if your cards have the WordReadingHiragana field filled in, then the script can ensure that only the frequencies for the correct reading are used. If your notes do not have the WordReadingHiragana field filled, then it's highly recommended that you fill it using the instructions on the JP Mining Note docs.
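The two rules described above (list order decides display order, and readings narrow down the matches) can be sketched like this. The row layout of (word, reading, value) and all the values are hypothetical and are not the real file format; `stylized_entries` is a made-up name, not a function from the script.

```python
dict_names = [
    ('JPDB-stylized.txt', 'JPDB'),
    ('../vnsfreq.txt', 'VN Freq'),
    ('JLPT-stylized.txt', 'JLPT'),
]

def stylized_entries(dict_names, rows_by_path, word, reading=None):
    """Collect (display name, value) pairs in the order given by dict_names.

    Rows whose reading is None stand for lists without reading information.
    """
    result = []
    for path, display_name in dict_names:
        for row_word, row_reading, value in rows_by_path.get(path, []):
            if row_word != word:
                continue
            # Skip rows for a different reading when both readings are known.
            if reading is not None and row_reading is not None and row_reading != reading:
                continue
            result.append((display_name, value))
    return result

# Hypothetical data for 経緯, which has the readings けいい and いきさつ.
rows_by_path = {
    'JPDB-stylized.txt': [('経緯', 'けいい', '2000'), ('経緯', 'いきさつ', '9000')],
    '../vnsfreq.txt': [('経緯', None, '3500')],
}
print(stylized_entries(dict_names, rows_by_path, '経緯', reading='けいい'))
```

Without a reading, both JPDB rows would be returned, which is why filling in WordReadingHiragana matters.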

Included Stylized Frequency Dictionaries
  • JPDB-stylized.txt - Same as JPDB.txt above, but includes the ㋕ marker to indicate kana form frequency, and word readings to differentiate between different words that use the same kanji.
  • cc100-stylized.txt - The CC100 dataset as described in the Frequency Dictionaries section.
  • JLPT-stylized.txt - Provides the JLPT level for words tested on the JLPT. Extracted from stephenmk's yomichan-jlpt-vocab Yomichan dictionary.

Running the script

Once you have configured the list of dictionaries to use, you can run the script. The simplest way to run this script is to navigate into the frequencies/frequenciesstylized folder, and run:

# Linux users might have to use `python3` instead of `python`.
python backfill-stylized.py

This will search your collection for all notes of type JP Mining Note with an empty FrequenciesStylized field. It will then fill those fields with the appropriate frequency information as determined by your configuration in dict_names.py. It will also tag every note it modifies with the tag backfill-stylized. There are two options that can be used with this script:

query

The --query option works in the same way as it does in the standard backfill.py script. This allows you to use a custom query to find the cards to modify.

For example, if you want to overwrite the FrequenciesStylized field for all JP Mining Notes, and not just those where the field is already empty, you can use the following:

# This custom query can be used to override all of your existing frequencies,
# instead of just backfilling. RUN THIS WITH CAUTION!
python backfill-stylized.py --query "\"note:JP Mining Note\""

One thing to be careful of is that your custom query must only return notes of type JP Mining Note with Word and FrequenciesStylized fields. If it returns any other type of note, it will throw an error. You can ensure only JP Mining Notes are returned by always including \"note:JP Mining Note\" in your queries.

tag

By default, every note that is modified by this script will be tagged with the tag backfill-stylized. This makes it easy to revert your changes if you make a mistake. To reset the modified cards and start again, search for them in the Anki browser using tag:backfill-stylized, select all the cards, and then clear the FrequenciesStylized field using the procedure in the next section.

Once you are happy with your cards, you can remove the tags by searching Anki for tag:backfill-stylized, and using Notes -> Remove Tags... to remove backfill-stylized.

If you want to use a different tag, you can use the --tag option:

# Tags all modified notes with "modified-stylized-freq"
python backfill-stylized.py --tag "modified-stylized-freq"

If you don't want the script to tag any notes, pass an empty tag with --tag "":

# Prevents the script from tagging any notes
python backfill-stylized.py --tag ""

Clearing the FrequenciesStylized field

If you have never edited the FrequenciesStylized field on a note, then it is probably completely empty, and backfill-stylized.py will be able to find the note.

However, in some cases, the FrequenciesStylized field might look empty, when in fact it has some hidden HTML tags in it. In this case, the script will not be able to find these notes, since it is only looking for notes where this field is empty.

Screenshot: the FrequenciesStylized field looks empty, but it actually contains hidden HTML elements.

You can clear this HTML directly by clicking on the HTML toggle button marked in the above image. Then just delete the HTML from the editor.

If you need to completely clear the FrequenciesStylized field for several cards at once, first select all the relevant cards in the Anki browser. Then, go to Notes -> Find and Replace... and enter the options shown below.

WARNING: Unless you know exactly what you're doing, only use the options shown below. Using different options has the potential to delete an arbitrary amount of information from an arbitrary number of cards in your collection.

After clicking OK, the FrequenciesStylized field for all selected notes will be completely emptied.
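To illustrate why a field can look empty without being empty, here is a small sketch of the distinction. `looksEmpty` is a hypothetical helper written for this example, not something the script provides, and the sample field contents are made up.

```javascript
// Hypothetical helper: true when a field shows nothing in the editor.
// Note that it also treats image-only content as empty, which is fine for
// a text field like FrequenciesStylized.
function looksEmpty(fieldHtml) {
  const visibleText = fieldHtml
    .replace(/<[^>]*>/g, '') // drop tags such as <br> or <div></div>
    .replace(/&nbsp;/g, ' ') // treat non-breaking spaces as whitespace
    .trim();
  return visibleText === '';
}

// A truly empty field and a field with leftover markup both look empty,
// but only the first one is matched by a search for an empty field.
console.log(looksEmpty(''));                 // true, and really empty
console.log(looksEmpty('<div><br></div>'));  // true, but not really empty
console.log(looksEmpty('<span>1234</span>')); // false
```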

Anki Card Blur

When adding cards from VNs, we might find some risque content that we still want to look at while reviewing because it's cute. However, you might review in places where you don't always want other people to see your cards. Using this card template, we can blur media in Anki and have the option persist throughout a review session.

Screenshots: SFW and NSFW cards, each shown with blur disabled and with blur enabled.

Media: ハミダシクリエイティブ © まどそふと

How-To

  • Decide on a tag for NSFW cards. I use -NSFW so the tag is sorted first for easy access. If you choose something else you'll need to replace all instances of -NSFW in this guide with your tag name (with ctrl + h in a text editor or an online tool).

  • Tag your NSFW cards with this tag in Anki (see ShareX Hotkey).

  • Download the anki-persistence script (minified.js or script.js) from here. Then rename it __persistence.js and place it in your Anki profile's media folder (collection.media).

Card Template/Code

  • In your card template where you want the image to go, paste in this HTML, renaming {{Picture}} to match the name of the field that contains your media.
<div id="main_image" class="{{Tags}}">
  <a onclick="toggleNsfw()">{{Picture}}</a>
</div>
  • Then, at the end of the template paste in this code:
<script src="__persistence.js"></script>

<script>
  // nsfw https://github.com/MarvNC/JP-Resources
  (function () {
    const nsfwDefaultPC = true;
    const nsfwDefaultMobile = false;
    const imageDiv = document.getElementById('main_image');
    const image = imageDiv.querySelector('a img');
    if (!image) {
      // No image on this card: remove the empty container and stop here
      imageDiv.parentNode.removeChild(imageDiv);
      return;
    }
    // Poll until anki-persistence has loaded, then apply the setting once
    const timer = setInterval(() => {
      if (typeof Persistence === 'undefined') {
        return;
      }
      clearInterval(timer);

      const onMobile = document.documentElement.classList.contains('mobile');
      let nsfwAllowed = onMobile ? nsfwDefaultMobile : nsfwDefaultPC;
      if (Persistence.isAvailable() && Persistence.getItem('nsfwAllowed') == null) {
        // First card of the session: store the platform default
        Persistence.setItem('nsfwAllowed', nsfwAllowed);
      } else if (Persistence.isAvailable()) {
        // Later cards: reuse the value chosen earlier in the session
        nsfwAllowed = Persistence.getItem('nsfwAllowed');
      }
      setImageStyle(nsfwAllowed);
    }, 50);
  })();

  function toggleNsfw() {
    if (Persistence.isAvailable()) {
      let nsfwAllowed = !!Persistence.getItem('nsfwAllowed');
      nsfwAllowed = !nsfwAllowed;
      Persistence.setItem('nsfwAllowed', nsfwAllowed);
      setImageStyle(nsfwAllowed);
    } else {
      setImageStyle(undefined, true);
    }
  }

  function setImageStyle(nsfwAllowed = undefined, toggle = false) {
    const imageDiv = document.getElementById('main_image');

    if (nsfwAllowed != undefined) {
      // Explicit state: add or remove the class to match it
      imageDiv.classList.toggle('nsfwAllowed', nsfwAllowed);
    } else if (toggle) {
      // No stored state (Persistence unavailable): just flip the class
      imageDiv.classList.toggle('nsfwAllowed');
    }
  }
</script>

CSS

Then in your card styling paste in the following css, making sure to replace -NSFW with your tag name.

#main_image.nsfwAllowed {
  border-top: 2.5px dashed fuchsia !important;
}
#main_image {
  border-top: 2.5px solid springgreen;
}
#main_image img {
  cursor: pointer;
}
#main_image.-NSFW {
  border-left: 2.5px dashed red;
  border-right: 2.5px dashed red;
  border-bottom: 2.5px dashed red;
}
#main_image.nsfwAllowed.-NSFW {
  border-top: 2.5px dashed red !important;
}
#main_image.-NSFW img {
  filter: blur(30px);
}
#main_image.nsfwAllowed img {
  filter: blur(0px) !important;
}

Usage

During a review session, you can click/tap the image to toggle card blurring. When blurring is enabled, there will be a solid green line at the top of the image. When blurring is disabled, the line will be dashed fuchsia instead. Cards tagged NSFW additionally get dashed red borders. The setting persists throughout a review session but resets after you exit the session.

Default to Enabled/Disabled

The code we pasted into the template contains two variables that set the default separately for desktop and mobile, the thought being that this script is primarily intended for reviewing on a phone. Setting a variable to true means cards are not blurred by default on that platform; setting it to false means they start out blurred.

const nsfwDefaultPC = true;
const nsfwDefaultMobile = false;
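The way these defaults interact with the persisted setting can be sketched as a small pure function. This is an illustration of the logic in the template above, not a replacement for it; `resolveNsfwAllowed` and its `defaults` parameter are hypothetical names introduced for this example.

```javascript
// Hypothetical helper mirroring the template's decision logic.
// stored: the value kept for the session, or null if nothing is stored yet.
function resolveNsfwAllowed(onMobile, stored, defaults = { pc: true, mobile: false }) {
  if (stored != null) {
    // A choice was already made this session: it wins over the defaults.
    return Boolean(stored);
  }
  // First card of the session: fall back to the platform default.
  return onMobile ? defaults.mobile : defaults.pc;
}

console.log(resolveNsfwAllowed(true, null));  // false: blurred by default on mobile
console.log(resolveNsfwAllowed(false, null)); // true: unblurred by default on desktop
console.log(resolveNsfwAllowed(true, true));  // true: the user toggled blur off earlier
```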

Non Persistent/NoJS Version

If you want all cards to be blurred by default and for it to stay that way, you can simply do something like this instead. The .mobile part can be removed so it works on desktop as well.

HTML

<div class="main_image {{Tags}}">{{Picture}}</div>

CSS

.mobile .-NSFW img {
  filter: blur(30px);
}

.mobile .-NSFW img:hover {
  filter: blur(0px);
}

ShareX Hotkey for NSFW cards

I use the hotkeys in this guide (highly recommended) for adding images/audio to new cards while reading. For the screenshot hotkey, I have a hotkey in addition to the normal one that adds a -NSFW tag to the new card for convenience so they don't have to be tagged manually after creation. In the argument part of step 8, just use this code instead:

-NoProfile -Command "$medianame = \"%input\" | Split-Path -leaf; $data = Invoke-RestMethod -Uri http://127.0.0.1:8765 -Method Post -ContentType 'application/json; charset=UTF-8' -Body '{\"action\": \"findNotes\", \"version\": 6, \"params\": {\"query\":\"added:1\"}}'; $sortedlist = $data.result | Sort-Object -Descending {[Long]$_}; $noteid = $sortedlist[0]; Invoke-RestMethod -Uri http://127.0.0.1:8765 -Method Post -ContentType 'application/json; charset=UTF-8' -Body \"{`\"action`\": `\"updateNoteFields`\", `\"version`\": 6, `\"params`\": {`\"note`\":{`\"id`\":$noteid, `\"fields`\":{`\"Picture`\":`\"<img src=$medianame>`\"}}}}\"; " Invoke-RestMethod -Uri http://127.0.0.1:8765 -Method Post -ContentType 'application/json; charset=UTF-8' -Body \"{ `\"action`\": `\"addTags`\",`\"version`\": 6,`\"params`\": {`\"notes`\": [$noteid],`\"tags`\": `\"-NSFW`\"}}\";

Anki Automatically Highlight in Sentence

It's good practice to have your word highlighted within the target sentence so it's easier to see. You can do this for new cards by using the cloze options in Yomichan, but that doesn't affect existing cards that don't have the word highlighted. Here's some code to highlight the target expression within already existing cards. It's quite flexible, being able to work to some degree for most cards even if the sentence doesn't exactly contain the expression, or if it contains the expression but in hiragana or katakana.

To use it, simply append the following script to the end of your card template.

  • You need to modify the lines specifying your field names by changing {{Expression}} and {{Reading}} to match your own field names.
  • You also need to modify the selector to select the part of your card containing your sentence. An easy way to do this would be to wrap your sentence in a div with an id of sentence so the selector is #sentence as it is by default. For example, <div id="sentence">{{Sentence}}</div>.
<script>
  // https://github.com/MarvNC/JP-Resources
  (function () {
    const expression = '{{Expression}}';
    const reading = '{{Reading}}';

    const sentenceElement = document.querySelector('#sentence');
    highlightWord(sentenceElement, expression, reading);
  })();

  function highlightWord(sentenceElement, expression, reading) {
    // nothing to do if the selector did not match anything on this side
    if (!sentenceElement) {
      return;
    }
    const sentence = sentenceElement.innerHTML;

    // skip sentences that already contain a highlighted word
    if (!sentence.match(/<(strong|b)>/)) {
      const possibleReplaces = [
        // shorten kanji expression
        shorten(expression, sentence, 1),
        // shorten with kana reading
        shorten(reading, sentence, 2),
        // find katakana
        shorten(hiraganaToKatakana(expression), sentence, 2),
        // find katakana with kana reading
        shorten(hiraganaToKatakana(reading), sentence, 2),
      ];

      // find and use longest one that is a substring of the sentence
      const replace = possibleReplaces
        .filter((str) => str && sentence.includes(str))
        .reduce((a, b) => (a.length > b.length ? a : b), '');

      // split/join replaces every occurrence without treating it as a regex
      if (replace) {
        sentenceElement.innerHTML = sentence
          .split(replace)
          .join(`<strong>${replace}</strong>`);
      }
    }
  }
  // takes an expression and shortens it until it's in the sentence
  function shorten(expression, sentence, minLength) {
    while (expression.length > minLength && !sentence.includes(expression)) {
      expression = expression.slice(0, -1);
    }
    return expression;
  }

  function hiraganaToKatakana(hiragana) {
    return hiragana.replace(/[\u3041-\u3096]/g, function (c) {
      return String.fromCharCode(c.charCodeAt(0) + 0x60);
    });
  }
</script>
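To see how the matching behaves on inflected or katakana forms, here is a self-contained copy of the two helpers that can be run outside Anki (for example in Node or a browser console). It uses plain substring checks, and the sample words and sentences are made up for illustration.

```javascript
// Shortens an expression from the end until it appears in the sentence
// (or until it reaches minLength characters).
function shorten(expression, sentence, minLength) {
  while (expression.length > minLength && !sentence.includes(expression)) {
    expression = expression.slice(0, -1);
  }
  return expression;
}

// Shifts each hiragana character into the katakana block.
function hiraganaToKatakana(hiragana) {
  return hiragana.replace(/[\u3041-\u3096]/g, (c) =>
    String.fromCharCode(c.charCodeAt(0) + 0x60)
  );
}

// The dictionary form is not in the sentence, but its stem is.
console.log(shorten('食べる', '昨日ケーキを食べた。', 1)); // 食べ

// A hiragana reading can match a word written in katakana.
console.log(hiraganaToKatakana('りんご')); // リンゴ
console.log(shorten(hiraganaToKatakana('はもる'), '二人の声がハモった。', 2)); // ハモ
```

Because the expression is only ever shortened from the end, this approach finds stems of conjugated words but cannot match a form whose beginning differs from the dictionary form.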

Anki Automatic Hint Sentence for Kana Cards

Kana-only terms can be annoying to review in Anki because they are quite arbitrary and have no kanji to derive meaning from. This makes them potentially harder to recall than kanji terms, without much benefit in return: in the wild you would come across onomatopoeia in context, which makes what they describe somewhat self-explanatory.

Because of this, you might find it helpful to conditionally display the sentence on the front of your cards to be able to learn kana terms along with the context.

Media: 蒼の彼方のフォーリズム EXTRA1 © sprite

In order to conditionally display the sentence on the front, put the following html on the front of your card template where you want your hint sentence.

  • Replace all instances of {{Sentence}} with the name of your sentence field, and the same with {{Expression}} in the code.

  • The anchor linking to jpdb is completely optional, and is used to make easy searches on jpdb. If you don't want this you can just replace the second line with {{Sentence}}.

<div id="hintSentence" style="display: none">
  <a href="https://jpdb.io/search?q={{Sentence}}">{{Sentence}}</a>
</div>

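Put this code at the bottom of your front card template, making sure to rename {{Expression}} and {{Reading}} to match your field names. It calls the highlightWord function from the previous section, so that script also needs to be present in the front template.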

<script>
  // https://github.com/MarvNC/JP-Resources
  (function () {
    // prevent loading this js on back side of card
    if (document.getElementById('answer')) {
      return;
    }

    const expression = '{{Expression}}';
    const furigana = '{{Reading}}';

    const kanjiRegex = /[\u4e00-\u9faf]/g;

    if (!expression.match(kanjiRegex)) {
      const hintSentence = document.getElementById('hintSentence');
      hintSentence.style.display = 'block';
      const sentenceElement = document.querySelector('#hintSentence a');

      highlightWord(sentenceElement, expression, furigana);
    }
  })();
</script>
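The condition that decides whether the hint is shown can be tried on its own. This is a minimal sketch using the same kanji range as the template; `shouldShowHint` is a hypothetical name, and the regex is written without the g flag because `test` on a global regex keeps state between calls.

```javascript
// Same range as the template: the main CJK Unified Ideographs block.
const kanjiPattern = /[\u4e00-\u9faf]/;

// Hypothetical helper: show the hint sentence only for terms without kanji.
function shouldShowHint(expression) {
  return !kanjiPattern.test(expression);
}

console.log(shouldShowHint('ぴかぴか'));     // true: kana only
console.log(shouldShowHint('コピ・ルアク')); // true: katakana only
console.log(shouldShowHint('食べる'));       // false: contains kanji
```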

Yomichan Text Replacement Patterns

Some text replacement patterns in Yomichan Settings -> Translation -> Custom Text Replacement Patterns that I've found useful for better parsing.

If it might save you some time, you can optionally download the text replacement patterns from here, export your config, replace them in the appropriate spot, and reimport. Thanks to Julian for providing the export.

  • Some expressions may occasionally be written using numerals, while most dictionaries only have entries for the kanji version. You could try replacing 0 with 十 and so on for larger numbers, but it doesn't seem to be worth it in my experience.
    • 鯛も1人はうまからず
    • 3種

1|1 -> 一
2|2 -> 二
3|3 -> 三
4|4 -> 四
5|5 -> 五
6|6 -> 六
7|7 -> 七
8|8 -> 八
9|9 -> 九

  • Occasionally there are things with dots, dashes, or other miscellaneous things in it that you want to scan.
    • コピ・ルアク
    • 「ど、どうですか……?モノに、なってきてます……?」

・|、|\-|\.|‐|\s -> (nothing)

  • Sometimes katakana verbs will use ッ in the past tense form and won't be picked up by Yomichan.
    • ハモッた
    • テンパッた

ッ -> っ

  • I should also mention the most important replacement pattern, replacing the 々 with the previous kanji as most monolingual dictionaries don't have entries for the 々 version. Credits to TheMoeWay's guide for the idea.
    • 囂々
    • 侃々諤々

(.)々 -> $1$1
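To check what a pattern does before adding it to Yomichan, you can try it in a JavaScript console. This is a sketch that assumes the patterns behave like JavaScript regular expressions applied to every match in the text; `applyPatterns` is a hypothetical helper, and only two of the patterns from the list above are included.

```javascript
// Two of the patterns from the list above, written as JavaScript regexes.
const patterns = [
  { pattern: /・|、|\-|\.|‐|\s/g, replacement: '' },
  { pattern: /(.)々/g, replacement: '$1$1' },
];

// Hypothetical helper: applies each pattern in order to the scanned text.
function applyPatterns(text) {
  return patterns.reduce(
    (result, { pattern, replacement }) => result.replace(pattern, replacement),
    text
  );
}

console.log(applyPatterns('コピ・ルアク')); // コピルアク
console.log(applyPatterns('囂々'));         // 囂囂
console.log(applyPatterns('侃々諤々'));     // 侃侃諤諤
```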

Fixing the Font Language in Anki

If you're displaying Japanese/Chinese/Korean text in Anki, you might often get incorrect glyphs, as there are differences in the display of unified Han characters for different languages. In general, I recommend setting a lang tag in your Anki card template so that the card is rendered correctly.

At the beginning of your card template (both front and back sides) add the following line, replacing ja with zh, zh-hk, zh-tw or ko as appropriate. You can look up the ISO language code for the language you want to display online.

<span lang="ja">

Then at the very bottom of your card template just add a closing span tag.

</span>

You may already have set a custom font using CSS, which is a good way to customize your cards, but it does not guarantee full compatibility in cases where a glyph is not present in the font you're using.

Footnotes

  1. https://en.wikipedia.org/wiki/Harmonic_mean#Relationship_with_other_means

  2. https://en.wikipedia.org/wiki/Pythagorean_means#Inequalities_among_means