In [1]:
NAME = "Dustin Seltz"

Purpose:

This aims to answer Question 4: Taking into account the student’s progress and goals, what is the best set of 
kanji / vocab to teach to them next?

This file uses the frequency information for Twitter, News, Wikipedia, and Aozora (from https://scriptin.github.io/kanji-frequency/ ) and compares it to the difficulty levels and frequency from WaniKani, JLPT, Grade, and Genki in order to tell a user (who wants to learn how to read one of Twitter, News, Wikipedia, or Aozora) the optimal sequence to follow in learning kanji. 

This file also contains some queries Oleksandra requested: given that you have completed a certain level from a sequence (e.g. "N1" from JLPT, "grade 6" from school, or level "50" from WaniKani), what are the next kanji you should learn?
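
As a rough illustration of that kind of query, the sketch below picks the most frequent kanji that sit above a completed JLPT level. It is not the notebook's own implementation: the `next_kanji` helper, the `JLPT_ORDER` list, and the toy rows (including the rank values) are assumptions made for the example. Only the column names `kanji`, `jlpt`, and `Rank of Appearances on Twitter` come from cleaned_link.csv.

```python
import pandas as pd

# Toy stand-in for cleaned_link.csv; the rank values are invented for illustration.
toy = pd.DataFrame({
    "kanji": ["日", "本", "愛", "亜", "哀"],
    "jlpt": ["N5", "N5", "N3", "N1", "N1"],
    "Rank of Appearances on Twitter": [3.0, 10.0, 269.0, 836.0, 1884.0],
})

# JLPT runs from N5 (easiest) to N1 (hardest).
JLPT_ORDER = ["N5", "N4", "N3", "N2", "N1"]

def next_kanji(frame, completed_level, rank_col, n=3):
    """Hypothetical helper: up to n kanji above completed_level, most frequent first."""
    done = JLPT_ORDER.index(completed_level)
    remaining_levels = JLPT_ORDER[done + 1:]
    # Kanji without a JLPT level are left out by isin().
    remaining = frame[frame["jlpt"].isin(remaining_levels)]
    return remaining.sort_values(rank_col)["kanji"].head(n).tolist()

suggested = next_kanji(toy, "N5", "Rank of Appearances on Twitter")
print(suggested)
```

Sorting by a different rank column (for example the Aozora one) would tailor the suggestion to a different reading goal.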

Input:

cleaned_link.csv 
This file contains information for the 2136 Jōyō kanji. This program uses the difficulty levels and frequency information. 

Output:

Tells the user which learning sequence should be used to quickly learn each of the four data sources. Currently the output is inline only; no CSV is written. 

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
from numpy import isnan

In [3]:
##filename = "combined_genki_lessons.csv"
#filename = "new_combined_genki.csv"
filename = "../Question1/cleaned_link.csv"
df = pd.read_csv(filename)
print(len(df))
df.head()
#Index and Unnamed: 0 are the Joyo ranking. 
#When using the Joyo ranking I use the Unnamed: 0 as the column, so the index should be irrelevant to this program. 
#I would clean that up, but I actually just decided to remove Joyo analysis since it wasn't really relevant

2136


Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,kanji,strokes,frequency,grade,jlpt,parts,radicals,on_readings,...,Number of Appearances on Wikipedia,Percentage of Appearances on Wikipedia,Rank of Appearances on Wikipedia,Number of Appearances on News,Percentage of Appearances on News,Rank of Appearances on News,Number of Appearances on Aozora,Percentage of Appearances on Aozora,Rank of Appearances on Aozora,Genki_Lesson
0,0,0,亜,7.0,1509.0,junior high,N1,"['一', '｜', '口']",{'二': 'two'},['ア'],...,172858.0,0.00022,836.0,689.0,6.7e-05,1306.0,3506.0,6.8e-05,1710.0,
1,1,1,哀,9.0,1715.0,junior high,N1,"['亠', '口', '衣']","{'口': 'mouth, opening'}",['アイ'],...,19390.0,2.5e-05,1884.0,167.0,1.6e-05,1842.0,10141.0,0.000197,971.0,
2,2,2,挨,10.0,2258.0,junior high,,"['厶', '扎', '矢', '乞']",{'手 (扌龵)': 'hand'},['アイ'],...,12111.0,1.5e-05,2138.0,13.0,1e-06,2634.0,6784.0,0.000132,1249.0,
3,3,3,愛,13.0,640.0,grade 4,N3,"['冖', '夂', '心', '爪']","{'心 (忄, ⺗)': 'heart'}",['アイ'],...,754387.0,0.000962,269.0,5340.0,0.000518,503.0,54392.0,0.001057,213.0,
4,4,4,曖,17.0,,junior high,,"['冖', '夂', '心', '日', '爪']","{'日': 'sun, day'}",['アイ'],...,116055.0,0.000148,1025.0,30.0,3e-06,2371.0,1001.0,1.9e-05,2661.0,


In [4]:
#The source is actually not clear on where exactly the "News" is sourced
datasources = ["Twitter", "Aozora", "Wikipedia"]#, "News"]
kanjis = df["kanji"]

In [5]:
def createBins(numberOfBins, col):
    #Note: do not sort the column first; that would ruin the row ordering. 
    #qcut assigns each value of the column to one of numberOfBins equal-sized quantile bins. 
    #    Each column's kanji now has a numeric level equivalent based on frequency. 
    #    This will allow us to compare bins, like N5 through N1 vs bins 1 through 5. 
    bins = pd.qcut(col, numberOfBins, labels=False)
    #Let's not start at 0; levels start at 1 for everything. 
    bins += 1
    print("Created bin column:", bins)
    print("Using ranges ", pd.qcut(col, numberOfBins))
    return bins
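
To make the binning step concrete, here is a minimal sketch of what `pd.qcut(..., labels=False)` returns on a tiny series. The rank values are made up for the example, and the trailing NaN stands in for a kanji that has no rank in a given source.

```python
import pandas as pd

# Made-up frequency ranks for eight kanji (1 = most frequent), plus one missing rank.
ranks = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, float("nan")])

# Four equal-sized quantile bins; labels=False gives bin codes 0..3, so add 1.
levels = pd.qcut(ranks, 4, labels=False) + 1
print(levels.tolist())
```

The missing rank stays NaN after binning, which is why the bin column printed by the notebook has dtype float64.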

In [6]:
#level_col_name is the column of the dataframe that holds the levels. 
#    translator converts those level values to numeric levels from 1..max_level, inclusive. 
#        Ex: for JLPT, translator should map "N5" to 1 and "N1" to 5 (lowest to highest difficulty). 
#    max_level is how many levels there are, like 60 for WaniKani's 1..60 system, or 5 for JLPT. 
def getAvgLevelDiff(level_col_name, translator, max_level):
    results = []
    for sourceName in datasources:
        bins = createBins(max_level, df["Rank of Appearances on "+sourceName])
        rankLevel_col = pd.DataFrame(bins)
        rankLevel_col.columns = ["Rank Converted to Level"]
        new_df = df.join(rankLevel_col)
        numberOfComparisons = 0
        numberOfLevelDifference = 0
        #Walk the rows directly instead of re-filtering the dataframe for every kanji. 
        for _, row in new_df.iterrows():
            kanji = row["kanji"]
            colValue = row[level_col_name]
            #We need a numeric representation of the level, which our caller will define
            colLevel = translator(colValue)
            #We then separate into bins for comparison. 
            #Then take the average of ("level" (bin) number compared with the actual level)
            rankLevel = row["Rank Converted to Level"]
            diff = abs(colLevel - rankLevel)
            #Skip kanji that are missing either the level or the rank. 
            if not isnan(diff):
                numberOfComparisons += 1
                numberOfLevelDifference += diff
                print(kanji+": level "+str(colValue)+" translated to "+str(colLevel)+" and corresponds to rank " 
                      +str(rankLevel)+" with a diff of abs(level - rank)="+str(diff))
        #Avoid dividing by zero if no kanji had both a level and a rank. 
        averageLevelDifference = (numberOfLevelDifference / numberOfComparisons
                                  if numberOfComparisons else float("nan"))
        results.append(averageLevelDifference)
        print(sourceName+" vs "+level_col_name+" average level difference="+str(averageLevelDifference))
    return results
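
The translator argument described above could look like the following for JLPT. This is only a sketch under assumptions: `jlpt_to_level` is a hypothetical name, the toy rows are invented, and the vectorized mean is an alternative to the explicit loop in `getAvgLevelDiff`, not the code the notebook runs.

```python
import pandas as pd

def jlpt_to_level(value):
    """Hypothetical translator: 'N5'..'N1' -> 1..5, anything else -> NaN."""
    mapping = {"N5": 1, "N4": 2, "N3": 3, "N2": 4, "N1": 5}
    return mapping.get(value, float("nan"))

# Made-up rows: a JLPT label plus a frequency rank already binned into 1..5.
toy = pd.DataFrame({
    "jlpt": ["N5", "N3", "N1", None],
    "Rank Converted to Level": [1.0, 5.0, 4.0, 2.0],
})

levels = toy["jlpt"].map(jlpt_to_level)
diffs = (levels - toy["Rank Converted to Level"]).abs()
# mean() skips NaN by default, which mirrors the isnan check in the loop.
average_diff = diffs.mean()
print(average_diff)
```

Returning NaN for unknown labels lets kanji without a JLPT level drop out of the average instead of raising an error.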

In [None]:
#How strongly does WaniKani level correlate with each source?
#WaniKani levels range from 1 to 60, with higher levels being learned later (harder or more obscure kanji). 
intIsJustItself = lambda x: x
wani_levels = 60
wani_results = getAvgLevelDiff("wanikani_level", intIsJustItself, wani_levels)

Created bin column: 0       38.0
1       36.0
2       36.0
3        2.0
4       52.0
5        4.0
6       30.0
7       26.0
8       35.0
9       54.0
10      35.0
11       5.0
12      19.0
13      28.0
14       7.0
15      25.0
16       7.0
17      24.0
18      24.0
19      38.0
20      36.0
21      43.0
22      21.0
23      59.0
24      31.0
25      58.0
26      25.0
27      22.0
28      24.0
29      36.0
        ... 
2106    53.0
2107    10.0
2108    53.0
2109    14.0
2110    53.0
2111    57.0
2112    13.0
2113    35.0
2114    24.0
2115    25.0
2116    51.0
2117    15.0
2118    43.0
2119    30.0
2120    47.0
2121    52.0
2122    38.0
2123    49.0
2124    22.0
2125    16.0
2126    57.0
2127    19.0
2128     8.0
2129     2.0
2130    51.0
2131    38.0
2132    21.0
2133    37.0
2134    36.0
2135    27.0
Name: Rank of Appearances on Twitter, Length: 2136, dtype: float64
Using ranges  0       (1364.883, 1401.367]
1         (1285.917, 1326.4]
2         (1285.917, 1326.4]
3           (36.483

援: level 22.0 translated to 22.0 and corresponds to rank 10.0 with a diff of abs(level - rank)=12.0
園: level 16.0 translated to 16.0 and corresponds to rank 8.0 with a diff of abs(level - rank)=8.0
煙: level 18.0 translated to 18.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=13.0
猿: level 46.0 translated to 46.0 and corresponds to rank 39.0 with a diff of abs(level - rank)=7.0
遠: level 16.0 translated to 16.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=4.0
鉛: level 26.0 translated to 26.0 and corresponds to rank 50.0 with a diff of abs(level - rank)=24.0
塩: level 17.0 translated to 17.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=8.0
演: level 23.0 translated to 23.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=10.0
縁: level 44.0 translated to 44.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=7.0
汚: level 32.0 translated to 32.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=5.0
王: leve

勘: level 49.0 translated to 49.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=24.0
患: level 37.0 translated to 37.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=7.0
貫: level 52.0 translated to 52.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=11.0
寒: level 12.0 translated to 12.0 and corresponds to rank 14.0 with a diff of abs(level - rank)=2.0
喚: level 51.0 translated to 51.0 and corresponds to rank 50.0 with a diff of abs(level - rank)=1.0
堪: level 59.0 translated to 59.0 and corresponds to rank 43.0 with a diff of abs(level - rank)=16.0
換: level 36.0 translated to 36.0 and corresponds to rank 11.0 with a diff of abs(level - rank)=25.0
敢: level 57.0 translated to 57.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=17.0
款: level 57.0 translated to 57.0 and corresponds to rank 60.0 with a diff of abs(level - rank)=3.0
間: level 8.0 translated to 8.0 and corresponds to rank 1.0 with a diff of abs(level - rank)=7.0
閑: level

菊: level 46.0 translated to 46.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=6.0
吉: level 44.0 translated to 44.0 and corresponds to rank 15.0 with a diff of abs(level - rank)=29.0
喫: level 18.0 translated to 18.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=12.0
詰: level 29.0 translated to 29.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=4.0
却: level 38.0 translated to 38.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=6.0
客: level 9.0 translated to 9.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=7.0
脚: level 45.0 translated to 45.0 and corresponds to rank 35.0 with a diff of abs(level - rank)=10.0
逆: level 28.0 translated to 28.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=11.0
虐: level 53.0 translated to 53.0 and corresponds to rank 39.0 with a diff of abs(level - rank)=14.0
九: level 1.0 translated to 1.0 and corresponds to rank 20.0 with a diff of abs(level - rank)=19.0
久: level

敬: level 33.0 translated to 33.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=3.0
景: level 25.0 translated to 25.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=2.0
軽: level 10.0 translated to 10.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=8.0
傾: level 38.0 translated to 38.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=7.0
携: level 40.0 translated to 40.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=22.0
継: level 36.0 translated to 36.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=5.0
慶: level 59.0 translated to 59.0 and corresponds to rank 39.0 with a diff of abs(level - rank)=20.0
憩: level 47.0 translated to 47.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=24.0
警: level 17.0 translated to 17.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=4.0
鶏: level 58.0 translated to 58.0 and corresponds to rank 33.0 with a diff of abs(level - rank)=25.0
芸: lev

拘: level 49.0 translated to 49.0 and corresponds to rank 50.0 with a diff of abs(level - rank)=1.0
肯: level 51.0 translated to 51.0 and corresponds to rank 56.0 with a diff of abs(level - rank)=5.0
厚: level 20.0 translated to 20.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=6.0
恒: level 52.0 translated to 52.0 and corresponds to rank 43.0 with a diff of abs(level - rank)=9.0
洪: level 56.0 translated to 56.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=15.0
皇: level 33.0 translated to 33.0 and corresponds to rank 38.0 with a diff of abs(level - rank)=5.0
紅: level 34.0 translated to 34.0 and corresponds to rank 34.0 with a diff of abs(level - rank)=0.0
荒: level 42.0 translated to 42.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=20.0
郊: level 51.0 translated to 51.0 and corresponds to rank 56.0 with a diff of abs(level - rank)=5.0
香: level 37.0 translated to 37.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=21.0
候: leve

裁: level 23.0 translated to 23.0 and corresponds to rank 39.0 with a diff of abs(level - rank)=16.0
債: level 36.0 translated to 36.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=18.0
催: level 29.0 translated to 29.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=6.0
歳: level 46.0 translated to 46.0 and corresponds to rank 11.0 with a diff of abs(level - rank)=35.0
載: level 24.0 translated to 24.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=3.0
際: level 21.0 translated to 21.0 and corresponds to rank 14.0 with a diff of abs(level - rank)=7.0
埼: level 39.0 translated to 39.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=27.0
在: level 20.0 translated to 20.0 and corresponds to rank 10.0 with a diff of abs(level - rank)=10.0
材: level 14.0 translated to 14.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=13.0
剤: level 40.0 translated to 40.0 and corresponds to rank 36.0 with a diff of abs(level - rank)=4.0
財: l

治: level 16.0 translated to 16.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=4.0
持: level 9.0 translated to 9.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=5.0
時: level 7.0 translated to 7.0 and corresponds to rank 1.0 with a diff of abs(level - rank)=6.0
滋: level 43.0 translated to 43.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=12.0
慈: level 51.0 translated to 51.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=2.0
辞: level 16.0 translated to 16.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=7.0
磁: level 34.0 translated to 34.0 and corresponds to rank 52.0 with a diff of abs(level - rank)=18.0
鹿: level 36.0 translated to 36.0 and corresponds to rank 15.0 with a diff of abs(level - rank)=21.0
式: level 15.0 translated to 15.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=1.0
識: level 17.0 translated to 17.0 and corresponds to rank 20.0 with a diff of abs(level - rank)=3.0
軸: level 42.0

俊: level 40.0 translated to 40.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=5.0
春: level 15.0 translated to 15.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=3.0
瞬: level 46.0 translated to 46.0 and corresponds to rank 19.0 with a diff of abs(level - rank)=27.0
旬: level 37.0 translated to 37.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=4.0
巡: level 40.0 translated to 40.0 and corresponds to rank 35.0 with a diff of abs(level - rank)=5.0
盾: level 48.0 translated to 48.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=3.0
准: level 53.0 translated to 53.0 and corresponds to rank 58.0 with a diff of abs(level - rank)=5.0
殉: level 60.0 translated to 60.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=3.0
純: level 34.0 translated to 34.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=9.0
循: level 55.0 translated to 55.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=1.0
順: level 

伸: level 36.0 translated to 36.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=11.0
臣: level 29.0 translated to 29.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=12.0
芯: level 47.0 translated to 47.0 and corresponds to rank 52.0 with a diff of abs(level - rank)=5.0
身: level 8.0 translated to 8.0 and corresponds to rank 9.0 with a diff of abs(level - rank)=1.0
辛: level 44.0 translated to 44.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=32.0
侵: level 41.0 translated to 41.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=4.0
信: level 15.0 translated to 15.0 and corresponds to rank 8.0 with a diff of abs(level - rank)=7.0
津: level 36.0 translated to 36.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=24.0
神: level 11.0 translated to 11.0 and corresponds to rank 2.0 with a diff of abs(level - rank)=9.0
唇: level 47.0 translated to 47.0 and corresponds to rank 46.0 with a diff of abs(level - rank)=1.0
娠: level 38

切: level 3.0 translated to 3.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=2.0
折: level 14.0 translated to 14.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=12.0
拙: level 59.0 translated to 59.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=2.0
窃: level 58.0 translated to 58.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=1.0
接: level 26.0 translated to 26.0 and corresponds to rank 20.0 with a diff of abs(level - rank)=6.0
設: level 21.0 translated to 21.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=1.0
雪: level 7.0 translated to 7.0 and corresponds to rank 34.0 with a diff of abs(level - rank)=27.0
摂: level 56.0 translated to 56.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=16.0
節: level 19.0 translated to 19.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=4.0
説: level 17.0 translated to 17.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=0.0
舌: level 19.

藻: level 60.0 translated to 60.0 and corresponds to rank 55.0 with a diff of abs(level - rank)=5.0
造: level 26.0 translated to 26.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=0.0
像: level 13.0 translated to 13.0 and corresponds to rank 11.0 with a diff of abs(level - rank)=2.0
増: level 21.0 translated to 21.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=8.0
憎: level 47.0 translated to 47.0 and corresponds to rank 46.0 with a diff of abs(level - rank)=1.0
蔵: level 33.0 translated to 33.0 and corresponds to rank 19.0 with a diff of abs(level - rank)=14.0
贈: level 38.0 translated to 38.0 and corresponds to rank 48.0 with a diff of abs(level - rank)=10.0
臓: level 34.0 translated to 34.0 and corresponds to rank 36.0 with a diff of abs(level - rank)=2.0
即: level 43.0 translated to 43.0 and corresponds to rank 32.0 with a diff of abs(level - rank)=11.0
束: level 14.0 translated to 14.0 and corresponds to rank 29.0 with a diff of abs(level - rank)=15.0
足: lev

短: level 12.0 translated to 12.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=10.0
嘆: level 31.0 translated to 31.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=23.0
端: level 27.0 translated to 27.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=9.0
誕: level 22.0 translated to 22.0 and corresponds to rank 7.0 with a diff of abs(level - rank)=15.0
鍛: level 46.0 translated to 46.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=5.0
団: level 19.0 translated to 19.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=6.0
男: level 4.0 translated to 4.0 and corresponds to rank 7.0 with a diff of abs(level - rank)=3.0
段: level 27.0 translated to 27.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=15.0
断: level 21.0 translated to 21.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=4.0
弾: level 37.0 translated to 37.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=14.0
暖: level 

邸: level 51.0 translated to 51.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=2.0
亭: level 50.0 translated to 50.0 and corresponds to rank 34.0 with a diff of abs(level - rank)=16.0
貞: level 51.0 translated to 51.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=10.0
帝: level 46.0 translated to 46.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=5.0
訂: level 50.0 translated to 50.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=1.0
庭: level 12.0 translated to 12.0 and corresponds to rank 29.0 with a diff of abs(level - rank)=17.0
停: level 23.0 translated to 23.0 and corresponds to rank 24.0 with a diff of abs(level - rank)=1.0
偵: level 31.0 translated to 31.0 and corresponds to rank 51.0 with a diff of abs(level - rank)=20.0
堤: level 50.0 translated to 50.0 and corresponds to rank 48.0 with a diff of abs(level - rank)=2.0
提: level 22.0 translated to 22.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=5.0
程: lev

督: level 29.0 translated to 29.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=1.0
徳: level 31.0 translated to 31.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=5.0
篤: level 59.0 translated to 59.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=5.0
毒: level 15.0 translated to 15.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=16.0
独: level 26.0 translated to 26.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=0.0
読: level 10.0 translated to 10.0 and corresponds to rank 10.0 with a diff of abs(level - rank)=0.0
栃: level 55.0 translated to 55.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=25.0
凸: level 57.0 translated to 57.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=8.0
突: level 26.0 translated to 26.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=9.0
届: level 24.0 translated to 24.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=6.0
屯: level

伴: level 38.0 translated to 38.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=11.0
判: level 21.0 translated to 21.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=2.0
坂: level 15.0 translated to 15.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=1.0
阪: level 16.0 translated to 16.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=13.0
板: level 29.0 translated to 29.0 and corresponds to rank 24.0 with a diff of abs(level - rank)=5.0
版: level 30.0 translated to 30.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=3.0
班: level 48.0 translated to 48.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=4.0
畔: level 60.0 translated to 60.0 and corresponds to rank 56.0 with a diff of abs(level - rank)=4.0
般: level 36.0 translated to 36.0 and corresponds to rank 34.0 with a diff of abs(level - rank)=2.0
販: level 24.0 translated to 24.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=3.0
飯: level 

復: level 26.0 translated to 26.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=9.0
福: level 13.0 translated to 13.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=8.0
腹: level 27.0 translated to 27.0 and corresponds to rank 6.0 with a diff of abs(level - rank)=21.0
複: level 32.0 translated to 32.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=5.0
覆: level 49.0 translated to 49.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=12.0
払: level 35.0 translated to 35.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=12.0
沸: level 51.0 translated to 51.0 and corresponds to rank 46.0 with a diff of abs(level - rank)=5.0
仏: level 15.0 translated to 15.0 and corresponds to rank 34.0 with a diff of abs(level - rank)=19.0
物: level 9.0 translated to 9.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=5.0
粉: level 31.0 translated to 31.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=6.0
紛: level 42

牧: level 43.0 translated to 43.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=12.0
睦: level 59.0 translated to 59.0 and corresponds to rank 55.0 with a diff of abs(level - rank)=4.0
僕: level 12.0 translated to 12.0 and corresponds to rank 10.0 with a diff of abs(level - rank)=2.0
墨: level 46.0 translated to 46.0 and corresponds to rank 38.0 with a diff of abs(level - rank)=8.0
撲: level 43.0 translated to 43.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=1.0
没: level 52.0 translated to 52.0 and corresponds to rank 33.0 with a diff of abs(level - rank)=19.0
堀: level 40.0 translated to 40.0 and corresponds to rank 32.0 with a diff of abs(level - rank)=8.0
本: level 2.0 translated to 2.0 and corresponds to rank 1.0 with a diff of abs(level - rank)=1.0
奔: level 58.0 translated to 58.0 and corresponds to rank 58.0 with a diff of abs(level - rank)=0.0
翻: level 50.0 translated to 50.0 and corresponds to rank 52.0 with a diff of abs(level - rank)=2.0
凡: level 56

葉: level 10.0 translated to 10.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=5.0
陽: level 12.0 translated to 12.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=9.0
溶: level 48.0 translated to 48.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=7.0
腰: level 24.0 translated to 24.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=1.0
様: level 13.0 translated to 13.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=10.0
踊: level 48.0 translated to 48.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=20.0
養: level 13.0 translated to 13.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=17.0
擁: level 52.0 translated to 52.0 and corresponds to rank 56.0 with a diff of abs(level - rank)=4.0
謡: level 54.0 translated to 54.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=5.0
曜: level 16.0 translated to 16.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=11.0
抑: level 

暦: level 45.0 translated to 45.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=8.0
歴: level 19.0 translated to 19.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=9.0
列: level 15.0 translated to 15.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=11.0
劣: level 49.0 translated to 49.0 and corresponds to rank 48.0 with a diff of abs(level - rank)=1.0
烈: level 29.0 translated to 29.0 and corresponds to rank 42.0 with a diff of abs(level - rank)=13.0
裂: level 43.0 translated to 43.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=3.0
恋: level 17.0 translated to 17.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=1.0
連: level 19.0 translated to 19.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=14.0
廉: level 60.0 translated to 60.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=7.0
練: level 13.0 translated to 13.0 and corresponds to rank 10.0 with a diff of abs(level - rank)=3.0
錬: level

茨: level 52.0 translated to 52.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=5.0
芋: level 34.0 translated to 34.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=15.0
引: level 3.0 translated to 3.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=2.0
印: level 26.0 translated to 26.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=4.0
因: level 17.0 translated to 17.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=4.0
姻: level 59.0 translated to 59.0 and corresponds to rank 59.0 with a diff of abs(level - rank)=0.0
員: level 12.0 translated to 12.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=6.0
院: level 10.0 translated to 10.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=6.0
陰: level 45.0 translated to 45.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=19.0
飲: level 10.0 translated to 10.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=6.0
隠: level 25

課: level 13.0 translated to 13.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=24.0
蚊: level 48.0 translated to 48.0 and corresponds to rank 46.0 with a diff of abs(level - rank)=2.0
牙: level 36.0 translated to 36.0 and corresponds to rank 52.0 with a diff of abs(level - rank)=16.0
我: level 26.0 translated to 26.0 and corresponds to rank 6.0 with a diff of abs(level - rank)=20.0
画: level 6.0 translated to 6.0 and corresponds to rank 9.0 with a diff of abs(level - rank)=3.0
芽: level 44.0 translated to 44.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=0.0
賀: level 22.0 translated to 22.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=8.0
雅: level 40.0 translated to 40.0 and corresponds to rank 46.0 with a diff of abs(level - rank)=6.0
餓: level 48.0 translated to 48.0 and corresponds to rank 51.0 with a diff of abs(level - rank)=3.0
介: level 35.0 translated to 35.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=17.0
回: level 5

岸: level 11.0 translated to 11.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=5.0
岩: level 15.0 translated to 15.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=3.0
眼: level 32.0 translated to 32.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=28.0
頑: level 14.0 translated to 14.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=31.0
顔: level 10.0 translated to 10.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=7.0
願: level 13.0 translated to 13.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=5.0
企: level 21.0 translated to 21.0 and corresponds to rank 42.0 with a diff of abs(level - rank)=21.0
伎: level 36.0 translated to 36.0 and corresponds to rank 51.0 with a diff of abs(level - rank)=15.0
危: level 16.0 translated to 16.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=7.0
机: level 32.0 translated to 32.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=2.0
気: level

極: level 27.0 translated to 27.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=15.0
玉: level 2.0 translated to 2.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=16.0
巾: level 47.0 translated to 47.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=3.0
斤: level 5.0 translated to 5.0 and corresponds to rank 58.0 with a diff of abs(level - rank)=53.0
均: level 31.0 translated to 31.0 and corresponds to rank 46.0 with a diff of abs(level - rank)=15.0
近: level 5.0 translated to 5.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=1.0
金: level 5.0 translated to 5.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=1.0
菌: level 45.0 translated to 45.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=12.0
勤: level 34.0 translated to 34.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=6.0
琴: level 43.0 translated to 43.0 and corresponds to rank 43.0 with a diff of abs(level - rank)=0.0
筋: level 33.0 t

肩: level 24.0 translated to 24.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=3.0
建: level 15.0 translated to 15.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=2.0
研: level 8.0 translated to 8.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=10.0
県: level 9.0 translated to 9.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=28.0
倹: level 60.0 translated to 60.0 and corresponds to rank 59.0 with a diff of abs(level - rank)=1.0
兼: level 40.0 translated to 40.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=9.0
剣: level 35.0 translated to 35.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=8.0
拳: level 41.0 translated to 41.0 and corresponds to rank 47.0 with a diff of abs(level - rank)=6.0
軒: level 51.0 translated to 51.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=29.0
健: level 27.0 translated to 27.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=1.0
険: level 16

更: level 30.0 translated to 30.0 and corresponds to rank 15.0 with a diff of abs(level - rank)=15.0
効: level 25.0 translated to 25.0 and corresponds to rank 36.0 with a diff of abs(level - rank)=11.0
幸: level 16.0 translated to 16.0 and corresponds to rank 11.0 with a diff of abs(level - rank)=5.0
拘: level 49.0 translated to 49.0 and corresponds to rank 38.0 with a diff of abs(level - rank)=11.0
肯: level 51.0 translated to 51.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=6.0
厚: level 20.0 translated to 20.0 and corresponds to rank 33.0 with a diff of abs(level - rank)=13.0
恒: level 52.0 translated to 52.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=1.0
洪: level 56.0 translated to 56.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=2.0
皇: level 33.0 translated to 33.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=10.0
紅: level 34.0 translated to 34.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=12.0
荒: l

宰: level 56.0 translated to 56.0 and corresponds to rank 47.0 with a diff of abs(level - rank)=9.0
栽: level 54.0 translated to 54.0 and corresponds to rank 56.0 with a diff of abs(level - rank)=2.0
彩: level 48.0 translated to 48.0 and corresponds to rank 39.0 with a diff of abs(level - rank)=9.0
採: level 32.0 translated to 32.0 and corresponds to rank 39.0 with a diff of abs(level - rank)=7.0
済: level 21.0 translated to 21.0 and corresponds to rank 19.0 with a diff of abs(level - rank)=2.0
祭: level 12.0 translated to 12.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=15.0
斎: level 42.0 translated to 42.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=15.0
細: level 17.0 translated to 17.0 and corresponds to rank 10.0 with a diff of abs(level - rank)=7.0
菜: level 31.0 translated to 31.0 and corresponds to rank 38.0 with a diff of abs(level - rank)=7.0
最: level 10.0 translated to 10.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=5.0
裁: level 

紙: level 7.0 translated to 7.0 and corresponds to rank 8.0 with a diff of abs(level - rank)=1.0
脂: level 51.0 translated to 51.0 and corresponds to rank 50.0 with a diff of abs(level - rank)=1.0
視: level 24.0 translated to 24.0 and corresponds to rank 19.0 with a diff of abs(level - rank)=5.0
紫: level 49.0 translated to 49.0 and corresponds to rank 33.0 with a diff of abs(level - rank)=16.0
詞: level 19.0 translated to 19.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=12.0
歯: level 12.0 translated to 12.0 and corresponds to rank 30.0 with a diff of abs(level - rank)=18.0
試: level 9.0 translated to 9.0 and corresponds to rank 24.0 with a diff of abs(level - rank)=15.0
詩: level 13.0 translated to 13.0 and corresponds to rank 14.0 with a diff of abs(level - rank)=1.0
資: level 21.0 translated to 21.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=2.0
飼: level 32.0 translated to 32.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=12.0
誌: level 3

集: level 10.0 translated to 10.0 and corresponds to rank 7.0 with a diff of abs(level - rank)=3.0
酬: level 54.0 translated to 54.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=1.0
醜: level 60.0 translated to 60.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=20.0
襲: level 43.0 translated to 43.0 and corresponds to rank 32.0 with a diff of abs(level - rank)=11.0
十: level 1.0 translated to 1.0 and corresponds to rank 1.0 with a diff of abs(level - rank)=0.0
汁: level 35.0 translated to 35.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=9.0
充: level 39.0 translated to 39.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=12.0
住: level 8.0 translated to 8.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=5.0
柔: level 43.0 translated to 43.0 and corresponds to rank 36.0 with a diff of abs(level - rank)=7.0
重: level 9.0 translated to 9.0 and corresponds to rank 6.0 with a diff of abs(level - rank)=3.0
従: level 26.0 tr

剰: level 53.0 translated to 53.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=4.0
常: level 17.0 translated to 17.0 and corresponds to rank 7.0 with a diff of abs(level - rank)=10.0
情: level 13.0 translated to 13.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=8.0
場: level 8.0 translated to 8.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=5.0
畳: level 47.0 translated to 47.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=19.0
蒸: level 33.0 translated to 33.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=11.0
縄: level 36.0 translated to 36.0 and corresponds to rank 42.0 with a diff of abs(level - rank)=6.0
壌: level 53.0 translated to 53.0 and corresponds to rank 60.0 with a diff of abs(level - rank)=7.0
嬢: level 45.0 translated to 45.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=17.0
錠: level 58.0 translated to 58.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=5.0
譲: level 39

正: level 2.0 translated to 2.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=1.0
生: level 3.0 translated to 3.0 and corresponds to rank 1.0 with a diff of abs(level - rank)=2.0
成: level 11.0 translated to 11.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=8.0
西: level 5.0 translated to 5.0 and corresponds to rank 9.0 with a diff of abs(level - rank)=4.0
声: level 5.0 translated to 5.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=1.0
制: level 21.0 translated to 21.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=9.0
姓: level 36.0 translated to 36.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=9.0
征: level 49.0 translated to 49.0 and corresponds to rank 43.0 with a diff of abs(level - rank)=6.0
性: level 14.0 translated to 14.0 and corresponds to rank 5.0 with a diff of abs(level - rank)=9.0
青: level 5.0 translated to 5.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=1.0
斉: level 43.0 translated to

疎: level 54.0 translated to 54.0 and corresponds to rank 48.0 with a diff of abs(level - rank)=6.0
訴: level 25.0 translated to 25.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=12.0
礎: level 49.0 translated to 49.0 and corresponds to rank 47.0 with a diff of abs(level - rank)=2.0
双: level 42.0 translated to 42.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=2.0
壮: level 50.0 translated to 50.0 and corresponds to rank 38.0 with a diff of abs(level - rank)=12.0
早: level 4.0 translated to 4.0 and corresponds to rank 7.0 with a diff of abs(level - rank)=3.0
争: level 11.0 translated to 11.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=6.0
走: level 5.0 translated to 5.0 and corresponds to rank 14.0 with a diff of abs(level - rank)=9.0
奏: level 38.0 translated to 38.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=2.0
相: level 9.0 translated to 9.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=5.0
荘: level 54.0 tr

端: level 27.0 translated to 27.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=11.0
誕: level 22.0 translated to 22.0 and corresponds to rank 53.0 with a diff of abs(level - rank)=31.0
鍛: level 46.0 translated to 46.0 and corresponds to rank 50.0 with a diff of abs(level - rank)=4.0
団: level 19.0 translated to 19.0 and corresponds to rank 18.0 with a diff of abs(level - rank)=1.0
男: level 4.0 translated to 4.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=1.0
段: level 27.0 translated to 27.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=15.0
断: level 21.0 translated to 21.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=8.0
弾: level 37.0 translated to 37.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=9.0
暖: level 32.0 translated to 32.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=5.0
談: level 9.0 translated to 9.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=4.0
壇: level 49.

訂: level 50.0 translated to 50.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=4.0
庭: level 12.0 translated to 12.0 and corresponds to rank 13.0 with a diff of abs(level - rank)=1.0
停: level 23.0 translated to 23.0 and corresponds to rank 32.0 with a diff of abs(level - rank)=9.0
偵: level 31.0 translated to 31.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=6.0
堤: level 50.0 translated to 50.0 and corresponds to rank 47.0 with a diff of abs(level - rank)=3.0
提: level 22.0 translated to 22.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=3.0
程: level 28.0 translated to 28.0 and corresponds to rank 9.0 with a diff of abs(level - rank)=19.0
艇: level 53.0 translated to 53.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=9.0
締: level 27.0 translated to 27.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=10.0
諦: level 22.0 translated to 22.0 and corresponds to rank 47.0 with a diff of abs(level - rank)=25.0
泥: level

匂: level 30.0 translated to 30.0 and corresponds to rank 36.0 with a diff of abs(level - rank)=6.0
肉: level 5.0 translated to 5.0 and corresponds to rank 15.0 with a diff of abs(level - rank)=10.0
虹: level 47.0 translated to 47.0 and corresponds to rank 54.0 with a diff of abs(level - rank)=7.0
日: level 2.0 translated to 2.0 and corresponds to rank 1.0 with a diff of abs(level - rank)=1.0
入: level 1.0 translated to 1.0 and corresponds to rank 2.0 with a diff of abs(level - rank)=1.0
乳: level 23.0 translated to 23.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=8.0
尿: level 55.0 translated to 55.0 and corresponds to rank 58.0 with a diff of abs(level - rank)=3.0
任: level 21.0 translated to 21.0 and corresponds to rank 21.0 with a diff of abs(level - rank)=0.0
妊: level 38.0 translated to 38.0 and corresponds to rank 58.0 with a diff of abs(level - rank)=20.0
忍: level 44.0 translated to 44.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=18.0
認: level 21.0 t

繁: level 40.0 translated to 40.0 and corresponds to rank 33.0 with a diff of abs(level - rank)=7.0
藩: level 58.0 translated to 58.0 and corresponds to rank 37.0 with a diff of abs(level - rank)=21.0
晩: level 15.0 translated to 15.0 and corresponds to rank 14.0 with a diff of abs(level - rank)=1.0
番: level 8.0 translated to 8.0 and corresponds to rank 9.0 with a diff of abs(level - rank)=1.0
蛮: level 60.0 translated to 60.0 and corresponds to rank 51.0 with a diff of abs(level - rank)=9.0
盤: level 38.0 translated to 38.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=3.0
比: level 19.0 translated to 19.0 and corresponds to rank 17.0 with a diff of abs(level - rank)=2.0
皮: level 5.0 translated to 5.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=18.0
妃: level 49.0 translated to 49.0 and corresponds to rank 55.0 with a diff of abs(level - rank)=6.0
否: level 33.0 translated to 33.0 and corresponds to rank 24.0 with a diff of abs(level - rank)=9.0
批: level 21.0

文: level 2.0 translated to 2.0 and corresponds to rank 2.0 with a diff of abs(level - rank)=0.0
聞: level 10.0 translated to 10.0 and corresponds to rank 3.0 with a diff of abs(level - rank)=7.0
平: level 4.0 translated to 4.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=0.0
兵: level 17.0 translated to 17.0 and corresponds to rank 8.0 with a diff of abs(level - rank)=9.0
併: level 38.0 translated to 38.0 and corresponds to rank 28.0 with a diff of abs(level - rank)=10.0
並: level 28.0 translated to 28.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=12.0
柄: level 42.0 translated to 42.0 and corresponds to rank 23.0 with a diff of abs(level - rank)=19.0
陛: level 49.0 translated to 49.0 and corresponds to rank 55.0 with a diff of abs(level - rank)=6.0
閉: level 33.0 translated to 33.0 and corresponds to rank 27.0 with a diff of abs(level - rank)=6.0
塀: level 47.0 translated to 47.0 and corresponds to rank 45.0 with a diff of abs(level - rank)=2.0
幣: level 47.0 t

凡: level 56.0 translated to 56.0 and corresponds to rank 25.0 with a diff of abs(level - rank)=31.0
盆: level 46.0 translated to 46.0 and corresponds to rank 44.0 with a diff of abs(level - rank)=2.0
麻: level 48.0 translated to 48.0 and corresponds to rank 34.0 with a diff of abs(level - rank)=14.0
摩: level 43.0 translated to 43.0 and corresponds to rank 32.0 with a diff of abs(level - rank)=11.0
磨: level 45.0 translated to 45.0 and corresponds to rank 40.0 with a diff of abs(level - rank)=5.0
魔: level 46.0 translated to 46.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=24.0
毎: level 5.0 translated to 5.0 and corresponds to rank 16.0 with a diff of abs(level - rank)=11.0
妹: level 6.0 translated to 6.0 and corresponds to rank 24.0 with a diff of abs(level - rank)=18.0
枚: level 18.0 translated to 18.0 and corresponds to rank 22.0 with a diff of abs(level - rank)=4.0
埋: level 39.0 translated to 39.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=13.0
幕: leve

誉: level 40.0 translated to 40.0 and corresponds to rank 41.0 with a diff of abs(level - rank)=1.0
預: level 30.0 translated to 30.0 and corresponds to rank 42.0 with a diff of abs(level - rank)=12.0
幼: level 28.0 translated to 28.0 and corresponds to rank 33.0 with a diff of abs(level - rank)=5.0
用: level 3.0 translated to 3.0 and corresponds to rank 4.0 with a diff of abs(level - rank)=1.0
羊: level 6.0 translated to 6.0 and corresponds to rank 43.0 with a diff of abs(level - rank)=37.0
洋: level 11.0 translated to 11.0 and corresponds to rank 14.0 with a diff of abs(level - rank)=3.0
要: level 9.0 translated to 9.0 and corresponds to rank 7.0 with a diff of abs(level - rank)=2.0
容: level 19.0 translated to 19.0 and corresponds to rank 12.0 with a diff of abs(level - rank)=7.0
庸: level 56.0 translated to 56.0 and corresponds to rank 49.0 with a diff of abs(level - rank)=7.0
揚: level 42.0 translated to 42.0 and corresponds to rank 31.0 with a diff of abs(level - rank)=11.0
揺: level 42.0 t

脇: level 48.0 translated to 48.0 and corresponds to rank 42.0 with a diff of abs(level - rank)=6.0
惑: level 27.0 translated to 27.0 and corresponds to rank 26.0 with a diff of abs(level - rank)=1.0
枠: level 39.0 translated to 39.0 and corresponds to rank 57.0 with a diff of abs(level - rank)=18.0
湾: level 37.0 translated to 37.0 and corresponds to rank 52.0 with a diff of abs(level - rank)=15.0
腕: level 24.0 translated to 24.0 and corresponds to rank 20.0 with a diff of abs(level - rank)=4.0
Aozora vs wanikani_level average level difference=9.523520485584218
Created bin column: 0       24.0
1       49.0
2       54.0
3        8.0
4       29.0
5       19.0
6       42.0
7       24.0
8       24.0
9       54.0
10      38.0
11       9.0
12      18.0
13      30.0
14       4.0
15      29.0
16       4.0
17      21.0
18      15.0
19       3.0
20      16.0
21      36.0
22      21.0
23      60.0
24      54.0
25      40.0
26      15.0
27      10.0
28      60.0
29      48.0
        ... 
2106    50.0

In [None]:
#How strongly does JLPT level correlate with each source?
#A lower N number means a harder level; the scale runs from N5 (easiest) to N1 (hardest). 
JLPT_levels = 5
levelValues = {"N"+str(i): (JLPT_levels+1)-i for i in range(1, JLPT_levels+1)}
levelValues["none"] = 0 #'none' is support for the queries at the end of this file
print(levelValues)
def translateJLPT(levelStr):
    #Unknown or missing levels (e.g. NaN) translate to NaN
    try:
        return levelValues[levelStr]
    except (KeyError, TypeError):
        return float('nan')
print(translateJLPT("N1"))
JLPT_results = getAvgLevelDiff("jlpt", translateJLPT, JLPT_levels)
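
The same dictionary-based translation can also be applied to a whole column at once. Below is a minimal sketch, assuming a small made-up Series (`toy_jlpt`) in place of `df["jlpt"]`; it relies on pandas' `Series.map`, which turns any value missing from the dictionary into NaN, much like the `try`/`except` in `translateJLPT`.

```python
import pandas as pd

JLPT_LEVELS = 5
# Same mapping as levelValues above: N5 -> 1 (easiest) ... N1 -> 5 (hardest)
level_values = {"N" + str(i): (JLPT_LEVELS + 1) - i for i in range(1, JLPT_LEVELS + 1)}
level_values["none"] = 0

# Made-up stand-in for df["jlpt"], including a missing and an unknown value
toy_jlpt = pd.Series(["N5", "N1", None, "N3", "unknown"])

# Values that are not keys of the dictionary come out as NaN
numeric = toy_jlpt.map(level_values)
print(numeric)
```

This is only a convenience; the per-value translator functions are still what `getAvgLevelDiff` expects.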

In [None]:
#How strongly does grade level correlate with each source?
print(df["grade"].unique())
gradeLevels = {
              'none': 0, #'none' is support for the queries at the end of this file
              'grade 1': 1,
              'grade 2': 2,
              'grade 3': 3,
              'grade 4': 4,
              'grade 5': 5,
              'grade 6': 6,
              'junior high': 7,
              }
def translateGradeLevel(levelStr):
    #Unknown or missing grades (e.g. NaN) translate to NaN
    try:
        return gradeLevels[levelStr]
    except (KeyError, TypeError):
        return float('nan')

grade_levels = 7
grade_results = getAvgLevelDiff("grade", translateGradeLevel, grade_levels)

In [None]:
#How strongly does Jisho frequency level correlate with each source?
print(max(df["frequency"].unique())) #Ranges from 1 to 2495. So it's including more entries than joyo.
jisho_levels = 2495
#See comment on Joyo below. Bugged, and not really meaningful anyway. 
#jisho_results = getAvgLevelDiff("frequency", intIsJustItself, jisho_levels)

In [None]:
#How strongly does Joyo rank correlate with each source?
#TODO should really just rename that column. 
print(max(df["Unnamed: 0"].unique())) #Ranges from 0 to 2135.
joyo_levels = 2136 
#We need level numbers to be 1..joyo_levels, inclusive
def translateJoyo(x):
    try:
        return x+1
    except:
        return float('nan')
#Those bins look a little weird? Why's it using e^0 to e^3? 
#    Each bin should correspond to a level number. We have more levels than bins though so something's wrong here. 
#    The bin function the way I'm doing it must not support having more bins than elements to qcut. 
#But really, these results aren't useful anyway since you don't learn by Joyo ranking. Same with Jisho. 
#joyo_results = getAvgLevelDiff("Unnamed: 0", translateJoyo, joyo_levels)

In [None]:
#How strongly does Genki lesson number correlate with each source?

#max isn't working on this? NaN throwing it off ?
possibleGenkiValues = df["Genki_Lesson"].unique()
possibleGenkiValues.sort()
print(possibleGenkiValues)

genki_levels = 0
for value in df["Genki_Lesson"].unique():
    if(not isnan(value)):
        genki_levels += 1
print(genki_levels, "possible valid values.")

#It looks like there is no lesson before lesson 3. A consequence of our source?
#http://genki.japantimes.co.jp/self/genki-kanji-list-linked-to-wwkanji
#We need level numbers to be 1..genki_levels, inclusive
def translateGenki(x):
    #Lesson 3 becomes level 1; non-numeric input translates to NaN
    try:
        return x-2
    except TypeError:
        return float('nan')
#Test that we get 1..genki_levels
print("1 =?=", translateGenki(3))
print(genki_levels, "=?=", translateGenki(23))
genki_results = getAvgLevelDiff("Genki_Lesson", translateGenki, genki_levels)

In [None]:
testq = [1,4,2,3]
pd.qcut(testq, 4, labels=False) + 1

In [None]:
#Test that the output of twitter freq vs twitter freq is 0
test_levels = 4 #Four is arbitrary. This is a made up learning sequence that corresponds to frequency perfectly.
testColName = "Twitter Test"
#I need to qcut the rank col, with lower ranks being in lower bin numbers
rankColName = "Rank of Appearances on Twitter"
rankCol = df[rankColName]
print(rankCol.head())
testCol = pd.qcut(rankCol, test_levels, labels=False)
print("Bins:", pd.qcut(rankCol, test_levels).head())
testCol = testCol+1
df[testColName] = testCol
#I think something's wrong with my test binning. First 3 are level 3? They should be common.
print(df[testColName].head())
#Just a number ?
def translateTest(x):
    try:
        return x
    except:
        return float('nan')
sampleTestRow = df.loc[df["kanji"] == "亜"]
print("Sample has value", sampleTestRow[testColName], "and rank", sampleTestRow[rankColName])
#ctrl f and you can find "Twitter vs Twitter Test average level difference=0.0"
test_results = getAvgLevelDiff(testColName, translateTest, test_levels)

In [None]:
#Now compare them. If I want to read Twitter, what's the best option to learn? Wikipedia? Etc.
#Note that there is some inherent rounding with 5 levels (JLPT) versus 60 (WaniKani).
#  I quantify the results as an average percentage. 
#  20% inaccuracy would be 1 level off for JLPT, or 12 for WaniKani. 
results = [(wani_results, wani_levels, "WaniKani"), 
           (JLPT_results, JLPT_levels, "JLPT"), 
           (grade_results, grade_levels, "Grade"), 
           #(jisho_results, jisho_levels, "Jisho"), #This is frequency, not really a sequence you'd learn. 
           #(joyo_results, joyo_levels, "Jōyō"),  #This is a ranking system, not very relevant. 
           (genki_results, genki_levels, "Genki")]
#String for datasource name, string for best sequence, float for % match. 
#Data in this is modified in the loop below. 
best_sequence_for_sources = {datasources[i]: ("Name of best sequence goes here", 1.0) 
                             for i in range(len(datasources))}

percent_results_of_each_source = [[] for _ in range(len(datasources))]
for (result, level, name) in results:
    for i, datasource in enumerate(datasources):
        #Normalise the average level difference by the number of levels in the sequence
        correlationWithThisSource = result[i] / level
        percent_results_of_each_source[i].append(correlationWithThisSource)
        print(datasource, name, correlationWithThisSource)
        #Lower is better: keep the sequence with the smallest normalised difference
        if(best_sequence_for_sources[datasource][1] > correlationWithThisSource):
            best_sequence_for_sources[datasource] = (name, correlationWithThisSource)

#TODO refactor: I really should just use a dataframe from the start
#We could maybe just say each has some +- inaccuracy based on number of levels as well. 
#We also could maybe say something about coverage, particularly for Genki
result_df = pd.DataFrame(percent_results_of_each_source)
result_df.columns = [name for (_, _, name) in results]
result_df.index = [source for source in datasources]
print(result_df)

print("")
print("Results: ")
for result in best_sequence_for_sources:
    print("Best for learning to read", result, ": ",
          best_sequence_for_sources[result][0], "with"+" {:.2f}".format(best_sequence_for_sources[result][1]*100),
          "% inaccuracy compared to actual usage on", result)

It looks like we have a reasonable amount of variation. These learning sequences weren't chosen randomly, but no sequence can perfectly model real-world usage while still teaching kanji in an order that makes sense for learners. 
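
Once the normalised differences are in a dataframe, picking the best sequence per source does not need a manual loop. The following is a rough sketch, assuming a frame shaped like `result_df` (sources as rows, sequences as columns); the names `toy_result_df`, `SeqA`, `SourceX` and so on, and all the numbers, are placeholders for illustration only and are not results from this notebook.

```python
import pandas as pd

# Placeholder values purely to show the shape; NOT results from this notebook
toy_result_df = pd.DataFrame(
    {"SeqA": [0.10, 0.30], "SeqB": [0.20, 0.15]},
    index=["SourceX", "SourceY"],
)

# For each source (row), the sequence (column) with the smallest difference
best_name = toy_result_df.idxmin(axis=1)
best_value = toy_result_df.min(axis=1)

for source in toy_result_df.index:
    print("Best for", source, ":", best_name[source],
          "with {:.2f}% inaccuracy".format(best_value[source] * 100))
```
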

Notes about shortcomings of this comparison:

We only have the WaniKani/JLPT/Genki/Grade data for the Jōyō set, so results might differ if more than those 2136 kanji were considered. 

Coverage is taken into consideration in a somewhat strange way due to how I binned frequency ranks. By saying the levels and frequencies should correspond, I am assuming two things: 

* The learning sequence is trying to teach the whole Jōyō set. If the sequence's last levels correspond to earlier bin levels, the algorithm penalises it because it sees them as having poor correlation. This is part of why Genki scores so poorly. 

* The learning sequence distributes kanji roughly evenly between levels. This is part of why WaniKani scores so well. 

In any case, this comparison should at least tell you which source a sequence best matches (for example, WaniKani teaches Twitter better than Wikipedia, slightly). 
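
To make the binning assumption concrete, here is a simplified sketch of the kind of comparison described above. It is not the notebook's `getAvgLevelDiff`; the helper `avg_level_diff` and the toy data are hypothetical. Frequency ranks are cut into as many equal-sized quantile bins as the sequence has levels, and the score is the mean absolute difference between level and bin, divided by the number of levels (so 1 level off on a 5-level scale is 20%).

```python
import pandas as pd

def avg_level_diff(levels, ranks, n_levels):
    """Mean |level - frequency bin| as a fraction of n_levels (hypothetical helper)."""
    # Equal-sized quantile bins over the ranks, numbered 1..n_levels
    bins = pd.qcut(ranks, n_levels, labels=False) + 1
    diff = (levels - bins).abs()
    # Rows with a missing level or rank give NaN and are skipped by mean()
    return diff.mean() / n_levels

# Toy sequence with 4 levels and 2 items per level (made-up data)
toy = pd.DataFrame({
    "level": [1, 1, 2, 2, 3, 3, 4, 4],
    "rank":  [1, 2, 3, 4, 5, 6, 7, 8],
})

# Teaching order matches frequency exactly
perfect = avg_level_diff(toy["level"], toy["rank"], 4)

# Teaching order is the exact reverse of frequency
reversed_ranks = toy["rank"].iloc[::-1].reset_index(drop=True)
worst = avg_level_diff(toy["level"], reversed_ranks, 4)

print(perfect, worst)
```

Because the bins are equal-sized, a sequence with very uneven levels, or one that covers only part of the set, will show a nonzero difference even if its internal order follows frequency.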

In [None]:
#https://stackoverflow.com/a/43348337
import matplotlib.ticker as ticker
#Currently this xaxis function is unused, doesn't work for scatter plot or something. 
#Turn the x axis into names of sources
def formatterX(x, pos):
    #The name of the data source
    return results[x][2]
#Translate 0.0 to 1.0 to 0.0 to 100.0
#Round before formatting so the tick labels don't show long floating point values
#    Maybe we won't end up using this type of chart though. 
def formatterY(y, pos):
    return "{:g}".format(round(y*100, 2))

fig = plt.figure(figsize=(13,10))

dataX = [i for i in range(0, len(results))]
dataY = []
for source in datasources:
    dataYPiece = [val for val in result_df.loc[source]]
    dataY.append(dataYPiece)
sequence_names = [val[2] for val in results]
#Only need colors if I show them all on the same vertical line. 
#colors = ["black", "blue", "red", "green"]
#correspondingColors = [colors[0] for i in range(1, 4*4+1)]
for ax_index in range(0, len(datasources)):
    ax = fig.add_subplot(2, 2, ax_index+1)
    ax.scatter(dataX, dataY[ax_index])
    props = {    
        #Inaccuracy is a strange word to use here. If there were 4 levels and it was 
        #    all off by 1 it'd be 25% "inaccuracy," right (should probably develop better tests, hard to 
        #    verify these large calculations)? So more, the average difference in level.
        'title': 'Inaccuracy of learning '+datasources[ax_index]+' using each sequence',
        'ylabel': 'Percent of inaccuracy'
    }
    ax.set(**props)
    plt.xticks(range(len(results)), sequence_names, size='medium')
    #Apply the percent formatter to every subplot, not just the last one
    ax.yaxis.set_major_formatter(ticker.FuncFormatter(formatterY))
#plt.gca().xaxis.set_major_formatter(ticker.FuncFormatter(formatterX))#Doesn't work for scatter plot?

#fig.subplots_adjust(wspace=0, hspace=0)
#Prevent overlap
fig.tight_layout()

None

In [None]:
#https://stackoverflow.com/a/43348337
import matplotlib.ticker as ticker
#Turn the x axis into names of sources
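def formatterX(x, pos):
    #Tick positions may arrive as floats, so convert before indexing
    index = int(round(x))
    if(index >= 1 and index <= len(datasources)):
        return datasources[index-1]
    #This shouldn't occur. Maybe print a warning but it'd be pretty noticeable. 
    return x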
#Translate 0.0 to 1.0 to 0.0 to 100.0
#Round before formatting so the tick labels don't show long floating point values
#    Maybe we won't end up using this type of chart though. 
def formatterY(y, pos):
    return "{:g}".format(round(y*100, 2))

fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
ax1.boxplot(percent_results_of_each_source)
props = {    
    #Inaccuracy is a strange word to use here. If there were 4 levels and it was 
    #    all off by 1 it'd be 25% "inaccuracy," right (should probably develop better tests, hard to 
    #    verify these large calculations)? So more, the average difference in level.
    'title': 'Inaccuracy of learning using the sequences, for each source',
    'xlabel': 'Sources',
    'ylabel': 'Percent of inaccuracy'
}
ax1.set(**props)
plt.gca().xaxis.set_major_formatter(ticker.FuncFormatter(formatterX))
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(formatterY))
print("For each source, we have considered the learning sequences from: ")
for result in results:
    print(result[2])
print("An inaccuracy of 0 would mean that the order it's taught perfectly corresponds to the frequency of usage")
#TODO can I draw additional conclusions from this? Maybe certain sources have more variety, etc. 
#    But I can get that from the frequency numbers and I'm not sure how useful that'd be to know. 
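
As an alternative to the hand-written `formatterY`, matplotlib ships a `PercentFormatter` that handles the 0.0 to 1.0 scaling and the rounding. This is a small sketch with placeholder fractions (not values from this notebook); the `matplotlib.use("Agg")` line is only there so it runs without a display and can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

fig, ax = plt.subplots()
# Placeholder fractions between 0 and 1, purely for illustration
ax.scatter([0, 1, 2, 3], [0.12, 0.25, 0.18, 0.40])

# xmax=1.0 means a data value of 1.0 is shown as 100%
formatter = PercentFormatter(xmax=1.0, decimals=0)
ax.yaxis.set_major_formatter(formatter)
ax.set_ylabel("Percent of inaccuracy")

# The formatter can also be called directly to check a single label
label = formatter.format_pct(0.25, 1.0)
print(label)
```
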

In [None]:
#TODO remove this, I have this in a dataframe now. 
#Print the values for each source
i=0
for source in datasources:
    print(source, percent_results_of_each_source[i])
    i += 1

In [None]:
#We could also store this in a csv at this point. 

Let's also do the queries Oleksandra wanted:
    
Given a user has completed a level of WaniKani, Grade level, Genki, or JLPT, what are the most frequent kanji in the next level of that system? For frequency we're just using Jisho's frequency numbers. 

In [None]:
#Sequence is the dataframe column for what you're learning, like jlpt, Genki_Lesson, or wanikani_level
#Translator translates that sequence's levels like "N1" into number ranks.
#numberToGet is how many kanji to return, starting with the next level
def getNextInSquence(sequence, level, translator, numberToGet):
    col = zip(df["kanji"], df[sequence], df["frequency"])
    
    colAboveLevel = []
    previouslyCompletedLevel = translator(level)
    #The column is now created by index, and the index is from Joyo.
    for (kanji, levelInSequence, freq) in col:
        numericLevelInSequence = translator(levelInSequence)
        #If you wanted to retrieve data from the current as well as any higher level, just make this >=
        if(numericLevelInSequence > previouslyCompletedLevel):
            colAboveLevel.append((kanji, levelInSequence, numericLevelInSequence, freq))
    #Now we sort by the numeric level (Can't sort by string level, N1 is harder than N2), 
    #    so that we give them the next hardest kanji instead of any harder kanji. 
    colAboveLevel.sort(key=lambda tup: tup[2])
    
    if(len(colAboveLevel) == 0 ):
        return []
    
    for index in range(len(colAboveLevel)):
        #min(len(colAboveLevel), bounds[nextBound]), 
        (_,_, numericLevelInSequenceCurrent, freqCurrent) = colAboveLevel[index]
        #Frequency is a rank so lower is more frequent
        indexLeft = index
        #Debug, see how many swaps this kanji has done
        num = 0
        while indexLeft > 0:
            indexLeft -= 1
            (_,_, numericLevelInSequenceLeft, freqLeft) = colAboveLevel[indexLeft]
            #We're sorted by level so if our levels are unequal then we are done (only sorting within levels)
            #I consider NaN frequencies to be high frequency rank (infrequent), 
            #    since it'd make sense for a more obscure kanji to not have a frequency rating
            if((freqLeft > freqCurrent or isnan(freqLeft)) and numericLevelInSequenceCurrent == numericLevelInSequenceLeft):
                #Swap
                #print("Swapping", colAboveLevel[indexLeft], colAboveLevel[index], "b/c", freqLeft, ">",freqCurrent,"num=",num)
                num += 1
                tmp = colAboveLevel[index]
                colAboveLevel[index] = colAboveLevel[indexLeft]
                colAboveLevel[indexLeft] = tmp
                index -= 1
                #print(colAboveLevel[:5])
            else:
                #print("not swapping ",colAboveLevel[indexLeft], colAboveLevel[index])
                break
    #print("returning ",numberToGet,"of", len(colAboveLevel))
    return colAboveLevel[:numberToGet]
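
The within-level ordering above should amount to sorting by (numeric level, frequency rank) with missing frequencies placed last. Here is a compact sketch of that idea using a single sort key. The helper name `sort_by_level_then_frequency` and the toy rows (with letters standing in for kanji) are made up for illustration; mapping NaN to infinity is one way to get the "NaN counts as infrequent" rule, not necessarily how the function above should be rewritten.

```python
from math import isnan

def sort_by_level_then_frequency(rows):
    """rows are (kanji, level string, numeric level, frequency rank) tuples."""
    def key(row):
        _, _, numeric_level, freq = row
        # A missing frequency rank is treated as the least frequent
        return (numeric_level, float("inf") if isnan(freq) else freq)
    return sorted(rows, key=key)

# Made-up rows; letters stand in for kanji
toy_rows = [
    ("C", "N4", 2, 30.0),
    ("A", "N5", 1, float("nan")),
    ("B", "N5", 1, 12.0),
    ("D", "N4", 2, 7.0),
]

ordered = [row[0] for row in sort_by_level_then_frequency(toy_rows)]
print(ordered)
```
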

In [None]:
def getMoreWani(level, numberToGet):
    #Actual col name in the dataframe
    sequence = "wanikani_level"
    #Values in the dataframe may not be ints, so the translator is required. 
    #But for WaniKani, it is just ints
    translator = intIsJustItself
    return getNextInSquence(sequence, level, translator, numberToGet)
def getMoreGrade(level, numberToGet):
    #Actual col name in the dataframe
    sequence = "grade"
    #Values in the dataframe may not be ints, so the translator is required. 
    translator = translateGradeLevel
    return getNextInSquence(sequence, level, translator, numberToGet)
def getMoreJLPT(level, numberToGet):
    #Actual col name in the dataframe
    sequence = "jlpt"
    #Values in the dataframe may not be ints, so the translator is required. 
    translator = translateJLPT
    return getNextInSquence(sequence, level, translator, numberToGet)
def getMoreGenki(level, numberToGet):
    #Actual col name in the dataframe
    sequence = "Genki_Lesson"
    #Values in the dataframe may not be ints, so the translator is required. 
    translator = translateGenki
    return getNextInSquence(sequence, level, translator, numberToGet)

#Takes the name of the sequence you're learning, the level you last completed, and how many kanji you want to get.
#Returns a list of tuples: (kanji, level string, level number, frequency rank)
def getMore(sequenceName, lastCompletedLevel, numberToGet):
    if(numberToGet <= 0):
        return []
    sequenceName = sequenceName.lower()
    if(sequenceName == "wanikani"):
        return getMoreWani(lastCompletedLevel, numberToGet)
    if(sequenceName == "grade" or sequenceName == "grade level"):
        return getMoreGrade(lastCompletedLevel, numberToGet)
    if(sequenceName == "jlpt"):
        return getMoreJLPT(lastCompletedLevel, numberToGet)
    if(sequenceName == "genki"):
        return getMoreGenki(lastCompletedLevel, numberToGet)
    raise ValueError('No sequence found with name '+sequenceName)

For all of these queries, you pass the level you last completed and get kanji from the next level. 

The results are sorted by level, and within each level by Jisho frequency rank, with kanji that have no frequency rank placed last (if the next level contains 40 kanji and you request 50, you'll get all 40 from the next level and the 10 most frequent from the level above that).

To retrieve kanji starting at the first level, for the previously completed level pass in 0 for numeric levels, or the string "none" for string-based levels.

If you pass the final level or an unrecognised level name, the result should be empty. 

In [None]:
mostRecentlyCompletedLevel = 50
numberToGet = 5
print(getMore("WaniKani", mostRecentlyCompletedLevel, numberToGet))
#Should give results from the first level since this is from before the first level
print(getMore("WaniKani", 0, numberToGet))
#Should give nothing
print(getMore("WaniKani", 60, numberToGet))

In [None]:
mostRecentlyCompletedLevel = "grade 6"
print(getMore("grade", mostRecentlyCompletedLevel, numberToGet))
#Additional tests.
print(getMore("grade", "none", numberToGet))
print(getMore("grade", "junior high", numberToGet))

In [None]:
mostRecentlyCompletedLevel = "N3"
print(getMore("jlpt", mostRecentlyCompletedLevel, numberToGet))
#Additional tests.
print(getMore("jlpt", "none", numberToGet))
print(getMore("jlpt", "N1", numberToGet))

In [None]:
mostRecentlyCompletedLevel = 10
print(getMore("genki", mostRecentlyCompletedLevel, numberToGet))
#Additional tests.
#The level information we have starts at 3, so 2 or lower is before the first lesson and gives the first lesson's kanji (lesson 3).
print(getMore("genki", 1, numberToGet))
print(getMore("genki", 24, numberToGet))

In [None]:
#Tests for invalid input

#Test with a nonexistent sequence, raises exception
#print(getMore("badvalue", 1, numberToGet))
#Test with a nonexistent numeric level. Just gives first level if too small, empty [] if too large
#print(getMore("genki", -1, numberToGet))
#Test with a nonexistent string level. Just gives []
#print(getMore("jlpt", "level name", numberToGet))
#Test with invalid numberToGet, gives []
#print(getMore("genki", 1, -5))

In [None]:
#This question also was originally written to ask for vocab. 
#Questions 1 & 2 contain information about readings and words with a given kanji, 
#    so whatever information is desired can be retrieved from there after using these queries.  