Test referential critic and components

In [1]:
import os
import json
import numpy as np
os.chdir("D:\日本語の心配genAI")

import tiktoken
from shinpai_genai.critic.referential_critic import get_lesson_selector_chain, VarietyLessonSelectorChain
from shinpai_genai.critic.referential_critic import lesson_docs_from_ids
from shinpai_genai.critic.referential_critic import ReferentialCriticChain
from langchain.callbacks.manager import RunManager
from langchain.callbacks.base import BaseCallbackHandler

from typing import Any, Dict, List, Optional

In [2]:
def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

In [3]:
# how long is the n4 lesson list
with open("./data/grammar_documents/n4_grammar/index_short.json", "r", encoding='utf-8') as f:
    # NOTE: read as string
    LESSON_LIST = f.read().encode('utf-8').decode('unicode_escape')

In [4]:
# wow that's longer than I expected..
    # maybe I'll just use 16k context
num_tokens_from_string(LESSON_LIST)

3988

In [5]:
print(LESSON_LIST)

[
    {
        "id": 0,
        "title": "あまり～ない (amari~nai)",
        "short_description": "not very, not much"
    },
    {
        "id": 1,
        "title": "あとで (ato de)",
        "short_description": "after, later"
    },
    {
        "id": 2,
        "title": "ば (ba)",
        "short_description": "if… then"
    },
    {
        "id": 3,
        "title": "場合は (baai wa)",
        "short_description": "in the event of"
    },
    {
        "id": 4,
        "title": "ばかり (bakari)",
        "short_description": "only, nothing but"
    },
    {
        "id": 5,
        "title": "だけで (dake de)",
        "short_description": "just by"
    },
    {
        "id": 6,
        "title": "だす (dasu)",
        "short_description": "to suddenly begin, to suddenly appear"
    },
    {
        "id": 7,
        "title": "でも (demo)",
        "short_description": "or something"
    },
    {
        "id": 8,
        "title": "でございます (de gozaimasu)",
        "short_description": "to be (honorific)"
  

In [6]:
print(len(json.loads(LESSON_LIST)))

106


In [7]:
# exclude the last practice buddy message
    # this will be handled in get_conversation_string

samp_conversation_str = '''
STUDENT: こんにちは！今から日本語の練習します。
PRACTICE BUDDY: こんにちは！それは素晴らしいですね！日本語の練習にはどのようなことをしたいですか？文法や単語の練習、会話の練習、それとも他の何かですか？
STUDENT: 今日は土曜日だよ。たくさんねました。11時に起きました。
'''

In [8]:
lesson_selector_chain = get_lesson_selector_chain()

In [9]:
# MAYBE:
    # take two here, then add 1 randomly for some variety

# MAYBE:
    # shuffle the lesson_list everytime so that it doesn't bias towards the start?
    # this may be worth making a custom chain for
    # both of the above

# res = lesson_selector_chain.run(
#     {"conversation": samp_conversation_str}
# )
# print(res)

In [10]:
LESSON_LIST_DICTS = json.loads(LESSON_LIST)
print(LESSON_LIST_DICTS)

[{'id': 0, 'title': 'あまり～ない (amari~nai)', 'short_description': 'not very, not much'}, {'id': 1, 'title': 'あとで (ato de)', 'short_description': 'after, later'}, {'id': 2, 'title': 'ば (ba)', 'short_description': 'if… then'}, {'id': 3, 'title': '場合は (baai wa)', 'short_description': 'in the event of'}, {'id': 4, 'title': 'ばかり (bakari)', 'short_description': 'only, nothing but'}, {'id': 5, 'title': 'だけで (dake de)', 'short_description': 'just by'}, {'id': 6, 'title': 'だす (dasu)', 'short_description': 'to suddenly begin, to suddenly appear'}, {'id': 7, 'title': 'でも (demo)', 'short_description': 'or something'}, {'id': 8, 'title': 'でございます (de gozaimasu)', 'short_description': 'to be (honorific)'}, {'id': 9, 'title': 'がる (garu)', 'short_description': 'to show signs of, to feel, to think'}, {'id': 10, 'title': 'がする (ga suru)', 'short_description': 'smell, hear, taste'}, {'id': 11, 'title': 'ごろ (goro)', 'short_description': 'around, about'}, {'id': 12, 'title': 'ございます (gozaimasu)', 'short_descript

In [11]:
# how to turn back to string
LESSON_LIST_STR = json.dumps(LESSON_LIST_DICTS, ensure_ascii=False)
print(LESSON_LIST_STR)

[{"id": 0, "title": "あまり～ない (amari~nai)", "short_description": "not very, not much"}, {"id": 1, "title": "あとで (ato de)", "short_description": "after, later"}, {"id": 2, "title": "ば (ba)", "short_description": "if… then"}, {"id": 3, "title": "場合は (baai wa)", "short_description": "in the event of"}, {"id": 4, "title": "ばかり (bakari)", "short_description": "only, nothing but"}, {"id": 5, "title": "だけで (dake de)", "short_description": "just by"}, {"id": 6, "title": "だす (dasu)", "short_description": "to suddenly begin, to suddenly appear"}, {"id": 7, "title": "でも (demo)", "short_description": "or something"}, {"id": 8, "title": "でございます (de gozaimasu)", "short_description": "to be (honorific)"}, {"id": 9, "title": "がる (garu)", "short_description": "to show signs of, to feel, to think"}, {"id": 10, "title": "がする (ga suru)", "short_description": "smell, hear, taste"}, {"id": 11, "title": "ごろ (goro)", "short_description": "around, about"}, {"id": 12, "title": "ございます (gozaimasu)", "short_descript

In [12]:
my_list = [1,2,3,4,5]

print(np.random.choice(my_list, size = 1, replace = False))

[5]


Test Variety Lesson Selector

In [13]:
lesson_selector_chain = VarietyLessonSelectorChain()

In [14]:
class PrintHandler(BaseCallbackHandler):

    def on_text(self, text: str, **kwargs):

        print(text)

print_handler = PrintHandler()

In [15]:
res = lesson_selector_chain.run(
    {
        "conversation": samp_conversation_str,
        "lesson_list": LESSON_LIST_DICTS,
    },
    callbacks = [print_handler],
)
print(res)

Prompt after formatting:
[32;1m[1;3mSystem: You are an assistant to a student learning Japanese.
A conversation will be provided between the student and the practice buddy.

You will also be provided a list of lessons in a json-like format.
Each lesson will have the keys: "id", "title" and "short_description".

Your task is to select 2 lessons which could be highlighted when providing feedback to the student.
Specify the lessons you selected by providing the ids in a list.

Respond only with the list of ids. Don't add anything else to your response.

Prioritize lessons which the student may have used wrongly.
Otherwise, if the student use correct grammar, 
select lessons which could be used in alternate sentence constructions
or for conveying similar ideas.

Focus on the student's most recent response.
Only refer to his previous messages if absolutely necessary.

LESSON LIST:
[{"id": 9, "title": "がる (garu)", "short_description": "to show signs of, to feel, to think"}, {"id": 16, "tit

Test lesson_docs_from_ids

In [16]:
%time
print(lesson_docs_from_ids([72, 103, 62]))

CPU times: total: 0 ns
Wall time: 0 ns
LESSON 1:
Learn JLPT N4 Grammar: たところ (ta tokoro)

July 18, 2015

  
Learn Japanese N4 Grammar

Click on image to view full size.

Meaning:
 just finished doing; was just doing

Formation:

Verb-casual, past + ところ

Example sentences:

There are 31 example sentences available for this grammar point.

たった今、病院から出てきたところなんです。
We just came from the hospital.
tatta ima, byouin kara dete kita tokoro nan desu.

彼は十分ばかりまえに帰ってきたところです。
He came back not more than ten minutes ago.
kare wa juppun bakari mae ni kaette kita tokoro desu.

どのみち、もうあの男には退屈していたところだしな。
I was growing bored with that man, anyway.
dono michi, mou ano otoko ni wa taikutsu shite ita tokoro da shi na.

ずいぶん長くかかったのね。心配になっていたところよ。
You were gone a long time. I was starting to worry.
zuibun nagaku kakatta no ne. shinpai ni natte ita tokoro yo.

その内容について話し合いたいと思っていたところだ。
I’ve been wanting to discuss it with you.
sono naiyou ni tsuite hanashiaitai to omotte ita tokoro da.

Click here
 to download J

Test Referential Critic

In [17]:
referential_critic = ReferentialCriticChain()

In [18]:
test_conversation = '''
STUDENT: こんにちは！今から日本語の練習します。
PRACTICE BUDDY: こんにちは！それは素晴らしいですね！日本語の練習にはどのようなことをしたいですか？文法や単語の練習、会話の練習、それとも他の何かですか？
STUDENT: 今日は土曜日だよ。たくさんねました。11時に起きました。
'''.strip()

In [20]:
critic_response = referential_critic.run(
    {"conversation": test_conversation},
)
print(critic_response)

The sentence "今日は土曜日だよ。たくさんねました。11時に起きました。" is grammatically correct and understandable. However, it seems like you wanted to express that you slept a lot today and woke up at 11 o'clock. To express this more naturally, you could use the grammar point "ことができる" to say "I was able to sleep a lot today" and "I was able to wake up at 11 o'clock." 

For example:
- 今日はたくさん寝ることができました。11時に起きることができました。

Using "ことができる" helps convey the idea of being able to do something, which fits well with your intention of expressing what you were able to do today. Keep up the good work!
