In [1]:
import sys

from loguru import logger
from transformers import AutoModelForCausalLM, AutoTokenizer

from anki_ai.domain.model import Deck

In [2]:
logger.remove()
logger.add(sys.stderr, level="ERROR")

1

In [3]:
deck = Deck("default")

In [4]:
deck.read_txt("../data/Selected Notes.txt")
len(deck)

2241

In [5]:
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

In [6]:
system_msg = """You are an expert in building Anki cards.

The user will share an existing Anki note with you. Your job is to edit it to be more concise, simple, straightforward, and distinct.

Do not describe the changes you made. Reply in this format:
   
Front: [front section of card 1]
Back: [back section of card 1]

When providing commands to be executed in a terminal, please prefix each command with the $ symbol. This helps distinguish executable commands from output or other text. For example:
$ ls -l
$ cd /path/to/directory
$ python script.py"

Note that the $ symbol represents the command prompt and should not be typed when actually executing the commands.
"""

for i in range(10):
    user_msg = f"""Front: {deck[i].front}
    Back: {deck[i].back}
    """

    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]

    tokenized_chat = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
    new_tokens = model.generate(tokenized_chat, max_new_tokens=150)
    gen_text = tokenizer.batch_decode(
        new_tokens[:, tokenized_chat.shape[1] :], skip_special_tokens=True
    )[0]

    print("#######################")
    print(f"Front: {deck[i].front}\nBack: {deck[i].back}")
    print(gen_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What command does create a soft link?
Back: ```bash<br>$ ln -s &lt;file_name&gt; &lt;link_name&gt;<br>```
Front: Create a soft link
Back: $ ln -s <file_name> <link_name>


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash<br>$ ln -s &lt;file_name&gt; &lt;link_name&gt;<br>```
Front: What is the order of file name and link name in the `ln -s` command?
Back: File name, then link name.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What command does extract files from a zip archive?
Back: ```bash<br>$ unzip &lt;file&gt;<br>```
Front: Unzip file
Back: $ unzip <file>

Front: What is the command to view the content of a text file?
Back: $ cat <file>


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What is the command to list the content of a directory?
Back: ```bash<br>$ ls &lt;path&gt;<br>```
Front: List directory content
Back: $ ls <path>


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What is the command to print text to the terminal window?
Back: ```bash<br>$ echo ...<br>```
Front: How do I print text to the terminal?
Back: $ echo


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What is the command to create a new file?
Back: ```bash<br>$ touch ...<br>```
Front: What is the command to create a new file?
Back: $ touch 

Front: What does the touch command do?
Back: Creates a new file or updates the timestamp of an existing file.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What is the command to create a new directory?
Back: ```bash<br>mkdir ...<br>```
Front: Create a new directory
Back: $ mkdir


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What is the command to search text for patterns?
Back: ```bash<br>$ grep ...<br>```
Front: Search text for patterns
Back: $ grep


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


#######################
Front: What is the command to print manual or get help for a command?
Back: ```bash<br>$ man ...<br>```
Front: Man page command
Back: $ man [command name]

Front: What is the purpose of the man command?
Back: To view the manual page of a command
#######################
Front: What is the command to print the current working directory?
Back: ```bash<br>$ pwd&nbsp;<br>```
Front: What command prints the current working directory?
Back: pwd
