In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
from typing import List, Optional

from loguru import logger

from anki_ai.domain.model import Deck, Note

In [3]:
logger.remove()
logger.add(sys.stderr, level="ERROR")

1

When exporting notes from Anki, select the following options:
- Notes in plain text (.txt)
- Include HTML and media references: these are important for the indentation inside a code blocks and to display images
- Include tags
- Include deck name
- Include notetype name
- Include unique identifier

This is important so that we can later overwrite the notes in our deck with the changes we made without losing all of the statistics. 

In [4]:
deck = Deck()
deck.read_txt("../data/Selected Notes v8.txt", exclude_tags=["personal"])
print(f"The deck contains {len(deck)} notes")

The deck contains 2854 notes


To ease reading, we can create a small class to strip out HTML tags. This will be handy as most of our notes contain HTML tags since we are using the [Markdown and KaTeX Support (Rework) - Code Highlighting, Math, ... add-on](https://ankiweb.net/shared/info/1786114227). Here is an example:

In [5]:
deck[10].back

'```bash<br>mkdir ...<br>```'

In [6]:
from html.parser import HTMLParser
from io import StringIO


class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs = True
        self.text = StringIO()

    def handle_data(self, d):
        self.text.write(d)

    def get_data(self):
        return self.text.getvalue()


def replace_br_with_newline(html_string):
    import re

    return re.sub(r"<br\s*/?>", "\n", html_string)


def strip_tags(html):
    s = MLStripper()
    s.feed(replace_br_with_newline(html))
    return s.get_data()

In [7]:
back_card = deck[10].back
print(f"Original note:\n{back_card}\n")
print(f"Fixed:\n{strip_tags(back_card)}")

Original note:
```bash<br>mkdir ...<br>```

Fixed:
```bash
mkdir ...
```


### vLLM

Please run the following command on the terminal to spin up a vLLM server, which we will query in the rest of the notebook.


```bash
vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max_model_len 4096 --chat-template ./template_llama31.jinja
```

We might have to wait about 1 minute before the server is up and reachable.

In [8]:
from anki_ai.service_layer.services import get_chat_completion

In [9]:
chat = get_chat_completion()

In [10]:
def vllm_generate(
    prompt: str,
    n_notes: int = 10,
    json_mode: bool = False,
    tags: Optional[List[str]] = None,
):
    notes = []

    if tags is not None:
        count = 0
        for note in deck:
            if count >= n_notes:
                continue
            else:
                if tags[0] in note.tags:
                    notes.append(note)
                    count += 1
    else:
        notes = deck[:n_notes]

    for note in notes:
        user_msg = f"""Front: {strip_tags(note.front)}
        Back: {strip_tags(note.back)}
        """

        messages = [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ]

        if json_mode:
            extra_body = {
                "guided_json": Note.schema(),
                "guided_whitespace_pattern": r"[\n\t ]*",
            }
        else:
            extra_body = {}

        chat_response = chat.create(
            model="meta-llama/Meta-Llama-3.1-8B-Instruct",
            messages=messages,
            temperature=0,
            extra_body=extra_body,
        )

        print("#######################")
        print(f"Front: {strip_tags(note.front)}\nBack: {strip_tags(note.back)}")
        if json_mode:
            json_data = chat_response.choices[0].message.content
            new_note = Note.model_validate_json(json_data)
            print(f"Front: {strip_tags(new_note.front)}\nBack: {strip_tags(note.back)}")
        else:
            print(chat_response.choices[0].message.content)

### Few-shot learning

In [11]:
system_msg = """
Optimize this Anki note:
- Concise, simple, distinct
- Follow format rules

Reply in this format:
Front: [edited front]
Back: [edited back]

Terminal commands:
```bash
$ command <placeholder>
```

Code:
```language
code here
```

Use the following placeholders only: <file>, <path>, <link>, <command>.

No explanations.

Example 1:
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
Front: Extract zip files
Back: ```bash
$ unzip <file>
```

Example 2:
Front: What is the command to print manual or get help for a command?
Back: ```bash
$ man ...
```
Front: Get command manual/help
Back: ```bash
$ man <command>
```

Example 3: 
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: Create soft link
Back: ```bash
$ ln -s <file> <link>
```

Example 4:
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: `ln -s` argument order
Back: <file> then <link>
"""

vllm_generate(system_msg, n_notes=5, json_mode=False)

#######################
Front: ""
Back: Headboard
Front: What is the name of the piece of furniture at the foot of a bed?
Back: Headboard
#######################
Front: ""
Back: Towel
Front: What is the answer to the ultimate question of life, the universe, and everything?
Back: 42, Towel
#######################
Front: ""
Back: Jug
Front: Empty string
Back: A jug
#######################
Front: Command to create a soft link
Back: ```bash
$ ln -s <file> <link>
```
Front: Create a soft link
Back: ```bash
$ ln -s <file> <link>
```
#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: `ln -s` argument order
Back: <file> then <link>


### Guided JSON

In [12]:
vllm_generate(system_msg, n_notes=10, json_mode=True)

#######################
Front: ""
Back: Headboard
Front: Headboard
Back: Headboard
#######################
Front: ""
Back: Towel
Front: What is the answer to the ultimate question of life, the universe, and everything?
Back: Towel
#######################
Front: ""
Back: Jug
Front: Jug
Back: Jug
#######################
Front: Command to create a soft link
Back: ```bash
$ ln -s <file> <link>
```
Front: Create soft link
Back: ```bash
$ ln -s <file> <link>
```
#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: ln -s argument order
Back: ```bash
$ ln -s <file_name> <link_name>
```
#######################
Front: In the `zip` command, what is the option to specify the destination?
Back: "```bash
$ unzip <file> -d <path>
```

"
Front: In the `zip` command, what is the option to specify the destination?
Back: "```bash
$ unzip <file> -d <path>
```

"
#######################
Front: What comm

One issue with the above output is that the LLM sometimes forgets to wrap terminal commands in a code block. We can address that by showing the model some examples of the expected output using the JSON format. This will resemble much closer what the model is expected to generate.

In [13]:
system_msg = """
Optimize this Anki note:
- Concise, simple, distinct
- Follow format rules
- Markdown syntax

Reply in this format:
Front: [edited front]
Back: [edited back]

Terminal commands:
```bash
$ command <placeholder>
```

Code:
```language
code here
```

Use the following placeholders only: <file>, <path>, <link>, <command>.

No explanations.

Return results using this JSON schema:
{
    "title": "Note",
    "type": "object",
    "properties": {
        "front": {"type": "string"},
        "back": {"type": "string"},
    },
    "required": ["front", "back"]
}

Example 1:
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
{ "front": "Extract zip files", "back": "```bash $ unzip <file>```" }

Example 2:
Front: What is the command to print manual or get help for a command?
Back: ```bash
$ man ...
```
{ "front": "Get command manual/help", "back": "```bash $ man <command>```" }

Example 3: 
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
{ "front": "Create soft link", "back": "```bash $ ln -s <file> <link>```" }

Example 4:
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
{ "front": "`ln -s` argument order", "back": "Back: <file> then <link>" }

Example 5:
Front: What is the range of the Leaky ReLU function?
Back: $ [ -0.01, + \infty ] $
{ "front": "Leaky ReLU range", "back": "$ [-0.01, +\infty] $" }
"""

vllm_generate(system_msg, n_notes=10, json_mode=True)

#######################
Front: ""
Back: Headboard
Front: Furniture part at the top of a bed
Back: Headboard
#######################
Front: ""
Back: Towel
Front: 
Back: Towel
#######################
Front: ""
Back: Jug
Front: 
Back: Jug
#######################
Front: Command to create a soft link
Back: ```bash
$ ln -s <file> <link>
```
Front: Create soft link
Back: ```bash
$ ln -s <file> <link>
```
#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: ln -s argument order
Back: ```bash
$ ln -s <file_name> <link_name>
```
#######################
Front: In the `zip` command, what is the option to specify the destination?
Back: "```bash
$ unzip <file> -d <path>
```

"
Front: zip destination option
Back: "```bash
$ unzip <file> -d <path>
```

"
#######################
Front: What command extracts files from a zip archive?
Back: ```bash
$ unzip <file>
```
Front: Extract zip files
Back: ```

This looks much better!

### Other tags

Let's check how the model performs with other type of notes.

In [14]:
vllm_generate(system_msg, n_notes=10, json_mode=True, tags=["python"])

#######################
Front: Why can't lists be used as keys in a dictionary?
Back: Lists are not hashable
Front: Why can't lists be used as keys in a dictionary?
Back: Lists are not hashable
#######################
Front: What are the two requirements for a Python object to be hashable?
Back: * Be immutable* Have implemented `__hash__`
Front: Hashable Python object requirements
Back: * Be immutable* Have implemented `__hash__`
#######################
Front: What is the command to run the garbage collector?
Back: ```python
>>> import gc
>>> del session_past_ids, items_first_ts_df
>>> gc.collect()
```
Front: Run garbage collector
Back: ```python
>>> import gc
>>> del session_past_ids, items_first_ts_df
>>> gc.collect()
```
#######################
Front: What does gc.collect() do?
Back: It removes all objects without an active reference
Front: gc.collect()
Back: It removes all objects without an active reference
#######################
Front: What function scans a string for a regex ma

In [15]:
vllm_generate(system_msg, n_notes=10, json_mode=True, tags=["dl"])

#######################
Front: What is the formula to compute the L2-norm?
Back: $ \|\boldsymbol{x}\|_2 = \sqrt{ \sum{x_i^2}} $
Front: L2-norm formula
Back: $ \|\boldsymbol{x}\|_2 = \sqrt{ \sum{x_i^2}} $
#######################
Front: When implementing Label Smoothing, what are we replacing the target 1s with?
Back: Assuming a multi-label classification problem with $ \kappa $ output values, the training target 1 are replaced with  $ 1-\epsilon $
Front: Label Smoothing target replacement
Back: Assuming a multi-label classification problem with $ \kappa $ output values, the training target 1 are replaced with  $ 1-\epsilon $
#######################
Front: Generally speaking, what is the purpose of applying Label Smoothing?
Back: Reduce the network’s over-confidence
Front: Label Smoothing purpose
Back: Reduce the network’s over-confidence
#######################
Front: What is the range of the Leaky ReLU function?
Back: $ [ -0.01, + \infty ] $
Front: Leaky ReLU range
Back: $ [ -0.01, + \

In [16]:
!pip install termcolor

Defaulting to user installation because normal site-packages is not writeable
[0m

In [17]:
def improve_note(note):
    system_msg = """
    Optimize this Anki note:
    - Concise, simple, distinct
    - Follow format rules
    - Markdown syntax
    
    Reply in this format:
    Front: [edited front]
    Back: [edited back]
    
    Terminal commands:
    ```bash
    $ command <placeholder>
    ```
    
    Code:
    ```language
    code here
    ```
    
    Use the following placeholders only: <file>, <path>, <link>, <command>.
    
    No explanations.
    
    Return results using this JSON schema:
    {
        "title": "Note",
        "type": "object",
        "properties": {
            "front": {"type": "string"},
            "back": {"type": "string"},
        },
        "required": ["front", "back"]
    }
    
    Example 1:
    Front: What command does extract files from a zip archive?
    Back: ```bash
    $ unzip <file>
    ```
    { "front": "Extract zip files", "back": "```bash $ unzip <file>```" }
    
    Example 2:
    Front: What is the command to print manual or get help for a command?
    Back: ```bash
    $ man ...
    ```
    { "front": "Get command manual/help", "back": "```bash $ man <command>```" }
    
    Example 3: 
    Front: What command does create a soft link?
    Back: ```bashq
    $ ln -s <file_name> <link_name>
    ```
    { "front": "Create soft link", "back": "```bash $ ln -s <file> <link>```" }
    
    Example 4:
    Front: In the `ln -s` command, what is the order of file name and link name?
    Back: ```bash
    $ ln -s <file_name> <link_name>
    ```
    { "front": "`ln -s` argument order", "back": "Back: <file> then <link>" }
    
    Example 5:
    Front: What is the range of the Leaky ReLU function?
    Back: $ [ -0.01, + \infty ] $
    { "front": "Leaky ReLU range", "back": "$ [-0.01, +\infty] $" }
    """
    user_msg = f"""Front: {strip_tags(note.front)}
    Back: {strip_tags(note.back)}
    """

    messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ]

    chat_response = chat.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct",
        messages=messages,
        temperature=0,
        extra_body={
            "guided_json": Note.schema(),
            "guided_whitespace_pattern": r"[\n\t ]*",
        },
    )

    json_data = chat_response.choices[0].message.content
    return Note.model_validate_json(json_data)

In [18]:
import difflib

from termcolor import colored


def github_style_diff(text1, text2):
    """
    Generate a GitHub-style colored diff between two strings.

    Args:
    text1 (str): The first string to compare.
    text2 (str): The second string to compare.

    Returns:
    str: A string containing the GitHub-style colored diff.
    """

    def color_text(text, bg_color):
        return colored(text, "black", bg_color, attrs=["bold"])

    sm = difflib.SequenceMatcher(None, text1, text2)
    line1 = []
    line2 = []

    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            line1.append(color_text(text1[i1:i2], "on_red"))
            line2.append(color_text(text2[j1:j2], "on_green"))
        elif op == "delete":
            line1.append(color_text(text1[i1:i2], "on_light_red"))
        elif op == "insert":
            line2.append(color_text(text2[j1:j2], "on_light_green"))
        elif op == "replace":
            line1.append(color_text(text1[i1:i2], "on_light_red"))
            line2.append(color_text(text2[j1:j2], "on_light_green"))

    print(f"- {''.join(line1)}\n+ {''.join(line2)}")

In [19]:
for note in deck[:10]:
    other = improve_note(note)
    github_style_diff(strip_tags(note.front), replace_br_with_newline(other.front))
    github_style_diff(strip_tags(note.back), replace_br_with_newline(other.back))
    print("\n")

- [1m[101m[30m""[0m
+ [1m[102m[30mHeadboard[0m
- [1m[41m[30mHeadboard[0m
+ [1m[42m[30mHeadboard[0m


- [1m[101m[30m""[0m
+ [1m[102m[30mTowel[0m
- [1m[101m[30mTowel[0m
+ 


- [1m[101m[30m""[0m
+ [1m[102m[30mWhat is the name of a container made of leather?[0m
- [1m[41m[30mJug[0m
+ [1m[42m[30mJug[0m


- [1m[41m[30mC[0m[1m[101m[30mommand to c[0m[1m[41m[30mreate[0m[1m[101m[30m a[0m[1m[41m[30m soft link[0m
+ [1m[42m[30mC[0m[1m[42m[30mreate[0m[1m[42m[30m soft link[0m
- [1m[41m[30m```bash[0m[1m[101m[30m
[0m[1m[41m[30m$ ln -s <file> <link>[0m[1m[101m[30m
[0m[1m[41m[30m```[0m
+ [1m[42m[30m```bash[0m[1m[102m[30m [0m[1m[42m[30m$ ln -s <file> <link>[0m[1m[42m[30m```[0m


- [1m[101m[30mIn the `[0m[1m[41m[30mln -s[0m[1m[101m[30m`[0m[1m[41m[30m [0m[1m[101m[30mco[0m[1m[41m[30mm[0m[1m[101m[30mma[0m[1m[41m[30mn[0m[1m[101m[30md, wha[0m[1m[41m[30mt[0m[1m[

We lose the indentation information when generating a new note from JSON. 

### Using our library

In [20]:
from anki_ai.service_layer.services import format_note

for i in range(10):
    new_note = format_note(deck[i], chat)
    print(f"Front: {new_note.front}\nBack: {replace_br_with_newline(new_note.back)}\n")

Front: Headboard
Back: A piece of furniture placed at the top of a bed.

Front: Towel
Back: Towel

Front: 
Back: Jug

Front: Create soft link
Back: ```bash
$ ln -s <file> <link>
```

Front: ln -s argument order
Back: ```bash
$ ln -s <file> <link>
```

Front: Zip destination option
Back: ```bash
$ unzip <file> -d <path>
```

Front: Extract zip files
Back: ```bash
$ unzip <file>
```

Front: List directory content
Back: ```bash
$ ls <path>
```

Front: Print text to terminal
Back: ```bash
$ echo <text>
```

Front: Create new file
Back: ```bash
$ touch <file>
```

