In [1]:
import sys
from typing import List, Optional

from loguru import logger

from anki_ai.domain.model import Deck, Note

In [2]:
logger.remove()
logger.add(sys.stderr, level="ERROR")

1

When exporting notes from Anki, select the following options:
- Notes in plain text (.txt)
- Include HTML and media references
- Include tags
- Include deck name
- Include notetype name
- Include unique identifier

This is important to later be able to overwrite the notes in our deck, with the changes we made, without losing all of the statistics. 

In [3]:
deck = Deck()
deck.read_txt("../data/Selected Notes.txt")
print(f"The deck contains {len(deck)} notes")

The deck contains 2241 notes


One issue we need to handle is that our notes might contain HTML and media references, which can confuse the LLM when editing the notes. In our deck, these are quite frequent, because we are using the [Markdown and KaTeX Support (Rework) - Code Highlighting, Math, ... add-on](https://ankiweb.net/shared/info/1786114227). Here is an example:

In [4]:
deck[0].back

'```bash<br>$ ln -s &lt;file_name&gt; &lt;link_name&gt;<br>```'

In [5]:
from io import StringIO
from html.parser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.reset()
        self.strict = False
        self.convert_charrefs= True
        self.text = StringIO()
    def handle_data(self, d):
        self.text.write(d)
    def get_data(self):
        return self.text.getvalue()

def replace_br_with_newline(html_string):
    import re
    return re.sub(r'<br\s*/?>', '\n', html_string)

def strip_tags(html):
    s = MLStripper()
    s.feed(replace_br_with_newline(html))
    return s.get_data()

In [6]:
back_card = deck[0].back
print(f"Original note:\n{back_card}\n")
print(f"Fixed:\n{strip_tags(back_card)}")

Original note:
```bash<br>$ ln -s &lt;file_name&gt; &lt;link_name&gt;<br>```

Fixed:
```bash
$ ln -s <file_name> <link_name>
```


### vLLM

Please run the following command on the terminal to spin up a vLLM server, which we will query in the rest of the notebook.


```bash
vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max_model_len 4096 --chat-template ./template_llama31.jinja
```

We might have to wait about 1 minute before the server is up and reachable.

In [7]:
from vllm import LLM, SamplingParams
from openai import OpenAI

In [8]:
# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

In [9]:
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

In [10]:
def vllm_generate(prompt: str, n_notes: int=10, json_mode: bool=False, tags: Optional[List[str]] = None):
    notes = []

    if tags is not None:
        count = 0
        for note in deck:
            if count >= n_notes:
                continue
            else:
                if tags[0] in note.tags:
                    notes.append(note)
                    count += 1
    else:
        notes = deck[:n_notes]
        
    for note in notes:
        user_msg = f"""Front: {strip_tags(note.front)}
        Back: {strip_tags(note.back)}
        """
    
        messages = [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ]

        if json_mode:
            extra_body={
                "guided_json": Note.schema(),
                "guided_whitespace_pattern": r"[\n\t ]*",
            }
        else:
            extra_body={}

        chat_response = client.chat.completions.create(
            model="meta-llama/Meta-Llama-3.1-8B-Instruct",
            messages=messages,
            temperature=0,
            extra_body=extra_body,
        )
    
        print("#######################")
        print(f"Front: {strip_tags(note.front)}\nBack: {strip_tags(note.back)}")
        if json_mode:
            json_data = chat_response.choices[0].message.content
            new_note = Note.model_validate_json(json_data)
            print(f"Front: {strip_tags(new_note.front)}\nBack: {strip_tags(note.back)}")
        else:
            print(chat_response.choices[0].message.content)

### Few-shot learning

In [11]:
system_msg = """
Optimize this Anki note:
- Concise, simple, distinct
- Follow format rules

Reply in this format:
Front: [edited front]
Back: [edited back]

Terminal commands:
```bash
$ command <placeholder>
```

Code:
```language
code here
```

Use the following placeholders only: <file>, <path>, <link>, <command>.

No explanations.

Example 1:
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
Front: Extract zip files
Back: ```bash
$ unzip <file>
```

Example 2:
Front: What is the command to print manual or get help for a command?
Back: ```bash
$ man ...
```
Front: Get command manual/help
Back: ```bash
$ man <command>
```

Example 3: 
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: Create soft link
Back: ```bash
$ ln -s <file> <link>
```

Example 4:
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: `ln -s` argument order
Back: <file> then <link>
"""

vllm_generate(system_msg, n_notes=5)

#######################
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: Create soft link
Back: ```bash
$ ln -s <file> <link>
```
#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: `ln -s` argument order
Back: <file> then <link>
#######################
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
Front: Extract zip files
Back: ```bash
$ unzip <file>
```
#######################
Front: What is the command to list the content of a directory?
Back: ```bash
$ ls <path>
```
Front: List directory content
Back: ```bash
$ ls <path>
```
#######################
Front: What is the command to print text to the terminal window?
Back: ```bash
$ echo ...
```
Front: Print text to terminal
Back: ```bash
$ echo <text>
```


### Guided JSON

In [12]:
vllm_generate(system_msg, n_notes=10, json_mode=True)

#######################
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: Create soft link
Back: ```bash
$ ln -s <file_name> <link_name>
```
#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: ln -s argument order
Back: ```bash
$ ln -s <file_name> <link_name>
```
#######################
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
Front: Extract zip files
Back: ```bash
$ unzip <file>
```
#######################
Front: What is the command to list the content of a directory?
Back: ```bash
$ ls <path>
```
Front: List directory content
Back: ```bash
$ ls <path>
```
#######################
Front: What is the command to print text to the terminal window?
Back: ```bash
$ echo ...
```
Front: Print text to terminal
Back: ```bash
$ echo ...
```
#######################
Front: What is the command

One issue with the above output is that the LLM sometimes forgets to wrap terminal commands in a code block. We can address that by showing the model some examples of the expected output using the JSON format. This will resemble much closer what the model is expected to generate.

In [13]:
system_msg = """
Optimize this Anki note:
- Concise, simple, distinct
- Follow format rules
- Markdown syntax

Reply in this format:
Front: [edited front]
Back: [edited back]

Terminal commands:
```bash
$ command <placeholder>
```

Code:
```language
code here
```

Use the following placeholders only: <file>, <path>, <link>, <command>.

No explanations.

Return results using this JSON schema:
{
    "title": "Note",
    "type": "object",
    "properties": {
        "front": {"type": "string"},
        "back": {"type": "string"},
    },
    "required": ["front", "back"]
}

Example 1:
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
{ "front": "Extract zip files", "back": "```bash $ unzip <file>```" }

Example 2:
Front: What is the command to print manual or get help for a command?
Back: ```bash
$ man ...
```
{ "front": "Get command manual/help", "back": "```bash $ man <command>```" }

Example 3: 
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
{ "front": "Create soft link", "back": "```bash $ ln -s <file> <link>```" }

Example 4:
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
{ "front": "`ln -s` argument order", "back": "Back: <file> then <link>" }

Example 5:
Front: What is the range of the Leaky ReLU function?
Back: $ [ -0.01, + \infty ] $
{ "front": "Leaky ReLU range", "back": "$ [-0.01, +\infty] $" }
"""

vllm_generate(system_msg, n_notes=10, json_mode=True)

#######################
Front: What command does create a soft link?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: Create soft link
Back: ```bash
$ ln -s <file_name> <link_name>
```
#######################
Front: In the `ln -s` command, what is the order of file name and link name?
Back: ```bash
$ ln -s <file_name> <link_name>
```
Front: `ln -s` argument order
Back: ```bash
$ ln -s <file_name> <link_name>
```
#######################
Front: What command does extract files from a zip archive?
Back: ```bash
$ unzip <file>
```
Front: Extract zip files
Back: ```bash
$ unzip <file>
```
#######################
Front: What is the command to list the content of a directory?
Back: ```bash
$ ls <path>
```
Front: List directory content
Back: ```bash
$ ls <path>
```
#######################
Front: What is the command to print text to the terminal window?
Back: ```bash
$ echo ...
```
Front: Print text to terminal
Back: ```bash
$ echo ...
```
#######################
Front: What is the comma

This looks much better!

### Other tags

Let's check how the model performs with other type of notes.

In [14]:
vllm_generate(system_msg, n_notes=10, json_mode=True, tags=["python"])

#######################
Front: Why can't lists be used as keys in a dictionary?
Back: Lists are not hashable
Front: Why can't lists be used as keys in a dictionary?
Back: Lists are not hashable
#######################
Front: What are the two requirements for a Python object to be hashable?
Back: * Be immutable* Have implemented `__hash__`
Front: Hashable Python object requirements
Back: * Be immutable* Have implemented `__hash__`
#######################
Front: "What does gc.collect() do?"
Back: It removes all objects without an active reference
Front: gc.collect() purpose
Back: It removes all objects without an active reference
#######################
Front: What function does scan a string for a regex match?
Back: ```python
>>> re.search(<regex>, <string>)
```
Front: Scan string for regex match
Back: ```python
>>> re.search(<regex>, <string>)
```
#######################
Front: What are the two possible objects `re.search()` can return?
Back: Either a match object or `None`

```python


In [15]:
vllm_generate(system_msg, n_notes=10, json_mode=True, tags=["dl"])

#######################
Front: What is the formula to compute the L2-norm?
Back: $ \|\boldsymbol{x}\|_2 = \sqrt{ \sum{x_i^2}} $
Front: L2-norm formula
Back: $ \|\boldsymbol{x}\|_2 = \sqrt{ \sum{x_i^2}} $
#######################
Front: When implementing Label Smoothing, what are we replacing the target 1s with?
Back: Assuming a multi-label classification problem with $ \kappa $ output values, the training target 1 are replaced with  $ 1-\epsilon $
Front: Label Smoothing target replacement
Back: Assuming a multi-label classification problem with $ \kappa $ output values, the training target 1 are replaced with  $ 1-\epsilon $
#######################
Front: Generally speaking, what is the purpose of applying Label Smoothing?
Back: Reduce the network’s over-confidence
Front: Label Smoothing purpose
Back: Reduce the network’s over-confidence
#######################
Front: What is the range of the Leaky ReLU function?
Back: $ [ -0.01, + \infty ] $
Front: Leaky ReLU range
Back: $ [ -0.01, + \