## Tutorial for using string edit distance metrics
Yarden 11/2023

In [21]:
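import sys
sys.path.append('..')  # make the parent directory importable from this notebook
# import only the two functions used in this tutorial
from helper_functions_song_sequence.string_metrics import levenshtein, char_error_rate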

### The Levenshtein distance 
Given two strings, this metric counts the minimum number of single-character edits (insertions, deletions, substitutions) needed to convert one string into the other. See more in this [link to Wikipedia](https://www.wikiwand.com/en/Levenshtein_distance).

The function levenshtein(string1, string2), defined in [string_metrics.py](https://github.com/NeuralSyntaxLab/birdsong_behavior_analysis/blob/main/helper_functions_song_sequence/string_metrics.py), implements the Levenshtein distance.
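To make the edit counting concrete, here is a minimal sketch of the standard dynamic-programming approach to the Levenshtein distance. The name `levenshtein_sketch` is hypothetical and chosen so that it does not shadow the imported function. This is not necessarily how `string_metrics.py` implements it, only an illustration of the idea.

```python
def levenshtein_sketch(source, target):
    """Illustrative edit distance (hypothetical helper, not the repository's code)."""
    # previous[j] holds the distance between the current prefix of `source`
    # and target[:j]; it starts as the cost of building target[:j] from ''.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning source[:i] into '' takes i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)  # free if chars match
            deletion = previous[j] + 1       # drop s_char from source
            insertion = current[j - 1] + 1   # add t_char to source
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein_sketch('Brownies', 'Frown'))  # 4
```

For 'Brownies' and 'Frown' the four edits are one substitution (B to F) and three deletions (i, e, s).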

### Examples:

In [22]:
print(levenshtein('Brownies','Frown'))

4


### Character error rate
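Because strings can have different lengths, the raw Levenshtein distance is hard to compare across pairs: 3 edits is a large change for a 6-character string but a small one for a 60-character string. Instead, we normalize the Levenshtein distance by the length of the second string (the target). The outputs below are consistent with this choice: for 'kitten' and 'sitting' the distance is 3 and the target has 7 characters, giving 3/7 ≈ 0.43. This creates a metric that we term **The Character Error Rate**. When the target is empty and the source is not, the normalization is undefined (division by zero).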

The function char_error_rate(string1, string2), defined in [string_metrics.py](https://github.com/NeuralSyntaxLab/birdsong_behavior_analysis/blob/main/helper_functions_song_sequence/string_metrics.py), implements this metric.
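The normalization step itself is a single division. The sketch below assumes the distance is divided by the length of the second string (the target), which is what the outputs in the examples that follow suggest (e.g. 3/7 for 'kitten' and 'sitting'). The name `char_error_rate_sketch`, its arguments, and the choice to raise `ValueError` for an empty target are assumptions made for illustration, not the signature or behavior of the function in `string_metrics.py`.

```python
def char_error_rate_sketch(distance, target_length):
    """Illustrative normalization (hypothetical helper, not the repository's code).

    distance: Levenshtein distance between source and target
    target_length: number of characters in the target string
    """
    if target_length == 0:
        if distance == 0:
            # Both strings are empty: no edits, so the rate is 0.
            return 0.0
        # Assumed behavior: a non-empty source against an empty target
        # cannot be normalized because it would divide by zero.
        raise ValueError('target is empty but source is not')
    return distance / target_length


# Distances taken from the test cases used in this tutorial
print(char_error_rate_sketch(3, len('sitting')))       # 0.42857142857142855
print(char_error_rate_sketch(6, len('frankenstein')))  # 0.5
```

Note that the rate is not bounded by 1 under this assumption: a long source compared against a short target can need more edits than the target has characters.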

### Examples:

In [26]:
LEV_PARAMETRIZE = [
# adapted from https://github.com/toastdriven/pylev/blob/master/tests.py
    ("kitten", "sitting", 3),
    ("kitten", "kitten", 0),
    ("", "", 0),
    ("kitten", "", 6),
    ("", "sitting", 7),
    ("meilenstein", "levenshtein", 4),
    ("levenshtein", "frankenstein", 6),
    ("confide", "deceit", 6),
    ("CUNsperrICY", "conspiracy", 8),
    # case added to catch failure with our previous implementation from
    # https://en.wikibooks.org/wiki/Talk:Algorithm_Implementation/Strings/Levenshtein_('aabcc', 'bccdd', 4),
]
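# `expected` is the reference Levenshtein distance for each pair; it is not used in this loop
for source, target, expected in LEV_PARAMETRIZE:
    if len(target) == 0 and len(source) != 0:
        # an empty target with a non-empty source has no defined error rate
        ser = 'value error'
    else:
        ser = char_error_rate(source, target)
    print(f'source: {source}, target: {target}\n\tvak: {ser}')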


source: kitten, target: sitting
	vak: 0.42857142857142855
source: kitten, target: kitten
	vak: 0.0
source: , target: 
	vak: 0.0
source: kitten, target: 
	vak: value error
source: , target: sitting
	vak: 1.0
source: meilenstein, target: levenshtein
	vak: 0.36363636363636365
source: levenshtein, target: frankenstein
	vak: 0.5
source: confide, target: deceit
	vak: 1.0
source: CUNsperrICY, target: conspiracy
	vak: 0.8
