# Custom Classes

Some rules may not fit into the existing classes, or you may prefer to implement your own.

This is fairly simple and will be explained in this notebook.

## Normalization

### Structure

For a class to be considered a "normalization" class, all it needs to do is provide a `normalize` method with the following signature:

```python
def normalize(self, text: str) -> str
```

E.g.

```python
class MyCustomNormalizer:
    def normalize(self, text):
        return text.strip().lower().replace('apples', 'oranges')
```

This can be used without any need for `benchmarkstt`. E.g.

```python
normalizer = MyCustomNormalizer()

print(normalizer.normalize("Comparing apples to oranges"))
```

```
comparing oranges to oranges
```
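The contract is duck-typed: any object with a suitable `normalize` method qualifies, and no base class is needed. If you want a type checker or an `isinstance` check to know about this contract, one option is a `typing.Protocol` (Python 3.8+). The name `Normalizer` below is hypothetical and does not come from `benchmarkstt`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Normalizer(Protocol):
    """Hypothetical protocol describing the normalize() contract above."""

    def normalize(self, text: str) -> str: ...


class MyCustomNormalizer:
    def normalize(self, text: str) -> str:
        return text.strip().lower().replace('apples', 'oranges')


# runtime_checkable only checks that a `normalize` attribute exists,
# not its signature or return type.
print(isinstance(MyCustomNormalizer(), Normalizer))  # True
print(isinstance("not a normalizer", Normalizer))    # False (str has no normalize method)
```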


### Usage

A normalizer instance can be passed directly to input classes such as `PlainText`.

```python
from benchmarkstt.metrics.core import WordDiffs
from benchmarkstt.input.core import PlainText

word_diffs = WordDiffs('ansi')

plaintext_1 = PlainText("Comparing apples to ORANGES", normalizer)
plaintext_2 = PlainText("COMPARING apples to pears", normalizer)

print(word_diffs.compare(plaintext_1, plaintext_2))
```

```
Color key: Unchanged Reference Hypothesis

·comparing·oranges·to·oranges·pears
```

In a terminal, the `ansi` dialect shows reference-only text in red (here `·oranges`) and hypothesis-only text in green (here `·pears`). Unchanged words are not colored.
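To get a feel for what is being compared, the sketch below builds a similar word-level comparison using only the standard library. It normalizes both texts, splits them on whitespace, and diffs the word lists with `difflib.SequenceMatcher`. The helper name `word_opcodes` and the whitespace splitting are assumptions made for illustration. This is not necessarily how `benchmarkstt` implements `WordDiffs`.

```python
import difflib


class MyCustomNormalizer:
    def normalize(self, text):
        return text.strip().lower().replace('apples', 'oranges')


def word_opcodes(ref_text, hyp_text, normalizer):
    """Hypothetical helper: normalize, split into words, return non-equal diff spans."""
    # Normalize first, then split on whitespace (an assumed splitting rule)
    ref = normalizer.normalize(ref_text).split()
    hyp = normalizer.normalize(hyp_text).split()
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Keep only the spans that differ: (tag, ref words, hyp words)
    return [(tag, ref[i1:i2], hyp[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


print(word_opcodes("Comparing apples to ORANGES",
                   "COMPARING apples to pears",
                   MyCustomNormalizer()))
# [('replace', ['oranges'], ['pears'])]
```

Because normalization runs before the diff, the first three words match and only the last word is reported as a replacement.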


## Metrics

### Structure

For a class to be considered a "metrics" class, all it needs to do is provide a `compare` method with the following signature:

```python
def compare(self, ref: benchmarkstt.schema.Schema, hyp: benchmarkstt.schema.Schema) -> Any
```

(`benchmarkstt.schema.Schema` should be treated as an iterable: a metric should only iterate over `ref` and `hyp`.)

E.g.

```python
class IsTheSame:
    def compare(self, ref, hyp):
        return ref == hyp
```

or

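```python
class FirstDifference:
    def compare(self, ref, hyp):
        # Returns (index, ref_item, hyp_item) for the first position where
        # the two sequences differ, or False if they are identical.
        # None stands in for a missing item (that side is exhausted).
        n = -1  # so an empty ref reports the first extra hyp item at index 0
        ihyp = iter(hyp)
        for n, ref_n in enumerate(ref):
            hyp_n = next(ihyp, None)
            if hyp_n != ref_n:
                return (n, ref_n, hyp_n)

        # ref is exhausted: any item left in hyp is an extra (inserted) one
        hyp_n = next(ihyp, None)
        if hyp_n is None:
            return False
        return (n + 1, None, hyp_n)
```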

This can be used and tested directly without any need for `benchmarkstt`. E.g.

```python
is_the_same = IsTheSame()
# Lists rather than iterators: `==` on two iterator objects compares
# identity, so IsTheSame would return False even for identical text
a = "comparing apples to oranges".split()
b = "comparing oranges to pears".split()

print("IsTheSame")
print(is_the_same.compare(a, b))

first_difference = FirstDifference()
print("\nFirstDifference")
print(first_difference.compare(a, b))
```

```
IsTheSame
False

FirstDifference
(1, 'apples', 'oranges')
```
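A metric can return any value, such as a number. The hypothetical `WordEditDistance` metric below is a further sketch. It returns the word-level Levenshtein distance, meaning the number of insertions, deletions and substitutions needed to turn `ref` into `hyp`. It only iterates over its inputs and compares items with `!=`, which are the same operations `FirstDifference` relies on. It is an illustration, not `benchmarkstt`'s own WER implementation.

```python
class WordEditDistance:
    """Hypothetical metric: word-level Levenshtein distance between ref and hyp."""

    def compare(self, ref, hyp):
        # Materialize once: the inputs are only guaranteed to be iterable
        ref, hyp = list(ref), list(hyp)
        # prev[j] = distance between the ref words seen so far and hyp[:j]
        prev = list(range(len(hyp) + 1))
        for i, ref_word in enumerate(ref, start=1):
            curr = [i] + [0] * len(hyp)
            for j, hyp_word in enumerate(hyp, start=1):
                curr[j] = min(
                    prev[j] + 1,                           # deletion
                    curr[j - 1] + 1,                       # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            prev = curr
        return prev[-1]


edit_distance = WordEditDistance()
print(edit_distance.compare("comparing apples to oranges".split(),
                            "comparing oranges to pears".split()))  # 2
```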


### Usage

```python
plaintext_1 = PlainText("Comparing apples to ORANGES", normalizer)
plaintext_2 = PlainText("COMPARING apples to pears", normalizer)

print("IsTheSame")
print(is_the_same.compare(plaintext_1, plaintext_2))

print("\nFirstDifference")
print(first_difference.compare(plaintext_1, plaintext_2))
```

```
IsTheSame
False

FirstDifference
(3, Item({"item": "oranges", "type": "word", "@raw": "oranges"}), Item({"item": "pears", "type": "word", "@raw": "pears"}))
```

The normalizer maps both "apples" and "ORANGES" to "oranges", so the first three words match and the first difference is at index 3.
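`IsTheSame` compares its two arguments with `==`. That compares contents for lists. For iterators, and for input objects that do not define `__eq__` (an assumption about classes such as `PlainText`), Python falls back to identity, so the result is `False` even for identical content. A minimal variant, given the hypothetical name `IsTheSameItems`, materializes both iterables first:

```python
class IsTheSame:
    def compare(self, ref, hyp):
        return ref == hyp


class IsTheSameItems:
    """Hypothetical variant of IsTheSame that compares items, not containers."""

    def compare(self, ref, hyp):
        # Materializing both inputs compares their items in order;
        # lists of different lengths are never equal.
        return list(ref) == list(hyp)


words = "comparing apples to oranges".split()
print(IsTheSame().compare(iter(words), iter(words)))       # False: two distinct iterator objects
print(IsTheSameItems().compare(iter(words), iter(words)))  # True: same words in the same order
```

The trade-off is that both inputs are held in memory at once. For the short transcripts compared here that should not matter.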
