# The Macronizer Class

A new `Macronizer` object takes a range of initialization arguments, all of them optional. Only the first two are intended to be changed by the user:

- `macronize_everything=True` determines whether to also mark dichrona whose length can be inferred from accent rules (should be `False` for a human audience).
- `unicode=False` determines whether the output uses human-friendly Unicode combining diacritics or machine-friendly non-combining carets and underscores. The evaluation methods are only available for the latter.

Here's the simplest case possible:

In [None]:
from class_macronizer import Macronizer

macronizer = Macronizer()

input = '''ἀάατος, ἀγαθὸς, καλὸς, ἀνήρ, νεανίας'''
output = macronizer.macronize_text(input)

print(f'Results: {output}')

Macronization took 0.02 seconds
Results: ἀ^ά_α^τος, ἀ^γα^θὸς, κα^λὸς, ἀ^νήρ, νεα_νί^α_ς
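
Because the default output uses plain ASCII marks, it is easy to post-process. The following is a minimal sketch, not part of the `Macronizer` class: `count_marks` is a hypothetical helper, and it assumes that a caret marks a short vowel and an underscore a long one, which is what a comparison with the Unicode output of the same words suggests.

```python
from collections import Counter


def count_marks(macronized: str) -> dict:
    """Count length marks in caret/underscore output (hypothetical helper).

    Assumption: '^' follows a short vowel and '_' follows a long vowel.
    """
    counts = Counter(ch for ch in macronized if ch in "^_")
    return {"short": counts["^"], "long": counts["_"]}


# The result string printed by the cell above
sample = "ἀ^ά_α^τος, ἀ^γα^θὸς, κα^λὸς, ἀ^νήρ, νεα_νί^α_ς"
print(count_marks(sample))
```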




Let's try it with Unicode diacritics. This option is useful wherever you have access to a font with so-called OpenType ligature instructions, such as [New Athena](https://github.com/SteelWagstaff/new-athena-unicode), that is, a font with precomposed glyphs for adding longa or brevia to Greek letters that already carry other diacritics.

**Important**: this option is only for printing results; to apply any of the evaluation methods, you need to leave it at `unicode=False`.

In [2]:
macronizer = Macronizer(unicode=True)

output = macronizer.macronize_text(input)
print(f'Results: {output}')

Macronization took 0.00 seconds
Results: ἀ̆ά̄ᾰτος, ἀ̆γᾰθὸς, κᾰλὸς, ἀ̆νήρ, νεᾱνί̆ᾱς
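
Comparing the two result lines suggests how the formats relate: each caret corresponds to a breve and each underscore to a macron on the preceding vowel. The sketch below illustrates that mapping with the standard library only. It is an assumption about the correspondence, not the class's own conversion code, and `carets_to_combining` is a hypothetical name.

```python
import unicodedata

COMBINING_BREVE = "\u0306"   # marks a short vowel
COMBINING_MACRON = "\u0304"  # marks a long vowel


def carets_to_combining(text: str, compose: bool = False) -> str:
    """Turn caret/underscore marks into combining diacritics (illustrative only).

    Assumption: '^' means short and '_' means long, and each mark directly
    follows the vowel (plus its other diacritics) that it applies to.
    """
    converted = text.replace("^", COMBINING_BREVE).replace("_", COMBINING_MACRON)
    if compose:
        # NFC merges base letter and mark where a precomposed character exists
        converted = unicodedata.normalize("NFC", converted)
    return converted


print(carets_to_combining("κα^λὸς"))
```

Note that NFC normalization can also replace other characters in the text with their canonical equivalents, so `compose` is off by default here.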




We can evaluate the results with the method `macronization_ratio`, which prints a few statistics and returns a ratio. Since proper names are especially prone to having ambivalent dichrona, you can exclude them from the statistic by passing `count_proper_names=False`.

This time let's try a longer input. Be patient if you include the evaluation: it is O(n) in the length of the input and takes about 25 extra seconds on my machine for this text.

In [5]:
from class_macronizer import Macronizer
from tests.anabasis import anabasis

macronizer = Macronizer()

words = anabasis.split()
print(f'Input is {len(words)} words')

input = anabasis
output = macronizer.macronize_text(input)
with open('tests/anabasis_macronized.txt', 'w', encoding='utf-8') as f:
    f.write(output)

ratio = macronizer.macronization_ratio(input, output)
print(ratio)


Input is 57178 words

Macronization took 3.94 seconds

Counting all dichrona...

Dichrona in open syllables before: 61761
Unmacronized dichrona in open syllables left: 47118

Evaluation took 25.29 seconds
Difference: 14643
0.2370913683392432
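
The printed numbers are consistent with the ratio being the share of open-syllable dichrona that received a mark: the difference between the two counts divided by the count before macronization. Here is a minimal sketch of that arithmetic; `macronized_share` is a hypothetical helper, and the formula is inferred from the output above rather than taken from the implementation of `macronization_ratio`.

```python
def macronized_share(before: int, left: int) -> float:
    """Share of dichrona that were macronized (formula inferred from the output).

    before: dichrona in open syllables in the input
    left:   dichrona in open syllables still unmarked in the output
    """
    if before == 0:
        return 0.0  # nothing to macronize; avoid division by zero
    return (before - left) / before


# Counts taken from the evaluation output above
print(macronized_share(61761, 47118))
```

For the Anabasis run this comes to roughly 0.237, that is, about 24% of the open-syllable dichrona were marked.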


