This notebook runs the `NLP()` pipeline on each available language and prints an example `Word` for each.

If dependency parse information is available, an example tree is printed, too.

In [1]:
from cltk import NLP
from cltk.dependency.tree import DependencyTree
from cltk.languages.example_texts import get_example_text
from cltk.languages.pipelines import *

In [2]:
iso_to_pipeline = {
    "akk": AkkadianPipeline,
    "ang": OldEnglishPipeline,
    "arb": ArabicPipeline,
    "arc": AramaicPipeline,
    "chu": OCSPipeline,
    "cop": CopticPipeline,
    "enm": MiddleEnglishPipeline,
    "frm": MiddleFrenchPipeline,
    "fro": OldFrenchPipeline,
    "gmh": MiddleHighGermanPipeline,
    "got": GothicPipeline,
    "grc": GreekPipeline,
    "hin": HindiPipeline,
    "lat": LatinPipeline,
    "lzh": ChinesePipeline,
    "non": OldNorsePipeline,
    "pan": PanjabiPipeline,
    "pli": PaliPipeline,
    "san": SanskritPipeline,
}

In [3]:
for lang, pipeline in iso_to_pipeline.items():
    print(f"{pipeline.language.name} ('{pipeline.language.iso_639_3_code}') ...")
    # Run the pipeline for this language on its example text.
    text = get_example_text(lang)
    cltk_nlp = NLP(language=lang)
    cltk_doc = cltk_nlp.analyze(text=text)
    # Show the first `Word` of the first sentence.
    word = cltk_doc.sentences[0][0]
    print("Example `Word`:", word)
    # Attempt a tree only if every word in the first sentence has a truthy `features` value.
    if all(w.features for w in cltk_doc.sentences[0]):
        print("Printing dependency tree of first sentence ...")
        try:
            a_tree = DependencyTree.to_tree(cltk_doc.sentences[0])
        except Exception:
            # Treat any failure here as a missing dependency parser for this language.
            print(f"Dependency parsing Process not available for '{lang}'.")
            print("")
            continue
        a_tree.print_tree()
    print("")

Akkadian ('akk') ...
‎𐤀 CLTK version '1.0.11'.
Pipeline for language 'Akkadian' (ISO: 'akk'): `AkkadianTokenizationProcess`, `StopsProcess`.
Example `Word`: Word(index_char_start=0, index_char_stop=2, index_token=0, index_sentence=None, string=('u2-wa-a-ru', 'akkadian'), pos=None, lemma=None, stem=None, scansion=None, xpos=None, upos=None, dependency_relation=None, governor=None, features={}, category={}, stop=False, named_entity=None, syllables=None, phonetic_transcription=None, definition=None)
Printing dependency tree of first sentence ...
Dependency parsing Process not available for 'akk'.

Old English (ca. 450-1100) ('ang') ...
‎𐤀 CLTK version '1.0.11'.
Pipeline for language 'Old English (ca. 450-1100)' (ISO: 'ang'): `MultilingualTokenizationProcess`, `OldEnglishLemmatizationProcess`, `OldEnglishEmbeddingsProcess`, `StopsProcess`, `OldEnglishNERProcess`.
Example `Word`: Word(index_char_start=0, index_char_stop=5, index_token=0, index_sentence=None, string='Hwæt.', pos=None, lemma=

Example `Word`: Word(index_char_start=None, index_char_stop=None, index_token=0, index_sentence=0, string='swa', pos=adverb, lemma='swa', stem=None, scansion=None, xpos='Df', upos='ADV', dependency_relation='advmod', governor=1, features={}, category={F: [neg], N: [pos], V: [pos]}, stop=None, named_entity=None, syllables=None, phonetic_transcription=None, definition=None)
Printing dependency tree of first sentence ...
root | liuhtjai_1/verb
    └─ advmod | swa_0/adverb
    └─ obj | liuhaþ_2/noun
        └─ nmod | izwar_3/adjective
    └─ obl | andwairþja_5/noun
        └─ case | in_4/adposition
        └─ nmod | manne,_6/noun
    └─ advcl | gasaiƕaina_8/verb
        └─ mark | ei_7/subordinating_conjunction
        └─ nsubj | waurstwa_11/noun
            └─ nmod | izwara_9/adjective
            └─ amod | goda_10/adjective
        └─ cc | jah_12/coordinating_conjunction
        └─ conj | hauhjaina_13/verb
            └─ obj | attan_14/noun
                └─ nmod | izwarana_15/adjective
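
The tree above follows from the `governor` and `dependency_relation` fields shown on each `Word`: every token points at the index of its head. The sketch below rebuilds the first seven lines of the Gothic tree from those two fields alone, using only the standard library. It is not CLTK's `DependencyTree` implementation. `Token` and `render_tree` are hypothetical names introduced for illustration, the token data is copied by hand from the output above, and marking the root with a governor of `-1` is an assumption.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for the fields of a CLTK `Word` used here."""
    index: int
    string: str
    pos: str
    relation: str
    governor: int  # index of the head token; -1 marks the root (assumption)


def render_tree(tokens: List[Token]) -> List[str]:
    """Return the tree as indented lines, one per token, heads before dependents."""
    # Group tokens by the index of their head, preserving sentence order.
    children = defaultdict(list)
    for tok in tokens:
        children[tok.governor].append(tok)

    lines: List[str] = []

    def walk(tok: Token, depth: int) -> None:
        prefix = "" if depth == 0 else "    " * depth + "└─ "
        lines.append(f"{prefix}{tok.relation} | {tok.string}_{tok.index}/{tok.pos}")
        for child in children[tok.index]:
            walk(child, depth + 1)

    for root in children[-1]:
        walk(root, 0)
    return lines


# First seven tokens of the Gothic example, copied from the printed tree.
tokens = [
    Token(0, "swa", "adverb", "advmod", 1),
    Token(1, "liuhtjai", "verb", "root", -1),
    Token(2, "liuhaþ", "noun", "obj", 1),
    Token(3, "izwar", "adjective", "nmod", 2),
    Token(4, "in", "adposition", "case", 5),
    Token(5, "andwairþja", "noun", "obl", 1),
    Token(6, "manne,", "noun", "nmod", 5),
]

lines = render_tree(tokens)
print("\n".join(lines))
```

Grouping dependents by head index first keeps the traversal a plain depth-first walk, and iterating tokens in sentence order yields the same sibling order as the printed tree.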
