# Pali with CLTK

Here's a quick overview of how you can analyse your Pali texts with <b>CLTK</b>! <br>
Let's begin by setting `USER_PATH`, the home directory under which CLTK stores its data.

In [1]:
import os
USER_PATH = os.path.expanduser('~')

To download Pali texts from CLTK's GitHub repo, we need a corpus importer.

In [2]:
from cltk.corpus.utils.importer import CorpusImporter
pali_downloader = CorpusImporter('pali')

We can now see the corpora available for download by using the `list_corpora` attribute of the importer. Let's go ahead and try it out!

In [3]:
pali_downloader.list_corpora

['pali_text_ptr_tipitaka', 'pali_texts_gretil']

The corpus <i>pali_texts_gretil</i> can be downloaded from the GitHub repo. It will be saved under the `cltk_data/pali` directory inside the `USER_PATH` defined above.

In [4]:
pali_downloader.import_corpus('pali_texts_gretil')
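
Since the download location is predictable, one can check whether the corpus is already on disk before importing it again. The following is a minimal sketch using only the standard library; the helper name `corpus_is_downloaded` and its `base_dir` parameter are assumptions made for illustration and are not part of CLTK.

```python
import os

def corpus_is_downloaded(corpus_name, language="pali", base_dir=None):
    """Return True if <base_dir>/cltk_data/<language>/text/<corpus_name> exists."""
    # Default to the user's home directory, as USER_PATH does above.
    base_dir = base_dir or os.path.expanduser("~")
    corpus_dir = os.path.join(base_dir, "cltk_data", language, "text", corpus_name)
    return os.path.isdir(corpus_dir)

print(corpus_is_downloaded("pali_texts_gretil"))
```

The printed value depends on whether the corpus has already been imported on the machine running the cell.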

You can see the downloaded texts by running the following, or by browsing the `cltk_data/pali/text/pali_texts_gretil` directory.

In [5]:
pali_corpus_path = os.path.join(USER_PATH,'cltk_data/pali/text/pali_texts_gretil')
list_of_texts = [text for text in os.listdir(pali_corpus_path) if '.' not in text]
print(list_of_texts)

['9_phil', '2_parcan', '4_comm', '1_tipit', '3_chron', '6_suanco']
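
The filter `'.' not in text` used above is a heuristic: it would drop a directory whose name contains a dot and keep a file that has no extension. A slightly more robust sketch checks each entry with `os.path.isdir`. The helper name `list_subdirectories` and the demo directory contents below are hypothetical and exist only to make the example runnable without the corpus.

```python
import os
import tempfile

def list_subdirectories(path):
    """Return the names of the directories directly under `path`, sorted."""
    return sorted(
        entry for entry in os.listdir(path)
        if os.path.isdir(os.path.join(path, entry))
    )

# Build a throwaway directory tree to demonstrate the difference.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "1_tipit"))
    os.mkdir(os.path.join(demo_dir, "v1.0_notes"))        # directory with a dot in its name
    open(os.path.join(demo_dir, "README"), "w").close()   # file without an extension
    demo_listing = list_subdirectories(demo_dir)

print(demo_listing)  # ['1_tipit', 'v1.0_notes']
```

With the real corpus, `list_subdirectories(pali_corpus_path)` could replace the list comprehension in the cell above.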


The Digha Nikaya is a Buddhist scripture and one of the five nikayas of the Sutta Pitaka, which is in turn one of the three parts of the Pali Tipitaka. Let us view the contents of the first chapter of the Digha Nikaya.

In [6]:
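pali_text_path = os.path.join(pali_corpus_path, '1_tipit/2_sut/1_digh/dighan1u.txt')
# The GRETIL files are UTF-8 encoded, so read them explicitly as UTF-8
# and let the context manager close the file afterwards.
with open(pali_text_path, 'r', encoding='utf-8') as pali_file:
    pali_text = pali_file.read()
print(pali_text)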



Input by the Sri Lanka Tripitaka Project


[PTS Vol D - 1] [\z D /] [\f I /]    
[PTS Page 001] [\q   1/]     
[BJT Vol D - 1] [\z D /] [\w I /]    
[BJT Page 002] [\x   2/]     

Suttantapiṭake

Dīghanikāyo

Sīlakkhandhavaggo



THIS GRETIL TEXT FILE IS FOR REFERENCE PURPOSES ONLY!
COPYRIGHT AND TERMS OF USAGE AS FOR SOURCE FILE.

Text converted to Unicode (UTF-8).
(This file is to be used with a UTF-8 font and your browser's VIEW configuration
set to UTF-8.)



description:multibyte sequence:
long a  ā   
long A  Ā   
long i  ī   
long I  Ī   
long u  ū   
long U  Ū   
vocalic r  ṛ  
vocalic R  Ṛ  
long vocalic r  ṝ  
vocalic l  ḷ  
vocalic L  Ḷ  
long vocalic l  ḹ  
velar n  ṅ  
velar N  Ṅ  
palatal n  ñ   
palatal N  Ñ   
retroflex t  ṭ  
retroflex T  Ṭ  
retroflex d  ḍ  
retroflex D  Ḍ  
retroflex n  ṇ  
retroflex N  Ṇ  
palatal s  ś   
palatal S  Ś   
retroflex s  ṣ  
retroflex S  Ṣ  
anusvara  ṃ  
visarga  ḥ  
long e  ē   
long o  ō   
l underbar  ḻ  
r underbar  ṟ  
n underba
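
As the output shows, the raw file mixes the Pali text with bracketed page and volume markers such as `[PTS Page 001]`. The sketch below shows one possible way to drop such markers with a regular expression before further analysis. It assumes that everything in square brackets is editorial markup, which may not hold for every GRETIL file; the helper name `strip_reference_markers` is hypothetical, and `sample` is a short excerpt of the output above.

```python
import re

# A short excerpt of the file header shown above.
sample = "\n".join([
    r"[PTS Vol D - 1] [\z D /] [\f I /]",
    r"[PTS Page 001] [\q   1/]",
    "",
    "Suttantapiṭake",
    "",
    "Dīghanikāyo",
])

def strip_reference_markers(text):
    """Remove [...] markers and return the remaining non-empty lines."""
    without_markers = re.sub(r"\[[^\]]*\]", "", text)
    stripped_lines = (line.strip() for line in without_markers.splitlines())
    return [line for line in stripped_lines if line]

cleaned = strip_reference_markers(sample)
print(cleaned)
```

The same function could be applied to `pali_text`, although the boilerplate header lines (copyright notice, diacritics table) would need separate handling.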

## Pali Alphabets

Pali is usually written in the Sinhalese, Brāhmī, Khmer, Burmese or Devanagari script. For the Sinhalese script, CLTK lists 7 independent vowels, 7 dependent vowels and 33 consonants, which are as follows:

In [7]:
from cltk.corpus.pali.alphabet import INDEPENDENT_VOWELS, DEPENDENT_VOWELS, CONSONANTS
print("Independent vowels:",INDEPENDENT_VOWELS)
print("Dependent vowels:",DEPENDENT_VOWELS)
print("Consonants:",CONSONANTS)

Independent vowels: ['අ', 'ආ', 'ඉ', 'ඊ', 'උ', 'එ', 'ඔ']
Dependent vowels: ['ා', 'ි', 'ී', 'ු', 'ූ', 'ෙ', 'ො']
Consonants: ['ක', 'ඛ', 'ග', 'ඝ', 'ඞ', 'ච', 'ඡ', 'ජ', 'ඣ', 'ඤ', 'ට', 'ඨ', 'ඩ', 'ඪ', 'ණ', 'ත', 'ථ', 'ද', 'ධ', 'න', 'ප', 'ඵ', 'බ', 'භ', 'ම', 'ය', 'ර', 'ල', 'ව', 'ස', 'හ', 'ළ', 'අං']
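
The difference between the three groups is also visible at the Unicode level: independent vowels and consonants are letters, while dependent vowels are combining marks that attach to a preceding consonant. The sketch below inspects one character from each list with the standard `unicodedata` module. The code points are written as escapes for clarity, and the two-code-point reading of the last consonant entry is an assumption based on how it is displayed above.

```python
import unicodedata

# One representative character from each list printed above.
samples = {
    "independent vowel": "\u0d85",  # first entry of INDEPENDENT_VOWELS
    "dependent vowel": "\u0dd2",    # the vowel sign for short i
    "consonant": "\u0d9a",          # first entry of CONSONANTS
}

categories = {label: unicodedata.category(ch) for label, ch in samples.items()}
for label, ch in samples.items():
    print(label, "|", unicodedata.name(ch), "|", categories[label])

# The last entry of CONSONANTS appears to be two code points:
# the letter a followed by the anusvara sign.
last_consonant_entry = "\u0d85\u0d82"
print(len(last_consonant_entry))  # 2
```

Categories beginning with `L` are letters and those beginning with `M` are marks, which is a convenient way to tell the groups apart without a lookup table.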


## Transliterations

We can transliterate Pali text into the scripts of other Indic languages. Let us transliterate `අභිරුචිර` from the Sinhalese script (`si`) to the Devanagari script used for Hindi (`hi`):

In [8]:
pali_text_two = 'අභිරුචිර'
from cltk.corpus.sanskrit.itrans.unicode_transliterate import UnicodeIndicTransliterator
UnicodeIndicTransliterator.transliterate(pali_text_two,"si","hi")

'अभिरुचिर'
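
Conceptually, a conversion like this maps each character of one script to its counterpart in the other. The sketch below is not CLTK's implementation: it uses a hand-written table (`SINHALA_TO_DEVANAGARI`, a hypothetical name) that covers only the six distinct characters of this one word, simply to illustrate the idea with `str.translate`.

```python
# Hand-written, partial mapping of Unicode code points (illustration only).
SINHALA_TO_DEVANAGARI = {
    0x0D85: 0x0905,  # independent vowel a
    0x0DB7: 0x092D,  # consonant bha
    0x0DD2: 0x093F,  # dependent vowel sign i
    0x0DBB: 0x0930,  # consonant ra
    0x0DD4: 0x0941,  # dependent vowel sign u
    0x0DA0: 0x091A,  # consonant ca
}

# The word from the cell above, written as code-point escapes.
word = "\u0d85\u0db7\u0dd2\u0dbb\u0dd4\u0da0\u0dd2\u0dbb"

devanagari = word.translate(SINHALA_TO_DEVANAGARI)
print(devanagari)
```

A full transliterator needs a complete table and script-specific rules, which is why the library call above is preferable in practice.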

We can also romanize the text as shown:

In [9]:
from cltk.corpus.sanskrit.itrans.unicode_transliterate import ItransTransliterator
ItransTransliterator.to_itrans(pali_text_two,'si')

'abhiruchira'

Similarly, we can convert a text given in ITRANS transliteration back into an Indic script:

In [10]:
pali_text_itrans = 'devadhammo'
ItransTransliterator.from_itrans(pali_text_itrans,'si')

'දේවධම්මෝ'

## Syllabifier

We can use CLTK's `indian_syllabifier` module to syllabify Pali text. To do this, we first have to import the Sanskrit models as follows. Importing `sanskrit_models_cltk` might take some time.

In [11]:
phonetics_model_importer = CorpusImporter('sanskrit')
phonetics_model_importer.list_corpora
phonetics_model_importer.import_corpus('sanskrit_models_cltk') 

Now we import the syllabifier and syllabify as follows:

In [12]:
%%capture
from cltk.stem.sanskrit.indian_syllabifier import Syllabifier
pali_syllabifier = Syllabifier('sinhalese')
pali_syllables = pali_syllabifier.orthographic_syllabify(pali_text_two)

The syllables of the word stored in `pali_text_two` are:

In [13]:
print(pali_syllables)

['අ', 'භ', 'ි', 'ර', 'ු', 'ච', 'ි', 'ර']
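
In the output above, each dependent vowel sign is listed separately from the consonant it belongs to. If grouped units are wanted instead, one naive option is to attach every combining mark to the element before it. The sketch below does this with `unicodedata`; it is an assumption-laden illustration rather than CLTK's algorithm, the helper name `group_marks` is hypothetical, and it makes no attempt to handle consonant clusters correctly.

```python
import unicodedata

def group_marks(chars):
    """Attach each combining mark (Unicode category M*) to the preceding element."""
    groups = []
    for ch in chars:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

# The same word as pali_text_two, written as code-point escapes.
word = "\u0d85\u0db7\u0dd2\u0dbb\u0dd4\u0da0\u0dd2\u0dbb"
grouped = group_marks(word)
print(grouped)
print(len(grouped))  # 5
```

The function accepts either a string or a list of single characters, so it could also be applied directly to `pali_syllables`.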
