# Malayalam with CLTK

Use <b>CLTK</b> to analyze your Malayalam texts.
<br>Let us start by setting `USER_PATH` to the current user's home directory.

In [1]:
import os
USER_PATH = os.path.expanduser('~')

Let us try to download a Malayalam corpus that is available remotely in CLTK's GitHub repo. To do this, we first need an importer.

In [2]:
from cltk.corpus.utils.importer import CorpusImporter
malayalam_downloader = CorpusImporter('malayalam')

Once we have this, we can check the corpora available for download as follows:

In [3]:
print(malayalam_downloader.list_corpora)

['malayalam_text_gretil']


Let us now download the <i>malayalam_text_gretil</i> corpus. 

In [4]:
malayalam_downloader.import_corpus('malayalam_text_gretil')

The corpus has been downloaded to a directory in `cltk_data`, which resides in `USER_PATH`. Let us open the text, <i>Jyotsnika</i>.

In [5]:
malayalam_corpus_path = os.path.join(USER_PATH, 'cltk_data', 'malayalam', 'text', 'malayalam_text_gretil', 'text')
malayalam_text_path = os.path.join(malayalam_corpus_path, 'jyotsniu.txt')
# Read the file as UTF-8 and close it automatically when done
with open(malayalam_text_path, 'r', encoding='utf-8') as f:
    malayalam_text = f.read()
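
If the download did not complete, opening the file fails with a `FileNotFoundError`. The following is a minimal sketch, using only the standard library, for checking which text files are present in the corpus directory before opening one. The helper name `list_text_files` is hypothetical and not part of CLTK; the directory layout is the one used in the cell above.

```python
import os


def list_text_files(directory):
    """Return the sorted names of the .txt files in `directory`.

    Returns an empty list if the directory does not exist, for example
    because the corpus has not been downloaded yet.
    """
    if not os.path.isdir(directory):
        return []
    return sorted(name for name in os.listdir(directory) if name.endswith('.txt'))


# Same location as in the cell above (assumed default CLTK data layout)
corpus_dir = os.path.join(
    os.path.expanduser('~'),
    'cltk_data', 'malayalam', 'text', 'malayalam_text_gretil', 'text',
)
print(list_text_files(corpus_dir))
```

An empty list here suggests that `import_corpus` should be run again before reading any file.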

In [6]:
print(malayalam_text[1930:2998]) #indices adjusted


jyōtsnikā viṣavaidyaṃ

[1]abhivandanādhikāraṃ
hariḥ ṣrī gaṇapatayē namaḥ
avighnamastu

maṃgaḷaṃ
vandē varadamācāryyamantarāyōpaśāntayē  /
gaṇanāthaṃ ca gōvindaṃ kumārakamalōtbhavau  // Jyo_1.1 //

muṭiyil tiṅkaḷuṃ pāṃpuṃ maṭiyil gauriyuṃ sadā  /
kuṭi koṇṭoru dēvantannaṭiyāṃ paṅkajam bhajē  // Jyo_1.2 //

gatvā svarggamatandritassuravaraṃ
jitvā sudhāṃ bāhubhirddhṛtvā
mātaramētya vidrutataraṃ
datvāśu tasyai tataḥ
hṛtvā dāsyamanēkakadrutanayān
hatvā muhurmmātaraṃ
natvā yastu virājatē tamaniśaṃ
vandē khagādhīśvaraṃ  // Jyo_1.3 //

yēnaviṣṇōrddhvajaṃ sākṣādrājatē paramātmanaḥ  /
tasmai namōstu satataṃ garuḍāya mahātmanē  // Jyo_1.4 //


pratijñā
viṣapīḍitarāyuḷḷa narāṇāṃ hitasiddhayē  /
taccikitsāṃ pravakṣyāmi prasannāstu sarasvatī  // Jyo_1.5 //

gurudēvadvijātīnāṃ bhaktaḥ śuddhō dayāparaḥ  /
svakarmmābhirataḥ kuryyāl garapīḍitarakṣaṇaṃ  // Jyo_1.6 //

tathā bahujanadrōhaṃ ceyvōnuṃ brahmahāvinuṃ  /
svadharmmācāramaryyādāhīnanuṃ dviṣatāmapi  // Jyo_1.7 //

kṛtaghnabhīruśōkārttacaṇḍānāṃ vya
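
In the output above, each verse ends with a reference of the form `// Jyo_1.3 //`, while a single `/` separates the two halves of a verse. As a rough sketch (plain Python, not a CLTK feature), these markers can be used to split the text into verses. The function name `extract_verses` is hypothetical, and the pattern assumes that every reference follows the `Jyo_<chapter>.<verse>` shape seen in this excerpt.

```python
import re

# Matches markers such as "// Jyo_1.3 //" and captures the reference
VERSE_MARKER = re.compile(r'//\s*(Jyo_\d+\.\d+)\s*//')


def extract_verses(text):
    """Return a list of (reference, verse_text) pairs.

    Everything between two markers is attributed to the later marker,
    so a section heading that precedes a verse stays attached to it.
    """
    verses = []
    previous_end = 0
    for match in VERSE_MARKER.finditer(text):
        verse_text = text[previous_end:match.start()].strip()
        verses.append((match.group(1), verse_text))
        previous_end = match.end()
    return verses


# Two verses copied from the excerpt printed above
sample = (
    "vandē varadamācāryyamantarāyōpaśāntayē  /\n"
    "gaṇanāthaṃ ca gōvindaṃ kumārakamalōtbhavau  // Jyo_1.1 //\n"
    "\n"
    "muṭiyil tiṅkaḷuṃ pāṃpuṃ maṭiyil gauriyuṃ sadā  /\n"
    "kuṭi koṇṭoru dēvantannaṭiyāṃ paṅkajam bhajē  // Jyo_1.2 //\n"
)

for reference, verse in extract_verses(sample):
    print(reference)
    print(verse)
```

The same function can be applied to `malayalam_text`, although text that does not end in a marker (such as the truncated last line of the excerpt) is not returned.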

## Transliterations

CLTK can transliterate Malayalam text into other Indic scripts, convert ITRANS romanization into Malayalam script, and romanize Malayalam script into ITRANS. Let us convert a sample word from Malayalam script to Devanagari, the script used for Hindi.

In [7]:
from cltk.corpus.sanskrit.itrans.unicode_transliterate import UnicodeIndicTransliterator
malayalam_text_two = 'കായിക'
UnicodeIndicTransliterator.transliterate(malayalam_text_two, 'ml', 'hi')  # 'ml' = Malayalam, 'hi' = Hindi (Devanagari script)

'कायिक'
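
Script-to-script conversion of this kind is possible because the Unicode blocks for the major Indic scripts are laid out in parallel: a given letter sits at the same offset in the Malayalam block (starting at U+0D00) as its counterpart in the Devanagari block (starting at U+0900). The sketch below illustrates that idea with plain Python. It is not CLTK's implementation, the function name `malayalam_to_devanagari` is hypothetical, and it makes no attempt to handle letters that lack a counterpart in the target script.

```python
# Sketch only: relies on the parallel layout of the Unicode Indic blocks,
# not on CLTK's own transliteration code.
MALAYALAM_START = 0x0D00   # first code point of the Malayalam block
DEVANAGARI_START = 0x0900  # first code point of the Devanagari block
BLOCK_SIZE = 0x80          # each block spans 128 code points


def malayalam_to_devanagari(text):
    """Shift each Malayalam code point to the same offset in Devanagari."""
    out = []
    for ch in text:
        offset = ord(ch) - MALAYALAM_START
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(DEVANAGARI_START + offset))
        else:
            out.append(ch)  # leave spaces, punctuation, etc. unchanged
    return ''.join(out)


print(malayalam_to_devanagari('കായിക'))
```

For the sample word this gives the same result as the CLTK call above. Script-specific letters, such as the Malayalam chillu characters, would need special handling that this sketch omits.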

Now, let us convert the ITRANS romanization of `അവിഘ്നമസ്തു` into Malayalam script.

In [8]:
from cltk.corpus.sanskrit.itrans.unicode_transliterate import ItransTransliterator
ItransTransliterator.from_itrans('avighnamastu','ml')

'അവിഘ്നമസ്തു'

Similarly, we can romanize Malayalam words as follows:

In [9]:
from cltk.corpus.sanskrit.itrans.unicode_transliterate import ItransTransliterator
ItransTransliterator.to_itrans('തസ്യൈ','ml')

'tasyai'

## Syllabifier

We can use the `indian_syllabifier` module to syllabify Malayalam words. To do this, we first have to import the models as follows. Importing `sanskrit_models_cltk` might take some time.

In [10]:
phonetics_model_importer = CorpusImporter('sanskrit')
phonetics_model_importer.list_corpora
phonetics_model_importer.import_corpus('sanskrit_models_cltk') 

Now we import the syllabifier and syllabify as follows:

In [11]:
%%capture
from cltk.stem.sanskrit.indian_syllabifier import Syllabifier
malayalam_syllabifier = Syllabifier('malayalam')
malayalam_syllables = malayalam_syllabifier.orthographic_syllabify('ജാലവിദ്യ')

The syllables of the word ജാലവിദ്യ will thus be:

In [12]:
print(malayalam_syllables)

['ജാ', 'ല', 'വി', 'ദ്യ']
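
To make the grouping above easier to follow, here is a naive sketch of orthographic syllabification that uses only Unicode character properties. It is an illustration under simplifying assumptions, not the algorithm behind CLTK's `Syllabifier`: every letter starts a new syllable unless it follows a virama (U+0D4D), and every combining mark (vowel sign, virama, anusvara) attaches to the current syllable. The function name `naive_syllabify` is hypothetical.

```python
import unicodedata

VIRAMA = '\u0d4d'  # Malayalam sign virama, which joins consonants into a cluster


def naive_syllabify(word):
    """Split a Malayalam word into orthographic syllables (rough sketch)."""
    syllables = []
    current = ''
    for ch in word:
        is_mark = unicodedata.category(ch).startswith('M')
        if current and (is_mark or current.endswith(VIRAMA)):
            # Vowel signs and other marks extend the current syllable,
            # as does a consonant that directly follows a virama.
            current += ch
        else:
            if current:
                syllables.append(current)
            current = ch
    if current:
        syllables.append(current)
    return syllables


print(naive_syllabify('ജാലവിദ്യ'))
```

For the sample word this produces the same four syllables as the CLTK output above. Cases such as chillu letters or a word-final virama are not treated specially here.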
