# Gujarati with CLTK

See how you can analyse your Gujarati texts with <b>CLTK</b>!<br>
Let's begin by defining `USER_PATH`, the home directory under which the CLTK data will be stored.

In [1]:
import os
USER_PATH = os.path.expanduser('~')

To download Gujarati texts from CLTK's GitHub repo, we need a corpus importer.

In [2]:
from cltk.corpus.utils.importer import CorpusImporter
gujarati_downloader = CorpusImporter('gujarati')

We can now see which corpora are available for download by using the importer's `list_corpora` attribute. Let's go ahead and try it out!

In [3]:
gujarati_downloader.list_corpora

['gujarati_text_wikisource']

The corpus <i>gujarati_text_wikisource</i> can be downloaded from the GitHub repo. It will be saved in the directory `cltk_data/gujarati` under the `USER_PATH` defined above.

In [4]:
gujarati_downloader.import_corpus('gujarati_text_wikisource')
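If you want to confirm that the download landed where this tutorial expects it, a quick check along the following lines may help. This is only a sketch: the helper name `corpus_is_downloaded` is made up for illustration, and the folder layout is the one used in this notebook.

```python
import os

USER_PATH = os.path.expanduser('~')

# Folder layout taken from this tutorial; adjust it if your CLTK data lives elsewhere.
corpus_dir = os.path.join(
    USER_PATH, 'cltk_data', 'gujarati', 'text', 'gujarati_text_wikisource'
)

def corpus_is_downloaded(path):
    """Hypothetical helper: True if the corpus folder exists on disk."""
    return os.path.isdir(path)

print(corpus_dir)
print(corpus_is_downloaded(corpus_dir))
```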

You can list the downloaded texts by running the following, or by browsing the `cltk_data/gujarati/text/gujarati_text_wikisource` directory.

In [7]:
gujarati_corpus_path = os.path.join(USER_PATH,'cltk_data/gujarati/text/gujarati_text_wikisource')
list_of_texts = [text for text in os.listdir(gujarati_corpus_path) if '.' not in text]
print(list_of_texts)

['narsinh_mehta', 'kabir', 'vallabhacharya']
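The `'.' not in text` filter above assumes that every entry without a dot in its name is an author folder. A slightly more explicit variant, sketched below, asks the file system which entries are directories. The function name `list_author_dirs` is an illustrative choice, not part of CLTK.

```python
import os

def list_author_dirs(corpus_path):
    """Return the sorted names of visible sub-directories of corpus_path."""
    return sorted(
        entry.name
        for entry in os.scandir(corpus_path)
        # Skip plain files and hidden folders such as '.git'.
        if entry.is_dir() and not entry.name.startswith('.')
    )
```

Calling `list_author_dirs(gujarati_corpus_path)` should then give the same author folders as the cell above, in alphabetical order.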


Great, now that we have our texts, let's take a sample from one of them. For this tutorial, we shall be using <i>govinda_khele_holi</i>, a text by the Gujarati poet Narsinh Mehta.

In [10]:
gujarati_text_path = os.path.join(gujarati_corpus_path,'narsinh_mehta/govinda_khele_holi.txt')
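# Open the file with an explicit encoding and close it automatically afterwards.
with open(gujarati_text_path, 'r', encoding='utf-8') as text_file:
    gujarati_text = text_file.read()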
print(gujarati_text)

﻿વૃંદાવન જઈએ,
જીહાં ગોવિંદ ખેલે હોળી;
નટવર વેશ ધર્યો નંદ નંદન,
મળી મહાવન ટોળી... ચાલો સખી !

એક નાચે એક ચંગ વજાડે,
છાંટે કેસર ઘોળી;
એક અબીરગુલાલ ઉડાડે,
એક ગાય ભાંભર ભોળી... ચાલો સખી !

એક એકને કરે છમકલાં,
હસી હસી કર લે તાળી;
માહોમાહે કરે મરકલાં,
મધ્ય ખેલે વનમાળી... ચાલો સખી !

વસંત ઋતુ વૃંદાવન સરી,
ફૂલ્યો ફાગણ માસ;
ગોવિંદગોપી રમે રંગભર,
જુએ નરસૈંયો દાસ... ચાલો સખી !
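
Text files saved with a UTF-8 byte order mark yield an invisible `\ufeff` as their first character when read as plain UTF-8. Whether your copy of the poem has one is an assumption worth checking with `repr(gujarati_text[:1])`. If it does, reading with the `utf-8-sig` codec strips it, as in this sketch (the helper name `read_text` is illustrative):

```python
def read_text(path):
    """Read a UTF-8 text file, dropping a leading byte order mark if present."""
    # 'utf-8-sig' decodes like UTF-8 but removes a BOM at the start of the file.
    with open(path, 'r', encoding='utf-8-sig') as handle:
        return handle.read()
```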
 


## Gujarati Alphabets

There are 13 vowels and 33 consonants, which are grouped as follows:

In [12]:
from cltk.corpus.gujarati.alphabet import *
print("Digits:",DIGITS)
print("Vowels:",VOWELS)
print("Dependent vowels:",DEPENDENT_VOWELS)
print("Consonants:",CONSONANTS)
print("Velar consonants:",VELAR_CONSONANTS)
print("Palatal consonants:",PALATAL_CONSONANTS)
print("Retroflex consonants:",RETROFLEX_CONSONANTS)
print("Dental consonants:",DENTAL_CONSONANTS)
print("Labial consonants:",LABIAL_CONSONANTS)
print("Sonorant consonants:",SONORANT_CONSONANTS)
print("Sibilant consonants:",SIBILANT_CONSONANTS)
print("Guttural consonant:",GUTTURAL_CONSONANT)
print("Additional consonants:",ADDITIONAL_CONSONANTS)
print("Modifiers:",MODIFIERS)

Digits: ['૦', '૧', '૨', '૩', '૪', '૫', '૬', '૭', '૮', '૯', '૧૦']
Vowels: ['અ', 'આ', 'ઇ', 'ઈ', 'ઉ', 'ઊ', 'ઋ', 'એ', 'ઐ', 'ઓ', 'ઔ', 'અં', 'અઃ']
Dependent vowels: ['ા ', 'િ', 'ી', 'ો', 'ૌ']
Consonants: ['ક', 'ખ', 'ગ', 'ઘ', 'ચ', 'છ', 'જ', 'ઝ', 'ઞ', 'ટ', 'ઠ', 'ડ', 'ઢ', 'ણ', 'ત', 'થ', 'દ', 'ધ', 'ન', 'પ', 'ફ', 'બ', 'ભ', 'મ', 'ય', 'ર', 'લ', 'ળ', 'વ', 'શ', 'ષ', 'સ', 'હ']
Velar consonants: ['ક', 'ખ', 'ગ', 'ઘ', 'ઙ']
Palatal consonants: ['ચ', 'છ', 'જ', 'ઝ', 'ઞ']
Retroflex consonants: ['ટ', 'ઠ', 'ડ', 'ઢ', 'ણ']
Dental consonants: ['ત', 'થ', 'દ', 'ધ', 'ન']
Labial consonants: ['પ', 'ફ', 'બ', 'ભ', 'મ']
Sonorant consonants: ['ય', 'ર', 'લ', 'વ']
Sibilant consonants: ['શ', 'ષ', 'સ']
Guttural consonant: ['હ']
Additional consonants: ['ળ', 'ક્ષ', 'જ્ઞ']
Modifiers: [' ्', ' ॓', ' ॔']
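
Lists like these make simple character-level lookups easy. As a rough illustration, the sketch below reports the place of articulation for each character of a word. To stay runnable without CLTK, it uses local copies of five of the lists printed above; the dictionary `CONSONANT_CLASSES` and the function `consonant_class` are names invented for this example.

```python
# Local copies of five lists printed above, so this sketch runs without CLTK.
CONSONANT_CLASSES = {
    'velar': ['ક', 'ખ', 'ગ', 'ઘ', 'ઙ'],
    'palatal': ['ચ', 'છ', 'જ', 'ઝ', 'ઞ'],
    'retroflex': ['ટ', 'ઠ', 'ડ', 'ઢ', 'ણ'],
    'dental': ['ત', 'થ', 'દ', 'ધ', 'ન'],
    'labial': ['પ', 'ફ', 'બ', 'ભ', 'મ'],
}

def consonant_class(char):
    """Return the class name for char, or None if it is in none of the five lists."""
    for name, members in CONSONANT_CLASSES.items():
        if char in members:
            return name
    return None

for char in 'કમળ':
    print(char, consonant_class(char))
```

Characters outside these five groups, such as `ળ` or any vowel sign, come back as `None` here; extending the dictionary with the remaining lists would cover them.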


## Transliterations

We can transliterate Gujarati text into the scripts of other Indic languages. Let us transliterate `કમળ ભારતનો રાષ્ટ્રીય ફૂલ છે` to Kannada:

In [16]:
gujarati_text_two = 'કમળ ભારતનો રાષ્ટ્રીય ફૂલ છે'
from cltk.corpus.sanskrit.itrans.unicode_transliterate import UnicodeIndicTransliterator
UnicodeIndicTransliterator.transliterate(gujarati_text_two,"gu","kn")

'ಕಮಳ ಭಾರತನೋ ರಾಷ್ಟ್ರೀಯ ಫೂಲ ಛೇ'
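
To get a feel for why this kind of script-to-script conversion is possible, note that the major Indic scripts occupy parallel 128-code-point blocks in Unicode, so many letters sit at the same offset in each block. The sketch below shifts code points from the Gujarati block (starting at U+0A80) to the Kannada block (starting at U+0C80). It is an illustration of the idea only, not CLTK's implementation, and `shift_script` is a made-up name. Not every offset is assigned in every script, so a real transliterator needs extra handling for such gaps.

```python
GUJARATI_START = 0x0A80  # first code point of the Gujarati Unicode block
KANNADA_START = 0x0C80   # first code point of the Kannada Unicode block
BLOCK_SIZE = 0x80        # each of these blocks spans 128 code points

def shift_script(text, source_start=GUJARATI_START, target_start=KANNADA_START):
    """Move characters of the source block to the same offset in the target block."""
    shifted = []
    for char in text:
        offset = ord(char) - source_start
        if 0 <= offset < BLOCK_SIZE:
            shifted.append(chr(target_start + offset))
        else:
            # Spaces, punctuation and other scripts pass through unchanged.
            shifted.append(char)
    return ''.join(shifted)

print(shift_script('કમળ'))
```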

We can also romanize the text as shown:

In [26]:
from cltk.corpus.sanskrit.itrans.unicode_transliterate import ItransTransliterator
ItransTransliterator.to_itrans(gujarati_text_two,'gu')

'kamalda bhaaratano raashhTriiya phuula Che'

Similarly, we can indicize a text given in its ITRANS transliteration:

In [25]:
gujarati_text_itrans = 'bhaawanaa'
ItransTransliterator.from_itrans(gujarati_text_itrans,'gu')

'ભાવના'

## Syllabifier

We can use the `indian_syllabifier` module to syllabify Gujarati words. To do this, we first have to import the phonetics models as follows. Importing `sanskrit_models_cltk` might take some time.

In [11]:
# The syllabifier relies on models shipped with the Sanskrit corpora.
phonetics_model_importer = CorpusImporter('sanskrit')
phonetics_model_importer.import_corpus('sanskrit_models_cltk')

Now we import the syllabifier and syllabify as follows:

In [27]:
%%capture
from cltk.stem.sanskrit.indian_syllabifier import Syllabifier
gujarati_syllabifier = Syllabifier('gujarati')
gujarati_syllables = gujarati_syllabifier.orthographic_syllabify('ભાવના')

The syllables of the word `ભાવના` will thus be:

In [28]:
print(gujarati_syllables)

['ભા', 'વ', 'ના']
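
For intuition about what orthographic syllabification does, here is a deliberately naive sketch that needs no models. It assumes that each cluster starts at a base letter and absorbs any following combining marks (vowel signs, anusvara, virama), plus the consonant that follows a virama. This is a simplification written for this tutorial, not the algorithm used by CLTK's `Syllabifier`, and `naive_syllabify` is a hypothetical name.

```python
import unicodedata

VIRAMA = '\u0acd'  # Gujarati sign virama, which joins consonants into conjuncts

def naive_syllabify(word):
    """Group base letters with their combining marks and conjunct consonants."""
    clusters = []
    for char in word:
        # Mn and Mc are the Unicode categories for combining marks.
        is_mark = unicodedata.category(char) in ('Mn', 'Mc')
        after_virama = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or after_virama):
            clusters[-1] += char
        else:
            clusters.append(char)
    return clusters

print(naive_syllabify('ભાવના'))
```

For `ભાવના` this grouping coincides with the result shown above; on words with more complex structure the two approaches may well differ.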
