### Nepali Rhymes

So, there aren't any web applications out there for finding rhyming words in Nepali. Let's attempt to create one here. Requirements:

- a Nepali word list (scraped this from somewhere, don't ask where)
- a wee bit of knowledge of Python and Unicode
- a bored programmer

Since we have each of the required ingredients, we can begin.

In [5]:
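# read word list
# (__future__ imports have to be the first statement in the cell)
from __future__ import print_function, unicode_literals
import codecs

# the file is UTF-8 encoded, one word per line
with codecs.open("word_list.txt", "r", "utf-8") as f:
    words = f.read().split("\n")

# sample words (a plain loop, since map() is lazy on Python 3)
for w in words[0:5]:
    print(w)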

अन्तरका
नरहरि
मालिकले
अर्थराजनीति
पूर्वाधारतिर


In [6]:
# Take input here
input_word = "पानी"

**Algorithm**

The algo is relatively simple. Find the longest suffix that matches! There! Simple right? Right???
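
To make "suffix" concrete, here is a minimal sketch (assuming Python 3, where a `str` is a sequence of Unicode code points) of the suffixes that get compared. Note that a vowel sign (matra) counts as its own code point, so a "length" here is not a syllable count. The variable name `suffixes` is only for illustration.

```python
# Python 3: indexing and slicing a str works on Unicode code points.
word = "पानी"

# Consonants and vowel signs are separate code points.
print(len(word))  # 4

# Suffixes of increasing length, from 2 code points up to the whole word.
suffixes = [word[-i:] for i in range(2, len(word) + 1)]
for s in suffixes:
    print(len(s), s)
```

The suffix of length 3 starts with a bare vowel sign, which is why the matches for that length all end in the same vowel + consonant + vowel pattern.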

In [17]:
from collections import defaultdict

def get_rhymes(smallest, largest, input_word, words):
    """Group words by the length of the suffix they share with input_word."""
    rhymes = defaultdict(set)
    for i in range(smallest, largest+1):
        match = input_word[-i:]
        for w in words:
            if match == w[-i:]:
                rhymes[i].add(w)

        # the idea is.. if it is already in a bigger list, remove it from the smaller list (no redundancy)
        # (skipped on the first pass, so no empty entry is created for smallest-1)
        if i > smallest:
            rhymes[i-1] = rhymes[i-1] - rhymes[i]
    return rhymes

smallest = 2  # at least two characters (Unicode code points) should match
largest = len(input_word)

rhymes = get_rhymes(smallest, largest, input_word, words)

# print some of the words:
for i in range(smallest, largest+1):
    print ("-------- Rhymes of suffix length %d --------" % i)
    for w in list(rhymes[i])[:5]:
        print(w)
    

-------- Rhymes of suffix length 2 --------
थाल्नी
सोल्टिनी
ताहाननी
फ्र्याङ्कोलोनी
हाडनोर्नी
-------- Rhymes of suffix length 3 --------
अफगानिस्तानी
खानेबानी
बलिदानी
निसानी
आडवानी
-------- Rhymes of suffix length 4 --------
नयाँपानी
चिसापानी
खिरेपानी
रोगटेपानी
बिजुलीपानी


Now we are done... or not! There are a few hiccups here and there. Consider these two words:
- नालि
- खाली

Now, even a kid can tell you that the two words rhyme. However, our program won't be able to find the match, because the final vowels are written differently (one hraswa, one dirgha) even though they sound the same (blame Sanskrit). To solve this small hiccup, we'll have to sacrifice grammar a bit, which isn't really a problem most of the time.
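
To see the mismatch at the code point level, here is a small sketch (assuming Python 3) that uses the standard `unicodedata` module. The helper `describe` and the names `a` and `b` are made up for this illustration.

```python
import unicodedata

def describe(word):
    """Return (character, Unicode name) pairs for each code point."""
    return [(ch, unicodedata.name(ch)) for ch in word]

a = "नालि"
b = "खाली"

# Print the Unicode name of every code point in both words.
for word in (a, b):
    print(word, [name for _, name in describe(word)])

# The last code points differ (VOWEL SIGN I vs VOWEL SIGN II),
# so a plain suffix comparison fails.
print(a[-2:] == b[-2:])  # False
```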

In [15]:
# A simple morphological transformation function... we used the same in sambidhaan.dastabez.com

def morphological_transform(np):
    """
    Nepali words are inconsistent..

    there are hraswas and the dirghas.. different sa, bindu and ma root.

    """

    # change all dirghas to hraswas
    np = np.replace(u'ी', u'ि').replace(u'ू', u'ु').replace('ई', 'इ').replace('ऊ', 'उ')

    # change all sa to the same
    np = np.replace('श', 'स').replace('ष', 'स')

    # change bindu to ma root
    np = np.replace(u'ं', 'म्')

    # change sanskrit characters
    np = np.replace('ञ्', 'न्').replace('ण्', 'न्').replace('ऋ', 'रि')

    # other possible morphological transforms that I am not aware of yet..
    return np

Now, we'll use this function to transform our input as well as the word list.

In [11]:
# build a real list (not a lazy map) so it can be sliced and looped over more than once
new_words = [morphological_transform(w) for w in words]
new_input_word = morphological_transform(input_word)

for w in new_words[:5]:
    print(w)

अन्तरका
नरहरि
मालिकले
अर्थराजनिति
पुर्वाधारतिर


In [16]:
# Done! Now we use the same function to get richer results!

rhymes = get_rhymes(smallest, largest, new_input_word, new_words)
print ("Word: ", input_word)
# print some of the words:
for i in range(smallest, largest+1):
    print ("-------- Rhymes of suffix length %d --------" % i)
    for w in list(rhymes[i])[:5]:
        print(w)
    

Word:  पानी
-------- Rhymes of suffix length 2 --------
धुसेनि
नाकापनि
लेखिदिनुपनि
चलाएपनि
गजनि
-------- Rhymes of suffix length 3 --------
जवानि
राजामहारानि
मानिमानि
जेठमलानि
कृतिमानि
-------- Rhymes of suffix length 4 --------
ढुकुरपानि
पानि
नरपानि
रानिपानि
सिम्पानि


## Conclusion

See how the outputs are all hraswas, with not a single dirgha in sight? Normally, this is good enough for songwriting, which is the primary application of this script. It won't work for "chandas" though (chandas are rule-based poetry with an uber-strict grammatical structure).
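
One possible way to get the original spellings back is a lookup table from each normalized form to the words it came from. This is only a sketch, assuming Python 3: `normalize` is a simplified stand-in for `morphological_transform` (it handles just the two dirgha vowel signs), and `build_lookup` and `sample` are hypothetical names, not part of the script above.

```python
from collections import defaultdict

def normalize(word):
    # Simplified stand-in for morphological_transform:
    # only the dirgha -> hraswa vowel signs are handled here.
    return word.replace("ी", "ि").replace("ू", "ु")

def build_lookup(words):
    """Map each normalized form to the set of original spellings."""
    lookup = defaultdict(set)
    for w in words:
        lookup[normalize(w)].add(w)
    return lookup

sample = ["नयाँपानी", "चिसापानी", "खाली"]
lookup = build_lookup(sample)

# A rhyme found in normalized form can be shown with its original spelling.
for original in lookup[normalize("चिसापानी")]:
    print(original)
```

Rhymes would still be searched in the normalized list, and only the display step would go through the table.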

Where do you go from here? There are several improvements that could be made:

- Understanding commonly used words and suggesting those before rare words.
- Displaying the words in their correct grammar (by using a table perhaps?)
- Expanding the word lists.
- Upgrading the rhyming algorithm (हुक्का and सुस्का share a suffix of length 3 despite the lesser degree of rhyming. The ् (halanta) they happen to share is to blame for this)
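
For the last point, one direction worth trying (a rough sketch, not a tested improvement) is to count matching clusters instead of code points, where a cluster is a base character plus the combining marks that follow it (vowel signs, halanta, bindu). The code assumes Python 3, and `split_clusters` and `common_suffix_len` are hypothetical helpers that are not part of the script above.

```python
import unicodedata

def split_clusters(word):
    """Split a word into base characters with their combining marks attached."""
    clusters = []
    for ch in word:
        # Unicode categories starting with "M" are marks: vowel signs,
        # halanta (virama), bindu and friends.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def common_suffix_len(a, b):
    """Count how many trailing units of a and b are equal."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

w1, w2 = "हुक्का", "सुस्का"

# Compared code point by code point, the shared halanta inflates the score.
print(common_suffix_len(w1, w2))  # 3

# Compared cluster by cluster, only the final cluster matches.
print(common_suffix_len(split_clusters(w1), split_clusters(w2)))  # 1
```

This does not treat conjuncts or joiner characters specially, so it is a starting point rather than a proper syllable splitter.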

Did you find this script useful? Well, you're in luck! **This whole thing is MIT licensed**. Have fun tweaking and using it (for making a web application perhaps.. mail me if you do).