### Morphology

1. Morphology in Natural Language Processing (NLP) is the study of the internal structure and functions of words, and how words are formed from smaller meaningful units called morphemes.
2. Morphology is one of the central linguistic disciplines and is crucial in NLP for applications such as spell checking, machine translation, and text analysis.
3. Morphemes are the smallest units of a language that carry meaning. They can be prefixes, suffixes, or roots.
4. There are two types of morphology: *inflectional and derivational*.
    
    a. Inflectional morphology modifies a word to express grammatical features such as tense, number, or case, without changing its core meaning or part of speech. For example, "cats" is formed from "cat" by adding the plural suffix "-s".
    
    b. Derivational morphology, on the other hand, creates a new word by adding a prefix or suffix, which changes the meaning and often the part of speech. For example, the word "unhappy" is derived from the word "happy" by adding the prefix "un-".
5. Two commonly contrasted approaches to morphology in NLP are *morpheme-based and lexeme-based*.
    
    a. Morpheme-based morphology considers word forms as arrangements of morphemes.
    
    b. Lexeme-based morphology views a word form as the result of applying rules to a stem.
6. Morphology also underlies ***stemming***, which strips affixes to reduce a word to a root-like stem that need not be a real word, and ***lemmatization***, which reduces a word to its base or dictionary form.
7. Overall, morphology gives NLP systems a way to analyze the structure of words, which supports tasks such as text analysis, machine translation, and language understanding.
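
To make the contrast between stemming and lemmatization in point 6 concrete, here is a minimal pure-Python sketch. The suffix list and the lookup table are hypothetical and chosen only for illustration; real stemmers and lemmatizers rely on much richer rules and dictionaries.

```python
# Toy illustration only: SUFFIXES and LEMMA_TABLE are assumed, hand-picked data.
SUFFIXES = ("ing", "ed", "es", "s")
LEMMA_TABLE = {"ran": "run", "mice": "mouse", "better": "good"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Use the irregular-form table if possible, otherwise fall back to the stem."""
    return LEMMA_TABLE.get(word, toy_stem(word))


for w in ["cats", "computing", "mice", "ran"]:
    print(w, toy_stem(w), toy_lemmatize(w), sep="\t")
```

The stem of "computing" comes out as "comput", which is not a dictionary word, while the irregular forms "mice" and "ran" are only resolved by the lookup step.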

In [1]:
!pip install morfessor polyglot pyICU pycld2     # PyICU and pycld2 are required to run polyglot

Collecting morfessor
  Downloading Morfessor-2.0.6-py3-none-any.whl (35 kB)
Collecting polyglot
  Downloading polyglot-16.7.4.tar.gz (126 kB)
  Preparing metadata (setup.py) ... done
Collecting pyICU
  Downloading PyICU-2.12.tar.gz (260 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pycld2
  Downloading pycld2-0.41.tar.gz (41.4 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: polyglot, pyICU, pycld2
  Building whee

In [2]:
%%bash
polyglot download morph2.en morph2.mr morph2.ru        # download the morpheme models for English, Marathi, and Russian

[polyglot_data] Downloading package morph2.en to
[polyglot_data]     /root/polyglot_data...
[polyglot_data] Downloading package morph2.mr to
[polyglot_data]     /root/polyglot_data...
[polyglot_data] Downloading package morph2.ru to
[polyglot_data]     /root/polyglot_data...


In [3]:
from polyglot.downloader import downloader
print(downloader.supported_languages_table("morph2"))     # list the languages that have a morph2 model (135 in total)

  1. Kapampangan                2. Italian                    3. Upper Sorbian            
  4. Sakha                      5. Hindi                      6. French                   
  7. Spanish; Castilian         8. Vietnamese                 9. Arabic                   
 10. Macedonian                11. Pashto, Pushto            12. Bosnian-Croatian-Serbian 
 13. Egyptian Arabic           14. Norwegian Nynorsk         15. Sundanese                
 16. Sicilian                  17. Azerbaijani               18. Bulgarian                
 19. Yoruba                    20. Tajik                     21. Georgian                 
 22. Tatar                     23. Galician                  24. Malagasy                 
 25. Uighur, Uyghur            26. Amharic                   27. Venetian                 
 28. Yiddish                   29. Norwegian                 30. Alemannic                
 31. Estonian                  32. West Flemish              33. Divehi; Dhivehi; Mald... 

In [4]:
from polyglot.text import Text, Word

In [6]:
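# English test words; note that 'engneering' is misspelled, which is why it is
# split into many small pieces in the output below
words = ['cats', 'computing', 'association', 'programming', 'leadership', 'miscommunication',
         'communication', 'identifiable', 'psychologically', 'engneering', 'tabular',
         'realistic', 'colorfulness', 'loveable']

for word in words:
    w = Word(word, language="en")       # "en" selects the English morpheme model
    print(w,'\t\t',w.morphemes)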

cats 		 ['cat', 's']
computing 		 ['com', 'put', 'ing']
association 		 ['associ', 'ation']
programming 		 ['program', 'ming']
leadership 		 ['leader', 'ship']
miscommunication 		 ['mis', 'communication']
communication 		 ['communication']
identifiable 		 ['identif', 'i', 'able']
psychologically 		 ['psycho', 'logical', 'ly']
engneering 		 ['en', 'g', 'ne', 'er', 'ing']
tabular 		 ['tab', 'ular']
realistic 		 ['real', 'istic']
colorfulness 		 ['color', 'ful', 'ness']
loveable 		 ['love', 'able']
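
One property worth noticing in the output above is that the pieces always concatenate back to the input word, so the segmentation only inserts boundaries and never rewrites characters. The sketch below checks this on a few rows copied from that output; it does not call polyglot, and the helper names `is_lossless` and `format_segmentation` are made up for this example.

```python
# Segmentations copied from the output above; polyglot itself is not called here.
segmentations = {
    "cats": ["cat", "s"],
    "miscommunication": ["mis", "communication"],
    "identifiable": ["identif", "i", "able"],
    "colorfulness": ["color", "ful", "ness"],
    "engneering": ["en", "g", "ne", "er", "ing"],
}


def is_lossless(word, morphemes):
    """Return True when the morphemes concatenate back to the original word."""
    return "".join(morphemes) == word


def format_segmentation(morphemes):
    """Render a segmentation as 'cat + s' for easier reading."""
    return " + ".join(morphemes)


for word, morphemes in segmentations.items():
    print(f"{word:<18}{format_segmentation(morphemes):<28}lossless={is_lossless(word, morphemes)}")
```

A check like this can serve as a quick sanity test when comparing segmentations from different models or languages.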


In [7]:
words = ['प्रामाणिकपणा','जलविदुत','इतिहास','रविवार','मानवशास्त्र','जलतरण','मदन','यशोधन','आदित्य','']
for word in words:
    w = Word(word, language="mr")       # "mr" selects the Marathi morpheme model
    print(w, '\t\t',w.morphemes)

प्रामाणिकपणा 		 ['प्रा', 'माणिक', 'पणा']
जलविदुत 		 ['जल', 'वि', 'दु', 'त']
इतिहास 		 ['इतिहास']
रविवार 		 ['रवि', 'वार']
मानवशास्त्र 		 ['मानव', 'शास्त्र']
जलतरण 		 ['जलतरण']
मदन 		 ['मदन']
यशोधन 		 ['य', 'शोध', 'न']
आदित्य 		 ['आदि', 'त्य']
 		 []


In [8]:
words = ['хонэст','секюрэ','сахиль','мадан','сухас','адитья']
for word in words:
    w = Word(word, language="ru")       # "ru" selects the Russian morpheme model
    print(w, '\t\t',w.morphemes)

хонэст 		 ['х', 'он', 'эст']
секюрэ 		 ['сек', 'ю', 'р', 'э']
сахиль 		 ['с', 'ах', 'иль']
мадан 		 ['м', 'а', 'дан']
сухас 		 ['сух', 'а', 'с']
адитья 		 ['а', 'д', 'ить', 'я']


In [10]:
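st = "Wewillmeettoday."     # a sentence written without spaces
text = Text(st)
text.language = "en"        # set the language explicitly instead of relying on detection

text.morphemes              # morpheme-level segmentation of the whole string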

WordList(['We', 'will', 'meet', 'to', 'day', '.'])
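
For comparison, the sketch below splits the same unspaced string with a plain dictionary lookup and dynamic programming. This is a different and much simpler technique than the trained Morfessor models that polyglot uses, and the vocabulary `VOCAB` is a hypothetical one chosen to fit this sentence.

```python
from typing import List, Optional, Set

# Hypothetical vocabulary, chosen only to cover the example sentence.
VOCAB = {"we", "will", "meet", "to", "day", "today", "."}


def segment(text: str, vocab: Set[str] = VOCAB) -> Optional[List[str]]:
    """Split text into the fewest vocabulary items; return None if no split exists."""
    lowered = text.lower()
    n = len(text)
    # best[i] holds the shortest known segmentation of text[:i], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is None or lowered[start:end] not in vocab:
                continue
            candidate = prefix + [text[start:end]]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


print(segment("Wewillmeettoday."))
print(segment("Wewillmeettoday.", VOCAB - {"today"}))
```

Because this sketch prefers the fewest pieces, it keeps "today" whole when that entry is in the vocabulary, whereas the polyglot output above splits it into "to" and "day"; removing "today" from the vocabulary reproduces that split.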