
Perform Morphology in NLP
**Morphology:**
Morphology is the study of the internal structure of words. It focuses on how the components within a word (stems, roots, prefixes, suffixes, etc.) are arranged or modified to create different meanings.
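To make the idea of prefixes, stems, and suffixes concrete, here is a minimal sketch that peels one known prefix and one known suffix off a word by plain string matching. The affix lists and the `split_affixes` helper are assumptions made up for this illustration; real morphological analysis needs a lexicon and spelling rules, and this toy version will happily over-segment words such as "reading".

```python
# Toy affix lists, chosen only for this illustration (not a real lexicon).
PREFIXES = ("un", "re", "dis", "pre")
SUFFIXES = ("ness", "ment", "able", "ing", "ed", "s")

def split_affixes(word, min_stem=3):
    """Split a word into (prefix, stem, suffix) using naive string matching."""
    word = word.lower()
    prefix = suffix = ""
    # Peel off at most one known prefix, keeping at least min_stem characters.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            prefix, word = p, word[len(p):]
            break
    # Peel off at most one known suffix, trying the longest candidates first.
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            suffix, word = s, word[:-len(s)]
            break
    return prefix, word, suffix

for w in ["unhappiness", "replayed", "disagreement", "words"]:
    print(w, "->", split_affixes(w))
```

Note that the stem of "unhappiness" comes out as "happi" rather than "happy": string stripping does not undo spelling changes, which is one reason lemmatization relies on a dictionary.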
**Lemmatization:**
Lemmatization is the process of reducing an inflected word to its base (dictionary) form, or lemma, taking its part of speech and context into account.
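The role of part of speech can be sketched with a tiny lookup table: the same surface form maps to different lemmas depending on whether it is used as a noun or a verb. The `TOY_LEMMAS` table and `toy_lemmatize` function are hypothetical stand-ins for illustration; a real lemmatizer such as NLTK's `WordNetLemmatizer` consults WordNet instead of a hand-written dictionary.

```python
# Hypothetical mini-lexicon: (surface form, part of speech) -> lemma.
TOY_LEMMAS = {
    ("meeting", "n"): "meeting",
    ("meeting", "v"): "meet",
    ("leaves", "n"): "leaf",
    ("leaves", "v"): "leave",
    ("better", "a"): "good",
}

def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos); unknown pairs are returned unchanged."""
    return TOY_LEMMAS.get((word.lower(), pos), word)

print(toy_lemmatize("leaves", "n"))   # leaf
print(toy_lemmatize("leaves", "v"))   # leave
print(toy_lemmatize("meeting", "v"))  # meet
print(toy_lemmatize("table", "n"))    # table (not in the toy lexicon)
```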


In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

In [12]:
nltk.download('punkt')
nltk.download("punkt_tab")
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('wordnet')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [25]:
def get_wordnet_pos(treebank_tag):
  # Map a Penn Treebank tag to the WordNet POS constant the lemmatizer expects.
  if treebank_tag.startswith('J'):
    return wordnet.ADJ
  elif treebank_tag.startswith('V'):
    return wordnet.VERB
  elif treebank_tag.startswith('N'):
    return wordnet.NOUN
  elif treebank_tag.startswith('R'):
    return wordnet.ADV
  else:
    # Any other tag (determiners, prepositions, punctuation) falls back to NOUN.
    return wordnet.NOUN

def morphology_analysis(text):
  words=word_tokenize(text)
  stop_words=set(stopwords.words('english'))
  # Drop stop words case-insensitively before tagging.
  words=[word for word in words if word.casefold() not in stop_words]
  tagged_words=pos_tag(words)
  lemmatizer=WordNetLemmatizer()
  # Lemmatize each token using the WordNet POS derived from its tag.
  lemmatized_words=[lemmatizer.lemmatize(word,get_wordnet_pos(pos)) for word,pos in tagged_words]
  print("Original words:",words)
  print("Lemmatized words:",lemmatized_words)
  return lemmatized_words

if __name__=="__main__":
  text=input("Enter a text:")
  morphology_analysis(text)
  # Show the tag mapping for every token, including stop words.
  for word,pos in pos_tag(word_tokenize(text)):
    print(f"Word:{word},POS Tag:{pos},WordNet POS Tag:{get_wordnet_pos(pos)}")

Enter a text:Morphology is the study of the internal structure of words.. Morphology focuses on how the components within a word (stems, root words, prefixes, suffixes, etc.) are arranged or modified to create different meanings
Original words: ['Morphology', 'study', 'internal', 'structure', 'words', '..', 'Morphology', 'focuses', 'components', 'within', 'word', '(', 'stems', ',', 'root', 'words', ',', 'prefixes', ',', 'suffixes', ',', 'etc', '.', ')', 'arranged', 'modified', 'create', 'different', 'meanings']
Lemmatized words: ['Morphology', 'study', 'internal', 'structure', 'word', '..', 'Morphology', 'focus', 'component', 'within', 'word', '(', 'stem', ',', 'root', 'word', ',', 'prefix', ',', 'suffix', ',', 'etc', '.', ')', 'arrange', 'modified', 'create', 'different', 'meaning']
Word:Morphology,POS Tag:NNP,WordNet POS Tag:n
Word:is,POS Tag:VBZ,WordNet POS Tag:v
Word:the,POS Tag:DT,WordNet POS Tag:n
Word:study,POS Tag:NN,WordNet POS Tag:n
Word:of,POS Tag:IN,WordNet POS Tag:n
Word:t
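
In the output above, `the` (DT) and `of` (IN) are both reported as `n`: any tag outside the four WordNet classes falls back to noun. The same mapping can be sketched as a lookup on the first letter of the tag. The single-letter values mirror NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN`, and `wordnet.ADV` constants, written as plain strings here so the sketch runs without NLTK; `penn_to_wordnet` is a name introduced only for this illustration.

```python
# First letter of a Penn Treebank tag -> WordNet POS code.
_PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}

def penn_to_wordnet(treebank_tag, default="n"):
    """Map a Penn Treebank tag to a WordNet POS code, defaulting to noun."""
    return _PENN_PREFIX_TO_WORDNET.get(treebank_tag[:1], default)

for tag in ["NNP", "VBZ", "DT", "JJ", "RB", "IN"]:
    print(tag, "->", penn_to_wordnet(tag))
```

A table keeps the fallback explicit in one place, which makes it easier to swap in a different default if noun turns out to be a poor guess for function words.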

Articles are words like "a," "an," and "the." They mark whether a noun is definite ("the") or indefinite ("a," "an"); the indefinite articles are used only with singular countable nouns.

In [20]:
articles = ["a", "an", "the"]

def identify_articles(text):
    words = nltk.word_tokenize(text)
    articles_in_text = [word for word in words if word.lower() in articles]
    return articles_in_text

example_text = "The quick brown fox jumps over the lazy dog."
articles_found = identify_articles(example_text)
print("Articles found:", articles_found)  # Output: ['The', 'the']

Articles found: ['The', 'the']
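
As a small extension, the articles found can be grouped by definiteness. This is a sketch under the assumption that a simple regex tokenizer is good enough for the example; `count_article_types` is a hypothetical helper, not part of NLTK.

```python
import re
from collections import Counter

DEFINITE = {"the"}
INDEFINITE = {"a", "an"}

def count_article_types(text):
    """Count definite and indefinite articles in text, ignoring case."""
    # Simple regex tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        if tok in DEFINITE:
            counts["definite"] += 1
        elif tok in INDEFINITE:
            counts["indefinite"] += 1
    return dict(counts)

print(count_article_types("The quick brown fox jumps over the lazy dog."))
print(count_article_types("A cat chased an owl up the tree."))
```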


Prepositions are words that show the relationship between a noun or pronoun and another word in the sentence. Examples include "in," "on," "at," "above," "below," "between," etc.

In [21]:
import nltk
from nltk import pos_tag, word_tokenize

def identify_prepositions(text):
    words = word_tokenize(text)
    tagged_words = pos_tag(words)
    prepositions_in_text = [word for word, pos in tagged_words if pos == 'IN']
    return prepositions_in_text

example_text = "The book is on the table."
prepositions_found = identify_prepositions(example_text)
print("Prepositions found:", prepositions_found)  # Output: ['on']

Prepositions found: ['on']


Conjunctions are words that connect words, phrases, or clauses. Examples include "and," "but," "or," "so," "because," etc.

In [22]:
def identify_conjunctions(text):
    words = word_tokenize(text)
    tagged_words = pos_tag(words)
    conjunctions_in_text = [word for word, pos in tagged_words if pos in ['CC', 'IN']]  # CC = coordinating; IN = subordinating, but IN also tags prepositions, so those are returned too
    return conjunctions_in_text

example_text = "I like apples and oranges, but I don't like bananas."
conjunctions_found = identify_conjunctions(example_text)
print("Conjunctions found:", conjunctions_found)  # Output: ['and', 'but']

Conjunctions found: ['and', 'but']
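
Because the `IN` tag is shared by prepositions and subordinating conjunctions, a tag-only filter cannot tell the two apart. One possible workaround, sketched here with plain word lists instead of a tagger, is to check tokens against small closed-class sets. The lists and the `classify_conjunctions` helper are assumptions for illustration and are far from exhaustive; words such as "since" remain ambiguous without context.

```python
import re

# Small closed-class lists chosen for illustration; neither is exhaustive.
# "for" is left out on purpose because it is usually a preposition.
COORDINATING = {"and", "but", "or", "nor", "so", "yet"}
SUBORDINATING = {"because", "although", "while", "if", "unless", "since", "though"}

def classify_conjunctions(text):
    """Group conjunctions in text into coordinating and subordinating."""
    tokens = re.findall(r"[A-Za-z']+", text)
    found = {"coordinating": [], "subordinating": []}
    for tok in tokens:
        low = tok.lower()
        if low in COORDINATING:
            found["coordinating"].append(tok)
        elif low in SUBORDINATING:
            found["subordinating"].append(tok)
    return found

print(classify_conjunctions("I like apples and oranges, but I don't like bananas."))
print(classify_conjunctions("She stayed home because it rained."))
print(classify_conjunctions("The book is on the table."))
```

With this approach the preposition "on" in the last sentence is no longer reported as a conjunction.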


Interjections are words or phrases that express strong emotions or sudden feelings. Examples include "Oh," "Wow," "Alas," "Ouch," etc.

In [23]:
import re

def identify_interjections(text):
    pattern = r"\b(?:oh|wow|alas|ouch)\b"  # Add more interjections as needed
    interjections_in_text = re.findall(pattern, text, re.IGNORECASE)
    return interjections_in_text

example_text = "Wow! That's amazing!"
interjections_found = identify_interjections(example_text)
print("Interjections found:", interjections_found)  # Output: ['Wow']

Interjections found: ['Wow']
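
Hand-editing the regular expression gets error-prone as the list grows. A minimal sketch of an alternative is to build the pattern from a Python list, escaping each entry with `re.escape`. The `INTERJECTIONS` list and `find_interjections` name are assumptions for this example.

```python
import re

# Illustrative list; extend as needed.
INTERJECTIONS = ["oh", "wow", "alas", "ouch", "uh-oh"]

# Longest entries first so that "uh-oh" is tried before the shorter "oh".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(INTERJECTIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def find_interjections(text):
    """Return every interjection from INTERJECTIONS found in text, in order."""
    return _PATTERN.findall(text)

print(find_interjections("Wow! Oh no, uh-oh... ouch."))
print(find_interjections("stems, prefixes, etc."))
```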


All types combined


In [24]:
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.corpus import stopwords
import re

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')

def identify_articles(text):
    words = nltk.word_tokenize(text)
    articles_in_text = [word for word in words if word.lower() in ["a", "an", "the"]]
    return articles_in_text

def identify_prepositions(text):
    words = word_tokenize(text)
    tagged_words = pos_tag(words)
    prepositions_in_text = [word for word, pos in tagged_words if pos == 'IN']
    return prepositions_in_text

def identify_conjunctions(text):
    words = word_tokenize(text)
    tagged_words = pos_tag(words)
    conjunctions_in_text = [word for word, pos in tagged_words if pos in ['CC', 'IN']]
    return conjunctions_in_text

def identify_interjections(text):
    pattern = r"\b(?:oh|wow|alas|ouch)\b"  # Add more interjections as needed
    interjections_in_text = re.findall(pattern, text, re.IGNORECASE)
    return interjections_in_text

if __name__ == "__main__":
    text = input("Enter a text: ")

    articles_found = identify_articles(text)
    print("Articles found:", articles_found)

    prepositions_found = identify_prepositions(text)
    print("Prepositions found:", prepositions_found)

    conjunctions_found = identify_conjunctions(text)
    print("Conjunctions found:", conjunctions_found)

    interjections_found = identify_interjections(text)
    print("Interjections found:", interjections_found)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Enter a text: wow wha abueaty?
Articles found: []
Prepositions found: []
Conjunctions found: []
Interjections found: ['wow']
