<a href="https://colab.research.google.com/github/badrinarayanan02/machine_learning/blob/main/2348507_NLPlab5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Performing Different NLP Operations with NLTK and spaCy**


In [None]:
import nltk
import spacy
from nltk.corpus import wordnet
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer, WordNetLemmatizer
from nltk.stem.snowball import GermanStemmer
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.chunk import ne_chunk
from nltk.tag import pos_tag
from nltk import RegexpParser
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('words')
nltk.download('maxent_ne_chunker')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!


True

1) Program to get Synonyms and Antonyms from WordNet

In [None]:
def getSynonymsAntonyms(word):
    synonyms = []
    antonyms = []

    synsets = wordnet.synsets(word)
    # Iterating over the synsets
    for synset in synsets:
        for lemma in synset.lemmas():
            synonyms.append(lemma.name())
            # checking if the lemma has antonyms
            if lemma.antonyms():
                antonyms.append(lemma.antonyms()[0].name())
    return synonyms, antonyms

print("Synonyms and Antonyms of the given words")
print("----------------------------------------")

words = ["happy","good", "selfish", "cloudy", "true"]
for word in words:
    synonyms, antonyms = getSynonymsAntonyms(word)
    print("Synonyms for", word, ":", synonyms)
    print("Antonyms for", word, ":", antonyms)

Synonyms and Antonyms of the given words
----------------------------------------
Synonyms for happy : ['happy', 'felicitous', 'happy', 'glad', 'happy', 'happy', 'well-chosen']
Antonyms for happy : ['unhappy']
Synonyms for good : ['good', 'good', 'goodness', 'good', 'goodness', 'commodity', 'trade_good', 'good', 'good', 'full', 'good', 'good', 'estimable', 'good', 'honorable', 'respectable', 'beneficial', 'good', 'good', 'good', 'just', 'upright', 'adept', 'expert', 'good', 'practiced', 'proficient', 'skillful', 'skilful', 'good', 'dear', 'good', 'near', 'dependable', 'good', 'safe', 'secure', 'good', 'right', 'ripe', 'good', 'well', 'effective', 'good', 'in_effect', 'in_force', 'good', 'good', 'serious', 'good', 'sound', 'good', 'salutary', 'good', 'honest', 'good', 'undecomposed', 'unspoiled', 'unspoilt', 'good', 'well', 'good', 'thoroughly', 'soundly', 'good']
Antonyms for good : ['evil', 'evilness', 'bad', 'badness', 'bad', 'evil', 'ill']
Synonyms for selfish : ['selfish']
Antonyms

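**Inference:** The program collects the synonyms and antonyms of each word from its WordNet synsets. Because the lemmas of every synset are appended, the synonym lists contain repeated entries such as 'happy' and 'good'.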
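The repeated entries in the synonym lists can be removed while keeping the order in which they first appear. The following is a minimal sketch of one way to do that; the helper name `unique_in_order` is an assumption for illustration and is not part of the original notebook.

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first-seen order."""
    # dict keys preserve insertion order (Python 3.7+), so they act as an ordered set
    return list(dict.fromkeys(items))


# Synonym list for "happy", copied from the output above
happy_synonyms = ['happy', 'felicitous', 'happy', 'glad', 'happy', 'happy', 'well-chosen']
print(unique_in_order(happy_synonyms))
# ['happy', 'felicitous', 'glad', 'well-chosen']
```

In the notebook, the same helper could be applied to the lists returned by `getSynonymsAntonyms`, for example `unique_in_order(synonyms)`.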

2) Program for stemming Non-English words

In [None]:
germanSt = GermanStemmer()

In [None]:
token = ["Danke","geschrieben","Kuchen","katze"]

In [None]:
stemWords = [germanSt.stem(word) for word in token]
print(stemWords)

['dank', 'geschrieb', 'kuch', 'katz']


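**Inference:** The German words are stemmed (not tokenized) with the Snowball German stemmer. Meanings: Danke (thank you), geschrieben (written), Kuchen (cake), Katze (cat). The stems are lowercased, and some of them, such as 'geschrieb' and 'kuch', are not dictionary words.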
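NLTK's `SnowballStemmer` class takes a language name, so the same pattern can be reused for other languages without importing a language-specific class. The sketch below reuses the token list from this section; the variable names `german_stemmer`, `tokens`, and `stems` are illustrative.

```python
from nltk.stem import SnowballStemmer

# Languages available through NLTK's Snowball stemmers
print(SnowballStemmer.languages)

# Select the stemmer by language name instead of importing GermanStemmer directly
german_stemmer = SnowballStemmer("german")

tokens = ["Danke", "geschrieben", "Kuchen", "katze"]
stems = [german_stemmer.stem(token) for token in tokens]
print(stems)
```

The stems should match the output of `GermanStemmer` shown above, since `SnowballStemmer("german")` wraps the same stemmer.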

3) Program for stemming words using the Porter, Lancaster, and Snowball stemmers

In [None]:
porterStemmer = PorterStemmer()
lancasterStemmer = LancasterStemmer()
snowballStemmer = SnowballStemmer("english")

# Stem a word with all three stemmers and print the results side by side
def stem_word(word):
    porter_stemmed_word = porterStemmer.stem(word)
    lancaster_stemmed_word = lancasterStemmer.stem(word)
    snowball_stemmed_word = snowballStemmer.stem(word)

    print(word, "-> Porter:", porter_stemmed_word,
          "-> Lancaster:", lancaster_stemmed_word,
          "-> Snowball:", snowball_stemmed_word)

words = ["cats", "dogs", "shipping", "chair", "man"]

for word in words:
    stem_word(word)


cats -> Porter: cat -> Lancaster: cat -> Snowball: cat
dogs -> Porter: dog -> Lancaster: dog -> Snowball: dog
shipping -> Porter: ship -> Lancaster: ship -> Snowball: ship
chair -> Porter: chair -> Lancaster: chair -> Snowball: chair
man -> Porter: man -> Lancaster: man -> Snowball: man


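**Inference:** The words in the list have been stemmed with three different stemmers (Porter, Lancaster, and Snowball), and all three agree on every word in this sample. These are stemmers rather than a WordNet lemmatizer, so they only strip suffixes by rule; 'chair' and 'man' are returned unchanged.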
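Since the three stemmers give identical results on this sample, it can be more informative to report only the words on which they differ. The following is a rough sketch under that assumption; the names `STEMMERS`, `stem_all`, and `disagreements` are hypothetical helpers and do not appear in the original notebook.

```python
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

# One instance of each stemmer, keyed by a readable name
STEMMERS = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "snowball": SnowballStemmer("english"),
}


def stem_all(word):
    """Return a dict mapping each stemmer name to its stem for `word`."""
    return {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}


def disagreements(words):
    """Return only the words for which the stemmers do not all agree."""
    results = {word: stem_all(word) for word in words}
    return {word: stems for word, stems in results.items()
            if len(set(stems.values())) > 1}


sample = ["cats", "dogs", "shipping", "chair", "man"]
print(disagreements(sample))  # empty for this sample, matching the output above
```

Passing a longer or more varied word list to `disagreements` is one way to find cases where the more aggressive stemmers diverge.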

4) Program to compare stemming and lemmatization


In [None]:
input_text = "Running ducks and swimming geese are better than flying birds"

words = word_tokenize(input_text)

# stemming
stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in words]
print("Stemmed words:", stemmed_words)

# lemmatization (pos='v' treats every token as a verb)
lemmatizer = WordNetLemmatizer()
lemmatized_words = [lemmatizer.lemmatize(word, pos='v') for word in words]
print("Lemmatized words:", lemmatized_words)

Stemmed words: ['run', 'duck', 'and', 'swim', 'gees', 'are', 'better', 'than', 'fli', 'bird']
Lemmatized words: ['Running', 'duck', 'and', 'swim', 'geese', 'be', 'better', 'than', 'fly', 'bird']


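**Inference:** The outputs show the difference between the two techniques. The Porter stemmer lowercases each token and strips suffixes by rule, which can produce non-words such as 'gees' and 'fli'. The WordNet lemmatizer returns dictionary forms such as 'be' (from 'are') and 'fly' (from 'flying'). Because every token is lemmatized as a verb (pos='v'), the noun 'geese' is left unchanged, and the capitalized 'Running' is also unchanged, most likely because the WordNet lookup is case-sensitive.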
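One common way to address both limitations is to lowercase each token and pass the lemmatizer a part of speech derived from a PoS tagger, rather than a fixed `pos='v'`. The sketch below is an illustration under assumptions: `penn_to_wordnet` and `lemmatize_tagged` are hypothetical helper names, and the letters `'a'`, `'v'`, `'r'`, `'n'` are the WordNet part-of-speech codes for adjective, verb, adverb, and noun. The lemmatizing function is passed in as an argument so the helper does not depend on a particular lemmatizer object.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG', 'NNS') to a WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # fall back to noun for everything else


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs using a caller-supplied lemmatize(word, pos)."""
    # Lowercasing helps WordNet lookups but also lowercases proper nouns
    return [lemmatize(word.lower(), penn_to_wordnet(tag))
            for word, tag in tagged_tokens]
```

With the objects already defined in this notebook, the call would look like `lemmatize_tagged(pos_tag(words), lemmatizer.lemmatize)`.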

5) Program for PoS Tagging

In [None]:
input_text = "My Spirituality guru is Gaur Gopal Das"

words = word_tokenize(input_text)

# applying POS tagging
pos_tags = pos_tag(words)

print(pos_tags)


[('My', 'PRP$'), ('Spirituality', 'NNP'), ('guru', 'NN'), ('is', 'VBZ'), ('Gaur', 'NNP'), ('Gopal', 'NNP'), ('Das', 'NNP')]


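**Inference:** Part-of-speech tagging is applied to the given sentence. The tagger labels 'Spirituality' as a proper noun (NNP), probably because it is capitalized in the middle of the sentence.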
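The tags in the output follow the Penn Treebank tagset. As a small reading aid, the sketch below attaches a plain-English description to each tag that appears above; the dictionary `TAG_DESCRIPTIONS` and the helper `describe` are illustrative names, and the dictionary covers only the four tags in this output.

```python
# Descriptions for the Penn Treebank tags that occur in the output above
TAG_DESCRIPTIONS = {
    "PRP$": "possessive pronoun",
    "NNP": "proper noun, singular",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
}


def describe(tagged_tokens):
    """Return (word, tag, description) triples for a list of (word, tag) pairs."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
            for word, tag in tagged_tokens]


# Tagged tokens copied from the output above
pos_tags = [('My', 'PRP$'), ('Spirituality', 'NNP'), ('guru', 'NN'), ('is', 'VBZ'),
            ('Gaur', 'NNP'), ('Gopal', 'NNP'), ('Das', 'NNP')]

for word, tag, meaning in describe(pos_tags):
    print(f"{word:<14}{tag:<6}{meaning}")
```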

6) Program for Named Entity Recognition

In [None]:
corpus = "Barack Obama was the 44th President of the United States. He was born in Hawaii."

In [None]:
words = word_tokenize(corpus)

pos_tags = pos_tag(words)

named_entities = ne_chunk(pos_tags)

for entity in named_entities:
    if hasattr(entity, 'label'):
        print(' '.join(c[0] for c in entity), '=>', entity.label())

Barack => PERSON
Obama => PERSON
United States => GPE
Hawaii => GPE


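**Inference:** Named entities are recognized in the given corpus: 'Barack' and 'Obama' as PERSON, and 'United States' and 'Hawaii' as GPE (geopolitical entity). Note that 'Barack' and 'Obama' are returned as two separate PERSON chunks rather than as one name.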
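When adjacent chunks carry the same label, as with 'Barack' and 'Obama' above, they can be joined in a post-processing step. The following is a minimal sketch of that idea; `merge_adjacent_entities` is a hypothetical helper, and it assumes its input is a chunk tree like the one `ne_chunk` returns (entity subtrees mixed with plain `(word, tag)` tuples).

```python
from nltk import Tree


def merge_adjacent_entities(chunked):
    """Return (text, label) pairs, joining neighbouring chunks with the same label."""
    merged = []
    previous_was_entity = False
    for node in chunked:
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            if previous_was_entity and merged[-1][1] == node.label():
                # Same label directly after another entity: extend the previous entry
                merged[-1] = (merged[-1][0] + " " + text, node.label())
            else:
                merged.append((text, node.label()))
            previous_was_entity = True
        else:
            # A plain (word, tag) token breaks the run of entities
            previous_was_entity = False
    return merged
```

In the notebook this would be called as `merge_adjacent_entities(named_entities)`. Merging purely on adjacency is a heuristic: two different people named back to back would also be joined.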

7) Implementing Dependency Parsing and Constituency Parsing

In [None]:
nlp = spacy.load("en_core_web_sm")

sentence = "The quick brown fox jumps over the lazy dog."

doc = nlp(sentence)
print("\nDependency Parsing with spaCy:")
for token in doc:
    print(token.text, "-->", token.dep_, "-->", token.head.text)

print("\nConstituency Parsing with NLTK:")

tokens = word_tokenize(sentence)

pos_tags = pos_tag(tokens)

grammar = r"""
    NP: {<DT|JJ|NN.*>+}
"""

chunk_parser = RegexpParser(grammar)

parsed_sentence = chunk_parser.parse(pos_tags)
print(parsed_sentence)


Dependency Parsing with spaCy:
The --> det --> fox
quick --> amod --> fox
brown --> amod --> fox
fox --> nsubj --> jumps
jumps --> ROOT --> jumps
over --> prep --> jumps
the --> det --> dog
lazy --> amod --> dog
dog --> pobj --> over
. --> punct --> jumps

Constituency Parsing with NLTK:
(S
  (NP The/DT quick/JJ brown/NN fox/NN)
  jumps/VBZ
  over/IN
  (NP the/DT lazy/JJ dog/NN)
  ./.)


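**Inference:** For the given sentence, dependency parsing with spaCy and constituency-style parsing with NLTK have been implemented successfully. The NLTK part uses a RegexpParser grammar that only groups noun phrases (NP), so it is a shallow chunking of the sentence rather than a full constituency parse.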
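The noun-phrase chunks can also be pulled out of the chunk tree as plain strings. The sketch below reuses the grammar from this section and, so that it runs without a tagger, starts from the tagged tokens shown in the output above; the variable names `tagged` and `noun_phrases` are illustrative.

```python
from nltk import RegexpParser

# Tagged tokens copied from the chunk tree printed above
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN"), (".", ".")]

# Same NP rule as in the notebook: one or more determiners, adjectives, or nouns
chunk_parser = RegexpParser(r"NP: {<DT|JJ|NN.*>+}")
tree = chunk_parser.parse(tagged)

# Collect the words of every NP subtree into a single string
noun_phrases = [
    " ".join(word for word, _tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)
# ['The quick brown fox', 'the lazy dog']
```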

**Conclusion**

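The notebook demonstrates synonym and antonym lookup with WordNet, stemming of English and German words, lemmatization, PoS tagging, named entity recognition, and dependency and constituency-style parsing, using NLTK and spaCy.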