# Stemming using the NLTK library & Lemmatization using the spaCy library

### Stemming:
- It is a technique for normalizing words. It involves reducing words to their base form, also known as the root form.
- Stemming is also known as suffix stripping.
- Stemming reduces text dimensionality, because several inflected forms of a word collapse into a single token.
- Example: walks, walking, and walked are all derived from the root word walk; writes, writing, written, and writer are all derived from the root word write.
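
To make "suffix stripping" concrete, here is a minimal sketch of the idea. It is a toy, not the Porter algorithm: the `SUFFIXES` tuple and the `toy_stem` helper are assumptions made up for illustration, and the rule set is deliberately tiny.

```python
# Toy suffix stripper: illustrative only, NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")  # assumed, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["walks", "walking", "walked", "walk"]:
    print(w, "-", toy_stem(w))
```

All four forms print `walk`. The same toy rules turn writing into writ and leave written untouched, which hints at why purely suffix-based approaches miss some word forms.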

In [2]:
import nltk

In [5]:
# Importing the PorterStemmer from nltk.stem library to perform stemming

from nltk.stem import PorterStemmer
stemmer = PorterStemmer()
stemmer

<PorterStemmer>

In [6]:
words = ["eating", "eats", "eat", "ate", "adjustable", "rafting", "ability", "meeting"]

for word in words:
    print(word, "-", stemmer.stem(word))

eating - eat
eats - eat
eat - eat
ate - ate
adjustable - adjust
rafting - raft
ability - abil
meeting - meet


# Notes:
- eating, eats, and eat all share the root word eat, so eat is displayed.
- The root word for ate is also eat, but the stemmer returns ate: stemming only strips suffixes and cannot convert a word from one form to another. The same applies to ability, whose root word is able; the stemmer has only stripped the suffix and returned abil.
- For adjustable the stem is adjust, for rafting it is raft, and for meeting it is meet.
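
The dimensionality reduction mentioned earlier can be read straight off this output. The sketch below hard-codes the (word, stem) pairs printed by the PorterStemmer cell, so it runs without NLTK, and groups the words by stem; the variable names are chosen for illustration.

```python
from collections import defaultdict

# (word, stem) pairs copied from the PorterStemmer output above.
stemmed = [
    ("eating", "eat"), ("eats", "eat"), ("eat", "eat"), ("ate", "ate"),
    ("adjustable", "adjust"), ("rafting", "raft"),
    ("ability", "abil"), ("meeting", "meet"),
]

# Collect every word that maps to the same stem.
groups = defaultdict(list)
for word, stem in stemmed:
    groups[stem].append(word)

print(len(stemmed), "words ->", len(groups), "distinct stems")
for stem, members in groups.items():
    print(stem, "<-", members)
```

Eight input words shrink to six distinct stems, and ate stays in its own group because suffix stripping cannot link it to eat.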

# Lemmatization in spaCy
- spaCy is a library, similar to NLTK, used to perform natural language processing (NLP).
- Lemmatization is a text pre-processing technique used in NLP to reduce a word to its dictionary form, called the lemma.
- It's a text normalization technique that groups the different inflected forms of a word under one root form.
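
One way to picture the difference from stemming is that a lemmatizer relies on vocabulary knowledge rather than only trimming endings. The sketch below is a toy under that assumption: `LEMMA_TABLE` and `toy_lemma` are hypothetical names with a hand-written table, while real libraries rely on far larger resources and usually on part-of-speech information as well.

```python
# Hypothetical, hand-written table: illustrative only.
LEMMA_TABLE = {
    "ate": "eat",
    "eaten": "eat",
    "eats": "eat",
    "eating": "eat",
    "wrote": "write",
    "written": "write",
}


def toy_lemma(word: str) -> str:
    """Return the dictionary form if the word is known, else the word unchanged."""
    return LEMMA_TABLE.get(word.lower(), word)


for w in ["ate", "eating", "written", "table"]:
    print(w, "-", toy_lemma(w))
```

Irregular forms such as ate and written resolve correctly because they are looked up, not stripped; words missing from the table pass through unchanged.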


![image.png](attachment:1ed9bffc-b58f-4817-b7bf-cacecdec26f6.png)

In [None]:
The display is impressively bright, despite the 60 Hz, which I thought might be a drawback coming from a 120 Hz Android device. 
Surprisingly, it feels the same. The battery life is solid, providing around 1.5 days for mild phone users. 
As a developer, with light usage for content consumption and social media, I get around 8 hours of screen on time with 25-30% battery left. 
It charges quickly to 50% and then trickle charges to 100%, adapting to usage patterns for faster charging. 
Initially, there were some battery drain and heating issues, but they resolved as the phone learned my usage patterns.

Now, the camera is impressive, especially in low light conditions. 
Despite lacking a dedicated long exposure mode, the default night option, averaging around 3 seconds, captures details even in the dark. 
Compared to the iPhone 14 and S21FE, the camera quality is top-notch. The auto portrait detection is a standout feature.
Believe me no one explain the real use of that auto portrait theoretically until you try it.

In terms of network, the dual 5G standby supports Vi and JIO 5G (eSim). Inside my house, I get around 500 Mbps, and outside easily surpass 1 Gbps. 
                             Call quality is excellent, with two introduced modes in the mic option available in the control center.

This is my personal opinion; I'll update if I find anything noteworthy.

# Installing spaCy

In [15]:
# !pip install spacy
# !python -m spacy download en_core_web_sm

In [17]:
import spacy
nlp = spacy.load("en_core_web_sm")

# doc = nlp("Mando talked for 3 hours although talking isn't his thing")
doc = nlp("eating eats eat ate adjustable rafting ability meeting better")
doc

eating eats eat ate adjustable rafting ability meeting better

In [18]:
for word in doc:
    print(word, "-", word.lemma_)

eating - eat
eats - eat
eat - eat
ate - eat
adjustable - adjustable
rafting - raft
ability - ability
meeting - meeting
better - well


# Notes:
- eating, eats, and eat all have eat as their grammatical root word, so eat is displayed.
- ate is the past tense of eat, so eat is displayed for ate as well.
- For rafting the root word is raft, so raft is displayed. adjustable and ability are left unchanged.
- meeting is displayed as meeting. The reason is that meeting is treated as a noun here, so it is left as is. Had meeting been a verb, meet would have been displayed as the root word.
- The root word returned for better is well, so lemmatization converts better to well. Linguistically, better is the comparative of both good and well; spaCy returned well for this input.
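
To see exactly where the two techniques disagree, the recorded outputs can be compared side by side. This sketch hard-codes the results printed by the PorterStemmer and spaCy cells above (better is omitted because it was not part of the stemming run) and lists only the words whose stem and lemma differ.

```python
# Results copied from the two runs above: PorterStemmer stems and spaCy lemmas.
stems = {
    "eating": "eat", "eats": "eat", "eat": "eat", "ate": "ate",
    "adjustable": "adjust", "rafting": "raft",
    "ability": "abil", "meeting": "meet",
}
lemmas = {
    "eating": "eat", "eats": "eat", "eat": "eat", "ate": "eat",
    "adjustable": "adjustable", "rafting": "raft",
    "ability": "ability", "meeting": "meeting",
}

# Keep only the words where stemming and lemmatization give different results.
differences = {w: (stems[w], lemmas[w]) for w in stems if stems[w] != lemmas[w]}

for word, (stem, lemma) in differences.items():
    print(f"{word}: stem={stem}, lemma={lemma}")
```

Four of the eight words differ: ate, adjustable, ability, and meeting. In each case the lemma is a real dictionary word, while the stem may not be (abil).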

In [None]:
In office today we had a lot of meetings. In each meeting, things related business expansion was discussed.
meeting - meet
meetings - meeting

# Part-of-Speech Tagging

In [20]:
nlp = spacy.load("en_core_web_sm")
doc = nlp("Elon flew to mars yesterday. He carried biryani masala with him")
doc

Elon flew to mars yesterday. He carried biryani masala with him

In [23]:
for word in doc:
    print(word, "-", word.pos_, "-", spacy.explain(word.pos_))

Elon - PROPN - proper noun
flew - VERB - verb
to - ADP - adposition
mars - NOUN - noun
yesterday - NOUN - noun
. - PUNCT - punctuation
He - PRON - pronoun
carried - VERB - verb
biryani - ADJ - adjective
masala - NOUN - noun
with - ADP - adposition
him - PRON - pronoun
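
A common use of these tags is filtering tokens by word class. The sketch below hard-codes the (token, POS) pairs from the output above so it runs without spaCy; with a real `doc`, the same pairs could be built as `[(word.text, word.pos_) for word in doc]`. The `KEEP` set is an assumption chosen for illustration.

```python
# (token, coarse POS) pairs copied from the spaCy output above.
tagged = [
    ("Elon", "PROPN"), ("flew", "VERB"), ("to", "ADP"), ("mars", "NOUN"),
    ("yesterday", "NOUN"), (".", "PUNCT"), ("He", "PRON"), ("carried", "VERB"),
    ("biryani", "ADJ"), ("masala", "NOUN"), ("with", "ADP"), ("him", "PRON"),
]

KEEP = {"NOUN", "PROPN"}  # assumed filter: nouns and proper nouns only

# Keep only the tokens whose POS tag is in the filter set.
content_words = [token for token, pos in tagged if pos in KEEP]
print(content_words)
```

Note that biryani was tagged ADJ by the model, so it is dropped by this filter even though a human reader would treat it as part of the noun phrase.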


In [24]:
doc = nlp("Wow! Dr. Strange made 265 million $ on the very first day")
doc

Wow! Dr. Strange made 265 million $ on the very first day

In [25]:
for word in doc:
    print(word, "-", word.pos_, "-", spacy.explain(word.pos_))

Wow - INTJ - interjection
! - PUNCT - punctuation
Dr. - PROPN - proper noun
Strange - PROPN - proper noun
made - VERB - verb
265 - NUM - numeral
million - NUM - numeral
$ - NUM - numeral
on - ADP - adposition
the - DET - determiner
very - ADV - adverb
first - ADJ - adjective
day - NOUN - noun


In [26]:
for word in doc:
    print(word, "-", word.pos_, "-", spacy.explain(word.pos_), "-", word.tag_, "-", spacy.explain(word.tag_))

Wow - INTJ - interjection - UH - interjection
! - PUNCT - punctuation - . - punctuation mark, sentence closer
Dr. - PROPN - proper noun - NNP - noun, proper singular
Strange - PROPN - proper noun - NNP - noun, proper singular
made - VERB - verb - VBD - verb, past tense
265 - NUM - numeral - CD - cardinal number
million - NUM - numeral - CD - cardinal number
$ - NUM - numeral - CD - cardinal number
on - ADP - adposition - IN - conjunction, subordinating or preposition
the - DET - determiner - DT - determiner
very - ADV - adverb - RB - adverb
first - ADJ - adjective - JJ - adjective (English), other noun-modifier (Chinese)
day - NOUN - noun - NN - noun, singular or mass


In [29]:
text = "Wow! Dr. Strange made 265 million $ on the very first day"

from nltk import pos_tag
pos_tag(text.split())

[('Wow!', 'NNP'),
 ('Dr.', 'NNP'),
 ('Strange', 'NNP'),
 ('made', 'VBD'),
 ('265', 'CD'),
 ('million', 'CD'),
 ('$', '$'),
 ('on', 'IN'),
 ('the', 'DT'),
 ('very', 'RB'),
 ('first', 'JJ'),
 ('day', 'NN')]
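
In the output above, `Wow!` is tagged as a single token because `text.split()` only splits on whitespace and leaves punctuation attached. The sketch below contrasts that with a simple regex tokenizer; the pattern is an assumption chosen for illustration, not what NLTK does internally. NLTK's own tokenizer, `nltk.word_tokenize`, is the usual choice but needs its tokenizer data downloaded first.

```python
import re

text = "Wow! Dr. Strange made 265 million $ on the very first day"

# Whitespace splitting keeps punctuation glued to the neighbouring word.
whitespace_tokens = text.split()

# Either a run of word characters, or one non-space punctuation character.
regex_tokens = re.findall(r"\w+|[^\w\s]", text)

print(len(whitespace_tokens), whitespace_tokens[:2])
print(len(regex_tokens), regex_tokens[:4])
```

The regex version separates `Wow` from `!`, but it also splits `Dr.` into `Dr` and `.`, which is one reason a dedicated tokenizer is usually preferable to a hand-written pattern.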

# Named Entity Recognition:

In [30]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [31]:
doc = nlp("Tesla Inc is going to aquire Twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Tesla Inc | ORG | Companies, agencies, institutions, etc.
Twitter | PRODUCT | Objects, vehicles, foods, etc. (not services)
$45 billion | MONEY | Monetary values, including unit


In [44]:
doc = nlp("Bill Gates has leaded Microsoft very well. That's why we could see Microsoft Windows is still a huge success.")
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Bill Gates | PERSON | People, including fictional
Microsoft | ORG | Companies, agencies, institutions, etc.
Microsoft Windows | ORG | Companies, agencies, institutions, etc.


In [45]:
nlp.pipe_labels["ner"]

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']