# <b> NLP : Intro To Lemmatization

> Description:
  * Lemmatization is the process of reducing a word to its base or dictionary form, known as a lemma. It is a common technique in natural language processing (NLP) for grouping the inflected or variant forms of a word. For example, the lemma of "ran" is "run", and the lemma of "am", "is", and "are" is "be". Lemmatization can reduce the complexity of text analysis and improve accuracy by ensuring that different forms of the same word are treated as a single term.





In [1]:
# Importing the necessary libraries

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

In [17]:
# Downloading the NLTK data packages (tokenizer models, stopword list, WordNet) needed for lemmatization.

nltk.download('punkt')  
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [18]:
# Creating a paragraph on which we will perform lemmatization. (This is the same paragraph we used for stemming.)

paragraph  = '''The Cosmic Microwave Background (CMB) is a form of electromagnetic radiation that pervades the entire universe. It is thought to be the afterglow of the Big Bang, the event that marks the beginning of the universe as we know it.

The CMB was first discovered in 1964 by two radio astronomers, Arno Penzias and Robert Wilson, who were working at Bell Labs in New Jersey. They were using a large horn-shaped antenna to study radio waves emitted by the Milky Way, but they kept detecting a mysterious signal that seemed to be coming from all directions in the sky. After ruling out a number of possible explanations, they realized that they had stumbled upon the CMB.

The CMB is incredibly faint, with a temperature of just 2.7 Kelvin (-270.45 degrees Celsius). However, it is remarkably uniform across the entire sky, with temperature variations of just a few parts in 100,000. These tiny fluctuations are thought to be the result of slight density variations in the early universe, which were stretched out by cosmic expansion to form the large-scale structures we see today, such as galaxies and clusters of galaxies.

Studying the CMB has been crucial to our understanding of the universe and its evolution. It has provided strong evidence for the Big Bang theory, as well as for the existence of dark matter and dark energy. It has also allowed astronomers to measure the age, size, and composition of the universe with unprecedented accuracy.

In recent years, the study of the CMB has entered a new era, with a number of high-precision experiments, such as the Planck satellite and the Atacama Cosmology Telescope, providing even more detailed maps of the CMB and shedding light on some of the universe's deepest mysteries.
'''

In [19]:

# Performing sentence tokenization:

sentences = nltk.sent_tokenize(paragraph)
print(len(sentences))

12


In [20]:
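# Perform lemmatization:
lemmatizer = WordNetLemmatizer()
# Build the stopword set once instead of rebuilding it for every token.
# Note: the NLTK list is lowercase, so capitalized tokens such as "The" are kept.
stop_words = set(stopwords.words('english'))
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    # lemmatize() defaults to pos='n', so only noun forms are reduced here.
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)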
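
NLTK's English stopword list is entirely lowercase, so the case-sensitive membership test in the loop above keeps sentence-initial words such as "The" and "It". The sketch below shows the difference on a toy input. `STOPWORDS` is a small hand-written stand-in for `stopwords.words('english')`, and `remove_stopwords` is a hypothetical helper, not part of NLTK.

```python
# Hypothetical stand-in for stopwords.words('english'), which is all lowercase.
STOPWORDS = {"the", "it", "is", "to", "be", "of"}

def remove_stopwords(tokens, ignore_case=True):
    """Drop tokens found in STOPWORDS, optionally ignoring case."""
    if ignore_case:
        return [t for t in tokens if t.lower() not in STOPWORDS]
    return [t for t in tokens if t not in STOPWORDS]

tokens = ["The", "CMB", "is", "thought", "to", "be", "the", "afterglow"]

case_sensitive = remove_stopwords(tokens, ignore_case=False)
case_insensitive = remove_stopwords(tokens)

print(case_sensitive)    # ['The', 'CMB', 'thought', 'afterglow']
print(case_insensitive)  # ['CMB', 'thought', 'afterglow']
```

Lowercasing only for the comparison keeps the original casing of the tokens that survive, which matters for proper nouns such as "CMB".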

In [21]:
sentences

['The Cosmic Microwave Background ( CMB ) form electromagnetic radiation pervades entire universe .',
 'It thought afterglow Big Bang , event mark beginning universe know .',
 'The CMB first discovered 1964 two radio astronomer , Arno Penzias Robert Wilson , working Bell Labs New Jersey .',
 'They using large horn-shaped antenna study radio wave emitted Milky Way , kept detecting mysterious signal seemed coming direction sky .',
 'After ruling number possible explanation , realized stumbled upon CMB .',
 'The CMB incredibly faint , temperature 2.7 Kelvin ( -270.45 degree Celsius ) .',
 'However , remarkably uniform across entire sky , temperature variation part 100,000 .',
 'These tiny fluctuation thought result slight density variation early universe , stretched cosmic expansion form large-scale structure see today , galaxy cluster galaxy .',
 'Studying CMB crucial understanding universe evolution .',
 'It provided strong evidence Big Bang theory , well existence dark matter dark ener

# <b> 
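> After lemmatization, the remaining tokens are still valid dictionary words (for example, "galaxies" becomes "galaxy"), so the output stays readable, unlike stemmed text, where words are often cut down to stems that are not real words.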