<a href="https://colab.research.google.com/github/Mel-iza/The-Natural-Language-Processing-Workshop/blob/main/NLP_Workshop_Chapter_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1> <b>The Natural Language Processing Workshop</b> </h1>
Rohan Chopra, Aniruddha M. Godbole, Nipun Sadvilkar et al, 2020. Packt Publishing.

### Chapter 1: Introduction to Natural Language Processing

Overview: In this chapter, you will learn the difference between **Natural Language Processing (NLP)** and basic text analysis. You will implement various preprocessing tasks such as tokenization, lemmatization, stemming, stop word removal, and more. By the end of this chapter, you will have a deep understanding of the various phases of an NLP project, from data collection to model deployment.

- **What is natural language?** <br>
It is a means for us to express our thoughts and ideas. To define it more specifically, language is a mutually agreed upon set of protocols involving words/sounds that we use to communicate with each other.

NLP can be defined as a field of computer science that is concerned with enabling computer algorithms to understand, analyze, and generate natural languages.


NLP works at different levels, which means that machines process and understand natural language at different levels:
- **Morphological level**: This level deals with understanding word structure and word information.
- **Lexical level**: This level deals with understanding the part of speech of the word <i>(that is, the word's grammatical category)</i>.
- **Syntactic level**: This level deals with understanding the syntactic analysis of a sentence, or parsing a sentence.
- **Semantic level**: This level deals with understanding the actual meaning of a sentence.
- **Discourse level**: This level deals with understanding the meaning of a sentence beyond just the sentence level, that is, considering the context.
- **Pragmatic level**: This level deals with using real-world knowledge to understand a sentence.

**History of NLP** <br>

**NLP** = Artificial intelligence + linguistics + data science 

With the advancement of computing technologies and the increased availability of data, NLP has undergone a huge change. Previously, a traditional rule-based system was used for computations, in which you had to explicitly write hardcoded rules. Today, computations on natural language are done using machine learning and deep learning techniques.

<i>The example used to illustrate how rule-based NLP systems worked is a project to extract the names of politicians from a newspaper. To capture those names, we would first have to write out every rule for them, for example the syntactic structure of a proper noun (a proper noun must always start with a capital letter), and so on.

Although this rule-based approach did not deliver good performance, it was used for a long time.</i>
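
To make the capital-letter rule concrete, here is a minimal sketch of a hand-written extractor. The rule itself (a name is a run of two or more capitalized words) and the sample sentences are assumptions made up for illustration; they are not taken from the book.

```python
import re

# Hypothetical hand-written rule: a name is two or more consecutive
# capitalized words, e.g. "Maria Silva".
NAME_RULE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def extract_candidate_names(text):
    """Return every span of text that satisfies the hardcoded rule."""
    return NAME_RULE.findall(text)

# The rule works on a friendly sentence...
print(extract_candidate_names("The senator Maria Silva met John Carter in Lisbon."))
# ['Maria Silva', 'John Carter']

# ...but a capitalized sentence-initial word is wrongly absorbed into the name.
print(extract_candidate_names("Yesterday Maria Silva spoke."))
# ['Yesterday Maria Silva']
```

The second call shows why such systems are brittle: every exception needs yet another explicit rule.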

<b>Text Analytics and NLP </b>

<b>Text analytics</b> is the method of extracting meaningful insights and answering questions from text data ➡ (<i>length of sentences, length of words, word count, and finding words from the text</i>)

↪ We are generating insights from text without getting into the semantics of the language.
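
As a rough illustration of these surface-level measurements, the sketch below computes sentence lengths, word lengths, and word counts with plain string methods. The sample text and the choice to split sentences on periods are simplifying assumptions.

```python
from collections import Counter

text = "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Split on periods and whitespace only; no linguistic knowledge is involved.
sentences = [s.strip() for s in text.split(".") if s.strip()]
words = text.replace(".", "").lower().split()

sentence_lengths = [len(s.split()) for s in sentences]  # words per sentence
word_lengths = [len(w) for w in words]                  # characters per word
word_counts = Counter(words)                            # frequency of each word

print(sentence_lengths)            # [9, 3]
print(word_counts.most_common(2))  # [('the', 3), ('dog', 2)]
```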

<b>NLP</b>, on the other hand, helps us understand the semantics and the underlying meaning of text ➡ (<i>sentiment of a sentence, top keywords in a text, parts of speech for different words</i>)
 




* <b>Natural Language Understanding (NLU) </b><br>
NLU refers to a process by which an inanimate object with computing power is able to comprehend spoken language.

* <b>Natural Language Generation (NLG) </b><br>
NLG refers to a process by which an inanimate object with computing power is able to communicate with humans in a language that they can understand, or is able to generate human-understandable text from a dataset.


Exercise 1.01: Basic Text Analytics

In [None]:
# 2. Assign a sentence variable the value 'The quick brown fox jumps over the lazy dog'

sentence = 'The quick brown fox jumps over the lazy dog'
sentence

'The quick brown fox jumps over the lazy dog'

In [None]:
# 3. Check if the word 'quick' belongs to sentence

def find_word(word, sentence):
  return word in sentence

find_word('quick', sentence)  

True

In [None]:
# 4. Find out the index value of the word 'fox'

def get_index(word, text):
  return text.index(word)

get_index('fox', sentence)  

16

In [None]:
# 5. Find out the rank of the word 'lazy'

get_index('lazy', sentence.split())

7

In [None]:
# 6. Print the third word of the given text

def get_word(text, rank):
  return text.split()[rank]

get_word(sentence, 2)  

'brown'

In [None]:
# 7. Print the third word of the given text in reverse order

get_word(sentence, 2)[::-1]

'nworb'

In [None]:
# 8. Concatenate the first and last words of the given sentence

def concat_words(text):
  '''
  This method will concat first and last words of given text
  '''
  words = text.split()
  first_word = words[0]
  last_word = words[-1]
  return first_word + last_word

concat_words(sentence)  

'Thedog'

In [None]:
# 9. Print words at even positions

def get_even_positions_words(text):
  words = text.split()
  return [words[i] for i in range(len(words)) if i % 2 == 0]

get_even_positions_words(sentence)

['The', 'brown', 'jumps', 'the', 'dog']

In [None]:
# 10. Print the last three letters of the text

def get_last_n_letters(text, n):
  return text[-n:]

get_last_n_letters(sentence, 3)  

'dog'

In [None]:
# 11. To print the text in reverse order

def get_reverse(text):
  return text[::-1]

get_reverse(sentence)  

'god yzal eht revo spmuj xof nworb kciuq ehT'

In [None]:
# 12. To print each word of the given text in reverse order, maintaining their sequence

def get_word_reverse(text):
  words = text.split()
  return ' '.join([word[::-1] for word in words])

get_word_reverse(sentence)  

'ehT kciuq nworb xof spmuj revo eht yzal god'

<b>Various Tasks in NLP</b>

In [None]:
!pip install nltk



<b>Tokenization</b> <br>
<i>Tokenization refers to the procedure of splitting a sentence into its constituent parts: the words and punctuation that it is made up of. Such tokens are called <b>unigrams.</b></i>

<br>
Exercise 1.02: Tokenization of a simple sentence

In [None]:
from nltk import word_tokenize, download 

download(['punkt', 'averaged_perceptron_tagger', 'stopwords'])

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
def get_tokens(sentence):
  words = word_tokenize(sentence)
  return words

In [None]:
print(get_tokens('I am reading NLP Fundamentals.'))

['I', 'am', 'reading', 'NLP', 'Fundamentals', '.']
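
For intuition about what a tokenizer does, here is a minimal regular-expression sketch that separates words from punctuation. It is only an approximation written for illustration, not how `word_tokenize` is implemented, and it handles cases such as contractions differently from NLTK.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize('I am reading NLP Fundamentals.'))
# ['I', 'am', 'reading', 'NLP', 'Fundamentals', '.']
```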


<b>PoS Tagging (Parts-of-Speech)</b><br>
<i>PoS Tagging refers to the process of tagging words within sentences with their respective PoS. We extract the PoS of tokens constituting a sentence so that we can filter out the PoS that are of interest and analyze them.</i>

```
DT = Determiner
NN = Noun, common, singular or mass
VBZ = Verb, present tense, third-person singular
JJ = Adjective
```

PoS tagging finds application in many NLP tasks, including word sense disambiguation, classification, named entity recognition (NER), and coreference resolution.

<br>
Exercise 1.03: PoS Tagging


In [None]:
from nltk import word_tokenize, pos_tag

def get_tokens(sentence):
  words = word_tokenize(sentence)
  return words

In [None]:
words = get_tokens('I am reading NLP Fundamentals')
print(words)

['I', 'am', 'reading', 'NLP', 'Fundamentals']


In [None]:
def get_pos(words):
  return pos_tag(words)

get_pos(words)  

[('I', 'PRP'),
 ('am', 'VBP'),
 ('reading', 'VBG'),
 ('NLP', 'NNP'),
 ('Fundamentals', 'NNS')]

<b>Stop Word Removal </b><br>
<i>Stop words are the most frequently occurring words in any language. They are used to support the construction of sentences and contribute little to the semantics of a sentence. So, we can remove stop words from any text before an NLP process, as they occur very frequently and their presence doesn't have much impact on the sense of a sentence.</i>


<br>
Exercise 1.04: Stop Word Removal

In [None]:
from nltk import download 
download('stopwords')

from nltk import word_tokenize
from nltk.corpus import stopwords

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [None]:
# Stop words are available for Portuguese too!

stop_words = stopwords.words('portuguese')
print(stop_words)

['a', 'à', 'ao', 'aos', 'aquela', 'aquelas', 'aquele', 'aqueles', 'aquilo', 'as', 'às', 'até', 'com', 'como', 'da', 'das', 'de', 'dela', 'delas', 'dele', 'deles', 'depois', 'do', 'dos', 'e', 'é', 'ela', 'elas', 'ele', 'eles', 'em', 'entre', 'era', 'eram', 'éramos', 'essa', 'essas', 'esse', 'esses', 'esta', 'está', 'estamos', 'estão', 'estar', 'estas', 'estava', 'estavam', 'estávamos', 'este', 'esteja', 'estejam', 'estejamos', 'estes', 'esteve', 'estive', 'estivemos', 'estiver', 'estivera', 'estiveram', 'estivéramos', 'estiverem', 'estivermos', 'estivesse', 'estivessem', 'estivéssemos', 'estou', 'eu', 'foi', 'fomos', 'for', 'fora', 'foram', 'fôramos', 'forem', 'formos', 'fosse', 'fossem', 'fôssemos', 'fui', 'há', 'haja', 'hajam', 'hajamos', 'hão', 'havemos', 'haver', 'hei', 'houve', 'houvemos', 'houver', 'houvera', 'houverá', 'houveram', 'houvéramos', 'houverão', 'houverei', 'houverem', 'houveremos', 'houveria', 'houveriam', 'houveríamos', 'houvermos', 'houvesse', 'houvessem', 'houvésse

In [None]:
stop_words = stopwords.words('english')
print(stop_words)

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

To remove the stop words from a sentence, we first assign a string to the sentence variable and tokenize it into words using the <b>word_tokenize()</b> function.

In [None]:
sentence = 'I am learning Python. It is one of the '\
            'most popular programming languages'

sentence_words = word_tokenize(sentence)            

In [None]:
print(sentence_words)

['I', 'am', 'learning', 'Python', '.', 'It', 'is', 'one', 'of', 'the', 'most', 'popular', 'programming', 'languages']


To remove the stop words, we loop through each word in the sentence, keep only the words that are not in the stop word list, and finally join the remaining words back into a sentence.

In [None]:
def remove_stop_words(sentence_words, stop_words):
  return ' '.join([word for word in sentence_words if \
                   word not in stop_words])

In [None]:
print(remove_stop_words(sentence_words, stop_words))

I learning Python . It one popular programming languages


In [None]:
# Add your own stop words to the stop word list

stop_words.extend(['I', 'It', 'one'])
print(remove_stop_words(sentence_words, stop_words))

learning Python . popular programming languages
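
Note that `I` and `It` survived the first pass because the NLTK list is lowercase and the membership test is case-sensitive. As an alternative to extending the list by hand, the comparison can be made on the lowercase form of each token. The sketch below assumes a tiny hand-picked stop word set so that it runs without any downloads; `remove_stop_words_ci` is a hypothetical helper name.

```python
# A tiny hand-picked stop word set, assumed here only for illustration.
SAMPLE_STOP_WORDS = {'i', 'am', 'it', 'is', 'of', 'the', 'most', 'one'}

def remove_stop_words_ci(tokens, stop_words):
    """Drop tokens whose lowercase form is a stop word; keep original casing."""
    return ' '.join(tok for tok in tokens if tok.lower() not in stop_words)

tokens = ['I', 'am', 'learning', 'Python', '.', 'It', 'is', 'one', 'of',
          'the', 'most', 'popular', 'programming', 'languages']
print(remove_stop_words_ci(tokens, SAMPLE_STOP_WORDS))
# learning Python . popular programming languages
```

Using a `set` also makes each membership test constant-time, which matters on larger corpora.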


<b>Text Normalization </b><br>
<i>There are some words that are spelled, pronounced, and represented differently. Although they are different, they refer to the same thing. Text normalization is a process wherein different variations of text get converted into a standard form.</i>

In this exercise, we will normalize some given text. Basically, we will replace selected words with new words using the <b>replace()</b> function and finally produce the normalized text.


<br>
Exercise 1.05: Text Normalization

In [None]:
sentence = 'I visited the US from the UK on 22-20-18'

We want to replace `US` with `United States`, `UK` with `United Kingdom` and `18` with `2018`.

In [None]:
def normalize(text):
  # Chain the replacements; '-18' is matched with its hyphen so that
  # only the trailing year of the date is expanded.
  return text.replace('US', 'United States')\
             .replace('UK', 'United Kingdom')\
             .replace('-18', '-2018')

In [None]:
normalized_sentence = normalize(sentence)
print(normalized_sentence)

I visited the United States from the United Kingdom on 22-20-2018


In [None]:
normalized_sentence = normalize('US and UK are two superpowers')
print(normalized_sentence)

United States and United Kingdom are two superpowers
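
One caveat of plain `replace()` is that it matches substrings, so `US` would also be rewritten inside a word such as `USB`. A possible refinement, sketched below under the assumption of a small hypothetical lookup table, is to match whole words only with a regular expression.

```python
import re

# Hypothetical mapping; extend it with whatever variants your data contains.
REPLACEMENTS = {'US': 'United States', 'UK': 'United Kingdom'}

# \b anchors ensure 'US' matches as a whole word, not inside 'USB'.
PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, REPLACEMENTS)) + r')\b')

def normalize_words(text):
    """Replace whole-word abbreviations using the REPLACEMENTS table."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], text)

print(normalize_words('The US office ships USB drives to the UK'))
# The United States office ships USB drives to the United Kingdom
```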


<b>Spelling Correction </b><br>
<i>Spelling correction is one of the most important tasks in any NLP project. It can be time-consuming, but without it, there is a high chance of losing important information.</i>

Spelling Correction is executed in two steps:
1. Identify the misspelled word
2. Replace it or suggest the correctly spelled word. There are many algorithms for this task. One of them is minimum edit distance, which chooses the nearest correctly spelled word for a misspelled word.
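
The minimum edit distance idea can be sketched in a few lines. The version below is the standard Levenshtein dynamic-programming recurrence with a hypothetical three-word vocabulary; it illustrates the principle and is not a description of how any particular spelling library works internally.

```python
def edit_distance(source, target):
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def nearest_word(word, vocabulary):
    """Pick the vocabulary entry with the smallest edit distance to word."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))

# Hypothetical mini-vocabulary of correctly spelled words.
vocabulary = ['natural', 'language', 'processing']
print(edit_distance('ntural', 'natural'))     # 1
print(nearest_word('luanguage', vocabulary))  # language
```

Practical spell checkers usually combine a distance like this with word frequencies, so that common words win when several candidates are equally close.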


We make use of the <b>autocorrect</b> Python library to correct spellings.

<br>
Exercise 1.06: Spelling Correction of a word and a Sentence

In [None]:
!pip install autocorrect



In [None]:
from nltk import word_tokenize
from autocorrect import Speller

In [None]:
# It is available for Portuguese too!

spell = Speller(lang='pt')
spell('asúcar')

'açúcar'

In [None]:
spell = Speller(lang='en')
spell('Natureal')

'Natural'

In [None]:
# Each string fragment ends with a space so that the implicit
# concatenation does not glue neighbouring words together.
sentence = word_tokenize('Ntural Luanguage Processin deals with '\
                         'the art of extrctinh insightes from '\
                         'Natual Languaes')

In [None]:
print(sentence)

['Ntural', 'Luanguage', 'Processin', 'deals', 'with', 'the', 'art', 'of', 'extrctinh', 'insightes', 'from', 'Natual', 'Languaes']


In [None]:
def correct_spelling(tokens):
  sentence_corrected = ' '.join([spell(word)\
                                 for word in tokens])
  return sentence_corrected

In [None]:
print(correct_spelling(sentence))

Natural Language Processing deals with the art of extracting insights from Natural Languages


<b>Stemming</b> <br>
Exercise 1.07: Using Stemming

<img src="https://user-images.githubusercontent.com/72058182/227818410-728b3eca-c849-46b4-ac04-aeeab98a1662.png">

In [None]:
from nltk import stem

In [None]:
def get_stems(word, stemmer):
  return stemmer.stem(word)

porterStem = stem.PorterStemmer()

In [None]:
get_stems('production', porterStem)

'product'

In [None]:
get_stems('coming', porterStem)

'come'

In [None]:
get_stems('firing', porterStem)

'fire'

In [None]:
# The stemmer is also available for Portuguese

stemmer = stem.SnowballStemmer('portuguese')
get_stems('andando', stemmer)

'andand'

In [None]:
get_stems('corrigindo', stemmer)

'corrig'

In [None]:
# Testing in English, as in the example

stemmer = stem.SnowballStemmer('english')
get_stems('battling', stemmer)

'battl'
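
To see why a stem such as `battl` need not be a real word, consider the toy suffix stripper below. The suffix list and the minimum stem length are assumptions chosen for illustration; real stemmers such as Porter apply a much longer sequence of ordered rules.

```python
# Hypothetical suffix list, checked in order.
SUFFIXES = ('ing', 'ed', 's')

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

for word in ['battling', 'jumped', 'dogs', 'sing']:
    print(word, '->', naive_stem(word))
# battling -> battl
# jumped -> jump
# dogs -> dog
# sing -> sing
```

The function only chops characters; it never checks whether the result exists in a dictionary.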

<b>Lemmatization</b> <br>
Exercise 1.08: Extracting the base Word using Lemmatization

Stemming can sometimes lead to errors: as in the example above, the result may be a string that is not a real word, such as `battl`. Lemmatization addresses this. It is the process of converting words to their base grammatical form, so the word `battling` from the example would be reduced to `battle`.

In [None]:
from nltk import download
download('wordnet')


from nltk.stem.wordnet import WordNetLemmatizer

[nltk_data] Downloading package wordnet to /root/nltk_data...


In [None]:
lemmatizer = WordNetLemmatizer()

In [None]:
def get_lemma(word):
  return lemmatizer.lemmatize(word)

get_lemma('products')  

'product'

In [None]:
get_lemma('production')  

'production'

<b>Named Entity Recognition (NER)</b> <br>
Exercise 1.09: Treating Named Entities

In [None]:
import nltk
from nltk import download 
from nltk import pos_tag 
from nltk import ne_chunk
from nltk import word_tokenize


download('maxent_ne_chunker')
download('words')

nltk.download(['punkt', 'averaged_perceptron_tagger'])

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [None]:
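sentence = 'We are reading a book published by Packt '\
           'which is based out of Birmingham'

In [None]:
def get_ner(text):
  # binary=True labels every named entity simply as 'NE'
  tree = ne_chunk(pos_tag(word_tokenize(text)), binary=True)
  # Plain tokens are (word, tag) tuples of length 2, so len(a)==1
  # keeps only the single-token named-entity subtrees
  return [a for a in tree if len(a)==1]

In [None]:
get_ner(sentence)

[Tree('NE', [('Packt', 'NNP')]), Tree('NE', [('Birmingham', 'NNP')])]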

<b>Word Sense Disambiguation</b> <br>
Exercise 1.10: Word Sense Disambiguation

<img src="https://user-images.githubusercontent.com/72058182/227821273-1a3be90f-9af3-4e85-9d7c-41e230f1b6fe.png">

The same word can have different meanings in different contexts, and this is one of the main characteristics of <b>ambiguity</b>. <br>
Word sense disambiguation is the process of mapping a word to the sense it should carry in its context. <br>
In this case, each ambiguous meaning is stored in a background synset.
1. Play: to take part in a sport or game
2. Play: to use a musical instrument

We then measure the similarity between the context of the word `play` in the text and each of the preceding definitions. The definition that is most similar to the context is selected as the sense of the word.
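
The overlap idea can be sketched without any external data. In the example below, the two glosses, the list of ignored function words, and the helper names are all hypothetical; it is a simplified take on the Lesk approach, not the NLTK implementation.

```python
# Hypothetical, hand-written glosses for two senses of 'play'.
SENSES = {
    'play (sport)': 'participate in a sport or game with a team or ball',
    'play (music)': 'perform music on a musical instrument such as a guitar or piano',
}

# Function words that should not count as evidence for either sense.
IGNORED = {'a', 'an', 'as', 'in', 'on', 'or', 'the', 'with', 'such'}

def content_words(text):
    """Lowercase, split on whitespace, and drop the ignored function words."""
    return {w for w in text.lower().split() if w not in IGNORED}

def simple_lesk(context_sentence, senses):
    """Return the sense whose gloss shares the most words with the context."""
    context = content_words(context_sentence)
    return max(senses, key=lambda s: len(context & content_words(senses[s])))

print(simple_lesk('She will play the guitar and the piano', SENSES))  # play (music)
print(simple_lesk('They play football as a team', SENSES))            # play (sport)
```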

In [None]:
import nltk

nltk.download('wordnet')
from nltk.wsd import lesk
from nltk import word_tokenize

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [None]:
sentence_1 = 'Keep your savings in the bank'
sentence_2 = 'Its so risk to drive over the banks of the road'

In [None]:
def get_synset(sentence, word):
  return lesk(word_tokenize(sentence), word)

Checking the senses of the word 'bank', passing the sentence as the context to evaluate.

In [None]:
get_synset(sentence_1, 'bank')

Synset('savings_bank.n.02')

In [None]:
get_synset(sentence_2, 'bank')

Synset('bank.v.07')

After checking, we can see that lesk recognizes that the word is used in different contexts in these two sentences: it returns the senses n.02 and v.07.

<b>Sentence Boundary Detection</b> <br>
Exercise 1.11: Sentence boundary detection

Sentence boundary detection is the method of detecting where one sentence ends and another begins. 

In [None]:
import nltk 
from nltk.tokenize import sent_tokenize

In [None]:
def get_sentences(text):
  return sent_tokenize(text)

  
get_sentences('We are reading a book. Do you know who is '\
              'the publisher? It is Packt. Packt is based '\
              'out of Birmingham')  

['We are reading a book.',
 'Do you know who is the publisher?',
 'It is Packt.',
 'Packt is based out of Birmingham']

In [None]:
get_sentences('Mr Donald John Trump is the current '\
              'president of the USA. Before joining '\
              'politics, he was a businessman')

['Mr Donald John Trump is the current president of the USA.',
 'Before joining politics, he was a businessman']
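
For comparison, a purely rule-based splitter is easy to write but breaks on abbreviations. The sketch below uses a hypothetical `naive_split` helper that splits after sentence-ending punctuation followed by whitespace, and shows one case where that rule goes wrong.

```python
import re

def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    return re.split(r'(?<=[.!?])\s+', text.strip())

print(naive_split('We are reading a book. It is by Packt.'))
# ['We are reading a book.', 'It is by Packt.']

# The period in 'Mr.' is mistaken for a sentence boundary.
print(naive_split('Mr. Smith wrote it. He lives in Birmingham.'))
# ['Mr.', 'Smith wrote it.', 'He lives in Birmingham.']
```

Handling abbreviations like this is one reason to prefer a trained tokenizer such as `sent_tokenize` over a single regular expression.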

Activity 1.01: Preprocessing of raw text

Follow these steps to implement this activity: <br>

1. Import the necessary libraries;
2. Load the text corpus to a variable;
3. Apply the tokenization process to the text corpus and print the first 20 tokens;
4. Apply spelling correction on each token and print the initial 20 corrected tokens as well as the corrected text corpus;
5. Apply PoS Tags to each of the corrected tokens and print them;
6. Remove stop words from the corrected token list and print the initial 20 tokens;
7. Apply stemming and lemmatization to the corrected token list and then print the initial 20 tokens;
8. Detect sentence boundaries in the given text corpus and print the total number of sentences.

In [2]:
# 1. Import the necessary libraries

import nltk

In [6]:
# 2. Load the text corpus to a variable

# The with-block closes the file automatically once it has been read
with open('file.txt', 'r') as infile:
  data = infile.read()
data

'The reader of this course should have a basic knowledge of the Python programming lenguage.\nHe/she must have knowldge of data types in Python.He should be able to write functions,\n and also have the ability to import and use libraries and packages in Python. Familiarity\n with basic linguistics and probability is assumed although not required to fully\n complete this course.\n'

In [6]:
# 3. Apply the tokenization process to the text corpus and print the first 20 tokens



In [None]:
# 4. Apply spelling correction on each token and print the initial 20 corrected tokens as well as the corrected text corpus


In [None]:
# 5. Apply PoS Tags to each of the corrected tokens and print them


In [None]:
# 6. Remove stop words from the corrected token list and print the initial 20 tokens

<h3><b>Kick Starting an NLP Project</b></h3><br>

We can divide an NLP project into several sub-projects or phases. These phases are completed in a particular sequence. This tends to increase the overall efficiency of the process, as memory usage changes from one phase to the next. An NLP project has to go through six major phases, which are outlined in the following figure:

<img src="https://user-images.githubusercontent.com/72058182/227945836-e48ce886-57b3-4af8-91ee-6c7897ccab9c.png">

Suppose you are working on a project in which you need to classify emails as important or unimportant. We will explain how this is carried out by discussing each phase in detail.

<h3><b>Data Collection</b></h3>
This is the initial phase of any NLP project. Our sole purpose is to collect data as per our requirements. For this, we may either use existing data, collect data from various online repositories, or create our own dataset by crawling the web. In our case, we will collect different email data. We can even get this data from our personal emails as well, to start with.

<h3><b>Data Preprocessing</b></h3>
Once the data is collected, we need to clean it. For the process of cleaning, we will make use of the different preprocessing steps that we have learned about in this chapter. It is necessary to clean the collected data to ensure effectiveness and accuracy. In our case, we will follow these preprocessing steps:

1. <i>Converting all the text data to lowercase</i>
2. <i>Stop word removal</i>
3. <i>Text normalization, which will include replacing all numbers with some common term and replacing punctuation with empty strings</i>
4. <i>Stemming and lemmatization</i>
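
The first three steps can be sketched with the standard library alone. The stop word set, the `NUM` placeholder, and the function name below are assumptions for illustration. Stop words are filtered last so that attached punctuation does not prevent a match, and stemming or lemmatization (step 4) would be applied to the returned tokens with a library such as NLTK.

```python
import re
import string

# Small illustrative stop word set and number placeholder; both are assumptions.
STOP_WORDS = {'the', 'a', 'an', 'is', 'to', 'at', 'of', 'and', 'in'}
NUMBER_TOKEN = 'NUM'

def preprocess_email(text):
    """Lowercase, normalize numbers and punctuation, then drop stop words."""
    text = text.lower()                                # step 1: lowercase
    text = re.sub(r'\d+', f' {NUMBER_TOKEN} ', text)   # step 3: numbers -> common term
    text = text.translate(
        str.maketrans('', '', string.punctuation))     # step 3: remove punctuation
    return [tok for tok in text.split()
            if tok not in STOP_WORDS]                  # step 2: stop word removal

print(preprocess_email('The meeting is at 10, in Room 42!'))
# ['meeting', 'NUM', 'room', 'NUM']
```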

<h3><b>Feature Extraction</b></h3>
Computers understand only binary digits: 0 and 1. As such, every instruction we feed into a computer gets transformed into binary digits. Similarly, machine learning models tend to understand only numeric data. Therefore, it becomes necessary to convert text data into its equivalent numerical form.

To convert every email into its equivalent numerical form, we will create a dictionary of all the unique words in our data and assign a unique index to each word. Then, we will represent every email with a list having a length equal to the number of unique words in the data. This list will have 1 at the indices of words that are present in the email and 0 at the other indices. This is called one-hot encoding. We will learn more about this in coming chapters.
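
A minimal sketch of this encoding, using three made-up emails as the dataset, could look like the following. The variable and function names are illustrative only.

```python
# Hypothetical miniature dataset of already-preprocessed emails.
emails = [
    'meeting tomorrow morning',
    'win a free prize',
    'project meeting moved',
]

# Build the dictionary: every unique word gets a unique index.
vocabulary = sorted({word for email in emails for word in email.split()})
word_index = {word: i for i, word in enumerate(vocabulary)}

def encode(email):
    """Return a 0/1 list with 1 at the index of each word present in email."""
    vector = [0] * len(word_index)
    for word in email.split():
        if word in word_index:  # words outside the dictionary are ignored
            vector[word_index[word]] = 1
    return vector

print(vocabulary)
# ['a', 'free', 'meeting', 'morning', 'moved', 'prize', 'project', 'tomorrow', 'win']
print(encode('free meeting'))
# [0, 1, 1, 0, 0, 0, 0, 0, 0]
```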

<h3><b>Model Development</b></h3>
Once the feature set is ready, we need to develop a suitable model that can be trained to gain knowledge from the data. These models are generally statistical, machine learning-based, deep learning-based, or reinforcement learning-based. In our case, we will build a model that is capable of differentiating between important and unimportant emails.

<h3><b>Model Assessment</b></h3>
After developing a model, it is essential to benchmark it. This process of benchmarking is known as model assessment. In this step, we will evaluate the performance of our model by comparing it to others. This can be done by using different parameters or metrics. These parameters include precision, recall, and accuracy. In our case, we will evaluate the newly created model by seeing how well it performs at classifying emails as important and unimportant.
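
The three metrics named above can be computed directly from the true and predicted labels. The labels in this sketch are invented for illustration, and `evaluate` is a hypothetical helper that treats `important` as the positive class.

```python
def evaluate(y_true, y_pred, positive='important'):
    """Return (precision, recall, accuracy) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    accuracy = correct / len(pairs)                 # share of all predictions that are right
    return precision, recall, accuracy

# Made-up labels for five emails.
y_true = ['important', 'important', 'unimportant', 'unimportant', 'important']
y_pred = ['important', 'unimportant', 'unimportant', 'important', 'important']

precision, recall, accuracy = evaluate(y_true, y_pred)
print(round(precision, 2), round(recall, 2), round(accuracy, 2))
# 0.67 0.67 0.6
```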

<h3><b>Model Deployment</b></h3>
This is the final stage for most industrial NLP projects. In this stage, the models are put into production. They are either integrated into an existing system, or new products are created by keeping this model as a base. In our case, we will deploy our model to production so that it can classify emails as important or unimportant in real time.