# <center>Basics of Natural Language Processing (NLP)</center>

Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between humans and computers using natural language. It empowers machines to understand, interpret, and generate human language, enabling applications like speech recognition, machine translation, sentiment analysis, and more. In this notebook, we will delve into the basics of NLP and provide practical code examples to help beginners grasp the fundamental concepts.

# 1. Tokenization

Tokenization is the process of breaking a sentence or paragraph into smaller units called tokens. Depending on the tokenizer, tokens can be words, subwords, punctuation marks, or individual characters. In Python, the NLTK (Natural Language Toolkit) library provides `word_tokenize`, which splits text into word and punctuation tokens:

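```python
import nltk

# Tokenizer models used by word_tokenize; newer NLTK releases may ask for "punkt_tab" instead.
nltk.download("punkt")

from nltk.tokenize import word_tokenize

text = "Natural Language Processing is fascinating!"
tokens = word_tokenize(text)
print(tokens)
```

```
['Natural', 'Language', 'Processing', 'is', 'fascinating', '!']
```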

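Under the hood, `word_tokenize` combines a Punkt sentence splitter with rule-based word splitting. To show the basic idea without any downloads, here is a minimal sketch that splits text with a single regular expression, plus a character-level tokenizer. Both function names are hypothetical, and the regex is a simplification: it splits "don't" into "don", "'", "t", whereas `word_tokenize` returns "do", "n't".

```python
import re


def word_tokenize_simple(text: str) -> list[str]:
    """Hypothetical regex tokenizer: runs of word characters become one token,
    and every other non-space character becomes its own token."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokenize(text: str) -> list[str]:
    """Hypothetical character-level tokenizer that skips whitespace."""
    return [ch for ch in text if not ch.isspace()]


text = "Natural Language Processing is fascinating!"
print(word_tokenize_simple(text))
# ['Natural', 'Language', 'Processing', 'is', 'fascinating', '!']
print(char_tokenize("NLP!"))
# ['N', 'L', 'P', '!']
```

For this sentence the regex gives the same tokens as NLTK; the two diverge on contractions, abbreviations, and other cases that NLTK's rules handle explicitly.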

# 2. Stop words

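Stop words are very common words such as "the", "is", and "and" that carry little meaning on their own. Removing them reduces the number of tokens to process and leaves the content words. For tasks such as sentiment analysis, check the list before applying it, since NLTK's English list also contains negations like "not":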

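```python
# NLTK's stop-word lists (skipped if already up to date).
nltk.download("stopwords")

from nltk.corpus import stopwords

# A set makes each membership test fast.
stop_words = set(stopwords.words("english"))
# Lowercase each token before the lookup, because the NLTK list is all lowercase.
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
print(filtered_tokens)
```

```
['Natural', 'Language', 'Processing', 'fascinating', '!']
```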

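A quick word count shows why stop words are worth removing: in almost any English text the most frequent tokens are function words. The sketch below assumes a small hand-picked stop list (`STOP_WORDS`, an illustrative subset, not NLTK's list) and an invented example text.

```python
import re
from collections import Counter

# Illustrative subset of English stop words (an assumption, not NLTK's full list).
STOP_WORDS = {"the", "a", "an", "is", "it", "of", "on", "and", "in", "to"}

text = ("The cat sat on the mat and the dog sat in the hall. "
        "It is the dog of the house.")

# Lowercase first so that "The" and "the" count as the same word.
tokens = re.findall(r"\w+", text.lower())

all_counts = Counter(tokens)
content_counts = Counter(t for t in tokens if t not in STOP_WORDS)

print(all_counts.most_common(1))      # [('the', 6)]
print(content_counts.most_common(2))  # [('sat', 2), ('dog', 2)]
```

Before filtering, "the" dominates the counts; afterwards, the most frequent tokens are the words that actually describe the text.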

# 3. Stemming and lemmatization

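Stemming and lemmatization both reduce words to a common base form, which shrinks the vocabulary and normalizes text. Stemming strips suffixes with heuristic rules and can produce non-words (such as "natur"), while lemmatization uses a dictionary such as WordNet to return the word's dictionary form, its lemma: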

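```python
# WordNet data used by WordNetLemmatizer (skipped if already up to date).
nltk.download("wordnet")

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Reduce each remaining token with both techniques for comparison.
stemmed_words = [stemmer.stem(word) for word in filtered_tokens]
lemmatized_words = [lemmatizer.lemmatize(word) for word in filtered_tokens]

print("Stemmed:", stemmed_words)
print("Lemmatized:", lemmatized_words)
```

```
Stemmed: ['natur', 'languag', 'process', 'fascin', '!']
Lemmatized: ['Natural', 'Language', 'Processing', 'fascinating', '!']
```

The Porter stemmer lowercases and chops suffixes, producing stems that are not real words. The lemmatizer leaves every token unchanged here because `WordNetLemmatizer.lemmatize` treats each word as a noun unless a `pos` argument is given; `lemmatizer.lemmatize("fascinating", pos="v")` returns "fascinate".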

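The difference between the two techniques can be sketched in plain Python. Here `toy_stem` strips suffixes by rule, as a stemmer does (far more crudely than the Porter algorithm), while `toy_lemmatize` looks words up in a dictionary, standing in for WordNet. Both functions, the `SUFFIXES` rules, and the `LEMMAS` table are hypothetical.

```python
# Hypothetical suffix rules, checked in order ("ies" must come before "s").
SUFFIXES = ("ing", "ies", "ed", "ly", "s")

# Hypothetical lemma dictionary: a lemmatizer looks words up instead of chopping them.
LEMMAS = {"studies": "study", "running": "run", "better": "good", "mice": "mouse"}


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the lowercased word."""
    return LEMMAS.get(word.lower(), word.lower())


for w in ["studies", "running", "better", "mice", "processing"]:
    print(f"{w:12} stem={toy_stem(w):10} lemma={toy_lemmatize(w)}")
```

Rule-based stripping yields non-words such as "stud" and "runn" and cannot handle irregular forms like "mice" or "better", which is where a dictionary-backed lemmatizer helps; the cost is that it only knows the words in its dictionary.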

# 4. Part-of-Speech (POS) Tagging

POS tagging assigns a grammatical tag, such as noun, verb, or adjective, to each token in a sentence. NLTK's `pos_tag` uses the Penn Treebank tag set (for example, `JJ` for adjective and `NNP` for singular proper noun):

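```python
# Tagger model used by pos_tag; newer NLTK releases may ask for "averaged_perceptron_tagger_eng" instead.
nltk.download("averaged_perceptron_tagger")

from nltk import pos_tag

pos_tags = pos_tag(filtered_tokens)
print(pos_tags)
```

```
[('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('fascinating', 'NN'), ('!', '.')]
```

Because `filtered_tokens` no longer contains "is", the tagger sees "fascinating" without the verb that introduces it, which likely explains why it is tagged `NN` (noun). POS tagging generally works best on the full, unfiltered token list.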


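```python
# spaCy tags the full sentence (stop words included), and its results are more accurate here.
# Requires the small English pipeline: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Natural Language Processing is fascinating!")

# token.pos_ is a Universal POS tag (PROPN, AUX, ADJ, ...); token.tag_ holds the fine-grained tag.
for token in doc:
    print(token.text, token.pos_)
```

```
Natural PROPN
Language PROPN
Processing PROPN
is AUX
fascinating ADJ
! PUNCT
```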

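Statistical taggers learn tags from annotated text. The simplest data-driven baseline gives each word the tag it received most often in training and ignores context entirely. The sketch below implements that most-frequent-tag (unigram) idea on a tiny, invented corpus labeled with Universal POS tags; `TAGGED_CORPUS`, `train_unigram_tagger`, and `tag` are hypothetical names. NLTK's perceptron tagger and spaCy's models are more sophisticated and also look at neighboring words.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus using Universal POS tags (hypothetical training data).
TAGGED_CORPUS = [
    [("language", "NOUN"), ("is", "AUX"), ("fascinating", "ADJ"), ("!", "PUNCT")],
    [("the", "DET"), ("model", "NOUN"), ("is", "AUX"), ("learning", "VERB")],
    [("learning", "NOUN"), ("is", "AUX"), ("fun", "ADJ"), (".", "PUNCT")],
    [("they", "PRON"), ("are", "AUX"), ("learning", "VERB"), ("fast", "ADV")],
]


def train_unigram_tagger(corpus):
    """Map each lowercased word to the tag it received most often in training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, pos in sentence:
            counts[word.lower()][pos] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}


def tag(tokens, model, default="NOUN"):
    """Tag each token independently; unseen words get the default tag."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


model = train_unigram_tagger(TAGGED_CORPUS)
print(tag(["Learning", "is", "fascinating", "!"], model))
# [('Learning', 'VERB'), ('is', 'AUX'), ('fascinating', 'ADJ'), ('!', 'PUNCT')]
```

Because "learning" is a verb in most of the training sentences, the baseline tags the sentence-initial "Learning" as `VERB`, even though it is the subject noun here. Context-sensitive taggers are designed to reduce exactly this kind of error.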

# 5. Named Entity Recognition (NER)

NER is the task of finding named entities in text, such as people, organizations, and locations, and labeling each with its type. spaCy's pretrained pipelines run NER as part of `nlp(text)`, and the detected entities are available through `doc.ents`:

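```python
import spacy

# Requires the large English pipeline: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

text = "Natural Language Processing is fascinating!"
doc = nlp(text)

# Each entity is a span with its text and a label such as PERSON or ORG.
for entity in doc.ents:
    print(entity.text, entity.label_)
```

```
Natural Language Processing ORG
```

The model is not always right: "Natural Language Processing" is a field of study, not an organization, yet it is labeled `ORG`.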


```python
# Reuse the en_core_web_lg pipeline loaded above rather than loading it a second time.
text = "Andrew NG is brilliant!"
doc = nlp(text)

for entity in doc.ents:
    print(entity.text, entity.label_)
```

```
Andrew NG PERSON
```

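spaCy's NER is learned from annotated data. A much simpler, rule-based alternative is a gazetteer: a list of known names matched directly against the text. The sketch below assumes a small hypothetical `GAZETTEER` dictionary using spaCy-style labels, and the hypothetical `gazetteer_ner` function returns each match with its character offsets.

```python
import re

# Hypothetical gazetteer: known entity strings mapped to spaCy-style labels.
GAZETTEER = {
    "Andrew Ng": "PERSON",
    "Stanford University": "ORG",
    "California": "GPE",
}


def gazetteer_ner(text, gazetteer):
    """Return (entity_text, label, start, end) for every exact, whole-word match."""
    entities = []
    for name, label in gazetteer.items():
        # \b keeps "California" from matching inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append((match.group(), label, match.start(), match.end()))
    # Report entities in the order they appear in the text.
    return sorted(entities, key=lambda e: e[2])


text = "Andrew Ng taught at Stanford University in California."
for ent in gazetteer_ner(text, GAZETTEER):
    print(ent)
# ('Andrew Ng', 'PERSON', 0, 9)
# ('Stanford University', 'ORG', 20, 39)
# ('California', 'GPE', 43, 53)
```

Exact matching is brittle: `gazetteer_ner("Andrew NG is brilliant!", GAZETTEER)` returns an empty list because of the different capitalization, while the statistical model above still recognizes "Andrew NG" as a `PERSON`.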

*End Note:*<br>
Natural Language Processing is a captivating field that enables machines to understand and work with human language. In this notebook, we covered essential NLP concepts: tokenization, stop-word removal, stemming, lemmatization, POS tagging, and Named Entity Recognition. With these building blocks, developers can build applications that analyze, interpret, and generate human language, changing how we interact with computers and technology. Happy coding, and enjoy exploring the vast world of NLP!