Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret, and manipulate human language. Let's dive into the fundamental steps of NLP with practical code examples using Python libraries like NLTK, spaCy, and Transformers.

### Step 1: Text Preprocessing
Preprocessing is the foundation of NLP. It involves cleaning and preparing the raw text for analysis. Common preprocessing steps include tokenization, removing stop words, stemming, and lemmatization.
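"Cleaning" can mean many things depending on the task. Here's a minimal sketch of one common pass: lowercasing, replacing punctuation with spaces, and collapsing extra whitespace. The `clean_text` helper is hypothetical, not part of NLTK. It assumes ASCII text, so accented letters are dropped and contractions like "it's" are split. The NLTK examples that follow work on the raw, uncleaned sentence.

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical helper: lowercase, turn punctuation/symbols into spaces,
    and collapse runs of whitespace. Assumes ASCII input."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # anything not a letter, digit or space -> space
    text = re.sub(r"\s+", " ", text)          # squeeze repeated whitespace into one space
    return text.strip()

print(clean_text("  Natural Language Processing   with Python is exciting!  "))
# natural language processing with python is exciting
```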

#### 1.1 Tokenization
Tokenization is the process of splitting text into smaller units called tokens, usually words or sentences.

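```python
import nltk
from nltk.tokenize import word_tokenize

# word_tokenize relies on the Punkt tokenizer data; newer NLTK releases
# read it from 'punkt_tab', older ones from 'punkt'.
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

text = "Natural Language Processing with Python is exciting!"
tokens = word_tokenize(text)
tokens
```

```
['Natural', 'Language', 'Processing', 'with', 'Python', 'is', 'exciting', '!']
```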
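To see what a tokenizer does under the hood, here's a simplified regex-based sketch of both word and sentence tokenization. The function names are hypothetical, and this is not NLTK's algorithm. It treats any run of word characters as a word and every other non-space character as its own token. It splits sentences after `.`, `!` or `?` followed by whitespace. Abbreviations such as "Dr." will be split incorrectly. NLTK's `sent_tokenize`, which uses the Punkt model, is designed to handle many of those cases.

```python
import re

def simple_word_tokenize(text: str) -> list[str]:
    # Words are runs of \w characters; each other non-space character
    # (e.g. "!" or ",") becomes a separate token.
    return re.findall(r"\w+|[^\w\s]", text)

def simple_sent_tokenize(text: str) -> list[str]:
    # Split on whitespace that follows sentence-ending punctuation.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(simple_word_tokenize("Natural Language Processing with Python is exciting!"))
# ['Natural', 'Language', 'Processing', 'with', 'Python', 'is', 'exciting', '!']
print(simple_sent_tokenize("Natural Language Processing with Python is exciting! Let's start."))
# ['Natural Language Processing with Python is exciting!', "Let's start."]
```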

#### 1.2 Removing Stop Words
Stop words (like "is", "the", "a") are common words that don't contribute much meaning.

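```python
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords', quiet=True)  # fetch the stop-word lists once

stop_words = set(stopwords.words('english'))

# Compare lowercased tokens so that "The" and "the" are both filtered out.
filtered_words = [word for word in tokens if word.lower() not in stop_words]
filtered_words
```

```
['Natural', 'Language', 'Processing', 'Python', 'exciting', '!']
```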
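Notice that `'!'` survives, because NLTK's stop-word list contains only words. Here's a minimal sketch of a reusable filter that can also drop punctuation-only tokens. The `remove_stop_words` function and its tiny `STOP_WORDS` set are illustrative assumptions, not NLTK's list.

```python
import string

def remove_stop_words(tokens: list[str], stop_words: set[str],
                      drop_punct: bool = False) -> list[str]:
    kept = []
    for tok in tokens:
        if tok.lower() in stop_words:
            continue  # case-insensitive stop-word match
        if drop_punct and all(ch in string.punctuation for ch in tok):
            continue  # token consists only of punctuation, e.g. "!" or "..."
        kept.append(tok)
    return kept

# A tiny illustrative stop-word set; NLTK's English list is far longer.
STOP_WORDS = {"a", "an", "the", "is", "with", "of", "and"}

tokens = ['Natural', 'Language', 'Processing', 'with', 'Python', 'is', 'exciting', '!']
print(remove_stop_words(tokens, STOP_WORDS))
# ['Natural', 'Language', 'Processing', 'Python', 'exciting', '!']
print(remove_stop_words(tokens, STOP_WORDS, drop_punct=True))
# ['Natural', 'Language', 'Processing', 'Python', 'exciting']
```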

#### 1.3 Stemming and Lemmatization
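Stemming strips affixes from words using heuristic rules, so the resulting stem is not always a real word. Lemmatization instead uses a vocabulary and the word's part of speech to return its dictionary form, or lemma.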

#### Stemming

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stemmed_words = [stemmer.stem(word) for word in filtered_words]
stemmed_words
```

```
['natur', 'languag', 'process', 'python', 'excit', '!']
```
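The Porter stemmer applies ordered rule sets with conditions on what remains of the word. To show the basic idea of suffix stripping, here's a deliberately naive sketch. The suffix list and the `toy_stem` name are hypothetical, and this is not the Porter algorithm. Like Porter, though, it can produce non-words such as `languag`.

```python
# Hypothetical suffix list, checked longest-first.
SUFFIXES = ("ing", "ed", "es", "ly", "s")

def toy_stem(word: str, min_stem: int = 3) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, but only if at least
        # `min_stem` characters of the word would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([toy_stem(w) for w in ["processing", "exciting", "languages", "quickly", "is"]])
# ['process', 'excit', 'languag', 'quick', 'is']
```

The `min_stem` guard keeps short words like "is" intact. Without it, a rule-based stemmer would happily reduce "is" to "i".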

#### Lemmatization

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet', quiet=True)  # WordNet data used for dictionary lookups

lemmatizer = WordNetLemmatizer()
lemmatized_words = [lemmatizer.lemmatize(word) for word in filtered_words]
lemmatized_words
```

```
['Natural', 'Language', 'Processing', 'Python', 'exciting', '!']
```
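The lemmatized list is identical to its input because `WordNetLemmatizer.lemmatize` treats every word as a noun unless you pass a `pos` argument. For example, `lemmatizer.lemmatize('exciting', pos='v')` returns `'excite'`. The sketch below illustrates why the part of speech matters, using a tiny hypothetical lexicon keyed by `(word, pos)`. The `LEXICON` entries and `toy_lemmatize` are assumptions for illustration; WordNet's real lookup combines a large dictionary with suffix-detachment rules.

```python
# Hypothetical mini-lexicon keyed by (lowercased word, part of speech).
LEXICON = {
    ("exciting", "v"): "excite",
    ("was", "v"): "be",
    ("better", "a"): "good",
    ("mice", "n"): "mouse",
}

def toy_lemmatize(word: str, pos: str = "n") -> str:
    # Fall back to the word unchanged when no entry matches, which is
    # also what NLTK does when WordNet finds no lemma.
    return LEXICON.get((word.lower(), pos), word)

print(toy_lemmatize("exciting"))           # exciting (default POS is noun)
print(toy_lemmatize("exciting", pos="v"))  # excite
print(toy_lemmatize("better", pos="a"))    # good
```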