Explanation:
Natural Language Processing (NLP) is a field of AI that focuses on the interaction between computers and human language. It involves the development of algorithms that can understand, interpret, and generate human language. Key concepts include:

Tokenization: Splitting text into words or subwords.

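Stemming and Lemmatization: Reducing words to a base form. Stemming strips suffixes with heuristic rules (e.g., "studies" becomes "studi"), while lemmatization maps a word to its dictionary form (e.g., "studies" becomes "study").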

Stopwords: Commonly used words (e.g., "and", "the") that are often removed in NLP tasks.

Bag of Words: A representation of text as a set of word counts, where the frequency of each word is kept but grammar and word order are ignored.

TF-IDF: Term Frequency-Inverse Document Frequency, a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents.

Word Embeddings: Dense vector representations of words that capture their semantic meaning, such as Word2Vec or GloVe.
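
As a rough illustration of the Bag of Words idea above, the sketch below counts words using only the Python standard library and compares two sentences by cosine similarity. The sample sentences, the whitespace tokenizer, and the helper names `bag_of_words` and `cosine_similarity` are assumptions made for this example; they are not part of NLTK or scikit-learn.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(bow_a, bow_b):
    """Cosine of the angle between two word-count vectors."""
    shared = set(bow_a) & set(bow_b)
    dot = sum(bow_a[w] * bow_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in bow_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bow_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)


doc_a = bag_of_words("the cat sat on the mat")
doc_b = bag_of_words("the dog sat on the log")

print(doc_a["the"])                               # 2
print(round(cosine_similarity(doc_a, doc_b), 3))  # 0.75
```

Because word order is discarded, "the cat sat on the mat" and "the mat sat on the cat" would produce identical counts. Dense embeddings such as Word2Vec or GloVe vectors are commonly compared with the same cosine measure.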

In [1]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

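# Download the required NLTK resources (skipped if already present).
# Note: recent NLTK releases may also need nltk.download('punkt_tab').
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')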

# Sample text
text = "Natural Language Processing with Python is fun and exciting!"

# Tokenization
tokens = word_tokenize(text.lower())
print("Tokens:", tokens)

# Remove stopwords (build the set once instead of reloading the list per token)
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word not in stop_words]
print("Filtered Tokens:", filtered_tokens)

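# Lemmatization (lemmatize() treats each word as a noun by default,
# so forms such as 'processing' and 'exciting' are left unchanged)
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(word) for word in filtered_tokens]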
print("Lemmatized Tokens:", lemmatized_tokens)

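# TF-IDF Vectorization
# Note: the corpus here is a single document, so every term gets the same IDF
# and the weights only reflect term counts after normalization.
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform([" ".join(lemmatized_tokens)])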
print("TF-IDF Matrix:\n", tfidf_matrix.toarray())

# Feature Names
print("Feature Names:", tfidf_vectorizer.get_feature_names_out())


Tokens: ['natural', 'language', 'processing', 'with', 'python', 'is', 'fun', 'and', 'exciting', '!']
Filtered Tokens: ['natural', 'language', 'processing', 'python', 'fun', 'exciting', '!']
Lemmatized Tokens: ['natural', 'language', 'processing', 'python', 'fun', 'exciting', '!']
TF-IDF Matrix:
 [[0.40824829 0.40824829 0.40824829 0.40824829 0.40824829 0.40824829]]
Feature Names: ['exciting' 'fun' 'language' 'natural' 'processing' 'python']
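
All six weights above are equal because the corpus contains a single document: each term occurs once and receives the same IDF, so after L2 normalization every weight is 1/sqrt(6), about 0.408. The "!" token is also missing from the feature names, since `TfidfVectorizer`'s default token pattern keeps only tokens of two or more word characters. The sketch below recomputes the numbers with the standard library alone, assuming scikit-learn's default settings (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization). The helper `tfidf_rows` and the second document are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights.

    Assumes raw term counts and smoothed IDF, ln((1 + n) / (1 + df)) + 1,
    which is intended to mirror TfidfVectorizer's default behavior.
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        rows.append([x / norm for x in raw])
    return vocab, rows


# One document: every term ends up with the same weight, 1 / sqrt(6).
vocab, rows = tfidf_rows(["natural language processing python fun exciting"])
print(vocab)
print([round(x, 8) for x in rows[0]])

# Two documents (the second is invented for illustration): the shared term
# "python" now weighs less than terms that appear in only one document.
vocab2, rows2 = tfidf_rows([
    "natural language processing python fun exciting",
    "python web development",
])
first = dict(zip(vocab2, rows2[0]))
print(first["python"] < first["natural"])  # True
```

With more than one document, IDF starts to separate terms: words shared across documents are down-weighted relative to words that are distinctive for a single document.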
