### Problem Statement:

You are given a set of customer reviews. Your task is to perform the following:

- Tokenize the text into words.
- Remove stopwords.
- Perform lemmatization.
- Identify the most common words used across all reviews.

### Dataset (Sample Reviews):

    reviews = [
        "The product quality is amazing! I love it.",
        "Absolutely terrible experience. The service was bad.",
        "Great product and fast delivery. Highly recommended!",
        "The product was okay, but I expected better quality.",
        "Worst experience ever. Never buying again."
    ]
 

In [3]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from collections import Counter

In [5]:
# Sample dataset
reviews = [
    "The product quality is amazing! I love it.",
    "Absolutely terrible experience. The service was bad.",
    "Great product and fast delivery. Highly recommended!",
    "The product was okay, but I expected better quality.",
    "Worst experience ever. Never buying again."
]

In [13]:
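# Download necessary NLTK resources (skipped automatically if already up to date)
nltk.download('punkt')       # Tokenizer models used by word_tokenize
nltk.download('stopwords')   # Stopword lists
nltk.download('wordnet')     # Lexical database used by WordNetLemmatizer
nltk.download('punkt_tab')   # Tokenizer data expected by newer NLTK releases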


[nltk_data] Downloading package punkt to
[nltk_data]     /Users/nabinagahatraj/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/nabinagahatraj/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/nabinagahatraj/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/nabinagahatraj/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [9]:
# Initialize tools
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

In [15]:
# Process reviews: tokenize, clean, and lemmatize each one
tokens = []
for review in reviews:
    words = word_tokenize(review.lower())                        # Lowercase, then tokenize
    words = [word for word in words if word.isalnum()]           # Keep alphanumeric tokens only (drops punctuation)
    words = [word for word in words if word not in stop_words]   # Remove stopwords
    # lemmatize() treats every word as a noun unless a pos argument is given
    words = [lemmatizer.lemmatize(word) for word in words]
    tokens.extend(words)
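
A note on the punctuation filter: `str.isalnum()` is true only when every character in the token is a letter or digit, so it also discards tokens that merely contain punctuation. The sketch below uses a hand-written token list as a stand-in for tokenizer output (an assumption for illustration, not taken from the reviews) and compares the filter above with a looser alternative.

```python
# Hand-written tokens standing in for tokenizer output (an assumption for
# illustration; the real tokens depend on the tokenizer in use).
sample_tokens = ["well-made", "do", "n't", "5-star", "love", "!", "..."]

# Filter used in the notebook: every character must be alphanumeric.
strict = [t for t in sample_tokens if t.isalnum()]

# Looser alternative: keep a token if it contains at least one letter or digit.
loose = [t for t in sample_tokens if any(ch.isalnum() for ch in t)]

print(strict)  # ['do', 'love']
print(loose)   # ['well-made', 'do', "n't", '5-star', 'love']
```

For the five sample reviews both filters should behave the same, since none of the reviews contains a hyphenated word or a contraction; the difference only matters on messier text.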

In [17]:
# Identify most common words
word_counts = Counter(tokens)
most_common_words = word_counts.most_common(5)  # Get top 5 most common words

# Output results
print("Most common words:", most_common_words)

Most common words: [('product', 3), ('quality', 2), ('experience', 2), ('amazing', 1), ('love', 1)]
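
When reading this output, it helps to know how `Counter.most_common` handles ties: elements with equal counts are returned in the order they were first encountered. The sketch below illustrates this with a short hand-written token list (a hypothetical stand-in, not the full `tokens` list from the cell above) and shows one way to keep only words that occur more than once.

```python
from collections import Counter

# Hypothetical, shortened token list chosen to reproduce the counts above.
demo_tokens = [
    "product", "quality", "amazing", "love", "terrible",
    "experience", "product", "product", "quality", "experience",
]

demo_counts = Counter(demo_tokens)

# Ties are broken by first appearance, so "amazing" and "love" come before
# "terrible" even though all three occur exactly once.
top_five = demo_counts.most_common(5)

# Keep only words that occur more than once, avoiding the arbitrary cut-off.
repeated = [(word, n) for word, n in demo_counts.most_common() if n > 1]

print(top_five)
print(repeated)  # [('product', 3), ('quality', 2), ('experience', 2)]
```

Filtering by a minimum count rather than a fixed top-N is one possible design choice; it avoids reporting words whose rank is decided only by their position in the input.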


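### So the most common words used across all reviews were

- product (3 occurrences)
- quality (2)
- experience (2)
- amazing (1)
- love (1)

Only the first three appear more than once. "amazing" and "love" are tied at one occurrence with every other remaining word; they make the top five only because `Counter.most_common` lists ties in the order the words were first encountered, and both come from the first review.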
