# Text Preprocessing 

Text preprocessing transforms raw text data into a clean, consistent format that NLP models can use more easily. Common steps include:

- Lowercasing
- Removing punctuation
- Removing stopwords
- Stemming / lemmatization
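The steps above can be chained into one function. The sketch below is an illustration rather than the notebook's code: it uses only the standard library, omits stemming/lemmatization, and `SAMPLE_STOPWORDS` is a tiny hypothetical stand-in for NLTK's English stopword list. It also shows why the order matters: the stopword list is lowercase, so lowercasing must run first (otherwise `Will` would survive), and punctuation has to be stripped before splitting so that a token like `again!` becomes `again` and can match.

```python
import re

# Hypothetical, deliberately tiny stopword set; the notebook uses
# set(stopwords.words('english')) from NLTK instead.
SAMPLE_STOPWORDS = {"i", "so", "with", "my", "will", "again"}


def preprocess(text: str, stopword_set: set[str]) -> list[str]:
    """Lowercase, strip punctuation, drop stopwords, and return the tokens."""
    text = text.lower()                  # 1. lowercasing
    text = re.sub(r"[^\w\s]", "", text)  # 2. remove punctuation (and emoji)
    # 3. split on whitespace and keep only non-stopwords
    return [word for word in text.split() if word not in stopword_set]


tokens = preprocess("I'm so happy with my purchase 😄👍. Will buy again!", SAMPLE_STOPWORDS)
print(tokens)  # ['im', 'happy', 'purchase', 'buy']
```

This reproduces row 2 of the notebook's `remove_stopwords` column for that review.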

In [1]:
import pandas as pd

In [2]:
reviews = [
    "Absolutely loved this product 😍! Works like a charm ✨.",
    "Great customer service 👏 and fast delivery 🚚💨!",
    "I'm so happy with my purchase 😄👍. Will buy again!",
    "The design is sleek and modern 🖤👌 #aesthetic",
    "Exceeded my expectations 💯🔥 worth every penny!",
    "Worst experience ever 😡. Totally disappointed 😞",
    "Item arrived broken 💔📦. No response from support 🤷",
    "Not what I expected 👎 Too expensive for the quality 💸",
    "Late delivery and bad packaging 📦❌. Never again!",
    "Product looks good but doesn’t work at all 😤🚫",
    "Love the features 😍 but the battery dies too fast 😒🔋",
    "Nice interface 👌 but app crashes often 😕💥",
    "Support team was helpful 🙌 but issue still unresolved 😐",
    "Quality is okay 🤷‍♂️. Not amazing, but not terrible either 🤔"
]
# Wrap the reviews in a DataFrame and add a lowercased copy of each one
df_reviews = pd.DataFrame(reviews, columns=['review'])
df_reviews['reviews_lowered'] = df_reviews['review'].str.lower()
df_reviews


|   | review | reviews_lowered |
|---|---|---|
| 0 | Absolutely loved this product 😍! Works like a ... | absolutely loved this product 😍! works like a ... |
| 1 | Great customer service 👏 and fast delivery 🚚💨! | great customer service 👏 and fast delivery 🚚💨! |
| 2 | I'm so happy with my purchase 😄👍. Will buy again! | i'm so happy with my purchase 😄👍. will buy again! |
| 3 | The design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern 🖤👌 #aesthetic |
| 4 | Exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations 💯🔥 worth every penny! |
| 5 | Worst experience ever 😡. Totally disappointed 😞 | worst experience ever 😡. totally disappointed 😞 |
| 6 | Item arrived broken 💔📦. No response from suppo... | item arrived broken 💔📦. no response from suppo... |
| 7 | Not what I expected 👎 Too expensive for the qu... | not what i expected 👎 too expensive for the qu... |
| 8 | Late delivery and bad packaging 📦❌. Never again! | late delivery and bad packaging 📦❌. never again! |
| 9 | Product looks good but doesn’t work at all 😤🚫 | product looks good but doesn’t work at all 😤🚫 |
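`str.lower()` works well for English text. For other languages, Python's `str.casefold()` applies more aggressive Unicode case folding (for example, German `ß` becomes `ss`), and pandas exposes it as `Series.str.casefold()`. A small comparison, using a hypothetical non-English string alongside one in the style of the reviews:

```python
import pandas as pd

s = pd.Series(["Straße", "GREAT Service 👏"])

# lower() maps uppercase letters to lowercase but leaves 'ß' as it is;
# casefold() also folds 'ß' to 'ss', which helps caseless matching.
comparison = pd.DataFrame({"lower": s.str.lower(), "casefold": s.str.casefold()})
print(comparison)
```

For the English reviews in this notebook both methods give the same result, so `str.lower()` is a reasonable choice here.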


Removing punctuation

In [3]:
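import re

# Delete every character that is neither a word character (\w) nor whitespace (\s);
# this removes punctuation but also emoji and apostrophes ("i'm" -> "im")
df_reviews['review_punc'] = df_reviews['reviews_lowered'].apply(lambda x: re.sub(r'[^\w\s]', '', x))
df_reviews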

|   | review | reviews_lowered | review_punc |
|---|---|---|---|
| 0 | Absolutely loved this product 😍! Works like a ... | absolutely loved this product 😍! works like a ... | absolutely loved this product works like a ch... |
| 1 | Great customer service 👏 and fast delivery 🚚💨! | great customer service 👏 and fast delivery 🚚💨! | great customer service and fast delivery |
| 2 | I'm so happy with my purchase 😄👍. Will buy again! | i'm so happy with my purchase 😄👍. will buy again! | im so happy with my purchase will buy again |
| 3 | The design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern aesthetic |
| 4 | Exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations worth every penny |
| 5 | Worst experience ever 😡. Totally disappointed 😞 | worst experience ever 😡. totally disappointed 😞 | worst experience ever totally disappointed |
| 6 | Item arrived broken 💔📦. No response from suppo... | item arrived broken 💔📦. no response from suppo... | item arrived broken no response from support |
| 7 | Not what I expected 👎 Too expensive for the qu... | not what i expected 👎 too expensive for the qu... | not what i expected too expensive for the qua... |
| 8 | Late delivery and bad packaging 📦❌. Never again! | late delivery and bad packaging 📦❌. never again! | late delivery and bad packaging never again |
| 9 | Product looks good but doesn’t work at all 😤🚫 | product looks good but doesn’t work at all 😤🚫 | product looks good but doesnt work at all |
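The pattern `[^\w\s]` deletes every character that is neither a word character nor whitespace, so besides punctuation it also removes emoji and both straight and curly apostrophes (`i'm` -> `im`, `doesn’t` -> `doesnt`), and it can leave double spaces where an emoji sat between two words. If emoji should be kept as sentiment signals, one alternative is to remove only ASCII punctuation with `str.translate` and `string.punctuation`; note that this leaves the curly `’` in place. A comparison on a sample string assembled from the reviews above:

```python
import re
import string

text = "i'm so happy 😄👍. doesn’t work!"

# Regex approach used in the notebook: drops punctuation, emoji and the curly ’
regex_clean = re.sub(r"[^\w\s]", "", text)

# ASCII-only approach: drops characters in string.punctuation, keeps emoji and ’
table = str.maketrans("", "", string.punctuation)
translate_clean = text.translate(table)

# Collapse runs of whitespace left behind by the removed characters
regex_normalized = " ".join(regex_clean.split())

print(repr(regex_clean))       # 'im so happy  doesnt work'
print(repr(translate_clean))   # 'im so happy 😄👍 doesn’t work'
print(repr(regex_normalized))  # 'im so happy doesnt work'
```

The double spaces do not affect the next step in this notebook, because `str.split()` with no arguments already splits on any run of whitespace.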


Removing stopwords

In [4]:
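import nltk
from nltk.corpus import stopwords

# Download NLTK's stopword lists (only needed once per machine)
nltk.download('stopwords')

# A set gives fast membership checks when filtering each word
stopword_set = set(stopwords.words('english'))

# Split on whitespace, keep only non-stopwords, and re-join into a string
df_reviews["remove_stopwords"] = df_reviews['review_punc'].apply(
    lambda x: ' '.join(word for word in x.split() if word not in stopword_set)
)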




In [6]:
df_reviews

|   | review | reviews_lowered | review_punc | remove_stopwords |
|---|---|---|---|---|
| 0 | Absolutely loved this product 😍! Works like a ... | absolutely loved this product 😍! works like a ... | absolutely loved this product works like a ch... | absolutely loved product works like charm |
| 1 | Great customer service 👏 and fast delivery 🚚💨! | great customer service 👏 and fast delivery 🚚💨! | great customer service and fast delivery | great customer service fast delivery |
| 2 | I'm so happy with my purchase 😄👍. Will buy again! | i'm so happy with my purchase 😄👍. will buy again! | im so happy with my purchase will buy again | im happy purchase buy |
| 3 | The design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern aesthetic | design sleek modern aesthetic |
| 4 | Exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations worth every penny | exceeded expectations worth every penny |
| 5 | Worst experience ever 😡. Totally disappointed 😞 | worst experience ever 😡. totally disappointed 😞 | worst experience ever totally disappointed | worst experience ever totally disappointed |
| 6 | Item arrived broken 💔📦. No response from suppo... | item arrived broken 💔📦. no response from suppo... | item arrived broken no response from support | item arrived broken response support |
| 7 | Not what I expected 👎 Too expensive for the qu... | not what i expected 👎 too expensive for the qu... | not what i expected too expensive for the qua... | expected expensive quality |
| 8 | Late delivery and bad packaging 📦❌. Never again! | late delivery and bad packaging 📦❌. never again! | late delivery and bad packaging never again | late delivery bad packaging never |
| 9 | Product looks good but doesn’t work at all 😤🚫 | product looks good but doesn’t work at all 😤🚫 | product looks good but doesnt work at all | product looks good doesnt work |
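NLTK's English stopword list includes negations such as `no`, `nor` and `not`, so this step can invert sentiment: in row 7 above, "Not what I expected 👎 Too expensive for the quality 💸" ends up as `expected expensive quality`. For sentiment tasks, one option is to subtract negations from the stopword set before filtering. A minimal sketch; `NEGATIONS` and `remove_stopwords` are illustrative names, and `sample_stopwords` is a small hypothetical stand-in for the notebook's `stopword_set`:

```python
# Negation words worth keeping for sentiment analysis (an illustrative choice).
NEGATIONS = frozenset({"no", "nor", "not", "never"})


def remove_stopwords(text: str, stopword_set: set[str], keep: frozenset[str] = NEGATIONS) -> str:
    """Drop stopwords from a whitespace-tokenized string, except words in `keep`."""
    effective = stopword_set - keep
    return " ".join(word for word in text.split() if word not in effective)


# Hypothetical subset of NLTK's English list; in the notebook pass stopword_set instead.
sample_stopwords = {"not", "no", "what", "i", "too", "for", "the", "from"}

row7 = remove_stopwords("not what i expected too expensive for the quality", sample_stopwords)
row6 = remove_stopwords("item arrived broken no response from support", sample_stopwords)
print(row7)  # not expected expensive quality
print(row6)  # item arrived broken no response support
```

Applied to the notebook, this would look like `df_reviews['review_punc'].apply(lambda x: remove_stopwords(x, stopword_set))`.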


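Lemmatization

In [7]:
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()
# lemmatize() treats each word as a noun unless pos= is given, so "expectations" -> "expectation"
# but verb forms such as "loved" and "exceeded" stay unchanged; the result is a list of tokens
df_reviews['lemmatise'] = df_reviews['remove_stopwords'].apply(lambda x: [lemmatizer.lemmatize(word) for word in x.split()])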



In [8]:
df_reviews

|   | review | reviews_lowered | review_punc | remove_stopwords | lemmatise |
|---|---|---|---|---|---|
| 0 | Absolutely loved this product 😍! Works like a ... | absolutely loved this product 😍! works like a ... | absolutely loved this product works like a ch... | absolutely loved product works like charm | [absolutely, loved, product, work, like, charm] |
| 1 | Great customer service 👏 and fast delivery 🚚💨! | great customer service 👏 and fast delivery 🚚💨! | great customer service and fast delivery | great customer service fast delivery | [great, customer, service, fast, delivery] |
| 2 | I'm so happy with my purchase 😄👍. Will buy again! | i'm so happy with my purchase 😄👍. will buy again! | im so happy with my purchase will buy again | im happy purchase buy | [im, happy, purchase, buy] |
| 3 | The design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern 🖤👌 #aesthetic | the design is sleek and modern aesthetic | design sleek modern aesthetic | [design, sleek, modern, aesthetic] |
| 4 | Exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations 💯🔥 worth every penny! | exceeded my expectations worth every penny | exceeded expectations worth every penny | [exceeded, expectation, worth, every, penny] |
| 5 | Worst experience ever 😡. Totally disappointed 😞 | worst experience ever 😡. totally disappointed 😞 | worst experience ever totally disappointed | worst experience ever totally disappointed | [worst, experience, ever, totally, disappointed] |
| 6 | Item arrived broken 💔📦. No response from suppo... | item arrived broken 💔📦. no response from suppo... | item arrived broken no response from support | item arrived broken response support | [item, arrived, broken, response, support] |
| 7 | Not what I expected 👎 Too expensive for the qu... | not what i expected 👎 too expensive for the qu... | not what i expected too expensive for the qua... | expected expensive quality | [expected, expensive, quality] |
| 8 | Late delivery and bad packaging 📦❌. Never again! | late delivery and bad packaging 📦❌. never again! | late delivery and bad packaging never again | late delivery bad packaging never | [late, delivery, bad, packaging, never] |
| 9 | Product looks good but doesn’t work at all 😤🚫 | product looks good but doesn’t work at all 😤🚫 | product looks good but doesnt work at all | product looks good doesnt work | [product, look, good, doesnt, work] |
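The introduction lists stemming alongside lemmatization, but only lemmatization is shown above. As a contrast, NLTK's `PorterStemmer` strips suffixes by rule and needs no corpus download. It reduces verb forms such as `loved` that the noun-default lemmatizer left unchanged, at the cost of sometimes producing non-words (`happy` becomes `happi`). A sketch applied to a few tokens from the `lemmatise` column; the token choice is illustrative:

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Tokens from the lemmatise column above that WordNetLemmatizer left unchanged
tokens = ["loved", "exceeded", "disappointed", "happy"]

# Porter's rules strip suffixes without a dictionary, so a stem need not be a real word
stems = [stemmer.stem(token) for token in tokens]
print(list(zip(tokens, stems)))
```

To stem the whole column in the same way as the lemmatization step, the same list comprehension could be applied to `df_reviews['remove_stopwords']` with `stemmer.stem` in place of `lemmatizer.lemmatize`.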
