# Lowercasing and Uppercasing Text

In [2]:
# Read the necessary dataset
import pandas as pd

df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/reviews.csv")
df.head()

Unnamed: 0,review_id,text
0,txt145,The software had a steep learning curve at fir...
1,txt327,I'm really impressed with the user interface o...
2,txt209,The latest update to the software fixed severa...
3,txt825,I encountered a few glitches while using the s...
4,txt878,I was skeptical about trying the software init...


In [5]:
# Converting text to lowercase

df['text'] = df['text'].str.lower() 
print(df['text'])

0     the software had a steep learning curve at fir...
1     i'm really impressed with the user interface o...
2     the latest update to the software fixed severa...
3     i encountered a few glitches while using the s...
4     i was skeptical about trying the software init...
5     the analytics features have provided us with v...
6     i appreciate the regular updates that the soft...
7     i attended a training session for the software...
8     the software documentation could be more compr...
9     i've recommended the software to colleagues du...
10    the software integration with third-party plug...
11    i'm looking forward to the upcoming release of...
12    the user community is active and supportive, m...
13    i've been using the software for a while now, ...
14    the user interface could use some modernizatio...
15    i went for a run and the software did a good j...
Name: text, dtype: object


In [6]:
# Converting text to uppercase

df['review_id'] = df['review_id'].str.upper() 
print(df['review_id'])

0     TXT145
1     TXT327
2     TXT209
3     TXT825
4     TXT878
5     TXT933
6     TXT718
7     TXT316
8     TXT247
9     TXT515
10    TXT913
11    TXT341
12    TXT943
13    TXT688
14    TXT136
15    TXT137
Name: review_id, dtype: object
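
Case conversion is usually applied so that "Software" and "software" count as the same token. The sketch below uses plain Python strings rather than the reviews DataFrame to show how lowercasing behaves, with `str.casefold()` shown as a stricter option for case-insensitive matching (pandas exposes it as `.str.casefold()`). The sample strings are made up for illustration.

```python
# Hypothetical sample strings, not taken from reviews.csv
samples = ["The Software", "STRASSE", "Straße"]

lowered = [s.lower() for s in samples]
folded = [s.casefold() for s in samples]

print(lowered)  # ['the software', 'strasse', 'straße']
print(folded)   # ['the software', 'strasse', 'strasse']

# lower() keeps the German sharp s, so the two spellings still differ
print(lowered[1] == lowered[2])  # False
# casefold() maps it to "ss", so they compare equal
print(folded[1] == folded[2])    # True
```

For plain English text such as these reviews, `lower()` and `casefold()` give the same result; the difference only matters for some non-English characters.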


# Punctuation Removal

Punctuation removal strips punctuation marks from text data. Examples of such marks include periods (.), commas (,), question marks (?),
exclamation marks (!), colons (:), semicolons (;), quotation marks (“ ”), parentheses (()), brackets ([]), and hyphens and dashes (-, –, —).
Removing them produces a less cluttered representation in which "software," and "software" are treated as the same token, which can simplify
later analysis and modeling. The trade-off is that contractions and hyphenated words are fused: "I'm" becomes "Im" and "third-party" becomes
"thirdparty".


In [7]:
df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/reviews.csv")
df.head()

Unnamed: 0,review_id,text
0,txt145,The software had a steep learning curve at fir...
1,txt327,I'm really impressed with the user interface o...
2,txt209,The latest update to the software fixed severa...
3,txt825,I encountered a few glitches while using the s...
4,txt878,I was skeptical about trying the software init...


In [10]:
r"""
We prepare a translation table using str.maketrans(), a static method of the built-in str type (the string module is imported only for
string.punctuation). The table, stored in translator, tells str.translate() to delete all punctuation marks from the input text. Here is
what each argument to str.maketrans() does:

The first argument is an empty string (''). It lists the characters that we want to replace.

The second argument is also an empty string (''). It lists the replacement characters, matched by position with the first argument. Since
both strings are empty, no characters will be replaced with other characters.

The third argument, string.punctuation, lists the characters that we want to delete. It is a constant provided by the string module that
contains the ASCII punctuation characters: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

We then pass the translation table created in the previous line (translator) to the str.translate() method, which removes the punctuation
characters from the text.

We return the cleaned text.

"""
import string

def remove_punctuation(text): 
    translator = str.maketrans('', '', string.punctuation) 
    text = text.translate(translator)
    return text 
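
To make the three arguments of `str.maketrans()` concrete, here is a minimal sketch on a toy string. The characters chosen are arbitrary and only serve to show replacement and deletion side by side.

```python
import string

# First two arguments: replace 'a' with 'x' and 'b' with 'y' (matched by position).
# Third argument: delete '!' and '?' entirely.
table = str.maketrans("ab", "xy", "!?")
result = "a!b?c".translate(table)
print(result)  # xyc

# With two empty strings, the table only deletes characters:
# each code point in the third argument is mapped to None.
delete_only = str.maketrans("", "", "!")
print(delete_only)  # {33: None}

# The same pattern with string.punctuation removes ASCII punctuation.
cleaned = "Hello, world!".translate(str.maketrans("", "", string.punctuation))
print(cleaned)  # Hello world
```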


In [11]:
df['text'] = df['text'].apply(remove_punctuation)
print(df['text'])

0     The software had a steep learning curve at fir...
1     Im really impressed with the user interface of...
2     The latest update to the software fixed severa...
3     I encountered a few glitches while using the s...
4     I was skeptical about trying the software init...
5     The analytics features have provided us with v...
6     I appreciate the regular updates that the soft...
7     I attended a training session for the software...
8     The software documentation could be more compr...
9     Ive recommended the software to colleagues due...
10    The software integration with thirdparty plugi...
11    Im looking forward to the upcoming release of ...
12    The user community is active and supportive ma...
13    Ive been using the software for a while now an...
14    The user interface could use some modernizatio...
15    I went for a run and the software did a good j...
Name: text, dtype: object
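
Note that `string.punctuation` holds only ASCII characters, so curly quotes (“ ”) and en or em dashes are left in place by `remove_punctuation`. The sketch below shows one possible way to cover them as well, using Unicode character categories from the standard-library `unicodedata` module. The function name `remove_all_punctuation` and the sample string are hypothetical, not part of the original notebook.

```python
import string
import unicodedata

ASCII_PUNCT = set(string.punctuation)

def remove_all_punctuation(text):
    # Drop ASCII punctuation plus any character whose Unicode category
    # starts with "P" (Pd dashes, Pi/Pf quotes, Po other punctuation, ...).
    return "".join(
        ch for ch in text
        if ch not in ASCII_PUNCT and not unicodedata.category(ch).startswith("P")
    )

sample = "“Great” app – really!"

# ASCII-only removal leaves the curly quotes and the en dash untouched
ascii_only = sample.translate(str.maketrans("", "", string.punctuation))
print(ascii_only)  # “Great” app – really

# Category-based removal strips them too (note the double space left behind)
unicode_aware = remove_all_punctuation(sample)
print(unicode_aware)  # Great app  really
```

It could be applied to the DataFrame in the same way as before, with `df['text'].apply(remove_all_punctuation)`.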


# Handling Special Characters

## 1. Removing special characters

In [12]:
import pandas as pd
from unidecode import unidecode
import re

In [13]:
df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/reviews.csv")
df.head()

Unnamed: 0,review_id,text
0,txt145,The software had a steep learning curve at fir...
1,txt327,I'm really impressed with the user interface o...
2,txt209,The latest update to the software fixed severa...
3,txt825,I encountered a few glitches while using the s...
4,txt878,I was skeptical about trying the software init...


In [14]:
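# Transliterate non-ASCII characters to their closest ASCII form with unidecode,
# then drop every character that is not a letter, digit, or whitespace.
df['text'] = df['text'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', unidecode(x)))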
print(df['text'])

0     The software had a steep learning curve at fir...
1     Im really impressed with the user interface of...
2     The latest update to the software fixed severa...
3     I encountered a few glitches while using the s...
4     I was skeptical about trying the software init...
5     The analytics features have provided us with v...
6     I appreciate the regular updates that the soft...
7     I attended a training session for the software...
8     The software documentation could be more compr...
9     Ive recommended the software to colleagues due...
10    The software integration with thirdparty plugi...
11    Im looking forward to the upcoming release of ...
12    The user community is active and supportive ma...
13    Ive been using the software for a while now an...
14    The user interface could use some modernizatio...
15    I went for a run and the software did a good j...
Name: text, dtype: object
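
The cell above chains two steps: ASCII transliteration, then a regular expression that keeps only letters, digits, and whitespace. The sketch below separates the two steps on a made-up string. It is a stdlib-only approximation: `unicodedata.normalize` stands in for `unidecode`, which is a different library and may transliterate symbols that this version simply drops. The helper names `to_ascii` and `strip_special`, and the final whitespace cleanup, are assumptions added for illustration.

```python
import re
import unicodedata

def to_ascii(text):
    # Stand-in for unidecode: split accented letters into base letter plus
    # combining mark, then discard everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def strip_special(text):
    # Same pattern as the notebook cell: keep letters, digits, and whitespace
    cleaned = re.sub(r"[^a-zA-Z0-9\s]", "", to_ascii(text))
    # Extra step (not in the original): collapse the gaps left by removed characters
    return re.sub(r"\s+", " ", cleaned).strip()

print(to_ascii("Café – déjà vu! 100%"))       # Cafe  deja vu! 100%
print(strip_special("Café – déjà vu! 100%"))  # Cafe deja vu 100
```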


## 2. Performing emoji to text conversion

In [16]:
! pip install emoji

Collecting emoji
  Downloading emoji-2.14.0-py3-none-any.whl.metadata (5.7 kB)
Downloading emoji-2.14.0-py3-none-any.whl (586 kB)
Installing collected packages: emoji
Successfully installed emoji-2.14.0


In [17]:
import pandas as pd
import emoji

In [18]:
df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/reviews.csv")
df.head()

Unnamed: 0,review_id,text
0,txt145,The software had a steep learning curve at fir...
1,txt327,I'm really impressed with the user interface o...
2,txt209,The latest update to the software fixed severa...
3,txt825,I encountered a few glitches while using the s...
4,txt878,I was skeptical about trying the software init...


In [19]:
def emoji_to_text(text):
    # Replace each emoji in the text with its textual alias, such as ":thumbs_up:"
    return emoji.demojize(text)

In [20]:
df['text'] = df['text'].apply(emoji_to_text)
print(df['text'])

0     The software had a steep learning curve at fir...
1     I'm really impressed with the user interface o...
2     The latest update to the software fixed severa...
3     I encountered a few glitches while using the s...
4     I was skeptical about trying the software init...
5     The analytics features have provided us with v...
6     I appreciate the regular updates that the soft...
7     I attended a training session for the software...
8     The software documentation could be more compr...
9     I've recommended the software to colleagues du...
10    The software integration with third-party plug...
11    I'm looking forward to the upcoming release of...
12    The user community is active and supportive, m...
13    I've been using the software for a while now, ...
14    The user interface could use some modernizatio...
15    I went for a run and the software did a good j...
Name: text, dtype: object
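
The truncated output above shows no visible conversion, so a small standalone input makes the idea easier to see. The sketch below does not use the `emoji` package; it is a stdlib-only illustration of the same idea, replacing symbol characters with a name taken from the Unicode database. The function name `describe_symbols` is hypothetical, and the names it produces (for example `:thumbs_up_sign:`) differ from the aliases that `emoji.demojize` returns (for example `:thumbs_up:`).

```python
import unicodedata

def describe_symbols(text):
    # Replace every character in Unicode category "So" (Symbol, other),
    # which includes most single-code-point emoji, with ":unicode_name:".
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "unknown").lower().replace(" ", "_")
            parts.append(f":{name}:")
        else:
            parts.append(ch)
    return "".join(parts)

converted = describe_symbols("Great update \U0001F44D")
print(converted)  # Great update :thumbs_up_sign:

# Text without symbols passes through unchanged
unchanged = describe_symbols("No emoji here.")
print(unchanged)  # No emoji here.
```

This simplified version handles one code point at a time, so emoji built from several code points (skin-tone modifiers, joined sequences) are not covered; the `emoji` package is the better choice for real data.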
