## Customer Sentiments Extraction - Logic Building
- ref: https://docs.python.org/3/library/re.html {Regex Documentation} - IMP
- Goal: get as close as possible to extracting only the customer comments
- Data used: Jan2022 - Jun2022 (Salesforce Custom Report)

---
---

In [45]:
from platform import python_version

In [46]:
#!pip install scikit-learn==0.24.2
#!pip install h5py
#!pip install typing-extensions
#!pip install wheel
python_version()

'3.9.7'

In [47]:
#-- Imports

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import re
import io

from tqdm import tqdm # Progress bars; tqdm.pandas() enables progress_apply for running a user-defined function over every record
tqdm.pandas()
from tqdm.notebook import tqdm as tn

In [48]:
import dtale

In [49]:
#-- Load Data
#- The CSV was created on macOS and raised a UnicodeDecodeError when parsed on a Windows machine. Passing encoding='mac_roman' to pandas' read_csv avoids the error.
df = pd.read_csv('Jan1_Jun30_2022.csv', encoding='mac_roman')
df.head()

Unnamed: 0,Is Incoming,Case Origin,Closed,From Address,Case Owner,Case Comments,Account Name,Subject,Date/Time Opened,Open,Email Subject,Age (Hours),Email Status,Email Message Date,Case Comment Created By,From Name,Case Number
0,0,Email,1.0,info@azuga.com,Gerard Revaz,Report Type: SpeedGaugeTM Daily Summary [Vehic...,Azuga Internal Account,SpeedGauge - All Vehicles Daily Group Summary ...,01/01/22 2:14,0.0,Fleet Support case# 00518840 -,51.0,Sent,01/01/22 2:14,System,Fleet Support,518840.0
1,0,Email,1.0,info@azuga.com,Gerard Revaz,"Hi ,\n\nThank you for contacting Azuga Custome...",Azuga Internal Account,SpeedGauge - All Vehicles Daily Group Summary ...,01/01/22 2:14,0.0,Fleet Support case# 00518840 -,51.0,Sent,01/01/22 2:14,System,Fleet Support,518840.0
2,0,Email,1.0,info@azuga.com,Gerard Revaz,Note:\n\nNo Action Required,Azuga Internal Account,SpeedGauge - All Vehicles Daily Group Summary ...,01/01/22 2:14,0.0,Fleet Support case# 00518840 -,51.0,Sent,01/01/22 2:14,Gerard Revaz,Fleet Support,518840.0
3,1,Email,1.0,speedgauge@speedgauge.net,Gerard Revaz,Report Type: SpeedGaugeTM Daily Summary [Vehic...,Azuga Internal Account,SpeedGauge - All Vehicles Daily Group Summary ...,01/01/22 2:14,0.0,SpeedGauge - All Vehicles Daily Group Summary ...,51.0,Read,01/01/22 2:13,System,,518840.0
4,1,Email,1.0,speedgauge@speedgauge.net,Gerard Revaz,"Hi ,\n\nThank you for contacting Azuga Custome...",Azuga Internal Account,SpeedGauge - All Vehicles Daily Group Summary ...,01/01/22 2:14,0.0,SpeedGauge - All Vehicles Daily Group Summary ...,51.0,Read,01/01/22 2:13,System,,518840.0
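
The encoding passed to `read_csv` decides how every byte above 0x7F is interpreted, so it can be worth checking a few non-ASCII bytes by hand. A minimal sketch using only the standard library; the sample byte string is made up for illustration, and `cp1252` is just one alternative codec to compare against, not something the original file is known to use.

```python
import codecs

# Hypothetical sample: "Hi," followed by byte 0xA0 and more text.
raw = b"Hi,\xa0 We received a notification"

as_mac_roman = raw.decode("mac_roman")  # 0xA0 is the dagger sign in Mac OS Roman
as_cp1252 = raw.decode("cp1252")        # 0xA0 is a no-break space in Windows-1252

print(repr(as_mac_roman))
print(repr(as_cp1252))

# Python normalizes codec names, so both spellings resolve to the same codec.
same_codec = codecs.lookup("mac-roman").name == codecs.lookup("mac_roman").name
print(same_codec)
```

If stray dagger characters show up in the comments later on, comparing decodings like this may help tell genuine symbols from mis-decoded spaces.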


In [50]:
df['Case Origin'].unique() 

array(['Email', 'Internal', 'PHLY', 'Phone', 'Voicemail', nan,
       'Phone: Sales Rep called CC', 'Chat'], dtype=object)

In [51]:
df.shape

(258483, 17)

In [52]:
df.columns

Index(['Is Incoming', 'Case Origin', 'Closed', 'From Address', 'Case Owner',
       'Case Comments', 'Account Name', 'Subject', 'Date/Time Opened', 'Open',
       'Email Subject', 'Age (Hours)', 'Email Status', 'Email Message Date',
       'Case Comment Created By', 'From Name', 'Case Number'],
      dtype='object')

In [53]:
# Remove Records with "NULL" Comments
df = df.dropna(subset=['Case Comments'])
df.shape

(253897, 17)

---
##### 1) Perform Parsing w.r.t. "Case Origin"
- The "Voicemail" case origin is excluded for now.
- 49.67% of cases have the "Email" case origin.
---

In [54]:
df = df[df['Case Origin'] != 'Voicemail']
df.shape

(253875, 17)

---
##### 2) Perform Parsing w.r.t. **Account Name**
- Remove records whose Account Name matches the patterns below
  - In String         : Azuga 4G LTE | Azuga Insight | Azuga Internal Account | Azuga Partner Test | Azuga Sales Demo
---

In [55]:
#Azuga | Azuga 4G LTE | Azuga Insight | Azuga Internal Account | Azuga Partner Test | Azuga Sales Demo
#df = df[df['Subject'] != re.search('^Azuga', df['Subject'])]
#txt = df['Subject']
internal_accounts = ['Azuga 4G LTE', 'Azuga Insight', 'Azuga Internal Account',
                     'Azuga Partner Test', 'Azuga Sales Demo']
df0 = df
for name in internal_accounts:
    # Keep only rows whose Account Name does not contain this internal account name
    df0 = df0[~df0['Account Name'].str.contains(name, regex=False, na=False)]
df0.shape

(246974, 17)
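
The five account-name checks can also be collapsed into one alternation pattern. A minimal sketch in plain `re`, assuming the names are meant as literal substrings; `is_internal` and the sample values are hypothetical and only illustrate the idea.

```python
import re

# Account names taken from the list above.
internal_accounts = [
    "Azuga 4G LTE",
    "Azuga Insight",
    "Azuga Internal Account",
    "Azuga Partner Test",
    "Azuga Sales Demo",
]

# re.escape guards against any regex metacharacters in the names.
internal_pattern = re.compile("|".join(re.escape(name) for name in internal_accounts))

def is_internal(account_name):
    """Return True when the account name contains one of the internal names."""
    # Missing values (None / NaN) count as "not internal", mirroring na=False.
    if not isinstance(account_name, str):
        return False
    return internal_pattern.search(account_name) is not None

# Hypothetical sample values for illustration only.
samples = ["Azuga Internal Account", "Ecotech Hydro Excavation Inc.", None, "Azuga Sales Demo - West"]
kept = [name for name in samples if not is_internal(name)]
print(kept)
```

The joined pattern string (`internal_pattern.pattern`) should also work as the first argument of `Series.str.contains(..., regex=True, na=False)`, which would replace the chain of filters with a single pass.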

---
##### 3) Perform Parsing w.r.t. **"Subject"**
- Remove records whose Subject matches the patterns below
  - At Start of String: Welcome to Azuga! | Test | Re: Test | [SpeedGauge] | Fwd: Transaction Receipt | KORE Ticket
  - In String         : 4803 | 7302 | Case zd:[0-9] | Shipped 
---

In [56]:
# | Welcome to Azuga! | Test | Testing email | Re: Test | 4803 | 7302 | ??? (Start pattern) | (Case zd: 
# | [SpeedGauge] Re: | Azuga - Devices Shipped | Azuga - ELD Devices Shipped | Azuga - SafetyCam Kit(s) Shipped 
# | Devices Shipped for UBI | Fwd: Transaction Receipt | KORE Ticket
df1 = df0[df0['Subject'].str.contains(r'^Test', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'^Welcome to Azuga!', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'^Re: Test', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'4803', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'7302', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'Case zd:[0-9]', 
                                         regex=True, na=False) == False]
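# Brackets escaped to match the literal "[SpeedGauge]" tag; unescaped, [SpeedGauge] is a
# character class matching any Subject that starts with S, p, e, d, G, a, u or g
# (the shape recorded below was produced with the unescaped pattern)
df1 = df1[df1['Subject'].str.contains(r'^\[SpeedGauge\]', 
                                         regex=True, na=False) == False]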
df1 = df1[df1['Subject'].str.contains(r'^SpeedGauge', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'Shipped', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'^Fwd: Transaction Receipt', 
                                         regex=True, na=False) == False]
df1 = df1[df1['Subject'].str.contains(r'^KORE Ticket', 
                                         regex=True, na=False) == False]

df1.shape

(213638, 17)
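
Square brackets open a character class in a regex, so a literal `[SpeedGauge]` prefix has to be escaped. The sketch below compares both forms on made-up subjects (all three sample strings are hypothetical).

```python
import re

subject_a = "[SpeedGauge] Re: Daily Summary"   # hypothetical subject with the bracketed tag
subject_b = "Speeding alert not working"       # hypothetical, starts with "S"
subject_c = "device offline since Monday"      # hypothetical, starts with "d"

unescaped = re.compile(r"^[SpeedGauge]")    # character class: one of S, p, e, d, G, a, u, g
escaped = re.compile(r"^\[SpeedGauge\]")    # literal text "[SpeedGauge]"

results = {}
for subject in (subject_a, subject_b, subject_c):
    results[subject] = (bool(unescaped.search(subject)), bool(escaped.search(subject)))
    print(subject, results[subject])
```

The unescaped form misses the bracketed subject entirely (it starts with `[`, which is not in the class) while matching unrelated subjects that happen to start with one of the listed letters.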

In [57]:
#Important Subjects, that can mostly contain Customer Comments
#-------
#Freshdesk | Feedback Widget | CID: 15870 | Feature Request
#Feedback / Complaints 
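# "|" is escaped so the literal prefix is matched; unescaped it means "^Freshdesk " OR " Feedback Widget"
# (the shape recorded below was produced with the unescaped pattern)
dfx = df1[df1['Subject'].str.contains(r'^Freshdesk \| Feedback Widget', 
                                         regex=True, na=False)]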
dfx.shape

(45904, 17)
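
In a regex, `|` is the alternation operator, so a subject prefix that contains a literal pipe needs `\|`. A small sketch of the difference; the second and third subjects are hypothetical, and the first only mimics the shape of the subjects shown in this notebook.

```python
import re

unescaped = re.compile(r"^Freshdesk | Feedback Widget")   # "^Freshdesk " OR " Feedback Widget"
escaped = re.compile(r"^Freshdesk \| Feedback Widget")    # the literal prefix

subjects = [
    "Freshdesk | Feedback Widget | CID: 15700",  # shaped like the widget subjects
    "Freshdesk outage report",                   # hypothetical
    "Re: Feedback Widget not loading",           # hypothetical
]

unescaped_hits = [bool(unescaped.search(s)) for s in subjects]
escaped_hits = [bool(escaped.search(s)) for s in subjects]
print(unescaped_hits)
print(escaped_hits)
```

Only the escaped pattern restricts the match to subjects that actually begin with the full widget prefix.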

In [58]:
dfy = df1[df1['Subject'].str.contains(r'^Feedback / Complaints', 
                                         regex=True, na=False) == True]
dfy.shape

(4936, 17)

In [59]:
dfx.head(10)

Unnamed: 0,Is Incoming,Case Origin,Closed,From Address,Case Owner,Case Comments,Account Name,Subject,Date/Time Opened,Open,Email Subject,Age (Hours),Email Status,Email Message Date,Case Comment Created By,From Name,Case Number
81,0,Email,1.0,customercare@azuga.com,Santiago Urbizu,"Hi Jason,\n \n Trust you're doing good,\n \n P...","United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Freshdesk | Feedback Widget | CID: 15700 | SOM...,217.0,Replied,01/06/22 19:19,Santiago Urbizu,Fleet Support,518858.0
82,0,Email,1.0,info@azuga.com,Santiago Urbizu,"Hi Jason,\n \n Trust you're doing good,\n \n P...","United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Fleet Support case# 00518858 -,217.0,Sent,01/02/22 10:10,Santiago Urbizu,Fleet Support,518858.0
83,0,Email,1.0,customercare@azuga.com,Santiago Urbizu,Called back again customer told that the issue...,"United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Freshdesk | Feedback Widget | CID: 15700 | SOM...,217.0,Replied,01/06/22 19:19,Santiago Urbizu,Fleet Support,518858.0
84,0,Email,1.0,info@azuga.com,Santiago Urbizu,Called back again customer told that the issue...,"United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Fleet Support case# 00518858 -,217.0,Sent,01/02/22 10:10,Santiago Urbizu,Fleet Support,518858.0
85,0,Email,1.0,customercare@azuga.com,Santiago Urbizu,"Hi Jason,\n \n Trust you're doing good,\n \n I...","United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Freshdesk | Feedback Widget | CID: 15700 | SOM...,217.0,Replied,01/06/22 19:19,Santiago Urbizu,Fleet Support,518858.0
86,0,Email,1.0,info@azuga.com,Santiago Urbizu,"Hi Jason,\n \n Trust you're doing good,\n \n I...","United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Fleet Support case# 00518858 -,217.0,Sent,01/02/22 10:10,Santiago Urbizu,Fleet Support,518858.0
87,0,Email,1.0,customercare@azuga.com,Santiago Urbizu,"Hi Jason,\n \n Trust you're doing good,\n \n P...","United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Freshdesk | Feedback Widget | CID: 15700 | SOM...,217.0,Replied,01/06/22 19:19,Santiago Urbizu,Fleet Support,518858.0
88,0,Email,1.0,info@azuga.com,Santiago Urbizu,"Hi Jason,\n \n Trust you're doing good,\n \n P...","United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Fleet Support case# 00518858 -,217.0,Sent,01/02/22 10:10,Santiago Urbizu,Fleet Support,518858.0
89,0,Email,1.0,customercare@azuga.com,Santiago Urbizu,Called back the customer on (617) 782-0055 \n ...,"United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Freshdesk | Feedback Widget | CID: 15700 | SOM...,217.0,Replied,01/06/22 19:19,Santiago Urbizu,Fleet Support,518858.0
90,0,Email,1.0,info@azuga.com,Santiago Urbizu,Called back the customer on (617) 782-0055 \n ...,"United Private Car, Inc.",Freshdesk | Feedback Widget | CID: 15700 | SOM...,01/02/22 10:09,0.0,Fleet Support case# 00518858 -,217.0,Sent,01/02/22 10:10,Santiago Urbizu,Fleet Support,518858.0


In [17]:
# Try doing Regex EDA on this - main focus
#dfx.to_csv(r'Feedback_Processed.csv', index = False)

In [18]:
# Try doing Regex EDA on this
#dfy.to_csv(r'Feedback_Complaints.csv', index=False)

In [60]:
df1.head(10)

Unnamed: 0,Is Incoming,Case Origin,Closed,From Address,Case Owner,Case Comments,Account Name,Subject,Date/Time Opened,Open,Email Subject,Age (Hours),Email Status,Email Message Date,Case Comment Created By,From Name,Case Number
18,0,Email,1.0,info@azuga.com,Gerard Revaz,Email from the client: \n\n---------- Forwarde...,Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Fleet Support case# 00518845 -,60.0,Sent,01/01/22 9:43,Gerard Revaz,Fleet Support,518845.0
19,0,Email,1.0,info@azuga.com,Gerard Revaz,"Good Morning Team,\n\nPlease reach out to Blak...",Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Fleet Support case# 00518845 -,60.0,Sent,01/01/22 9:43,System,Fleet Support,518845.0
20,0,Email,1.0,info@azuga.com,Gerard Revaz,"Hi ,\n\nThank you for contacting Azuga Custome...",Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Fleet Support case# 00518845 -,60.0,Sent,01/01/22 9:43,System,Fleet Support,518845.0
21,0,Email,1.0,info@azuga.com,Gerard Revaz,Note: \n\nTagging this case as a duplicate of ...,Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Fleet Support case# 00518845 -,60.0,Sent,01/01/22 9:43,Gerard Revaz,Fleet Support,518845.0
22,1,Email,1.0,hunterb@azuga.com,Gerard Revaz,Email from the client: \n\n---------- Forwarde...,Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Help with 4G Swap,60.0,Read,01/01/22 9:42,Gerard Revaz,Hunter Boggs,518845.0
23,1,Email,1.0,hunterb@azuga.com,Gerard Revaz,"Good Morning Team,\n\nPlease reach out to Blak...",Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Help with 4G Swap,60.0,Read,01/01/22 9:42,System,Hunter Boggs,518845.0
24,1,Email,1.0,hunterb@azuga.com,Gerard Revaz,"Hi ,\n\nThank you for contacting Azuga Custome...",Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Help with 4G Swap,60.0,Read,01/01/22 9:42,System,Hunter Boggs,518845.0
25,1,Email,1.0,hunterb@azuga.com,Gerard Revaz,Note: \n\nTagging this case as a duplicate of ...,Ecotech Hydro Excavation Inc.,Help with 4G Swap,01/01/22 9:43,0.0,Help with 4G Swap,60.0,Read,01/01/22 9:42,Gerard Revaz,Hunter Boggs,518845.0
26,0,Email,1.0,info@azuga.com,Gerard Revaz,Email sent to the customer: \n\n---------- For...,Ecotech Hydro Excavation Inc.,RE: Help with 4G Swap,01/01/22 10:21,0.0,Fleet Support case# 00518846 - Ecotech Hydro E...,87.0,Sent,01/01/22 10:21,Gerard Revaz,Fleet Support,518846.0
27,0,Email,1.0,info@azuga.com,Gerard Revaz,Case Closure Note: \n\n4G Swap team is taking ...,Ecotech Hydro Excavation Inc.,RE: Help with 4G Swap,01/01/22 10:21,0.0,Fleet Support case# 00518846 - Ecotech Hydro E...,87.0,Sent,01/01/22 10:21,Gerard Revaz,Fleet Support,518846.0


In [61]:
#df1.to_csv(r'Jan_Jun_2022_Processed.csv', index = False)

In [62]:
213638/258483

0.826506965641841
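
The ratio above is the share of rows kept; the share removed is its complement. The sketch below recomputes it from the `df.shape` outputs recorded earlier in this notebook and breaks the removal down per step (the stage labels are informal names chosen here).

```python
# Row counts copied from the df.shape outputs above.
stages = [
    ("loaded", 258483),
    ("non-null comments", 253897),
    ("without Voicemail", 253875),
    ("without internal accounts", 246974),
    ("without unwanted subjects", 213638),
]

total_rows = stages[0][1]
removed_per_stage = {}
for (_, before), (label, after) in zip(stages, stages[1:]):
    removed_per_stage[label] = before - after
    print(f"{label}: removed {before - after} rows")

removed_rows = total_rows - stages[-1][1]
removed_share = removed_rows / total_rows
print(f"removed overall: {removed_rows} rows ({removed_share:.2%})")
```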

##### Conclusion-1: 
- Parsed out **17.35%** of all records as unwanted (1 - 213638/258483), by dropping null comments and using logic w.r.t. **"Subject" / "Account Name" / "Case Origin"**.
- 16 parsing points in total so far, w.r.t. the entire dataset.

- Now let's focus on **parsing out the customer comments**
---
---


#### 4) Parsing w.r.t. "Comments"
---

In [91]:
cmts = pd.DataFrame(df1['Case Comments']) 
cmts = cmts.reset_index(drop=True) #- Reset the Index values
print(cmts)

                                            Case Comments
0       Email from the client: \n\n---------- Forwarde...
1       Good Morning Team,\n\nPlease reach out to Blak...
2       Hi ,\n\nThank you for contacting Azuga Custome...
3       Note: \n\nTagging this case as a duplicate of ...
4       Email from the client: \n\n---------- Forwarde...
...                                                   ...
213633       Closing the case as not  related to  support
213634  Dear Azuga -\n\nCan you please verify that I a...
213635  Greetings,\n\nThanks for your email.\n\nYour r...
213636  Dear Azuga -\n\nCan you please verify that I a...
213637  Greetings,\n\nThanks for your email.\n\nYour r...

[213638 rows x 1 columns]


---
#### Part-1 :
- Keeping only "Case Comments" that contain **---------- Forwarded message ---------**
    - This isn't that helpful, as of now
---

In [92]:
## Parsing Based on "---------- Forwarded message ---------"
com1 =  cmts[cmts['Case Comments'].str.contains(r'---------- Forwarded message ---------', 
                                         regex=True, na=False) == True]
com1 = com1.reset_index(drop=True)                                         
com1.shape

(20318, 1)

In [93]:
com1.head(10)

Unnamed: 0,Case Comments
0,Email from the client: \n\n---------- Forwarde...
1,Email from the client: \n\n---------- Forwarde...
2,Email sent to the customer: \n\n---------- For...
3,"As the 4G swap team is taking this forward, I ..."
4,Email sent to the customer: \n\n---------- For...
5,"As the 4G swap team is taking this forward, I ..."
6,---------- Forwarded message ---------\nFrom: ...
7,---------- Forwarded message ---------\nFrom: ...
8,---------- Forwarded message ---------\nFrom: ...
9,---------- Forwarded message ---------\nFrom: ...


In [94]:
#dtale.show(com1)

---
#### Part-2 :
- Keeping only records whose "Subject" starts with **Freshdesk | Feedback Widget**
    -  **Case Comments** contain *"The description of the ticket is as below"*
    -  **Case Comments** contain *"Helpful Info powered by"* (at the end)
---

In [136]:
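# "|" is escaped so the literal prefix is matched; unescaped it means "^Freshdesk " OR " Feedback Widget"
# (the shape recorded below was produced with the unescaped pattern)
dfx = df1[df1['Subject'].str.contains(r'^Freshdesk \| Feedback Widget', 
                                         regex=True, na=False)]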
dfx = dfx.reset_index(drop=True)
dfx.shape

(45904, 17)

In [137]:
#dtale.show(dfx)

In [138]:
dfx_f = dfx[dfx['Case Comments'].str.contains(r'The description of the ticket is as below', 
                                            regex=True, na=False) == True]
dfx_f =  dfx_f.reset_index(drop=True)   
dfx_f.shape                                        

(5948, 17)

In [139]:
x = dfx_f['Case Comments'].str.split("The description of the ticket is as below",  n = 1, expand = True)
x.shape

(5948, 2)
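
Splitting once at a marker is the core of this step. A plain-Python sketch of the same idea on a single string; the comment text is hypothetical and only shaped like the widget emails.

```python
marker = "The description of the ticket is as below"

# Hypothetical comment text, shaped like the Freshdesk widget emails.
comment = (
    "A new ticket was raised. " + marker +
    " \n   \n Truck 7 tracker is not working.\n\nHelpful Info powered by Freshdesk Support Desk"
)

# maxsplit=1 splits only at the first occurrence of the marker.
before, after = comment.split(marker, 1)

# str.partition does the same and also copes with a missing marker:
# it returns the whole string plus two empty strings instead of raising.
head, sep, tail = "no marker here".partition(marker)

print(repr(after))
print(repr(sep), repr(tail))
```

Everything after the marker (`after`) is the candidate customer comment; the Freshdesk footer still has to be removed in a later step.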

In [140]:
#- str.split() with n=1 splits each comment once at the marker into two columns: [0] is the text before the marker, [1] the text after it | Both are assigned to the main DataFrame (dfx_f) under new column names
#-<support@azuga.freshdesk.com> wrote:
x = dfx_f['Case Comments'].str.split("The description of the ticket is as below", n = 1, expand = True)
# Creating New Columns
dfx_f['Initial String'] = x[0] 
dfx_f['Customer Comments'] = x[1]

In [141]:
# Drop the Column that is not required
dfx_f.drop(columns=['Initial String'], inplace=True)
dfx_f.columns

Index(['Is Incoming', 'Case Origin', 'Closed', 'From Address', 'Case Owner',
       'Case Comments', 'Account Name', 'Subject', 'Date/Time Opened', 'Open',
       'Email Subject', 'Age (Hours)', 'Email Status', 'Email Message Date',
       'Case Comment Created By', 'From Name', 'Case Number',
       'Customer Comments'],
      dtype='object')

In [142]:
# Re Index the Pandas Data Frame, w.r.t. Required Positions for all columns
dfx_f = dfx_f.reindex(columns=['Is Incoming', 'Case Origin', 'Closed', 'From Address', 'Case Owner',
       'Customer Comments', 'Account Name', 'Subject', 'Date/Time Opened', 'Open',
       'Email Subject', 'Age (Hours)', 'Email Status', 'Email Message Date',
       'Case Comment Created By', 'From Name', 'Case Number',
       'Case Comments'])

In [143]:
#- In total There are 1842 Unique Comments
x = dfx_f['Customer Comments'].unique()
x.shape
print(x)

[' \n   \n I am attempting to set up my new system by adding vehicle types but you will not allow me due to this error message below.     SOMETHING BAD HAPPENED.     Someone needs to call me ASAP 617 938 6522'
 " \n   \n Good morning,  Truck #7's tacker is not working. Please advise. Thank you"
 "\n\n\n\n1101605018 can you check this Serial #. I'm thinking I have two devices that need to be swapped. Currently I have this in a 2019 F650 but its not showing mileage on my dashboard. The other vehicle that is not showing mileage is a 2009 Kenworth with serial # 1101704245. Could you check to see if I need to swap these?\n\nHelpful Info powered by Freshdesk Support Desk"
 ...
 ' \n   \n Hi,†  We received a notification for hard braking for Unit 10 and Unit 11 today but neither have shown up in videos.† Does that mean their cameras are not plugged in?'
 ' \n   \n The device name Truck 6 is picking up the wrong mileage. Mileage should read 373713'
 ' \n   \n I just added two new devices. They

In [144]:
#- Import "Stop Words" & "Word_Tokenize" Libraries
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize 
stop_words = set(stopwords.words('english')) #- Consider only Stop Words w.r.t. "English" lang.

In [145]:
import string
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

##### Create a Function - "Pre-Process" / "Remove-Links"
- Removes punctuation, email user IDs, stop words, double/extra spacing, and numbers.
- Converts all sentences to lower case.

In [147]:
#-> Final Function (To Automate all)
#- Remove email header labels and the "[links]" placeholder
def remove_links(comment):
    '''Takes a string and removes the "From:" / "To:" header labels and the literal "[links]" placeholder'''
    comment = re.sub(r'From:\s+', '', comment)   # Remove the "From:" label
    comment = re.sub(r'To:\s+', '', comment)     # Remove the "To:" label
    comment = comment.replace('[links]', '')     # str.strip('[links]') would strip the characters [ l i n k s ] from both ends instead
    return comment

my_punctuation = '!"$%&\'()*+,-./:;<=>?[\\]^_`{|}~•@†#'
def preprocess(sent):
    ''' 
    Takes a string and removes the Freshdesk footer.
    Converts the text to lower case.
    Collapses whitespace and removes digits and punctuation.
    Removes English stop words and returns the remaining words joined by single spaces.
    '''
    sent = re.sub('Helpful Info powered by Freshdesk Support Desk', '', sent)  # Remove the Freshdesk footer
    sent = sent.lower()                                   # Convert to lower case
    sent = re.sub(r'\s+', ' ', sent)                      # Collapse newlines and repeated spaces (deleting '\n' outright fuses words across line breaks)
    sent = re.sub(r'[0-9+]', '', sent)                    # Remove digits and "+" signs
    sent = re.sub('[' + re.escape(my_punctuation) + ']+', '', sent)  # Remove punctuation anywhere in the text
    sent_stopwords_rm = [word for word in sent.split() if word not in stop_words]  # stop_words is the set built in the cell above
    sent = ' '.join(sent_stopwords_rm)
    return sent

#def remove_extra(sent):
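
One detail worth checking when removing a placeholder such as `[links]`: `str.strip` treats its argument as a set of characters to trim from both ends, not as a substring. A minimal sketch on a hypothetical comment:

```python
text = "links to the manual [links] are missing"   # hypothetical comment

stripped = text.strip("[links]")        # trims any of the characters [ l i n k s ] from both ends
replaced = text.replace("[links]", "")  # removes only the literal placeholder

print(repr(stripped))
print(repr(replaced))
```

Here `strip` eats the leading word "links" and leaves the placeholder in the middle untouched, while `replace` removes just the placeholder.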
     

In [148]:
#- Only Considering Records, with Unique Customer Comments
#df_unique = dfx_f.drop_duplicates('Customer Comments', keep='last')
#df_unique = dfx_f.drop_duplicates('Customer Comments', keep='last', inplace=True)
#df_unique.shape


#dfx_f['Customer Comments'].apply(text_process)

In [149]:
#- Apply preprocess() to every value in dfx_f['Customer Comments'], with a progress bar
dfx_f['Customer Comments'] = dfx_f['Customer Comments'].progress_apply(preprocess)

100%|██████████| 5948/5948 [00:22<00:00, 261.08it/s]


In [150]:
dfx_f['Customer Comments'] 

0       attempting set new system adding vehicle types...
1       attempting set new system adding vehicle types...
2       attempting set new system adding vehicle types...
3       attempting set new system adding vehicle types...
4       attempting set new system adding vehicle types...
                              ...                        
5943    hi received notification hard braking unit uni...
5944    device name truck picking wrong mileage mileag...
5945    device name truck picking wrong mileage mileag...
5946    added two new devices registered vehicle pleas...
5947    added two new devices registered vehicle pleas...
Name: Customer Comments, Length: 5948, dtype: object

In [151]:
#- Download "perluniprops" from "NLTK" using "download_shell()" - For De-Tokenization
#import nltk
#nltk.download_shell()


In [152]:
dfx_f.drop(columns='From Address', inplace=True)
dfx_f.shape

(5948, 17)

In [153]:
#df_unique = dfx_f['Case Comments'].to_string()
df_unique = dfx_f.drop_duplicates('Customer Comments', keep='last')
df_unique = df_unique.reset_index(drop=True)
df_unique.shape

(1696, 17)

In [154]:
df_unique['Customer Comments']

0       attempting set new system adding vehicle types...
1       good morning truck tacker working please advis...
2       check serial im thinking two devices need swap...
3                   device issues weeks ago isnt tracking
4                              please advise unit working
                              ...                        
1691                   neither gps camera working vehicle
1692                would like someone call susan haupert
1693    hi received notification hard braking unit uni...
1694    device name truck picking wrong mileage mileag...
1695    added two new devices registered vehicle pleas...
Name: Customer Comments, Length: 1696, dtype: object

In [155]:
dtale.show(df_unique)



In [156]:
# - Generate a Processed Output file
df_unique.to_csv('Feedback_processed_f1.csv', index=False)

##### Issues to resolve before scaling this algorithm
- **1)** Spacing issue: many duplicate comments differ only in spacing (at the start, in the middle, etc.). This needs to be taken care of.
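
One possible way to approach the spacing issue is to normalize whitespace before comparing comments. A minimal sketch in plain Python; `normalize_spaces` and the sample comments are hypothetical, and unlike the `keep='last'` used above it keeps the first occurrence of each comment.

```python
def normalize_spaces(text):
    """Collapse every run of whitespace to one space and trim both ends."""
    return " ".join(text.split())

# Hypothetical comments that differ only in spacing.
comments = [
    " \n   \n Truck 7 tracker is not working",
    "Truck 7  tracker is not working ",
    "Please advise, unit not working",
]

# dict.fromkeys keeps the first occurrence of each normalized comment, in order.
unique_comments = list(dict.fromkeys(normalize_spaces(c) for c in comments))
print(unique_comments)
```

In pandas, the same normalization could be applied to the comment column before `drop_duplicates`, so that comments differing only in whitespace collapse into one record.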