### This project analyzes customer satisfaction by counting word frequencies in received feedback.
We count the most frequent words to uncover the issues our customers are most concerned about or most satisfied with. Each comment comes with the customer's rating from 1 to 5, where _1 - unsatisfactory_ and _5 - satisfactory_. The rating helps us understand whether a particular word is meant in a positive or a negative way. 

In [30]:
# Open the CSV file and load its rows into work_list as a list of lists

import csv

work_list = []
with open("datasets/Sluzby (Voice Pausal Go) - Satisfaction rate.csv", encoding = 'utf-8') as fp:
    read_file = csv.reader(fp)
    for row in read_file:
        work_list.append(row)

In [2]:
# Print the first 3 rows. Each inner list represents one row of the spreadsheet.
# The first row holds the column names.

print(work_list[:3])

[['\ufeffNumber', 'User', 'Date Submitted', 'Country', 'Device', 'Browser', 'OS', 'Ako ste spokojní s našimi Paušálmi Go na škále 1-5?', 'Prosím, opíšte prečo. Je to pre nás veľmi dôležité.'], ['1', '0', '17/03/2020 16:44', 'Slovakia', 'phone', 'Chrome Mobile 80.0.3987', 'Android', '', ''], ['2', '0', '17/03/2020 16:44', 'Slovakia', 'desktop', 'Chrome 80.0.3987', 'Windows 10', '3', '']]
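
The first column name in the output above starts with `\ufeff`, a byte order mark (BOM). As a minimal sketch, assuming the file really does begin with a BOM, decoding it with the `utf-8-sig` codec instead of `utf-8` strips the mark. The sample bytes and the helper `read_rows` below are made up for illustration and stand in for the real file.

```python
import csv
import io

# Hypothetical sample: a UTF-8 CSV export that starts with a byte order mark.
sample_bytes = b"\xef\xbb\xbfNumber,User\r\n1,0\r\n"

def read_rows(raw_bytes, encoding):
    """Decode raw CSV bytes with the given encoding and return a list of rows."""
    text_stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    return list(csv.reader(text_stream))

rows_plain = read_rows(sample_bytes, "utf-8")    # BOM stays in the first cell
rows_sig = read_rows(sample_bytes, "utf-8-sig")  # BOM is stripped

print(repr(rows_plain[0][0]))
print(repr(rows_sig[0][0]))
```

With a file on disk, the same effect should come from passing `encoding='utf-8-sig'` to `open`. The BOM does not affect this analysis, because only the last two columns are used.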


In [3]:
# We only need the rating and comment columns. We assign them to new_list

new_list = [row[-2:] for row in work_list]
print(new_list[:5])

[['Ako ste spokojní s našimi Paušálmi Go na škále 1-5?', 'Prosím, opíšte prečo. Je to pre nás veľmi dôležité.'], ['', ''], ['3', ''], ['', ''], ['3', '']]


In [4]:
# We don't need the column names. This drops the first row and prints the last 3 rows.

new_list = new_list[1:]
print(new_list[-3:])

[['2', 'Mám paušál go Europe 40 všade je napísané že mám neobmedzené hovory, sms,  internet ale keďže sa nachádzam často v zahraničí a menej na Slovensku tak mi douctovavali ešte hovorme v rámci eú mimo Sk, ale sám paušál má názov go Europe nie go Slovensko čo pri mesačnom pausale 40 euro dávno mám zaplatené všetky hovory či na Sk alebo v Európe keby som si vynasobil cenu minúty a počet prevodových minút tak možno zaplatím okolo 30 euro aj s internetom som vašim zákazníkom od roku 1998 a ostatný operátori majú lepšie podmienky ako sú u vás. '], ['4', 'Dobré ponuky a mohli by byť aj viac'], ['5', '']]


Many ratings come without a comment. First, we need to remove unwanted characters such as commas, dots, and question marks. Removing them turns some comments into empty strings (for example a comment that consists only of punctuation), so this step has to run before we delete the empty comments. 

The next step uses 3 functions:   
* remove_char_function removes the unwanted characters listed in bad_char.  
* edit_new_row replaces the special character \n, which marks a line break, with a space. Comments typed on several lines contain this character, and we need each comment as a single line of text.  
* strip_lower strips leading and trailing whitespace from the comment and converts it to lower case. 

In [5]:
# list of unwanted characters
bad_char = [',','.','!','?', '(',')', ':','"','-', '😁', '+']

def remove_char_function(string):
    for char in bad_char:
        string = string.replace(char,'') # removing unwanted character
    return string

def edit_new_row(string):
    if '\n' in string:
        string = string.replace("\n", ' ') # removing \n special character
    return string

def strip_lower(string):
    string = string.strip() # strip comments of additional spaces
    string = string.lower() # lower case
    return string

# iteration through each list in new_list

for row in new_list:
    comment = row[1] # second column assigned to variable comment
    comment = remove_char_function(comment) # first function
    comment = edit_new_row(comment) # second function 
    comment = strip_lower(comment) # third function
    row[1] = comment # variable comment assigned back to second column
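
As an alternative sketch, not the notebook's own method, the three cleaning functions can be folded into a single pass with `str.translate`. The name `clean_comment` and the sample comment are hypothetical; `bad_char` is copied from the cell above.

```python
# Same characters as the bad_char list in the notebook.
bad_char = [',', '.', '!', '?', '(', ')', ':', '"', '-', '😁', '+']

# Map every unwanted character to None (delete it) and a newline to a space.
translation_table = str.maketrans({char: None for char in bad_char})
translation_table[ord('\n')] = ' '

def clean_comment(comment):
    """Remove unwanted characters, flatten newlines, trim and lower-case."""
    return comment.translate(translation_table).strip().lower()

# Hypothetical two-line comment with punctuation and padding.
print(clean_comment('  Vysoké ceny,\noproti iným operátorom!  '))
```

The translation table is built once, so each comment is scanned a single time instead of once per unwanted character.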

Now we can delete the empty comments. We do that by building clean_list.

In [6]:
# deleting empty comments

clean_list = [i for i in new_list if i[1] != ''] # compare by value; "is not" checks identity
clean_list[:5]
    

[['1', 'su drahe'],
 ['4', 'mam starsi pausal potrebujem ho zmenit na sucasne potreby'],
 ['2',
  'pozdravujem vas som vas dlhodoby zakaznik a a neviete mi ponuknut ovela nisi pausol alebo moznosti ako konkurencia'],
 ['3', 'vysoké ceny oproti iným operátorom'],
 ['4', 'vsetko ok ale cena je vyssia ako konkurencia']]

First, we analyze the word counts of all comments regardless of rating. To do that, we need to join all comments into one string. 

In [7]:
all_comments = '' # initialize empty string

for row in clean_list: # iterating through list
    comment = row[1] # assign second column to comment
    all_comments += comment + ' ' # add comment to string

all_comments = all_comments.strip() # delete the space after the last comment

Now all comments are stored as one long string in all_comments. We can remove unwanted words just as we did with unwanted characters. The next step uses 3 functions.  
* removeStopWords removes the unwanted words listed in stopwords.  
* wordListToFreqDict counts how often each word occurs in the list and returns a dictionary with the word as key and its frequency as value.  
* sortDict sorts the dictionary from the most to the least frequent word and returns a list of (count, word) tuples.  

**Source:** these functions come from https://programminghistorian.org/en/lessons/counting-frequencies

In [8]:
stopwords = ['za', 'je', 'to', 'na', 'v', 'a', 'som', 'sa', 'mi', 'o', 'by', 's', 'ako', 'pre', 'aj', 'si', 'k', 'u', 'tak', 'lebo']

def removeStopWords(wordlist, stopwords):
    return [word for word in wordlist if word not in stopwords]

def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist] 
    return dict(list(zip(wordlist, wordfreq)))

def sortDict(freqdict):
    aux = [(freqdict[key], key) for key in freqdict]
    aux.sort()
    aux.reverse()
    return aux

all_comments_list = all_comments.split() # split comments to words
all_comments_list = removeStopWords(all_comments_list, stopwords) # removing unwanted words
word_freq_dict = wordListToFreqDict(all_comments_list) # counting words and return dictionary
word_freq = sortDict(word_freq_dict) # sorting dictionary from most frequent 
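
For comparison, here is a minimal sketch of the same counting step using `collections.Counter` from the standard library. It is an alternative to the functions above, not the approach from the cited lesson. The sample text and the shortened stop word list are hypothetical.

```python
from collections import Counter

# Shortened stop word list and a made-up string standing in for all_comments.
stopword_set = {'za', 'je', 'to', 'na', 'v', 'a'}
sample_comments = 'cena je vysoka a malo dat malo dat cena cena'

# Split into words and drop the stop words.
words = [word for word in sample_comments.split() if word not in stopword_set]

# Counter tallies every word in one pass over the list.
word_counts = Counter(words)

# Convert to the (count, word) tuple shape that sortDict returns.
top = [(count, word) for word, count in word_counts.most_common(3)]
print(top)
```

`Counter` visits each word once, whereas `wordlist.count(p)` rescans the whole list for every word. Words with equal counts may come out in a different order than with sortDict.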

Now we have counted all words. Let's print the first 10. 

In [9]:
word_freq[:10]

[(58, 'dát'),
 (54, 'dat'),
 (54, 'cena'),
 (52, 'paušál'),
 (40, 'viac'),
 (40, 'malo'),
 (39, 'pausal'),
 (37, 'mám'),
 (36, 'internet'),
 (34, 'málo')]
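
Note that the list above counts spellings with and without diacritics separately, for example 'dát' and 'dat', or 'málo' and 'malo'. The sketch below shows one possible way to fold such variants together with the standard `unicodedata` module. It is an optional extra step, not part of the analysis that follows, and the helper name `strip_diacritics` is hypothetical. Folding can also merge words that are genuinely different, so the result should be checked before relying on it.

```python
import unicodedata

def strip_diacritics(word):
    """Return the word with combining accent marks removed."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', word)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

for word in ['dát', 'paušál', 'málo']:
    print(word, '->', strip_diacritics(word))
```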

We can see that the most frequent words relate to data (dát, dat). From the counts alone we cannot tell whether customers are satisfied or dissatisfied with data. The same holds for other words, for example price (cena). This is why we also work with the customer rating: the rating of each comment gives a better picture of customer satisfaction. 

To attach a rating to each word, we first build a dictionary whose keys are the unique ratings and whose values are the comments. While iterating through clean_list, each row yields a one-item dictionary with the rating as key and the comment as value. These are merged into a new dictionary, main_dict, where the comment is appended to the text already stored under that rating. 

In [10]:
main_dict = {} # initialize empty dictionary

def merge(main_dict, sub_dict):
    for key, value in sub_dict.items():
        if key in main_dict: # if rate is already in main_dict add comment
            main_dict[key] = main_dict[key] + ' ' + sub_dict[key] 
        else: # if rate does not exist in main_dict update all from sub_dict 
            main_dict.update({key: value}) 
    return main_dict

for row in clean_list:
    value = row[1] # assign second column to value
    key = row[0] # assign first column to key
    sub_dict = {key : value} # make dictionary with key and value
    main_dict = merge(main_dict, sub_dict) # call function merge
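
The same grouping can be sketched more compactly with `collections.defaultdict`. This is an alternative to merge, shown under the assumption that rows have the same [rating, comment] shape as clean_list. The sample rows are made up.

```python
from collections import defaultdict

# Hypothetical rows in the same [rating, comment] shape as clean_list.
sample_rows = [['1', 'su drahe'], ['4', 'vsetko ok'], ['1', 'malo dat']]

# Collect the comments of each rating in a list.
comments_by_rate = defaultdict(list)
for rate, comment in sample_rows:
    comments_by_rate[rate].append(comment)

# Join each rating's comments into one string, the shape main_dict holds.
grouped = {rate: ' '.join(comments) for rate, comments in comments_by_rate.items()}
print(grouped)
```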
    

Now that we have a dictionary with ratings as keys and comments as values, we can iterate through the values, each of which is one long string, and follow the same process as for all comments. We add one new function, sortMainDict, which sorts the final dictionary by rating. 

In [11]:
def sortMainDict(main_dict):
    sort = [(key, main_dict[key]) for key in main_dict]
    sort.sort()
    return sort  

for key, value in main_dict.items():
    wordstring = value # assign value of dictionary to wordstring
    wordlist = wordstring.split() # split line of strings to words
    wordlist = removeStopWords(wordlist, stopwords) # function to remove unwanted words
    dictionary = wordListToFreqDict(wordlist) # counting words and return dictionary
    sort_dict = sortDict(dictionary) # sorting dictionary from most frequent 
    main_dict[key] = sort_dict # assign list as value to final dictionary

sorted_list = sortMainDict(main_dict) # sort by rating once, after the loop has converted every value

The last function returned a list, so we can finally print the 10 most frequent words for each rating and see what we get. 

In [12]:
print('Rate 1 - Unsatisfied \nRate 5 - Satisfied\n')
for s in sorted_list: 
    key = s[0]
    value = s[1]
    value = value[:10] # keep the 10 most frequent words, as shown in the output
    print('Rate {}: \n{}\n'.format(key,value))


Rate 1 - Unsatisfied 
Rate 5 - Satisfied

Rate 1: 
[(12, 'paušál'), (10, 'nemam'), (10, 'dat'), (9, 'malo'), (7, 'ste'), (7, 'pausal'), (7, 'go'), (6, 'volania'), (6, 'nemám'), (6, 'mam')]

Rate 2: 
[(16, 'dát'), (12, 'paušál'), (8, 'od'), (8, 'malo'), (8, 'cenu'), (7, 'sú'), (7, 'pausal'), (7, 'nie'), (7, 'málo'), (7, 'go')]

Rate 3: 
[(29, 'dat'), (26, 'dát'), (25, 'viac'), (21, 'cena'), (20, 'malo'), (18, 'pausal'), (18, 'málo'), (17, 'paušál'), (15, 'nie'), (14, 'go')]

Rate 4: 
[(11, 'cena'), (8, 'sms'), (8, 'potrebujem'), (8, 'neviem'), (7, 'internet'), (7, 'dát'), (7, 'ale'), (6, 'vyhovuje'), (6, 'nie'), (6, 'mam')]

Rate 5: 
[(18, 'mám'), (12, 'volania'), (10, 'všetko'), (10, 'cena'), (8, 'vyhovuje'), (8, 'sms'), (8, 'paušál'), (7, 'čo'), (7, 'super'), (7, 'ste')]



From this overview, we can see that customers were most dissatisfied with the **product (paušál)**.  
Of the 52 occurrences of the word paušál, more than 40 come from comments rated 1 to 3. The same applies to data.  
The opposite holds for **price (cena)**: of its 54 occurrences, more than 40 come from comments rated 3 to 5.  
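
The arithmetic behind these two figures can be checked with a short sketch. The counts are copied from the printed output above. The per-rating lists show only the top 10 words, so a word may also occur in ratings where it is not listed. The variable names are illustrative.

```python
# Per-rating counts copied from the output above.
pausal_by_rate = {'1': 12, '2': 12, '3': 17}  # 'paušál' in ratings 1 to 3
cena_by_rate = {'3': 21, '4': 11, '5': 10}    # 'cena' in ratings 3 to 5

# Overall counts from the top 10 list of all comments.
totals = {'paušál': 52, 'cena': 54}

pausal_low = sum(pausal_by_rate.values())
cena_high = sum(cena_by_rate.values())

print('paušál, ratings 1-3: {} of {}'.format(pausal_low, totals['paušál']))
print('cena, ratings 3-5: {} of {}'.format(cena_high, totals['cena']))
```

This gives 41 of 52 for paušál and 42 of 54 for cena. These are word occurrences rather than distinct customers, since one comment can use a word more than once.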

We can conclude that customers are satisfied with **prices** but dissatisfied with the **data included in the paušál.**  
This helps us better understand our customers and make informed decisions about prices and products. 