# Load and preview the data

In [267]:
import pandas as pd
PATH_TO_DATA = "../data/pnlp_data_en.csv"

# load the semicolon-delimited survey export
df = pd.read_csv(PATH_TO_DATA, delimiter=';')

# change display options so that we can read the whole comments
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_colwidth', 500)

# rename columns for easier usability
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

# preview
print(df.shape)
df.head(100)

(17672, 3)


Unnamed: 0,report_grouping,question_text,comments
0,Large Department,Please tell us what is working well.,"we do what our customers need, we communicate aperiodically."
1,Large Department,Please tell us what is working well.,"Customs business development continues to grow and expand, through the use of our internal business network & the team of people which are in place at the moment"
2,Large Department,Please tell us what is working well.,"I think the team work hard, are committed to continuous improvement and doing a good job for our customers."
3,Large Department,Please tell us what is working well.,Overall working towards a customer centric environment is working well and more effective teamwork within the company
4,Large Department,Please tell us what is working well.,Customer centricity is a growing culture in the company creating a very positive customer experience
5,Large Department,Please tell us what is working well.,•Develop a comfortable rapport with clients and determine their preferences for products and services •Resolve customer complaints with patience and creativity.
6,Large Department,Please tell us what is working well.,THE CUSTOMER IS THE CENTER.
7,Large Department,Please tell us what is working well.,"we are usually seeking customer satisfaction, helping our custom on fixing problems what ever it is related to [company] or not."
8,Large Department,Please tell us what is working well.,Alignment between regional office and country A focus on the customer with hunting /farming and operations owning customers
9,Large Department,Please tell us what is working well.,"innovation, customer relations ship and customer feedback"


# Inspect the data

## Capitalization

#### Identify the comments that contain ONLY uppercase words

**1180** comments **(6.68%)** are all uppercase


In [286]:
all_caps = df[df['comments'].str.isupper()]
# multiply by 100 so that the ratio prints as a percentage
print("{0} comments ({1:.2f}%) are all uppercase".format(all_caps.shape[0], 100 * all_caps.shape[0] / len(df)))
all_caps

1180 comments (6.68%) are all uppercase


Unnamed: 0,report_grouping,question_text,comments
6,Large Department,Please tell us what is working well.,THE CUSTOMER IS THE CENTER.
19,Large Department,Please tell us what is working well.,"WORKING WELL IS DOING A GREAT JOB WITH EFFICIENCY, ACCURACY, TIMELIBESS AND MEETING THE EXPECTATION OF THE CUSTOMER"
40,Large Department,Please tell us what is working well.,FIRST OF ALL CUSTOMER SATISFACTION
52,Large Department,Please tell us what is working well.,"I AM VERY PROVED IN WORK IN OUR COMPANY. OUR COMPANY'S PROCESS AND RUNNING ARE VERY RESPECTFULL. I AM HAPPY TO INFORM THAT GOT MONTHLY SALARY, UNIFORM, AIR TICKET AND WEEKLY FRUIT ON TIME."
53,Large Department,Please tell us what is working well.,WE WORK AS A TEAM AND SPORTING EACHOTHERS AND OUR MENAGEMENT GUID US THAT IS A GOOD.
...,...,...,...
17646,Small Department,Please tell us what needs to be improved.,ADDITIONAL EQUIPMENT MORE SUPERVISION MORE DOCK SPACE MORE SALES PEOPLE NEEDED MUCH MORE FREIGHT IN AREAS WE COULD GET
17647,Small Department,Please tell us what needs to be improved.,I THINK MANY IF NOT ALL WOULD AGREE THE INDUSTRY PAY STANDARD HAS DECLINED. TO BE MORE COMPETITIVE AND BRING IN NEW DRIVERS AND COMBAT DRIVER SHORTAGE A BETTER PAY INCENTIVES IS A MUST.CURRENT INCENTIVES ARE WHAT DRIVERS WERE MAKING 10 YEARS AGO.
17648,Small Department,Please tell us what needs to be improved.,NO MORE 12 YR OLD LEADERS IN MANAGMENT
17667,Small Department,Please tell us what needs to be improved.,"FOR THE PAST YEARS THE COMPANY HAS COMMITED TO UPDATE THE EQUIPMENT FROM COMPUTERS,TRACTORS,TRAILES AND FORKLIFTS BUT NO ONE SEEMS TO NOTICE THE SPOTTERS REQUEST FOR NEW ( SPOTTER TRUKS) WE ARE WORKING WITH REALY UNRELIABLE SPOTTER TRUCK (CHI TERMINL"
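
The share of all-uppercase comments can also be read directly off the boolean mask. The following is a minimal sketch on a few hypothetical sample comments (not rows from the dataset); it relies on the mean of a boolean Series being the fraction of `True` values, and on the `%` format type, which multiplies by 100 before printing.

```python
import pandas as pd

# Hypothetical sample comments, used only for illustration.
comments = pd.Series([
    "THE CUSTOMER IS THE CENTER.",
    "Teamwork is good",
    "FIRST OF ALL CUSTOMER SATISFACTION",
    "communication could be better",
])

# True where every cased character in the comment is uppercase.
is_all_caps = comments.str.isupper()

# The mean of a boolean mask is the fraction of True values.
share = is_all_caps.mean()

# The ".2%" format type multiplies by 100 and appends the percent sign.
summary = f"{is_all_caps.sum()} comments ({share:.2%}) are all uppercase"
print(summary)
```

Formatting with `.2%` avoids hard-coding the row count and the manual multiplication by 100.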


#### Identify the comments that contain SOME capitalized words (i.e. words containing at least 4 capitalized letters)
**462** comments **(2.61%)** contain at least one uppercase word of 4 or more letters

Sometimes the only non-capitalized word is a product of the anonymization process, e.g. [company]

In [284]:
some_caps = df[df['comments'].str.contains(pat='[a-z ]+[A-Z]{4,}', regex=True)]

# concatenate with the all-caps comments and keep the rows whose index appears in only one of the two data frames
concat = pd.concat([all_caps, some_caps]).loc[all_caps.index.symmetric_difference(some_caps.index)]
# multiply by 100 so that the ratio prints as a percentage
print("{0} comments ({1:.2f}%) contain at least one uppercase 4-letter or longer word ".format(concat.shape[0], 100 * concat.shape[0] / len(df)))
concat

462 comments (2.61%) contain at least one uppercase 4-letter or longer word 


Unnamed: 0,report_grouping,question_text,comments
57,Large Department,Please tell us what is working well.,I WORK VERY PROUD OF THIS COMPANY..new idyas and davalaping my managment giving me new idyas and work knowlege
74,Large Department,Please tell us what is working well.,For me working with [company] is all about being punctual and responsible at Work Place. Really love to work on critical shipments with efficiency to deliver excellence to all the customer.This is possible by taking OWNERSHIP of the shipment and TEAM WORK.
182,Large Department,Please tell us what is working well.,Customer Focus and QEHS
272,Large Department,Please tell us what is working well.,In [company] we are defintiely making earnest efforts to be customer centric and the SENIOR LEADERSHIP team in INDIA is doing a great job in leading people in that direction
404,Large Department,Please tell us what is working well.,Need all levels of SMT to support customer requirements ensuring best customer experience. Difficult when you cannot get SMT attention on key customer issues impacting our service level agreements and trying to get a solution that is needed ASAP
474,Large Department,Please tell us what is working well.,THE PARTNERSHIP BETWEEN [company] AND THE CUSTOMER
584,Large Department,Please tell us what is working well.,Ontime salary & yearly salary. Q43-Interest Free loan has to be provided to those staffs who are in need for genuine reasons.Trucking is not centralised in DAFZ warehouse staffs is still doing trucking activity.
585,Large Department,Please tell us what is working well.,Yearly ticket Q43- Hardship allowance has to be included during winter period as well.We need canteen at DAFZ for all employees. Haymanagment has been introduced but department heads are not aware & not briefed.
675,Large Department,Please tell us what is working well.,"DuPont group email with the categories is working well, import no one moved emails with out the response party knowing or a pick up can be missed. Warehouse using the sheets provided to then for HAWBs and destination."
705,Large Department,Please tell us what is working well.,EMPLOYEE ENGAGEMENT with Fun Filled activities enhance team spirit... I am proud of working in [company]
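
An alternative way to select partially capitalized comments is to combine two boolean masks instead of concatenating data frames. This is only a sketch on hypothetical sample comments: the pattern `\b[A-Z]{4,}\b` is a substitute for the pattern used in the cell above, so the resulting count on the real data may differ from the one reported.

```python
import pandas as pd

# Hypothetical sample comments, used only for illustration.
comments = pd.Series([
    "THE CUSTOMER IS THE CENTER.",
    "This is possible by taking OWNERSHIP of the shipment",
    "Customer Focus and QEHS",
    "the team works hard",
    "GOOD",
])

# Comments that are entirely uppercase.
is_all_caps = comments.str.isupper()

# Comments with at least one standalone run of 4 or more capital letters
# (assumed pattern, not the one from the notebook cell).
has_caps_word = comments.str.contains(r"\b[A-Z]{4,}\b", regex=True)

# Keep comments with a capitalized word that are not fully uppercase.
mixed = comments[has_caps_word & ~is_all_caps]
print(mixed.tolist())
```

Unlike an index symmetric difference, which keeps rows that appear in either frame but not in both, the mask `has_caps_word & ~is_all_caps` can never return a fully uppercase comment.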


#### Identify potential acronyms (words that have between 2 and 3 capitalized letters)
The cell below returns the uppercase tokens along with their frequency counts. If we remove the stopwords, there's a good chance we can identify many of the acronyms used.

In [201]:
acronyms = df[df['comments'].str.contains(pat = '[A-Z ]{2,3}', regex = True)]
acronyms = pd.concat([acronyms, all_caps]).loc[acronyms.index.symmetric_difference(all_caps.index)]

# tokenize each comment on whitespace and get the frequency of each token
counts = acronyms['comments'].str.split(expand=True).stack().value_counts()
# print the 35 most frequent tokens that are all uppercase
counts.loc[counts.index.str.isupper()][:35]

I       2556
IT       193
A        181
EOS      156
CIF      151
HR       150
THE      101
TO        94
AND       87
SMT       71
PD        70
IS        64
US        64
KPI       57
WE        53
IN        52
NFE       51
CS        43
FOR       42
NOT       42
OF        41
CEO       37
GSC       36
ARE       35
CFM       34
AFR       32
WORK      31
ALL       31
TEAM      30
L&D       30
GP        30
OFR       28
OUR       28
NO        27
EDM       27
dtype: int64
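
The stopword removal suggested above could look roughly like the sketch below. The sample comments and the `STOPWORDS` set are assumptions made for illustration, and `acronym_candidates` is a hypothetical helper, not part of the notebook. Tokens such as IT and US are ambiguous (a department or a country versus a plain word), so whether they belong in the stopword set is a judgment call.

```python
import re
from collections import Counter

# Assumed stopword set; a real analysis would likely use a fuller list.
STOPWORDS = {"IT", "THE", "TO", "AND", "IS", "US", "WE", "IN",
             "FOR", "NOT", "OF", "ARE", "ALL", "OUR", "NO"}

def acronym_candidates(comments, stopwords):
    """Count standalone 2-3 letter uppercase tokens that are not stopwords."""
    counts = Counter()
    for text in comments:
        if text.isupper():
            continue  # skip comments written entirely in capitals
        for token in re.findall(r"\b[A-Z]{2,3}\b", text):
            if token not in stopwords:
                counts[token] += 1
    return counts

# Hypothetical sample comments, used only for illustration.
sample = [
    "The SMT should talk to HR more",
    "HR IS SLOW",
    "We need a KPI for HR and IT",
]
candidates = acronym_candidates(sample, STOPWORDS)
print(candidates.most_common())
```

The word boundaries in `\b[A-Z]{2,3}\b` keep the match from firing on the first letters of longer capitalized words.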

## Punctuation
**970** comments **(5.49%)** contain at least two consecutive punctuation characters

**51** comments contain at least 3 consecutive punctuation characters, *excluding* ellipses

In [293]:
# comments that contain at least 3 consecutive punctuation characters, except for . ^ ( )
# remove ^(.) from the regex to display comments with ellipsis
punctuation = df[df['comments'].str.contains(pat=r'[^\w\s^(.)]{3,}', regex=True)]
print("{0} comments contain at least 3 consecutive punctuation excluding ellipsis".format(punctuation.shape[0]))
punctuation

51 comments contain at least 3 consecutive punctuation excluding ellipsis


Unnamed: 0,report_grouping,question_text,comments
2542,Large Department,Please tell us what is working well.,The employees and their dedication to each other. Without customers a company can't exist. Without employees there are no customers. ADD MORE CHARACTER SPACE FOR WRITING!!!!
2723,Large Department,Please tell us what is working well.,AT THIS PINT I HAVE NOT CLUE!!!!
2875,Large Department,Please tell us what is working well.,"Making HR a true partner in the business ---- this is a work in progress, but we are making ouw way nicely."
3530,Large Department,Please tell us what is working well.,New transforming technologies plus processes will change the [company] GSC at higher level. There is going to be good scope and opportunities for improvement while working in such transformational environment. GSC Mumbai Rocks & keep flying high!!!
7047,Large Department,Please tell us what is working well.,I can say all are working very well !!! Thank you for all the support !!!
7871,Large Department,Please tell us what is working well.,"The ozone layer, really big brick walls, the sun, ME!!, this computer that i'm doing this survey on, the light bulbs in this room and my last bowel movement."
8265,Large Department,Please tell us what needs to be improved.,"The customer needs to see more Customer Service from [company]>[company] does the best we can, but we need more people customer facing. More people in the pricing desks. We miss revenue because we do not quote the shipments on time, pickup on time, and respond."
8357,Large Department,Please tell us what needs to be improved.,There are a lot of initiatives takes with customer centricity and quality improvement but again there is no concrete plan to achieve the real Customer satisfaction or WOW !!!!
8902,Large Department,Please tell us what needs to be improved.,if possible we could have on duty nurse and small clinic to cater small injuries and twice a month in house doctor to be available in the clinic ///retirement plan for employees wherein staff can buy stocks
9536,Large Department,Please tell us what needs to be improved.,There is so much that can be Discussed in this subject...The most important is that the Management needs to make the employees feel that they are appreciated for the work they do and the 100% effort that is given on a daily....MANAGER TEAMWORK!!!
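
To see what the character class in the cell above matches, it can be tried on a few strings with the standard `re` module. The sample strings here are hypothetical and only illustrate the behaviour of the pattern.

```python
import re

# Same character class as the notebook cell: anything that is not a word
# character, whitespace, '^', '(', '.' or ')', repeated three or more times.
PUNCT_RUN = re.compile(r"[^\w\s^(.)]{3,}")

# Hypothetical sample strings, used only for illustration.
samples = [
    "I HAVE NO CLUE!!!!",
    "There is so much to discuss...",
    "a true partner ---- work in progress",
    "Thanks!!",
]

# Keep the strings that contain a qualifying run of punctuation.
flagged = [text for text in samples if PUNCT_RUN.search(text)]
print(flagged)
```

The ellipsis is skipped because `.` is excluded inside the class, and `Thanks!!` is skipped because the run is only two characters long.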


## Sentiment
Ideally, the question would predict the sentiment of the answer, i.e. "what is working well" should draw positive sentiment and "what needs to be improved" should draw negative sentiment. This does not always hold, and the cells below give a simplistic overview of the exceptions.

#### Negative words in answers to the question about what is going well

In [248]:
# create some dummy positive and negative seed word lists
positive_words = ['good', 'well', 'acceptable', 'excellent', 'exceptional', 'favorable', 'great', 'marvelous', 'positive', 'satisfactory', 'satisfying']
negative_words = ['bad', 'badly', 'atrocious', 'awful', 'dreadful', 'lousy', 'poor', 'sad', 'unacceptable']


working_well = df.loc[df['question_text'] == 'Please tell us what is working well.']['comments']
#TODO figure out how to do this analysis inside the dataframe and not from a list
for comment in working_well:
    for word in comment.split():
        if word in negative_words:
            print(comment+'\n')

communication is very poor in cae, management doesn't explain what's going on or the future of cae.

Not all of the employees are dedicated. The company should recruit better employees. We don't need bad employees.

Communication to all Staff from the Top about the business position - good and bad  SMT presence at sites and getting involved with feedback from the staff

I like the steps the company has taken to counteract many of the poor decisions that were made as part of ORGA and NFE. I also appreciate the new focus on the FTZ program in DFW, it has been a long time coming.

It's not getting any better the disrespect, isolation, disconnect and poor leadership from [company] LAX management is absolutely unbearable. If I didn't care about our customers I would leave. There's no motivation, involvement and or support by mgmt.

My direct supervisor is fair and always give feedback for job good or bad done.

US SMT understands changes are necessary but poor execution.

Environmental Prot

#### Positive words in the question about what could be improved
The phrasing of this question could make positive language more likely. However, many people also mention that they have no complaints (e.g. 'all are going well') or talk about the things they like at the company (*'I have new and very good working tools', 'right now I think we are headed in the right direction with excellent leadership.'*)

In [247]:
improvement = df.loc[df['question_text'] == 'Please tell us what needs to be improved.']['comments']
for comment in improvement:
    for word in comment.split():
        if word in positive_words:
            print(comment +'\n')

There needs to be more innovation in practices and not only except change, when it is forced upon us. We need to look at better practices and look at how we can ensure excellent service to customers.

Focus on setting up training workshops, because it improves the employees' skills. For example, train the employees to work in other branches, for two weeks to a month for example, this will have a positive effect on the increase of the competence levels of the employees and pass over experience from one branch to the other.

Quality- needs to be given a greater voice. The development of  site level customer champions to assist with for example areas of continuous improvement will help. Culturally better management through engagement and a positive approach will help.

Quality of service is not close to being good enough. Pricing is also not always competitive - especially at the initial quotation stage. Both these issues are barriers to improving customer relationships and business growt
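
The two loops above can also be expressed inside the dataframe with boolean masks. The sketch below is one possible approach, not the notebook's method: `comments_with_words` is a hypothetical helper, the sample rows are made up, and the matching differs from the loops in that it is case-insensitive, uses word boundaries (so `bad.` also matches), and returns each comment at most once.

```python
import re
import pandas as pd

def comments_with_words(df, question, words):
    """Return the rows answering `question` whose comment contains any of `words`."""
    # One alternation over the escaped seed words, matched as whole words.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    asks_question = df["question_text"] == question
    has_word = df["comments"].str.contains(pattern, case=False, regex=True, na=False)
    return df[asks_question & has_word]

# Hypothetical sample rows, used only for illustration.
sample = pd.DataFrame({
    "question_text": [
        "Please tell us what is working well.",
        "Please tell us what is working well.",
        "Please tell us what needs to be improved.",
        "Please tell us what is working well.",
    ],
    "comments": [
        "Communication is very poor.",
        "Teamwork is great",
        "Bad planning",
        "Sadly nothing",
    ],
})
negative_words = ['bad', 'badly', 'atrocious', 'awful', 'dreadful',
                  'lousy', 'poor', 'sad', 'unacceptable']

flagged = comments_with_words(
    sample, "Please tell us what is working well.", negative_words)
print(flagged["comments"].tolist())
```

The same helper can be called with the improvement question and `positive_words` to cover the second loop.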

## Corpus analysis

### Frequencies

### Collocations