### Prepping Data Challenge: NPS for Airlines (week 23)

This week Prep Air are looking into their Net Promoter Score (NPS) and how it compares with those of a variety of other new airlines. NPS usually comes from asking customers "How likely are you to recommend this company on a scale of 0-10?" You then subtract the percentage of respondents who are detractors from the percentage who are promoters, which gives a score between -100 and +100. The higher the NPS, the better!
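
As a small illustration of the formula, the sketch below computes an unrounded NPS from a list of 0-10 ratings. The function name `nps` and the ten responses are hypothetical and only show the arithmetic: promoters are 9-10, detractors are 0-6.

```python
def nps(scores):
    """Unrounded Net Promoter Score for a non-empty list of 0-10 ratings."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)   # 9-10
    detractors = sum(1 for s in scores if s <= 6)  # 0-6
    # % promoters - % detractors, always between -100 and +100
    return 100 * (promoters - detractors) / n

# 10 made-up responses: 4 promoters, 3 passives, 3 detractors
example = [10, 9, 9, 10, 7, 8, 8, 3, 6, 0]
print(nps(example))  # (4 - 3) out of 10 responses -> 10.0
```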

However, like most metrics, NPS doesn't tell you a lot on its own: does a given score mean customers feel strongly about Prep Air, or do they feel much the same about every airline? That's why it helps to compare Prep Air's NPS with other airlines' Net Promoter Scores. In this challenge we'll use Z-Scores to standardise the scores and see whether Prep Air is above or below average.
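
A Z-Score says how many standard deviations a value sits above (positive) or below (negative) the mean. The sketch below standardises five made-up NPS values with Python's `statistics` module; the airline names and numbers are hypothetical, and it uses the sample standard deviation, which is also what pandas uses by default.

```python
from statistics import mean, stdev

# Made-up NPS values for five airlines (not from the challenge data)
nps = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50}
values = list(nps.values())

mu = mean(values)      # 30
sigma = stdev(values)  # sample standard deviation, about 15.81

# Z-score: (value - mean) / standard deviation
z = {airline: (v - mu) / sigma for airline, v in nps.items()}
print({k: round(v, 2) for k, v in z.items()})
```

Airline "C" sits exactly on the mean (Z-Score 0), while "E" is about 1.26 standard deviations above it.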

### Requirements
 - Input the data
 - Combine Prep Air dataset with other airlines
 - Exclude any airlines who have had fewer than 50 customers respond
 - Classify customer responses to the question in the following way:
   - 0-6 = Detractors
   - 7-8 = Passive
   - 9-10 = Promoters
 - Calculate the NPS for each airline
   - NPS = % Promoters - % Detractors
 - Calculate the average and standard deviation of the airlines' NPS values
 - Take each airline's NPS and subtract the average, then divide this by the standard deviation
 - Filter to just show Prep Air's NPS along with their Z-Score
 - Output the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Input the data
#Combine Prep Air dataset with other airlines

with pd.ExcelFile('WK23-NPS Input.xlsx') as xlsx:
    # ignore_index=True avoids repeating each sheet's own 0..n-1 index
    df = pd.concat([pd.read_excel(xlsx, s) for s in xlsx.sheet_names], ignore_index=True)
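
An equivalent way to load every sheet is `pd.read_excel(path, sheet_name=None)`, which returns a dict of DataFrames keyed by sheet name. The sketch below uses a hand-built stand-in dict so it runs without the workbook; the sheet names, airlines and rows are hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_excel('WK23-NPS Input.xlsx', sheet_name=None),
# which returns {sheet_name: DataFrame}. Sheet names and rows are made up.
col = "How likely are you to recommend this airline?"
sheets = {
    "Prep Air": pd.DataFrame({"Airline": ["Prep Air"] * 2,
                              "CustomerID": ["a1", "a2"],
                              col: [9, 4]}),
    "Other Airlines": pd.DataFrame({"Airline": ["Acme Air"] * 3,
                                    "CustomerID": ["b1", "b2", "b3"],
                                    col: [10, 7, 2]}),
}

# Stack all sheets; ignore_index=True gives a fresh 0..n-1 index
combined = pd.concat(sheets.values(), ignore_index=True)
print(combined)
```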

In [3]:
df.rename(columns={'How likely are you to recommend this airline?':'Score'}, inplace=True)

In [4]:
df.head()

|   | Airline | CustomerID | Score |
|---|---|---|---|
| 0 | Schmeler, Schimmel and Collier | 013d950 | 6 |
| 1 | Schmeler, Schimmel and Collier | 0d25185 | 10 |
| 2 | Schmeler, Schimmel and Collier | a1b541d | 10 |
| 3 | Schmeler, Schimmel and Collier | 6b24ea8 | 9 |
| 4 | Schmeler, Schimmel and Collier | d5f96ab | 7 |


In [5]:
#Exclude any airlines who have had less than 50 customers respond
df = df.groupby('Airline').filter(lambda x: len(x) >= 50)
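
`groupby(...).filter(...)` keeps or drops whole groups at once. The sketch below runs it on toy data next to an alternative (swapped in here, not the notebook's method) that maps each row to its airline's response count and applies a boolean mask. The threshold is lowered to 2 only so the made-up data shows the effect.

```python
import pandas as pd

# Toy responses: airline "X" has 3 responses, "Y" has 1 (made-up data)
df = pd.DataFrame({"Airline": ["X", "X", "X", "Y"], "Score": [9, 5, 7, 10]})
MIN_RESPONSES = 2  # the challenge uses 50

# Approach used in the notebook: keep every row of groups that pass the test
kept_filter = df.groupby("Airline").filter(lambda g: len(g) >= MIN_RESPONSES)

# Alternative: look up each row's airline response count, then mask
counts = df["Airline"].map(df["Airline"].value_counts())
kept_mask = df[counts >= MIN_RESPONSES]

print(kept_filter)
print(kept_mask)
```

Both keep the original row order and index; the mask version avoids calling a Python function per group.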

In [6]:
#Classify customer responses to the question
df['Cust_class'] = pd.cut(df['Score'], bins=[0, 6, 8, 10],
                     labels=['Detractors', 'Passive', 'Promoters'], right=True, 
                     include_lowest=True).astype(str)

<div class="alert alert-block alert-info">
    
Pandas <strong>cut()</strong> sorts the values of a one-dimensional array or Series into discrete bins (intervals). It is typically used to turn continuous or ordinal values, such as a 0-10 score, into categories.

<ul>
<li><strong>Syntax:</strong> <br> <strong>cut</strong> (x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)</li>

<li><strong>Parameters:</strong><br> <strong>x</strong> : the input array to be binned. Must be one-dimensional.</li>

<li><strong>bins:</strong> defines the bin edges for the segmentation. Here it is a sequence of edges.</li>

<li><strong>right : (bool, default True)</strong> Indicates whether the bins include their rightmost edge. If <strong>right == True</strong> (the default), the bins [0, 6, 8, 10] define the intervals (0, 6], (6, 8] and (8, 10].</li> 

<li><strong>labels : (array or bool, optional)</strong> Specifies the labels for the returned bins. Must have one fewer element than there are bin edges (three labels for the four edges [0, 6, 8, 10]). If False, returns only integer indicators of the bins. In this case the labels are <strong>['Detractors', 'Passive', 'Promoters']</strong>.</li> 

<li><strong>retbins : (bool, default False)</strong> Whether to return the bins or not. Useful when bins is provided as a scalar.</li>

<li><strong>include_lowest : (bool, default False)</strong> Whether the first interval should include its left edge. Here it turns (0, 6] into [0, 6], so a score of 0 is classed as a Detractor rather than falling outside every bin and becoming NaN.</li>
    </ul>
    
</div>
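
To check the bin edges, the sketch below runs the same `pd.cut` call on every possible answer from 0 to 10, with and without `include_lowest=True`. It uses only the bins and labels from the cell above; the comparison table is just for illustration.

```python
import pandas as pd

scores = pd.Series(range(11))  # every possible answer, 0..10
bins = [0, 6, 8, 10]
labels = ["Detractors", "Passive", "Promoters"]

# Same settings as the notebook: [0, 6], (6, 8], (8, 10]
with_lowest = pd.cut(scores, bins=bins, labels=labels, right=True, include_lowest=True)
# Without include_lowest the first bin is (0, 6], so a score of 0 gets NaN
without_lowest = pd.cut(scores, bins=bins, labels=labels, right=True)

print(pd.DataFrame({"Score": scores,
                    "include_lowest=True": with_lowest,
                    "include_lowest=False": without_lowest}))
```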

In [7]:
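# Calculate the NPS for each airline
# Count respondents per airline in each class (0 where an airline has nobody in a class)
df_pivot = df.pivot_table(values=['CustomerID'], index=['Airline'], columns=['Cust_class'], 
                          aggfunc='count', fill_value=0).reset_index()
# Flatten the MultiIndex columns: ('CustomerID', 'Promoters') -> 'Promoters', ('Airline', '') -> 'Airline'
df_pivot.columns = [c[1] if c[1] != '' else c[0] for c in df_pivot.columns]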
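
The pivot-and-flatten step can also be written with `pd.crosstab`, which counts each (Airline, Cust_class) pair and returns single-level columns. This is an alternative swapped in for comparison, not the notebook's method; the toy rows below are made up, and the sketch checks that both routes give the same counts.

```python
import pandas as pd

df = pd.DataFrame({
    "Airline": ["X", "X", "X", "Y", "Y"],
    "CustomerID": ["c1", "c2", "c3", "c4", "c5"],
    "Cust_class": ["Promoters", "Detractors", "Promoters", "Passive", "Promoters"],
})
classes = ["Detractors", "Passive", "Promoters"]

# pivot_table route (as in the notebook), then flatten the MultiIndex columns
p = df.pivot_table(values=["CustomerID"], index=["Airline"], columns=["Cust_class"],
                   aggfunc="count", fill_value=0).reset_index()
p.columns = [c[1] if c[1] != "" else c[0] for c in p.columns]

# crosstab route: counts per airline and class with flat columns
x = pd.crosstab(df["Airline"], df["Cust_class"]).reset_index()
x.columns.name = None

print(p)
print(x)
```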

In [8]:
df_pivot['total'] = df_pivot['Detractors'] + df_pivot['Passive'] + df_pivot['Promoters']

In [9]:
# NPS = % Promoters - % Detractors, with each percentage rounded down to a whole number first
df_pivot['NPS'] = (np.floor(df_pivot['Promoters'] / df_pivot['total'] * 100)
                   - np.floor(df_pivot['Detractors'] / df_pivot['total'] * 100)).astype(int)
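
The cell above floors each percentage before subtracting, which makes the NPS a whole number; presumably this rounding is chosen to match the challenge's expected output. The sketch below works through one row from the `df_pivot` output shown later ("Abbott, Boyle and Morar": 23 promoters, 15 detractors, 60 responses) to show the unrounded and floored results side by side.

```python
import numpy as np

# Counts for "Abbott, Boyle and Morar" from the df_pivot output in this notebook
promoters, detractors, total = 23, 15, 60

pct_prom = promoters / total * 100   # 38.33...
pct_det = detractors / total * 100   # 25.0

unrounded = pct_prom - pct_det                          # 13.33...
floored = int(np.floor(pct_prom) - np.floor(pct_det))   # 38 - 25 = 13

print(unrounded, floored)
```

Flooring each percentage separately can give a different integer than rounding the final difference, so the two approaches are not interchangeable in general.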

In [10]:
# Calculate the average and standard deviation of the airlines' NPS values
nps_mean = df_pivot['NPS'].mean()
nps_std = df_pivot['NPS'].std()  # pandas default: sample standard deviation (ddof=1)
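
`Series.std()` divides by n - 1 (`ddof=1`), whereas `numpy.std` divides by n (`ddof=0`) unless told otherwise, so the two give slightly different Z-Scores. The challenge text doesn't say which it expects. The sketch below compares them on the five NPS values that appear in `df_pivot.head()` (not the full dataset).

```python
import numpy as np
import pandas as pd

nps = pd.Series([13, 2, 12, 5, 12])  # the five NPS values shown in df_pivot.head()

sample_std = nps.std()                    # pandas default: ddof=1, divides by n - 1
population_std = np.std(nps.to_numpy())   # NumPy default: ddof=0, divides by n

# pandas can match NumPy by passing ddof=0 explicitly
assert np.isclose(nps.std(ddof=0), population_std)
print(sample_std, population_std)
```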

In [11]:
# Z-Score: subtract the average from each airline's NPS, then divide by the standard deviation
df_pivot['Z-Score'] = (df_pivot['NPS'] - nps_mean) / nps_std
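
A quick sanity check on this formula: Z-Scores computed with the same mean and standard deviation should themselves have a mean of 0 and a (sample) standard deviation of 1. The sketch uses the five NPS values from `df_pivot.head()` as toy input, so its Z-Scores differ from the table, where the mean and standard deviation come from all airlines.

```python
import pandas as pd

nps = pd.Series([13, 2, 12, 5, 12])  # toy input: the five NPS values from df_pivot.head()
z = (nps - nps.mean()) / nps.std()

print(z.round(3).tolist())
print(z.mean(), z.std())  # approximately 0 and 1
```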

In [12]:
df_pivot.head()

|   | Airline | Detractors | Passive | Promoters | total | NPS | Z-Score |
|---|---|---|---|---|---|---|---|
| 0 | Abbott, Boyle and Morar | 15 | 22 | 23 | 60 | 13 | 0.295114 |
| 1 | Abbott, Gutkowski and Cummings | 29 | 41 | 31 | 101 | 2 | -0.844939 |
| 2 | Abshire Group | 24 | 32 | 35 | 91 | 12 | 0.191473 |
| 3 | Bayer-Collier | 23 | 37 | 28 | 88 | 5 | -0.534015 |
| 4 | Bernhard, Ernser and Toy | 20 | 32 | 30 | 82 | 12 | 0.191473 |


In [13]:
# Keep only the output columns (every airline is still present at this point)
df = df_pivot[['Airline', 'NPS', 'Z-Score']]

In [14]:
df.head()

|   | Airline | NPS | Z-Score |
|---|---|---|---|
| 0 | Abbott, Boyle and Morar | 13 | 0.295114 |
| 1 | Abbott, Gutkowski and Cummings | 2 | -0.844939 |
| 2 | Abshire Group | 12 | 0.191473 |
| 3 | Bayer-Collier | 5 | -0.534015 |
| 4 | Bernhard, Ernser and Toy | 12 | 0.191473 |

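The last requirement asks to show just Prep Air's NPS and Z-Score, while the table above still lists every airline. A minimal sketch of that filter follows; it assumes Prep Air's rows carry the label `"Prep Air"` in the `Airline` column, and the stand-in DataFrame (airline names and numbers) is made up.

```python
import pandas as pd

# Stand-in for df_pivot[['Airline', 'NPS', 'Z-Score']]; all values are made up
df = pd.DataFrame({
    "Airline": ["Prep Air", "Acme Air", "Globex Airways"],
    "NPS": [10, 25, 5],
    "Z-Score": [-0.2, 1.1, -0.9],
})

# Assumption: Prep Air is labelled "Prep Air" in the Airline column
prep_air = df.loc[df["Airline"] == "Prep Air", ["Airline", "NPS", "Z-Score"]]
print(prep_air)
```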

In [15]:
df.to_csv('WK23-output.csv', index=False)