## Instructions: https://docs.google.com/document/d/11v_d1bkFbTM4N3WU6JAbdR_9_eUChrN32-mdQ9zKaOw/preview


### NPS background: https://hbr.org/2003/12/the-one-number-you-need-to-grow

"Retention rates provide, in many industries, a valuable link to profitability, but their relationship to growth is tenuous. That’s because they basically track customer defections—the degree to which a bucket is emptying rather than filling up."

"'How likely is it that you would recommend [company X] to a friend or colleague?' ranked first or second in 11 of the 14 case studies. And in two of the three other cases, “would recommend” ranked so close behind the top two predictors that the surveys would be nearly as accurate by relying on results of this single question."


### Project: 

“On a scale of 1 to 10, how likely are you to recommend [X] to a friend or colleague?”

NPS segments all responses between 1 and 10 into three categories based on their sentiment:

- Promoter (9 – 10)
- Passive (7 – 8)
- Detractor (1 – 6)

## NPS = 
### (Promoters - Detractors) / (Promoters + Passives + Detractors)

The formula yields a fraction between -1 (all detractors) and +1 (all promoters). Multiplied by 100, NPS scores range from -100 to +100.
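As a minimal sketch of the formula above, the function below computes the score from the three category counts and scales it by 100 so that it lands on the -100 to +100 range. The name `nps_from_counts` and the counts passed to it are hypothetical, chosen only to illustrate the two extremes and a mixed case.

```python
def nps_from_counts(promoters, passives, detractors):
    """Return NPS on the -100..+100 scale from the three category counts."""
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("at least one response is required")
    return 100 * (promoters - detractors) / total


# Hypothetical counts, chosen only to illustrate the range of the score
all_detractors = nps_from_counts(0, 0, 20)   # -100.0
all_promoters = nps_from_counts(20, 0, 0)    # 100.0
mixed = nps_from_counts(50, 30, 20)          # 30.0
```

Passives do not appear in the numerator, but they still count toward the total, so a large passive group pulls the score toward zero.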


## Import Libraries & CSV File ... learn about the data

In [436]:
import numpy as np
import pandas as pd

df = pd.read_csv('Datasets/SA Feedback Surveys_FINAL/2017/Student Feedback Surveys-Superview.csv')
df.head(10)

Unnamed: 0,ID,Location,Track,Week,Rating (Num),Schedule Pacing
0,134,San Francisco,"Apps, Explorer",Week 1,3,Just right
1,36,Los Angeles,Apps,Week 1,4,A little too fast
2,117,San Francisco,Games,Week 1,4,Way too slow
3,253,,,Week 2,4,A little too fast
4,350,New York City,"Apps, Explorer",Week 1,4,Just right
5,23,Redwood City,Apps,Week 1,5,Just right
6,28,Los Angeles,Apps,Week 7,5,Just right
7,65,San Francisco,Apps,Week 1,5,A little too slow
8,101,Santa Clara,Apps,Week 1,5,A little too slow
9,124,Santa Clara,Apps,Week 1,5,Just right


In [437]:
df.shape

(1453, 6)

In [438]:
df.columns

Index(['ID', 'Location', 'Track', 'Week', 'Rating (Num)', 'Schedule Pacing'], dtype='object')

## Identify & learn about the relevant column, 'Rating (Num)', for calculating NPS

In [439]:
# Rename for ease
df.rename(columns={'Rating (Num)':'Rating'}, inplace=True)

# Store ratings df in a variable
ratings_df = df[['Rating']]
ratings_df

Unnamed: 0,Rating
0,3
1,4
2,4
3,4
4,4
...,...
1448,10
1449,8
1450,10
1451,1


In [440]:
ratings_df.describe()

Unnamed: 0,Rating
count,1453
unique,12
top,8
freq,392


In [441]:
ratings_df.max()  # Unexpected: 9 rather than 10, which suggests the column is not stored as numbers

Rating    9.0
dtype: float64

In [442]:
df['Rating'].unique()

array(['3', '4', '5', '6', '7', '8', '9', '10', '0', '1', '2', '#ERROR!'],
      dtype=object)

In [443]:
df['Rating'].value_counts()

8          392
9          384
10         376
7          177
6           59
5           35
4           13
3            8
#ERROR!      3
1            2
2            2
0            2
Name: Rating, dtype: int64

### Separate the responses into their respective category of Promoter, Passive, or Detractor

We disregard '#ERROR!' values and treat 0 values like 1s, that is, as Detractors.

In [444]:
df.loc[df['Rating']=='#ERROR!']

Unnamed: 0,ID,Location,Track,Week,Rating,Schedule Pacing
1310,1356,,,Week 2,#ERROR!,
1322,1368,,,Week 3,#ERROR!,
1411,1458,,,Week 3,#ERROR!,


In [445]:
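# Replace '#ERROR!' with the sentinel '-1' so the column can be converted to integers;
# -1 falls outside every category range, so these rows are excluded from the counts
clean_ratings_df = df[['Rating']].replace('#ERROR!', '-1')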
clean_ratings_df

Unnamed: 0,Rating
0,3
1,4
2,4
3,4
4,4
...,...
1448,10
1449,8
1450,10
1451,1


In [446]:
clean_ratings_df.loc[clean_ratings_df['Rating']=='#ERROR!']

Unnamed: 0,Rating


In [447]:
clean_ratings_df['Rating'].value_counts()

8     392
9     384
10    376
7     177
6      59
5      35
4      13
3       8
-1      3
1       2
2       2
0       2
Name: Rating, dtype: int64

In [448]:
clean_ratings_df.Rating.dtype

dtype('O')

In [449]:
rating_ints = pd.to_numeric(clean_ratings_df.Rating)

In [450]:
rating_ints.dtype

dtype('int64')
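An alternative to the sentinel replacement, sketched here on a small hypothetical sample rather than the survey file: `pd.to_numeric` accepts `errors='coerce'`, which turns any unparseable entry such as '#ERROR!' into NaN, so the invalid rows can be dropped in one step. The variable names `raw`, `numeric`, and `valid` are illustrative only.

```python
import pandas as pd

# Hypothetical sample mimicking the raw column: digit strings plus an error marker
raw = pd.Series(['3', '10', '#ERROR!', '8', '0'])

# errors='coerce' converts anything that cannot be parsed as a number to NaN
numeric = pd.to_numeric(raw, errors='coerce')

# Drop the invalid rows, then cast back to integers (NaN forces a float dtype)
valid = numeric.dropna().astype(int)
```

This avoids carrying a -1 placeholder through later steps, at the cost of a temporary float column.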

In [451]:
PROMOTERS = rating_ints[rating_ints > 8] # 9 & 10
PASSIVES = rating_ints[(rating_ints > 6) & (rating_ints < 9)] # 7 & 8
DETRACTORS = rating_ints[(rating_ints > -1) & (rating_ints < 7)] # 0 - 6

In [452]:
PROMOTERS 

256      9
257      9
258      9
259      9
260      9
        ..
1444     9
1445     9
1447    10
1448    10
1450    10
Name: Rating, Length: 760, dtype: int64

In [453]:
384 + 376

760

In [454]:
PASSIVES 

44      7
45      7
46      7
47      7
48      7
       ..
1438    7
1440    8
1441    7
1449    8
1452    8
Name: Rating, Length: 569, dtype: int64

In [455]:
177 + 392

569

In [456]:
DETRACTORS

0       3
1       4
2       4
3       4
4       4
       ..
1376    6
1387    6
1407    5
1446    3
1451    1
Name: Rating, Length: 121, dtype: int64

In [457]:
rating_ints.value_counts()

 8     392
 9     384
 10    376
 7     177
 6      59
 5      35
 4      13
 3       8
-1       3
 2       2
 1       2
 0       2
Name: Rating, dtype: int64

In [458]:
(2 + 2 + 2) + 8 + 13 + 35 + 59  # ratings 0-2, then 3, 4, 5, and 6

121
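The three boolean masks used above could also be expressed with `pd.cut`, which labels each rating in a single pass. This is a sketch on hypothetical ratings, not the survey data; the bin edges are an assumption chosen to match the 0-6, 7-8, and 9-10 ranges, and -1 stands in for the '#ERROR!' sentinel.

```python
import pandas as pd

# Hypothetical ratings; -1 stands in for the '#ERROR!' sentinel
ratings = pd.Series([10, 9, 8, 7, 6, 0, -1])

# Bins are right-inclusive by default: (-1, 6], (6, 8], (8, 10]
# -1 lies outside every bin, so it is labelled NaN
categories = pd.cut(
    ratings,
    bins=[-1, 6, 8, 10],
    labels=['Detractor', 'Passive', 'Promoter'],
)

# value_counts ignores NaN, so the sentinel row is not counted
counts = categories.value_counts()
```

Because the sentinel becomes NaN, the sum of `counts` matches the number of valid responses, which is the denominator the NPS formula needs.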

## NPS = (Promoters - Detractors) / (Promoters + Passives + Detractors)

In [466]:
PROM = len(PROMOTERS)
PASS = len(PASSIVES)
DET = len(DETRACTORS)

In [467]:
NPS = (PROM - DET) / (PROM + PASS + DET)
NPS

0.4406896551724138
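The same quantity can be read as the share of promoters minus the share of detractors. The sketch below, on a hypothetical sample of ten ratings, uses the mean of a boolean mask to get each share and multiplies by 100 to report the score on the conventional -100 to +100 scale. It assumes the series already contains only valid ratings.

```python
import pandas as pd

# Hypothetical sample of ten valid ratings
ratings = pd.Series([10, 9, 8, 7, 6, 3, 10, 9, 0, 8])

# The mean of a boolean mask is the fraction of True values
promoter_share = (ratings >= 9).mean()   # 4 of 10 responses
detractor_share = (ratings <= 6).mean()  # 3 of 10 responses

# Scale the difference to the -100..+100 range
nps = 100 * (promoter_share - detractor_share)
print(round(nps, 1))
```

On that scale, the fraction computed in the notebook (about 0.44) corresponds to a score of roughly 44.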

In [None]:
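def retrieve_clean_ratings(df):
    '''Removes '#ERROR!' rows and converts the Rating column to a numeric dtype'''
    # Use a new name: reassigning df here would shadow the argument
    valid_rows = df.loc[df['Rating'] != '#ERROR!']
    df_ratings = pd.to_numeric(valid_rows['Rating'])
    return df_ratings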

In [465]:
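def categorize(series):
    '''Counts Promoters (9-10), Passives (7-8), and Detractors (0-6) in a numeric Series'''
    promoters = (series >= 9).sum()
    passives = ((series >= 7) & (series <= 8)).sum()
    detractors = ((series >= 0) & (series <= 6)).sum()
    return promoters, passives, detractors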

In [None]:
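def calculate_NPS(promoters, passives, detractors):
    '''Returns NPS as a fraction between -1 and 1 from the three category counts'''
    return (promoters - detractors) / (promoters + passives + detractors)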