# Olist's Net Promoter Score (NPS) 🔥

The `Net Promoter Score (NPS)` of a service is built from customers' answers to the following question:

>_How likely are you to recommend our company/product/service to a friend or colleague?_

For a `service rated from 1 to 5 stars` such as Olist, we can **classify customers into three categories** based on their answers:
- ✅ `Promoters`: customers who answered with a score of 5
- 😴 `Passives`: customers who answered with a score of 4
- 😡 `Detractors`: customers who answered with a score of 1, 2 or 3

👉 NPS is computed by subtracting the percentage of customers who are Detractors from the percentage of customers who are Promoters.

> NPS  
= % Promoters - % Detractors  
= (# Promoters - # Detractors) / # Reviews  
= (# 5 stars - # <4 stars) / # Reviews
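
💡 To make the formula concrete, here is a minimal sketch on a made-up sample of ten review scores. The `scores` Series and the name `nps_example` are illustrative assumptions, not Olist data.

```python
import pandas as pd

# Hypothetical sample of ten review scores (illustrative, not Olist data)
scores = pd.Series([5, 5, 5, 5, 4, 4, 3, 2, 1, 5])

n_promoters = (scores == 5).sum()    # 5 reviews with 5 stars
n_detractors = (scores < 4).sum()    # 3 reviews with 1, 2 or 3 stars

# NPS = (# Promoters - # Detractors) / # Reviews
nps_example = (n_promoters - n_detractors) / len(scores)
print(nps_example)  # 0.2
```

The two 4-star reviews are Passives: they count in the denominator but not in the numerator.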

## Computing the overall NPS Score of Olist

In [1]:
# Colab: mount Google Drive to read the data
import os
from google.colab import drive
drive.mount('/content/drive/')
import sys
sys.path.append('/content/drive/MyDrive/Pornpan(Eye)')

Mounted at /content/drive/


In [2]:
# Import the usual modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Import Olist data
from olist.data import Olist
from olist.order import Order
data = Olist().get_data()
orders = Order().get_training_data()

In [3]:
orders.review_score

0        4
1        4
2        5
3        5
4        5
        ..
96356    5
96357    4
96358    5
96359    2
96360    5
Name: review_score, Length: 96353, dtype: int64

In [4]:
type(orders.review_score)

pandas.core.series.Series

👉 Write a function that converts a `review_score` into a `promoter_score`, then apply it to the review scores.

In [5]:
def promoter_score(x):
    score = 0
    if x == 5:
        score = 1
    if x < 4:
        score = -1
    return score


orders.review_score.map(promoter_score)

0        0
1        0
2        1
3        1
4        1
        ..
96356    1
96357    0
96358    1
96359   -1
96360    1
Name: review_score, Length: 96353, dtype: int64

😏 Instead of using this function, try to do the same task in one single line of code.

*There are several possible ways to do it.*
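
💡 As a sketch of what "several ways" could look like, here are three one-line variants applied to a small hypothetical Series (`review_score` below is toy data, and the variable names are made up for illustration). They are not the only options.

```python
import numpy as np
import pandas as pd

# Hypothetical review scores, one per star rating
review_score = pd.Series([1, 2, 3, 4, 5])

# Option 1: dictionary lookup
with_dict = review_score.map({5: 1, 4: 0, 3: -1, 2: -1, 1: -1})

# Option 2: boolean arithmetic (True counts as 1, False as 0)
with_bools = (review_score == 5).astype(int) - (review_score < 4).astype(int)

# Option 3: np.select takes the first matching condition, else the default
with_select = pd.Series(
    np.select([review_score == 5, review_score < 4], [1, -1], default=0),
    index=review_score.index,
)

print(with_dict.tolist())  # [-1, -1, -1, 0, 1]
```

All three return the same values here. Options 2 and 3 are vectorized and avoid calling a Python function once per row.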

Two general principles when it comes to programming/coding are:
- `KISS` : **K**eep **I**t **S**imple and **S**mart
- `DRY` : **D**on't **R**epeat **Y**ourself 😉

👇 Now that you have the different promoter scores, you can compute `Olist's NPS`.

In [6]:
nps = orders.review_score.map({5:1, 4:0, 3:-1, 2:-1, 1:-1}).sum() / orders.review_score.count()

In [7]:
# Display as percentage to 1 decimal place e.g "NPS score = 47.8%"

In [8]:
f'NPS Score = {nps}'

'NPS Score = 0.381430780567289'

In [9]:
f'NPS Score = {nps*100:.1f}%'

'NPS Score = 38.1%'
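
💡 Python's format specification also has a `%` presentation type that multiplies by 100 and appends the percent sign for you. A minimal sketch, where `nps_value` is simply the number printed above copied into a variable:

```python
# Value copied from the NPS output above
nps_value = 0.381430780567289

# ".1%" multiplies by 100, keeps one decimal place and appends "%"
formatted = f"NPS Score = {nps_value:.1%}"
print(formatted)  # NPS Score = 38.1%
```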

## NPS per customer state

### What is the average review score per state?

❓ First, create the dataset required for the computation

In [10]:
# Practice "chaining" methods in pandas

In [11]:
merge = data['orders']\
.merge(data['order_reviews'], on='order_id')\
.merge(data['customers'], on='customer_id')

len(merge)


99224

In [12]:
merge.nunique()

order_id                         98673
customer_id                      98673
order_status                         8
order_purchase_timestamp         98115
order_approved_at                90082
order_delivered_carrier_date     80451
order_delivered_customer_date    95022
order_estimated_delivery_date      459
review_id                        98410
review_score                         5
review_comment_title              4527
review_comment_message           36159
review_creation_date               636
review_answer_timestamp          98248
customer_unique_id               95380
customer_zip_code_prefix         14973
customer_city                     4117
customer_state                      27
dtype: int64
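
💡 Notice that the merged table has 99224 rows but only 98673 distinct `order_id` values. A likely explanation is that some orders appear in more than one review row. The sketch below reproduces that effect on hypothetical toy tables (`toy_orders` and `toy_reviews` are made up, not the Olist data).

```python
import pandas as pd

# Hypothetical toy tables (not the Olist data)
toy_orders = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "customer_id": ["c1", "c2"],
})
toy_reviews = pd.DataFrame({
    "review_id": ["r1", "r2", "r3"],
    "order_id": ["o1", "o1", "o2"],  # order o1 was reviewed twice
    "review_score": [5, 3, 4],
})

# Each matching pair of rows produces one output row
toy_merge = toy_orders.merge(toy_reviews, on="order_id")

print(len(toy_merge))                    # 3
print(toy_merge["order_id"].nunique())   # 2
```

If you expect exactly one review per order, `merge` accepts a `validate` argument (for example `validate="one_to_one"`) that raises an error when the keys are duplicated.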

👉 Now, we can aggregate this dataset per `customer_state` using aggregation methods of our choice :)

❓ Let's start with the average review score: Compute the average `review_score` per `customer_state`.


In [13]:
# Compute the average review_score per customer_state
merge.groupby(['customer_state'])['review_score'].mean().head()

customer_state
AC    4.049383
AL    3.751208
AM    4.183673
AP    4.194030
BA    3.860888
Name: review_score, dtype: float64

In [14]:
# Use .apply() to do the same thing
merge.groupby(['customer_state'])['review_score'].apply(np.mean).head()

customer_state
AC    4.049383
AL    3.751208
AM    4.183673
AP    4.194030
BA    3.860888
Name: review_score, dtype: float64

In [15]:
# Try with .agg(). It's much more flexible!
merge.groupby(['customer_state']).agg({'review_score': 'mean'})\
.rename(columns={'review_score': 'mean_review_score'})\
.head()

Unnamed: 0_level_0,mean_review_score
customer_state,Unnamed: 1_level_1
AC,4.049383
AL,3.751208
AM,4.183673
AP,4.19403
BA,3.860888


🤩 `.agg()` is much more flexible than the other methods, so push it further!

In [16]:
merge.groupby(['customer_state']).agg({
    'review_score': ['max', 'mean'],
    'customer_zip_code_prefix': ['nunique']
}).head()
# Append .loc[:, ('review_score', 'max')] to select a single column of the result

Unnamed: 0_level_0,review_score,review_score,customer_zip_code_prefix
Unnamed: 0_level_1,max,mean,nunique
customer_state,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2
AC,5,4.049383,20
AL,5,3.751208,126
AM,5,4.183673,55
AP,5,4.19403,18
BA,5,3.860888,734
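
💡 The dictionary form of `.agg()` returns two-level column names. If you prefer flat, explicit names, pandas also supports "named aggregation" with `new_name=(column, function)` pairs. Here is a minimal sketch on a hypothetical toy DataFrame (`toy` and the output column names are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy data with the same column names as the merged dataset
toy = pd.DataFrame({
    "customer_state": ["SP", "SP", "RJ", "RJ", "RJ"],
    "review_score": [5, 4, 3, 5, 1],
    "customer_zip_code_prefix": [1000, 1000, 2000, 2001, 2002],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = toy.groupby("customer_state").agg(
    max_review_score=("review_score", "max"),
    mean_review_score=("review_score", "mean"),
    n_zip_codes=("customer_zip_code_prefix", "nunique"),
)
print(summary)
```

The result has one row per state and three single-level columns, so there is no need to rename or flatten anything afterwards.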


### NPS per state
❓ Now, it is time to create a 🔥 **custom aggregation function** to compute the `NPS per customer_state` directly.

1️⃣ Create your `nps` function 

2️⃣ Debug it by calling `breakpoint()` inside your function to see exactly which objects you are manipulating

💡 *PS: always exit your debugger in a clean way by typing "exit" when debugging. Otherwise you will have to restart your Notebook.*

In [17]:
def nps(serie):
    # breakpoint()  # uncomment to inspect `serie` for each group
    return serie.map(promoter_score).sum() / serie.count()

👉 Now, use your `nps` function to compute the `NPS per customer_state`.

In [18]:
merge.groupby(['customer_state']).agg({
    'review_score': nps,
}).head()

Unnamed: 0_level_0,review_score
customer_state,Unnamed: 1_level_1
AC,0.296296
AL,0.166667
AM,0.414966
AP,0.313433
BA,0.199881


😏 Again, instead of using this function, try to do the same task in one line of code. Remember the `KISS` principle :)

In [19]:

merge.groupby(['customer_state'])['review_score'].apply(lambda s: s.map(promoter_score).sum() / s.count())

customer_state
AC    0.296296
AL    0.166667
AM    0.414966
AP    0.313433
BA    0.199881
CE    0.207675
DF    0.333799
ES    0.311012
GO    0.305830
MA    0.156836
MG    0.372989
MS    0.374309
MT    0.343300
PA    0.205579
PB    0.306968
PE    0.312272
PI    0.250509
PR    0.398769
RJ    0.253349
RN    0.358921
RO    0.301587
RR    0.043478
RS    0.368959
SC    0.332597
SE    0.206304
SP    0.397481
TO    0.326165
Name: review_score, dtype: float64
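
💡 Another possible approach, sketched on hypothetical toy data: convert the scores once, then take the group mean. Since each promoter score is 1, 0 or -1, the mean per group equals (# Promoters - # Detractors) / # Reviews, assuming every review score is present and between 1 and 5. The `toy` DataFrame and the `promoter` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data (not the Olist data)
toy = pd.DataFrame({
    "customer_state": ["SP", "SP", "SP", "RJ", "RJ"],
    "review_score": [5, 5, 2, 4, 1],
})

nps_per_state = (
    toy.assign(promoter=toy["review_score"].map({5: 1, 4: 0, 3: -1, 2: -1, 1: -1}))
    .groupby("customer_state")["promoter"]
    .mean()  # mean of +1/0/-1 values is the NPS of the group
)
print(nps_per_state)
```

For `SP` this gives (2 - 1) / 3 and for `RJ` it gives (0 - 1) / 2. Using the built-in `.mean()` avoids running a Python lambda once per group.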

# Cheat Sheet for `map`, `apply`, `applymap` and `groupby`

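```python
## MAP (for Series)
series.map(function)
series.map({'old_value': 'new_value'})  # mapping dict

## APPLY (for DataFrame)
df.apply(lambda col: col.max(), axis=0)  # default axis: one call per column
df.apply(lambda row: row['A'] + row['B'], axis=1)  # one call per row

## APPLYMAP (element-wise, for DataFrame)
df.applymap(my_funct_for_indiv_elements)
df.applymap(lambda x: '%.2f' % x)  # deprecated since pandas 2.1 in favor of df.map
```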

```python
## GROUPBY
group = df.groupby('col_A')
group.mean()
group.apply(np.mean)
group.agg({
    'col_B': ['mean', np.sum],            # strings or callables
    'col_C': my_custom_sum,               # a custom function
    'col_D': lambda s: my_custom_sum(s),  # or a lambda
})

group.apply(custom_mean_function)
```