# Olist's Net Promoter Score (NPS)

The Net Promoter Score (NPS) of a service answers the following question:

>_How likely is it that you would recommend our company/product/service to a friend or colleague?_

For a service rated from 1 to 5 stars, such as Olist:
- Those who respond with a score of 5 are called **Promoters**
- Those who respond with a score of 4 are called **Passives**
- Those who respond with a score of 1, 2, or 3 are called **Detractors**

NPS is computed by subtracting the percentage of customers who are Detractors from the percentage of customers who are Promoters.
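
As a quick illustration of this definition, here is a minimal sketch on a made-up list of ten review scores. The `scores` values and the variable names are invented for the example and are not taken from the Olist data.

```python
# Hypothetical review scores, invented for illustration only
scores = [5, 5, 4, 3, 5, 1, 4, 5, 2, 5]

n_reviews = len(scores)
n_promoters = sum(1 for s in scores if s == 5)    # 5 stars
n_detractors = sum(1 for s in scores if s <= 3)   # 1, 2 or 3 stars

# Percentages of promoters and detractors among all reviews
pct_promoters = 100 * n_promoters / n_reviews
pct_detractors = 100 * n_detractors / n_reviews

# NPS = % promoters - % detractors (passives only count in the denominator)
toy_nps = pct_promoters - pct_detractors

print(f"Promoters: {pct_promoters:.0f}%, Detractors: {pct_detractors:.0f}%, NPS: {toy_nps:.0f}")
```

Passives do not appear in the subtraction, but they still lower the NPS because they increase the total number of reviews.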

In [1]:
# Usual modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Import olist data
from olistdash.data import Olist
from olistdash.order import Order

data = Olist().get_data()
orders = Order().get_training_data()

In [5]:
orders.columns

Index(['order_id', 'wait_time', 'expected_wait_time', 'delay_vs_expected',
       'order_status', 'dim_is_five_star', 'dim_is_one_star', 'review_score',
       'number_of_products', 'number_of_sellers', 'price', 'freight_value'],
      dtype='object')

In [3]:
orders.review_score

0        4
1        4
2        5
3        5
4        5
        ..
97010    5
97011    4
97012    5
97013    2
97014    5
Name: review_score, Length: 97007, dtype: int64

NPS = % Promoters - % Detractors = (# Promoters - # Detractors) / # Reviews
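
The two sides of this identity can be checked on a small example. The sketch below uses an invented `toy_scores` Series (not the Olist data) and assumes there are no missing scores, since boolean `.mean()` divides by the full length while `.count()` ignores missing values.

```python
import pandas as pd

# Hypothetical scores, invented for illustration only
toy_scores = pd.Series([5, 4, 3, 5, 1, 5, 4, 5])

# Left-hand side: share of promoters minus share of detractors
lhs = (toy_scores == 5).mean() - (toy_scores <= 3).mean()

# Right-hand side: (# promoters - # detractors) / # reviews
rhs = ((toy_scores == 5).sum() - (toy_scores <= 3).sum()) / toy_scores.count()

print(lhs, rhs)
```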

In [6]:
type(orders.review_score)

pandas.core.series.Series

In [8]:
def promoter_score(x):
    score = 0
    if x == 5:
        score = 1
    if x < 4:
        score = -1
    return score


orders.review_score.map(promoter_score)

0        0
1        0
2        1
3        1
4        1
        ..
97010    1
97011    0
97012    1
97013   -1
97014    1
Name: review_score, Length: 97007, dtype: int64

The same mapping can be computed in one line:

In [14]:
# Option 1
orders.review_score.map(lambda x: 1 if x == 5 else -1 if x < 4 else 0)

0        0
1        0
2        1
3        1
4        1
        ..
97010    1
97011    0
97012    1
97013   -1
97014    1
Name: review_score, Length: 97007, dtype: int64

In [16]:
# Option 2
orders.review_score.map({5:1, 4:0, 3:-1, 2:-1, 1:-1})

0        0
1        0
2        1
3        1
4        1
        ..
97010    1
97011    0
97012    1
97013   -1
97014    1
Name: review_score, Length: 97007, dtype: int64
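
The two options behave differently on values outside the 1 to 5 range, which may matter if the data is not perfectly clean. The sketch below uses an invented `toy_scores` Series containing out-of-range values (an assumption for illustration; the Olist scores shown above are all between 1 and 5).

```python
import pandas as pd

# Hypothetical scores, including two values outside the 1-5 range
toy_scores = pd.Series([5, 4, 0, 2, 6])

# Option 1: a function classifies every value, even unexpected ones
via_lambda = toy_scores.map(lambda x: 1 if x == 5 else -1 if x < 4 else 0)

# Option 2: a dict leaves values without a matching key as NaN
via_dict = toy_scores.map({5: 1, 4: 0, 3: -1, 2: -1, 1: -1})

print(via_lambda.tolist())
print(via_dict.isna().sum())   # number of scores the dict could not map
```

With the dict version, unexpected values surface as missing data instead of being silently counted as detractors or passives.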

In [17]:
nps = orders.review_score.map({
    5: 1,
    4: 0,
    3: -1,
    2: -1,
    1: -1
}).sum() / orders.review_score.count()


In [19]:
f'NPS Score = {nps}'

'NPS Score = 0.37439566216870945'

In [20]:
f'NPS Score = {nps*100:.1f}%'

'NPS Score = 37.4%'

## NPS per customer state

### Mean review score per state

### 1. Creating the dataset required for computation

In [22]:
# "chaining" methods in pandas

merge = data['orders']\
.merge(data['order_reviews'], on='order_id')\
.merge(data['customers'], on='customer_id')

len(merge)

100000

In [23]:
merge.nunique()

order_id                         99441
customer_id                      99441
order_status                         8
order_purchase_timestamp         98875
order_approved_at                90733
order_delivered_carrier_date     81018
order_delivered_customer_date    95664
order_estimated_delivery_date      459
review_id                        99173
review_score                         5
review_comment_title              4600
review_comment_message           36921
review_creation_date               637
review_answer_timestamp          99010
customer_unique_id               96096
customer_zip_code_prefix         14994
customer_city                     4119
customer_state                      27
dtype: int64
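
The merged table has more rows than unique `order_id` values, which suggests that some orders match more than one review. The sketch below reproduces this effect on two invented miniature tables (`toy_orders` and `toy_reviews` are hypothetical) and shows how the `validate` argument of `merge` can flag it.

```python
import pandas as pd

# Hypothetical miniature tables, invented for illustration only
toy_orders = pd.DataFrame({"order_id": ["a", "b", "c"]})
toy_reviews = pd.DataFrame({
    "order_id": ["a", "b", "b"],        # order "b" has two reviews
    "review_score": [5, 1, 4],
})

# Default inner merge: order "b" is duplicated, order "c" is dropped
toy_merge = toy_orders.merge(toy_reviews, on="order_id")
print(len(toy_merge), toy_merge["order_id"].nunique())

# `validate` makes pandas raise if the relationship is not the expected one
try:
    toy_orders.merge(toy_reviews, on="order_id", validate="one_to_one")
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print(is_one_to_one)
```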

Grouping by `customer_state` and computing the mean `review_score` per state:

In [25]:
# Option 1
merge.groupby(['customer_state'])['review_score'].mean().head()

customer_state
AC    4.049383
AL    3.731415
AM    4.154362
AP    4.176471
BA    3.834314
Name: review_score, dtype: float64

In [27]:
# Option 2 .apply()
merge.groupby(['customer_state'])['review_score'].apply(np.mean).head()

customer_state
AC    4.049383
AL    3.731415
AM    4.154362
AP    4.176471
BA    3.834314
Name: review_score, dtype: float64

In [28]:
# Option 3 .agg()
merge.groupby(['customer_state']).agg({'review_score': np.mean})\
.rename(columns={'review_score': 'mean_review_score'})\
.head()

                mean_review_score
customer_state
AC                       4.049383
AL                       3.731415
AM                       4.154362
AP                       4.176471
BA                       3.834314
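
A possible fourth option is pandas named aggregation, which sets the output column name directly and avoids the separate `.rename()` step. The sketch below runs on an invented `toy_merge` table, not the real merged data.

```python
import pandas as pd

# Hypothetical miniature version of the merged table
toy_merge = pd.DataFrame({
    "customer_state": ["SP", "SP", "RJ", "RJ", "RJ"],
    "review_score": [5, 4, 3, 5, 1],
})

# Named aggregation: output_column=(input_column, aggregation)
toy_means = toy_merge.groupby("customer_state").agg(
    mean_review_score=("review_score", "mean")
)
print(toy_means)
```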


### NPS per state
Creating a **custom aggregation function** to directly compute the NPS per customer_state.

In [29]:
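# Note: this rebinds the name `nps`, which previously held the overall score
def nps(series):
    """Return the NPS of a Series of review scores."""
    return series.map(promoter_score).sum() / series.count()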


merge.groupby(['customer_state']).agg({
    'review_score': nps,
}).head()

                review_score
customer_state
AC                  0.296296
AL                  0.158273
AM                  0.395973
AP                  0.294118
BA                  0.187169


In [30]:
# One liner version
merge.groupby([
    'customer_state'
])['review_score'].apply(lambda s: s.map(promoter_score).sum() / s.count())


customer_state
AC    0.296296
AL    0.158273
AM    0.395973
AP    0.294118
BA    0.187169
CE    0.201643
DF    0.329167
ES    0.296133
GO    0.297496
MA    0.149134
MG    0.365539
MS    0.373278
MT    0.334066
PA    0.193483
PB    0.303538
PE    0.302102
PI    0.241935
PR    0.392773
RJ    0.241841
RN    0.344969
RO    0.304348
RR    0.043478
RS    0.364990
SC    0.324569
SE    0.202857
SP    0.390307
TO    0.328571
Name: review_score, dtype: float64
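
A state-level NPS can rest on very different numbers of reviews, so it may help to report the review count next to each score and sort the result. The sketch below does this on an invented `toy_merge` table; `state_nps`, `per_state` and the data are hypothetical, and `promoter_score` is redefined so the block runs on its own.

```python
import pandas as pd

def promoter_score(x):
    """Same mapping as above: 5 -> 1, 4 -> 0, 1/2/3 -> -1."""
    if x == 5:
        return 1
    if x < 4:
        return -1
    return 0

def state_nps(series):
    """NPS of a Series of review scores, as a fraction between -1 and 1."""
    return series.map(promoter_score).sum() / series.count()

# Hypothetical miniature version of the merged table
toy_merge = pd.DataFrame({
    "customer_state": ["SP", "SP", "SP", "SP", "RJ", "RJ", "RR"],
    "review_score":   [5,    5,    4,    2,    5,    3,    5],
})

# NPS and number of reviews per state, highest NPS first
per_state = (
    toy_merge.groupby("customer_state")
    .agg(
        nps=("review_score", state_nps),
        n_reviews=("review_score", "count"),
    )
    .sort_values("nps", ascending=False)
)
print(per_state)
```

In this toy table the top-ranked state has a single review, which is the kind of case the `n_reviews` column is meant to expose.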

# Cheat Sheet


```python
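## MAP (for Series)
series.map(function)
series.map({'key': 'value'})                        # mapping dict; unmapped values become NaN

## APPLY (for DataFrame)
df.apply(lambda col: col.max(), axis=0)             # default axis: one call per column
df.apply(lambda row: row['A'] + row['B'], axis=1)   # one call per row
df.applymap(my_funct_for_indiv_elements)            # element-wise (renamed DataFrame.map in pandas 2.1+)
df.applymap(lambda x: '%.2f' % x)                   # e.g. format each value with 2 decimals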
```

```python
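## GROUPBY
group = df.groupby('col_A')
group.mean()
group.apply(np.mean)
group.agg({
    'col_B': ['mean', np.sum],            # several aggregations for one column
    'col_C': my_custom_sum,               # custom function taking a Series
    'col_D': lambda s: my_custom_sum(s),  # same idea with a lambda
})

group.apply(custom_mean_function)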
```