In [28]:
import numpy as np

score = np.loadtxt('survey.txt', dtype='int')
score

array([ 7, 10,  5, ...,  5,  9, 10])

In [29]:
# How does the data look?
score[:5]

array([ 7, 10,  5,  9,  9])

In [30]:
# Sanity-check the range of the data
print(score.min())
print(score.max())

1
10


In [31]:
# Shape
score.shape

(1167,)

#### Imagine you are a Data Analyst @ Airbnb

You've been asked to analyze user survey data and report the NPS to management.

#### But what exactly is NPS?

#### Have you seen something like this?

Link: https://drive.google.com/file/d/1-u8e-v_90JdikorKsKzBM-JJqoRtzsN8/view?usp=sharing

<img src="https://drive.google.com/uc?id=1-u8e-v_90JdikorKsKzBM-JJqoRtzsN8">

This is called a **Likelihood to Recommend Survey**.

- Responses are given on a scale ranging from 0–10,
    - with 0 labeled “Not at all likely,” and
    - 10 labeled “Extremely likely.”

Based on these responses, we calculate the Net Promoter Score (NPS).

#### How do we calculate NPS?

<img src="https://drive.google.com/uc?id=1KPIYlaN68vlL99iApaF5QbeBoyT24-Eu">

We sort the responses into 3 categories:
- Detractors: respondents with a score of 0-6
- Passives: respondents with a score of 7-8
- Promoters: respondents with a score of 9-10

The score is then computed as:
```
Net Promoter score = % Promoters - % Detractors.
```

In [32]:
# Count detractors (score <= 6) and promoters (score >= 9)
detractors = score[score <= 6].shape[0]
promoters = score[score >= 9].shape[0]

print(round((promoters / score.shape[0] * 100) - (detractors / score.shape[0] * 100)))

24
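
The same calculation can be written as a small reusable function. This is a minimal sketch, not the notebook's own method: the name `nps` and the `sample` array are made up for illustration. It relies on the fact that the mean of a boolean mask is the fraction of `True` values.

```python
import numpy as np

# Hypothetical sample scores, used only so the example runs on its own
sample = np.array([7, 10, 5, 9, 9, 3, 8, 10, 6, 10])

def nps(scores):
    """Return % promoters minus % detractors for an array of 0-10 scores."""
    scores = np.asarray(scores)
    pct_promoters = (scores >= 9).mean() * 100   # share of 9-10 responses
    pct_detractors = (scores <= 6).mean() * 100  # share of 0-6 responses
    return pct_promoters - pct_detractors

sample_nps = nps(sample)
print(round(sample_nps))  # 20 (5 promoters and 3 detractors out of 10)
```

Calling `nps(score)` on the survey array should give the same result as the cell above.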


## How will we bin our data?

## Will this work?

In [33]:
score[score <= 6] = 'Detractors'

ValueError: invalid literal for int() with base 10: 'Detractors'

#### Why didn't the above code work?

Recall that NumPy arrays have a homogeneous datatype.
- The `dtype` of our array is int.

We are trying to assign a string to an int array, so NumPy raises an error.
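
The failure can be reproduced on a small array. In this sketch the `sample` array is a hypothetical stand-in for the survey data; the `try`/`except` is there only so the snippet runs to completion.

```python
import numpy as np

# Hypothetical sample scores standing in for the survey data
sample = np.array([7, 10, 5, 9, 9])
print(sample.dtype.kind)  # 'i' marks a signed integer dtype

try:
    # NumPy tries to convert the string to the array's integer dtype
    sample[sample <= 6] = 'Detractors'
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```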

#### So, what do we do?

What if we create a new array of the same length as the `score` array and assign its values based on the values in `score`?

#### How do we initialize a new array based on the length of a preexisting array?
NumPy provides a function that allocates an array without initializing its values: `np.empty()`

Two of its arguments matter here:
- shape
- dtype
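
As a quick illustration of those two arguments, here is a minimal sketch using a made-up shape and a float dtype. Only the shape and dtype of an `np.empty()` result are guaranteed; `np.zeros()` is shown as an alternative when a known starting value matters.

```python
import numpy as np

# np.empty only allocates memory; the values it holds are arbitrary
buf = np.empty(shape=(4,), dtype='float')
print(buf.shape, buf.dtype)   # (4,) float64

# np.zeros allocates and also sets every element to 0
zeros = np.zeros(shape=(4,), dtype='float')
print(zeros)                  # [0. 0. 0. 0.]
```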

#### Question: What will be the shape and dtype of the new array?



In [37]:
arr = np.empty(shape = score.shape, dtype = 'str')
arr

array(['', '', '', ..., '', '', ''], dtype='<U1')

Notice the following:
- All the elements shown are empty strings. (`np.empty()` does not set values, so this is not guaranteed in general.)
- The dtype is shown as `<U1`.

Didn't we set the dtype to string?

#### Why is the dtype being shown as `<U1`?

`U1` means a Unicode string of length 1; the leading `<` marks little-endian byte order.

When we create an empty array with the `str` datatype and give no length, NumPy defaults to a Unicode string of length 1.
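
The dtype can be inspected directly. This is a small sketch with made-up arrays `a` and `b`; it also shows that when an array is built from existing strings, NumPy infers the width from the longest one.

```python
import numpy as np

a = np.empty(shape=(3,), dtype='str')
print(a.dtype.kind)       # 'U' for Unicode string
print(a.dtype.itemsize)   # 4 bytes: one 4-byte character per element

# When built from data, the width is inferred from the longest string
b = np.array(['Passives', 'Detractors'])
print(b.dtype)            # <U10 on little-endian machines
```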

#### Question: What will happen in the following case? Will the string be assigned to the 0th index?

In [39]:
arr[0] = 'Hello'
arr

array(['H', '', '', ..., '', '', ''], dtype='<U1')

Notice that
- because the length is defined as 1,
- NumPy silently truncates the string and stores only the first character.

But we want to store the whole string: 'Detractors', 'Passives', or 'Promoters'.

#### How do we raise the cap on the string length?

In [40]:
arr = np.empty(shape = score.shape, dtype = 'U10')
arr

array(['', '', '', ..., '', '', ''], dtype='<U10')
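
The width of 10 matches the longest label, 'Detractors'. Rather than hard-coding it, the width could be derived from the labels themselves. This is a sketch only: the names `labels`, `width`, `sample`, and `arr_sample` are illustrative and do not appear in the notebook.

```python
import numpy as np

labels = ['Detractors', 'Passives', 'Promoters']
width = max(len(label) for label in labels)   # 10, the length of 'Detractors'

# Hypothetical sample scores standing in for the survey data
sample = np.array([7, 10, 5, 9, 9])
arr_sample = np.empty(shape=sample.shape, dtype=f'U{width}')

arr_sample[0] = 'Detractors'
print(arr_sample[0])   # the full string is stored, no truncation
```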

In [42]:
# Check the length
arr.shape[0]

1167

In [46]:
arr[score <= 6] = "Detractors"

arr

array(['', '', 'Detractors', ..., 'Detractors', '', ''], dtype='<U10')

In [48]:
arr[(score >= 7) & (score <= 8)] = "Passives"
arr

array(['Passives', '', 'Detractors', ..., 'Detractors', '', ''],
      dtype='<U10')

In [49]:
arr[score >= 9] = "Promoters"
arr

array(['Passives', 'Promoters', 'Detractors', ..., 'Detractors',
       'Promoters', 'Promoters'], dtype='<U10')

In [50]:
arr[:15]

array(['Passives', 'Promoters', 'Detractors', 'Promoters', 'Promoters',
       'Detractors', 'Passives', 'Promoters', 'Promoters', 'Promoters',
       'Promoters', 'Detractors', 'Promoters', 'Promoters', 'Passives'],
      dtype='<U10')

Now we have an array with the desired values.

#### How do we count the number of instances of each value?

There are two ways of doing it.

Let's look at the long way first.

We apply a boolean mask for each unique value and read the length from the shape.
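
As an aside, the three masked assignments can also be collapsed into one step. The sketch below swaps in a different technique, `np.digitize`, which is not what the notebook uses; the `sample` array and variable names are made up. Every element gets a label, so no preallocated string array is needed.

```python
import numpy as np

# Hypothetical sample scores standing in for the survey data
sample = np.array([7, 10, 5, 9, 9, 3, 8])
labels = np.array(['Detractors', 'Passives', 'Promoters'])

# Bin edges 7 and 9 split the scale into <7, 7-8, and >=9,
# which np.digitize reports as bin indices 0, 1, and 2
bin_index = np.digitize(sample, bins=[7, 9])

# Indexing the label array with the bin indices yields one label per score
binned = labels[bin_index]
print(binned)
```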

In [53]:
new_detractors = arr[arr == "Detractors"].shape[0]
new_detractors

332

In [54]:
new_promoters = arr[arr == "Promoters"].shape[0]
new_promoters

609

In [55]:
NPS = np.round(((new_promoters / arr.shape[0]) * 100) - ((new_detractors / arr.shape[0]) * 100))
NPS

24.0
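
The count does not need the intermediate filtered array. Since `True` counts as 1, the matches can be counted on the mask itself. This is a sketch on a hypothetical `arr_sample`, not the survey data.

```python
import numpy as np

# Hypothetical labels standing in for the binned survey array
arr_sample = np.array(['Passives', 'Promoters', 'Detractors',
                       'Promoters', 'Promoters'])

# Count the True entries of the mask directly
n_promoters = np.count_nonzero(arr_sample == 'Promoters')
n_detractors = (arr_sample == 'Detractors').sum()
print(n_promoters, n_detractors)   # 3 1
```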

Now, there's a short way as well.

NumPy provides the function `np.unique()` to get the unique elements; with `return_counts=True` it also returns how often each one occurs.

In [63]:
unique, count = np.unique(arr, return_counts=True)
unique

array(['Detractors', 'Passives', 'Promoters'], dtype='<U10')

In [64]:
count

array([332, 226, 609], dtype=int64)
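
The two arrays returned by `np.unique()` can be paired up to compute the NPS. This is a minimal sketch: `arr_sample` copies the first ten labels shown earlier, and the names `values`, `counts`, and `tally` are illustrative.

```python
import numpy as np

# First ten labels from the binned array shown earlier
arr_sample = np.array(['Passives', 'Promoters', 'Detractors', 'Promoters',
                       'Promoters', 'Detractors', 'Passives', 'Promoters',
                       'Promoters', 'Promoters'])

values, counts = np.unique(arr_sample, return_counts=True)

# Pair each label with its count; .get(..., 0) guards against an absent category
tally = dict(zip(values.tolist(), counts.tolist()))
promoters = tally.get('Promoters', 0)
detractors = tally.get('Detractors', 0)

nps_sample = 100 * (promoters - detractors) / arr_sample.size
print(tally)
print(round(nps_sample))   # 40
```

With the full counts above (332 detractors, 609 promoters, 1167 responses), the same formula gives (609 - 332) / 1167 * 100 ≈ 23.7, which rounds to 24.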

<img src="https://drive.google.com/uc?id=1FYgRM2XmJs4Rv-l8CCUn_aXcyw7GfZlp">

Source: https://chattermill.com/blog/what-is-a-good-nps-score/

#### (Optional) Industry wise NPS benchmark

<img src="https://drive.google.com/uc?id=1vyRFRpHMC7LJ6MNB_K7Mih7pR_m7t-_A">
