# Math question

> Suppose that satisfied customers recommend a product with likelihood S;

> and that customers who are not satisfied (incorrectly) recommend the same product with likelihood U.

> Given that the recorded recommendation rate is R, which of the following provide the best estimate of the true fraction of satisfied customers?

Let's call the number of satisfied customers _H_, which stands for _Happy_; the fraction we are asked for is then _H / N_. Given that, we can establish the following definitions:

$R = f(recommend)\ /\ f(recommend  ∨ ¬recommend)$

$U = p(recommend\ |\ ¬satisfied)$

$S = p(recommend\ |\ satisfied)$

$N = \text{size of the population}$

$H_{rec} = \text{satisfied users who made a recommendation}$

$H_{¬rec} = \text{satisfied users who didn't make a recommendation}$



Our goal becomes calculating _H_ in

$$H = |H_{rec}| + |H_{¬rec}|$$

Since we know the recommendation rate _R_, and it is a marginal probability over all users (as opposed to conditional probabilities like _S_ and _U_), we can use it to estimate the total number of recommendations. More specifically,

1. $N R$ (_number of users times the recommendation rate_) denotes the total number of recommendations in our sample (both from satisfied and dissatisfied users).

2. $H S$ denotes the number of satisfied users who did make a recommendation (probability _S_).

3. $(N - H)\ U$ denotes the number of dissatisfied users (total users _N_ minus satisfied users _H_) who made a recommendation in error (probability _U_).

Combining all three definitions, we can state that

$$N R = (H S) + ((N - H)\ U)$$

Provided we are given _N_, we can solve this equation for _H_ (the number of happy, i.e. satisfied, users), because _S_, _U_ and _R_ are already known.
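Spelled out, the rearrangement takes two steps. This is only the algebra implied by the equation above, and it assumes $S \neq U$ (otherwise _H_ cancels out and cannot be recovered):

$$
\begin{aligned}
N R &= H S + (N - H)\ U \\
N R - N U &= H\ (S - U) \\
H &= \frac{N\ (R - U)}{S - U}
\end{aligned}
$$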

Given the following definitions:

$N = 100$

$R = 0.4$

$S = 0.7$

$U = 0.05$

then we can substitute into the equation above:

$100 \times 0.4 = 0.7\ H\ +\ 0.05\ (100 - H)$

and solve for _H_:

$H = 35\ /\ 0.65 \approx 53.85$

We estimate about `53.85` satisfied users in a population of 100, whereas the sample generated in the cells below contains `52` satisfied users (line `~f(satisfied) 52.00...` at the bottom of the notebook). The two figures agree to within about 4% (52 / 53.85 ≈ 0.966).
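The same calculation can be sketched in a few lines of Python. Rearranging $N R = H S + (N - H)\ U$ gives $H = N (R - U) / (S - U)$. The function name `estimate_satisfied` is hypothetical (it is not part of the notebook), and the sketch assumes $S \neq U$.

```python
def estimate_satisfied(population_size, S, U, R):
    """Solve N*R = H*S + (N - H)*U for H (hypothetical helper, assumes S != U)."""
    if S == U:
        # With S == U the recommendation rate carries no information about H.
        raise ValueError("S and U must differ; otherwise H cannot be identified")
    return population_size * (R - U) / (S - U)


# Values taken from the worked example above.
H = estimate_satisfied(population_size=100, S=0.7, U=0.05, R=0.4)
print(round(H, 2))  # 53.85
```

Dividing the result by `population_size` gives the estimated fraction of satisfied customers that the original question asks for.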

## Create toy dataset

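The code cell below imports the dependencies and defines the function `make_events`, which will be used to generate a test sample with a population of events.

For simplicity, each user is marked as satisfied with probability 0.5 and then recommends with conditional probability _S_ (if satisfied) or _U_ (if not). The argument _R_ is accepted but not used when sampling.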

In [5]:
import random

from collections import Counter


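def make_events(population_size, S, U, R, silent=False):
    """Generate [satisfied, recommends] pairs for a toy population.

    Each user is satisfied with probability 0.5 and then recommends with
    probability S (if satisfied) or U (if not). R is accepted but is not
    used when sampling.
    """
    events = [
        # [satisfied, recommends]; recommends is filled in below
        [1 if random.random() <= 0.5 else 0, 0]
        for _ in range(population_size)
    ]

    for u in events:
        # recommend with probability S if satisfied, otherwise with probability U
        p_recommend = S if u[0] else U
        u[1] = 1 if random.random() <= p_recommend else 0

    satisfied = sum(u[0] for u in events)
    recommends = sum(u[1] for u in events)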

    if not silent:
        print('make_events(): total # events', len(events))
        print('make_events(): # satisfied subjects', satisfied)
        print('make_events(): # subjects who recommend', recommends)
    
    return events, satisfied


In [6]:
def apply_formula(population_size, satisfied, S, U, R, silent=False):
    """Apply N * R = H * S + (N - H) * U with H = satisfied.

    Returns the implied recommendation rate, the expected number of users
    who do not recommend, and the expected number of dissatisfied users
    who do not recommend.
    """
    # expected recommendations from satisfied and dissatisfied users combined
    n_recommendations = (satisfied * S) + (population_size - satisfied) * U
    R_ = n_recommendations / population_size

    do_not_recommend = population_size - n_recommendations
    # dissatisfied users who, correctly, do not recommend
    do_not_recommend_principled = (population_size - satisfied) * (1 - U)

    if not silent:
        print('\n\n[1] f(¬recommend, satisfied ∨ ¬satisfied) =', do_not_recommend)
        print('[2] Actual R = ', R)
        print('[3] Estimated R\' = ', R_)

    return R_, do_not_recommend, do_not_recommend_principled


In [7]:
population_size = 100
S = 0.7
U = 0.05
R = 0.4

print('S=%.2f\nR=%.2f\nU=%.2f' % (S, R, U))

events, satisfied = make_events(population_size, S, U, R)

counts = Counter()
for u in events:
    counts[tuple(u)] += 1
    
print('\n\nSatisfied  Recommended  No.   p(sat, rec)   p(rec | sat)')
print('---------  -----------  ---   -----------   ------------')
for u, f in counts.most_common():
    args = (u[0], u[1], f, f / len(events), round(f / (satisfied if u[0] else len(events) - satisfied), 2))
    print('%-10d %-12d %-5d %.2f %13.2f' % args)

R_, do_not_recommend, do_not_recommend_principled = apply_formula(
    population_size, satisfied, S, U, R
)

satisfied_dontrecommend = do_not_recommend - do_not_recommend_principled
print('\n\n~f(¬recommend)', do_not_recommend)
print('~f(¬recommend, ¬satisfied)', do_not_recommend_principled)
print('~f(¬recommend, satisfied)', satisfied_dontrecommend)
print('~f(satisfied)', satisfied_dontrecommend + (satisfied * S))

S=0.70
R=0.40
U=0.05
make_events(): total # events 100
make_events(): # satisfied subjects 52
make_events(): # subjects who recommend 41


Satisfied  Recommended  No.   p(sat, rec)   p(rec | sat)
---------  -----------  ---   -----------   ------------
0          0            45    0.45          0.94
1          1            38    0.38          0.73
1          0            14    0.14          0.27
0          1            3     0.03          0.06


[1] f(¬recommend, satisfied ∨ ¬satisfied) = 61.2
[2] Actual R =  0.4
[3] Estimated R' =  0.38799999999999996


~f(¬recommend) 61.2
~f(¬recommend, ¬satisfied) 45.599999999999994
~f(¬recommend, satisfied) 15.600000000000009
~f(satisfied) 52.00000000000001
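
Note that the `~f(satisfied)` line adds `satisfied * (1 - S)` and `satisfied * S`, so it returns the input count `satisfied` by construction. One way to check the formula independently is to start from the observed number of recommendations instead. The sketch below does so under the assumption that only the counts printed by `make_events()` above (41 recommendations, 52 satisfied users) are available. The variable names are illustrative and not part of the notebook.

```python
population_size = 100
S, U = 0.7, 0.05
observed_recommends = 41  # "# subjects who recommend" in the output above
true_satisfied = 52       # "# satisfied subjects" in the output above

# Observed recommendation rate in this particular sample.
R_observed = observed_recommends / population_size

# Invert N*R = H*S + (N - H)*U for H, using the observed rate.
H_hat = population_size * (R_observed - U) / (S - U)

relative_error = abs(H_hat - true_satisfied) / true_satisfied
print('estimated satisfied: %.2f' % H_hat)
print('relative error: %.1f%%' % (100 * relative_error))
```

With a sample of only 100 users the estimate is noisy, so a gap of a few users between `H_hat` and the true count is to be expected.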
