Problem Statement -
The Titan Insurance Company has just installed a new incentive payment scheme for its life policy sales force. It wants an early view of the success or failure of the new scheme. Indications are that the sales force is selling more policies, but sales always vary unpredictably from month to month, and it is not clear that the scheme has made a significant difference.

Life insurance companies typically measure the monthly output of a salesperson as the total sum assured for the policies sold by that person during the month. For example, suppose salesperson X has, in the month, sold seven policies for which the sums assured are £1000, £2500, £3000, £5000, £5000, £10000, £35000. X's output for the month is the total of these sums assured, £61,500. Titan's new scheme is that the sales force receives low regular salaries but is paid large bonuses related to output (i.e. to the total sum assured of policies sold by them). The scheme is expensive for the company, but it is looking for sales increases which more than compensate. The agreement with the sales force is that if the scheme does not at least break even for the company, it will be abandoned after six months.
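The monthly output in the example is simply the sum of the sums assured. A minimal check, assuming the seven policies include a second £5000 policy (the stated total of £61,500 implies it; the variable names are illustrative only):

```python
# Sums assured (in pounds) for salesperson X's policies in the month.
# Assumption: a second 5000 policy is included so that the list has seven
# entries and matches the stated total.
sums_assured = [1000, 2500, 3000, 5000, 5000, 10000, 35000]

# Monthly output is the total sum assured.
monthly_output = sum(sums_assured)
print(len(sums_assured), monthly_output)
```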

The scheme has now been in operation for four months. It has settled down after fluctuations in the first two months due to the changeover.

To test the effectiveness of the scheme, Titan has taken a random sample of 30 salespeople, measured their output in the penultimate month prior to the changeover, and then measured it again in the fourth month after the changeover (the months were deliberately chosen not to be too close to the changeover).

In [None]:
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp,wilcoxon,shapiro,levene
from statsmodels.stats.power import ttest_power
import scipy.stats as st

In [135]:
sales=pd.read_csv("sales.csv")

In [136]:
sales

Unnamed: 0,SALESPERSON,Old Scheme (in thousands),New Scheme (in thousands)
0,1,57,62
1,2,103,122
2,3,59,54
3,4,75,82
4,5,84,84
5,6,73,86
6,7,35,32
7,8,110,104
8,9,44,38
9,10,82,107


# Find the mean of the old scheme and new scheme columns.

In [137]:
#mean of Old Scheme
sales["Old Scheme (in thousands)"].mean()

68.03333333333333

In [138]:
#mean of new scheme
sales["New Scheme (in thousands)"].mean()

72.03333333333333
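Because the same salespeople are measured twice, the quantity of interest is the per-person difference, and the mean of the differences equals the difference of the two means (72.03 - 68.03 = 4.0 thousand for the full sample above). A minimal sketch using only the ten rows displayed in the table, so its numbers will not match the full 30-row results; the names `demo` and `mean_diff` are illustrative:

```python
import pandas as pd

# First ten rows of the table displayed above (values in thousands of pounds).
# Assumption: illustration only; the full sample has 30 salespeople.
demo = pd.DataFrame({
    "old": [57, 103, 59, 75, 84, 73, 35, 110, 44, 82],
    "new": [62, 122, 54, 82, 84, 86, 32, 104, 38, 107],
})

# Per-salesperson change in output under the new scheme.
demo["diff"] = demo["new"] - demo["old"]

# The mean of the paired differences equals the difference of the column means.
mean_diff = demo["diff"].mean()
print(demo[["old", "new", "diff"]].mean())
```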

# Use a test at the five percent significance level to determine the p-value and check whether the new scheme has significantly raised outputs.

In [139]:
old_scheme=sales["Old Scheme (in thousands)"]
new_scheme=sales["New Scheme (in thousands)"]

In [140]:
#First we check normality with the Shapiro-Wilk test (equality of variances is checked below with Levene's test)
#Null hypothesis: the old scheme output is normally distributed
statistic,p_value=shapiro(old_scheme)
if p_value>0.05:
    print("old_scheme is normally distributed",p_value)
else:
    print("old scheme is not normally distributed",p_value)


old_scheme is normally distributed 0.9813658595085144


In [141]:
#Null hypothesis: the new scheme output is normally distributed
statistic,p_value=shapiro(new_scheme)
if p_value>0.05:
    print("new scheme is normally distributed",p_value)
else:
    print("new scheme is not normally distributed",p_value)
   

new scheme is normally distributed 0.5057420134544373


In [142]:
#Equality Of Variances - Levene's test 
statistic,p_value=levene(old_scheme,new_scheme)
if p_value>0.05:
    print("both schemes have same variances",p_value)
else:
    print("both schemes don't have same variances",p_value)

both schemes have same variances 0.30679836081811235
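For paired data, the t-test's normality assumption applies to the per-person differences rather than to each column separately. A hedged sketch of that check, using only the ten rows displayed earlier (the arrays `old_demo`, `new_demo` and `diff_demo` are illustrative, and with so few points the test has little power):

```python
import numpy as np
from scipy.stats import shapiro

# First ten displayed rows, in thousands of pounds (illustration only).
old_demo = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82])
new_demo = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107])

# Paired differences: new minus old for each salesperson.
diff_demo = new_demo - old_demo

# Null hypothesis: the differences are normally distributed.
stat, p = shapiro(diff_demo)
if p > 0.05:
    print("normality of the differences is not rejected", p)
else:
    print("normality of the differences is rejected", p)
```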


In [143]:
#The Shapiro and Levene tests above do not reject normality or equal variances for the two schemes.
#The same 30 salespeople are measured under both schemes, so the samples are paired;
#we therefore run a one-sample t-test (ttest_1samp, parametric) on the paired differences.
#Null hypothesis: the means of the old scheme and the new scheme are equal (mean difference is 0)
t_statistic,p_value=ttest_1samp(new_scheme-old_scheme,0)
print(p_value)
print("p_value %.3f shows that we can't reject null hypothesis as p_value >0.05" %(p_value))

0.13057553961337662
p_value 0.131 shows that we can't reject null hypothesis as p_value >0.05
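The test above is two-sided, while the question asks whether outputs were raised, which is a one-sided alternative. A minimal sketch, assuming SciPy 1.6 or later (where `ttest_rel` accepts `alternative`) and using only the ten displayed rows, so the numbers will differ from the full-sample result. It also shows that `ttest_1samp` on the differences and `ttest_rel` on the two columns are the same test:

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# First ten displayed rows, in thousands of pounds (illustration only).
old_demo = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82])
new_demo = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107])
diff_demo = new_demo - old_demo

# Two-sided test on the differences, as in the cell above.
t_two, p_two = ttest_1samp(diff_demo, 0)

# The paired t-test gives the same statistic and p-value.
t_rel, p_rel = ttest_rel(new_demo, old_demo)

# One-sided alternative: mean(new - old) > 0.
t_one, p_one = ttest_rel(new_demo, old_demo, alternative="greater")
print(p_two, p_rel, p_one)
```

When the t statistic is positive, the one-sided p-value is half the two-sided one.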


# What conclusion does the test (p-value) lead to?

In [144]:
#Conclusion drawn from the p-value: we cannot reject the null hypothesis, as the p-value (0.131) is greater than 5% (0.05).
#There is not enough evidence that the mean outputs under the old and new schemes differ, so the new scheme has not raised outputs significantly.

# Suppose it has been calculated that in order for Titan to break even, the average output must increase by £5000 under the new scheme compared to the old scheme. If this figure is taken as the alternative hypothesis, what is:

# a) The probability of a type 1 error?

In [145]:
#Probability of a type 1 error is 5%: it equals the significance level (alpha) chosen for the test
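The type 1 error rate is the chance of rejecting a true null hypothesis, which is the significance level by construction. A small simulation sketch of that statement; the seed, the number of simulations and the standard deviation of 10 are arbitrary assumptions, not values taken from the sales data:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05

# 5000 simulated samples of 30 paired differences whose true mean is 0,
# i.e. the null hypothesis holds in every simulated sample.
sims = rng.normal(loc=0.0, scale=10.0, size=(5000, 30))

# One t-test per row; each rejection is a type 1 error.
_, p_values = ttest_1samp(sims, 0.0, axis=1)
type1_rate = np.mean(p_values < alpha)
print(type1_rate)
```

The observed rejection rate should land close to alpha, up to simulation noise.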

# b) What is the p-value of the hypothesis test if we test for a difference of £5000?

In [146]:
#The output must increase by £5000 under the new scheme compared to the old scheme. The data are in thousands, so we add 5 to the old scheme.
#The null hypothesis then becomes mean(new_scheme)-mean(adjusted old_scheme) = 0, i.e. the new scheme has increased output by £5000,
#because after adding £5000 to the old scheme there should be no difference between the two schemes.

#Null hypothesis: u2-u1 = 0 (after adding £5k to the old scheme)
#Alternative hypothesis: u2-u1 is not equal to 0

#Note: this overwrites old_scheme, so later cells see the adjusted values
old_scheme=old_scheme+5
t_statistic,p_value=ttest_1samp(new_scheme-old_scheme,0)

In [147]:
print(p_value)
#The p-value shows that we can't reject the null hypothesis that the new scheme increases output by £5000:
#there is no significant difference between the old scheme (after adjustment) and the new scheme.

0.7001334912613286


In [132]:
t_statistic

-0.3889785955886094

In [133]:
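#Cross-check with the paired t-test on the adjusted old_scheme; the sign of the statistic flips because the argument order is reversed
st.ttest_rel(old_scheme, new_scheme)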

Ttest_relResult(statistic=0.3889785955886094, pvalue=0.7001334912613286)
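Shifting the old scheme by 5 and testing against 0 is equivalent to testing the unshifted differences against a hypothesised mean of 5, which avoids overwriting `old_scheme`. A minimal sketch on the ten displayed rows (illustrative arrays, so the p-value will differ from the full-sample one above):

```python
import numpy as np
from scipy.stats import ttest_1samp

# First ten displayed rows, in thousands of pounds (illustration only).
old_demo = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82])
new_demo = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107])
diff_demo = new_demo - old_demo

# Approach used above: shift by 5 (thousand), then test against 0.
t_shift, p_shift = ttest_1samp(diff_demo - 5, 0)

# Equivalent: leave the data alone and test against a mean of 5.
t_pop, p_pop = ttest_1samp(diff_demo, 5)
print(t_shift, p_shift)
print(t_pop, p_pop)
```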

# c) Power of the test

In [148]:
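#Effect size: mean difference divided by a pooled standard deviation (old_scheme here already includes the +5 adjustment).
#Note: the textbook pooled SD divides by (30+30-2); as written, operator precedence makes this expression divide by 30 and then add 28.
(np.mean(new_scheme)-np.mean(old_scheme))/np.sqrt(((30-1)*np.var(old_scheme)+(30-1)*np.var(new_scheme))/30+30-2)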

-0.03227388878292411

In [149]:
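#0.129 is the effect size the expression above gives for the unadjusted mean difference of 4 (thousand)
print(ttest_power(0.129,nobs=30,alpha=0.05,alternative='two-sided'))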

0.10498428747328836
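For a paired design, one common choice of effect size is the hypothesised mean difference divided by the standard deviation of the paired differences, with the break-even increase of 5 (thousand) as the alternative. A hedged sketch of that calculation on the ten displayed rows; it is an alternative to the pooled formula above rather than the notebook's own method. With the full data `nobs` would be 30, and the choice of a one-sided alternative is an assumption:

```python
import numpy as np
from statsmodels.stats.power import ttest_power

# First ten displayed rows, in thousands of pounds (illustration only).
old_demo = np.array([57, 103, 59, 75, 84, 73, 35, 110, 44, 82])
new_demo = np.array([62, 122, 54, 82, 84, 86, 32, 104, 38, 107])
diff_demo = new_demo - old_demo

# Alternative hypothesis: the true mean increase is the break-even 5 (thousand).
break_even = 5.0

# Effect size for a paired test: hypothesised shift over the sample SD of the differences.
effect_size = break_even / diff_demo.std(ddof=1)

# Power of a one-sided test at the 5% level with this many pairs.
power = ttest_power(effect_size, nobs=len(diff_demo), alpha=0.05, alternative="larger")
print(effect_size, power)
```

The probability of a type 2 error under this alternative is then `1 - power`.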
