# Project 2

The Titan Insurance Company has just installed a new incentive payment scheme for its life policy sales force. It wants to have an early view of the success or failure of the new scheme. Indications are that the sales force is selling more policies, but sales always vary in an unpredictable pattern from month to month and it is not clear that the scheme has made a significant difference.

Life insurance companies typically measure the monthly output of a salesperson as the total sum assured for the policies sold by that person during the month. For example, suppose salesperson X has, in the month, sold seven policies for which the sums assured are £1000, £2500, £3000, £5000, £5000, £10000, £35000. X's output for the month is the total of these sums assured, £61,500. Titan's new scheme is that the sales force receives low regular salaries but is paid large bonuses related to its output (i.e. to the total sum assured of policies sold). The scheme is expensive for the company, but they are looking for sales increases which more than compensate. The agreement with the sales force is that if the scheme does not at least break even for the company, it will be abandoned after six months. The scheme has now been in operation for four months. It has settled down after fluctuations in the first two months due to the changeover.
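The monthly output calculation described above can be sketched in a few lines of Python. The list assumes the seventh policy is a second £5000 one, which is what the stated total of £61,500 implies; the variable names are illustrative only.

```python
# Sums assured (in pounds) for salesperson X's policies in the month.
# Assumption: the seventh policy is a second 5000 one, as implied by the stated total.
sums_assured = [1000, 2500, 3000, 5000, 5000, 10000, 35000]

# Monthly output is the total sum assured across all policies sold
monthly_output = sum(sums_assured)
print(f"Policies sold: {len(sums_assured)}, output: £{monthly_output:,}")
```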

To test the effectiveness of the scheme, Titan have taken a random sample of 30 salespeople, measured their output in the penultimate month prior to the changeover, and then measured it in the fourth month after the changeover (they have deliberately chosen months not too close to the changeover). The outputs of the salespeople are shown in Table 1.

# Questions

1) Find the mean of the old scheme and new scheme columns. (5 points)

2) Use a test at the five percent significance level to determine the p-value and check whether the new scheme has significantly raised outputs. (10 points)

3) What conclusion does the test (p-value) lead to? (2.5 points)

4) Suppose it has been calculated that in order for Titan to break even, the average output must increase by £5000 in the new scheme compared to the old scheme. If this figure is the alternative hypothesis, what is:

    a) The probability of a Type I error? (2.5 points)

    b) The p-value of the hypothesis test if we test for a difference of £5000? (10 points)

    c) The power of the test? (5 points)

In [49]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


from scipy import stats
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel
from statsmodels.stats.power import ttest_power

In [55]:
# Reading data from the csv file which is created from the data given in question
titan = pd.read_csv('Project2.csv')
titan.head()

Unnamed: 0,SALESPERSON,OLD_SCHEME,NEW_SCHEME
0,1,57,62
1,2,103,122
2,3,59,54
3,4,75,82
4,5,84,84


In [23]:
# Outputs are recorded in £ thousands; uncomment to convert them to pounds
#titan.OLD_SCHEME=titan.OLD_SCHEME*1000
#titan.NEW_SCHEME=titan.NEW_SCHEME*1000

In [51]:
#1) Find the mean of old scheme and new scheme column.

mean_old = titan['OLD_SCHEME'].mean()
mean_new = titan['NEW_SCHEME'].mean()

print('Mean of Old Scheme:', mean_old)
print('Mean of New Scheme:', mean_new)

Mean of Old Scheme: 68.03333333333333
Mean of New Scheme: 72.03333333333333


In [67]:
#titan.head()

In [53]:
# 2) Use a test at the 5% significance level to determine the p-value and check whether the new scheme has significantly raised outputs.

# Define the null and alternate hypotheses before performing the test.
# mu1 = mean output under the old scheme, mu2 = mean output under the new scheme
# Null Hypothesis (H0): the new scheme has not raised output (mu2 - mu1 <= 0)
# Alternate Hypothesis (H1): the new scheme has raised output (mu2 - mu1 > 0)

# 5% significance level
alpha = 0.05

# Paired t-test: the same 30 salespeople are measured under both schemes
t, p = ttest_rel(titan['OLD_SCHEME'], titan['NEW_SCHEME'])
print(t, p)

# ttest_rel returns a two-sided p-value. H1 is one-sided, and t is computed as
# old minus new, so a rise in output shows up as t < 0.
p_one_sided = p / 2 if t < 0 else 1 - p / 2

if p_one_sided > alpha:
    print("Fail to reject Null Hypothesis")
else:
    print("Reject Null Hypothesis")

-1.5559143823544377 0.13057553961337662
Fail to reject Null Hypothesis


# 3) What conclusion does the test (p-value) lead to?

The two-sided p-value is 0.1306, so the one-sided p-value for a rise in output is about 0.065. Both exceed the 5% significance level.
There is insufficient evidence to reject the null hypothesis (H0), so we fail to reject it. This is not the same as showing that H0 is true.
Hence, the data do not show that the new scheme has significantly raised the output.
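The step from the two-sided to the one-sided p-value can be checked from the recorded t statistic alone. This is a minimal sketch that assumes 29 degrees of freedom (30 paired observations) and copies the figures from the output above; the variable names are illustrative.

```python
from scipy import stats

# Values recorded in the paired t-test output above (old minus new)
t_recorded = -1.5559143823544377
p_two_sided = 0.13057553961337662
df = 30 - 1  # 30 paired observations

# H1 says new > old, i.e. (old - new) < 0, so the relevant tail is the lower one
p_one_sided = stats.t.cdf(t_recorded, df)

alpha = 0.05
print(f"one-sided p = {p_one_sided:.4f}")
print("Reject H0" if p_one_sided < alpha else "Fail to reject H0")
```

Because the observed t lies in the direction of H1, the one-sided p-value is half the two-sided one, and the decision at the 5% level is unchanged.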

# 4 a) The probability of a Type I error?

Probability of a Type I error = significance level (alpha) = 0.05, or 5%.
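The link between the significance level and the Type I error rate can be illustrated by simulation. The sketch below is not part of the Titan data: it assumes normally distributed paired differences with a true mean of zero, and the seed and number of trials are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_salespeople = 30
n_trials = 4000  # number of simulated studies (arbitrary choice)

# Simulate paired differences when H0 is true: the true mean difference is 0.
# The normal distribution and unit variance are assumptions for illustration.
diffs = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_salespeople))

# One t-test per simulated study, testing a mean difference of 0
_, p_values = stats.ttest_1samp(diffs, popmean=0.0, axis=1)

# Share of studies that wrongly reject H0; this estimates the Type I error rate
type1_rate = np.mean(p_values < alpha)
print(f"Estimated Type I error rate: {type1_rate:.3f}")
```

The estimated rate should land close to alpha, which is why the probability of a Type I error is set by the chosen significance level rather than by the sample.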

In [56]:
#4 b) What is the p-value of the hypothesis test if we test for a difference of £5000?

# Outputs are in £ thousands, so £5000 corresponds to 5.
# Add 5 to the new-scheme outputs in a new column, NEW_SCHEME_5.
titan['NEW_SCHEME_5'] = titan['NEW_SCHEME'] + 5
titan.head()

Unnamed: 0,SALESPERSON,OLD_SCHEME,NEW_SCHEME,NEW_SCHEME_5
0,1,57,62,67
1,2,103,122,127
2,3,59,54,59
3,4,75,82,87
4,5,84,84,89


In [57]:
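# mu1 = mean output per salesperson BEFORE the changeover (OLD_SCHEME)
# mu2 = mean output per salesperson AFTER the changeover, shifted up by 5, i.e. £5000 (NEW_SCHEME_5)
# mud = mu2 - mu1
# Null hypothesis (H0): mud <= 0
# Alternate hypothesis (HA): mud > 0
# Note: ttest_rel tests whether the mean paired difference is zero, so the
# hypotheses are stated about the shifted column rather than about 5000.

# alpha = 0.05 for a 5% significance level
alpha = 0.05

# Paired t-test, since the same salespeople appear in both columns
t, p = ttest_rel(titan['NEW_SCHEME_5'], titan['OLD_SCHEME'])
print('t value:', t)
print('p value:', p)

# p is two-sided; halve it for the one-sided test when t is in the direction of HA
p_one_sided = p / 2 if t > 0 else 1 - p / 2

if p_one_sided > alpha:
    print("Fail to reject Null Hypothesis")
else:
    print("Reject Null Hypothesis")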

t value: 3.500807360297485
p value: 0.0015212146316676064
Reject Null Hypothesis


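We reject the null hypothesis (one-sided p ≈ 0.00076): with 5 (£5000) added to each new-scheme output, the mean output is significantly higher than under the old scheme. Note that this test is run on the shifted column; the unshifted mean rise in the sample is 4 (£4000), which is below the £5000 break-even figure.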
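Another way to test for a difference of £5000 is to compare the observed mean difference with 5 directly instead of shifting a column. The sketch below uses only the summary figures printed earlier (the two means and the t statistic from question 2), so the standard error is recovered rather than computed from raw data. Which direction of the alternative is intended is an assumption, so both one-sided p-values are shown. With the full DataFrame, `ttest_1samp(titan['NEW_SCHEME'] - titan['OLD_SCHEME'], 5)` would give the same t statistic.

```python
from scipy import stats

# Summary figures taken from the outputs above (data are in £ thousands)
mean_diff = 72.03333333333333 - 68.03333333333333  # new minus old
t_vs_zero = 1.5559143823544377                      # |t| from the paired test of no change
n = 30
df = n - 1

# Standard error of the mean difference, recovered from t = mean_diff / se
se = mean_diff / t_vs_zero

# Test the observed mean difference against the break-even value of 5
break_even = 5.0
t_vs_5 = (mean_diff - break_even) / se

# One-sided p-values in each direction
p_greater = stats.t.sf(t_vs_5, df)   # H1: mean difference > 5
p_less = stats.t.cdf(t_vs_5, df)     # H1: mean difference < 5
print(f"se = {se:.3f}, t = {t_vs_5:.3f}")
print(f"p (difference > 5) = {p_greater:.3f}, p (difference < 5) = {p_less:.3f}")
```

Under these assumptions the t statistic is small and negative, so the sample gives no evidence that the mean rise exceeds 5, and also no significant evidence that it falls short of 5.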

In [58]:
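from scipy.stats import pearsonr
# Correlation between old-scheme and shifted new-scheme outputs (returns r and its p-value);
# adding a constant does not change r, so this also describes OLD_SCHEME vs NEW_SCHEME
pearsonr(titan['OLD_SCHEME'], titan['NEW_SCHEME_5'])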

(0.811801956520917, 5.199771231215977e-08)

In [59]:
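#4 c) Power of the test

# Calculating Cohen's d (effect size)

import math

# Sample sizes for the two columns
a1 = 30
a2 = 30
# mu1, std1: shifted new scheme (NEW_SCHEME_5); mu2, std2: old scheme
mu2 = titan['OLD_SCHEME'].mean()
mu1 = titan['NEW_SCHEME_5'].mean()
std1 = titan['NEW_SCHEME_5'].std(ddof=1)
std2 = titan['OLD_SCHEME'].std(ddof=1)
# Pooled term as computed here: weighted mean of the two standard deviations, then a square root
s = math.sqrt(((a1 - 1) * std1 + (a2 - 1) * std2) / (a1 + a2 - 2))

effect_size = (mu1 - mu2) / s
print(effect_size)
print(mu2, mu1)
print(std2, std1)
print(s)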



1.907602376383051
68.03333333333333 77.03333333333333
20.455980212074454 24.062394946777697
4.717964346985474


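Cohen's d comes out at about 1.91, which would be a very large effect. This figure depends on how `s` is pooled: the code above averages the standard deviations before taking the square root, whereas pooling the variances instead gives s ≈ 22.33 and d ≈ 9 / 22.33 ≈ 0.40.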
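The pooled-variance form of Cohen's d can be sketched from the sample statistics printed above. The numbers are copied from that output; the variable names are illustrative, and equal weighting of the two samples follows from both having 30 observations.

```python
import math

# Sample statistics copied from the output above (£ thousands)
n1, n2 = 30, 30
mean_new_5, mean_old = 77.03333333333333, 68.03333333333333
std_new_5, std_old = 24.062394946777697, 20.455980212074454

# Pooled standard deviation: weight the variances (std squared), then take the root
pooled_var = ((n1 - 1) * std_new_5**2 + (n2 - 1) * std_old**2) / (n1 + n2 - 2)
pooled_std = math.sqrt(pooled_var)

cohens_d = (mean_new_5 - mean_old) / pooled_std
print(f"pooled std = {pooled_std:.3f}, Cohen's d = {cohens_d:.3f}")
```

Pooling variances keeps `pooled_std` on the same scale as the two sample standard deviations, which is why the resulting effect size is far smaller than 1.91.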

In [65]:
# Calculating the power of the test
# alpha = 0.05, nobs = 30
# effect_size is the Cohen's d printed above, truncated to three decimals
power = ttest_power(effect_size=1.907, nobs=30, alpha=0.05)
print(power)

1.0
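For a paired design, power can also be worked out from the standard deviation of the paired differences. The sketch below is one possible reading of question 4 c), not the notebook's method: it assumes H0 is "no rise", the true mean rise under H1 is the break-even figure of 5 (£5000), and the test is one-sided at 5%. The standard deviation of the differences is recovered from the t statistic recorded in question 2.

```python
import math
from scipy import stats

# Summary figures from the outputs above (£ thousands)
mean_diff_observed = 4.0          # observed mean rise, new minus old
t_vs_zero = 1.5559143823544377    # |t| from the paired test of no change
n = 30
df = n - 1
alpha = 0.05

# Standard deviation of the paired differences, recovered from the t statistic
se = mean_diff_observed / t_vs_zero
sd_diff = se * math.sqrt(n)

# Assumed true mean difference under H1: the break-even figure of 5
effect_size = 5.0 / sd_diff
noncentrality = effect_size * math.sqrt(n)

# One-sided test: reject H0 (no rise) when t exceeds the critical value
t_crit = stats.t.ppf(1 - alpha, df)
power = stats.nct.sf(t_crit, df, noncentrality)
print(f"effect size = {effect_size:.3f}, power = {power:.3f}")
```

Under these assumptions, `ttest_power(effect_size=effect_size, nobs=30, alpha=0.05, alternative='larger')` should give a matching figure, since it is based on the same noncentral t distribution.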
