In [151]:
import numpy as np
import pandas as pd

In [152]:
df = pd.read_csv('Salesperson_Data.csv')
old_scheme = df[df.columns[1]]
new_scheme = df[df.columns[2]]

In [153]:
old_scheme_mean = old_scheme.mean()  # scalar mean (iloc[:,1:2].mean() returns a one-element Series)

In [154]:
new_scheme_mean = new_scheme.mean()  # scalar mean (iloc[:,2:3].mean() returns a one-element Series)

In [155]:
print("Old Scheme Mean: %f, New Scheme Mean: %f " % (old_scheme_mean, new_scheme_mean))

Old Scheme Mean: 68.033333, New Scheme Mean: 72.033333 


In [156]:
from scipy.stats import t
N1 = len(old_scheme)
N2 = len(new_scheme)
degrees_of_freedom = N1 + N2 - 2
# Lower-tail critical value at alpha = 0.05 (t is symmetric, so -t_0.95 = t_0.05)
t_crit = -t.ppf(0.95, degrees_of_freedom)
print("Critical t-value for Lower Tail Test: ", t_crit)

Critical t-value for Lower Tail Test:  -1.671552762153672
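Because the t distribution is symmetric, the lower-tail critical value can also be read directly from its 5th percentile. A minimal check, assuming 58 degrees of freedom (30 observations per scheme), which is what the critical value above corresponds to:

```python
from scipy.stats import t

alpha = 0.05
dof = 58  # assumed: N1 + N2 - 2 with 30 observations in each scheme

# The lower-tail critical value is simply the alpha-quantile of t
t_crit_lower = t.ppf(alpha, dof)

# By symmetry it equals the negated (1 - alpha)-quantile used in the cell above
t_crit_mirror = -t.ppf(1 - alpha, dof)

print(t_crit_lower, t_crit_mirror)
```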


As there are two independent samples and the population standard deviations are unknown, I suggest using Student's two-sample t-test to compare the sample means.
We will form the hypotheses as follows:
    H0 (Null Hypothesis): The new scheme has not increased average output (μ_new ≤ μ_old).
    H1 (Alternative Hypothesis): The new scheme has increased average output (μ_new > μ_old).
Note that this is a one-tailed (lower-tail) test: because the statistic is computed as old minus new, evidence that the new scheme raises output shows up as a significantly negative t-value. The significance level (alpha) is 0.05, i.e. a 95% confidence level.

We use the critical t-value for a lower-tail test at the 5% significance level (95% confidence). Before running the t-test, however, we check its assumptions at alpha = 0.05: the Shapiro-Wilk test for normality of each sample, and Levene's test for equality of variances between the old and new schemes.

Subject to passing the normality and Levene tests, we perform the t-test for two independent samples. We reject the null hypothesis if the t-statistic falls below the critical value (-1.672) calculated above, and equivalently if the one-tailed p-value is below alpha (p < 0.05); otherwise we do not reject it. Because `ttest_ind` reports a two-sided p-value, we divide it by two to obtain the one-tailed p-value; this is valid only when the t-statistic is negative, i.e. in the direction of H1.
Rejecting the null hypothesis would indicate that the new scheme has caused a significant increase in output.
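For reference, the pooled-variance statistic that `ttest_ind(..., equal_var=True)` computes, ordered old minus new as in this notebook, together with the lower-tail decision rule described above:

```latex
H_0:\ \mu_{\text{new}} \le \mu_{\text{old}} \qquad H_1:\ \mu_{\text{new}} > \mu_{\text{old}}

s_p = \sqrt{\frac{(N_1 - 1)\,s_1^2 + (N_2 - 1)\,s_2^2}{N_1 + N_2 - 2}}

t = \frac{\bar{x}_{\text{old}} - \bar{x}_{\text{new}}}{s_p\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}

\text{Reject } H_0 \text{ if } t < -t_{0.95,\,N_1 + N_2 - 2}
\quad (\text{equivalently } p_{\text{one-tailed}} < 0.05)
```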

In [157]:
## normality test
## p <= alpha: reject H0, not normal.
## p > alpha: fail to reject H0, no evidence against normality.

from scipy.stats import shapiro
stat, p = shapiro(old_scheme)
print('Statistics=%.3f, p=%.3f' % (stat, p))


Statistics=0.989, p=0.981


In [158]:
stat, p = shapiro(new_scheme)
print('Statistics=%.3f, p=%.3f' % (stat, p))

Statistics=0.969, p=0.506


In [159]:
## Levene test for equal variance
## p <= alpha: reject H0, variances not equal.
## p > alpha: fail to reject H0, no evidence of unequal variances.
import statistics  # imported under its own name so it does not shadow a scipy.stats alias
print("Std. Dev of Old Scheme", statistics.stdev(old_scheme))
print("Std. Dev of New Scheme", statistics.stdev(new_scheme))

from scipy.stats import levene
levene(old_scheme, new_scheme)

Std. Dev of Old Scheme 20.455980212074454
Std. Dev of New Scheme 24.06239494677769


LeveneResult(statistic=1.063061539437244, pvalue=0.30679836081811235)

Thus, the p-values of both normality tests (old and new schemes) and of Levene's test are higher than alpha, so there is no evidence against normality or equal variances, and the pooled t-test for two independent samples is appropriate.
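The three assumption checks above can be bundled into one function. This is a hypothetical helper, sketched on synthetic data rather than `Salesperson_Data.csv`; "passed" means p > alpha, i.e. no evidence against the assumption, not proof that it holds.

```python
import numpy as np
from scipy.stats import shapiro, levene


def check_t_test_assumptions(sample_a, sample_b, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk on each sample plus Levene's test.

    Returns (p-value, passed) pairs, where passed means p > alpha,
    i.e. no evidence against the assumption at that significance level.
    """
    _, p_norm_a = shapiro(sample_a)
    _, p_norm_b = shapiro(sample_b)
    _, p_levene = levene(sample_a, sample_b)  # scipy default: center='median'
    return {
        "normal_a": (float(p_norm_a), bool(p_norm_a > alpha)),
        "normal_b": (float(p_norm_b), bool(p_norm_b > alpha)),
        "equal_var": (float(p_levene), bool(p_levene > alpha)),
    }


# Synthetic examples only (not the notebook's data):
rng = np.random.default_rng(0)
narrow = rng.normal(loc=0, scale=1, size=30)
wide = rng.normal(loc=0, scale=50, size=30)
bimodal = np.concatenate([np.zeros(15), np.full(15, 100.0)])

# A two-cluster sample should fail the normality check ...
print(check_t_test_assumptions(bimodal, narrow)["normal_a"])
# ... and samples with very different spreads should fail Levene's test.
print(check_t_test_assumptions(narrow, wide)["equal_var"])
```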

In [160]:
from scipy.stats import ttest_ind

ttest = ttest_ind(old_scheme, new_scheme, equal_var=True)
print('T-statistic: ', ttest.statistic)
print('p-value (One Tail): ', ttest.pvalue/2)

T-statistic:  -0.6937067608923764
p-value (One Tail):  0.24531757843124052
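As a cross-check, the same test can be reproduced from the summary statistics printed above with `scipy.stats.ttest_ind_from_stats`, and SciPy 1.6 or later accepts `alternative="less"` so the one-tailed p-value comes out directly (on the raw data, `ttest_ind(old_scheme, new_scheme, equal_var=True, alternative="less")` does the same). The sketch assumes 30 observations per scheme, consistent with the 58 degrees of freedom used earlier.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the outputs above (output in £000s).
# n = 30 per scheme is an assumption consistent with the 58 degrees of freedom.
mean_old, sd_old = 68.033333, 20.455980212074454
mean_new, sd_new = 72.033333, 24.06239494677769
n_old = n_new = 30

# alternative="less" tests H1: mean_old < mean_new directly,
# so there is no need to halve a two-sided p-value by hand.
res = ttest_ind_from_stats(mean_old, sd_old, n_old,
                           mean_new, sd_new, n_new,
                           equal_var=True, alternative="less")
print(f"t = {res.statistic:.4f}, one-tailed p = {res.pvalue:.4f}")
```

Halving the two-sided p-value gives the same answer only when the t-statistic lies in the direction of H1 (negative here); `alternative=` handles both cases correctly.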


In [167]:
from math import sqrt
from scipy.stats import t
N1 = len(old_scheme)
N2 = len(new_scheme)
dof = N1 + N2 - 2  # renamed from df so the DataFrame is not overwritten
std1 = old_scheme.std()
std2 = new_scheme.std()
# Pooled standard deviation
std_N1N2 = sqrt(((N1 - 1) * std1**2 + (N2 - 1) * std2**2) / dof)

diff_mean_old = old_scheme.mean() - new_scheme.mean()
# t.ppf(0.95) leaves 5% in each tail, so this is a two-sided 90% interval
MoE_old = t.ppf(0.95, dof) * std_N1N2 * sqrt(1/N1 + 1/N2)

print ("\nThe difference between groups is \n\
       {:3.3f} [{:3.3f} to {:3.3f}] (mean [90% CI])".format(
        diff_mean_old, diff_mean_old - MoE_old, diff_mean_old + MoE_old))


The difference between groups is 
       -4.000 [-13.638 to 5.638] (mean [90% CI])
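The interval cell uses `t.ppf(0.95, ...)`, which puts 5% in each tail and therefore yields a two-sided 90% interval. A sketch that makes the confidence level an explicit parameter, using the summary statistics printed above and assuming 30 observations per scheme (`pooled_diff_ci` is a hypothetical helper name):

```python
from math import sqrt
from scipy.stats import t


def pooled_diff_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Two-sided confidence interval for mean1 - mean2 using the pooled SD."""
    dof = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof)
    # Two-sided interval: (1 - confidence) / 2 in each tail
    t_crit = t.ppf(1 - (1 - confidence) / 2, dof)
    moe = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff, diff - moe, diff + moe


# Summary statistics from the outputs above; n = 30 per scheme is assumed.
stats_old = (68.033333, 20.455980212074454, 30)
stats_new = (72.033333, 24.06239494677769, 30)

for conf in (0.90, 0.95):
    diff, lo, hi = pooled_diff_ci(*stats_old, *stats_new, confidence=conf)
    print(f"{conf:.0%} CI: {diff:.3f} [{lo:.3f} to {hi:.3f}]")
```

With `confidence=0.90` this reproduces the interval printed above; `confidence=0.95` gives the wider interval that a "95% CI" label would imply.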


Based on the above results, the t-statistic is greater than the critical value, i.e. t-stat (-0.6937) > -1.672, and the one-tailed p-value (0.2453) is greater than alpha (0.05). There is therefore insufficient evidence to reject the null hypothesis, and we do not reject it.

In other words, based on the sample data provided, we cannot conclude that the new scheme has significantly raised output. Note that failing to reject H0 does not prove H0 true; it only means the observed difference of 4 (in thousands) is not statistically significant at the 5% level.

Q4. Suppose it has been calculated that, in order for Titan to break even, average output must increase by £5,000 under the new scheme compared to the old scheme.

Ans:
The null and alternative hypotheses would then be:
H0: Average output does not increase by £5,000, so Titan does not break even.
H1: Average output increases by £5,000, so Titan breaks even.

Ans 4a) We take the probability of a type I error, i.e. alpha, to be 5% (0.05) for this test. The probability of a type I error is the probability of rejecting the null hypothesis when it is in fact true.
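One way to see what alpha = 0.05 means is to simulate many experiments in which H0 is true and count how often the one-tailed test rejects. A sketch with illustrative parameters (mean 70, SD 22, 30 per group are assumptions, not estimates from the data); it relies on `alternative=` in `ttest_ind` (SciPy 1.6 or later):

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulate experiments where H0 is true: both "schemes" come from the same
# normal distribution (illustrative mean and SD, not fitted to the data).
rng = np.random.default_rng(42)
n_sims, n = 5000, 30
alpha = 0.05
old = rng.normal(loc=70, scale=22, size=(n_sims, n))
new = rng.normal(loc=70, scale=22, size=(n_sims, n))

# One-tailed (lower-tail) pooled t-test for every simulated experiment at once
res = ttest_ind(old, new, axis=1, equal_var=True, alternative="less")
type_i_rate = np.mean(res.pvalue < alpha)
print(f"Empirical type I error rate: {type_i_rate:.3f} (nominal {alpha})")
```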

Next, to obtain the p-value for the new test, we add 5 (in thousands) to every value of the new scheme, reflecting an increase of 5 (in thousands) in average output, and then run the t-test for two independent samples as before.
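For comparison, a common formulation tests the required difference directly: H0: μ_new − μ_old ≤ 5 against H1: μ_new − μ_old > 5 (in thousands). Under that reading the hypothesised difference is subtracted from the new scheme before a standard one-tailed test. A sketch from the summary statistics printed earlier, assuming 30 observations per scheme; it is shown as an alternative formulation, not as the method used in this notebook:

```python
from scipy.stats import ttest_ind_from_stats

delta = 5.0  # required increase in average output, in £000s

mean_old, sd_old = 68.033333, 20.455980212074454
mean_new, sd_new = 72.033333, 24.06239494677769
n = 30  # assumed observations per scheme

# Shifting the new scheme by -delta moves its mean but not its SD, so testing
# mean_old < (mean_new - delta) is the same as testing mu_new - mu_old > delta.
res = ttest_ind_from_stats(mean_old, sd_old, n,
                           mean_new - delta, sd_new, n,
                           equal_var=True, alternative="less")
print(f"t = {res.statistic:.4f}, one-tailed p = {res.pvalue:.4f}")
```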

In [162]:
#Increase Average Output by 5 (in thousands) for Titan to break even with the new scheme
new_scheme_post_add = new_scheme + 5

Adding a constant to every value shifts the mean but changes neither the shape of the distribution nor its variance, so the normality and Levene tests above still hold. We therefore perform the t-test for two independent samples with the same decision rule: reject the null hypothesis if the t-statistic falls below the critical value (-1.672) and the one-tailed p-value (the two-sided p-value divided by two) is below alpha (0.05); otherwise do not reject it.
Rejecting the null hypothesis would indicate that an increase of £5,000 in average output allows Titan to break even.
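The claim that a constant shift leaves both assumption checks unchanged can be verified numerically. A sketch on illustrative random samples (not the notebook's data) that loosely mimic the two schemes:

```python
import numpy as np
from scipy.stats import shapiro, levene

# Illustrative samples only, loosely mimicking the two schemes
rng = np.random.default_rng(1)
a = rng.normal(loc=68, scale=20, size=30)
b = rng.normal(loc=72, scale=24, size=30)

# Shapiro-Wilk is location-invariant: shifting b by 5 leaves W unchanged
w_b, _ = shapiro(b)
w_b_shifted, _ = shapiro(b + 5)

# Levene works on absolute deviations from each group's median,
# which a constant shift does not alter
lev_before, _ = levene(a, b)
lev_after, _ = levene(a, b + 5)

print(w_b, w_b_shifted)
print(lev_before, lev_after)
```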

In [163]:
ttest = ttest_ind(old_scheme, new_scheme_post_add, equal_var=True)
print('T-statistic: ', ttest.statistic)
print('p-value (One Tail): ', ttest.pvalue/2)

T-statistic:  -1.5608402120078468
p-value (One Tail):  0.062000899994376045


In [164]:
from math import sqrt
from scipy.stats import t
N1 = len(old_scheme)
N2 = len(new_scheme_post_add)
dof = N1 + N2 - 2  # renamed from df so the DataFrame is not overwritten
std1 = old_scheme.std()
std2 = new_scheme_post_add.std()
std_N1N2 = sqrt(((N1 - 1) * std1**2 + (N2 - 1) * std2**2) / dof)  # pooled SD
print(std_N1N2)
diff_mean_new = old_scheme.mean() - new_scheme_post_add.mean()
# t.ppf(0.95) leaves 5% in each tail, so this is a two-sided 90% interval
MoE_new = t.ppf(0.95, dof) * std_N1N2 * sqrt(1/N1 + 1/N2)

print ("\nThe difference between groups is \n\
       {:3.3f} [{:3.3f} to {:3.3f}] (mean [90% CI])".format(
        diff_mean_new, diff_mean_new - MoE_new, diff_mean_new + MoE_new))

22.33210667415296

The difference between groups is 
       -9.000 [-18.638 to 0.638] (mean [90% CI])


Based on the above results, the t-statistic is greater than the critical value, i.e. t-stat (-1.561) > -1.672, and the one-tailed p-value (0.062) is greater than alpha (0.05). There is therefore insufficient evidence to reject the null hypothesis, and we do not reject it.

Thus, based on the sample data, we cannot conclude that an increase of 5 (in thousands) in average output allows Titan to break even.

After extensive research online, I found it difficult to identify a standard method for this case to estimate the probability of a type II error (β), from which the power of the test (1 - β) follows. Following an R tutorial, I adopted an approach based on the confidence intervals obtained previously: compute one t-score from the difference between the lower bounds of the two intervals, and another from the difference between the upper bound of the first interval and the lower bound of the second, each divided by the pooled standard deviation. I took the difference between the t cumulative distribution function values at these two scores as the probability of a type II error, and subtracted it from 1 to obtain the power of the test.
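For comparison, the textbook power calculation for a one-tailed pooled two-sample t-test uses the noncentral t distribution: if the true difference is δ, the statistic follows a noncentral t with noncentrality δ / (s_p √(1/N1 + 1/N2)). A sketch assuming the break-even difference of 5 (in thousands) as the true effect, the pooled SD printed above, and 30 observations per scheme; its result can differ from the interval-based approximation used in this notebook:

```python
from math import sqrt
from scipy.stats import t, nct

# Inputs: pooled SD printed above (in £000s), assumed n = 30 per scheme,
# and the break-even difference of 5 treated as the true effect under H1.
sp = 22.33210667415296
n1 = n2 = 30
delta = 5.0
alpha = 0.05
dof = n1 + n2 - 2

se = sp * sqrt(1 / n1 + 1 / n2)  # standard error of the mean difference
ncp = delta / se                 # noncentrality parameter under H1

# The notebook rejects when (old - new) / se < -t_crit, which is the same
# event as (new - old) / se > t_crit, so the upper tail is used here.
t_crit = t.ppf(1 - alpha, dof)
power = nct.sf(t_crit, dof, ncp)  # P(reject H0 | true difference = delta)
beta = 1 - power                  # probability of a type II error
print(f"Power: {power:.3f}, type II error: {beta:.3f}")
```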


In [169]:
from math import sqrt
from scipy.stats import t
dof = N1 + N2 - 2  # renamed from df so the DataFrame is not overwritten
std1 = old_scheme.std()
std2 = new_scheme_post_add.std()
std_N1N2 = sqrt(((N1 - 1) * std1**2 + (N2 - 1) * std2**2) / dof)  # pooled SD
# t-scores built from the interval bounds computed in cells 167 and 164
tleft = ((diff_mean_old - MoE_old) - (diff_mean_new - MoE_new)) / std_N1N2
tright = ((diff_mean_old + MoE_old) - (diff_mean_new - MoE_new)) / std_N1N2
print(tleft)
print(tright)
typeIIprob = t.cdf(tright, df=dof) - t.cdf(tleft, df=dof)
power = 1 - typeIIprob
print("Power of Test in Percent (Approximate): ", power*100, "%")

0.22389289434063872
1.0870790290243961
Power of Test in Percent (Approximate):  72.8935774921431 %


Thus, by this approximation, the power of the test is about 72.89%: if average output truly increases by 5 (in thousands), the probability that the test correctly rejects the null hypothesis is about 72.89%.