# Statistical Test Notebook
For evaluating the significance of the improvement in correctly answered scenarios, the reduction in "I don't know" answers, and the increase in helpfulness.

In [1]:
# Imports
import numpy as np
from scipy import stats

## Correctly answered scenarios

In [2]:
# Both days, all scenarios
#wo_va = np.array([32.4, 17.6, 32.4, 2.9, 53.3, 50, 63.3, 36.7])
#w_va = np.array([46.7, 26.7, 56.7, 26.7, 35.3, 73.5, 50, 32.4])

# Both days, without scenario 5:
#wo_va = np.array([32.4, 17.6, 32.4, 2.9, 50, 63.3, 36.7])
#w_va = np.array([46.7, 26.7, 56.7, 26.7, 73.5, 50, 32.4])

# Both days, without scenarios 5 and 7:
wo_va = np.array([32.4, 17.6, 32.4, 2.9, 50, 36.7])
w_va = np.array([46.7, 26.7, 56.7, 26.7, 73.5, 32.4])

### Using SciPy and NumPy

In [3]:
stats.ttest_rel(a = wo_va, b = w_va)

TtestResult(statistic=-3.262957265749972, pvalue=0.022372269756085163, df=5)

As the p-value is less than 0.05, the difference between the two versions is significant at the 5% level. The test statistic is negative, so the values in `wo_va` are on average lower than those in `w_va`: the results with the Virtual Assistant are better.
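To make explicit what `stats.ttest_rel` computes, the paired t statistic can be reproduced by hand from the per-scenario differences. This is a minimal sketch using the six-scenario arrays from above; the names `diff`, `std_err`, and `t_stat` are illustrative and not part of the original notebook.

```python
import numpy as np

# Values per scenario, copied from the cell above (without scenarios 5 and 7).
wo_va = np.array([32.4, 17.6, 32.4, 2.9, 50, 36.7])
w_va = np.array([46.7, 26.7, 56.7, 26.7, 73.5, 32.4])

# A paired test works on the per-scenario differences.
diff = wo_va - w_va
n = len(diff)

# Standard error of the mean difference (sample standard deviation, ddof=1).
std_err = diff.std(ddof=1) / np.sqrt(n)

# t statistic with n - 1 degrees of freedom.
t_stat = diff.mean() / std_err
print(f"t = {t_stat:.4f}, df = {n - 1}")
```

The printed statistic should agree with the `statistic` field reported by `stats.ttest_rel` above, up to floating-point rounding.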

### Using material from the Maastricht University "Simulation and Statistical Analysis" course at the Department of Advanced Computing Sciences.

Especially inspired by the chapter "Comparing Alternative System Configurations" in "Simulation Modeling and Analysis" by Averill M. Law.

BibTeX:
@inbook{Law_2015, edition={5}, title={Comparing Alternative System Configurations}, booktitle={Simulation Modeling and Analysis}, publisher={McGraw-Hill Education}, author={Law, Averill M}, year={2015}, pages={556–586}, language={en} }

In [4]:
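#https://www.ttable.org/
# Two-sided critical values for df = 7, i.e. 8 paired scenarios.
# With fewer scenarios the 95% value is larger: 2.447 for df = 6, 2.571 for df = 5.
t_value = 2.365 #95%
#t_value = 1.895 #90%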
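
Instead of reading the critical value from a table, it can be computed for the actual number of paired scenarios. The following is a minimal sketch, assuming a two-sided interval and using `scipy.stats.t.ppf`; the helper name `critical_t` is hypothetical and not part of the original notebook.

```python
from scipy import stats

def critical_t(n_pairs, confidence=0.95):
    """Two-sided Student-t critical value for n_pairs paired observations."""
    df = n_pairs - 1
    # A two-sided interval leaves (1 - confidence) / 2 in each tail.
    return stats.t.ppf(1 - (1 - confidence) / 2, df)

# Critical values for 6, 7, and 8 paired scenarios.
for n_pairs in (6, 7, 8):
    print(f"n = {n_pairs}: t = {critical_t(n_pairs):.3f}")
```

The table value 2.365 corresponds to 8 paired scenarios. For the reduced data sets with 6 or 7 scenarios the critical value is somewhat larger, which widens the interval.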

In [5]:
def construct_confidence_interval(wo_va, w_va, t_value):
    """Paired-t confidence interval for the mean of wo_va - w_va."""
    # Construct an array with the differences
    z = wo_va - w_va
    n = len(z)
    mean_z = np.mean(z)

    # Estimate the variance of the mean difference:
    # sum((z_i - mean_z)^2) / (n * (n - 1))
    var_est = np.sum(np.square(z - mean_z)) / (n * (n - 1))

    # Construct the confidence interval
    half_width = t_value * np.sqrt(var_est)
    lb = mean_z - half_width
    ub = mean_z + half_width

    return lb, ub

In [6]:
lb, ub = construct_confidence_interval(wo_va, w_va, t_value)
print(f"Confidence interval: ({lb:.3f}, {ub:.3f})")

Confidence interval: (-26.073, -4.160)


As the confidence interval does not include 0, we can reject the null hypothesis that the means of the two test versions are the same, i.e. that there is no difference in the correctly answered scenarios with and without the Virtual Assistant. Recall that `z = wo_va - w_va`: an interval that lies entirely below 0 means that the Virtual Assistant performed better.
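
As a cross-check, recent SciPy versions expose a paired-t interval directly on the result object. This sketch assumes SciPy 1.10 or later, where the `TtestResult` returned by `stats.ttest_rel` provides a `confidence_interval` method. It uses the critical value for the actual degrees of freedom, so its bounds can differ slightly from those obtained with a fixed table value.

```python
import numpy as np
from scipy import stats

# Correctly answered scenarios, without scenarios 5 and 7 (copied from above).
wo_va = np.array([32.4, 17.6, 32.4, 2.9, 50, 36.7])
w_va = np.array([46.7, 26.7, 56.7, 26.7, 73.5, 32.4])

result = stats.ttest_rel(wo_va, w_va)

# Interval for the mean of wo_va - w_va, based on df = n - 1.
ci = result.confidence_interval(confidence_level=0.95)
print(f"95% CI: ({ci.low:.3f}, {ci.high:.3f})")

# The null hypothesis of no mean difference is rejected when 0 lies outside.
print("0 inside interval:", bool(ci.low <= 0 <= ci.high))
```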

## Number of "I don't know" answered scenarios

In [7]:
# Both days, all scenarios:
#wo_va = np.array([29.4, 32.4, 50, 41.4, 30, 30, 3.3, 30])
#w_va = np.array([16.7, 6.7, 16.7, 26.7, 2.9, 5.9, 2.9, 32.4])

# Both days, without scenario 5:
wo_va = np.array([29.4, 32.4, 50, 41.4, 30, 3.3, 30])
w_va = np.array([16.7, 6.7, 16.7, 26.7, 5.9, 2.9, 32.4])

# Both days, without scenarios 5 and 7:
#wo_va = np.array([29.4, 32.4, 50, 41.4, 30, 30])
#w_va = np.array([16.7, 6.7, 16.7, 26.7, 5.9, 32.4])

### Using SciPy and NumPy

In [8]:
stats.ttest_rel(a = wo_va, b = w_va)

TtestResult(statistic=3.0974346165366513, pvalue=0.021185731908727624, df=6)

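As the p-value is less than 0.05, the difference is again significant at the 5% level. This time the test statistic is positive, so the values in `wo_va` are on average higher than those in `w_va`. Since a lower number of "I don't know" answers is better, the results with the Virtual Assistant are better.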
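
Because the claim here is directional (fewer "I don't know" answers with the Virtual Assistant), a one-sided test is one way to state it. The sketch below is not part of the original analysis; it assumes SciPy 1.6 or later, where `stats.ttest_rel` accepts the `alternative` argument.

```python
import numpy as np
from scipy import stats

# "I don't know" answers, without scenario 5 (copied from the cell above).
wo_va = np.array([29.4, 32.4, 50, 41.4, 30, 3.3, 30])
w_va = np.array([16.7, 6.7, 16.7, 26.7, 5.9, 2.9, 32.4])

two_sided = stats.ttest_rel(wo_va, w_va)

# Alternative hypothesis: the mean of wo_va - w_va is greater than zero.
one_sided = stats.ttest_rel(wo_va, w_va, alternative="greater")

print(f"two-sided p = {two_sided.pvalue:.4f}")
print(f"one-sided p = {one_sided.pvalue:.4f}")
```

When the statistic points in the hypothesised direction, the one-sided p-value is half the two-sided one. The direction of a one-sided test should be fixed before looking at the data.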

### Using material from the Maastricht University "Simulation and Statistical Analysis" course at the Department of Advanced Computing Sciences.

Especially inspired by the chapter "Comparing Alternative System Configurations" in "Simulation Modeling and Analysis" by Averill M. Law.


In [9]:
lb, ub = construct_confidence_interval(wo_va, w_va, t_value)
print(f"Confidence interval: ({lb:.3f}, {ub:.3f})")

Confidence interval: (3.665, 27.335)


As the confidence interval does not include 0, we can reject the null hypothesis that the means of the two test versions are the same, i.e. that there is no difference in the number of "I don't know" answers submitted with and without the Virtual Assistant. Recall that `z = wo_va - w_va`: an interval that lies entirely above 0 means that more "I don't know" answers were given without the Virtual Assistant. Since we want to avoid "I don't know" as an answer, the Virtual Assistant performed better here as well.