# Statistical Tests 

In [2]:
import numpy as np
from statsmodels.stats.weightstats import ztest, ttest_ind

In this exercise, we practice using Z-tests (and, in Task 2.2, a t-test) to test hypotheses in several settings. To implement and interpret the tests correctly, I recommend reviewing the documentation first:

https://www.statsmodels.org/dev/generated/statsmodels.stats.weightstats.ztest.html

### Task 1: Z-test for one sample

**Goal**: Check whether the sample mean differs from the hypothetical mean.

**Assumption**: The average height of university students is assumed to be 170 cm. We want to check whether the average height of a random sample of students is significantly higher. The sample is given in the `heights` variable.

To complete the task, import the `ztest` function, run the Z-test in Python, print the p-value, and conclude whether the mean height is significantly greater than 170 cm at the 0.05 significance level.

We use the Z-test here because the sample is large enough (more than 30 observations) that its standard deviation can be assumed to be close to the population standard deviation, and can therefore be treated as known.

In [3]:
heights = [174, 171, 175, 179, 170, 170, 179, 175, 169, 174, 169, 169, 173, 162, 163, 169, 166,
           173, 167, 164, 179, 170, 172, 164, 169, 175, 169, 174, 169, 169, 173, 162, 177]

In [4]:
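mean_height = np.mean(heights)
mean_null = 170          # hypothesized mean under H0
alpha = 0.05             # significance level
std = np.std(heights)    # std with ddof=0; ztest itself uses ddof=1 internally
n = len(heights)
se = std / np.sqrt(n)    # approximate standard error of the mean (not used below)
print(f"mean: {mean_height:.2f}, std: {std:.2f}, number of records: {n}")

mean: 170.70, std: 4.65, number of records: 33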


In [5]:
z_test, p_value = ztest(heights, value=mean_null, alternative='larger')
print(f'Z-test: {z_test:.4f}, p-value: {p_value:.4f}')

Z-test: 0.8482, p-value: 0.1982
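
As a cross-check, here is a sketch of what `ztest` computes in the one-sample case, assuming the statsmodels default `ddof=1` (the sample standard deviation stands in for the unknown population value): $z = (\bar{x} - \mu_0) / (s / \sqrt{n})$, and with `alternative='larger'` the p-value is the upper-tail probability of the standard normal distribution. The use of `scipy.stats.norm` for that tail probability is an assumption of this sketch (SciPy is already a dependency of statsmodels).

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.weightstats import ztest

# Same sample as in Task 1
heights = [174, 171, 175, 179, 170, 170, 179, 175, 169, 174, 169, 169, 173, 162, 163, 169, 166,
           173, 167, 164, 179, 170, 172, 164, 169, 175, 169, 174, 169, 169, 173, 162, 177]
mean_null = 170  # hypothesized mean under H0

x = np.asarray(heights, dtype=float)
n = x.size
s = x.std(ddof=1)             # sample standard deviation (ztest's default ddof=1)
se = s / np.sqrt(n)           # standard error of the mean
z_manual = (x.mean() - mean_null) / se
p_manual = norm.sf(z_manual)  # upper-tail probability P(Z >= z) for alternative='larger'

z_sm, p_sm = ztest(heights, value=mean_null, alternative='larger')
print(f"manual: z = {z_manual:.4f}, p = {p_manual:.4f}")
print(f"ztest:  z = {z_sm:.4f}, p = {p_sm:.4f}")
```

Both lines should agree, which confirms that `ztest` divides by the standard error computed with `ddof=1`.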


In [6]:
if p_value < alpha:
    print('H0 is rejected. Accept H1')
    print('Mean height is significantly greater than 170')
else:
    print('H0 is not rejected')
    print('Mean height is NOT significantly greater than 170')

H0 is not rejected
Mean height is NOT significantly greater than 170


### Conclusion of HW1
- The goal was to check whether the mean height is significantly greater than the hypothetical mean.
- Hypothetical mean = 170 cm
- The sample mean is 170.70 cm (std = 4.65).
- To test whether the students are significantly taller, we use the setting `alternative='larger'`:
    - *Null hypothesis*: The population mean height is equal to or less than 170 cm
    - *Alternative hypothesis*: The population mean height is greater than 170 cm
- The Z-test gives Z = 0.8482 and p-value = 0.1982 at a significance level of 0.05.
- The p-value is *greater* than the significance level, so we fail to reject the null hypothesis that the mean height is equal to or less than 170 cm.
- At this stage we do not have sufficient evidence that the mean height is significantly greater than 170 cm. This does not prove that the mean equals 170 cm; it only means the data are consistent with that.
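
The same decision can be reached with the critical-value approach, which is equivalent to comparing the p-value with alpha. A minimal sketch, reusing the Z statistic reported above: for a one-sided test at alpha = 0.05, H0 is rejected only if z exceeds the 95th percentile of the standard normal distribution (about 1.645).

```python
from scipy.stats import norm

alpha = 0.05
z_observed = 0.8482                 # Z statistic reported by ztest in Task 1
z_critical = norm.ppf(1 - alpha)    # one-sided critical value for alternative='larger'

# Reject H0 only if the statistic lands in the upper rejection region
reject_h0 = z_observed > z_critical
print(f"critical value: {z_critical:.3f}, reject H0: {reject_h0}")
```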

### Task 2: Z-test for two independent samples

**Task 2.1.**

**Objective**: To test whether there is a statistically significant difference between the mean scores of two groups of students.

**Assumption**: Group A received a new course of study and Group B continued with the standard course. We are testing whether the new course is more effective.

Run a Z-test in Python, print the p-value, and conclude whether the assumption about the students holds at the 0.05 significance level.
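
For reference, a sketch of the statistic that `ztest` computes for two samples, assuming its defaults (`usevar='pooled'`, `ddof=1`), where $\Delta_0$ is the `value` argument (0 in this task):

$$
z = \frac{\bar{x}_A - \bar{x}_B - \Delta_0}{\sqrt{s_p^2\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}},
\qquad
s_p^2 = \frac{\sum_{i=1}^{n_A}\left(x_{A,i} - \bar{x}_A\right)^2 + \sum_{j=1}^{n_B}\left(x_{B,j} - \bar{x}_B\right)^2}{n_A + n_B - 2}
$$

For `alternative='two-sided'` the p-value is $2\,(1 - \Phi(|z|))$, where $\Phi$ is the standard normal CDF.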

In [7]:
group_a_scores = [78.55, 72.25, 79.88, 75.  , 76.54, 74.99, 87.26, 77.93, 72.71,
       82.11, 71.9 , 79.04, 68.2 , 71.36, 78.98, 81.69, 78.86, 77.42,
       76.49, 70.61, 74.4 , 75.7 , 83.29, 79.72, 69.18, 79.62, 76.07,
       74.62, 81.06, 83.15, 82.66, 73.8 , 76.45, 79.66, 82.88, 75.6 ,
       77.07, 72.47, 72.02, 82.06]

group_b_scores = [81.78, 74.64, 80.02, 76.81, 71.77, 76.81, 82.69, 74.82, 82.82,
       61.9 , 79.11, 75.44, 73.5 , 75.46, 65.06, 73.9 , 76.79, 82.39,
       72.41, 70.96, 72.49, 79.58, 76.64, 72.35, 77.57, 75.49, 79.84,
       71.49, 73.36, 73.04, 67.68, 76.48, 76.31, 75.03, 73.83, 67.92,
       72.9 , 73.29, 70.99, 74.19]

In [8]:
mean_A = np.mean(group_a_scores)
mean_B = np.mean(group_b_scores)
std_A = np.std(group_a_scores)
std_B = np.std(group_b_scores)
n_A = len(group_a_scores)
n_B = len(group_b_scores)

alpha_2 = 0.05

print(f"mean A: {mean_A:.2f}, std A: {std_A:.2f}, number of records A: {n_A}")
print(f"mean B: {mean_B:.2f}, std B: {std_B:.2f}, number of records B: {n_B}")

mean A: 77.08, std A: 4.31, number of records A: 40
mean B: 74.74, std B: 4.46, number of records B: 40


In [9]:
z_test_2, p_value_2 = ztest(group_a_scores, group_b_scores, value=0, alternative='two-sided')
print(f'Z-test: {z_test_2:.4f}, p-value: {p_value_2:.4f}')

Z-test: 2.3574, p-value: 0.0184


In [10]:
if p_value_2 < alpha_2:
    print('H0 is rejected. Accept H1')
    print('Mean scores of group A and group B are significantly different')
else:
    print('H0 is not rejected')
    print('Mean scores of group A and group B are NOT significantly different')

H0 is rejected. Accept H1
Mean scores of group A and group B are significantly different


### Conclusion of HW 2.1
- Goal: check whether there is a statistically significant difference between the mean scores of two groups of students.
- Descriptive statistics:
    - Group A: mean = 77.08, std = 4.31
    - Group B: mean = 74.74, std = 4.46
- To compare the two samples, we formulate the hypotheses:
    - **Null hypothesis**: The mean scores of the two groups are equal
    - **Alternative hypothesis**: The mean scores of the two groups are different

- The two-sided Z-test gives Z = 2.3574 and p-value = 0.0184 at a significance level of 0.05.

- The p-value is *less* than the significance level, so we reject the null hypothesis in favor of the alternative.
- Therefore, there is a statistically significant difference between the mean scores of group A and group B.
- The two-sided test only shows that the means differ. Because Z is positive and the observed difference favors group A (77.08 vs 74.74), the data suggest that the new course is more effective; a one-sided test with `alternative='larger'` tests this direction directly.
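
A minimal sketch of the directional test mentioned in the last bullet, reusing the Task 2.1 data: with `alternative='larger'`, the alternative hypothesis becomes mean(A) > mean(B). The Z statistic is unchanged; because it is positive, the one-sided p-value is exactly half of the two-sided one (roughly 0.009), which is below 0.05.

```python
from statsmodels.stats.weightstats import ztest

group_a_scores = [78.55, 72.25, 79.88, 75.0, 76.54, 74.99, 87.26, 77.93, 72.71,
                  82.11, 71.9, 79.04, 68.2, 71.36, 78.98, 81.69, 78.86, 77.42,
                  76.49, 70.61, 74.4, 75.7, 83.29, 79.72, 69.18, 79.62, 76.07,
                  74.62, 81.06, 83.15, 82.66, 73.8, 76.45, 79.66, 82.88, 75.6,
                  77.07, 72.47, 72.02, 82.06]
group_b_scores = [81.78, 74.64, 80.02, 76.81, 71.77, 76.81, 82.69, 74.82, 82.82,
                  61.9, 79.11, 75.44, 73.5, 75.46, 65.06, 73.9, 76.79, 82.39,
                  72.41, 70.96, 72.49, 79.58, 76.64, 72.35, 77.57, 75.49, 79.84,
                  71.49, 73.36, 73.04, 67.68, 76.48, 76.31, 75.03, 73.83, 67.92,
                  72.9, 73.29, 70.99, 74.19]

# Two-sided: H1 is "the means differ"
z_two, p_two = ztest(group_a_scores, group_b_scores, value=0, alternative='two-sided')
# One-sided: H1 is "mean(A) > mean(B)", i.e. the new course scores higher
z_one, p_one = ztest(group_a_scores, group_b_scores, value=0, alternative='larger')

print(f"two-sided: z = {z_two:.4f}, p = {p_two:.4f}")
print(f"larger:    z = {z_one:.4f}, p = {p_one:.4f}")
```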

**Task 2.2.**

**Goal**: To see how sample size affects the test result.

**Task**: Imagine that we have only the first 5 records from group A and all records from group B. This could happen if we already have the students' results under the previous program, but only 5 students have completed the tests under the new program so far and we decided not to wait any longer.
Select the first 5 records for group A and run a t-test (see this [method](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html); the code below uses the equivalent `ttest_ind` from `statsmodels.stats.weightstats`, which by default also performs the pooled-variance Student's t-test). Print the p-value. What do you conclude about statistical significance at the 0.05 level for this experiment?

Here we use a t-test because one of the samples is very small, so we can no longer treat its standard deviation as known, as the Z-test assumes; the t-test accounts for the uncertainty in estimating it.
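
A short sketch of why shrinking group A matters, assuming the pooled-variance test used below. The t distribution has heavier tails than the standard normal, so it needs a slightly larger statistic to reach significance; with 43 degrees of freedom that effect is small. The larger effect here is on the standard error, because the $1/n_A$ term grows from 1/40 to 1/5.

```python
import numpy as np
from scipy import stats

alpha = 0.05
n_a_full, n_a_short, n_b = 40, 5, 40

# Two-sided critical values at alpha = 0.05
df = n_a_short + n_b - 2                      # degrees of freedom of the pooled t-test
z_crit = stats.norm.ppf(1 - alpha / 2)        # standard normal
t_crit = stats.t.ppf(1 - alpha / 2, df)       # t distribution with df degrees of freedom

# Factor that multiplies the pooled standard deviation in the standard error
se_factor_full = np.sqrt(1 / n_a_full + 1 / n_b)
se_factor_short = np.sqrt(1 / n_a_short + 1 / n_b)

print(f"z critical: {z_crit:.3f}, t critical (df={df}): {t_crit:.3f}")
print(f"SE factor with n_A=40: {se_factor_full:.3f}, with n_A=5: {se_factor_short:.3f}")
```

For the same pooled standard deviation, the standard error more than doubles when group A shrinks to 5 records.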

In [12]:
group_a_scores_short = group_a_scores[:5]

mean_A_short = np.mean(group_a_scores_short)
std_A_short = np.std(group_a_scores_short)
n_A_short = len(group_a_scores_short)

print(f"mean A short: {mean_A_short:.2f}, std A short: {std_A_short:.2f}, number of records A short: {n_A_short}")
print(f"mean B: {mean_B:.2f}, std B: {std_B:.2f}, number of records B: {n_B}")

mean A short: 76.44, std A short: 2.68, number of records A short: 5
mean B: 74.74, std B: 4.46, number of records B: 40


In [15]:
# statsmodels' ttest_ind returns (t statistic, p-value, degrees of freedom)
t_stat, p_value_ttest, df_ttest = ttest_ind(group_a_scores_short, group_b_scores, alternative='two-sided')
print(f'T-test: {t_stat:.4f}, p-value: {p_value_ttest:.4f}')

T-test: 0.8168, p-value: 0.4185


In [16]:
if p_value_ttest < alpha_2:
    print('H0 is rejected. Accept H1')
    print('Mean scores of group A and group B are significantly different')
else:
    print('H0 is not rejected')
    print('Mean scores of group A and group B are NOT significantly different')

H0 is not rejected
Mean scores of group A and group B are NOT significantly different


### Conclusion of HW 2.2
- Goal: understand how sample size affects the test result.
- Statistics:
    - Group A: mean = 76.44, std = 2.68, number of records = 5
    - Group B: mean = 74.74, std = 4.46, number of records = 40
- To compare the two samples, we formulate the hypotheses:
    - **Null hypothesis**: The mean scores of the two groups are equal
    - **Alternative hypothesis**: The mean scores of the two groups are different
- Because group A now has only 5 records, we use the t-test: T = 0.8168, p-value = 0.4185.

- The p-value is *greater* than the significance level (0.05), so we cannot reject the null hypothesis.
- Thus, after reducing group A to 5 records, we can no longer claim a statistically significant difference between the groups.
- Therefore, even when a difference exists (as the full-sample test in Task 2.1 suggested), a sample that is too small may not have enough statistical power to detect it, and the result can come out statistically insignificant.
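
One way to quantify this is statistical power: the probability that the test detects a true effect of a given size. A sketch using statsmodels' `TTestIndPower`, assuming a hypothetical standardized effect size of d = 0.5 chosen for illustration (it is not estimated from the data):

```python
from statsmodels.stats.power import TTestIndPower

alpha = 0.05
effect_size = 0.5  # hypothetical standardized mean difference (Cohen's d), illustration only

analysis = TTestIndPower()
# nobs1 = size of group A; ratio = nobs2 / nobs1 (group B always has 40 students)
power_short = float(analysis.power(effect_size=effect_size, nobs1=5, alpha=alpha, ratio=40 / 5))
power_full = float(analysis.power(effect_size=effect_size, nobs1=40, alpha=alpha, ratio=1.0))

print(f"power with n_A = 5:  {power_short:.2f}")
print(f"power with n_A = 40: {power_full:.2f}")
```

Under this assumption, the design with only 5 records in group A is much less likely to detect the same true difference.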

### Task 3: Z-test for two **related** samples

**Goal**: To test whether training has affected employee productivity when all employees have received the training.

**Assumption**: All employees receive the same training, and we want to find out whether their performance improved after it. Run a Z-test in Python, print the p-value, and conclude whether the employees' performance improved at the 0.05 significance level.

Note that these samples are related (each employee appears in both), so it would not be correct to run a two-sample Z-test on them as if they were independent. Instead, we test whether the mean of the per-employee differences in productivity is significantly greater than zero.
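
To see why the pairing matters, here is a sketch comparing the paired approach (a one-sample test on the differences) with a two-sample test that wrongly treats the before and after scores as independent groups. Both use the same mean difference, but the two-sample version divides it by a standard error dominated by the spread between employees, so it reports a much weaker result.

```python
import numpy as np
from statsmodels.stats.weightstats import ztest

before = np.array([57.82, 37.63, 36.8, 55.22, 52.97, 52.5, 53.46, 43.2, 52.32,
                   52.93, 42.86, 68.66, 54.74, 38.09, 56.57, 40.25, 57.87, 61.59,
                   41.79, 59.63, 54.13, 58.22, 68.97, 47.55, 42.46, 41.1, 41.84,
                   49.23, 53.41, 52.77])
after = np.array([62.47, 40.66, 42.7, 57.69, 61.41, 56.76, 54.75, 44.06, 56.29,
                  55.48, 47.28, 72.6, 57.59, 39.39, 56.54, 42.36, 62.58, 65.01,
                  42.3, 62.98, 57.9, 59.45, 72.28, 50.66, 43.18, 44.82, 45.96,
                  54.4, 58.52, 53.01])

# Paired approach: one-sample test on each employee's difference
z_paired, p_paired = ztest(after - before, value=0, alternative='larger')

# Incorrect for this design: treats before/after as two independent groups,
# so the large variation between employees inflates the standard error
z_indep, p_indep = ztest(after, before, value=0, alternative='larger')

print(f"paired:      z = {z_paired:.2f}, p = {p_paired:.2g}")
print(f"independent: z = {z_indep:.2f}, p = {p_indep:.2g}")
```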

In [17]:
before_training = [57.82, 37.63, 36.8 , 55.22, 52.97, 52.5 , 53.46, 43.2 , 52.32,
       52.93, 42.86, 68.66, 54.74, 38.09, 56.57, 40.25, 57.87, 61.59,
       41.79, 59.63, 54.13, 58.22, 68.97, 47.55, 42.46, 41.1 , 41.84,
       49.23, 53.41, 52.77]

after_training = [62.47, 40.66, 42.7 , 57.69, 61.41, 56.76, 54.75, 44.06, 56.29,
       55.48, 47.28, 72.6 , 57.59, 39.39, 56.54, 42.36, 62.58, 65.01,
       42.3 , 62.98, 57.9 , 59.45, 72.28, 50.66, 43.18, 44.82, 45.96,
       54.4 , 58.52, 53.01]

before = np.array(before_training)
after = np.array(after_training)
training_diff = after - before

training_diff

array([ 4.65,  3.03,  5.9 ,  2.47,  8.44,  4.26,  1.29,  0.86,  3.97,
        2.55,  4.42,  3.94,  2.85,  1.3 , -0.03,  2.11,  4.71,  3.42,
        0.51,  3.35,  3.77,  1.23,  3.31,  3.11,  0.72,  3.72,  4.12,
        5.17,  5.11,  0.24])

In [18]:
mean_before = np.mean(before)
mean_after = np.mean(after)
std_before = np.std(before)
std_after = np.std(after)

mean_training_diff = np.mean(training_diff)
mean_training_diff_null = 0  # H0: no difference
std_training_diff = np.std(training_diff)
n_training_diff = len(training_diff)
alpha_3 = 0.05
print(f"mean diff: {mean_training_diff:.2f}, std diff: {std_training_diff:.2f}, number of records: {n_training_diff}")
print(f"mean before: {mean_before:.2f}, std before: {std_before:.2f}")
print(f"mean after: {mean_after:.2f}, std after: {std_after:.2f}")

mean diff: 3.15, std diff: 1.86, number of records: 30
mean before: 50.89, std before: 8.58
mean after: 54.04, std after: 8.96


In [19]:
z_test_3, p_value_3 = ztest(training_diff, value=mean_training_diff_null, alternative='larger')
print(f'Z-test: {z_test_3:.4f}, p-value: {p_value_3:.4f}')

Z-test: 9.1389, p-value: 0.0000


In [20]:
if p_value_3 < alpha_3:
    print('H0 is rejected. Accept H1')
    print('Mean difference is significantly greater than 0')
else:
    print('H0 is not rejected')
    print('Mean difference is NOT significantly greater than 0')

H0 is rejected. Accept H1
Mean difference is significantly greater than 0


### Conclusion of HW 3
- Goal: check whether the training affected employee productivity, i.e. whether the employees' performance improved.
- Because both measurements come from the same employees, they are not independent. To handle this, we reformulate the task: we compute each employee's difference in performance (after training minus before training) and test whether the mean difference is greater than zero.
- Statistics:
    - Before training: mean = 50.89, std = 8.58
    - After training: mean = 54.04, std = 8.96
    - Difference: mean = 3.15, std = 1.86
- Hypothetical mean difference = 0 (no change in performance between before and after)
- To check whether performance improved significantly, we use the setting `alternative='larger'`:
    - **Null hypothesis**: The mean difference in performance is equal to or less than 0
    - **Alternative hypothesis**: The mean difference in performance is greater than 0

- The Z-test gives Z = 9.1389 and p-value < 0.0001 (printed as 0.0000) at a significance level of 0.05.
- The p-value is *less* than the significance level, so we reject the null hypothesis in favor of the alternative.
- In conclusion, performance after training is significantly higher than before: the training had a statistically significant positive effect on productivity.
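
Beyond the reject/do-not-reject decision, a confidence interval indicates how large the improvement plausibly is. A sketch of a two-sided 95% normal-approximation (Z) interval for the mean difference, built from the differences printed in Task 3 and using `ddof=1` as `ztest` does; an interval lying entirely above 0 agrees with the one-sided test's conclusion.

```python
import numpy as np
from scipy.stats import norm

# Per-employee differences (after - before), as printed in Task 3
training_diff = np.array([4.65, 3.03, 5.9, 2.47, 8.44, 4.26, 1.29, 0.86, 3.97,
                          2.55, 4.42, 3.94, 2.85, 1.3, -0.03, 2.11, 4.71, 3.42,
                          0.51, 3.35, 3.77, 1.23, 3.31, 3.11, 0.72, 3.72, 4.12,
                          5.17, 5.11, 0.24])

alpha = 0.05
n = training_diff.size
mean_diff = training_diff.mean()
se = training_diff.std(ddof=1) / np.sqrt(n)   # standard error of the mean difference
z_crit = norm.ppf(1 - alpha / 2)              # two-sided critical value (about 1.96)

ci_low = mean_diff - z_crit * se
ci_high = mean_diff + z_crit * se
print(f"mean improvement: {mean_diff:.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```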