# Comparing 2 groups of performance measurements

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy.stats import mannwhitneyu

## Tip! 

If you want to do performance measurements in C#, have a look at [BenchmarkDotNet](https://benchmarkdotnet.org/index.html).

Let's say we have two groups of performance measurements, each with 25 observations:

In [None]:
perfMeasurement1 = np.array([27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28, 19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27])
perfMeasurement2 = np.array([21, 22, 15, 12, 21, 21, 19, 26, 22, 24, 17, 23, 19, 22, 20, 24, 18, 13, 29, 21, 19, 14, 23, 17, 20])

In [None]:
data = pd.DataFrame(
    {"perfMeasurement1": perfMeasurement1, "perfMeasurement2": perfMeasurement2}
)

In [None]:
data.plot(kind="box", title="boxplot");

Please be aware that there are more accurate ways of visualizing this, but those plots require a bit more insight into probability distributions, so we are using the boxplot because it is easier to understand.

## Statistical test

### Student's t-test

- Prerequisites for using test
- Calculate test statistic (using some formula, and determine the degrees of freedom)
- P-value
- Null hypothesis
- Significant
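
As a rough illustration of the steps listed above, here is a minimal sketch using SciPy's `ttest_ind`. The arrays `group_a` and `group_b` are copies of the two measurement groups from this notebook. The choice of Welch's variant (`equal_var=False`) and the 5% significance level are assumptions made for this example, not something the notebook prescribes.

```python
import numpy as np
from scipy.stats import ttest_ind

# Copies of the two groups of measurements used in this notebook
group_a = np.array([27, 20, 21, 26, 27, 31, 24, 21, 20, 19, 23, 24, 28,
                    19, 24, 29, 18, 20, 17, 31, 20, 25, 28, 21, 27])
group_b = np.array([21, 22, 15, 12, 21, 21, 19, 26, 22, 24, 17, 23, 19,
                    22, 20, 24, 18, 13, 29, 21, 19, 14, 23, 17, 20])

# Null hypothesis: both groups have the same mean.
# equal_var=False selects Welch's t-test, which does not assume equal variances
# (an assumption made for this sketch).
t_stat, p_value = ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # significance level, chosen for illustration
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("significant" if p_value < alpha else "not significant")
```

The t-test still relies on the measurements being roughly normally distributed, which is one of the prerequisites the first bullet refers to.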


Let's move on by assuming that it DOES NOT make a difference whether a measurement is in one group or the other. If that is the case, then we should be able to shuffle the measurements between the groups without any impact on the difference between the group means.

<video width="640" height="480" 
       src="./sample_640x360.mp4"  
       controls>
</video>

In [None]:
mean = data.mean()
mean

In [None]:
mean = mean.iloc[0] - mean.iloc[1]
mean

In [None]:
allPerfMeasurements = np.concatenate((perfMeasurement1, perfMeasurement2))

In [None]:
n_halfPerfMeasurements = len(allPerfMeasurements) // 2

In [None]:
simulated_means = []
samples = 100000

In [None]:
for _ in range(samples):
    # Shuffle the pooled measurements in place; the two halves then act as random groups
    np.random.shuffle(allPerfMeasurements)

    simulated_means.append(
        np.mean(allPerfMeasurements[:n_halfPerfMeasurements])
        - np.mean(allPerfMeasurements[n_halfPerfMeasurements:])
    )

In [None]:
two_std = 2.0 * np.std(simulated_means, ddof=1)

In [None]:
plt.figure(1, figsize=(16, 8))
plt.hist(simulated_means, bins=300)
_ = plt.axvline(x=mean, ymax=1, linewidth=1.5, color="red")
_ = plt.axvline(x=two_std, ymax=1, linewidth=1.5, color="black")

In [None]:
len(np.where(simulated_means >= mean)[0])

In [None]:
obs = len(np.where(simulated_means >= mean)[0]) / samples
obs
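
The cells above can also be wrapped into a single reusable function. The following is a minimal sketch, not the notebook's own code: `permutation_p_value` is a hypothetical helper name, it uses a seeded NumPy `Generator` so the result is reproducible, and it counts both tails (a two-sided p-value), whereas the cells above only count shuffles on the upper side.

```python
import numpy as np


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for the difference in means (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate((a, b))
    n_a = len(a)

    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Randomly reassign the pooled measurements to two groups of the original sizes
        shuffled = rng.permutation(pooled)
        diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()

    # Count shuffles whose difference is at least as extreme as the observed one
    extreme = np.sum(np.abs(diffs) >= abs(observed))
    # Adding 1 to numerator and denominator keeps the estimate away from exactly 0
    return (extreme + 1) / (n_resamples + 1)


# Identical groups: every shuffle is at least as extreme as a difference of 0
p_same = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
# Clearly separated groups: practically no shuffle reproduces the observed gap
p_diff = permutation_p_value(np.arange(20), np.arange(20) + 100)
print(p_same, p_diff)
```

Calling the function with `perfMeasurement1` and `perfMeasurement2` gives a two-sided counterpart of the value computed above.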

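We are on solid ground: this approach is not just something I came up with. More generally, it belongs to a branch of statistics called non-parametric statistics, which is characterized by making very few assumptions about the data.

The shuffling approach above is known as a permutation test (also called a randomization test). A closely related non-parametric test is the Mann-Whitney U test, which compares the ranks of the measurements instead of their raw values and is available in SciPy: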


In [None]:
alpha = 0.05
stat, p = mannwhitneyu(perfMeasurement1, perfMeasurement2)
print("Statistics=%.3f, p=%.3f" % (stat, p))

# interpret
if p > alpha:
    print("Same distribution (fail to reject H0)")
else:
    print("Different distribution (reject H0)")
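
To make it clearer what the Mann-Whitney U test measures, here is a minimal sketch of how the U statistic can be computed from ranks. `mann_whitney_u` is a hypothetical helper written for illustration. It only computes the statistic, not the p-value, and it relies on `scipy.stats.rankdata` assigning average ranks to ties.

```python
import numpy as np
from scipy.stats import rankdata


def mann_whitney_u(a, b):
    """Return the U statistics of both groups (hypothetical helper, no p-value)."""
    a = np.asarray(a)
    b = np.asarray(b)
    n_a, n_b = len(a), len(b)

    # Rank all measurements together; tied values receive their average rank
    ranks = rankdata(np.concatenate((a, b)))
    rank_sum_a = ranks[:n_a].sum()

    # U for group a: its rank sum minus the smallest rank sum it could possibly have
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    # The two U statistics always add up to the number of (a, b) pairs
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Every value in the first group is smaller than every value in the second,
# so no pair has the first group's value on top
u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_a, u_b)
```

A U value close to 0 or close to the total number of pairs means one group almost always outranks the other, while a value near half the number of pairs is what we would expect if the group labels made no difference.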