## The goal
The **goal** of our experiments is to determine the coefficient $a_{t_i t_j}$ that quantifies the impact of type $t_i$ on type $t_j$.

We run the experiment and calculate the coefficient. Then we run it again and get a slightly different result. Repeating this procedure amounts to sampling from a **distribution** of coefficients.

### Is this distribution normal?
This distribution is not necessarily normal, but we suspect that it is, because each coefficient is calculated from latency/throughput values averaged over time intervals, and averages of many observations tend toward a normal distribution (central limit theorem).

Luckily, this can be checked with a large enough sample using a normality test: https://en.wikipedia.org/wiki/Normality_test
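
As a minimal sketch of such a check, the Shapiro-Wilk test from `scipy.stats` can be applied to a list of coefficients. The helper name `looks_normal` and the significance level `alpha=0.05` are assumptions made for illustration. With only a handful of samples the test has little power, so a non-rejection should be read cautiously.

```python
import numpy as np
import scipy.stats


def looks_normal(samples, alpha=0.05):
    """Shapiro-Wilk test; returns (p_value, True if normality is NOT rejected)."""
    a = np.asarray(samples, dtype=float)
    # Null hypothesis: the samples were drawn from a normal distribution.
    _, p_value = scipy.stats.shapiro(a)
    return float(p_value), bool(p_value > alpha)


# Example values (the same as the measurements used further below).
coefficients = [0.512, 0.534, 0.491, 0.522, 0.481]
p_value, ok = looks_normal(coefficients)
print(f'p-value: {p_value:.3f}, normality not rejected: {ok}')
```

A small p-value is evidence against normality; a large one only means the data are compatible with it.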

### How to determine one coefficient from a list of samples?
This is an important conceptual question. We want a single coefficient that best reflects the impact of one type on the other.

One possible approach is to simply take the **average coefficient**:
* The average coefficient is a good fit for the largest number of cases, so in total we gain the most.
* We already use average values to calculate the coefficient in every experiment.
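
One way to make the first point concrete, under the assumption that the "gain" is measured by squared error, is that the mean is the single value minimizing the sum of squared deviations from the samples. The sketch below checks this numerically; the sample values and the helper name `squared_error` are illustrative only.

```python
import numpy as np


def squared_error(samples, candidate):
    """Sum of squared deviations of the samples from a candidate coefficient."""
    a = np.asarray(samples, dtype=float)
    return float(np.sum((a - candidate) ** 2))


samples = [0.512, 0.534, 0.491, 0.522, 0.481]  # illustrative values
mean = float(np.mean(samples))

# Compare the mean against a grid of alternative candidates.
candidates = np.linspace(min(samples), max(samples), 101)
best = min(candidates, key=lambda c: squared_error(samples, c))
print(f'mean = {mean:.4f}, best candidate on the grid = {best:.4f}')
```

Under a different notion of gain (for example absolute error) another statistic, such as the median, would be the natural choice.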

### How many samples are necessary?
We want to be as accurate as possible, but each experiment takes a long time to run.

If we use the average coefficient, we can use interval estimation to calculate a confidence interval at a chosen confidence level, using the following formula:

$ P\left({\overline {X}}-u_{\alpha }{\frac {\sigma }{\sqrt {n}}}<m<{\overline {X}}+u_{\alpha }{\frac {\sigma }{\sqrt {n}}}\right)=1-\alpha $

where:
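* $n$ – sample size,
* $m$ – the true mean that we are estimating,
* $\overline {X}$ – average of the sample,
* $\sigma$ – standard deviation (estimated from the sample when the true value is unknown),
* $u_{\alpha }$ – a statistic satisfying $P(-u_{\alpha }<U<u_{\alpha })=1-\alpha$, where $U \sim N(0,1)$.

For small samples with $\sigma$ estimated from the data, $u_{\alpha }$ is replaced by the corresponding quantile of Student's t-distribution with $n-1$ degrees of freedom, which is what the code below does.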

Find more: https://en.wikipedia.org/wiki/Confidence_interval
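
The formula above can be inverted to get a rough estimate of the number of runs needed for a target half-width $E$ of the interval: $n \ge (u_{\alpha}\sigma/E)^2$. The sketch below assumes $\sigma$ is known or estimated from a pilot run and uses the normal quantile, so it is optimistic for very small $n$. The function name and the example numbers are hypothetical.

```python
import math

import scipy.stats


def required_sample_size(sigma, half_width, confidence=0.95):
    """Smallest n with u_alpha * sigma / sqrt(n) <= half_width (normal approximation)."""
    # u_alpha satisfies P(-u_alpha < U < u_alpha) = confidence for U ~ N(0, 1).
    u = scipy.stats.norm.ppf((1 + confidence) / 2)
    return math.ceil((u * sigma / half_width) ** 2)


# Hypothetical pilot estimate: sigma = 0.02, target half-width = 0.01.
print(required_sample_size(sigma=0.02, half_width=0.01, confidence=0.95))  # 16
```

Halving the target half-width roughly quadruples the number of runs, which is worth weighing against the cost of each experiment.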

In [None]:
import numpy as np
import scipy.stats


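def mean_confidence_interval(data, confidence=0.95):
    """Return the sample mean and the half-width of its confidence interval."""
    a = np.asarray(data, dtype=float)
    n = len(a)
    # Sample mean and standard error of the mean (sem uses ddof=1 by default).
    m, se = np.mean(a), scipy.stats.sem(a)
    # Two-sided Student's t quantile with n - 1 degrees of freedom.
    h = se * scipy.stats.t.ppf((1 + confidence) / 2, n - 1)
    return m, h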


def print_confidence_intervals(measurements):
    confidences = [0.9, 0.95, 0.99, 0.999]
    print(f'Measurements: {measurements}')
    
    for confidence in confidences:
        m, h = mean_confidence_interval(measurements, confidence=confidence)
        print(f'Mean is between {m-h:.4} and {m+h:.4} (±{h/m*100:.4}%) with confidence {confidence}')
    print('')

        
measurements = [0.512, 0.534, 0.491]
print_confidence_intervals(measurements)


measurements = [0.512, 0.534, 0.491, 0.522]
print_confidence_intervals(measurements)
    
measurements = [0.512, 0.534, 0.491, 0.522, 0.481]
print_confidence_intervals(measurements)

Measurements: [0.512, 0.534, 0.491]
Mean is between 0.4761 and 0.5486 (±7.075%) with confidence 0.9
Mean is between 0.4589 and 0.5657 (±10.43%) with confidence 0.95
Mean is between 0.3891 and 0.6355 (±24.05%) with confidence 0.99
Mean is between 0.1201 and 0.9046 (±76.57%) with confidence 0.999

Measurements: [0.512, 0.534, 0.491, 0.522]
Mean is between 0.4933 and 0.5362 (±4.163%) with confidence 0.9
Mean is between 0.4858 and 0.5437 (±5.629%) with confidence 0.95
Mean is between 0.4616 and 0.5679 (±10.33%) with confidence 0.99
Mean is between 0.3971 and 0.6324 (±22.86%) with confidence 0.999

Measurements: [0.512, 0.534, 0.491, 0.522, 0.481]
Mean is between 0.4872 and 0.5288 (±4.097%) with confidence 0.9
Mean is between 0.4809 and 0.5351 (±5.335%) with confidence 0.95
Mean is between 0.4631 and 0.5529 (±8.848%) with confidence 0.99
Mean is between 0.4239 and 0.5921 (±16.55%) with confidence 0.999



We can see that the interval narrows significantly as the sample size increases.
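
Two effects drive this narrowing: the standard error shrinks as $1/\sqrt{n}$, and the Student's t quantile drops quickly as the degrees of freedom grow. As a rough illustration, the sketch below prints the half-width in units of the sample standard deviation $s$ for several sample sizes. The helper name `relative_half_width` is an assumption for this example.

```python
import math

import scipy.stats


def relative_half_width(n, confidence=0.95):
    """Half-width of the t-based interval divided by the sample standard deviation."""
    t_quantile = scipy.stats.t.ppf((1 + confidence) / 2, n - 1)
    return float(t_quantile / math.sqrt(n))


for n in range(2, 11):
    print(f'n = {n:2d}: half-width = {relative_half_width(n):.3f} * s')
```

The largest gains come from the first few additional runs; beyond that the improvement approaches the slower $1/\sqrt{n}$ rate.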

Here are some linear regression coefficients obtained by adding instances of `redis_ycsb` on `naan`: `[0.01432621438691767, 0.014196753917670415, 0.014124212515962489, 0.014220166651701715]`. Let's see what our confidence intervals look like:

In [58]:
measurements = [0.01432621438691767, 0.014196753917670415, 0.014124212515962489, 0.014220166651701715]

# Print the intervals for the first 2, 3 and 4 measurements.
for n in range(2, len(measurements) + 1):
    print_confidence_intervals(measurements[:n])

Measurements: [0.01432621438691767, 0.014196753917670415]
Mean is between 0.01385 and 0.01467 (±2.866%) with confidence 0.9
Mean is between 0.01344 and 0.01508 (±5.767%) with confidence 0.95
Mean is between 0.01014 and 0.01838 (±28.89%) with confidence 0.99
Mean is between -0.02695 and 0.05547 (±288.9%) with confidence 0.999

Measurements: [0.01432621438691767, 0.014196753917670415, 0.014124212515962489]
Mean is between 0.01404 and 0.01439 (±1.214%) with confidence 0.9
Mean is between 0.01396 and 0.01447 (±1.788%) with confidence 0.95
Mean is between 0.01363 and 0.0148 (±4.125%) with confidence 0.99
Mean is between 0.01235 and 0.01608 (±13.13%) with confidence 0.999

Measurements: [0.01432621438691767, 0.014196753917670415, 0.014124212515962489, 0.014220166651701715]
Mean is between 0.01412 and 0.01432 (±0.6918%) with confidence 0.9
Mean is between 0.01408 and 0.01435 (±0.9355%) with confidence 0.95
Mean is between 0.01397 and 0.01446 (±1.717%) with confidence 0.99
Mean is between 0.01

### Does machine configuration impact the coefficient?
This is easy to check experimentally: run the tests on different machine configurations and check whether the resulting coefficients differ significantly.
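
One possible way to judge such a difference is Welch's t-test on the coefficients from two configurations. In the sketch below the first list reuses the `redis_ycsb` coefficients from above, while the second list, the helper name `coefficients_differ`, and the threshold `alpha=0.05` are hypothetical and serve only to show the mechanics.

```python
import scipy.stats


def coefficients_differ(sample_a, sample_b, alpha=0.05):
    """Welch's t-test; returns (p_value, True if the means differ at level alpha)."""
    # equal_var=False selects Welch's variant, which does not assume equal variances.
    result = scipy.stats.ttest_ind(sample_a, sample_b, equal_var=False)
    return float(result.pvalue), bool(result.pvalue < alpha)


config_a = [0.01432621438691767, 0.014196753917670415,
            0.014124212515962489, 0.014220166651701715]
# Hypothetical coefficients for a second machine configuration.
config_b = [0.0163, 0.0161, 0.0162, 0.0164]

p_value, differ = coefficients_differ(config_a, config_b)
print(f'p-value: {p_value:.2g}, coefficients differ: {differ}')
```

The test assumes roughly normal samples, so it fits best when the normality check discussed earlier does not reject.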