In [19]:
import pandas as pd         # Import and manipulate our CSV files
import scipy.stats as stats # Perform statistical tests


In [20]:
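# Simple helper to summarize the result of a t-test

def test_results(p_val, t_stat):
    """Print a short summary of a t-test result.

    Reports whether the p-value is below 0.05, whether the t-statistic
    is positive or negative, and whether |t_stat| exceeds 2 (a rough
    rule of thumb for significance at the 5% level).

    The wording assumes a one-sample test. For two-sample tests, read
    "sample mean" and "population mean" as the first and second group.
    """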
    significance = "significant" if p_val < 0.05 else "not significant"
    mean_comparison = "greater" if t_stat > 0 else "less"
    t_stat_significance = "significant" if abs(t_stat) > 2 else "not significant"
    
    out_str = (
        f"P-value: {p_val}\n"
        f"T-statistic: {t_stat}\n"
        f"The test is {significance}.\n"
        f"The sample mean is {mean_comparison} than the population mean.\n"
        f"The t-statistic is {t_stat_significance}."
    )
    
    print(out_str)

In [21]:
# Load the data
data = pd.read_csv("data.csv")
data.head()


Unnamed: 0,value1,value2
0,1.764052,2.383151
1,0.400157,-0.847759
2,0.978738,-0.770485
3,2.240893,1.469397
4,1.867558,-0.673123


## One-Sample T-Test

A one-sample t-test determines whether a sample mean differs significantly from a known or hypothesized population mean. The null hypothesis is that the population the sample was drawn from has a mean equal to the hypothesized value. The alternative hypothesis is that the population mean differs from that value.
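To make the definition concrete, the sketch below computes the one-sample t-statistic by hand as (sample mean − hypothesized mean) / (sample standard deviation / √n) and compares it with `stats.ttest_1samp`. The data and the hypothesized mean `mu0` are made up for illustration and are not the notebook's `data.csv`.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data: 100 draws from a standard normal distribution
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=100)
mu0 = 0.5  # hypothesized population mean (an assumption for this example)

n = sample.size
# Standard error of the mean uses the sample standard deviation (ddof=1)
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - mu0) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's built-in test should agree with the manual calculation
t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)

print(f"manual: t={t_manual:.4f}, p={p_manual:.4g}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4g}")
```

The degrees of freedom, `n - 1`, determine which t distribution the p-value is read from.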

In [22]:
# Calculate the mean of value2; it serves as the hypothesized
# population mean for our one-sample t-test
mean_value2 = data['value2'].mean()
mean_value2

0.5820129707478374

In [23]:
# Perform the test, using `mean_value2` as the hypothesized population mean
t_statistic, p_value = stats.ttest_1samp(data['value1'], mean_value2)
# Degrees of freedom for a one-sample t-test: n - 1
df = len(data['value1']) - 1

test_results(p_value, t_statistic)

P-value: 1.2984708779004685e-06
T-statistic: -5.155238846163232
The test is significant.
The sample mean is less than the population mean.
The t-statistic is significant.


## Independent T-Test

An independent t-test determines whether the means of two independent groups differ significantly. The null hypothesis is that the two population means are equal. The alternative hypothesis is that they differ. By default, `stats.ttest_ind` assumes the two groups have equal variances (Student's t-test).
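As a rough illustration, the sketch below computes the pooled-variance (Student's) t-statistic by hand and checks it against `stats.ttest_ind`. It also runs Welch's variant with `equal_var=False`, which drops the equal-variance assumption. The two groups are synthetic and chosen for this example only.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical groups with different sizes and spreads
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=1.0, size=80)
group_b = rng.normal(loc=0.3, scale=2.0, size=120)

n_a, n_b = group_a.size, group_b.size

# Pooled variance: weighted average of the two sample variances
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)

# Student's t-statistic for the difference in means
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(
    pooled_var * (1 / n_a + 1 / n_b)
)

# SciPy's default (equal_var=True) matches the pooled calculation
t_student, p_student = stats.ttest_ind(group_a, group_b)

# Welch's t-test does not assume equal variances
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"manual:  t={t_manual:.4f}")
print(f"student: t={t_student:.4f}, p={p_student:.4g}")
print(f"welch:   t={t_welch:.4f}, p={p_welch:.4g}")
```

When the group variances or sizes differ noticeably, Welch's variant is often the safer choice; which one suits the notebook's data is a judgment call that the source does not address.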

In [26]:
# perform the t-test
t_stat, p_val = stats.ttest_ind(data['value1'], data['value2'])
test_results(p_val, t_stat)


P-value: 0.00040627960203625224
T-statistic: -3.597192759749613
The test is significant.
The sample mean is less than the population mean.
The t-statistic is significant.


## Paired Samples T-Test

A paired samples t-test determines whether the mean difference between two related sets of measurements (for example, the same subjects measured twice) differs significantly from zero. The null hypothesis is that the mean of the paired differences is zero. The alternative hypothesis is that it is not zero.
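One way to see what the paired test does: it is equivalent to a one-sample t-test on the pairwise differences against a mean of zero. The sketch below checks this on synthetic "before" and "after" measurements, which are invented for this example.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical paired measurements on the same 50 subjects
rng = np.random.default_rng(2)
before = rng.normal(loc=10.0, scale=2.0, size=50)
after = before + rng.normal(loc=0.5, scale=1.0, size=50)

# Paired samples t-test on the two related samples
t_rel, p_rel = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the differences, against zero
differences = before - after
t_diff, p_diff = stats.ttest_1samp(differences, 0.0)

print(f"ttest_rel:   t={t_rel:.4f}, p={p_rel:.4g}")
print(f"ttest_1samp: t={t_diff:.4f}, p={p_diff:.4g}")
```

Because the test works on differences, both samples must have the same length and be aligned row by row.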

In [27]:
# perform the t-test
t_stat, p_val = stats.ttest_rel(data['value1'], data['value2'])
test_results(p_val, t_stat)

P-value: 0.00023576268939936413
T-statistic: -3.816643761053478
The test is significant.
The sample mean is less than the population mean.
The t-statistic is significant.
