### T test

```python
import pandas as pd
from scipy import stats

# Sample data
data = {'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Split the values by group
group_a = df.loc[df['Group'] == 'A', 'Value']
group_b = df.loc[df['Group'] == 'B', 'Value']

# Perform the independent two-sample t-test (equal variances assumed by default)
t_stat, p_val = stats.ttest_ind(group_a, group_b)

# Print the results
print("T-statistic: ", t_stat)
print("P-value: ", p_val)
```

```
T-statistic:  -5.0
P-value:  0.001052825793366539
```


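Finally, we print the t-statistic and the p-value. The t-statistic measures how far apart the two group means are relative to the variability within the groups, and the p-value is the probability of observing a difference at least this large if the two population means were actually equal (the null hypothesis). If the p-value is below a chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that the group means differ significantly; otherwise, the data do not provide enough evidence of a difference. Here p ≈ 0.001, so the difference between groups A and B is significant at the 0.05 level.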
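To make the t-statistic less of a black box, the sketch below recomputes the equal-variance (Student's) t-test by hand from the pooled variance and checks it against `stats.ttest_ind`. It reuses the same two groups; the variable names are illustrative. It also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption.

```python
import numpy as np
from scipy import stats

# Same two groups as in the t-test cell
a = np.array([1, 2, 3, 4, 5], dtype=float)
b = np.array([6, 7, 8, 9, 10], dtype=float)
n_a, n_b = len(a), len(b)

# Pooled variance: weighted average of the two sample variances (ddof=1)
pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)

# Standard error of the difference in means under the equal-variance assumption
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# t-statistic and two-sided p-value from a t distribution with n_a + n_b - 2 df
t_manual = (a.mean() - b.mean()) / se
df_manual = n_a + n_b - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

# Compare with SciPy's Student's t-test
t_scipy, p_scipy = stats.ttest_ind(a, b)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)

# Welch's t-test does not assume equal variances
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)
print("welch: ", t_welch, p_welch)
```

With equal group sizes and equal sample variances, as here, Welch's test gives the same result; when the variances differ the two can disagree, and Welch's version is often recommended as the default.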

### Z test

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample data
data = {'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# The hypothesized population mean
population_mean = 5

# Perform a one-sample Z-test. The population standard deviation is not
# given, so the sample standard deviation (ddof=1) is used in its place.
n = len(df['Value'])
standard_error = df['Value'].std(ddof=1) / np.sqrt(n)
z_stat = (df['Value'].mean() - population_mean) / standard_error

# Two-sided p-value from the standard normal distribution
p_val = 2 * stats.norm.sf(abs(z_stat))

# Print the results
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_val:.4f}")
```

```
Z-statistic: 0.5222
P-value: 0.6015
```


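Finally, we print the Z-statistic and the p-value. The Z-statistic expresses how many standard errors the sample mean lies from the hypothesized population mean, and the p-value is the probability of a result at least this extreme if the population mean really were 5. A p-value below the chosen significance level (e.g., 0.05) indicates that the sample mean differs significantly from the population mean; a larger p-value means the data do not provide evidence of a difference. Note that a Z-test assumes the population standard deviation is known or the sample is large; with only 10 observations and an estimated standard deviation, a one-sample t-test (`stats.ttest_1samp`) is usually the more appropriate choice.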
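When the standard deviation is estimated from the sample, the Z-test and the one-sample t-test use the same statistic and differ only in the reference distribution (standard normal versus t with n - 1 degrees of freedom). The sketch below computes both p-values for the same data; it assumes that the sample standard deviation (ddof=1) stands in for the unknown population standard deviation.

```python
import numpy as np
from scipy import stats

values = np.arange(1, 11, dtype=float)  # same data as the Z-test cell: 1..10
population_mean = 5
n = len(values)

# ASSUMPTION: the sample standard deviation replaces the unknown population value
se = values.std(ddof=1) / np.sqrt(n)
stat = (values.mean() - population_mean) / se

# Z-test: compare the statistic with the standard normal distribution
p_z = 2 * stats.norm.sf(abs(stat))

# One-sample t-test: same statistic, compared with a t distribution (n - 1 df)
t_res = stats.ttest_1samp(values, population_mean)
p_t = t_res.pvalue

print(f"statistic: {stat:.4f} (ttest_1samp: {t_res.statistic:.4f})")
print(f"p-value, normal reference: {p_z:.4f}")
print(f"p-value, t reference (df={n - 1}): {p_t:.4f}")
```

Because the t distribution has heavier tails, the t-test reports a larger p-value for the same statistic; the gap shrinks as the sample size grows.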

### ANOVA

```python
import pandas as pd
from scipy import stats

# Sample data
data = {'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Collect the values of each group
samples = [df.loc[df['Group'] == g, 'Value'] for g in ['A', 'B', 'C']]

# Perform the one-way ANOVA test
f_val, p_val = stats.f_oneway(*samples)

# Print the results
print("F-value: ", f_val)
print("P-value: ", p_val)
```

```
F-value:  50.0
P-value:  1.5127924217375409e-06
```


Finally, we print the F-value and the p-value. The F-value is the ratio of the variance between the group means to the variance within the groups. A small p-value (below the chosen significance level, e.g., 0.05) indicates that at least one group mean differs from the others; ANOVA alone does not say which groups differ, which requires a post-hoc test such as Tukey's HSD. A large p-value means the data do not provide evidence that the group means differ.
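The F-value can be reproduced from its definition: the between-group mean square divided by the within-group mean square. The sketch below does this for the same three groups and checks the result against `stats.f_oneway`; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Same three groups as in the ANOVA cell
groups = [np.array([1, 2, 3, 4, 5], dtype=float),
          np.array([6, 7, 8, 9, 10], dtype=float),
          np.array([11, 12, 13, 14, 15], dtype=float)]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares use k - 1 and n_total - k degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```

Here the group means (3, 8, 13) are far apart relative to the small spread inside each group, which is why F is large and the p-value tiny.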

### Correlation test

- Categorical variables: chi-square test
    - Start with a cross table (contingency table)
- Continuous variables: Pearson correlation test
    - Start with a scatter plot

```python
import pandas as pd

# Sample data
data = {'X': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Y': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Calculate the (Pearson) correlation between X and Y
correlation = df['X'].corr(df['Y'])

# Print the results
print("Correlation: ", correlation)
```

```
Correlation:  0.9999999999999999
```


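Finally, we print the correlation coefficient (Pearson's r, the default method of `Series.corr`), which measures the strength and direction of the linear relationship between the two variables. A value of 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and a value close to 0 little or no linear relationship. Here Y = 2X exactly, so r is 1; the printed 0.9999999999999999 is floating-point rounding. Note that `corr` returns only the coefficient, not a p-value.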
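For an actual correlation *test*, `scipy.stats.pearsonr` returns both r and a p-value for the null hypothesis of no linear correlation. The sketch below runs it on the same data and, as a cross-check, also computes r from its definition: the covariance divided by the product of the standard deviations.

```python
import numpy as np
from scipy import stats

x = np.arange(1, 11, dtype=float)
y = 2 * x  # same data as the correlation cell: Y = 2X

# Pearson's r from its definition (sample covariance / product of sample SDs)
r_manual = np.cov(x, y, ddof=1)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))

# pearsonr returns the coefficient and a two-sided p-value
r, p = stats.pearsonr(x, y)

print("r (definition):", r_manual)
print("r (pearsonr):  ", r)
print("p-value:       ", p)
```

Because the relationship is perfectly linear, the p-value is essentially 0; with real, noisy data it is the p-value, not r alone, that tells whether the correlation is statistically significant.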

### Chi-Square Test

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Sample data
data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
        'Smoker': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No']}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Perform the chi-square test on the cross table of the two variables
chi2, p, dof, expected = chi2_contingency(pd.crosstab(df['Gender'], df['Smoker']))

# Print the results
print("Chi-Square Statistic: ", chi2)
print("P-value: ", p)
print("Degrees of Freedom: ", dof)
```

```
Chi-Square Statistic:  0.0
P-value:  1.0
Degrees of Freedom:  1
```


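Finally, we print the chi-square statistic, the p-value, and the degrees of freedom. The null hypothesis is that the two variables are independent. If the p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that there is a significant association between the two variables. Here the p-value is 1.0, so this sample gives no evidence of an association between gender and smoking. The statistic is exactly 0 because `chi2_contingency` applies Yates' continuity correction by default when there is one degree of freedom, and with only six observations the expected counts are far below 5, so the chi-square approximation is unreliable for data this small.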
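The sketch below, on the same six observations, shows the effect of Yates' correction (`correction=False` turns it off) and prints the expected counts, all of which are 1.5. For small 2x2 tables like this one, Fisher's exact test (`scipy.stats.fisher_exact`) is a common alternative because it does not rely on a large-sample approximation.

```python
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact

data = {'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
        'Smoker': ['Yes', 'No', 'No', 'Yes', 'Yes', 'No']}
df = pd.DataFrame(data)
table = pd.crosstab(df['Gender'], df['Smoker'])

# Default: Yates' continuity correction is applied because dof == 1 (2x2 table)
chi2_c, p_c, dof, expected = chi2_contingency(table)
# Same test without the correction
chi2_u, p_u, _, _ = chi2_contingency(table, correction=False)

print(table)
print("Expected counts:\n", expected)
print("With Yates correction:   ", chi2_c, p_c)
print("Without Yates correction:", chi2_u, p_u)

# Fisher's exact test computes the p-value exactly from the 2x2 table
odds_ratio, p_fisher = fisher_exact(table.to_numpy())
print("Fisher's exact test: odds ratio", odds_ratio, "p-value", p_fisher)
```

Without the correction the statistic is about 0.67 rather than 0, but none of the three tests comes close to significance with only six observations.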