In [1]:
# ANOVA test

# Suppose we have the following dataset consisting of the grades of students in three different schools:

school1 = [85, 75, 90, 80, 95]
school2 = [70, 80, 75, 85, 80]
school3 = [90, 85, 95, 80, 85]

We want to determine whether there is a significant difference in the average grades of the three schools.

To perform an ANOVA test, we can use the f_oneway function from the scipy.stats module. Here's how to do it:

In [2]:
import scipy.stats as stats

# Collect the three groups in a list so they can be unpacked as separate arguments
data = [school1, school2, school3]

# Perform the ANOVA test
f_statistic, p_value = stats.f_oneway(*data)

print("F-Statistic:", f_statistic)
print("P-Value:", p_value)

F-Statistic: 2.627450980392157
P-Value: 0.11313904276130801


The output shows the F-statistic and p-value for the ANOVA test. The F-statistic is the ratio of the variation between the sample means to the variation within the samples, while the p-value is the probability of obtaining an F-statistic at least this large if the null hypothesis (i.e., the means of the three schools are equal) were true.
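To make the ratio concrete, the F-statistic can be reproduced by hand from the between-group and within-group sums of squares. The following is a minimal sketch in plain Python; the helper name one_way_f is hypothetical and is not part of scipy.

```python
def one_way_f(*groups):
    """Return the one-way ANOVA F-statistic (hypothetical helper, not a scipy function)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the observations around their own group mean
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)

    ms_between = ss_between / (k - 1)  # between-group mean square
    ms_within = ss_within / (n - k)    # within-group mean square
    return ms_between / ms_within


school1 = [85, 75, 90, 80, 95]
school2 = [70, 80, 75, 85, 80]
school3 = [90, 85, 95, 80, 85]

f_manual = one_way_f(school1, school2, school3)
print("Manual F-Statistic:", f_manual)
```

The value printed here should agree with the F-statistic reported by f_oneway (about 2.63), up to floating-point rounding.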

Since the p-value (about 0.113) is greater than the commonly used significance level of 0.05, we fail to reject the null hypothesis: the data do not provide sufficient evidence of a difference in the average grades of the three schools.

In [3]:
# Chi-Square test

import pandas as pd

data = {'Gender': ['M', 'M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'F'],
        'Color': ['Red', 'Blue', 'Green', 'Red', 'Blue', 'Green', 'Green', 'Red', 'Blue', 'Green']}

df = pd.DataFrame(data)

We want to determine whether there is a significant association between gender and color preference.

To perform a chi-square test of independence, we can build a contingency table with the crosstab function from pandas and pass it to the chi2_contingency function from the scipy.stats module. Here's how to do it:

In [4]:
import scipy.stats as stats

# Create the contingency table using pandas' crosstab function
contingency_table = pd.crosstab(df['Gender'], df['Color'])

# Perform the chi-square test
chi2_statistic, p_value, dof, expected = stats.chi2_contingency(contingency_table)

print("Chi-Square Statistic:", chi2_statistic)
print("P-Value:", p_value)
print("Degrees of Freedom:", dof)
print("Expected Frequencies:\n", expected)

Chi-Square Statistic: 0.6666666666666666
P-Value: 0.7165313105737892
Degrees of Freedom: 2
Expected Frequencies:
 [[1.5 2.  1.5]
 [1.5 2.  1.5]]


The output shows the chi-square statistic, p-value, degrees of freedom, and expected frequencies for the test. The chi-square statistic measures how far the observed frequencies are from the frequencies expected under independence, while the p-value is the probability of obtaining a statistic at least this large if the null hypothesis (i.e., there is no association between gender and color preference) were true.
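The statistic can also be reproduced by hand from the contingency table. The sketch below hard-codes the observed counts that crosstab produces for this data (rows F and M, columns Blue, Green, Red) and uses the fact that, for exactly 2 degrees of freedom, the chi-square tail probability is exp(-x/2). That shortcut is an assumption tied to this table's shape and does not hold for other degrees of freedom.

```python
import math

# Observed counts from the Gender x Color contingency table
observed = [
    [1, 2, 2],  # F: Blue, Green, Red
    [2, 2, 1],  # M: Blue, Green, Red
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi2_manual = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof_manual = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed-form tail probability, valid only when dof == 2
p_manual = math.exp(-chi2_manual / 2)

print("Manual Chi-Square Statistic:", chi2_manual)
print("Manual Degrees of Freedom:", dof_manual)
print("Manual P-Value:", p_manual)
```

These values should agree with the chi2_contingency output (statistic about 0.667, 2 degrees of freedom, p-value about 0.717), up to floating-point rounding.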

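Since the p-value is greater than the commonly used significance level of 0.05, we fail to reject the null hypothesis: the data do not provide evidence of an association between gender and color preference. Note that every expected frequency here is below 5, so the chi-square approximation is unreliable for a sample this small and the result should be read with caution.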