In [None]:
import pandas
import numpy
import matplotlib.pyplot as plt

In [None]:
mydata = pandas.read_csv('/content/StudentStatistics.csv', sep=';')
mydata

In [None]:
mydata.groupby('Gen_Gender')['Over_Grade'].mean()

In [None]:
mydata.groupby('Gen_Gender')['Over_Grade'].std()

In [None]:
mydata.groupby('Gen_Gender')['Over_Grade'].agg(['mean', 'median', 'std', 'min', 'max'])

In [None]:
maleScores = mydata['Over_Grade'][mydata['Gen_Gender'] == 'Male'].dropna()

In [None]:
femaleScores = mydata['Over_Grade'][mydata['Gen_Gender'] == 'Female'].dropna()

In [None]:
plt.figure(1, figsize=(8,8))

# Same bin edges for both groups so the two histograms are directly comparable
bins = list(range(10, 101, 10))

plt.subplot(2,1,1)
plt.hist(maleScores, color='skyblue', edgecolor='green', lw=2, bins=bins)
plt.title('Male')

plt.subplot(2,1,2)
plt.hist(femaleScores, color='yellow', edgecolor='green', lw=2, bins=bins)
plt.title('Female')

plt.show()

The most commonly used test for checking whether the means of two categories differ is probably the Student t-test (Student, 1908).

However, this test does not perform well if the variances of the two groups differ. The Welch test (Welch, 1938, 1947) is designed for use when the variances are unequal, and its results differ little from those of the Student t-test when the variances are the same.

Ruxton (2006) compared the Welch t-test with the Student t-test and concluded that “the unequal variance t-test should always be used in preference to the Student's t-test” (p. 690).
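To make the difference between the two tests concrete, here is a minimal sketch of the Welch statistic computed by hand: the standard error uses each group's own variance instead of a pooled one, and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name `welch_t` and the two small samples are made up for illustration; they are not taken from the student data set.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch t statistic, Welch-Satterthwaite df and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    v1, v2 = a.var(ddof=1), b.var(ddof=1)   # sample variances
    se2 = v1 / n1 + v2 / n2                 # squared standard error, no pooling
    t = (a.mean() - b.mean()) / np.sqrt(se2)
    df = se2**2 / ((v1 / n1)**2 / (n1 - 1) + (v2 / n2)**2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)          # two-sided p-value
    return t, df, p

# Hypothetical scores: similar centres, clearly different spread
group_a = [62, 71, 55, 80, 67, 74, 59, 68]
group_b = [45, 90, 30, 85, 60, 95, 20]

t_stat, df, p_value = welch_t(group_a, group_b)
print(t_stat, df, p_value)
```

The statistic and p-value should agree with `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)`, which is a convenient way to check the hand computation.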

There are four similar tests, with different assumptions.

|Test|Assumes equal variances|Assumes normality|
|----|--------------|---------|
|Student|yes|yes|
|Welch|no|yes|
|Trimmed|yes|no|
|Yuen-Welch|no|no|

This document discusses the Student and Welch versions. The Trimmed and Yuen-Welch tests are covered in a separate document, as is a fifth test for large samples: the Z-test.

In [None]:
maleMarks = mydata['Over_Grade'][mydata['Gen_Gender'] == 'Male'].dropna()
femaleMarks = mydata['Over_Grade'][mydata['Gen_Gender'] == 'Female'].dropna()

In [None]:
maleMarks

In [None]:
femaleMarks

In [None]:
from scipy.stats import ttest_ind

In [None]:
# Student t-test (assumes equal variances)

'''
Data - marks of male and female students in an exam

H0 - mean marks of the two groups are equal
Ha - mean marks of the two groups are unequal

Recorded result: statistic=0.8007631996644367, pvalue=0.428120221815538
'''

ttest_ind(maleMarks, femaleMarks)

In [None]:
# Welch test (does not assume equal variances)
# Recorded result: TtestResult(statistic=0.6936894455945684, pvalue=0.4991001245520197, df=14.158919197596669)
'''
Data - marks of male and female students in an exam

H0 - mean marks of the two groups are equal
Ha - mean marks of the two groups are unequal

p <= 0.05 - significant (null hypothesis can be rejected)
p > 0.05 - not significant (null hypothesis cannot be rejected)
'''

ttest_ind(maleMarks, femaleMarks, equal_var=False)
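
The decision rule in the cell above can be written as a small helper. This is only a sketch: the function name `decide` is hypothetical, and the two p-values are the ones recorded in the cells above for the Student and Welch tests.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: p <= alpha is significant, otherwise not."""
    if p_value <= alpha:
        return "significant (H0 can be rejected)"
    return "not significant (H0 cannot be rejected)"

# p-values recorded in the notebook cells above
student_p = 0.428120221815538
welch_p = 0.4991001245520197

print("Student:", decide(student_p))
print("Welch:  ", decide(welch_p))
```

Both tests lead to the same conclusion here: the difference in mean marks between the two groups is not significant at the 0.05 level.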


In [None]:
'''
Effect size - Cohen d_s
'''

In [None]:
!pip install https://raw.githubusercontent.com/stikpet/stikpetP/main/stikpetP.tar.gz

In [None]:
import stikpetP as ps

For the interpretation, Cohen (1988, p. 40) gives rules of thumb for small, medium and large effects, and Sawilowsky (2009, p. 599) extended these with very small, very large and huge:

|\|Cohen d\|| Interpretation|
|-------|---------------|
|0.00 < 0.01| Negligible|
|0.01 < 0.20| Very small|
|0.20 < 0.50| Small|
|0.50 < 0.80| Medium|
|0.80 < 1.20| Large|
|1.20 < 2.00| Very large|
|2.00 or more| Huge|
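
The table can be turned into a small lookup. The sketch below assumes each row means "from the lower bound up to, but not including, the upper bound" and classifies on the absolute value of d; the function name `interpret_cohen_d` is made up for illustration.

```python
from bisect import bisect_right

# Lower bounds of each category after "Negligible", taken from the table
THRESHOLDS = [0.01, 0.20, 0.50, 0.80, 1.20, 2.00]
LABELS = ["Negligible", "Very small", "Small", "Medium",
          "Large", "Very large", "Huge"]

def interpret_cohen_d(d):
    """Return the rule-of-thumb label for an effect size d (sign is ignored)."""
    return LABELS[bisect_right(THRESHOLDS, abs(d))]

for value in (0.005, 0.3, -0.9, 2.5):
    print(value, "->", interpret_cohen_d(value))
```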

Let's compute the effect size with Python.

In [None]:
# Hedges g for independent samples: Cohen's d_s with a small-sample bias correction
ps.es_hedges_g_is(mydata['Gen_Gender'], mydata['Over_Grade'])
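
As a cross-check on the library call, Cohen's d_s and Hedges g can also be computed by hand. The sketch below uses the textbook definitions (pooled standard deviation for d_s, and the exact gamma-function correction factor for g). It is not taken from the `stikpetP` source, so small differences are possible if that library uses an approximation or different options. The function names and the sample values are made up for illustration.

```python
import math
import numpy as np

def cohen_d_s(a, b):
    """Cohen's d_s: mean difference divided by the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / math.sqrt(pooled_var)

def hedges_correction(df):
    """Exact correction factor J = G(df/2) / (sqrt(df/2) * G((df-1)/2)), G = gamma."""
    return math.exp(math.lgamma(df / 2) - math.lgamma((df - 1) / 2)) / math.sqrt(df / 2)

# Hypothetical scores, not from the student data set
group_a = [62, 71, 55, 80, 67, 74, 59, 68]
group_b = [45, 90, 30, 85, 60, 95, 20]

d = cohen_d_s(group_a, group_b)
J = hedges_correction(len(group_a) + len(group_b) - 2)
g = J * d   # Hedges g shrinks d slightly towards zero
print(d, g)
```

With the notebook's data, `maleMarks` and `femaleMarks` could be passed in place of the two hypothetical lists.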