In [1]:
import pandas as pd
import numpy as np
from scipy.stats import ttest_1samp

In [2]:
college_data = pd.read_csv('collegeData.csv')
college_data

Unnamed: 0,SexCode,MaritalCode,PrevEdCode,DDVeteran,DaysEnrollToStart,AgeAtStart,AgeAtGrad,GPA,MinutesAttended,HoursAttempt,HoursEarned,HoursReq,MinutesAbsent,TransferCredits,TransferGPA,MinEFC,MaxENTEntranceScore,gradFlag
0,M,M,BACH,0,55,24,27,3.22,145953,2925.0,2550.0,2565,3475,19.00,2.55,0.0,81.00,1
1,F,M,BACH,0,143,22,25,3.02,129045,2640.0,2565.0,2565,11840,12.00,,0.0,89.50,1
2,F,S,BACH,0,98,30,33,3.47,111385,2559.0,2514.0,2565,935,37.67,2.84,0.0,,1
3,F,UN,BACH,0,101,24,27,3.19,135401,2520.0,2520.0,2565,4549,6.00,,0.0,87.50,1
4,M,,SOMECOLL,0,61,19,22,3.84,115660,2520.0,2520.0,2565,1340,22.00,,3141.0,,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2779,F,UN,SOMECOLL,0,101,26,29,3.11,117301,2325.0,2295.0,2250,9619,10.00,,0.0,86.00,1
2780,F,S,HS,0,109,23,25,2.50,99735,1890.0,1620.0,2565,13950,,,5562.0,80.87,0
2781,M,S,SOMECOLL,0,103,22,23,3.30,165378,3135.0,2460.0,0,6042,,,0.0,91.91,1
2782,F,UN,SOMECOLL,0,47,26,28,3.19,31915,840.0,690.0,2415,4995,26.00,,5772.0,84.50,0


In [3]:
# Split students by graduation outcome (gradFlag: 0 = did not graduate, 1 = graduated)
dropouts = college_data[college_data['gradFlag'] == 0]
graduates = college_data[college_data['gradFlag'] == 1]

### Research Question 1: Do students who drop out tend to have a lower transfer GPA than those who graduate?

#### a) For students who do not graduate: The average transfer GPA is less than 2.75.



Ho: The average transfer GPA = 2.75

Ha: The average transfer GPA < 2.75

In [4]:
# Drop missing transfer GPAs, then run a left-tailed one-sample t-test against 2.75
dropouts_transfer_gpa = dropouts['TransferGPA'].dropna()
test_stat_dropout, p_value_dropout = ttest_1samp(dropouts_transfer_gpa, 2.75, alternative='less')

In [5]:
print(f"Test Statistic For Dropout Students: {test_stat_dropout}")
print(f"P-value For Dropout Students: {p_value_dropout}")

Test Statistic For Dropout Students: -2.2076632896626656
P-value For Dropout Students: 0.013896284659651276


Since the p-value (0.0139) is below the 5% significance level, we reject the null hypothesis. The data support the conclusion that the average transfer GPA of students who did not graduate is less than 2.75.
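To make the mechanics of the test above concrete, here is a minimal sketch that computes a left-tailed one-sample t statistic and p-value by hand and compares them with `ttest_1samp`. The `sample` array is synthetic and only stands in for `dropouts_transfer_gpa`; it is an assumption for illustration, not the notebook's data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for dropouts_transfer_gpa (NOT the real data).
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.7, scale=0.5, size=200)
mu0 = 2.75  # hypothesized mean under Ho

n = sample.size
# t = (sample mean - mu0) / (sample std / sqrt(n)), with the n-1 (ddof=1) std
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
# Left-tailed p-value: probability of a t this small or smaller under Ho
p_manual = stats.t.cdf(t_manual, df=n - 1)

# The library call used in the notebook, for comparison
t_scipy, p_scipy = stats.ttest_1samp(sample, mu0, alternative='less')

print(f"manual: t={t_manual}, p={p_manual}")
print(f"scipy:  t={t_scipy}, p={p_scipy}")
```

The two pairs of values should agree up to floating-point error, which shows that `alternative='less'` simply reads the p-value from the lower tail of the t distribution with n - 1 degrees of freedom.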

#### b) For students who graduate: The average transfer GPA is greater than 2.8.

Ho: The average transfer GPA = 2.8

Ha: The average transfer GPA > 2.8

In [6]:
# Drop missing transfer GPAs, then run a right-tailed one-sample t-test against 2.8
graduates_transfer_gpa = graduates['TransferGPA'].dropna()
test_stat_graduate, p_value_graduate = ttest_1samp(graduates_transfer_gpa, 2.8, alternative='greater')

In [7]:
print(f"Test Statistic For Graduate Students: {test_stat_graduate}")
print(f"P-value For Graduate Students: {p_value_graduate}")

Test Statistic For Graduate Students: 7.269468232482056
P-value For Graduate Students: 3.484567369010292e-13


The p-value (about 3.5e-13) is far below the 5% significance level, so we reject the null hypothesis. The data support the conclusion that the average transfer GPA of students who graduated is greater than 2.8.
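A very small p-value says the difference from the hypothesized mean is unlikely to be chance, but not how large that difference is. One way to report both is sketched below: a helper that returns the sample size, sample mean, and a standardized effect size (Cohen's d) next to the test result. The function name `describe_one_sample` and the synthetic `fake_gpa` array are assumptions made for this example and are not part of the original notebook.

```python
import numpy as np
from scipy.stats import ttest_1samp

def describe_one_sample(values, mu0, alternative):
    """Hypothetical helper: one-sample t-test plus n, mean, and Cohen's d."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop missing values, like .dropna()
    t_stat, p_value = ttest_1samp(values, mu0, alternative=alternative)
    sd = values.std(ddof=1)  # sample standard deviation
    return {
        "n": int(values.size),
        "mean": float(values.mean()),
        "cohens_d": float((values.mean() - mu0) / sd),  # difference in SD units
        "t": float(t_stat),
        "p": float(p_value),
    }

# Synthetic stand-in for graduates_transfer_gpa (NOT the real data).
rng = np.random.default_rng(1)
fake_gpa = rng.normal(loc=2.9, scale=0.5, size=1500)

summary = describe_one_sample(fake_gpa, 2.8, "greater")
print(summary)
```

On the real data, the call would look like `describe_one_sample(graduates['TransferGPA'], 2.8, 'greater')`. Because t equals Cohen's d times the square root of n, a large sample can produce a tiny p-value even when the mean is only slightly above the threshold.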

### Research Question 2: Do students who drop out tend to have a shorter gap between enrollment and the start of the semester than those who graduate?

#### a) For students who do not graduate: The average number of days between enrollment and the start of the semester is less than 71 days.

Ho: The average days between enrollment and semester start = 71

Ha: The average days between enrollment and semester start < 71

In [8]:
# Drop missing values, then run a left-tailed one-sample t-test against 71 days
dropouts_days_enroll = dropouts['DaysEnrollToStart'].dropna()
test_stat_dropouts, p_value_dropouts = ttest_1samp(dropouts_days_enroll, 71, alternative='less')

In [9]:
print(f"Test Statistic For Dropout Students: {test_stat_dropouts}")
print(f"P-value For Dropout Students: {p_value_dropouts}")

Test Statistic For Dropout Students: -0.006328839851492547
P-value For Dropout Students: 0.49747584434110703


The p-value (0.4975) is greater than the 5% significance level, so we do not reject the null hypothesis. The test statistic is close to zero (-0.0063), meaning the sample mean is almost exactly 71 days. We therefore lack sufficient evidence to conclude that the average enrollment-to-start time for students who did not graduate is shorter than 71 days.



#### b) For students who graduate: The average number of days between enrollment and the start of the semester is greater than 71 days.

Ho: The average days between enrollment and semester start = 71

Ha: The average days between enrollment and semester start > 71

In [10]:
# Drop missing values, then run a right-tailed one-sample t-test against 71 days
graduates_days_enroll = graduates['DaysEnrollToStart'].dropna()
test_stat_graduates, p_value_graduates = ttest_1samp(graduates_days_enroll, 71, alternative='greater')

In [11]:
print(f"Test Statistic For Graduate Students: {test_stat_graduates}")
print(f"P-value For Graduate Students: {p_value_graduates}")

Test Statistic For Graduate Students: 0.19628723532239142
P-value For Graduate Students: 0.4222035197119946


The p-value (0.4222) is also greater than the 5% significance level, so we do not reject the null hypothesis here either. We lack sufficient evidence to conclude that the average enrollment-to-start time for students who graduated is longer than 71 days.
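Both p-values in this research question are close to 0.5 because both test statistics are close to zero. The sketch below illustrates why, using a single synthetic sample (`fake_days`, an assumption for illustration only): for one sample and one hypothesized mean, the left-tailed and right-tailed p-values share the same t statistic and sum to 1, so a t near zero puts each of them near 0.5. Note that the notebook's two tests use different groups, so their p-values are not expected to sum to 1.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Synthetic stand-in for a DaysEnrollToStart column (NOT the real data).
rng = np.random.default_rng(2)
fake_days = rng.normal(loc=71, scale=40, size=500)

# Same sample, same hypothesized mean, opposite tails
t_less, p_less = ttest_1samp(fake_days, 71, alternative='less')
t_greater, p_greater = ttest_1samp(fake_days, 71, alternative='greater')

print(f"t statistic (identical for both tails): {t_less}")
print(f"p_less = {p_less}, p_greater = {p_greater}")
print(f"p_less + p_greater = {p_less + p_greater}")
```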

### Research Question 3: Do students who drop out tend to have lower entrance exam scores than those who graduate?

#### a) For students who do not graduate: The average entrance exam score is less than 83.

Ho: The average entrance exam score = 83

Ha: The average entrance exam score < 83

In [12]:
# Drop missing scores, then run a left-tailed one-sample t-test against 83
dropouts_exam_score = dropouts['MaxENTEntranceScore'].dropna()
test_stat_dropouts, p_value_dropouts = ttest_1samp(dropouts_exam_score, 83, alternative='less')

In [13]:
print(f"Test Statistic For Dropout Students: {test_stat_dropouts}")
print(f"P-value For Dropout Students: {p_value_dropouts}")

Test Statistic For Dropout Students: -2.9398551944074094
P-value For Dropout Students: 0.0016966364249860896


The p-value (0.0017) is less than the 5% significance level, so we reject the null hypothesis. This suggests that students who did not graduate had an average entrance exam score lower than 83.

#### b) For students who graduate: The average entrance exam score is greater than 90.



Ho: The average entrance exam score = 90

Ha: The average entrance exam score > 90

In [14]:
# Drop missing scores, then run a right-tailed one-sample t-test against 90
graduates_exam_score = graduates['MaxENTEntranceScore'].dropna()
test_stat_graduates, p_value_graduates = ttest_1samp(graduates_exam_score, 90, alternative='greater')

In [15]:
print(f"Test Statistic For Graduate Students: {test_stat_graduates}")
print(f"P-value For Graduate Students: {p_value_graduates}")

Test Statistic For Graduate Students: 0.4259083577037395
P-value For Graduate Students: 0.3351288593129971


The p-value (0.3351) is greater than the 5% significance level, so we do not reject the null hypothesis. There is insufficient evidence to conclude that the average entrance exam score of students who graduated is greater than 90.
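The tests in this notebook compare each group with a fixed benchmark. A direct comparison of the two groups is a different technique, a two-sample test, and is not what the analysis above does. As a hedged sketch only, Welch's t-test via `scipy.stats.ttest_ind` with `equal_var=False` could be set up as follows; the two score arrays are synthetic and their means, spreads, and sizes are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic stand-ins for the two groups' entrance scores (NOT the real data).
rng = np.random.default_rng(3)
fake_dropout_scores = rng.normal(loc=82, scale=8, size=300)
fake_graduate_scores = rng.normal(loc=88, scale=8, size=900)

# Ho: the two group means are equal
# Ha: the dropout mean is less than the graduate mean
# equal_var=False selects Welch's t-test, which does not assume equal variances
t_stat, p_value = ttest_ind(
    fake_dropout_scores,
    fake_graduate_scores,
    equal_var=False,
    alternative='less',
)

print(f"Welch t statistic: {t_stat}")
print(f"One-sided p-value: {p_value}")
```

With the notebook's variables, the equivalent call would pass `dropouts_exam_score` and `graduates_exam_score` as the first two arguments.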