# Importing Libraries and Loading Dataset

In [13]:
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
import statsmodels.api as sm

In [14]:
df = pd.read_csv("/content/teachers_rating_dataset.csv")
df.head()

Unnamed: 0  Prof     Gender  Tenure  Beauty  Rating  Students  Age  Division
0           Prof_52  Female  No      5.66    4.11    62        26   Upper Division
1           Prof_93  Male    Yes     7.28    3.86    95        37   Lower Division
2           Prof_15  Female  No      7.02    4.22    84        35   Upper Division
3           Prof_72  Male    Yes     8.15    4.57    300       47   Lower Division
4           Prof_61  Female  No      6.68    4.25    131       40   Lower Division
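
The `Unnamed: 0` column in the preview usually appears when a CSV was saved together with its row index. A minimal sketch of how `index_col=0` avoids it, using a small made-up CSV string (the `Prof_A`/`Prof_B` rows are illustrative, not taken from the real file):

```python
import io
import pandas as pd

# Made-up CSV text whose first column has an empty header (a saved index).
csv_text = """,Prof,Gender,Rating
0,Prof_A,Female,4.1
1,Prof_B,Male,3.9
"""

# Default read: the unnamed first column becomes a data column "Unnamed: 0".
default_read = pd.read_csv(io.StringIO(csv_text))

# Treating the first column as the index keeps it out of the data columns.
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```

Whether this applies to teachers_rating_dataset.csv depends on how that file was written, so check the column before dropping it.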


Q1. Regression with T-test: Using the teachers' rating dataset, does gender
affect teaching evaluation ratings?

In [15]:
model = smf.ols('Rating ~ Gender', data=df).fit()
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:                 Rating   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.010
Method:                 Least Squares   F-statistic:                     2.202
Date:                Sat, 25 Oct 2025   Prob (F-statistic):              0.141
Time:                        15:00:50   Log-Likelihood:                -88.945
No. Observations:                 120   AIC:                             181.9
Df Residuals:                     118   BIC:                             187.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          3.8674      0.070     55.

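Based on the regression results (F(1,118) = 2.20, p = 0.141, R² = 0.018), there is no statistically significant difference in teaching evaluation ratings between male and female instructors at the 0.05 level. Because the model has a single binary predictor, the t-test on the Gender coefficient is equivalent to this F-test (t² = F, so |t| ≈ 1.48 with the same p-value).

Although male instructors scored about 0.14 points higher on average than female instructors, gender explains under 2% of the variance in ratings, and a difference this small is consistent with sampling variation rather than a real gender effect.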
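
The equivalence between the regression and a two-sample t-test can be checked directly. The sketch below uses synthetic ratings (the `demo` frame and its values are made up for illustration, not the teachers' data) and compares `smf.ols` with `scipy.stats.ttest_ind` under the equal-variance assumption that OLS also makes:

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic stand-in data: 60 female and 60 male instructors (assumed values).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Gender": np.repeat(["Female", "Male"], 60),
    "Rating": rng.normal(loc=4.0, scale=0.5, size=120),
})

# Regression on a binary predictor; Female is the reference level.
ols_fit = smf.ols("Rating ~ Gender", data=demo).fit()
t_ols = ols_fit.tvalues["Gender[T.Male]"]
p_ols = ols_fit.pvalues["Gender[T.Male]"]

# Pooled-variance two-sample t-test on the same groups.
male = demo.loc[demo["Gender"] == "Male", "Rating"]
female = demo.loc[demo["Gender"] == "Female", "Rating"]
t_test, p_test = stats.ttest_ind(male, female, equal_var=True)

print(f"OLS:    t = {t_ols:.4f}, p = {p_ols:.4f}, F = {ols_fit.fvalue:.4f}")
print(f"t-test: t = {t_test:.4f}, p = {p_test:.4f}")
```

On the real data, replacing `demo` with `df` should reproduce the t statistic and p-value of the Gender row in the summary above.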

Q2. Regression with ANOVA: Using the teachers' rating data set, does
beauty score for instructors differ by age?


In [16]:
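# pd.cut uses right-closed bins by default: (20, 30] is labelled '20s',
# so an instructor aged exactly 30 falls in '20s', not '30s'.
df['age_group'] = pd.cut(df['Age'], bins=[20, 30, 40, 50, 60, 70], labels=['20s','30s','40s','50s','60s'])
model = smf.ols('Beauty ~ C(age_group)', data=df).fit()
# Type II ANOVA table for the age-group factor
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)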


                  sum_sq     df         F    PR(>F)
C(age_group)    7.116037    4.0  1.005531  0.393046
Residual      205.229932  116.0       NaN       NaN




An ANOVA test was conducted to determine whether beauty scores differ by age group.
The results were not significant, F(4,116) = 1.01, p = 0.393.
This indicates that instructors’ beauty scores do not significantly differ across age groups.

Q3. Correlation: Using the teachers' rating dataset, is teaching evaluation
score correlated with beauty score?

In [19]:
X = df['Beauty']
y = df['Rating']
# sm.OLS does not add an intercept automatically
X = sm.add_constant(X)

model = sm.OLS(y, X).fit()
predictions = model.predict(X)  # fitted ratings; not used further below
model.summary()

                            OLS Regression Results
Dep. Variable:                 Rating   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.007
Method:                 Least Squares   F-statistic:                    0.1285
Date:                Sat, 25 Oct 2025   Prob (F-statistic):              0.721
Time:                        15:06:15   Log-Likelihood:                -89.989
No. Observations:                 120   AIC:                             184.0
Df Residuals:                     118   BIC:                             189.6
Df Model:                           1
Covariance Type:            nonrobust
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.0212      0.220     18.266      0.000       3.585       4.457
Beauty        -0.0128      0.036     -0.358      0.721      -0.083       0.058
------------------------------------------------------------------------------
Omnibus:                        1.836   Durbin-Watson:                   2.095
Prob(Omnibus):                  0.399   Jarque-Bera (JB):                1.532
Skew:                           0.275   Prob(JB):                        0.465
Kurtosis:                       3.067   Cond. No.                         29.6


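The model was not statistically significant, F(1,118) = 0.13, p = 0.721, with an R² of 0.001.
Beauty score was not a significant predictor of ratings (b = -0.013, p = 0.721). In a one-predictor regression this is the same test as the Pearson correlation, which here works out to r = t/√(t² + df) = -0.358/√(0.128 + 118) ≈ -0.03, so teaching evaluation scores and beauty scores show no detectable linear correlation in this dataset.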