In [2]:
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
import statsmodels.api as sm

In [3]:
df = pd.read_csv("/content/data.csv")
df.head()

Unnamed: 0,prof,gender,beauty,students,tenure,rating,age,division
0,Prof_52,Female,2.33,74,No,4.01,42,Upper
1,Prof_93,Female,3.68,155,Yes,3.29,54,Upper
2,Prof_15,Female,3.03,233,No,3.15,65,Lower
3,Prof_72,Female,2.9,248,No,4.51,56,Upper
4,Prof_61,Male,2.91,186,No,4.48,67,Upper


# Q1. Regression with T-test: Using the teachers' rating data set, does gender affect teaching evaluation ratings?

In [6]:
model = smf.ols('rating ~ gender', data=df).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                 rating   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.005
Method:                 Least Squares   F-statistic:                   0.06037
Date:                Tue, 28 Oct 2025   Prob (F-statistic):              0.806
Time:                        05:27:56   Log-Likelihood:                -152.01
No. Observations:                 200   AIC:                             308.0
Df Residuals:                     198   BIC:                             314.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          4.0507      0.050     81.

An OLS regression was conducted to examine whether gender predicts teaching evaluations.
The model was not statistically significant, F(1, 198) = 0.06, p = 0.806, R² = 0.000.
Gender therefore explains virtually none of the variance in teaching evaluation scores.
Male instructors scored about 0.02 points higher on average than female instructors (β = 0.018, p = 0.806), but this difference is not statistically significant and is consistent with random variation. These data provide no evidence that gender affects teaching evaluations.
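
The "regression with t-test" framing can be checked directly: with a single two-level predictor, the t statistic on the OLS slope should match a pooled-variance two-sample t-test. The sketch below illustrates this on synthetic data (the `demo` frame and its values are made up for illustration and are not `data.csv`), so the printed numbers will not match the summary above.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Synthetic stand-in data (an assumption for illustration, NOT data.csv)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=200),
    "rating": rng.normal(loc=4.0, scale=0.5, size=200),
})

# Regression with a two-level categorical predictor;
# 'Female' is the reference level, so the slope is mean(Male) - mean(Female)
ols_fit = smf.ols("rating ~ gender", data=demo).fit()
t_ols = ols_fit.tvalues["gender[T.Male]"]
p_ols = ols_fit.pvalues["gender[T.Male]"]

# Pooled-variance (equal_var=True) two-sample t-test, male minus female
male = demo.loc[demo["gender"] == "Male", "rating"]
female = demo.loc[demo["gender"] == "Female", "rating"]
t_test = stats.ttest_ind(male, female, equal_var=True)

print(f"OLS:    t = {t_ols:.4f}, p = {p_ols:.4f}")
print(f"t-test: t = {t_test.statistic:.4f}, p = {t_test.pvalue:.4f}")
```

The two lines should agree. Passing `equal_var=False` would give Welch's t-test instead, which does not assume equal variances and generally differs slightly from the OLS result.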

# Q2. Regression with ANOVA: Using the teachers' rating data set, does beauty score for instructors differ by age?

In [9]:
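# Bin age into decades. Note: pd.cut uses right-closed intervals by default,
# so the bins are (20, 30], (30, 40], ... and an age of exactly 30 is labeled '20s'.
df['age_group'] = pd.cut(df['age'], bins=[20, 30, 40, 50, 60, 70],
                         labels=['20s', '30s', '40s', '50s', '60s'])
# One-way ANOVA expressed as a regression on the categorical age group
model = smf.ols('beauty ~ C(age_group)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)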


                 sum_sq     df         F    PR(>F)
C(age_group)   0.555752    4.0  0.286595  0.886443
Residual      94.533656  195.0       NaN       NaN


A one-way ANOVA was conducted to determine whether mean beauty scores differ by age group.
The result was not statistically significant, F(4, 195) = 0.29, p = 0.886.
Mean beauty scores therefore do not differ significantly across the five age groups, and this sample provides no evidence that age group is related to beauty ratings.

# Q3. Correlation: Using the teachers' rating dataset, is teaching evaluation score correlated with beauty score?

In [11]:
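import statsmodels.api as sm

# Predictor (beauty score) and response (teaching evaluation rating)
X = df['beauty']
y = df['rating']

# sm.OLS does not add an intercept automatically, so add a constant column
X = sm.add_constant(X)

model = sm.OLS(y, X).fit()
predictions = model.predict(X)  # fitted values (not used further below)

print(model.summary())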


                            OLS Regression Results                            
Dep. Variable:                 rating   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.004
Method:                 Least Squares   F-statistic:                    0.2487
Date:                Tue, 28 Oct 2025   Prob (F-statistic):              0.619
Time:                        05:30:15   Log-Likelihood:                -151.92
No. Observations:                 200   AIC:                             307.8
Df Residuals:                     198   BIC:                             314.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.1418      0.170     24.328      0.0

An OLS regression was conducted to examine whether beauty scores predict teaching evaluations.
The model was not statistically significant, F(1, 198) = 0.25, p = 0.619, R² = 0.001.
Beauty scores therefore explain less than 1% of the variance in teaching evaluations, and the coefficient for beauty (β = -0.027, p = 0.619) was not significant. Because this is a simple regression with one predictor, the Pearson correlation is the signed square root of R² (here R² = F / (F + 198) ≈ 0.0013), which gives r ≈ -0.035 with the same p-value of 0.619. Teaching evaluation scores are thus not significantly correlated with beauty scores in this dataset.