# Practical 7 — Statistical Foundation of Data Sciences

**Student:** Aryan Dhiman  
**Course:** CSU1658  
**Date:** October 2025

---

## Overview

This notebook uses the teaching ratings dataset (`TeachingRatings`, 463 observations) to answer three questions with regression and ANOVA:  
- Regression with a t-test: effect of instructor gender on course evaluations  
- Regression with ANOVA: differences in beauty scores across instructor age groups  
- Correlation and regression: course evaluation score vs. beauty score


In [1]:

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset
df = pd.read_csv('TeachingRatings(TeachingRatings).csv')

print(f"Dataset loaded: {df.shape[0]} rows")
display(df.head())


Dataset loaded: 463 rows


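| Unnamed: 0 | minority | age | female | onecredit | beauty | course_eval | intro | nnenglish |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 36 | 1 | 0 | 0.289916 | 4.3 | 0 | 0 |
| 1 | 0 | 59 | 0 | 0 | -0.737732 | 4.5 | 0 | 0 |
| 2 | 0 | 51 | 0 | 0 | -0.571984 | 3.7 | 0 | 0 |
| 3 | 0 | 40 | 1 | 0 | -0.677963 | 4.3 | 0 | 0 |
| 4 | 0 | 31 | 1 | 0 | 1.509794 | 4.4 | 0 | 0 |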


## Q1. Regression with a t-test: Does gender affect teaching evaluation scores?


In [2]:

model_1 = smf.ols('course_eval ~ female', data=df).fit()
print(model_1.summary())


                            OLS Regression Results                            
Dep. Variable:            course_eval   R-squared:                       0.022
Model:                            OLS   Adj. R-squared:                  0.020
Method:                 Least Squares   F-statistic:                     10.56
Date:                Tue, 28 Oct 2025   Prob (F-statistic):            0.00124
Time:                        15:20:48   Log-Likelihood:                -378.50
No. Observations:                 463   AIC:                             761.0
Df Residuals:                     461   BIC:                             769.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.0690      0.034    121.288      0.0
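
Because `female` is a 0/1 dummy, the OLS slope in `course_eval ~ female` is the difference between the mean evaluation of female and male instructors, and its t-statistic equals Student's pooled-variance two-sample t-statistic. With a single regressor, that t-statistic squared is also the model F-statistic, so the `female` coefficient has |t| = sqrt(10.56) ≈ 3.25 and the same p-value (0.00124) as the F-test above. The sketch below checks this equivalence with plain NumPy; the helper names `dummy_ols_t` and `pooled_t` are hypothetical, and the demo uses randomly generated toy scores, not the TeachingRatings data.

```python
import numpy as np

def dummy_ols_t(y, d):
    """Slope and classical (non-robust) t-statistic from OLS of y on [1, d]."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones_like(d), d])      # intercept + dummy
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)               # residual variance, df = n - 2
    cov = sigma2 * np.linalg.inv(X.T @ X)          # classical OLS covariance matrix
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

def pooled_t(y, d):
    """Mean difference (d=1 minus d=0) and Student's equal-variance t-statistic."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    g1, g0 = y[d == 1], y[d == 0]
    n1, n0 = len(g1), len(g0)
    # Pooled variance assumes equal group variances, like OLS homoskedasticity
    sp2 = ((n1 - 1) * g1.var(ddof=1) + (n0 - 1) * g0.var(ddof=1)) / (n1 + n0 - 2)
    diff = g1.mean() - g0.mean()
    return diff, diff / np.sqrt(sp2 * (1 / n1 + 1 / n0))

# Toy demo on synthetic data (hypothetical values, NOT the TeachingRatings data)
rng = np.random.default_rng(0)
d = rng.integers(0, 2, size=200)
y = 4.0 - 0.2 * d + rng.normal(0, 0.5, size=200)

b1, t_ols = dummy_ols_t(y, d)
diff, t_pooled = pooled_t(y, d)
print(f"OLS slope {b1:.4f}, t = {t_ols:.4f}")
print(f"Mean diff {diff:.4f}, t = {t_pooled:.4f}")
```

In the notebook itself, `dummy_ols_t(df['course_eval'], df['female'])` should reproduce the `female` row of `model_1.summary()`, and `scipy.stats.ttest_ind` with `equal_var=True` should give the same t-statistic.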

## Q2. Regression with ANOVA: Do instructors' beauty scores differ by age group?


In [3]:

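# Split instructors into three age groups of roughly equal size (tertiles of age)
df['age_group'] = pd.qcut(df['age'], 3, labels=["Young", "Middle", "Old"])

# One-way ANOVA: regress beauty on the categorical age group, then summarise
# with a Type II ANOVA table (smf and anova_lm were imported in the first cell)
anova_model = smf.ols('beauty ~ age_group', data=df).fit()
anova_results = anova_lm(anova_model, typ=2)
print(anova_results)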


               sum_sq     df          F    PR(>F)
age_group   15.982491    2.0  13.546211  0.000002
Residual   271.365405  460.0        NaN       NaN
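
With a single categorical factor, the table above is an ordinary one-way ANOVA: the between-group sum of squares (`age_group`) and the within-group sum of squares (`Residual`) are each divided by their degrees of freedom, giving `F = (SS_between / (k - 1)) / (SS_within / (n - k))`. Plugging in the printed values gives (15.982491 / 2) / (271.365405 / 460) ≈ 13.546, and eta-squared = SS_between / (SS_between + SS_within) ≈ 0.056, so the age groups account for roughly 5.6% of the variation in beauty scores. The NumPy sketch below computes these quantities from raw values; `one_way_anova` is a hypothetical helper, and the toy groups are made up for illustration.

```python
import numpy as np

def one_way_anova(values, groups):
    """Return (SS_between, SS_within, F, eta_squared) for a one-way layout."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        v = values[groups == g]
        ss_between += len(v) * (v.mean() - grand_mean) ** 2   # spread of group means
        ss_within += ((v - v.mean()) ** 2).sum()               # spread inside each group
    k, n = len(labels), len(values)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta_sq = ss_between / (ss_between + ss_within)
    return ss_between, ss_within, f_stat, eta_sq

# Toy example with three made-up groups (NOT the TeachingRatings data)
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
grps = ["Young"] * 3 + ["Middle"] * 3 + ["Old"] * 3
print(one_way_anova(vals, grps))

# Re-deriving F from the sums of squares printed in the ANOVA table above
print((15.982491 / 2) / (271.365405 / 460))
```

Applied to the notebook data, `one_way_anova(df['beauty'], df['age_group'])` should match the `sum_sq` and `F` columns of `anova_results`.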


## Q3. Correlation: Is the teaching evaluation score correlated with the beauty score?
### Check using OLS regression and report the result.


In [4]:

model_2 = smf.ols('course_eval ~ beauty', data=df).fit()
print(model_2.summary())


                            OLS Regression Results                            
Dep. Variable:            course_eval   R-squared:                       0.036
Model:                            OLS   Adj. R-squared:                  0.034
Method:                 Least Squares   F-statistic:                     17.08
Date:                Tue, 28 Oct 2025   Prob (F-statistic):           4.25e-05
Time:                        15:21:52   Log-Likelihood:                -375.32
No. Observations:                 463   AIC:                             754.6
Df Residuals:                     461   BIC:                             762.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.9983      0.025    157.727      0.0
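
In a simple regression with one predictor, R-squared is the square of Pearson's correlation coefficient, so the R-squared of 0.036 above corresponds to |r| ≈ 0.19 between `course_eval` and `beauty`. The sign of r matches the sign of the `beauty` slope, the slope equals `r * sd(y) / sd(x)`, and the slope's t-statistic equals `r * sqrt(n - 2) / sqrt(1 - r^2)`, so the OLS t-test on `beauty` is also the test of zero correlation. A NumPy sketch of these identities, using synthetic toy data (hypothetical values, not the TeachingRatings data):

```python
import numpy as np

# Synthetic toy data (hypothetical, NOT the TeachingRatings values)
rng = np.random.default_rng(1)
n = 300
x = rng.normal(0, 0.8, size=n)                      # stand-in for beauty
y = 4.0 + 0.2 * x + rng.normal(0, 0.5, size=n)      # stand-in for course_eval

# Pearson correlation coefficient
r = np.corrcoef(x, y)[0, 1]

# Simple OLS fit: np.polyfit with deg=1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)

# R^2 of the fitted line
y_hat = intercept + slope * x
ssr = ((y - y_hat) ** 2).sum()
r_squared = 1 - ssr / ((y - y.mean()) ** 2).sum()

# Classical t-statistic of the slope vs. the t-statistic for testing r = 0
se_slope = np.sqrt(ssr / (n - 2) / ((x - x.mean()) ** 2).sum())
t_slope = slope / se_slope
t_r = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

print(f"r = {r:.4f}, r^2 = {r**2:.4f}, OLS R^2 = {r_squared:.4f}")
print(f"slope = {slope:.4f}, r * sd(y)/sd(x) = {r * y.std(ddof=1) / x.std(ddof=1):.4f}")
print(f"t(slope) = {t_slope:.4f}, t(r) = {t_r:.4f}")
```

On the notebook data, `np.corrcoef(df['beauty'], df['course_eval'])[0, 1]` should give the r whose square is the R-squared reported by `model_2`.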