# Student Performance Prediction

This study analyzes the factors that affect student performance in order to predict the final grade (G3).

## 1. Loading the Required Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.feature_selection import SelectKBest, f_regression

## 2. Loading the Dataset

In [2]:
file_path = "student-mat.csv"  # Path to the file you uploaded to Colab
data = pd.read_csv(file_path, sep=';')
data.head()

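|   | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |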
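
The `sep=';'` argument matters because the file is semicolon-delimited. The sketch below illustrates this on a small in-memory sample; `sample_csv` and its values are made up for illustration and are not taken from `student-mat.csv`.

```python
import io

import pandas as pd

# Hypothetical three-row sample in a semicolon-delimited layout.
sample_csv = "school;sex;age;G3\nGP;F;18;6\nGP;M;17;10\nMS;F;16;14\n"

# With the default comma separator, every line lands in a single column.
wrong = pd.read_csv(io.StringIO(sample_csv))

# With sep=';' the four columns are split as intended.
right = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(wrong.shape)               # one column only
print(right.shape)               # four columns
print(right.isna().sum().sum())  # total count of missing values
```

Checking `shape` and the missing-value count right after loading is a cheap way to catch a wrong separator before any modeling starts.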


## 3. Selecting the Recommended Variables

In [3]:
selected_columns = ["age", "sex", "Medu", "Fedu", "guardian", "studytime", "failures",
                     "schoolsup", "higher", "internet", "goout", "freetime", "health", "absences", "G3"]
data = data[selected_columns].copy()  # copy so that later column assignments act on an independent frame
data.head()

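|   | age | sex | Medu | Fedu | guardian | studytime | failures | schoolsup | higher | internet | goout | freetime | health | absences | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | F | 4 | 4 | mother | 2 | 0 | yes | yes | no | 4 | 3 | 3 | 6 | 6 |
| 1 | 17 | F | 1 | 1 | father | 2 | 0 | no | yes | yes | 3 | 3 | 3 | 4 | 6 |
| 2 | 15 | F | 1 | 1 | mother | 2 | 3 | yes | yes | yes | 2 | 3 | 3 | 10 | 10 |
| 3 | 15 | F | 4 | 2 | mother | 3 | 0 | no | yes | yes | 2 | 2 | 5 | 2 | 15 |
| 4 | 16 | F | 3 | 3 | father | 2 | 0 | no | yes | no | 2 | 3 | 5 | 4 | 10 |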
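
Selecting columns by name fails with a `KeyError` if any name is misspelled or absent. A minimal sketch of a defensive selection follows; the frame `df` and the list `wanted` are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature frame holding only some of the wanted columns.
df = pd.DataFrame({"age": [18, 17], "sex": ["F", "M"], "G3": [6, 10]})
wanted = ["age", "sex", "studytime", "G3"]

# Split the requested names into those that exist and those that do not.
missing = [c for c in wanted if c not in df.columns]
present = [c for c in wanted if c in df.columns]

# .copy() yields an independent DataFrame, so assigning to its columns
# later does not raise pandas' SettingWithCopyWarning.
subset = df[present].copy()

print("Missing columns:", missing)
print(subset.columns.tolist())
```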


## 4. Encoding the Categorical Variables

In [4]:
label_encoders = {}
categorical_features = ["sex", "guardian", "schoolsup", "higher", "internet"]

for col in categorical_features:
    le = LabelEncoder()
    data[col] = le.fit_transform(data[col])
    label_encoders[col] = le

data.head()

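|   | age | sex | Medu | Fedu | guardian | studytime | failures | schoolsup | higher | internet | goout | freetime | health | absences | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18 | 0 | 4 | 4 | 1 | 2 | 0 | 1 | 1 | 0 | 4 | 3 | 3 | 6 | 6 |
| 1 | 17 | 0 | 1 | 1 | 0 | 2 | 0 | 0 | 1 | 1 | 3 | 3 | 3 | 4 | 6 |
| 2 | 15 | 0 | 1 | 1 | 1 | 2 | 3 | 1 | 1 | 1 | 2 | 3 | 3 | 10 | 10 |
| 3 | 15 | 0 | 4 | 2 | 1 | 3 | 0 | 0 | 1 | 1 | 2 | 2 | 5 | 2 | 15 |
| 4 | 16 | 0 | 3 | 3 | 0 | 2 | 0 | 0 | 1 | 0 | 2 | 3 | 5 | 4 | 10 |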
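
`LabelEncoder` assigns integer codes in sorted order of the labels, which is why `father` became 0 and `mother` became 1 above. For a column with more than two categories, these codes imply an ordering that a linear model will treat as numeric. The sketch below shows the mapping and a one-hot alternative; it assumes, for illustration only, that `guardian` can also take a third value such as `other`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical guardian values; the third category is an assumption.
guardian = pd.Series(["mother", "father", "other", "mother"], name="guardian")

le = LabelEncoder()
codes = le.fit_transform(guardian)

# classes_ is sorted, and each label's position in it is its integer code.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)

# One-hot encoding avoids the artificial order by using one column per category.
one_hot = pd.get_dummies(guardian, prefix="guardian")
print(one_hot.columns.tolist())
```

Whether one-hot encoding actually improves this model is an open question; the sketch only shows how the two encodings differ.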


## 5. Splitting the Dataset into Training and Test Sets

In [5]:
X = data.drop("G3", axis=1)
y = data["G3"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
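
With `test_size=0.2`, one fifth of the rows are held out, and a fixed `random_state` makes the shuffle reproducible. A minimal sketch on synthetic arrays (`X_demo` and `y_demo` are placeholders, not the student data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 rows, 2 features.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.arange(100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
# Repeating the call with the same random_state gives the same partition.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)
print(np.array_equal(X_te, X_te2))
```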

## 6. Scaling the Data

In [6]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
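
The scaler is fitted on the training set only, and the test set is transformed with the training mean and standard deviation, so no test-set statistics leak into the model. The sketch below checks this on synthetic data; the arrays `train` and `test` are random placeholders.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=3.0, size=(80, 2))
test = rng.normal(loc=10.0, scale=3.0, size=(20, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std from train
test_scaled = scaler.transform(test)        # reuses the train mean/std

# Training columns end up with mean 0 and standard deviation 1.
print(train_scaled.mean(axis=0).round(6))
print(train_scaled.std(axis=0).round(6))

# The test set is standardized with the training statistics, so its
# column means are close to, but generally not exactly, zero.
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```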

## 7. Training and Testing the Model

In [7]:
model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")

Mean Squared Error: 18.10
R-squared Score: 0.12
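
To put these numbers in the units of G3, the MSE can be converted to a root mean squared error. Since R² = 1 − MSE / Var(y_test), the two reported metrics also imply the variance of the test targets. The sketch below uses the rounded values printed above, so its results are approximate.

```python
import math

mse = 18.10  # test-set MSE reported above (rounded)
r2 = 0.12    # test-set R-squared reported above (rounded)

# RMSE is the typical size of a prediction error, in grade points.
rmse = math.sqrt(mse)

# Rearranging R^2 = 1 - MSE / Var(y_test) gives the implied variance.
implied_variance = mse / (1 - r2)

print(f"RMSE: {rmse:.2f}")
print(f"Implied variance of y_test: {implied_variance:.2f}")
```

An R² of 0.12 means the model's squared error is only about 12% lower than that of always predicting the mean grade of the test set.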


## 8. Selecting the 5 Most Important Variables

In [8]:
selector = SelectKBest(score_func=f_regression, k=5)
X_new = selector.fit_transform(X_train_scaled, y_train)
selected_features = X.columns[selector.get_support()]
print("Top 5 selected features:", list(selected_features))

Top 5 selected features: ['age', 'Medu', 'studytime', 'failures', 'higher']
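
`SelectKBest` with `f_regression` scores each feature on its own, using a univariate F-test against the target, and keeps the `k` highest scores. The fitted selector exposes these scores through `scores_` and `pvalues_`. A sketch on synthetic data, where the column names and coefficients are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
# The target depends strongly on "a", moderately on "b", and not on "c" or "d".
y_demo = 3.0 * X_demo["a"] + 1.5 * X_demo["b"] + rng.normal(scale=0.5, size=200)

selector_demo = SelectKBest(score_func=f_regression, k=2)
selector_demo.fit(X_demo, y_demo)

# Rank features by their univariate F statistic.
ranking = pd.DataFrame(
    {
        "feature": X_demo.columns,
        "F": selector_demo.scores_,
        "p_value": selector_demo.pvalues_,
    }
).sort_values("F", ascending=False)
print(ranking)

chosen = X_demo.columns[selector_demo.get_support()].tolist()
print(chosen)
```

Because the test is univariate, it ignores interactions and redundancy between features, so a feature with a low score can still help a multivariate model.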


## 9. Training and Testing the New Model

In [9]:
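# Keep only the 5 columns chosen by the selector
X_train_selected = selector.transform(X_train_scaled)
X_test_selected = selector.transform(X_test_scaled)
# Refit the regression on the reduced feature set (this overwrites the earlier fit)
model.fit(X_train_selected, y_train)
y_pred_selected = model.predict(X_test_selected)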

mse_selected = mean_squared_error(y_test, y_pred_selected)
r2_selected = r2_score(y_test, y_pred_selected)
print(f"New Model - Mean Squared Error: {mse_selected:.2f}")
print(f"New Model - R-squared Score: {r2_selected:.2f}")

New Model - Mean Squared Error: 19.14
New Model - R-squared Score: 0.07
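
A single train/test split gives a noisy estimate, so a gap this small between the two models may not be stable. One way to check is to wrap scaling, selection, and regression in a `Pipeline` and compare cross-validated scores. The sketch below runs on synthetic data; `X_demo`, `y_demo`, and the helper `build_pipeline` are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 8))
# Only the first two columns carry signal in this synthetic target.
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(size=300)


def build_pipeline(k):
    """Scale, keep the k best features, then fit a linear regression."""
    return Pipeline(
        [
            ("scale", StandardScaler()),
            ("select", SelectKBest(score_func=f_regression, k=k)),
            ("model", LinearRegression()),
        ]
    )


cv_results = {}
for k in ("all", 2):
    # Scaling and selection are refitted inside every fold, so the
    # held-out fold never influences them.
    scores = cross_val_score(build_pipeline(k), X_demo, y_demo, cv=5, scoring="r2")
    cv_results[k] = scores
    print(f"k={k}: mean R2 = {scores.mean():.3f} (std {scores.std():.3f})")
```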


## 10. Comparing the Results

In [10]:
print(f"Original Model R2 Score: {r2:.2f}, New Model R2 Score: {r2_selected:.2f}")
print(f"Original Model MSE: {mse:.2f}, New Model MSE: {mse_selected:.2f}")

Original Model R2 Score: 0.12, New Model R2 Score: 0.07
Original Model MSE: 18.10, New Model MSE: 19.14
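
The same comparison can be laid out as a small table with the change between the two models. The sketch below hardcodes the rounded metrics printed above; the row labels are chosen here for illustration.

```python
import pandas as pd

# Rounded test-set metrics as reported above.
comparison = pd.DataFrame(
    {"R2": [0.12, 0.07], "MSE": [18.10, 19.14]},
    index=["all 14 features", "top 5 features"],
)

# Change when moving from the full model to the reduced one.
delta = comparison.loc["top 5 features"] - comparison.loc["all 14 features"]

print(comparison)
print(delta.round(2))
```

On this split, dropping to five features lowered R² and raised MSE, so the reduced model did not improve on the full one.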
