## **🛠️ Model Training and Evaluation**

In [59]:
import numpy as np
import pandas as pd

In [60]:
data = pd.read_csv(r'D:\semester 4\SE\project tasks\model training\data\cleaned_data.csv')

Split the dataset into **train**, **validation**, and **test** sets (70% / 15% / 15%).

In [61]:
from sklearn.model_selection import train_test_split

# Features and target
X = data.drop(columns='matched_score')
y = data['matched_score']

# Split into train + temp (temp will be split into val and test)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)

# Split temp into validation and test
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

# Final sizes
print("Train size:", len(X_train))
print("Validation size:", len(X_val))
print("Test size:", len(X_test))


Train size: 5992
Validation size: 1284
Test size: 1284
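
The two-stage split above can be sanity-checked without the real CSV. The sketch below uses a synthetic frame with 8,560 rows (the sum of the three sizes printed above). The column names `feature_a` and `feature_b` are placeholders, not the project's real features. It checks that `test_size=0.3` followed by `test_size=0.5` gives roughly a 70/15/15 split with no row shared between subsets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for cleaned_data.csv (assumption: 8,560 rows, placeholder columns)
rng = np.random.default_rng(42)
n_rows = 8560
demo = pd.DataFrame({
    "feature_a": rng.normal(size=n_rows),
    "feature_b": rng.normal(size=n_rows),
    "matched_score": rng.uniform(0, 1, size=n_rows),
})

X_demo = demo.drop(columns="matched_score")
y_demo = demo["matched_score"]

# Stage 1: hold out 30% of the rows
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
# Stage 2: split the held-out 30% in half (15% validation, 15% test)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

split_sizes = {"train": len(X_tr), "val": len(X_va), "test": len(X_te)}
split_fractions = {name: size / n_rows for name, size in split_sizes.items()}
print(split_sizes)
print({name: round(frac, 2) for name, frac in split_fractions.items()})

# The three index sets should not overlap
no_overlap = (
    set(X_tr.index).isdisjoint(X_va.index)
    and set(X_tr.index).isdisjoint(X_te.index)
    and set(X_va.index).isdisjoint(X_te.index)
)
print("No overlap between splits:", no_overlap)
```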


In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from scipy.stats import randint

# Define the parameter distribution
param_dist = {
    'max_depth': randint(5, 50),
    'min_samples_split': randint(2, 30),
    'min_samples_leaf': randint(1, 60),
}

# Initialize the model
model = DecisionTreeRegressor(random_state=42)

# Randomized Search
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,
    n_iter=50,                # Number of parameter settings to sample
    scoring='neg_mean_squared_error',
    cv=5,
    random_state=42,
    n_jobs=-1,
    verbose=1
)

# Fit to training data
random_search.fit(X_train, y_train)

# Best model
best_model = random_search.best_estimator_

# Predict on validation set
y_pred_val = best_model.predict(X_val)

# Evaluation
mse = mean_squared_error(y_val, y_pred_val)
r2 = r2_score(y_val, y_pred_val)

print("Best Parameters Found:", random_search.best_params_)
print(f"Validation MSE: {mse:.4f}")
print(f"Validation R² Score: {r2:.4f}")


Validation MSE: 0.0226
Validation R² Score: 0.1922
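
To put an R² of about 0.19 in context, it can help to score a trivial baseline on the same validation split. The sketch below does this on synthetic data, comparing a mean-predicting `DummyRegressor` with a decision tree. The dataset, the `make_regression` settings, and the fixed tree parameters are illustrative assumptions, not the project's.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the real features (assumption)
X_syn, y_syn = make_regression(n_samples=2000, n_features=8, noise=25.0, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# Baseline: always predict the training-set mean of the target
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
# Candidate: a regularised decision tree (parameters chosen for illustration only)
tree = DecisionTreeRegressor(max_depth=8, min_samples_leaf=20, random_state=42).fit(X_tr, y_tr)

scores = {}
for name, est in [("baseline", baseline), ("tree", tree)]:
    pred = est.predict(X_va)
    scores[name] = {"mse": mean_squared_error(y_va, pred), "r2": r2_score(y_va, pred)}
    print(f"{name:>8}: MSE={scores[name]['mse']:.2f}  R2={scores[name]['r2']:.4f}")
```

A constant mean predictor can never score above zero R² on held-out data, so the gap between the two rows shows how much the tree actually adds.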


In [71]:

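# Make predictions on the test set with the tuned estimator.
# `model` is only the unfitted template: RandomizedSearchCV fits clones of it,
# so the fitted tree lives in `best_model`.
y_pred_test = best_model.predict(X_test)

# Calculate MSE and R²
test_mse = mean_squared_error(y_test, y_pred_test)
test_r2 = r2_score(y_test, y_pred_test)

# Display results
print(f"Test MSE: {test_mse}")
print(f"Test R² Score: {test_r2}")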


Test MSE: 0.02248006528099251
Test R² Score: 0.17339132699621518


## **📊 Model Results Report: Decision Tree Regressor**

**✅ Best Model Parameters (via RandomizedSearchCV)**

`max_depth`: 11

`min_samples_split`: 10

`min_samples_leaf`: 21

**📉 Validation Set Performance**

Mean Squared Error (MSE): 0.0226

R² Score: 0.1922

**📈 Test Set Performance**

Mean Squared Error (MSE): 0.0225

R² Score: 0.1734

**🧠 Interpretation**

The R² score on the test set (0.1734) indicates that the model explains approximately 17% of the variance in the target variable (`matched_score`) on unseen data.

The MSE is the average squared difference between predicted and actual scores, so lower is better. Its scale depends on the range of `matched_score`, so it is best judged against the variance of the target rather than in isolation.

Performance is modest. The two held-out scores are close (R² of 0.1922 on validation versus 0.1734 on test), so the estimate looks stable across splits, but a tree that explains under a fifth of the variance leaves most of the signal uncaptured.
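
The relationship between MSE and R² can be made concrete with a little arithmetic. The sketch below takes the metrics printed by the two evaluation cells (copied by hand, so the validation values are rounded) and derives the RMSE and the target spread implied by R² = 1 − MSE / Var(y). The `reported` and `derived` dictionaries are illustrative names, not part of the notebook.

```python
import math

# Metrics as printed by the evaluation cells (hand-copied; validation values are rounded)
reported = {
    "validation": {"mse": 0.0226, "r2": 0.1922},
    "test": {"mse": 0.02248006528099251, "r2": 0.17339132699621518},
}

derived = {}
for split, m in reported.items():
    rmse = math.sqrt(m["mse"])               # typical error in the units of matched_score
    target_var = m["mse"] / (1.0 - m["r2"])  # rearranged from R² = 1 - MSE / Var(y)
    target_std = math.sqrt(target_var)
    derived[split] = {"rmse": rmse, "target_std": target_std}
    print(f"{split:>10}: RMSE={rmse:.3f}  implied target std={target_std:.3f}")
```

An RMSE of roughly 0.15 against an implied target standard deviation of roughly 0.17 says the same thing as the R² values: the predictions are only somewhat tighter than always guessing the mean.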

**🎯💾 Save the Trained Model**

In [72]:
import joblib
import os

# Define the directory where you want to save the model
model_directory = r'D:\semester 4\SE\project tasks\model training\models'  # raw string keeps the backslashes literal

# Make sure the directory exists
os.makedirs(model_directory, exist_ok=True)

# Save the tuned model (best_model, not the unfitted template `model`) to the specified directory
model_path = os.path.join(model_directory, 'decision_tree_model.pkl')
joblib.dump(best_model, model_path)

print(f"🚀Model saved to {model_path}")


🚀Model saved to D:\semester 4\SE\project tasks\model training\models\decision_tree_model.pkl
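
A saved model is only useful if it reloads and predicts the same values. The sketch below round-trips a small fitted tree through `joblib` in a temporary directory. Synthetic data and a throwaway path are assumed here in place of the project's data and `models` folder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Fit a small tree on synthetic data (stand-in for the tuned best_model)
X_syn, y_syn = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
fitted_tree = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X_syn, y_syn)

# Write the model to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "decision_tree_model.pkl")
    joblib.dump(fitted_tree, demo_path)
    reloaded_tree = joblib.load(demo_path)

# The reloaded model should reproduce the original predictions exactly
predictions_match = np.array_equal(fitted_tree.predict(X_syn), reloaded_tree.predict(X_syn))
print("Round-trip predictions match:", predictions_match)
```

Pickled scikit-learn estimators are best reloaded with the same scikit-learn version that wrote them, so it is worth recording the version alongside the `.pkl` file.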


