Assignment Week 6: Train multiple machine learning models and evaluate their performance using metrics such as accuracy, precision, recall, and F1-score. Implement hyperparameter tuning techniques like GridSearchCV and RandomizedSearchCV to optimize model parameters. Analyze the results to select the best-performing model.
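
The metrics named above are reported with weighted averaging in this notebook, so a small worked example may help. The sketch below uses hypothetical toy labels (not taken from the wine data) to show that the weighted score is the per-class score averaged by class support, and that weighted recall coincides with accuracy, which is why those two columns match in the results tables.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical toy labels for a 3-class problem (illustration only).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])

# Per-class precision, plus the number of true samples per class (the "support").
per_class_precision = precision_score(y_true, y_pred, average=None)
support = np.bincount(y_true)

# Weighted averaging: each class's score counts in proportion to its support.
manual_weighted_precision = np.sum(per_class_precision * support) / support.sum()
sklearn_weighted_precision = precision_score(y_true, y_pred, average="weighted")

# Weighted recall reduces to (total correct) / (total samples), i.e. accuracy.
weighted_recall = recall_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)

print("Per-class precision:", per_class_precision)
print("Weighted precision (manual): ", manual_weighted_precision)
print("Weighted precision (sklearn):", sklearn_weighted_precision)
print("Weighted recall:", weighted_recall, "| Accuracy:", accuracy)
```

Because weighted recall and accuracy carry the same information, precision and F1 are the columns that add something new when comparing models.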

In [2]:
# Import Libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd

# Load Wine Dataset
wine = load_wine()
X, y = wine.data, wine.target

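# Split Data first so the test set stays unseen during preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocessing: Standardization (fit on the training set only to avoid data leakage)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)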

# Define Models
models = {
    "Logistic Regression": LogisticRegression(max_iter=500),
    # Fixed seed so the forest (and its scores) are reproducible between runs
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier()
}

# Train and Evaluate Each Model
results = []

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, average='weighted'),
        "Recall": recall_score(y_test, y_pred, average='weighted'),
        "F1 Score": f1_score(y_test, y_pred, average='weighted')
    })

results_df = pd.DataFrame(results)
print("\n📊 Model Performance (Before Tuning):\n")
print(results_df)

# Hyperparameter Tuning

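# GridSearchCV for Random Forest (exhaustively evaluates all 3 x 3 = 9 combinations with 3-fold CV)
param_grid_rf = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 5, 10]
}
# Fixed seed so the search picks the same configuration on every run
grid_rf = GridSearchCV(RandomForestClassifier(random_state=42), param_grid_rf, cv=3, scoring='f1_weighted')
grid_rf.fit(X_train, y_train)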

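# RandomizedSearchCV for SVM (samples n_iter=10 of the 4 x 4 x 2 = 32 combinations below;
# gamma has no effect when kernel='linear')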
param_dist_svc = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1],
    'kernel': ['rbf', 'linear']
}
rand_svc = RandomizedSearchCV(SVC(), param_distributions=param_dist_svc, n_iter=10, cv=3, scoring='f1_weighted', random_state=42)
rand_svc.fit(X_train, y_train)

# Evaluate Tuned Models
# best_estimator_ is already refit on the full training set (refit=True is the default)
tuned_models = {
    "Tuned Random Forest": grid_rf.best_estimator_,
    "Tuned SVM": rand_svc.best_estimator_
}

for name, model in tuned_models.items():
    y_pred = model.predict(X_test)
    results.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, average='weighted'),
        "Recall": recall_score(y_test, y_pred, average='weighted'),
        "F1 Score": f1_score(y_test, y_pred, average='weighted')
    })

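# Final Results
final_results_df = pd.DataFrame(results)
# Stable sort keeps the original model order among rows with equal F1
final_sorted = final_results_df.sort_values(by='F1 Score', ascending=False, kind='stable')
print("\n🏆 Final Model Comparison (After Tuning):\n")
print(final_sorted)

# Best Model Summary
# Weighted F1 is the ranking metric: it combines precision and recall and weights
# each of the three wine classes by its number of test samples.
# Note: when several models tie on F1, iloc[0] simply returns the first one listed,
# so the printed name does not mean that model beat the others it is tied with.
best_model = final_sorted.iloc[0]
print(f"\n✅ Best Performing Model: {best_model['Model']} with F1 Score: {best_model['F1 Score']:.4f}")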



📊 Model Performance (Before Tuning):

                 Model  Accuracy  Precision    Recall  F1 Score
0  Logistic Regression  1.000000   1.000000  1.000000  1.000000
1        Random Forest  1.000000   1.000000  1.000000  1.000000
2                  SVM  1.000000   1.000000  1.000000  1.000000
3                  KNN  0.944444   0.949383  0.944444  0.943604

🏆 Final Model Comparison (After Tuning):

                 Model  Accuracy  Precision    Recall  F1 Score
0  Logistic Regression  1.000000   1.000000  1.000000  1.000000
1        Random Forest  1.000000   1.000000  1.000000  1.000000
2                  SVM  1.000000   1.000000  1.000000  1.000000
4  Tuned Random Forest  1.000000   1.000000  1.000000  1.000000
5            Tuned SVM  1.000000   1.000000  1.000000  1.000000
3                  KNN  0.944444   0.949383  0.944444  0.943604

✅ Best Performing Model: Logistic Regression with F1 Score: 1.0000
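
Five of the six models tie at a weighted F1 of 1.0 on this test split, so a single hold-out split cannot separate them. One possible follow-up, sketched below as an assumption rather than as part of the original assignment, is to compare the untuned models with stratified 5-fold cross-validation on the full dataset. Each model is wrapped in a pipeline so the scaler is refit inside every fold; the names `candidates`, `cv`, and `cv_summary` are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Each pipeline refits the scaler on the training folds only,
# so no statistics from the held-out fold leak into training.
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=500)),
    "Random Forest": make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_summary = {}
for name, pipeline in candidates.items():
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1_weighted")
    cv_summary[name] = (scores.mean(), scores.std())

# Report models from highest to lowest mean cross-validated F1.
for name, (mean, std) in sorted(cv_summary.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{name:<20} mean F1 = {mean:.4f} (std {std:.4f})")
```

The mean and standard deviation across folds give a less split-dependent basis for choosing among models than one 36-sample test set, though small differences between them may still fall within the fold-to-fold noise.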
