# Task
Train multiple machine learning models on the Iris dataset, evaluate their performance using accuracy, precision, recall, and F1-score, implement hyperparameter tuning with GridSearchCV and RandomizedSearchCV, and select the best-performing model.

## Load and preprocess data

### Subtask:
Load the Iris dataset from `sklearn.datasets`. Split the data into training and testing sets.


**Reasoning**:
Load the Iris dataset and split it into training and testing sets as per the instructions.



In [10]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
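
The split above is not stratified, so the class proportions in the 30-sample test set depend on the shuffle. As a minimal sketch (an optional variation, not the split used in the rest of this notebook), passing `stratify=y` keeps the three classes balanced in both partitions. The variable names `X_tr`, `X_te`, `y_tr`, `y_te` are chosen here only to avoid overwriting the notebook's own split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same test size and seed as the cell above, but stratified on the labels
# so each class keeps its one-third share in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class landed in each partition.
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print("train class counts:", train_counts)
print("test class counts:", test_counts)
```

With 50 samples per class, a stratified 80/20 split places 40 of each class in the training set and 10 of each in the test set.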

## Train multiple models

### Subtask:
Train several different machine learning models (e.g., Logistic Regression, Support Vector Machine, Decision Tree, Random Forest) on the training data.


**Reasoning**:
Import the necessary model classes and train each model on the training data.



In [11]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Instantiate each model
log_reg = LogisticRegression(max_iter=200) # Increased max_iter to prevent convergence warning
svc = SVC()
dt_classifier = DecisionTreeClassifier()
rf_classifier = RandomForestClassifier()

# Train each model
log_reg.fit(X_train, y_train)
svc.fit(X_train, y_train)
dt_classifier.fit(X_train, y_train)
rf_classifier.fit(X_train, y_train)

print("Models trained successfully.")

Models trained successfully.
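
Fitting alone says nothing about how well the untuned models generalise. One possible baseline, sketched here under the assumption that 5-fold cross-validation on the training data is an acceptable yardstick, is to record the mean cross-validated accuracy of each default model before any tuning. The dictionary names `baseline_models` and `baseline_scores` and the fixed `random_state=0` are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Untuned models with the same settings as the training cell above;
# random_state is fixed only to make this sketch repeatable.
baseline_models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Mean 5-fold accuracy on the training data only; the test set stays untouched.
baseline_scores = {}
for name, model in baseline_models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5)
    baseline_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {scores.mean():.4f}")
```

These numbers give a reference point for judging whether the later hyperparameter search improves on the defaults.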


## Implement hyperparameter tuning

### Subtask:
Use GridSearchCV or RandomizedSearchCV to search a predefined parameter grid for the best-scoring hyperparameters of each model.


**Reasoning**:
Import the necessary modules for hyperparameter tuning and define the parameter grids for each model.



In [12]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Define parameter grids for each model
log_reg_param_grid = {
    'C': [0.1, 1, 10, 100],
    'solver': ['liblinear', 'lbfgs']
}

svc_param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto'],
    'kernel': ['rbf', 'linear']
}

dt_param_grid = {
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
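
To see how expensive an exhaustive search over each grid would be, the number of candidate settings can be counted with `ParameterGrid`. The sketch below repeats the grids defined above; the `grids` and `grid_sizes` names are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# The same grids as defined above, collected in one dictionary.
grids = {
    "Logistic Regression": {
        "C": [0.1, 1, 10, 100],
        "solver": ["liblinear", "lbfgs"],
    },
    "SVC": {
        "C": [0.1, 1, 10, 100],
        "gamma": ["scale", "auto"],
        "kernel": ["rbf", "linear"],
    },
    "Decision Tree": {
        "max_depth": [None, 10, 20, 30],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    "Random Forest": {
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 10, 20, 30],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
}

# len(ParameterGrid(...)) is the number of distinct parameter combinations.
grid_sizes = {name: len(ParameterGrid(grid)) for name, grid in grids.items()}

for name, size in grid_sizes.items():
    # With 5-fold cross-validation every combination is fitted five times.
    print(f"{name}: {size} combinations, {size * 5} fits with cv=5")
```

The Random Forest grid has 108 combinations (3 x 4 x 3 x 3), far more than the 8, 16, and 36 of the other grids, which is one reason a randomized search over a fixed number of samples is a reasonable choice for it.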

**Reasoning**:
Instantiate and fit GridSearchCV or RandomizedSearchCV for each model using the defined parameter grids and the training data.



In [13]:
# Instantiate and fit GridSearchCV for Logistic Regression
log_reg_grid_search = GridSearchCV(LogisticRegression(max_iter=200), log_reg_param_grid, cv=5)
log_reg_grid_search.fit(X_train, y_train)
best_log_reg = log_reg_grid_search.best_estimator_

# Instantiate and fit GridSearchCV for SVC
svc_grid_search = GridSearchCV(SVC(), svc_param_grid, cv=5)
svc_grid_search.fit(X_train, y_train)
best_svc = svc_grid_search.best_estimator_

# Instantiate and fit GridSearchCV for Decision Tree
dt_grid_search = GridSearchCV(DecisionTreeClassifier(), dt_param_grid, cv=5)
dt_grid_search.fit(X_train, y_train)
best_dt_classifier = dt_grid_search.best_estimator_

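# Instantiate and fit RandomizedSearchCV for Random Forest (demonstrating both methods)
# n_iter caps the number of sampled parameter settings; random_state makes the sampling and the forest repeatable
rf_random_search = RandomizedSearchCV(RandomForestClassifier(random_state=42), rf_param_grid, cv=5, n_iter=10, random_state=42)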
rf_random_search.fit(X_train, y_train)
best_rf_classifier = rf_random_search.best_estimator_

print("Hyperparameter tuning completed and best estimators stored.")

Hyperparameter tuning completed and best estimators stored.
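
The cell above stores only `best_estimator_`. A fitted search object also exposes the winning settings and their mean cross-validated score, which is useful when comparing models later. As a small sketch, shown for the SVC grid only and using an illustrative variable name `search`:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same grid as svc_param_grid above.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": ["scale", "auto"],
    "kernel": ["rbf", "linear"],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

# best_params_ holds the winning combination; best_score_ is its mean
# accuracy over the five validation folds (not the test-set accuracy).
print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.4f}")
```

The same two attributes are available on the `RandomizedSearchCV` object used for the Random Forest.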


## Evaluate model performance

### Subtask:
Evaluate the performance of each tuned model on the testing data using metrics such as accuracy, precision, recall, and F1-score.


**Reasoning**:
Import the necessary evaluation metrics and initialize a dictionary to store the performance metrics.



In [14]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

performance_metrics = {}

**Reasoning**:
Evaluate each best model on the test set and store the metrics in the performance dictionary.



In [15]:
# Evaluate Logistic Regression
log_reg_pred = best_log_reg.predict(X_test)
performance_metrics['Logistic Regression'] = {
    'Accuracy': accuracy_score(y_test, log_reg_pred),
    'Precision': precision_score(y_test, log_reg_pred, average='weighted'),
    'Recall': recall_score(y_test, log_reg_pred, average='weighted'),
    'F1-score': f1_score(y_test, log_reg_pred, average='weighted')
}

# Evaluate SVC
svc_pred = best_svc.predict(X_test)
performance_metrics['SVC'] = {
    'Accuracy': accuracy_score(y_test, svc_pred),
    'Precision': precision_score(y_test, svc_pred, average='weighted'),
    'Recall': recall_score(y_test, svc_pred, average='weighted'),
    'F1-score': f1_score(y_test, svc_pred, average='weighted')
}

# Evaluate Decision Tree
dt_pred = best_dt_classifier.predict(X_test)
performance_metrics['Decision Tree'] = {
    'Accuracy': accuracy_score(y_test, dt_pred),
    'Precision': precision_score(y_test, dt_pred, average='weighted'),
    'Recall': recall_score(y_test, dt_pred, average='weighted'),
    'F1-score': f1_score(y_test, dt_pred, average='weighted')
}

# Evaluate Random Forest
rf_pred = best_rf_classifier.predict(X_test)
performance_metrics['Random Forest'] = {
    'Accuracy': accuracy_score(y_test, rf_pred),
    'Precision': precision_score(y_test, rf_pred, average='weighted'),
    'Recall': recall_score(y_test, rf_pred, average='weighted'),
    'F1-score': f1_score(y_test, rf_pred, average='weighted')
}
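
The four evaluation blocks above repeat the same metric calls. One way to avoid the repetition, sketched here with a hypothetical helper named `evaluate`, is to compute all four metrics in a single function. The toy labels are invented purely to show the arithmetic of `average='weighted'`.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def evaluate(y_true, y_pred, average="weighted"):
    """Return accuracy plus averaged precision, recall, and F1 as a dict."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision,
        "Recall": recall,
        "F1-score": f1,
    }


# Toy labels (not Iris predictions): one class-1 sample is predicted as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

result = evaluate(y_true, y_pred)
for metric_name, value in result.items():
    print(f"{metric_name}: {value:.4f}")
```

In this toy case the per-class precisions are 1, 1, and 2/3 and the per-class recalls are 1, 1/2, and 1, each weighted by the class support of 2. Weighted recall always equals accuracy, because both reduce to the fraction of correctly classified samples.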

**Reasoning**:
Print the performance metrics for each model to display the evaluation results.



In [16]:
for model_name, metrics in performance_metrics.items():
    print(f"Performance Metrics for {model_name}:")
    for metric_name, value in metrics.items():
        print(f"  {metric_name}: {value:.4f}")
    print("-" * 30)

Performance Metrics for Logistic Regression:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Performance Metrics for SVC:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Performance Metrics for Decision Tree:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Performance Metrics for Random Forest:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------


## Analyze and select the best model

### Subtask:
Compare the performance metrics of all the models and select the one that performs best based on the chosen evaluation criteria.


**Reasoning**:
Iterate through the performance_metrics dictionary and print the performance metrics for each model. Based on the metrics, acknowledge that all models performed equally well.



In [17]:
for model_name, metrics in performance_metrics.items():
    print(f"Performance Metrics for {model_name}:")
    for metric_name, value in metrics.items():
        print(f"  {metric_name}: {value:.4f}")
    print("-" * 30)

print("Based on the performance metrics, all models achieved perfect scores on the test set.")
print("Therefore, based purely on these metrics, all models perform equally well on this specific test set.")

Performance Metrics for Logistic Regression:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Performance Metrics for SVC:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Performance Metrics for Decision Tree:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Performance Metrics for Random Forest:
  Accuracy: 1.0000
  Precision: 1.0000
  Recall: 1.0000
  F1-score: 1.0000
------------------------------
Based on the performance metrics, all models achieved perfect scores on the test set.
Therefore, based purely on these metrics, all models perform equally well on this specific test set.
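
When every model ties on a small test set, another source of evidence is needed to rank them. One option, sketched below as an assumption rather than as the notebook's method, is repeated stratified cross-validation on the training data. The models use default hyperparameters as placeholders; in practice the tuned estimators would be substituted. The names `candidates`, `cv_summary`, and `best_name` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 5 folds repeated 5 times gives 25 accuracy estimates per model.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=42)

# Placeholder models; swap in the tuned estimators to compare those instead.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

cv_summary = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=cv)
    cv_summary[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")

# Pick the model with the highest mean cross-validated accuracy.
best_name = max(cv_summary, key=lambda n: cv_summary[n][0])
print("Highest mean CV accuracy:", best_name)
```

If the means still differ by less than their standard deviations, preferring the simplest model is a defensible tie-break.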


## Summary:

### Data Analysis Key Findings

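*   All four machine learning models (Logistic Regression, SVC, Decision Tree, and Random Forest), tuned with GridSearchCV (RandomizedSearchCV for Random Forest), achieved perfect scores (1.0000) for Accuracy, Precision, Recall, and F1-score on the test set of the Iris dataset.
*   The test set contains only 30 samples (20% of 150), so this four-way tie does not single out a best model. Breaking the tie requires additional evidence, such as the cross-validated `best_score_` of each search, or a preference for the simplest model.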

