#### For this example, let's use the popular Iris flower dataset. The task is to classify iris flowers into three different species based on their measurements.

##### Step 1: Import Required Libraries

We start by importing the libraries needed for data processing, model training, and evaluation. We use pandas to load the data and scikit-learn, one of the most widely used machine learning libraries, for everything else.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

##### Step 2: Load and Explore the Dataset

Load the dataset into a pandas DataFrame and explore its structure to gain insights.

In [2]:
# Load the dataset
iris_data = pd.read_csv('Iris.csv')

In [3]:
# View the first few rows of the dataset
print(iris_data.head())

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


In [4]:
# Check the shape of the dataset
print(iris_data.shape)

(150, 6)


In [5]:
# Check the distribution of the target classes
print(iris_data['Species'].value_counts())

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64
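
Beyond shape and class counts, it is usually worth checking for missing values and looking at the range of each feature. The sketch below is one way to do that. It assumes the copy of Iris bundled with scikit-learn (`load_iris`) as a stand-in for `Iris.csv`, so the column names differ from the file above, and `iris_frame` is a name introduced only for this illustration.

```python
from sklearn.datasets import load_iris

# Stand-in for Iris.csv: the copy of the dataset bundled with scikit-learn.
# (Assumption: column names here differ from the CSV used in this notebook.)
iris_frame = load_iris(as_frame=True).frame

# Count missing values per column.
missing_per_column = iris_frame.isnull().sum()
print(missing_per_column)

# Summary statistics for every numeric column.
summary = iris_frame.describe()
print(summary.loc[["mean", "std", "min", "max"]])
```

The same two calls (`isnull().sum()` and `describe()`) can be applied to `iris_data` directly.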


##### Step 3: Data Preprocessing

Prepare the data for model training by splitting it into input features (x) and the target variable (y). Then split the dataset into training and testing sets.

In [6]:
# Split the data into features (x) and target variable (y).
# 'Id' is only a row counter, not a measurement, so it is dropped along with the label.
x = iris_data.drop(['Id', 'Species'], axis=1)
y = iris_data['Species']
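
One thing worth checking when choosing feature columns: `Id` is a row counter, not a flower measurement. If the rows are sorted by species, as the first rows of the file suggest, `Id` alone can reveal the label, so it is generally safer to keep it out of `x`. The sketch below illustrates the effect. It rebuilds a two-column frame from scikit-learn's bundled copy of Iris, which is sorted by species; `leak_df` and `id_only_tree` are hypothetical names used only here.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Rebuild a frame shaped like Iris.csv: a 1-based Id column plus the label.
# Assumption: rows are sorted by species, as in scikit-learn's bundled copy.
iris = load_iris()
leak_df = pd.DataFrame(
    {"Id": range(1, 151), "Species": iris.target_names[iris.target]}
)

id_train, id_test, sp_train, sp_test = train_test_split(
    leak_df[["Id"]], leak_df["Species"], test_size=0.2, random_state=42
)

# Train on the Id column alone, with no flower measurements at all.
id_only_tree = DecisionTreeClassifier(random_state=0).fit(id_train, sp_train)
id_only_accuracy = id_only_tree.score(id_test, sp_test)
print(f"Accuracy using only Id: {id_only_accuracy:.2f}")
```

A high score here says nothing about the flowers; it only reflects how the file is ordered.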

In [7]:
# Splitting the dataset into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
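
The split above is purely random, so the classes are not guaranteed to be equally represented in the test set (the reports in Step 6 show 10, 9, and 11 test rows per class). A possible alternative, sketched below, passes `stratify` to `train_test_split` so each partition keeps the original class proportions. It uses scikit-learn's bundled copy of Iris rather than `Iris.csv`, and the variable names are chosen only for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

features, labels = load_iris(return_X_y=True, as_frame=True)

# stratify=labels keeps the 50/50/50 class balance in both partitions.
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# Count how many test rows fall into each class.
test_class_counts = l_test.value_counts().sort_index()
print(test_class_counts)
```

With 150 balanced rows and a 20% test set, stratification gives 10 test rows per class.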

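##### Step 4: Feature Scaling

To ensure that all the features are on a similar scale, we apply feature scaling using the `StandardScaler` class. The scaler is fit on the training set only and then reused to transform the test set, so no information from the test set influences the scaling.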

In [8]:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
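
To make the effect of scaling concrete, the sketch below checks that each training column ends up with mean close to 0 and standard deviation close to 1. It also shows one optional way to bundle the scaler with a model using `make_pipeline`, so that the fit-on-train, transform-on-test pattern is handled automatically. This is an illustration on scikit-learn's bundled Iris copy, not the notebook's own workflow, and `max_iter=1000` is an arbitrary choice made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data_X, data_y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    data_X, data_y, test_size=0.2, random_state=42
)

# After fitting on the training set, each training column has mean ~0 and std ~1.
check_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = check_scaler.transform(X_tr)
print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))

# A pipeline fits the scaler on the training data only and reuses it at predict time.
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logreg.fit(X_tr, y_tr)
pipeline_accuracy = scaled_logreg.score(X_te, y_te)
print(f"Pipeline test accuracy: {pipeline_accuracy:.2f}")
```

Note that the test set will generally not have exactly zero mean and unit variance after transformation, because it is scaled with the training set's statistics.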

##### Step 5: Train the Machine Learning Models

In this example, we train three different algorithms: Logistic Regression, Decision Tree, and Random Forest. Each is fit on the same preprocessed training data so that their results can be compared directly. The performance metrics (classification report and confusion matrix) for each algorithm are printed in Step 6.

In [9]:
# Logistic Regression
logreg = LogisticRegression()
logreg.fit(x_train, y_train)

In [10]:
# Decision Tree
dt = DecisionTreeClassifier(random_state=42)  # fixed seed so reruns build the same tree
dt.fit(x_train, y_train)

In [11]:
# Random Forest
rf = RandomForestClassifier(random_state=42)  # fixed seed so reruns build the same forest
rf.fit(x_train, y_train)

##### Step 6: Evaluate the Models

Finally, let us evaluate each trained model on the test set and examine its performance metrics.

In [12]:
# Logistic Regression
y_pred_logreg = logreg.predict(x_test)
print("Logistic Regression:")
print(classification_report(y_test, y_pred_logreg))
print(confusion_matrix(y_test, y_pred_logreg))
print()

Logistic Regression:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]



In [13]:
# Decision Tree
y_pred_dt = dt.predict(x_test)
print("Decision Tree:")
print(classification_report(y_test, y_pred_dt))
print(confusion_matrix(y_test, y_pred_dt))
print()

Decision Tree:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]



In [14]:
# Random Forest
y_pred_rf = rf.predict(x_test)
print("Random Forest:")
print(classification_report(y_test, y_pred_rf))
print(confusion_matrix(y_test, y_pred_rf))

Random Forest:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
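
All three models score 1.00 here, but the test set has only 30 rows, so a single split cannot tell the models apart. One common way to get a more stable comparison is k-fold cross-validation. The sketch below is a minimal example under stated assumptions: it uses scikit-learn's bundled copy of Iris instead of `Iris.csv`, 5 stratified folds, and fixed seeds, all chosen for illustration rather than taken from the steps above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

cv_X, cv_y = load_iris(return_X_y=True)

# Same three model families as above; seeds and max_iter are illustrative choices.
candidates = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Five stratified folds: every row is used for testing exactly once.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, cv_X, cv_y, cv=folds, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The mean shows typical accuracy across folds, and the standard deviation indicates how much that accuracy depends on which rows happen to land in the test fold.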
