<a href="https://colab.research.google.com/github/AbelAdissu/iris-dataset-analysis/blob/main/Iris_Dataset_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Cell 1: Loading Data and Data Preprocessing

In [4]:
from sklearn.datasets import load_iris
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

# Load the Iris dataset
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['target'] = iris.target

# Split the dataset into features (X) and the target variable (y)
X = data.drop('target', axis=1)
y = data['target']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Perform feature scaling (standardization) on the training and testing data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Encode the target variable (if it's not already encoded)
label_encoder = LabelEncoder()
y_train = label_encoder.fit_transform(y_train)
y_test = label_encoder.transform(y_test)

Explanation 1:

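In this cell, we import the required libraries and load the Iris dataset.
We create a DataFrame from the dataset and separate the features (X) from the target variable (y).
The dataset is split into training and testing sets using train_test_split, with 20% of the rows held out for testing and random_state=42 fixing the shuffle.
Feature scaling is performed with StandardScaler, which is fitted on the training set only and then applied to the test set, so no test-set statistics leak into training.
The target variable is passed through LabelEncoder. Because iris.target already holds the integer labels 0, 1, and 2, this step leaves the values unchanged.
Preprocessing matters here because distance-based models such as KNN are sensitive to the scale of each feature.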
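
To make the preprocessing concrete, the sketch below repeats the split and scaling from Cell 1 and checks what each step produces. It is a self-contained illustration, not part of the original notebook; the names X_train_scaled, X_test_scaled, train_means, train_stds, and encoded are introduced here only for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then reuse its statistics for the test rows
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# After standardization, each training column has mean 0 and standard deviation 1
train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
print(X_train.shape, X_test.shape)
print(np.round(train_means, 6), np.round(train_stds, 6))

# The test columns are only approximately centered, because they use training statistics
print(np.round(X_test_scaled.mean(axis=0), 3))

# LabelEncoder maps the sorted labels 0, 1, 2 to 0, 1, 2, so the target is unchanged
encoded = LabelEncoder().fit_transform(y_train)
print(np.array_equal(encoded, y_train))
```

The test-set means are not exactly zero, which is expected: the scaler deliberately never sees the test rows when it computes its statistics.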



Cell 2: K-Nearest Neighbors (KNN) and Cross-Validation

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, KFold
from sklearn.model_selection import GridSearchCV

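# Candidate values: k = 1..31 and both neighbor-weighting schemes (62 combinations)
k_range = list(range(1, 32))
param_grid = dict(n_neighbors=k_range, weights=["uniform", "distance"])
knn = KNeighborsClassifier()
# Score every combination with 10-fold cross-validated accuracy
grid = GridSearchCV(knn, param_grid, cv=10, scoring="accuracy")
# Note: X and y are the full, unscaled data from Cell 1, not X_train and y_train
grid.fit(X, y)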


Explanation 2:

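In this cell, we import KNeighborsClassifier and the model-selection utilities from scikit-learn (cross_val_score and KFold are imported but not used here).
We define a range of k values (1 through 31 neighbors) and create a parameter grid that also includes both weight options (uniform and distance), giving 62 candidate combinations.
A KNN classifier is instantiated as the base estimator for the grid search.
GridSearchCV performs the hyperparameter tuning with 10-fold cross-validation, scoring each combination by accuracy.
The search is fitted on the full, unscaled X and y rather than on the scaled training split from Cell 1.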
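
One possible way to combine the scaling from Cell 1 with this grid search is to wrap both steps in a scikit-learn Pipeline, so the scaler is refitted inside every cross-validation fold. The sketch below is an assumption about how that could look, not the notebook's own method; the step names "scaler" and "knn" and the variable pipeline are chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Scaling and KNN are tuned together; step names are illustrative
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {
    "knn__n_neighbors": list(range(1, 32)),
    "knn__weights": ["uniform", "distance"],
}

grid = GridSearchCV(pipeline, param_grid, cv=10, scoring="accuracy")
grid.fit(X_train, y_train)

# Best combination, its mean cross-validated accuracy, and accuracy on the held-out test set
print(grid.best_params_)
print(round(grid.best_score_, 3))
print(round(grid.score(X_test, y_test), 3))
```

Searching on the training split only keeps the test set untouched until the final score, which gives a less optimistic estimate than tuning on all 150 rows.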


Cell 3: KNN Model Testing

In [None]:
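# Train a 1-nearest-neighbor classifier on the full, unscaled dataset
knn = KNeighborsClassifier(n_neighbors=1, weights="uniform")
knn.fit(X, y)
# Wrap the sample in a DataFrame so it carries the column names the model was fitted with
sample = pd.DataFrame([[3, 5, 4, 2]], columns=X.columns)
knn.predict(sample)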


Explanation 3:

In this cell, we create a KNN classifier with n_neighbors=1 and weights="uniform".
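The model is trained on the entire, unscaled dataset, so the sample is given in the original units (centimeters).
We make a prediction for a sample input [3, 5, 4, 2] to demonstrate how the model works. With n_neighbors=1, the prediction is simply the label of the single closest training point.
These hyperparameters are set by hand rather than read from the grid search in Cell 2, so this cell shows how a fitted model behaves, not how the tuned model performs.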
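
To see where a 1-nearest-neighbor prediction comes from, the sketch below refits the same model and looks up the closest training row with kneighbors. It is a self-contained illustration; the names sample, predicted_label, predicted_species, and neighbor_index are introduced here and do not appear in the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

knn = KNeighborsClassifier(n_neighbors=1, weights="uniform")
knn.fit(X, y)

# Same sample as in Cell 3, with matching column names
sample = pd.DataFrame([[3, 5, 4, 2]], columns=X.columns)

# predict returns the integer class; target_names maps it to the species name
predicted_label = knn.predict(sample)[0]
predicted_species = iris.target_names[predicted_label]

# kneighbors returns the distance to, and row index of, the closest training point
distances, indices = knn.kneighbors(sample)
neighbor_index = indices[0][0]

print(predicted_label, predicted_species)
print(neighbor_index, distances[0][0], y[neighbor_index])
```

The label stored at neighbor_index matches the predicted label, which is all that a uniform-weight classifier with a single neighbor does.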