# The Tale of the Iris Flower: A Classification Adventure

**Introduction:** Once upon a time, in the world of data, there existed a famous dataset known as the Iris dataset. It holds the measurements of three different species of Iris flowers: Setosa, Versicolor, and Virginica. Our quest is to build a magical model that can predict the species of an Iris flower based on its measurements.


## Chapter 1: Gathering the Seeds (Loading the Data)

First, we need to import our tools and load the dataset. These libraries are our trusty companions on this journey.


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os

# Data folders, relative to this notebook's location
input_path = os.path.join('..', 'data', 'raw')
output_path = os.path.join('..', 'data', 'processed')  # not used in this notebook


Now, let's plant our seeds by reading the `IRIS.csv` file. We'll take a peek at the first few rows to see what our data looks like.


In [None]:
flower_df = pd.read_csv(os.path.join(input_path, 'IRIS.csv'))
flower_df.head()


## Chapter 2: The First Bloom (Exploring the Data)

Our seeds have sprouted! Let's examine our data to understand its structure and characteristics.


In [None]:
flower_df.info()


We have 150 flowers in our garden. Let's count how many belong to each species: with 50 of each, this is a perfectly balanced dataset!


In [None]:
flower_df['species'].value_counts()


## Chapter 3: Painting a Picture (Data Visualization)

Let's create a beautiful visualization to see how the different species are distinguished by their features. The pairplot will help us see the relationships between all pairs of features.


In [None]:
sns.pairplot(flower_df, hue='species')
plt.show()


From the pairplot, we can see that the Iris-setosa species is easily separable from the other two, most clearly in the petal measurements. Iris-versicolor and Iris-virginica overlap more with each other, but we can still see some separation.


## Chapter 4: Preparing the Soil (Data Preprocessing)

Before we train our model, we need to prepare the data. We'll separate the features (X) from the target (y), which is the species of the flower.


In [None]:
# Separate Features (X) and Target (y)
X = flower_df.drop('species', axis=1)
y = flower_df['species']


Next, we'll split our data into a training set and a testing set. The training set will be used to teach our model, and the testing set will be used to evaluate its performance.


In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the flowers for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
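A plain random split does not promise that each species keeps its share in the test set. As a minimal sketch of the idea behind a stratified split, here is a standard-library version that works on toy labels shaped like our target column. The helper name `stratified_split` and the toy labels are made up for illustration; in scikit-learn, `train_test_split` offers this behavior through its `stratify` parameter.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels shaped like the Iris target: 50 flowers per species.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

With 50 flowers per species and `test_size=0.2`, each species contributes 10 flowers to the test set, so the test set stays as balanced as the full garden.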


## Chapter 5: The Oracle's Prediction (Training the Model)

Now it's time to build our predictive model. We will use the K-Nearest Neighbors (KNN) algorithm, a simple classifier that labels a new flower by majority vote among the k most similar flowers in the training set. Here we set k to 3.


In [None]:
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
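To see what the oracle does under the hood, here is a minimal from-scratch sketch of the KNN idea: measure the distance from a new flower to every training flower, keep the k closest, and let them vote. The function `knn_predict` and the measurements are invented for illustration. This is not scikit-learn's implementation, which is more efficient and handles ties in its own way.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (petal length, petal width) pairs in cm, for illustration only.
train_points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
train_labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

small_flower = knn_predict(train_points, train_labels, (1.6, 0.25))
large_flower = knn_predict(train_points, train_labels, (4.6, 1.3))
print(small_flower, large_flower)
```

The small-petaled query sits next to the three setosa points and the large-petaled one next to the versicolor points, so each is labeled after its neighbors.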


## Chapter 6: The Moment of Truth (Evaluating the Model)

Our model has been trained! Let's see how well it performs on the test data.


In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")


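A perfect score on the 30 flowers in our test set! With so few test flowers, the score depends on the particular split, so treat it as encouraging rather than conclusive. Let's look at the classification report for a more detailed breakdown.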


In [None]:
print("\nClassification Report:")
print(classification_report(y_test, y_pred))


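The confusion matrix will show us if there were any misclassifications. Each row is an actual species and each column a predicted species, so correct predictions sit on the diagonal and any other nonzero cell is a mistake.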


In [None]:
print("\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
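To make the link between the confusion matrix and the accuracy score concrete, here is a small sketch that builds the matrix by hand and reads the accuracy off its diagonal. The helper names and the toy labels are hypothetical; they are not the results of our model.

```python
def confusion_counts(y_true, y_pred, classes):
    """Build a confusion matrix: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy is the diagonal (correct predictions) divided by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


classes = ["setosa", "versicolor", "virginica"]
# Hypothetical labels, not the notebook's results.
toy_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
toy_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

toy_cm = confusion_counts(toy_true, toy_pred, classes)
toy_accuracy = accuracy_from_matrix(toy_cm)
print(toy_cm)
print(f"{toy_accuracy:.2f}")
```

In this toy example one versicolor is mistaken for a virginica, which shows up as a single count off the diagonal and pulls the accuracy down to 5 out of 6.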


In [None]:
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=model.classes_, yticklabels=model.classes_)
plt.xlabel("Predicted Species")
plt.ylabel("Actual Species")
plt.title("Confusion Matrix")
plt.show()
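The score above comes from a single split with 30 test flowers. One way to check how much it depends on that split is k-fold cross-validation, sketched below. This sketch assumes scikit-learn's bundled copy of the Iris data (`load_iris`) as a stand-in for `IRIS.csv`, and the choice of 5 folds is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled Iris data stands in for IRIS.csv.
X_all, y_all = load_iris(return_X_y=True)

# 5-fold cross-validation: every flower is used for testing exactly once.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X_all, y_all, cv=5)
print(scores)
print(f"Mean accuracy: {scores.mean():.2f}")
```

If the five fold scores sit close together, the single-split result is probably representative; a wide spread would suggest it owed something to a lucky split.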


**Conclusion:** Our journey has been a success! We've built a model that correctly classified every Iris flower in our held-out test set based on its measurements. This notebook tells the story of our adventure, from the first seeds of data to the final bloom of a predictive model.
