<a href="https://www.kaggle.com/code/imharshkashyap/iris-flower-data-visualization-python?scriptVersionId=149401040" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd

# We'll also import seaborn and matplotlib, the Python plotting libraries used throughout
import warnings  # the installed seaborn version emits many warnings, which we silence
warnings.filterwarnings("ignore")
import seaborn as sb
import matplotlib.pyplot as plt
sb.set(style="dark", color_codes=True)

# Press shift+enter to execute this cell


# **Let's explore the iris data set**

In [None]:
# Next, we'll load the Iris flower dataset from the Kaggle input directory into a pandas DataFrame
iris = pd.read_csv("/kaggle/input/iris-flower-visualization-using-python/Iris.csv")

In [None]:
# Let's see what's in the iris data - Jupyter notebooks print the result of the last thing you do
iris.head()

In [None]:
iris["Species"].value_counts()

# **Data visualization**

In [None]:
# Let's see how many examples we have of each species
species_counts = iris["Species"].value_counts()

# Create a bar chart
plt.bar(species_counts.index, species_counts.values)

# Add axis labels and a title
plt.xlabel("Species")
plt.ylabel("Count")
plt.title("Count of each species")

# Display the chart
plt.show()

In [None]:
# The first way to plot is the .plot method of pandas DataFrames.
# We'll use it to make a scatter plot of two Iris features.
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")

# Add axis labels
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Sepal Width (cm)")

# Styled title
title_txt = "Scatter Plot of Sepal Length vs Sepal Width (in cm)"
title_font = {'fontsize': 12, 'fontweight': 'bold', 'color': 'blue'}
plt.title(title_txt, fontdict=title_font)

# Display the plot
plt.show()


In [None]:
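# We can also use the seaborn library to make a similar plot.
# A seaborn jointplot shows the bivariate scatter plot and both univariate histograms in one figure.
g = sb.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris, height=7)
# Label the central (joint) axes through the grid; plt.xlabel/plt.title act on
# whichever axes is current, which is not guaranteed to be the joint axes
g.set_axis_labels("Sepal Length (cm)", "Sepal Width (cm)")
title_txt = "Scatter Plot of Sepal Length vs Sepal Width (in cm)"
title_font = {'fontsize': 12, 'fontweight': 'bold', 'color': 'blue'}
# Put the title above the whole figure and adjust the layout to avoid overlaps
g.figure.suptitle(title_txt, **title_font)
g.figure.tight_layout()
plt.show()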


In [None]:
# One piece of information missing in the plots above is the species of each plant.
# We'll use seaborn's FacetGrid to color the scatter plot by species.
g = sb.FacetGrid(iris, hue="Species", height=5)
g.map(plt.scatter, "SepalLengthCm", "SepalWidthCm")
g.add_legend()
g.set_axis_labels("Sepal Length (cm)", "Sepal Width (cm)")
title_font = {'fontsize': 12, 'fontweight': 'bold', 'color': 'blue'}
g.ax.set_title("Sepal Length vs Sepal Width by Species (in cm)", fontdict=title_font, pad=20)
plt.show()


In [None]:
# We can look at an individual feature in seaborn through a box plot
sb.boxplot(x="Species", y="PetalLengthCm", data=iris)
plt.xlabel("Species")
plt.ylabel("Petal Length (cm)")
title_txt = "Box Plot of Species vs Petal Length (in cm)"
title_font = {'fontsize': 12, 'fontweight': 'bold', 'color': 'blue'}
plt.title(title_txt, fontdict=title_font, pad=20)
plt.show()




In [None]:
# A more sophisticated technique available in pandas is Andrews curves.
# Each sample's attributes are used as the coefficients of a Fourier series,
# and the resulting curve f(t) is plotted for t in [-pi, pi].
from pandas.plotting import andrews_curves
andrews_curves(iris.drop("Id", axis=1), "Species")
plt.xlabel("t")
plt.ylabel("f(t)")
plt.title("Andrews Curves Plot of Iris")
plt.show()
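
The Fourier-series idea can be made concrete with a short example. Below is a minimal NumPy sketch of the textbook Andrews function, f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..., evaluated for t in [-pi, pi]. The helper name `andrews_curve` is hypothetical, and pandas' internal implementation may differ in details.

```python
import numpy as np

def andrews_curve(x, t):
    """Evaluate the Andrews function of one sample x at the angles t.

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    result = np.full_like(t, x[0] / np.sqrt(2.0))   # constant term
    for k, coef in enumerate(x[1:]):
        harmonic = k // 2 + 1                        # 1, 1, 2, 2, 3, 3, ...
        trig = np.sin if k % 2 == 0 else np.cos      # alternate sin / cos
        result += coef * trig(harmonic * t)
    return result

t = np.linspace(-np.pi, np.pi, 200)
sample = [5.1, 3.5, 1.4, 0.2]   # sepal length, sepal width, petal length, petal width
curve = andrews_curve(sample, t)
```

Samples with similar measurements produce similar curves. This is why the curves of each species tend to bundle together in the plot.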

In [None]:
# Another multivariate visualization technique in pandas is parallel_coordinates.
# It draws each feature as a vertical axis and connects each sample's values
# across those axes with a line.
from pandas.plotting import parallel_coordinates
parallel_coordinates(iris.drop("Id", axis=1), "Species")
plt.xlabel("Features of Iris Species")
plt.ylabel("Feature Values (cm)")

title_txt = "Parallel Coordinates Plot of Iris Dataset"
title_font = {'fontsize': 12, 'fontweight': 'bold', 'color': 'blue'}
plt.title(title_txt, fontdict=title_font, pad=20)
plt.show()



In [None]:
"""A final multivariate visualization technique pandas has is radviz
Which puts each feature as a point on a 2D plane, and then simulates
having each sample attached to those points through a spring weighted
by the relative value for that feature"""
from pandas.plotting import radviz
radviz(iris.drop("Id", axis=1), "Species")
plt.title("2D Visualization of Iris Dataset using Radviz Plot")
plt.tight_layout()
plt.show()
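
The spring description reduces to a weighted average. Below is a minimal NumPy sketch of that computation, assuming the usual RadViz recipe (which, as far as we know, is also what pandas' radviz does):

1. Min-max normalize each feature.
2. Place one anchor per feature, evenly spaced on the unit circle.
3. Put each sample at the mean of the anchors, weighted by its normalized values.

The helper name `radviz_positions` is hypothetical.

```python
import numpy as np

def radviz_positions(X):
    """Map each row of X (samples x features) to a point on the RadViz plane."""
    X = np.asarray(X, dtype=float)
    # 1. Rescale every feature (column) to [0, 1]
    mins, maxs = X.min(axis=0), X.max(axis=0)
    Xn = (X - mins) / (maxs - mins)
    # 2. One anchor per feature, evenly spaced on the unit circle
    m = X.shape[1]
    angles = 2 * np.pi * np.arange(m) / m
    anchors = np.column_stack([np.cos(angles), np.sin(angles)])   # shape (m, 2)
    # 3. Spring equilibrium = anchors averaged with the normalized values as
    #    weights (undefined for a row whose normalized values are all zero)
    weights = Xn / Xn.sum(axis=1, keepdims=True)
    return weights @ anchors

points = radviz_positions([[5.1, 3.5, 1.4, 0.2],
                           [7.0, 3.2, 4.7, 1.4],
                           [6.3, 3.3, 6.0, 2.5]])
print(points.round(2))
```

A sample that is large in a single feature is pulled toward that feature's anchor. A sample that is equally large in every feature ends up near the center.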

In [None]:
# Uses the 'iris' DataFrame with columns 'SepalLengthCm', 'SepalWidthCm' and 'PetalLengthCm'
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older matplotlib)

# Create a figure and a 3D subplot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Scatter plot of three features
ax.scatter(iris['SepalLengthCm'], iris['SepalWidthCm'], iris['PetalLengthCm'], c='b', marker='o')

# Set labels for each axis
ax.set_xlabel('Sepal Length (cm)')
ax.set_ylabel('Sepal Width (cm)')
ax.set_zlabel('Petal Length (cm)')

# Set title
ax.set_title('3D Scatter Plot of Iris Dataset')

# Display the plot
plt.show()


In [None]:
import plotly.express as px

# Create a 3D scatter plot colored by species
fig = px.scatter_3d(iris, x='SepalLengthCm', y='SepalWidthCm', z='PetalLengthCm', color='Species')

# Set the initial camera position; the plot can then be rotated interactively
fig.update_layout(scene=dict(camera=dict(eye=dict(x=1.5, y=1.5, z=1.5))))

# Display the interactive plot
fig.show()


In [None]:
import plotly.express as px

# Create a 3D scatter plot colored by a continuous attribute (petal width)
fig = px.scatter_3d(iris, x='SepalLengthCm', y='SepalWidthCm', z='PetalLengthCm', color='PetalWidthCm')

# Set the initial camera position and a title
fig.update_layout(
    scene=dict(camera=dict(eye=dict(x=1.5, y=1.5, z=1.5))),
    title="Marker color indicates petal width (in cm)",
)
# Display the interactive plot
fig.show()



In [None]:
iris.describe()

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Specify the path to the image file
image_path = "/kaggle/input/3d-plot/newplot.png"

# Read the image file
image = mpimg.imread(image_path)

# Display the image
plt.imshow(image)
plt.axis('off')  # Optional: hide the axes (ticks, labels and frame)
plt.show()



In [None]:
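import graphviz
from sklearn.datasets import load_iris
from sklearn import tree

# Load scikit-learn's copy of the data under a separate name so the `iris` DataFrame is not overwritten
iris_data = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris_data.data, iris_data.target)

dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris_data.feature_names,
                                class_names=iris_data.target_names,
                                filled=True, rounded=True,
                                special_characters=True)

graph = graphviz.Source(dot_data)
graph.render("iris")  # writes the DOT source and iris.pdf; requires the Graphviz binaries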


In [None]:
# Adapted from https://scikit-learn.org/stable/modules/tree.html#tree
from sklearn.datasets import load_iris
from sklearn import tree
import graphviz

# Again use a separate name so the `iris` DataFrame used below stays intact
iris_data = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris_data.data, iris_data.target)
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris_data.feature_names,
                                class_names=iris_data.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph  # display the tree inline in the notebook
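
Each node in the exported tree reports a `gini` value. `DecisionTreeClassifier` uses Gini impurity by default (`criterion="gini"`). For a node, it is 1 - sum(p_k^2), where p_k is the share of class k in that node. A split is scored by the impurities of its children, weighted by child size.

Below is a minimal sketch with hypothetical helper names. It uses the 50/50/50 class counts of the full Iris data and a split that isolates all setosa flowers.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of (class share) ** 2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

root = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
left = ["setosa"] * 50                            # a pure node
right = ["versicolor"] * 50 + ["virginica"] * 50  # a 50/50 node
print(round(gini(root), 3))                   # 0.667
print(round(split_impurity(left, right), 3))  # 0.333
```

The split lowers the impurity from 0.667 to 0.333. The tree keeps splitting greedily in this way until its nodes are pure or another stopping rule applies.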


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder

# Uses the 'iris' DataFrame loaded earlier; only the four measurements go into PCA
features = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

# Encode the species names as integers so they can be used as marker colors
label_encoder = LabelEncoder()
iris['Species_encoded'] = label_encoder.fit_transform(iris['Species'])

# Perform PCA on the measurements only (Id and the encoded labels would leak class information)
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(iris[features])

# Create a 3D scatter plot of the first three principal components
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection="3d", elev=-150, azim=110)
ax.scatter(
    X_reduced[:, 0],
    X_reduced[:, 1],
    X_reduced[:, 2],
    c=iris['Species_encoded'],
    cmap='viridis',
    edgecolor='k',
    s=40
)
ax.set_title("PCA Scatter Plot")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
plt.show()
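
PCA finds orthogonal directions of maximum variance in the centered data. The minimal NumPy sketch below computes them with a singular value decomposition. scikit-learn's `PCA` also centers the data and is SVD-based, but the signs of its components may differ from this sketch. The helper `pca_svd` and the random toy data are hypothetical, not Iris values.

```python
import numpy as np

def pca_svd(X, n_components):
    """Return (scores, explained_variance_ratio) for the first n_components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center every feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # project onto the top axes
    variance = S ** 2 / (len(X) - 1)               # variance along each axis
    return scores, variance[:n_components] / variance.sum()

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 4)) * [3.0, 2.0, 1.0, 0.5]   # toy data, not Iris
scores, ratio = pca_svd(X_demo, n_components=3)
print(scores.shape)   # (150, 3)
print(ratio)          # shares of total variance, largest first
```

The new coordinates are uncorrelated, and their variances come out in decreasing order. Plotting the first three therefore shows as much of the spread as any 3D linear view can.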


In [None]:
iris.columns

In [None]:
# A helper function so we don't repeat the same plotting code for every measurement
%matplotlib inline
def plot_violin(y2, i):
    plt.subplot(2, 2, i)  # select panel i of a 2x2 grid
    sns.violinplot(x='Species', y=y2, data=iris, saturation=1, bw='scott', scale='area')
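
The `bw='scott'` argument picks the kernel bandwidth of each violin's density estimate with Scott's rule. In one dimension the rule's scale factor is n^(-1/5). The kernel's standard deviation is that factor times the data's standard deviation, as in `scipy.stats.gaussian_kde`, which seaborn uses for its KDE as far as we know. Below is a minimal sketch under that assumption; `scott_bandwidth` is a hypothetical helper.

```python
import numpy as np

def scott_bandwidth(values):
    """Kernel standard deviation chosen by Scott's rule for 1-D data."""
    x = np.asarray(values, dtype=float)
    n, d = x.size, 1
    factor = n ** (-1.0 / (d + 4))     # Scott's factor: n^(-1/5) in one dimension
    return factor * x.std(ddof=1)      # scaled by the sample standard deviation

# Each species has 50 flowers, so every violin uses the same factor
print(round(50 ** (-1 / 5), 3))  # 0.457
```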

In [None]:

plt.figure(figsize=(17, 12))
measurements = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
for i, measurement in enumerate(measurements, start=1):
    plot_violin(measurement, i)  # plot_violin selects subplot i itself
    sns.despine(offset=10, trim=True)


From the violin plots above we can see that the setosa measurements are densely concentrated, especially sepal length, petal length and petal width. The medians and interquartile ranges of the petal measurements are also easy to tell apart between species, although the virginica values are more spread out.

In [None]:
iris.head()

In [None]:
iris.columns

In [None]:
sns.pairplot(iris, hue = 'Species', vars = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'], palette = 'Set1' );

From the plots above we can see that the three species can largely be told apart by their petal and sepal measurements, so a machine-learning model should be able to learn to separate them.

Let's also produce a heatmap to examine the correlations between the measurements.

In [None]:
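# Select the four measurements explicitly: iloc[:, :4] would include Id and drop PetalWidthCm
iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].corr()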

In [None]:
measurements = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
fig, axes = plt.subplots(figsize=(6, 6))
sns.heatmap(iris[measurements].corr(), annot=True, cbar=False, ax=axes)
axes.tick_params(labelrotation=45)
plt.title('Correlation heatmap', fontsize=15);

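Sepal width is negatively correlated with both petal measurements. The strongest of these coefficients, -0.420516, is between sepal width and petal length; the one with petal width is slightly weaker. A coefficient of this size indicates a weak-to-moderate tendency: flowers with wider sepals tend to have smaller petals. Much of this tendency comes from differences between species, since setosa flowers have wider sepals but much smaller petals than the other two species.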
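
The Pearson coefficient shown in the heatmap is the covariance of two variables divided by the product of their standard deviations. The minimal NumPy sketch below computes it by hand. The numbers are toy values, not Iris measurements, and `pearson_r` is a hypothetical helper. The toy groups also show how pooling two groups can produce a negative coefficient even when each group on its own is positively correlated.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

# Toy data: two groups, each perfectly positively correlated on its own
group_a_x, group_a_y = [1, 2, 3], [7, 8, 9]
group_b_x, group_b_y = [5, 6, 7], [1, 2, 3]
print(pearson_r(group_a_x, group_a_y))   # 1.0
print(pearson_r(group_b_x, group_b_y))   # 1.0
print(round(pearson_r(group_a_x + group_b_x, group_a_y + group_b_y), 3))  # -0.794
```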

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plot of sepal width vs petal width, colored by species
plt.figure(figsize=(8, 6))
sns.scatterplot(data=iris, x='SepalWidthCm', y='PetalWidthCm', hue='Species')
plt.title('Sepal Width vs Petal Width')
plt.xlabel('Sepal Width (cm)')
plt.ylabel('Petal Width (cm)')
plt.show()


# **Training The ML model**

Let's use and test the k-Nearest Neighbors (KNN) model. Since our data does not seem noisy, we can choose a small value of k; we will set k to 3.

Although we noticed a high correlation between the petal width and petal length measurements, we will use all the available measurements for now, and later check which combination gives better accuracy.

Keep in mind that KNN computes the Euclidean distance between the point we want to predict and the training points, and predicts from the nearest ones (the neighbors). For this reason, scaling (normalizing) the data before applying the algorithm is usually a good idea. In our case all the measurements share the same unit (cm), so we skip scaling here. Note, though, that features with a wider range of values (such as petal length) still carry more weight in the distance.
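
The procedure fits in a few lines of code. Below is a minimal pure-Python sketch of k-nearest-neighbors classification with Euclidean distance and a majority vote. scikit-learn's `KNeighborsClassifier` also uses Euclidean distance by default (Minkowski metric with p=2), but its tie-breaking and neighbor search differ from this sketch. `knn_predict` is a hypothetical helper, and the training rows are just a few example measurements.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point, closest first
    neighbors = sorted(
        ((math.dist(query, x), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    # Vote among the k closest; ties go to the label that appears first
    # among the sorted neighbors
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

train_X = [(5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2),   # setosa
           (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5),   # versicolor
           (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9)]   # virginica
train_y = ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2
print(knn_predict(train_X, train_y, (5.0, 3.4, 1.5, 0.2), k=3))  # Iris-setosa
```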

Let's call train_test_split to split our data.

In [None]:
from sklearn.model_selection import train_test_split


In [None]:
iris.columns

In [None]:
x = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]
y = iris['Species']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state = 0)
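
When neither test_size nor train_size is given, train_test_split holds out 25% of the rows and rounds the test count up. Our 150 flowers therefore become 112 training and 38 test samples. Below is a minimal pure-Python sketch of the same idea. `simple_train_test_split` is a hypothetical helper, and because it shuffles with Python's random module it will not reproduce scikit-learn's exact split for random_state=0.

```python
import math
import random

def simple_train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle row positions, then hold out ceil(test_size * n) rows for testing."""
    n = len(X)
    n_test = math.ceil(test_size * n)          # the test count is rounded up
    order = list(range(n))
    random.Random(seed).shuffle(order)         # reproducible shuffle
    test_idx, train_idx = order[:n_test], order[n_test:]

    def take(seq, idx):
        return [seq[i] for i in idx]

    return take(X, train_idx), take(X, test_idx), take(y, train_idx), take(y, test_idx)

rows = list(range(150))
X_tr, X_te, y_tr, y_te = simple_train_test_split(rows, rows)
print(len(X_tr), len(X_te))  # 112 38
```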

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=3)

In [None]:
knn.fit(X_train, y_train)

# **Evaluating The Model**

In [None]:
y_pred = knn.predict(X_test)

In [None]:
y_pred

Let's calculate the accuracy with knn.score()

In [None]:
print(f'Our model accuracy with k=3 is: {knn.score(X_test, y_test)}')

In [None]:
def print_confusion_matrix(confusion_matrix, class_names, figsize=(9, 7), fontsize=14):
    df_cm = pd.DataFrame(confusion_matrix, index=class_names, columns=class_names)
    fig = plt.figure(figsize=figsize)
    try:
        heatmap = sns.heatmap(df_cm, annot=True, fmt="d", cbar=False)
    except ValueError:
        raise ValueError("Confusion matrix values must be integers.")
    heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right', fontsize=fontsize)
    heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right', fontsize=fontsize)
    plt.ylabel('True label', fontsize=12)
    plt.xlabel('Predicted label', fontsize=12)
    plt.title('Confusion Matrix', fontsize=16)
    plt.show()


In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
print(classification_report(y_test, y_pred))

In [None]:
# confusion_matrix orders rows/columns by sorted label, the same order as knn.classes_
print_confusion_matrix(confusion_matrix(y_test, y_pred, labels=knn.classes_), knn.classes_)
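
The rows and columns of `confusion_matrix` follow the sorted label order unless `labels=` is passed. Any list of class names used for the axis labels therefore has to follow that same order. The minimal pure-Python sketch below builds the matrix the same way and reads accuracy off its diagonal; accuracy is the quantity that `knn.score()` reports for a classifier. `confusion_counts` and the four toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, labels=None):
    """Count (true, predicted) pairs; rows are true labels, columns predicted."""
    if labels is None:
        # Default order: sorted union of all labels seen
        labels = sorted(set(y_true) | set(y_pred))
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[position[true_label]][position[pred_label]] += 1
    return labels, matrix

labels, matrix = confusion_counts(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-versicolor"],
    ["Iris-setosa", "Iris-virginica", "Iris-virginica", "Iris-versicolor"],
)
accuracy = sum(matrix[i][i] for i in range(len(labels))) / sum(map(sum, matrix))
print(labels)    # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(matrix)    # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
print(accuracy)  # 0.75
```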

In [None]:
print(classification_report(y_test, y_pred))

Based on the provided precision, recall, f1-score, and support values, we can draw the following insights from the table:

**1. Accuracy:** The overall accuracy of the model is 0.97: it predicted the correct class for 97% of the samples in the test set (37 of the 38 test flowers).

**2. Class-wise Performance:**
   - Iris-setosa: The model achieved a perfect precision, recall and f1-score of 1.00 for the Iris-setosa class. Every sample predicted as Iris-setosa was actually Iris-setosa, and every Iris-setosa sample in the test set was identified.
   - Iris-versicolor: The model achieved a precision of 1.00, indicating that all the samples predicted as Iris-versicolor were correct. The recall of 0.94 suggests that the model correctly identified 94% of the Iris-versicolor samples. The f1-score of 0.97 indicates a good balance between precision and recall for this class.
   - Iris-virginica: The model achieved a precision of 0.90, indicating that 90% of the samples predicted as Iris-virginica were correct. The recall of 1.00 suggests that the model correctly identified all the Iris-virginica samples. The f1-score of 0.95 indicates a good overall performance for this class.

**3. Macro Average:** The macro average is the unweighted mean over the three classes. It comes to about 0.97 for precision and f1-score and about 0.98 for recall, indicating high performance across all classes.

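**4. Weighted Average:** The weighted average weights each class by its support (its number of test samples). It comes to about 0.98 for precision and 0.97 for recall and f1-score. Weighted precision is slightly higher than macro precision because the only class with imperfect precision, Iris-virginica, has the smallest support.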

*1. Precision is the fraction of predictions for a class that were correct: Precision = TP/(TP + FP). For setosa and versicolor the KNN achieved perfect precision, while for virginica it achieved 90%: of all the flowers labeled virginica, 90% really were virginica. More precisely, the model predicted 10 flowers as virginica, of which 9 were correct (TP) and 1 was wrong (FP).*

**Precision_virginica = 9/(9 + 1) = 0.9**

*2. Recall is the fraction of the actual members of a class that were predicted correctly: Recall = TP/(TP + FN). For versicolor the recall was 94%: the model correctly predicted 15 versicolor flowers (TP), while 1 versicolor flower was incorrectly assigned to virginica (FN).*

**Recall_versicolor = 15/(15 + 1) = 0.94.**
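
These counts are enough to reproduce the whole report. The minimal pure-Python sketch below computes per-class precision, recall and f1 (the harmonic mean 2PR/(P + R)), plus the macro and weighted averages. The helper name is hypothetical. The setosa count of 13 test samples is an assumption derived from the default 25% test split of 150 flowers (38 samples) minus the 16 versicolor and 9 virginica samples implied above.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and f1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# (TP, FP, FN, support) per class on the test set.
# ASSUMPTION: setosa's support of 13 = 38 test rows - 16 versicolor - 9 virginica.
counts = {
    "Iris-setosa": (13, 0, 0, 13),
    "Iris-versicolor": (15, 0, 1, 16),
    "Iris-virginica": (9, 1, 0, 9),
}
scores = {name: precision_recall_f1(tp, fp, fn) for name, (tp, fp, fn, _) in counts.items()}
total = sum(c[3] for c in counts.values())

# Macro average: plain mean over classes. Weighted average: mean weighted by support.
macro = [sum(s[i] for s in scores.values()) / len(scores) for i in range(3)]
weighted = [sum(scores[name][i] * counts[name][3] for name in counts) / total for i in range(3)]

for name, (p, r, f) in scores.items():
    print(f"{name:16s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print("macro avg    ", [round(v, 2) for v in macro])     # [0.97, 0.98, 0.97]
print("weighted avg ", [round(v, 2) for v in weighted])  # [0.98, 0.97, 0.97]
```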


Overall, the model shows high accuracy on the test set and performs well on all three classes of the iris dataset. It is perfect for Iris-setosa and near-perfect for Iris-versicolor, while keeping a good balance between precision and recall for Iris-virginica.

In [None]:
iris.columns

In [None]:
fig = plt.figure(figsize=(15, 7))
# Use the same class order in both panels so each species keeps the same color
species_order = list(knn.classes_)

ax1 = fig.add_subplot(1, 2, 1)
sns.scatterplot(x=X_test['PetalLengthCm'], y=X_test['PetalWidthCm'], hue=y_pred,
                hue_order=species_order, alpha=0.5, ax=ax1)
ax1.set_title('Predicted')
ax1.legend(title='Species')

ax2 = fig.add_subplot(1, 2, 2)
sns.scatterplot(x=X_test['PetalLengthCm'], y=X_test['PetalWidthCm'], hue=y_test,
                hue_order=species_order, alpha=0.5, ax=ax2)
ax2.set_title('Actual');

# **Thanks for reading this far.**