# Iris Classification with Random Forest

This notebook demonstrates a classic machine-learning workflow on the Iris flower dataset. We'll perform exploratory data analysis, train a Random Forest classifier and evaluate its performance.


In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Set plots to display inline
%matplotlib inline


In [None]:
# Load the Iris dataset and create a DataFrame
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
df['target_name'] = df['target'].map({i: name for i, name in enumerate(iris.target_names)})

df.head()
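
Before plotting, it can help to confirm the basic shape of the data. The sketch below is a minimal check, not part of the original workflow: it rebuilds the DataFrame so it runs on its own, then counts the samples per class and prints per-class feature means. The name `class_counts` is introduced here for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target_name'] = pd.Series(iris.target).map(dict(enumerate(iris.target_names)))

# Number of rows and columns
print(df.shape)

# Samples per class; a balanced dataset has equal counts
class_counts = df['target_name'].value_counts()
print(class_counts)

# Per-class feature means hint at which features separate the species
print(df.groupby('target_name').mean())
```

Equal class counts mean plain accuracy is a reasonable headline metric for this dataset; with imbalanced classes, per-class precision and recall would matter more.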


## Visualize the data

A pair plot draws a scatter plot for every pair of features, colored by class, which shows at a glance which features separate the species.


In [None]:
# Pairwise scatter plots
sns.pairplot(df, hue='target_name', vars=iris.feature_names)
plt.show()


In [None]:
# Split the data into training and test sets
X = df[iris.feature_names]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Train a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test set
preds = clf.predict(X_test)

# Evaluate accuracy
print(f'Accuracy: {accuracy_score(y_test, preds):.3f}')
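
The `stratify=y` argument in the split above keeps the class proportions the same in the training and test sets. A minimal sketch to verify this, reusing the same `test_size` and `random_state`; the names `train_counts` and `test_counts` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Same split settings as the training cell
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Count how many samples of each class landed in each subset
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print('Train samples per class:', train_counts)
print('Test samples per class: ', test_counts)
```

With 150 balanced samples and a 30% test fraction, each class should contribute 35 training samples and 15 test samples. Without `stratify`, the counts would depend on the random shuffle.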


In [None]:
# Confusion matrix
cm = confusion_matrix(y_test, preds)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

# Detailed classification report
print(classification_report(y_test, preds, target_names=iris.target_names))
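
The confusion matrix and the classification report describe the same predictions. As a sketch of how they relate: per-class recall is the diagonal of the matrix divided by its row sums (rows are true labels), and per-class precision is the diagonal divided by its column sums (columns are predicted labels). The block below retrains the same model so it is self-contained; `recall_from_cm` and `precision_from_cm` are names introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Reproduce the split and model from the cells above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# Rows of cm are true classes, columns are predicted classes
cm = confusion_matrix(y_test, preds)

# Recall: correct predictions divided by all true members of the class
recall_from_cm = np.diag(cm) / cm.sum(axis=1)

# Precision: correct predictions divided by all predictions of the class
# (assumes every class is predicted at least once, so no column sum is zero)
precision_from_cm = np.diag(cm) / cm.sum(axis=0)

print('Recall per class:   ', recall_from_cm.round(3))
print('Precision per class:', precision_from_cm.round(3))
```

These values should match the `recall` and `precision` columns that `classification_report` prints for each species.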


## Conclusion

The Random Forest classifier achieves high accuracy on the held-out 30% test split of the Iris dataset. Although this is a relatively easy problem, the notebook demonstrates the essential steps of a classification workflow: data loading, exploration, model training, evaluation, and visualization.
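
A single train/test split gives only one estimate of accuracy, and on a 150-sample dataset that estimate can shift with the split. One optional way to check the result, offered as a sketch and not as part of the original workflow, is stratified 5-fold cross-validation. The fold count and the names `cv` and `scores` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Same model settings as the notebook
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Five stratified folds; shuffling with a fixed seed keeps the run repeatable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One accuracy score per fold
scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
print(f'Fold accuracies: {scores.round(3)}')
print(f'Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})')
```

The spread across folds gives a rough sense of how much the single-split accuracy might vary.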
