## Classification algorithms
In this analysis, I use the Iris dataset to build a classification model that predicts a flower's species from its four physical measurements (sepal length, sepal width, petal length, and petal width). I train a Random Forest Classifier and evaluate it with a classification report and a confusion matrix.

In [29]:
# Importing necessary libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix


In [31]:
# Load the Iris dataset
iris = load_iris()

# Convert it to a pandas DataFrame for easier manipulation
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Add the target (species) to the DataFrame
df['species'] = iris.target

# Show the first few rows of the dataset
df.head()


|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [5]:
# Define features (X) and target variable (y)
X = df[iris.feature_names]  # Features: Sepal length, Sepal width, Petal length, Petal width
y = df['species']  # Target: Species of the flower


In [7]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
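
The split above is purely random, so the test set does not have to contain equal numbers of each species (the classification report shows supports of 19, 13, and 13). As a minimal sketch of an alternative, assuming equal class shares in the test set are wanted, `train_test_split` accepts a `stratify` argument. The names `X_demo`, `y_demo`, and `y_test_strat` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Reload the data so this sketch runs on its own
X_demo, y_demo = load_iris(return_X_y=True)

# Same test size and seed as above, plus stratify=y_demo so that each
# species keeps its one-third share in both the training and test sets
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Count how many test samples fall into each class (0, 1, 2)
test_counts = np.bincount(y_test_strat)
print(test_counts)
```

Because Iris has 50 samples per species, a stratified 30% test set holds 15 samples of each class. Switching the notebook to this split would change the reported numbers, so the cells here keep the original unstratified split.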


In [9]:
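# Standardize the features (zero mean, unit variance per column).
# Fit the scaler on the training set only, then reuse the same
# transformation on the test set so no test statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)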
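
The scale-then-train sequence can also be bundled into a single estimator. The following is a minimal sketch, assuming scikit-learn's `make_pipeline` helper is acceptable here; it is not how the notebook itself is organized, and the names `pipe`, `X_tr`, `X_te`, `y_tr`, and `y_te` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Reload and split the data so this sketch runs on its own
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The pipeline fits the scaler on the training data only and applies
# the same transformation automatically whenever it predicts or scores
pipe = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
pipe.fit(X_tr, y_tr)

pipe_accuracy = pipe.score(X_te, y_te)
print(f"Pipeline test accuracy: {pipe_accuracy:.2f}")
```

Tree-based models such as Random Forest split on thresholds and are largely insensitive to feature scaling, so the scaler matters mainly if the classifier is later swapped for a scale-sensitive one.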


In [11]:
# Create and train the Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
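
A fitted Random Forest exposes `feature_importances_`, which gives a rough view of which measurements drive the splits. The sketch below is illustrative only: it assumes a fresh model fitted on the full, unscaled dataset rather than the notebook's `clf`, and the names `iris_demo`, `forest`, and `importances` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Reload the data so this sketch runs on its own
iris_demo = load_iris()

# Assumption: fit on all 150 samples, purely to inspect importances
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(iris_demo.data, iris_demo.target)

# Pair each importance with its feature name and sort, largest first
importances = pd.Series(
    forest.feature_importances_, index=iris_demo.feature_names
).sort_values(ascending=False)
print(importances)
```

The importances are impurity-based and normalized to sum to 1, so they are best read as a relative ranking rather than as exact effect sizes.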


In [13]:
# Make predictions on the test set
y_pred = clf.predict(X_test)


In [15]:
# Take the first five (already standardized) test rows as a sample
X_sample = X_test[:5]
print("Sample Features (X_test):")
print(X_sample)

Sample Features (X_test):
[[ 0.3100623  -0.50256349  0.484213   -0.05282593]
 [-0.17225683  1.89603497 -1.26695916 -1.27039917]
 [ 2.23933883 -0.98228318  1.76840592  1.43531914]
 [ 0.18948252 -0.26270364  0.36746819  0.35303182]
 [ 1.15412078 -0.50256349  0.54258541  0.2177459 ]]


In [17]:
# Predict the class index (0, 1, or 2) for each sample row
y_sample_pred = clf.predict(X_sample)
print("\nPredictions (Species):")
print(y_sample_pred)


Predictions (Species):
[1 0 2 1 1]


In [19]:
# Map the predicted class indices to species names
species_names = iris.target_names
print("\nPredicted Species (names):")
print(species_names[y_sample_pred])



Predicted Species (names):
['versicolor' 'setosa' 'virginica' 'versicolor' 'versicolor']


In [21]:
# Evaluate the model
print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))


Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

Confusion Matrix:
[[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]
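
A perfect score on a single 45-sample test set depends on which rows happened to land in that set. As a hedged sketch of a sturdier check, assuming 5-fold stratified cross-validation is an acceptable measure, the same model can be scored on several different held-out folds. The names `cv` and `scores` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Reload the data so this sketch runs on its own
X_demo, y_demo = load_iris(return_X_y=True)

# Five stratified folds: each fold holds out 30 samples, 10 per species
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Train on four folds and score accuracy on the fifth, for every fold
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_demo,
    y_demo,
    cv=cv,
)
print(f"Accuracy per fold: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The mean and spread across folds give a more stable estimate than one split does.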


### Conclusions
This classification analysis using the Iris dataset allowed me to apply a complete machine learning workflow: loading and inspecting the data, preprocessing, training a model, and evaluating its performance. The Random Forest Classifier classified all 45 test samples correctly, with precision, recall, and F1-score of 1.00 for every species, which demonstrates the effectiveness of this algorithm for well-structured and clearly separated data.