# Iris Classification with Logistic Regression

This notebook trains a classification model on the Iris dataset using `LogisticRegression` from Scikit-learn. The Iris dataset contains 150 samples of iris flowers (50 per class) with 4 features (sepal length, sepal width, petal length, petal width, all in centimetres) and 3 classes (Setosa, Versicolor, Virginica). The goal is to predict the species of an iris flower from its features.

In [3]:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

## Loading and Preparing the Dataset

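The Iris dataset is loaded from Scikit-learn and split into 80% training (120 samples) and 20% testing (30 samples) sets, so the model is evaluated on data it did not see during training. A fixed `random_state` makes the split reproducible.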

In [6]:
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target (species: 0=Setosa, 1=Versicolor, 2=Virginica)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display dataset shapes
print("Training set shape:", X_train.shape)
print("Testing set shape:", X_test.shape)

Training set shape: (120, 4)
Testing set shape: (30, 4)
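
The split above is purely random, so the three species are not guaranteed to appear in equal numbers in the 30-sample test set. As a minimal sketch of one alternative (not something the notebook itself does), passing `stratify=y` to `train_test_split` keeps the class proportions the same in both subsets. The names `X_train_s`, `train_counts`, and so on are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# stratify=y preserves the per-class proportions (50/50/50) in both subsets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each class ended up in each subset
train_counts = np.bincount(y_train_s)
test_counts = np.bincount(y_test_s)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

With 50 samples per class and a 20% test size, a stratified split places 40 samples of each species in the training set and 10 in the test set.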


## Model Training

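A Logistic Regression model is trained on the training set. Scikit-learn's `LogisticRegression` accepts multi-class targets directly, so the three species labels can be passed as-is. `max_iter` is raised from its default of 100 to 200 to give the solver more room to converge on the unscaled features.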

In [11]:
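# Initialize and train the model
# Note: random_state only affects the 'sag', 'saga' and 'liblinear' solvers;
# with the default solver it is accepted but has no effect on the result.
model = LogisticRegression(max_iter=200, random_state=42)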
model.fit(X_train, y_train)
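
To make the fitted model less of a black box, the sketch below rebuilds the same split and model and inspects two attributes of the fitted estimator: `coef_`, which holds one weight vector per class, and `predict_proba`, which returns per-class probabilities for each sample. The variable names `proba` and `top_class` are illustrative, and the snippet is self-contained rather than a continuation of the cell above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# One row of weights per class, one column per feature
print("Coefficient matrix shape:", model.coef_.shape)

# Class probabilities for the first three test samples; each row sums to 1
proba = model.predict_proba(X_test[:3])
print(np.round(proba, 3))

# The predicted class is the one with the highest probability
top_class = model.classes_[proba.argmax(axis=1)]
print("Most probable classes:", iris.target_names[top_class])
```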

## Evaluation

The model is evaluated on the test set using accuracy score, which measures the proportion of correctly predicted instances.

In [14]:
# Make predictions on the test set
predictions = model.predict(X_test)

# Calculate and display accuracy
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

# Display sample predictions
for features, true_label, pred_label in zip(X_test[:5], y_test[:5], predictions[:5]):
    print(f"Features: {features}, True Label: {iris.target_names[true_label]}, Predicted: {iris.target_names[pred_label]}")

Accuracy: 1.0
Features: [6.1 2.8 4.7 1.2], True Label: versicolor, Predicted: versicolor
Features: [5.7 3.8 1.7 0.3], True Label: setosa, Predicted: setosa
Features: [7.7 2.6 6.9 2.3], True Label: virginica, Predicted: virginica
Features: [6.  2.9 4.5 1.5], True Label: versicolor, Predicted: versicolor
Features: [6.8 2.8 4.8 1.4], True Label: versicolor, Predicted: versicolor
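
Accuracy on a 30-sample test set is a coarse measure, so it can help to look at it from two more angles. The sketch below, which is an addition and not part of the original notebook run, recomputes accuracy by hand as the fraction of matching predictions, builds a confusion matrix to show which classes (if any) are confused with each other, and runs 5-fold cross-validation on the full dataset for an estimate that depends less on one particular split. The names `manual_accuracy`, `cm`, and `cv_scores` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Accuracy is the fraction of test samples whose prediction equals the label
manual_accuracy = np.mean(predictions == y_test)
library_accuracy = accuracy_score(y_test, predictions)
print("Manual accuracy:", manual_accuracy)
print("accuracy_score: ", library_accuracy)

# Rows are true classes, columns are predicted classes (order: 0, 1, 2)
cm = confusion_matrix(y_test, predictions, labels=[0, 1, 2])
print(cm)

# 5-fold cross-validation: five train/test splits instead of one
cv_scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)
print("CV accuracy per fold:", np.round(cv_scores, 3))
print("CV mean and std:", cv_scores.mean(), cv_scores.std())
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the number of test samples equals the accuracy.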


## Conclusion

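This notebook built and evaluated a Logistic Regression model for the Iris dataset, reaching an accuracy of 1.0 on the 30-sample test split used here. With so few test samples the score depends on the particular split, so a different `random_state` may give a slightly lower value. Key learnings include: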
- Loading and splitting datasets with Scikit-learn.
- Training a multi-class classification model.
- Evaluating model performance with accuracy.

Future improvements could involve:
- Experimenting with other algorithms like Random Forest or SVM.
- Adding feature scaling (for example with `StandardScaler`), which helps the solver converge and matters for scale-sensitive models such as SVM.
- Visualizing decision boundaries.
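
As a rough sketch of the first two ideas, the snippet below compares three candidate models with 5-fold cross-validation. Scaling is done inside a pipeline so that the scaler is fitted only on each training fold. The choice of candidates, their hyperparameters, and the names `candidates` and `mean_scores` are assumptions made for illustration, not a recommendation from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical candidate set: scaled linear model, scaled SVM, and a tree ensemble
candidates = {
    "logreg_scaled": make_pipeline(StandardScaler(), LogisticRegression(max_iter=200)),
    "svm_scaled": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean 5-fold cross-validated accuracy for each candidate
mean_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Random Forest is left unscaled here because tree-based models do not depend on feature scale.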

Next steps include applying similar techniques to NLP tasks like sentiment analysis.