**Title of Project: Iris Flower Classification Using Logistic Regression**

**1. Objective**
To build a machine learning model that classifies iris flowers into three species (setosa, versicolor, virginica) based on their features (sepal length, sepal width, petal length, petal width).

**2. Data Source**
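The Iris dataset, originally published in the UCI Machine Learning Repository. This notebook loads the copy bundled with scikit-learn (`sklearn.datasets.load_iris`), which contains 150 samples, 50 per species.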


**3. Import Libraries**

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


**4. Import Data**

In [None]:
from sklearn.datasets import load_iris

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Map target numbers to species names
species_map = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
df['species'] = df['species'].map(species_map)

print(df.head())


**5. Describe Data**

In [None]:
# Display the dataset dimensions and column types
# (the first rows were already printed in the previous cell)
print(df.shape)
print(df.dtypes)

# Display summary statistics
print(df.describe())

# Check for missing values
print(df.isnull().sum())
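Summary statistics do not show whether the classes are balanced, which matters when accuracy is the headline metric. A minimal sketch of that check follows; it reloads the data so it runs on its own, and the names `iris_data`, `iris_df`, and `class_counts` are illustrative rather than part of the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Reload the data so this cell runs independently of the cells above
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# Translate the integer targets (0, 1, 2) into species names
iris_df['species'] = iris_data.target_names[iris_data.target]

# Count how many samples belong to each species
class_counts = iris_df['species'].value_counts()
print(class_counts)
```

If the counts are equal, plain accuracy is a reasonable first metric; with skewed classes, the per-class figures in the classification report would deserve more weight.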


**6. Data Visualisation**

In [None]:
# Pair plot
sns.pairplot(df, hue='species')
plt.show()

# Histograms
df.hist(edgecolor='black', linewidth=1.2, figsize=(12, 8))
plt.show()

# Box plots
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False, figsize=(12, 8))
plt.show()

# Heatmap of correlations (excluding the species column)
plt.figure(figsize=(10, 6))
sns.heatmap(df.drop('species', axis=1).corr(), annot=True, cmap='coolwarm')
plt.show()
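The plots can be complemented with a numeric view of how the species differ. The sketch below computes per-species feature means; it is an assumed addition rather than part of the original analysis, and `iris_df` and `species_means` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Reload the data so this cell runs independently of the cells above
iris_data = load_iris()
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['species'] = iris_data.target_names[iris_data.target]

# Mean of each measurement within each species
species_means = iris_df.groupby('species').mean()
print(species_means.round(2))
```

Features whose means differ strongly between species, such as the petal measurements, are the ones the pair plot shows separating the classes most clearly.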


**7. Data Preprocessing**

In [9]:
# Feature variables
X = df.drop('species', axis=1)

# Target variable
y = df['species']

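# Standardize the features (zero mean, unit variance per column).
# Note: the scaler is fit on the full dataset here, so test-set statistics
# influence the scaling. Fitting it on the training split only is stricter.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)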


**8. Define Target Variable (y) and Feature Variables (X)**
This step was handled in the Data Preprocessing section: `X` holds the four measurements and `y` holds the species labels.


**9. Train Test Split**

In [10]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
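The cell above splits data that was already standardized using the full dataset. A stricter variant, sketched here under the assumption that the goal is an unbiased test estimate, splits first and fits the scaler on the training rows only. It also passes `stratify` so each species keeps the same share in both splits. All variable names in this block are illustrative, and the target here is the integer class code rather than the species name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load features and integer targets as pandas objects
X_raw, y_raw = load_iris(return_X_y=True, as_frame=True)

# Split BEFORE scaling; stratify keeps the class proportions in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42, stratify=y_raw
)

# Fit the scaler on the training rows only, then reuse it for the test rows
leakfree_scaler = StandardScaler()
X_tr_scaled = leakfree_scaler.fit_transform(X_tr)
X_te_scaled = leakfree_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

On a dataset as small and clean as Iris the difference in reported accuracy is likely to be minor, but the ordering avoids a habit that can inflate results on other data.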


**10. Modelling**

In [None]:
# Initialize the Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)
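A single train/test split gives one accuracy figure that depends on which 30 samples land in the test set. One possible way to get a steadier estimate is to wrap scaling and the model in a pipeline and cross-validate it. This is a sketch, not the notebook's method; `pipeline`, `cv`, and `cv_scores` are assumed names, and `max_iter=200` is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_cv, y_cv = load_iris(return_X_y=True)

# The pipeline refits the scaler inside each fold, so no fold's
# held-out rows contribute to the scaling statistics
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# Five stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pipeline, X_cv, y_cv, cv=cv, scoring='accuracy')

print("Fold accuracies:", cv_scores.round(3))
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```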


**11. Model Evaluation**

In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Classification Report:\n", classification_rep)
print("Confusion Matrix:\n", confusion_mat)
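The raw confusion matrix prints without row or column labels, which makes it easy to misread. The sketch below rebuilds the same kind of model on its own and wraps the matrix in a labelled DataFrame. It mirrors the notebook's preprocessing order (scale, then split), and the names `clf` and `cm_df` are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris_data = load_iris()
species_names = iris_data.target_names[iris_data.target]

# Same steps as the notebook: scale, split 80/20, fit
X_std = StandardScaler().fit_transform(iris_data.data)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_std, species_names, test_size=0.2, random_state=42
)
clf = LogisticRegression().fit(X_tr, y_tr)

# Fix the label order explicitly so rows and columns line up with the names
cm = confusion_matrix(y_te, clf.predict(X_te), labels=clf.classes_)
cm_df = pd.DataFrame(
    cm,
    index=[f"true {name}" for name in clf.classes_],
    columns=[f"pred {name}" for name in clf.classes_],
)
print(cm_df)
```

Rows are the true species and columns the predicted species, so off-diagonal entries show which pairs of species the model confuses.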


**12. Prediction**

In [None]:
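# Example: predicting the species for a new data point.
# Use a DataFrame with the training column names so the scaler,
# which was fit on a DataFrame, does not warn about missing feature names.
new_data = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)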
new_data_scaled = scaler.transform(new_data)
prediction = model.predict(new_data_scaled)
predicted_species = prediction[0]
print(f"Predicted species: {predicted_species}")
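`predict` returns only the most likely label. When the confidence of a prediction matters, `predict_proba` returns one probability per class. The following is a self-contained sketch that fits on the full dataset purely for illustration; `new_point` and `proba_by_species` are assumed names.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris_data = load_iris(as_frame=True)
X_all = iris_data.data
y_all = iris_data.target_names[iris_data.target.to_numpy()]

# Fit the scaler and model on all rows (illustration only, no test split)
demo_scaler = StandardScaler().fit(X_all)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_all), y_all)

# Same example measurements as in the cell above
new_point = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X_all.columns)
proba = demo_model.predict_proba(demo_scaler.transform(new_point))[0]

# Pair each probability with its class name; the order follows classes_
proba_by_species = dict(zip(demo_model.classes_, proba))
for name, p in proba_by_species.items():
    print(f"{name}: {p:.3f}")
```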


**13. Explanation**

In [None]:
explanation = f"""
The Logistic Regression model achieved an accuracy of {accuracy:.2f} on the held-out test set (20% of the data, 30 samples).
The classification report and confusion matrix break this result down per species, showing precision, recall, and which species are confused with each other.
The main points of care in this project were data preprocessing and scaling of the feature variables before fitting the model.
Future improvements could include trying other classification algorithms, tuning hyperparameters such as the regularization strength C, and using cross-validation for a more stable estimate than a single split.
"""
print(explanation)
