# Iris Logistic Regression


## Step 1: Load the Dataset

We load the Iris dataset and display the first few rows to understand its structure.

In [4]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Read the dataset from a local CSV file (adjust the path to where Iris.csv lives)
iris_data = pd.read_csv('/Users/jv/desktop/Iris.csv')
# Show the first five rows
iris_data.head()

|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
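
If the CSV at that local path is not available, a DataFrame with the same layout can be rebuilt from the copy of the Iris data that ships with scikit-learn. This is a sketch under a few assumptions: it needs scikit-learn 0.23 or later (for `as_frame=True`), the helper name `load_iris_frame` is hypothetical, and the column renaming and added `Id` column only mimic the CSV's layout. Published copies of the Iris data may differ in a few values, so treat it as a close stand-in rather than an exact replacement.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Hypothetical helper: rebuild an Iris.csv-style DataFrame from scikit-learn's bundled data."""
    df = load_iris(as_frame=True).frame.rename(columns={
        'sepal length (cm)': 'SepalLengthCm',
        'sepal width (cm)': 'SepalWidthCm',
        'petal length (cm)': 'PetalLengthCm',
        'petal width (cm)': 'PetalWidthCm',
    })
    # scikit-learn stores species as integers 0/1/2; map them to the CSV's labels
    labels = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
    df['Species'] = df.pop('target').map(labels)
    # Iris.csv starts with a 1-based Id column; add one to match that layout
    df.insert(0, 'Id', list(range(1, len(df) + 1)))
    return df


iris_data = load_iris_frame()
print(iris_data.shape)                      # (150, 6)
print(iris_data['Species'].value_counts())  # 50 rows per species
print(iris_data.head())
```

Checking `value_counts()` also shows the class sizes (50 per species), which matters for the binary encoding in the next step: setosa makes up one third of the rows.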

## Step 2: Preprocessing

Encode the 'Species' column as a binary target: 'Iris-setosa' becomes 0 and the other two species become 1. Then split the dataset into independent variables (the four measurement features) and the dependent variable (the encoded target), and divide both into training (70%) and test (30%) sets.

In [2]:
from sklearn.model_selection import train_test_split

# Binary target: 0 for Iris-setosa, 1 for every other species
iris_data['Encoded_Species'] = (iris_data['Species'] != 'Iris-setosa').astype(int)
# Features: the four measurements (the Id column is not used as a feature)
X = iris_data[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]
y = iris_data['Encoded_Species']
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
X_train.head(), y_train.head()

(     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
 81             5.5           2.4            3.7           1.0
 133            6.3           2.8            5.1           1.5
 137            6.4           3.1            5.5           1.8
 75             6.6           3.0            4.4           1.4
 109            7.2           3.6            6.1           2.5,
 81     1
 133    1
 137    1
 75     1
 109    1
 Name: Encoded_Species, dtype: int64)
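
The encoding turns a three-class problem into a one-versus-rest binary problem, so the target is imbalanced 50 to 100. The minimal sketch below uses hypothetical placeholder features (only the row count matters) and labels with the dataset's 50-per-species counts to show the resulting class sizes, and why `test_size=0.3` yields 45 test rows (150 × 0.3), matching the 19 + 26 samples in the confusion matrix. The `stratify=y` variant is an alternative that the notebook does not use: it keeps the one-third setosa ratio in both splits, whereas the notebook's unstratified split happened to place 19 setosa rows in its 45-row test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Species column: 50 rows per species
species = pd.Series(
    ['Iris-setosa'] * 50 + ['Iris-versicolor'] * 50 + ['Iris-virginica'] * 50
)
# Same binary encoding as the notebook: setosa -> 0, everything else -> 1
y = (species != 'Iris-setosa').astype(int)
# Placeholder features; only the number of rows matters for the split sizes
X = np.zeros((len(y), 4))

# Plain split, as in the notebook
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Stratified alternative: the test set keeps the 1:2 ratio of class 0 to class 1
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(y.value_counts().sort_index().tolist())       # [50, 100]
print(len(y_te))                                    # 45
print(y_te_s.value_counts().sort_index().tolist())  # [15, 30]
```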

## Step 3: Logistic Regression Model

Fit a logistic regression model using the training data and make predictions on the test data.

In [7]:
# Initialize and fit the logistic regression model
log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = log_reg.predict(X_test)
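
Logistic regression computes a linear score z = w·x + b from the features and passes it through the sigmoid function 1 / (1 + e^(-z)) to get the probability of class 1 (here, not Iris-setosa); `predict` then returns class 1 wherever z > 0, that is, wherever that probability exceeds 0.5. The sketch below checks this relationship with scikit-learn's `coef_`, `intercept_`, `predict_proba`, and `decision_function`, on hypothetical two-cluster data rather than the Iris CSV. It applies to a binary target with default settings; multiclass models are formulated differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical data: two clusters of four-feature points standing in for the Iris measurements
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)), rng.normal(3.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegression(random_state=42).fit(X, y)

# Linear score z = w·x + b, then the sigmoid 1 / (1 + e^-z)
z = X @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# The second column of predict_proba is P(y = 1 | x)
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))

# predict() picks class 1 where the decision function (the score z) is positive
print(np.array_equal(model.predict(X), (model.decision_function(X) > 0).astype(int)))
```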

## Step 4: Evaluation

Generate a confusion matrix and calculate accuracy, precision, and recall to evaluate the model.

In [8]:
# Generate the confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)

# Calculate accuracy, precision, and recall
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print('Confusion matrix:')
print(conf_matrix)
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)

Confusion matrix:
[[19  0]
 [ 0 26]]
Accuracy: 1.0
Precision: 1.0
Recall: 1.0


### Confusion Matrix
- True Positives (TP): 26 (correctly predicted as not Iris-setosa)
- True Negatives (TN): 19 (correctly predicted as Iris-setosa)
- False Positives (FP): 0 (no Iris-setosa samples were misclassified as another species)
- False Negatives (FN): 0 (no samples of the other species were misclassified as Iris-setosa)
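
scikit-learn's `confusion_matrix` sorts the labels in ascending order and puts true labels on rows and predicted labels on columns, so with the 0/1 encoding used here the layout is [[TN, FP], [FN, TP]]. The toy labels below are hypothetical and chosen so that every cell is non-zero, which makes the layout visible; `ravel()` unpacks the four counts in that order.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = Iris-setosa, 1 = another species
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
# Rows: true class 0, 1. Columns: predicted class 0, 1.
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()

print(cm.tolist())     # [[2, 1], [1, 3]]
print(tn, fp, fn, tp)  # 2 1 1 3
```

Read the same way, the notebook's matrix [[19, 0], [0, 26]] gives TN = 19, FP = 0, FN = 0, TP = 26.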

### Evaluation Metrics
- Accuracy: 1.0 (all 45 test predictions were correct)
- Precision: 1.0 (every sample the model labeled as a species other than Iris-setosa really was one)
- Recall: 1.0 (the model identified every test sample belonging to a species other than Iris-setosa)
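
Each metric is a ratio of the confusion-matrix counts, with class 1 (not Iris-setosa) as the positive class: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). Plugging in the notebook's counts reproduces the three 1.0 scores; the second call uses hypothetical counts to show how a single error would move them. The helper `metrics_from_counts` is illustrative and not part of scikit-learn; it assumes non-zero denominators, a case that `precision_score` and `recall_score` handle through their `zero_division` parameter.

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Hypothetical helper: accuracy, precision, recall for the positive class.

    Assumes tp + fp > 0 and tp + fn > 0 (no zero-division handling).
    """
    total = tp + tn + fp + fn
    return {
        'accuracy': (tp + tn) / total,  # share of all predictions that are correct
        'precision': tp / (tp + fp),    # share of predicted positives that are truly positive
        'recall': tp / (tp + fn),       # share of true positives that were found
    }


# Counts from the notebook's test set
print(metrics_from_counts(tp=26, tn=19, fp=0, fn=0))
# {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0}

# Hypothetical contrast: one Iris-setosa sample misclassified as another species
print(metrics_from_counts(tp=26, tn=18, fp=1, fn=0))
```

In the contrast case, accuracy drops to 44/45 and precision to 26/27, while recall stays at 1.0 because no positive sample was missed.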

### Analysis
The model achieved perfect accuracy, precision, and recall on the 45-sample test set, separating Iris-setosa from the other two species without a single error. This is less surprising than it may look: Iris-setosa is linearly separable from Iris-versicolor and Iris-virginica, so a linear classifier such as logistic regression can place a decision boundary that makes no mistakes. The Iris dataset is also small, clean, and well-behaved, which often allows excellent model performance. In real-world scenarios, especially with more complex or noisy datasets, perfect scores are rare and deserve scrutiny: they can point to data leakage or an unrepresentative test split, while overfitting typically shows up as training performance that is clearly higher than test performance.
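
One way to follow up on the overfitting concern is to compare training and test accuracy and to repeat the evaluation over several folds instead of relying on a single 45-row test set. The sketch below assumes scikit-learn's bundled `load_iris` data in place of the CSV (setosa is class 0 there) and uses 5-fold stratified cross-validation, a choice made for illustration rather than something the notebook does. Similar training and test accuracy, together with consistently high fold scores, would suggest the perfect scores reflect how easy the setosa-versus-rest task is rather than overfitting.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X, target = load_iris(return_X_y=True)
# Same binary target as the notebook: setosa (class 0 in load_iris) -> 0, others -> 1
y = (target != 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression(random_state=42).fit(X_train, y_train)

# Overfitting would show up as training accuracy well above test accuracy
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# Five held-out estimates instead of one; shuffling is seeded for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LogisticRegression(random_state=42), X, y, cv=cv)

print('train accuracy:', train_acc)
print('test accuracy:', test_acc)
print('5-fold accuracies:', cv_scores)
```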