# Task 2: Logistic Regression with Preprocessing and Evaluation – CodTech Internship
This notebook walks through a complete ML workflow using **Logistic Regression** on the Breast Cancer Wisconsin dataset bundled with scikit-learn (569 samples, 30 numeric features; class 0 is malignant, class 1 is benign). We standardize the features, train a model, and evaluate it with accuracy, a classification report, and a confusion-matrix heatmap.

In [None]:
# 📦 Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler


In [None]:
# 📊 Load Breast Cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
target_names = data.target_names
X.head()

In [None]:
# ℹ️ Dataset Overview
print(f'Total samples: {X.shape[0]}')
print(f'Total features: {X.shape[1]}')
print(f'Target classes: {target_names.tolist()}')

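In [None]:
# ✂️ Train–Test Split (done before scaling so the test set stays unseen)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print('Training samples:', X_train.shape[0], '| Test samples:', X_test.shape[0])

In [None]:
# 🔍 Data Preprocessing – Standardization
# Fit the scaler on the training data only, then apply it to both subsets.
scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)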
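
The split in this notebook passes `stratify=y`, which keeps the malignant/benign ratio roughly the same in the training and test subsets. The following is a minimal, self-contained sketch for checking that; the names `X_tr`, `X_te`, `y_tr`, `y_te`, and `balance` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Same split settings as the notebook (80/20, fixed seed, stratified).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fraction of each class (0 = malignant, 1 = benign) in each subset.
balance = pd.DataFrame({
    'full': y.value_counts(normalize=True),
    'train': y_tr.value_counts(normalize=True),
    'test': y_te.value_counts(normalize=True),
}).sort_index()
balance.index = data.target_names  # label the rows by class name
print(balance.round(3))
```

If the three columns are close to each other, the stratification worked as intended; without `stratify`, a small test set can end up with a noticeably different class mix.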

In [None]:
# 🧠 Train Logistic Regression Model
model = LogisticRegression(max_iter=10000, random_state=42)
model.fit(X_train, y_train)
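
Because the features are standardized, the fitted coefficients are on a comparable scale and can be read as a rough indication of which features the model leans on most. The sketch below is one way to inspect them. It is self-contained and assumes the scaler is fit on the training rows only; `clf`, `coefs`, and `top` are illustrative names, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# Assumption: scale using statistics from the training rows only.
scaler = StandardScaler().fit(X_tr)
clf = LogisticRegression(max_iter=10000, random_state=42)
clf.fit(scaler.transform(X_tr), y_tr)

# For a binary problem coef_ has shape (1, n_features).
# Positive weights push toward class 1 (benign), negative toward class 0 (malignant).
coefs = pd.Series(clf.coef_[0], index=data.feature_names)

# Ten features with the largest absolute weight.
top = coefs.reindex(coefs.abs().sort_values(ascending=False).index).head(10)
print(top.round(3))
```

Several of the 30 features are strongly correlated with one another, so individual weights should be treated as a qualitative hint rather than a definitive ranking.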

In [None]:
# 📈 Predictions and Accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'🔹 Accuracy: {accuracy:.4f}')
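
A single train/test split yields one accuracy number, which can shift with the random seed. As a complementary check, one could wrap the scaler and the classifier in a scikit-learn `Pipeline` and cross-validate it, so the scaler is re-fit on each training fold. This is a sketch under the same hyperparameters as above, not the notebook's own evaluation; `pipe`, `cv`, and `scores` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so each fold scales with its own training data.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=10000, random_state=42),
)

# Five stratified folds; shuffling with a fixed seed keeps the run reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X_all, y_all, cv=cv, scoring='accuracy')

print('Fold accuracies:', scores.round(4))
print(f'Mean accuracy: {scores.mean():.4f} (std {scores.std():.4f})')
```

The spread across folds gives a sense of how much the single-split accuracy might vary.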

In [None]:
# 📝 Classification Report
print(classification_report(y_test, y_pred, target_names=target_names))

In [None]:
# 🔍 Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6,4))
sns.heatmap(
    cm, annot=True, fmt='d', cmap='Purples',
    xticklabels=target_names, yticklabels=target_names
)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
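
For a diagnostic task, the two kinds of error in the confusion matrix are not equally costly: a malignant case predicted as benign is usually the more serious mistake. The sketch below derives sensitivity and specificity from the matrix, treating malignant (label 0) as the positive class. It is self-contained and retrains the same model; the variable names are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

# Assumption: the scaler is fit on the training rows only.
scaler = StandardScaler().fit(X_tr)
clf = LogisticRegression(max_iter=10000, random_state=42)
clf.fit(scaler.transform(X_tr), y_tr)
y_hat = clf.predict(scaler.transform(X_te))

# With labels=[0, 1], row/column 0 is malignant and row/column 1 is benign.
cm = confusion_matrix(y_te, y_hat, labels=[0, 1])
mal_correct, mal_missed = cm[0]   # actual malignant: predicted malignant / benign
ben_flagged, ben_correct = cm[1]  # actual benign: predicted malignant / benign

# Sensitivity: share of malignant cases the model catches.
sensitivity = mal_correct / (mal_correct + mal_missed)
# Specificity: share of benign cases the model correctly clears.
specificity = ben_correct / (ben_correct + ben_flagged)

print(f'Malignant cases missed: {mal_missed}')
print(f'Sensitivity (malignant): {sensitivity:.4f}')
print(f'Specificity (benign):    {specificity:.4f}')
```

These are the same quantities as the per-class recall values in the classification report, read directly off the matrix rows.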

### ✅ Summary:
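- Standardized the input features so that all 30 are on a comparable scale, which helps the solver converge.
- Trained Logistic Regression on the Breast Cancer data using a stratified 80/20 train–test split.
- Achieved high test accuracy, with precision and recall balanced across both classes (see the classification report).
- Visualized the confusion matrix for error analysis (rows are actual classes, columns are predicted classes).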