
# Logistic Regression Project Report

This notebook demonstrates the complete workflow for a Logistic Regression binary classification task, including:
- Data generation
- Preprocessing (handling missing values, standardization)
- Model training
- Evaluation metrics (accuracy, precision, recall, F1-score, confusion matrix)
- Hyperparameter tuning

Let's begin!


## Data Preparation and Preprocessing

In [None]:

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             confusion_matrix, classification_report)
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import seaborn as sns

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, n_redundant=0, 
                           weights=[0.6, 0.4], random_state=42)
df = pd.DataFrame(X, columns=['Feature1', 'Feature2', 'Feature3', 'Feature4', 'Feature5'])
df['Target'] = y
df.loc[::10, 'Feature2'] = np.nan  # Introduce missing values

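# Preprocessing
X = df.drop('Target', axis=1)
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Impute Feature2 with the training-set mean so that no test-set information leaks in
feature2_mean = X_train['Feature2'].mean()
X_train = X_train.fillna({'Feature2': feature2_mean})
X_test = X_test.fillna({'Feature2': feature2_mean})

# Standardize: fit the scaler on the training data only, then reuse it for the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)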
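
The imputation and scaling steps above can also be bundled into a single scikit-learn `Pipeline`, which guarantees that the fill value and the scaling statistics are learned from the training split only. The following is a minimal sketch under that assumption; `SimpleImputer(strategy="mean")` stands in for the manual mean fill, and the name `preprocess` is illustrative rather than part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Rebuild the same synthetic data, including the injected missing values
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
                           weights=[0.6, 0.4], random_state=42)
X = pd.DataFrame(X, columns=[f"Feature{i}" for i in range(1, 6)])
X.loc[::10, "Feature2"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Both steps are fitted on the training split; the test split is only transformed
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
X_train_scaled = preprocess.fit_transform(X_train)
X_test_scaled = preprocess.transform(X_test)

print("NaNs left (train, test):", np.isnan(X_train_scaled).sum(), np.isnan(X_test_scaled).sum())
print("Train column means:", X_train_scaled.mean(axis=0).round(6))
```

The resulting arrays can be passed to `LogisticRegression.fit` and `predict` in the same way as the manually scaled arrays.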


## Model Training and Evaluation

In [None]:

# Train initial model
model = LogisticRegression(random_state=42)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Evaluation
print("Default Model Performance:")
print(classification_report(y_test, y_pred))
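# Rows of the confusion matrix are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix (Default Model)")
plt.show()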
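
To make the link between the confusion matrix and the metrics in the classification report explicit, here is a small worked example. The labels are hypothetical and chosen only so the arithmetic is easy to follow; they are not the notebook's predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: six negatives and four positives
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# For binary labels {0, 1}, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# The hand-computed values agree with scikit-learn's metric functions
print(accuracy_score(y_true_demo, y_pred_demo),
      precision_score(y_true_demo, y_pred_demo),
      recall_score(y_true_demo, y_pred_demo),
      f1_score(y_true_demo, y_pred_demo))
```

With these labels the counts are TN=5, FP=1, FN=1, TP=3, giving an accuracy of 0.80 and precision, recall, and F1 of 0.75 each.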


## Hyperparameter Tuning

In [None]:

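# Grid search over C (inverse regularization strength) and solver,
# using 5-fold cross-validation on the training set and F1 as the selection metric
param_grid = {'C': [0.01, 0.1, 1, 10, 100], 'solver': ['liblinear', 'lbfgs']}
grid = GridSearchCV(LogisticRegression(random_state=42), param_grid, cv=5, scoring='f1')
grid.fit(X_train_scaled, y_train)
print("Best Parameters:", grid.best_params_)
print("Best cross-validated F1:", grid.best_score_)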

# Tuned Model
best_model = grid.best_estimator_
y_pred_best = best_model.predict(X_test_scaled)
print("Tuned Model Performance:")
print(classification_report(y_test, y_pred_best))
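# Rows of the confusion matrix are true labels, columns are predicted labels
cm_best = confusion_matrix(y_test, y_pred_best)
sns.heatmap(cm_best, annot=True, fmt='d', cmap='Greens')
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix (Tuned Model)")
plt.show()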
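
One refinement worth considering: the grid search above cross-validates on data that was scaled once, using statistics from the whole training set, so each validation fold has already influenced the scaler. A possible variant, sketched below, places the scaler inside a `Pipeline` so it is refitted within every fold. The names `pipe`, `search`, and `results` are illustrative, and missing-value handling is left out to keep the sketch short.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
                           weights=[0.6, 0.4], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling is part of the estimator, so it is refitted on each CV training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(random_state=42)),
])
# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {"clf__C": [0.01, 0.1, 1, 10, 100], "clf__solver": ["liblinear", "lbfgs"]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Tabulate the mean and spread of the F1 score for every parameter combination
results = pd.DataFrame(search.cv_results_)[
    ["param_clf__C", "param_clf__solver", "mean_test_score", "std_test_score"]
].sort_values("mean_test_score", ascending=False)
print(results.to_string(index=False))
print("Best parameters:", search.best_params_)
print("Test F1 of the refitted pipeline:", search.score(X_test, y_test))
```

Looking at the full `cv_results_` table rather than only `best_params_` shows whether the best setting is clearly better than its neighbours or whether the differences fall within the fold-to-fold spread.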



## Conclusion

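✅ Logistic Regression is an interpretable baseline for binary classification: each coefficient is the change in the log-odds of the positive class per unit change in a feature.  
✅ Preprocessing matters: scikit-learn's `LogisticRegression` does not accept NaN values, and standardizing the features puts the L2 penalty on a common scale and helps the solvers converge.  
✅ Hyperparameter tuning, especially of `C` (the inverse regularization strength), can improve recall and F1-score; compare the two classification reports above to see whether it did on this dataset.  
✅ The confusion matrix shows which kind of error dominates (false positives or false negatives), something accuracy alone hides.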
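
The interpretability point can be made concrete by inspecting the fitted coefficients. The sketch below is an illustration, not part of the original workflow: it refits a default model on the full standardized synthetic dataset (no train/test split, no missing values) and reports each coefficient with its odds ratio. The names `coef_table`, `manual_log_odds`, and `manual_prob` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
                           weights=[0.6, 0.4], random_state=42)
feature_names = [f"Feature{i}" for i in range(1, 6)]

X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(random_state=42).fit(X_scaled, y)

# exp(coefficient) is the multiplicative change in the odds of class 1
# for a one-standard-deviation increase in that feature, holding the others fixed
coef_table = pd.DataFrame({
    "feature": feature_names,
    "coefficient": model.coef_[0],
    "odds_ratio": np.exp(model.coef_[0]),
})
print(coef_table.to_string(index=False))

# The predicted probability is the sigmoid of the linear score (the log-odds)
manual_log_odds = X_scaled[0] @ model.coef_[0] + model.intercept_[0]
manual_prob = 1.0 / (1.0 + np.exp(-manual_log_odds))
print("Manual probability: ", manual_prob)
print("predict_proba value:", model.predict_proba(X_scaled[:1])[0, 1])
```

Because the features are standardized, the coefficient magnitudes are directly comparable across features.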

---

✅ Future Work:
- Explore ROC-AUC and PR curves
- Test on real-world datasets
- Perform feature selection for further optimization
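
As a starting point for the first item, the sketch below computes ROC-AUC and average precision (the area under the PR curve) from predicted probabilities rather than hard labels. It is self-contained and makes simplifying assumptions: the same synthetic data without missing values, and a default model wrapped with `make_pipeline`. All variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (auc, average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
                           weights=[0.6, 0.4], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
clf.fit(X_train, y_train)

# Threshold-free metrics need scores; use the probability of the positive class
proba = clf.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, proba)
avg_precision = average_precision_score(y_test, proba)

# Curve points, ready to pass to plt.plot(fpr, tpr) or plt.plot(recall, precision)
fpr, tpr, roc_thresholds = roc_curve(y_test, proba)
precision, recall, pr_thresholds = precision_recall_curve(y_test, proba)

print(f"ROC-AUC: {roc_auc:.3f}")
print(f"Average precision: {avg_precision:.3f}")
print(f"ROC points: {len(fpr)}, PR points: {len(precision)}")
```

With a 60/40 class split the two summaries usually tell a similar story; the PR curve becomes more informative as the positive class gets rarer.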
