# Logistic Regression Assignment

# ==============================
# Question 1
# ==============================

# What is Logistic Regression, and how does it differ from Linear Regression?
# Answer:
"""
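Logistic Regression is a classification algorithm that predicts categorical outcomes,
most commonly a binary label. Linear Regression outputs an unbounded continuous value
directly from a linear combination of the features and is typically fit by minimizing
squared error. Logistic Regression passes that same linear combination through the
sigmoid function to produce a probability between 0 and 1, and is fit by minimizing
log loss. The probability is then compared with a threshold (0.5 by default) to assign
a class (e.g., 0 or 1).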
"""
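To make the contrast concrete, here is a minimal sketch that fits both models to the same binary labels. The toy data is made up for illustration; the point is only that the linear model's outputs can fall outside [0, 1], while the logistic model's outputs stay inside it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (made up for illustration): one feature, binary label
X = np.arange(8).reshape(-1, 1)          # feature values 0..7
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # class switches between 3 and 4

# Linear Regression treats the 0/1 label as a continuous target
lin_pred = LinearRegression().fit(X, y).predict(X)

# Logistic Regression models the probability of class 1
log_model = LogisticRegression().fit(X, y)
log_proba = log_model.predict_proba(X)[:, 1]

print("Linear predictions:    ", np.round(lin_pred, 2))
print("Logistic probabilities:", np.round(log_proba, 2))
print("Logistic class labels: ", log_model.predict(X))
```

On this data the linear fit dips below 0 at the low end and rises above 1 at the high end, so its outputs cannot be read as probabilities.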

# ==============================
# Question 2
# ==============================

# Explain the role of the Sigmoid function in Logistic Regression.
# Answer:
"""
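The Sigmoid function, sigma(z) = 1 / (1 + e^(-z)), transforms the linear combination of
inputs z = w.x + b into a value between 0 and 1, which is interpreted as the probability
of the positive class. Large positive z gives a probability near 1, large negative z gives
a probability near 0, and z = 0 gives exactly 0.5. This allows Logistic Regression to model
binary outcomes effectively.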
"""
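The mapping described above can be sketched in a few lines of NumPy. The weights `w`, bias `b`, and input `x` are made-up values for illustration only, not parameters of any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and a single input row (illustrative values only)
w = np.array([0.8, -1.2])
b = 0.5
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b      # linear combination: 0.8*2.0 - 1.2*1.0 + 0.5, about 0.9
p = sigmoid(z)            # estimated probability of class 1
label = int(p >= 0.5)     # default decision threshold of 0.5

print("z =", z, "p =", round(float(p), 4), "label =", label)
```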

# ==============================
# Question 3
# ==============================

# What is Regularization in Logistic Regression and why is it needed?
# Answer:
"""
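Regularization helps prevent overfitting by adding a penalty on large coefficient values
to the loss function. In Logistic Regression, L1 (Lasso) and L2 (Ridge) regularization are
commonly used to improve generalization and reduce variance. L1 can shrink some coefficients
to exactly zero, which acts as feature selection, while L2 shrinks all coefficients toward
zero without eliminating them. In scikit-learn the strength is set by C, the inverse of the
regularization strength: a smaller C means a stronger penalty.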
"""
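As a rough illustration of the difference between the two penalties, the sketch below fits an L1 and an L2 model on the standardized breast cancer data and counts coefficients that are exactly zero. The choice of `C=0.1` and the `liblinear` solver are assumptions made for this example; other settings will give different counts.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # put all features on one scale

# Same strength (C=0.1, chosen for illustration), different penalty type
l1_model = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
l2_model = LogisticRegression(penalty='l2', C=0.1, solver='liblinear')
l1_model.fit(X_scaled, y)
l2_model.fit(X_scaled, y)

# Count coefficients that were driven exactly to zero
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print("Zero coefficients with L1:", n_zero_l1, "of", l1_model.coef_.size)
print("Zero coefficients with L2:", n_zero_l2, "of", l2_model.coef_.size)
```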

# ==============================
# Question 4
# ==============================

# What are some common evaluation metrics for classification models, and why are they important?
# Answer:
"""
Common metrics:
- Accuracy: Fraction of all predictions that are correct, (TP + TN) / total.
- Precision: Fraction of predicted positives that are actually positive, TP / (TP + FP).
- Recall: Fraction of actual positives that the model identifies, TP / (TP + FN).
- F1-score: Harmonic mean of Precision and Recall.
- ROC-AUC: Probability that the model ranks a random positive above a random negative,
  summarizing how well it separates the classes across all thresholds.
These metrics matter because accuracy alone can be misleading, especially on imbalanced
datasets, where a model that always predicts the majority class still scores high accuracy.
"""
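The definitions above can be checked by hand on a small example. The labels, predictions, and scores below are made up for illustration; they give 3 true positives, 1 false positive, 1 false negative, and 5 true negatives.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Made-up labels, hard predictions, and predicted probabilities for 10 samples
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred  = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN:", tp, fp, fn, tn)

accuracy = accuracy_score(y_true, y_pred)     # (3 + 5) / 10 = 0.8
precision = precision_score(y_true, y_pred)   # 3 / (3 + 1) = 0.75
recall = recall_score(y_true, y_pred)         # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)                 # harmonic mean of 0.75 and 0.75 = 0.75
auc = roc_auc_score(y_true, y_score)          # 23 of 24 positive/negative pairs ranked correctly

print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall)
print("F1:", f1, "ROC-AUC:", round(auc, 3))
```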



# ==============================
# Question 5
# ==============================
Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.


In [1]:
# Python program: Logistic Regression basic training
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd

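# Load dataset (scikit-learn's built-in breast cancer data stands in for a CSV file;
# with a real file, pd.read_csv would be used here instead)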
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.956140350877193
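Since the question asks for a CSV file, here is a hedged sketch of the same workflow starting from `pd.read_csv`. No CSV ships with this assignment, so the example first writes the breast cancer data to a temporary file; the file name `breast_cancer.csv` and the `target` column name are assumptions of this sketch.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Create a CSV to read from (stand-in for a real file on disk)
csv_path = os.path.join(tempfile.mkdtemp(), "breast_cancer.csv")
load_breast_cancer(as_frame=True).frame.to_csv(csv_path, index=False)

# Load the CSV into a DataFrame and separate features from the label column
df = pd.read_csv(csv_path)
X = df.drop(columns="target")
y = df["target"]

# Split, train, and evaluate exactly as in the cell above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print("Rows, columns:", df.shape)
print("Accuracy:", accuracy)
```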


# ==============================
# Question 6
# ==============================
Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.

In [2]:
from sklearn.linear_model import LogisticRegression

# Train model with L2 regularization
ridge_model = LogisticRegression(penalty='l2', C=1.0, max_iter=5000)
ridge_model.fit(X_train, y_train)

print("Model coefficients:", ridge_model.coef_)
print("Accuracy:", ridge_model.score(X_test, y_test))

Model coefficients: [[ 1.0274368   0.22145051 -0.36213488  0.0254667  -0.15623532 -0.23771256
  -0.53255786 -0.28369224 -0.22668189 -0.03649446 -0.09710208  1.3705667
  -0.18140942 -0.08719575 -0.02245523  0.04736092 -0.04294784 -0.03240188
  -0.03473732  0.01160522  0.11165329 -0.50887722 -0.01555395 -0.016857
  -0.30773117 -0.77270908 -1.42859535 -0.51092923 -0.74689363 -0.10094404]]
Accuracy: 0.956140350877193
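To see what the L2 penalty does to the coefficients, the sketch below refits the model for a few values of `C` and prints the overall size (L2 norm) of the coefficient vector. The three `C` values and the use of standardized features are choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # same scale, so the penalty treats features alike

coef_norms = {}
for C in [0.01, 1.0, 100.0]:
    # L2 is the default penalty; smaller C means stronger regularization
    model = LogisticRegression(C=C, max_iter=5000)
    model.fit(X_scaled, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} L2 norm of coefficients: {coef_norms[C]:.3f}")
```

The norm grows as `C` increases, because a weaker penalty lets the coefficients move further from zero.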


# ==============================
# Question 7
# ==============================
Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report.


In [3]:
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

# Load multiclass dataset
iris = load_iris()
X_iris, y_iris = iris.data, iris.target

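# Train One-vs-Rest Logistic Regression
# Note: the multi_class parameter is deprecated in recent scikit-learn releases (since 1.5);
# sklearn.multiclass.OneVsRestClassifier is the suggested replacement.
ovr_model = LogisticRegression(multi_class='ovr', max_iter=5000)
ovr_model.fit(X_iris, y_iris)

# Note: predictions are made on the same data used for training, so the report
# below reflects training performance rather than generalization.
y_pred = ovr_model.predict(X_iris)
print(classification_report(y_iris, y_pred))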

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.96      0.90      0.93        50
           2       0.91      0.96      0.93        50

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150
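The report above is computed on the training data. As a hedged alternative, the sketch below uses `OneVsRestClassifier` (which does not depend on the `multi_class` parameter) and evaluates on a held-out, stratified test split. The 80/20 split and `random_state=42` are assumptions of this example, so its numbers will differ from the report above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42)

# One binary Logistic Regression per class, each trained as "this class vs the rest"
ovr = OneVsRestClassifier(LogisticRegression(max_iter=5000))
ovr.fit(X_train, y_train)

y_pred = ovr.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Binary models fitted:", len(ovr.estimators_))
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```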





# ==============================
# Question 8
# ==============================
Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.


In [4]:
from sklearn.model_selection import GridSearchCV

# Hyperparameter tuning with GridSearchCV
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}
grid = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)

Best parameters: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
Best CV accuracy: 0.9626373626373628
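The best CV accuracy is an average over validation folds taken from the training set. A possible follow-up, sketched below, is to look at every parameter combination and then score the refitted best model on the held-out test set. The cell repeats the setup so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
grid = GridSearchCV(LogisticRegression(max_iter=5000), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# One row per parameter combination, best-ranked first
results = pd.DataFrame(grid.cv_results_)[
    ['param_C', 'param_penalty', 'mean_test_score', 'std_test_score', 'rank_test_score']
].sort_values('rank_test_score')
print(results.to_string(index=False))

# GridSearchCV refits the best model on all of X_train by default (refit=True),
# so grid.score evaluates that refitted model on the untouched test set
test_accuracy = grid.score(X_test, y_test)
print("Held-out test accuracy:", test_accuracy)
```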


# ==============================
# Question 9
# ==============================
Write a Python program to standardize the features before training LogisticRegression and compare the model's accuracy with and without scaling.


In [5]:
from sklearn.preprocessing import StandardScaler

# Without scaling
model_no_scaling = LogisticRegression(max_iter=5000)
model_no_scaling.fit(X_train, y_train)
print("Accuracy without scaling:", model_no_scaling.score(X_test, y_test))

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaling = LogisticRegression(max_iter=5000)
model_scaling.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", model_scaling.score(X_test_scaled, y_test))

Accuracy without scaling: 0.956140350877193
Accuracy with scaling: 0.9736842105263158
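A single train/test split can over- or understate the effect of scaling. One way to make the comparison more robust, sketched below, is to wrap the scaler and the model in a pipeline and cross-validate both variants. The pipeline refits the scaler inside each fold, so no information from the validation part leaks into the scaling. The choice of 5 folds is an assumption of this example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Variant 1: raw features
unscaled = LogisticRegression(max_iter=5000)

# Variant 2: StandardScaler is fitted on the training part of each fold only
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

scores_unscaled = cross_val_score(unscaled, X, y, cv=5, scoring='accuracy')
scores_scaled = cross_val_score(scaled, X, y, cv=5, scoring='accuracy')

print("CV accuracy without scaling:", scores_unscaled.mean())
print("CV accuracy with scaling:   ", scores_scaled.mean())
```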


# ==============================
# Question 10
# ==============================
Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.

# Answer:

"""
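For imbalanced datasets (like only 5% positive responses in marketing campaigns):
1. Data Handling: Clean the data (missing values, duplicates, categorical encoding) and use
   a stratified train/test split so both sets keep the 5% response rate.
2. Feature Scaling: Standardize/normalize features for better convergence, fitting the scaler
   on the training data only.
3. Class Balancing: Use techniques like SMOTE (oversampling minority class), undersampling,
   or assign class weights to Logistic Regression (class_weight='balanced'). Apply resampling
   to the training data only, never to the test set.
4. Hyperparameter Tuning: Tune C, penalty type, solver, and class_weight with stratified
   cross-validation, scoring on F1 or average precision rather than accuracy.
5. Evaluation: Use metrics like Precision, Recall, F1-score, and ROC-AUC rather than accuracy,
   since always predicting "no response" would already be 95% accurate.
   The Precision-Recall curve is especially valuable in imbalanced data.
6. Business Impact: Optimize for Recall if the goal is to capture as many responders as possible,
   or Precision if cost per contact is high; adjust the decision threshold accordingly.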
"""
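The steps above can be sketched end to end as follows. No campaign data is available here, so the example generates a synthetic dataset with roughly 5% positives. It uses class weights rather than SMOTE to stay within scikit-learn, and the 80% recall target is a made-up business requirement for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, classification_report,
                             precision_recall_curve, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: roughly 5% responders (assumption, not real data)
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           n_clusters_per_class=1, weights=[0.95, 0.05],
                           random_state=0)

# Stratify so train and test keep the same responder rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(class_weight='balanced', max_iter=5000)),
])

# Tune C with a precision-recall based score instead of accuracy
grid = GridSearchCV(pipe, {'clf__C': [0.01, 0.1, 1, 10]},
                    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
                    scoring='average_precision')
grid.fit(X_train, y_train)

proba = grid.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, proba)
print("Best parameters:", grid.best_params_)
print("ROC-AUC:", roc_auc)
print("Average precision (PR-AUC):", average_precision_score(y_test, proba))
print(classification_report(y_test, grid.predict(X_test), digits=3))

# Business-driven threshold: the highest cutoff that still reaches 80% recall.
# It is picked on the test set here for brevity; a separate validation set is preferable.
precision, recall, thresholds = precision_recall_curve(y_test, proba)
ok = np.where(recall[:-1] >= 0.80)[0]
threshold = thresholds[ok[-1]]
y_pred_custom = (proba >= threshold).astype(int)
custom_recall = recall_score(y_test, y_pred_custom)
print("Threshold:", threshold, "Recall at that threshold:", custom_recall)
```

Raising the recall target lowers the threshold and contacts more customers; lowering it does the opposite, which is the Recall versus Precision trade-off described in step 6.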