# ***Logistic Regression***

---



**Question 1: What is Logistic Regression, and how does it differ from Linear Regression?**

**Answer:**
Logistic Regression is a statistical model for classification tasks (binary or multiclass). It passes a linear combination of the input features through the logistic (sigmoid) function to estimate the probability that a sample belongs to a class.

**Linear Regression** → predicts continuous values (e.g., house prices).

**Logistic Regression** → predicts probabilities between 0 and 1, then classifies into categories.
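
To make the contrast concrete, the sketch below fits both models to the same toy data and queries them far outside the training range. The data and variable names are made up for illustration and are not part of the original assignment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature, label flips from 0 to 1 at x = 4
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

X_new = np.array([[-5], [10]])  # points far outside the training range

lin_pred = lin.predict(X_new)              # unbounded real values
log_prob = log.predict_proba(X_new)[:, 1]  # estimated P(y = 1), always within [0, 1]
log_label = log.predict(X_new)             # hard class labels

print("Linear regression output:", lin_pred)
print("Logistic regression P(y=1):", log_prob)
print("Logistic regression class:", log_label)
```

The linear model's outputs fall below 0 and above 1 at these points, so they cannot be read as probabilities, while the logistic model's outputs stay bounded.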



**Question 2: Explain the role of the Sigmoid function in Logistic Regression.**
**Answer:**

The Sigmoid function maps the model’s linear output (which can be any real number) into a probability between 0 and 1:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The sigmoid ensures every output can be interpreted as a probability.

Applying a threshold (e.g., ≥ 0.5 → Class 1, < 0.5 → Class 0) then converts that probability into a class label.
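
A minimal sketch of the sigmoid and the thresholding step, using NumPy. The input values in `z` are hypothetical linear outputs chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear outputs of the model (w·x + b)
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

probs = sigmoid(z)                   # probabilities of Class 1
labels = (probs >= 0.5).astype(int)  # threshold at 0.5

print(probs)
print(labels)
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and `z = 0` maps to exactly 0.5, which is why a 0.5 threshold corresponds to the sign of the linear output.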

**Question 3: What is Regularization in Logistic Regression and why is it needed?**
**Answer:**
Regularization adds a penalty to large coefficients in the model to reduce overfitting.

L1 (Lasso): encourages sparsity (some coefficients = 0).

L2 (Ridge): shrinks coefficients toward zero but keeps all features.

Both penalties help the model generalize to unseen data.
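
The sparsity difference can be observed directly. The sketch below is an illustration under assumed settings (the breast cancer dataset, `C=0.1`, and the `liblinear` solver are choices made here, not taken from the original text); it counts how many coefficients each penalty sets exactly to zero.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scaling puts all features on a comparable footing for the penalty
X_scaled = StandardScaler().fit_transform(X)

# In scikit-learn, C is the inverse of regularization strength: smaller C = stronger penalty
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X_scaled, y)
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X_scaled, y)

n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))

print("Zero coefficients with L1:", n_zero_l1)
print("Zero coefficients with L2:", n_zero_l2)
```

With L1 some features drop out of the model entirely, which acts as a form of feature selection; with L2 the coefficients become small but generally stay nonzero.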

**Question 4: What are some common evaluation metrics for classification models, and why are they important?**
**Answer:**

**Accuracy** → % of correct predictions.

**Precision** → fraction of true positives among predicted positives.

**Recall** (Sensitivity) → fraction of true positives among actual positives.

**F1-score** → harmonic mean of precision & recall.

**ROC-AUC** → area under the ROC curve; measures how well the model ranks positives above negatives across all thresholds.

These are important because relying only on accuracy can be misleading, especially with imbalanced datasets.
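
A small sketch of why accuracy alone can mislead. The labels below are hypothetical (5 positives out of 100), and the "model" simply predicts the majority class every time.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical imbalanced labels: 5 positives, 95 negatives
y_true = [1] * 5 + [0] * 95
# A useless "model" that always predicts the majority class
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
# No positives are predicted, so precision is undefined; zero_division=0 reports it as 0
prec = precision_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)

print("Accuracy :", acc)
print("Precision:", prec)
print("Recall   :", rec)
print("F1-score :", f1)
```

Accuracy comes out at 95% even though the model never identifies a single positive, which recall and F1 (both 0) make obvious.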

In [1]:
# Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame, splits it into train/test sets, trains a Logistic Regression model, and prints its accuracy.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

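# Load dataset into a DataFrame. scikit-learn's built-in breast cancer data stands in
# for a CSV file here; for a real file, pd.read_csv("your_file.csv") (hypothetical path)
# would produce the DataFrame instead.
data = load_breast_cancer(as_frame=True)
df = data.frame  # feature columns plus a 'target' column
X, y = df.drop(columns="target"), df["target"]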

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 0.9707602339181286


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [2]:
# Question 6: Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=500)
model.fit(X_train, y_train)

print("Coefficients:", model.coef_)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))


Coefficients: [[ 2.37436834  0.16536488 -0.34074222  0.009497   -0.1759965  -0.367409
  -0.77856024 -0.49205038 -0.25026609 -0.01511331 -0.10328272  1.14760081
   0.25829795 -0.11373373 -0.02606671  0.06305927 -0.01242942 -0.057764
  -0.04708287  0.0103125   1.34726796 -0.39195314 -0.01971088 -0.0271816
  -0.33407715 -0.78851603 -1.69319834 -0.80679759 -0.87058696 -0.06330855]]
Accuracy: 0.9707602339181286




In [3]:
# Question 7: Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report.
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

# Load multiclass dataset
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

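# Note: the multi_class parameter is deprecated in recent scikit-learn releases (1.5+);
# wrapping the estimator in sklearn.multiclass.OneVsRestClassifier is the replacement for 'ovr'.
model = LogisticRegression(multi_class='ovr', max_iter=500)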
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.85      0.92        13
           2       0.87      1.00      0.93        13

    accuracy                           0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45





In [4]:
# Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']  # supports both L1 and L2
}

grid = GridSearchCV(LogisticRegression(max_iter=500), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Validation Accuracy:", grid.best_score_)


Best Parameters: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
Validation Accuracy: 0.9523809523809523


In [5]:
# Question 9: Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.
from sklearn.preprocessing import StandardScaler

# Without scaling
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)
print("Accuracy without scaling:", accuracy_score(y_test, model.predict(X_test)))

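# With scaling: fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model.fit(X_train_scaled, y_train)
# Predict on the scaled test set. Passing the raw X_test to a model fitted on scaled
# data mixes feature scales; that mistake produced the 0.2889 figure recorded below.
print("Accuracy with scaling:", accuracy_score(y_test, model.predict(X_test_scaled)))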


Accuracy without scaling: 1.0
Accuracy with scaling: 0.28888888888888886
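
One way to make this comparison harder to get wrong is to bundle the scaler and the classifier in a pipeline, so the same transformation is applied at both fit and predict time. The sketch below reloads the Iris data so that it runs on its own; it is an alternative arrangement rather than the notebook's original code, and its printed numbers are not the ones recorded above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Baseline: no scaling
raw_model = LogisticRegression(max_iter=500).fit(X_train, y_train)

# Scaler + classifier as one estimator; the scaler is fitted on X_train only
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
scaled_model.fit(X_train, y_train)

acc_raw = raw_model.score(X_test, y_test)
acc_scaled = scaled_model.score(X_test, y_test)  # X_test is scaled inside the pipeline

print("Accuracy without scaling:", acc_raw)
print("Accuracy with scaling:", acc_scaled)
```

Because the pipeline owns the scaler, there is no separate `X_test_scaled` variable to forget when calling `predict` or `score`.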


**Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.**
**Answer:**

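If only 5% of customers respond (highly imbalanced):

Data Handling: Impute or drop missing values, encode categorical features, and use a stratified train/test split so both sets keep the 5% response rate.

Feature Scaling: Apply StandardScaler (fitted on the training data only) so the solver converges faster and the regularization penalty treats all features comparably.

Balancing Classes: Use SMOTE (oversampling), undersampling, or class weights (class_weight='balanced'). Apply any resampling to the training data only.

Hyperparameter Tuning: Use GridSearchCV to tune C and penalty, scoring on a metric such as F1 or average precision rather than accuracy.

Evaluation: Don’t rely on accuracy, since a model that predicts "no response" for everyone would already score 95%. Use Precision, Recall, F1-score, and ROC-AUC instead.

Deployment: Tune the decision threshold (e.g., use 0.3 instead of 0.5 if recall matters more than precision).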
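
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the data is synthetic (generated with `make_classification` to have roughly 5% positives), and the parameter grid, the `average_precision` scoring choice, and the 0.3 threshold are examples rather than recommendations for any real campaign dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: roughly 5% responders (assumption)
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=42,
)

# Stratify so train and test keep the same response rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scaling and the class-weighted model live in one pipeline,
# so the scaler is refitted inside every cross-validation fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear")),
])

param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="average_precision")
grid.fit(X_train, y_train)

probs = grid.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, probs)
print("Best parameters:", grid.best_params_)
print("ROC-AUC:", auc)

# Compare the default threshold with a lower, recall-oriented one
recalls = {}
for threshold in (0.5, 0.3):
    preds = (probs >= threshold).astype(int)
    recalls[threshold] = recall_score(y_test, preds)
    precision = precision_score(y_test, preds, zero_division=0)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recalls[threshold]:.3f}")
```

Lowering the threshold can only keep or increase recall, usually at the cost of precision. Note that `class_weight='balanced'` reweights the loss, so the predicted probabilities no longer reflect the 5% base rate; the threshold is best chosen on validation data against the business cost of missed responders versus wasted contacts.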