1. What is Logistic Regression, and how does it differ from Linear Regression?
- Logistic Regression is a statistical method for classification problems where the outcome (dependent variable) is categorical (e.g., Yes/No, 0/1). It is most often used for binary outcomes and extends to multiclass problems.

- It models the probability of an event by passing a linear combination of the features through the Sigmoid function, so its output always lies between 0 and 1.

- In contrast, Linear Regression predicts continuous numerical values by fitting a line (or hyperplane) using the formula:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
- Key differences:
    - Linear Regression:
       1. Predicts continuous values
       2. No restriction on output range
       3. Uses Mean Squared Error (MSE) as the loss function
    - Logistic Regression:
       1. Predicts class probabilities (binary or multiclass)
       2. Output constrained between 0 and 1
       3. Uses Log Loss (Cross-Entropy) as the loss function
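To make these differences concrete, here is a plain-Python sketch (only the `math` module) using hypothetical coefficients b0 = -4, b1 = 1: the same linear score is unbounded for Linear Regression but squashed into (0, 1) for Logistic Regression, and the two loss functions are computed side by side. The helper names are illustrative, not a library API.

```python
import math

def linear_predict(b0, b1, x):
    # Linear Regression: y = b0 + b1*x, output is unbounded
    return b0 + b1 * x

def logistic_predict(b0, b1, x):
    # Logistic Regression: the same linear score z passed through the sigmoid
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

def mse(y_true, y_pred):
    # Mean Squared Error, the usual Linear Regression loss
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    # Binary cross-entropy; eps keeps log() away from log(0)
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical coefficients b0 = -4, b1 = 1, chosen only for illustration
xs = [-2, 0, 2, 4, 6, 10]
print("Linear:  ", [linear_predict(-4, 1, x) for x in xs])
print("Logistic:", [round(logistic_predict(-4, 1, x), 3) for x in xs])

# Hypothetical labels and predicted probabilities
y_true = [1, 0, 1]
p_pred = [0.9, 0.2, 0.7]
print("MSE:", mse(y_true, p_pred), "Log loss:", log_loss(y_true, p_pred))
```

Log loss punishes a confident wrong probability much harder than MSE does, which is why it is the natural loss for probability outputs.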

2. Explain the role of the Sigmoid function in Logistic Regression.
- The Sigmoid function maps any real-valued number into the range (0, 1), which makes it well suited for representing probabilities.
Formula:

σ(z) = 1 / (1 + e^(-z))

Where z = b0 + b1*x1 + b2*x2 + ... + bn*xn (linear combination of features).

When z is large and positive → output close to 1 (high probability of class 1).

When z is large and negative → output close to 0 (high probability of class 0).

Thus, it helps convert linear predictions into probabilities for classification.
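A minimal sketch of the sigmoid and of how a linear score becomes a class prediction. The coefficients and feature values are hypothetical, and the 0.5 cut-off is the usual default decision threshold. The function switches between two algebraically equal forms so that `math.exp` never overflows for large |z|.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow in math.exp."""
    if z >= 0:
        # e^(-z) <= 1 here, so it cannot overflow
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z)
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients and feature values, for illustration only
b0, b1, b2 = -1.0, 0.8, 2.0
x1, x2 = 1.5, 0.5
z = b0 + b1 * x1 + b2 * x2      # linear combination of features
p = sigmoid(z)                  # probability of class 1
label = 1 if p >= 0.5 else 0    # 0.5 decision threshold
print(f"z = {z:.2f}, p = {p:.4f}, predicted class = {label}")
```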

3. What is Regularization in Logistic Regression and why is it needed?
- Regularization is a technique used to prevent overfitting by penalizing large coefficients in the model.
Two common types:

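  1. L1 Regularization (Lasso): Adds the sum of the absolute values of the coefficients to the loss function. It can shrink some coefficients exactly to zero, which acts as built-in feature selection.

  2. L2 Regularization (Ridge): Adds the sum of the squared coefficients to the loss function. It shrinks all coefficients toward zero but rarely makes any of them exactly zero.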

- Why needed?

   - In high-dimensional or noisy datasets, models tend to overfit, capturing noise instead of the signal.

   - Regularization forces the model to keep coefficients small, improving generalization on unseen data.
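A plain-Python sketch of the penalized objective, assuming a penalty strength `lam` (λ) and, as a common convention, leaving the intercept b0 out of the penalty. All numbers are hypothetical. For the same data fit, the larger coefficient vector costs more under both penalties, which is what pushes the optimizer toward small coefficients. For reference, scikit-learn's `LogisticRegression` controls this through `C`, the inverse of the regularization strength (smaller `C` means a stronger penalty), and scales the terms somewhat differently from this sketch.

```python
import math

def log_loss(y_true, p_pred):
    # Average binary cross-entropy (assumes 0 < p < 1)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, p_pred)) / len(y_true)

def l1_penalty(coefs, lam):
    # Lasso: lam * sum(|b_j|), intercept excluded
    return lam * sum(abs(b) for b in coefs)

def l2_penalty(coefs, lam):
    # Ridge: lam * sum(b_j^2), intercept excluded
    return lam * sum(b * b for b in coefs)

# Hypothetical labels, predicted probabilities and coefficient vectors
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.7, 0.6]
small = [0.5, -0.3]
large = [5.0, -3.0]
lam = 0.1

base = log_loss(y, p)
print("L1 objective (small, large):", base + l1_penalty(small, lam), base + l1_penalty(large, lam))
print("L2 objective (small, large):", base + l2_penalty(small, lam), base + l2_penalty(large, lam))
```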

4. What are some common evaluation metrics for classification models, and why are they important?
- Some common evaluation metrics:
   - Accuracy: Fraction of correctly predicted instances.
   - Precision: Proportion of true positive predictions among all positive predictions.
   Precision=TP/(TP+FP)
   - Recall (Sensitivity): Proportion of true positive predictions among all actual positives.
   Recall=TP/(TP+FN)
   - F1 Score: Harmonic mean of Precision and Recall.
   F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
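- Importance:
   - Especially important on imbalanced datasets, where accuracy alone is misleading: always predicting the majority class can still score high.
   - Precision penalizes false positives and Recall penalizes false negatives; which one to favor depends on the business cost of each type of error.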
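A plain-Python sketch that computes these metrics from confusion-matrix counts. The two models and their counts are hypothetical; they show how a do-nothing model can win on accuracy while being useless on Precision, Recall and F1.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when the model predicts no positives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical imbalanced test set: 50 positives, 950 negatives.
# Model A always predicts "negative".
print("Model A:", classification_metrics(tp=0, fp=0, fn=50, tn=950))
# Model B finds 40 of the 50 positives at the cost of 60 false alarms.
print("Model B:", classification_metrics(tp=40, fp=60, fn=10, tn=890))
```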


#5 Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
# (Use Dataset from sklearn package; the built-in breast cancer dataset stands in for the CSV file here.)
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd
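# Load the built-in breast cancer dataset: features as a DataFrame, labels as a Series
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# A high max_iter gives the default lbfgs solver room to converge on unscaled features
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")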


Accuracy: 0.9561
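The task asks for a CSV file, but #5 builds the DataFrame directly from `load_breast_cancer`. A minimal sketch that really goes through a CSV, assuming pandas and scikit-learn 0.23 or later (for `as_frame=True`): the dataset is first written to a temporary file with a hypothetical name, `breast_cancer.csv`, then read back with `pd.read_csv`. With your own data, point `read_csv` at your file and replace `"target"` with your label column. `stratify=y` is an addition not in the original cell; it keeps the class ratio equal in both splits, so the score may differ slightly from the output above.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Write the bundled dataset to a temporary CSV (hypothetical filename)
data = load_breast_cancer(as_frame=True)
csv_path = os.path.join(tempfile.mkdtemp(), "breast_cancer.csv")
data.frame.to_csv(csv_path, index=False)

# Load the CSV into a DataFrame and separate features from the "target" column
df = pd.read_csv(csv_path)
X = df.drop(columns="target")
y = df["target"]

# stratify=y keeps the class ratio the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.4f}")
```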


#6 Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.
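from sklearn.linear_model import LogisticRegression
# Reuses the breast cancer split from #5.
# penalty='l2' is the default; the liblinear solver supports both 'l1' and 'l2'.
model = LogisticRegression(penalty='l2', solver='liblinear', max_iter=10000)
model.fit(X_train, y_train)
# One coefficient per feature (30 for the breast cancer data)
print("Model Coefficients:", model.coef_)
# .score() returns the mean accuracy on the test set
accuracy = model.score(X_test, y_test)
print(f"Accuracy with L2 Regularization: {accuracy:.4f}")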


Model Coefficients: [[ 2.13248406e+00  1.52771940e-01 -1.45091255e-01 -8.28669349e-04
  -1.42636015e-01 -4.15568847e-01 -6.51940282e-01 -3.44456106e-01
  -2.07613380e-01 -2.97739324e-02 -5.00338038e-02  1.44298427e+00
  -3.03857384e-01 -7.25692126e-02 -1.61591524e-02 -1.90655332e-03
  -4.48855442e-02 -3.77188737e-02 -4.17516190e-02  5.61347410e-03
   1.23214996e+00 -4.04581097e-01 -3.62091502e-02 -2.70867580e-02
  -2.62630530e-01 -1.20898539e+00 -1.61796947e+00 -6.15250835e-01
  -7.42763610e-01 -1.16960181e-01]]
Accuracy with L2 Regularization: 0.9561
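The raw `coef_` array does not show how L2 differs from L1. A sketch, assuming scikit-learn, that fits both penalties with the same hypothetical `C=0.1` on standardized features and counts coefficients that are exactly zero. L1 typically zeroes out several of them while L2 only shrinks them. Standardizing first is an addition here so that both penalties act on coefficients of comparable scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for penalty in ("l1", "l2"):
    # Same C for both penalties; liblinear supports both 'l1' and 'l2'
    pipe = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty=penalty, C=0.1, solver="liblinear", max_iter=10000),
    )
    pipe.fit(X_train, y_train)
    coefs = pipe[-1].coef_.ravel()   # last pipeline step is the fitted classifier
    n_zero = int((coefs == 0).sum())
    results[penalty] = (n_zero, pipe.score(X_test, y_test))
    print(f"{penalty}: {n_zero} of {coefs.size} coefficients are exactly 0, "
          f"test accuracy {results[penalty][1]:.4f}")
```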


#7 Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)
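from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
# Note: this cell overwrites X, y and the train/test split with the 3-class Iris data;
# cells #8 and #9 below reuse this Iris split.
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# multi_class='ovr' fits one binary classifier per class (class k vs. the rest).
# Recent scikit-learn versions (1.5+) deprecate this parameter; OneVsRestClassifier is the alternative.
model = LogisticRegression(multi_class='ovr', max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred)
print(report)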


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
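Because recent scikit-learn releases deprecate the `multi_class` argument (to the best of our knowledge since version 1.5), here is a sketch of the same one-vs-rest setup that does not depend on it: a plain `LogisticRegression` wrapped in `sklearn.multiclass.OneVsRestClassifier`, which fits one binary model per class and predicts the class whose model scores highest. The split mirrors #7; the report may differ slightly from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One binary LogisticRegression per class (class k vs. the rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=10000))
ovr.fit(X_train, y_train)
print(len(ovr.estimators_), "binary models")
print(classification_report(y_test, ovr.predict(X_test)))
```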





#8 Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.
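from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
# Uses the Iris train/test split created in #7.
param_grid = {
    'C': [0.1, 1, 10],        # inverse regularization strength (smaller = stronger)
    'penalty': ['l1', 'l2']
}
# liblinear supports both 'l1' and 'l2'; cv=5 runs 5-fold cross-validation per combination
grid = GridSearchCV(LogisticRegression(solver='liblinear', max_iter=10000),
                    param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print("Best Parameters:", grid.best_params_)
# best_score_ is the mean cross-validated accuracy of the best combination
print(f"Validation Accuracy: {grid.best_score_:.4f}")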


Best Parameters: {'C': 10, 'penalty': 'l1'}
Validation Accuracy: 0.9583
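#8 tunes on unscaled features. When scaling is part of the model, a common pattern is to put the scaler and the classifier in one `Pipeline` and tune that, so the scaler is refit on each cross-validation training fold and never sees the validation fold. A sketch, assuming scikit-learn; the wider `C` grid is illustrative. `make_pipeline` names each step after its class in lowercase, hence the `logisticregression__` prefix on the parameter names.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The scaler is refit inside every CV training fold, so no validation data leaks into it
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(solver="liblinear", max_iter=10000))

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10, 100],
    "logisticregression__penalty": ["l1", "l2"],
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print(f"Mean CV accuracy: {grid.best_score_:.4f}")
# The best pipeline is refit on the full training set; score it on held-out data
print(f"Test accuracy: {grid.score(X_test, y_test):.4f}")
```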


#9 Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# Uses the Iris train/test split created in #7.
# Baseline: train on the raw features
model1 = LogisticRegression(max_iter=10000)
model1.fit(X_train, y_train)
acc_without_scaling = model1.score(X_test, y_test)
# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model2 = LogisticRegression(max_iter=10000)
model2.fit(X_train_scaled, y_train)
acc_with_scaling = model2.score(X_test_scaled, y_test)
print(f"Accuracy without Scaling: {acc_without_scaling:.4f}")
print(f"Accuracy with Scaling: {acc_with_scaling:.4f}")


Accuracy without Scaling: 1.0000
Accuracy with Scaling: 1.0000
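Both scores are identical here, most likely because the small Iris test set is easy to classify and its four features are all lengths in centimetres on similar ranges; scaling tends to matter more when features differ by orders of magnitude. To show what `StandardScaler` computes, here is a plain-Python sketch for one feature column with hypothetical values: z = (x - mean) / std, where mean and std come from the training data only. The population standard deviation (dividing by n) matches what `StandardScaler` uses, and a zero-variance column is left unscaled.

```python
import math

def standardize(train_col, test_col):
    """Scale one feature column: z = (x - mean) / std, using TRAINING statistics only."""
    n = len(train_col)
    mean = sum(train_col) / n
    # Population standard deviation (divide by n)
    std = math.sqrt(sum((x - mean) ** 2 for x in train_col) / n)
    scale = std if std > 0 else 1.0   # constant column: leave it unscaled
    return ([(x - mean) / scale for x in train_col],
            [(x - mean) / scale for x in test_col])

# Hypothetical feature values (e.g. annual spend in dollars)
train = [200.0, 400.0, 600.0, 800.0]
test = [500.0, 1000.0]
train_z, test_z = standardize(train, test)
print([round(v, 4) for v in train_z], [round(v, 4) for v in test_z])
```

Reusing the training mean and std on the test data is exactly what `scaler.transform(X_test)` does in #9.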


10. Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.

- Data Handling: Collect customer behavior features (demographics, previous purchases, interactions, etc.).

- Target variable: Response to marketing campaign (1 = Respond, 0 = No response).

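- Feature Scaling: Apply StandardScaler or MinMaxScaler, fitted on the training data only, so the solver converges faster and the regularization penalty treats all coefficients on a comparable scale.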

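- Handling Imbalance: Use techniques like SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes, applied to the training data only (inside each cross-validation fold), never to the test set.

  Alternatively, use class weighting in LogisticRegression (class_weight='balanced'), which gives errors on the rare responder class more weight in the loss.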

- Hyperparameter Tuning: Apply GridSearchCV with stratified folds to optimize the regularization strength (C) and penalty type, scoring on F1 or recall rather than accuracy.

- Model Evaluation:
  Do not rely on accuracy: a model that predicts "no response" for every customer already scores 95%.
  Focus on Precision, Recall, F1 Score, and ROC-AUC.
  High recall is important to catch as many responders as possible.

- Deployment Consideration: Periodically retrain model with fresh data to adapt to changing customer behavior.
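A compact sketch of this workflow, assuming scikit-learn. The customer table is replaced by synthetic `make_classification` data with `weights=[0.95, 0.05]` to mimic a roughly 5% response rate; it is not real campaign data. The sketch uses `class_weight='balanced'` rather than SMOTE because SMOTE lives in the separate imbalanced-learn package. If you use SMOTE instead, apply it only inside the training folds (for example with imbalanced-learn's own `Pipeline`) so synthetic points never reach validation or test data. `scoring="f1"` is one reasonable choice; `"recall"` or `"average_precision"` may fit better when missing a responder is the costlier error.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table: about 5% responders (class 1)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)

# Stratify so the rare class appears in both splits at the same rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scaling inside the pipeline is fitted on training folds only;
# class_weight="balanced" weights each class by n_samples / (n_classes * class_count)
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", solver="liblinear", max_iter=10000),
)
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10],
    "logisticregression__penalty": ["l1", "l2"],
}
# Tune for F1 rather than accuracy; stratified folds keep ~5% positives in each fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1")
grid.fit(X_train, y_train)

# Evaluate on the untouched test set with imbalance-aware metrics
proba = grid.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print("Best Parameters:", grid.best_params_)
print(f"ROC-AUC: {auc:.4f}")
print(classification_report(y_test, grid.predict(X_test), digits=3))
```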