1] What is Logistic Regression, and how does it differ from Linear Regression?
- Logistic Regression (LR) models the log-odds of an event as a linear combination of the input features:
                log(p / (1 - p)) = B0 + B1X1 + B2X2 + ... + BnXn

Writing z for the right-hand side, it then applies the sigmoid function:
                p = 1 / (1 + e^(-z))

to map z to a probability between 0 and 1, which is then thresholded (e.g., p ≥ 0.5 → class 1).

It is called regression because it fits a linear model to a continuous quantity (the log-odds) and estimates coefficients much as linear regression does, even though the final output is a class label.

- Difference from Linear Regression:

 - Linear Regression → predicts continuous values.

 - Logistic Regression → predicts a probability in (0, 1), which is then thresholded into a discrete class (0/1).
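
The difference can be seen on a toy problem. The sketch below is only an illustration: the 1-D dataset and the query points in `X_new` are made up for this example. It fits both models to the same 0/1 labels and compares what they return far from the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature, label is 1 when x > 0
X = np.linspace(-5, 5, 50).reshape(-1, 1)
y = (X.ravel() > 0).astype(int)

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

# Query points, two of them well outside the training range
X_new = np.array([[-10.0], [0.5], [10.0]])
lin_pred = lin.predict(X_new)              # unbounded real values
log_prob = log.predict_proba(X_new)[:, 1]  # probabilities of class 1
log_class = log.predict(X_new)             # thresholded class labels

print("Linear regression output:", lin_pred)
print("Logistic probabilities:  ", log_prob)
print("Logistic classes:        ", log_class)
```

The linear model extrapolates below 0 and above 1, so its output cannot be read as a probability, while the logistic model stays inside the unit interval.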

2] Explain the role of the Sigmoid function in Logistic Regression.
- The sigmoid function in logistic regression takes the model’s linear output (a real number) and maps it to a range between 0 and 1.
This transformation makes the output interpretable as the probability of the positive class in binary classification.
It is mathematically defined as:

                       𝜎(z) = 1 / (1 + e^(-z))

Outputs near 0 indicate a low probability of the positive class, outputs near 1 indicate a high probability, and z = 0 gives exactly 0.5.
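
A minimal NumPy sketch of the sigmoid follows. The helper name `sigmoid` and the sample scores in `z` are chosen here for illustration only; the 0.5 threshold matches the one mentioned in question 1.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linear outputs: strongly negative, zero, strongly positive
z = np.array([-4.0, 0.0, 4.0])
p = sigmoid(z)

# Threshold the probabilities at 0.5 to get class labels
labels = (p >= 0.5).astype(int)

print("probabilities:", p)
print("labels:", labels)
```

The function is symmetric around z = 0, so the probabilities for -4 and 4 sum to 1.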



3] What is Regularization in Logistic Regression and why is it needed?
- Regularization in logistic regression is a technique used to prevent overfitting by adding a penalty term to the cost function, discouraging overly complex models with large coefficients.

- Why it’s needed:

 - Without regularization, the model may fit noise in the training data, leading to poor generalization.

 - By shrinking large weights, regularization improves stability and prediction accuracy on unseen data.

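- Common types:

 - L1 (Lasso): Adds the sum of absolute weights to the cost and encourages sparsity (some weights become exactly zero).

 - L2 (Ridge): Adds the sum of squared weights to the cost; it shrinks large weights but keeps all features.

In scikit-learn the penalty strength is set through C, the inverse of the regularization strength, so a smaller C means a stronger penalty.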
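
The sparsity effect can be checked by counting zero coefficients. This is a sketch under assumptions: the breast cancer dataset, standardized features, and `C=0.1` are arbitrary choices made for the illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same penalty strength for both models; C=0.1 is an arbitrary choice
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X_scaled, y)
l2 = LogisticRegression(penalty='l2', solver='liblinear', C=0.1).fit(X_scaled, y)

# Count coefficients that were driven exactly to zero
n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))

print(f"Zero coefficients with L1: {n_zero_l1} of {l1.coef_.size}")
print(f"Zero coefficients with L2: {n_zero_l2} of {l2.coef_.size}")
```

The L1 model drops some features entirely, which makes it usable as a rough feature selector; the L2 model keeps every feature with a smaller weight.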

4] What are some common evaluation metrics for classification models, and why are they important?
- Common evaluation metrics for classification models:

 - Accuracy – Proportion of correct predictions out of total predictions.

 - Precision – Of all predicted positives, how many are actually positive.

 - Recall (Sensitivity) – Of all actual positives, how many were correctly predicted.

 - F1-score – Harmonic mean of precision and recall, useful for imbalanced data.

 - ROC-AUC – Measures the model’s ability to distinguish between classes.

They are important because they show how well the model performs, reveal trade-offs between different types of errors (false positives vs. false negatives), and guide model selection based on the problem's priorities (e.g., precision vs. recall).
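
The metrics listed above can be computed with `sklearn.metrics`. The labels and scores below are a small hand-made example, not real model output, picked so the values are easy to verify by hand (3 true positives, 2 false positives, 1 false negative, 4 true negatives).

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hand-made example: true labels and predicted scores for 10 samples
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.65]

# Hard predictions at the usual 0.5 threshold
y_pred = [int(s >= 0.5) for s in y_score]

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses scores, not hard predictions

print(f"Accuracy:  {acc:.2f}")
print(f"Precision: {prec:.2f}")
print(f"Recall:    {rec:.2f}")
print(f"F1-score:  {f1:.2f}")
print(f"ROC-AUC:   {auc:.2f}")
```

Here accuracy is 0.70, precision 0.60, and recall 0.75, which shows how a single accuracy figure can hide the balance between the two error types.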

In [1]:
#5] Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy. (Use Dataset from sklearn package)
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset from sklearn
data = load_breast_cancer()

# Create DataFrame from dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

# Predict and calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")


Accuracy: 0.96


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
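
The warning above suggests scaling the data. One possible way to follow that advice, sketched here rather than taken from the original cell, is to put `StandardScaler` and the model in a single pipeline so the scaler is fitted on the training split only. The variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale first, then fit the same model with the same iteration budget
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
pipe.fit(X_train, y_train)

# Number of solver iterations actually used
n_iter = int(pipe.named_steps['logisticregression'].n_iter_[0])
test_accuracy = pipe.score(X_test, y_test)

print(f"Iterations used: {n_iter} (limit 500)")
print(f"Test accuracy:   {test_accuracy:.2f}")
```

With standardized features the default solver should need far fewer iterations than the limit, so the convergence warning is not expected to appear.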


In [3]:
 # 6] Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression with L2 regularization
model = LogisticRegression(penalty='l2', solver='liblinear', max_iter=500)
model.fit(X_train, y_train)

# Model coefficients
print("Model Coefficients:")
print(model.coef_)
print("\nIntercept:", model.intercept_)

# Accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy: {accuracy:.2f}")


Model Coefficients:
[[ 2.13248406e+00  1.52771940e-01 -1.45091255e-01 -8.28669349e-04
  -1.42636015e-01 -4.15568847e-01 -6.51940282e-01 -3.44456106e-01
  -2.07613380e-01 -2.97739324e-02 -5.00338038e-02  1.44298427e+00
  -3.03857384e-01 -7.25692126e-02 -1.61591524e-02 -1.90655332e-03
  -4.48855442e-02 -3.77188737e-02 -4.17516190e-02  5.61347410e-03
   1.23214996e+00 -4.04581097e-01 -3.62091502e-02 -2.70867580e-02
  -2.62630530e-01 -1.20898539e+00 -1.61796947e+00 -6.15250835e-01
  -7.42763610e-01 -1.16960181e-01]]

Intercept: [0.40847797]

Accuracy: 0.96


In [4]:
# 7] Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load dataset (Iris dataset - multiclass)
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

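# Logistic Regression for multiclass classification (One-vs-Rest)
# Note: multi_class is deprecated in recent scikit-learn releases (1.5 and later) and may raise a FutureWarning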
model = LogisticRegression(multi_class='ovr', solver='liblinear', max_iter=500)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))


Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
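
Because the `multi_class` parameter is deprecated in recent scikit-learn releases, the same One-vs-Rest strategy can also be expressed with `OneVsRestClassifier`. The following is a sketch of that alternative, not the solution the question asks for; it reuses the same dataset and split settings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# One binary logistic regression per class, wrapped explicitly
ovr = OneVsRestClassifier(LogisticRegression(solver='liblinear', max_iter=500))
ovr.fit(X_train, y_train)

y_pred = ovr.predict(X_test)
ovr_accuracy = accuracy_score(y_test, y_pred)

print("Binary classifiers trained:", len(ovr.estimators_))
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The wrapper trains one binary model per class and predicts the class whose model gives the highest score.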





In [5]:
# 8] Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

# Load dataset
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression model
model = LogisticRegression(solver='liblinear', max_iter=500)

# Parameter grid for C and penalty
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2']
}

# Grid search with 5-fold cross-validation
grid = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Best parameters and score
print("Best Parameters:", grid.best_params_)
print(f"Validation Accuracy: {grid.best_score_:.2f}")


Best Parameters: {'C': 100, 'penalty': 'l1'}
Validation Accuracy: 0.97
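
The score printed above is a cross-validated score on the training split; `X_test` is never used. A possible follow-up, sketched here with a smaller grid to keep it quick, is to score the refitted best model on the held-out test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Smaller grid than the original cell, chosen only to keep the example fast
param_grid = {'C': [0.1, 1, 10], 'penalty': ['l1', 'l2']}
grid = GridSearchCV(
    LogisticRegression(solver='liblinear', max_iter=500),
    param_grid, cv=5, scoring='accuracy'
)
grid.fit(X_train, y_train)

# GridSearchCV refits the best model on the full training split by default,
# so grid.score evaluates that refitted model on unseen data
test_accuracy = grid.score(X_test, y_test)

print("Best Parameters:", grid.best_params_)
print(f"Validation Accuracy: {grid.best_score_:.2f}")
print(f"Test Accuracy:       {test_accuracy:.2f}")
```

Reporting both numbers helps show whether the tuned model generalizes or whether the search overfit the validation folds.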


In [6]:
# 9] Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression without scaling
model_no_scaling = LogisticRegression(max_iter=500, solver='liblinear')
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression with scaling
model_scaling = LogisticRegression(max_iter=500, solver='liblinear')
model_scaling.fit(X_train_scaled, y_train)
y_pred_scaling = model_scaling.predict(X_test_scaled)
accuracy_scaling = accuracy_score(y_test, y_pred_scaling)

# Results
print(f"Accuracy without scaling: {accuracy_no_scaling:.2f}")
print(f"Accuracy with scaling:    {accuracy_scaling:.2f}")


Accuracy without scaling: 0.96
Accuracy with scaling:    0.97


10] Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model — including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.

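- For a 5% positive response rate:

 - Preprocess data – handle missing values, encode categorical features, and scale numerical features.

 - Address imbalance – use class_weight='balanced' and/or oversampling (SMOTE), applying any resampling to the training data only.

 - Train Logistic Regression – start with L2 regularization, and keep scaling inside a pipeline so that it is fitted on the training folds only.

 - Tune hyperparameters – optimize C and penalty via stratified cross-validation, so that every fold keeps roughly the 5% positive rate.

 - Evaluate properly – focus on precision, recall, F1, ROC-AUC, and PR-AUC instead of accuracy, since a model that predicts "no response" for everyone already reaches 95% accuracy.

 - Set business threshold – adjust the decision threshold to maximize marketing ROI, weighing the cost of contacting a non-responder against the value of reaching a responder.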
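
The steps above can be sketched end to end. Everything specific here is an assumption made for illustration: the data is synthetic (`make_classification` with about 5% positives stands in for real campaign data), the grid only covers C, and the threshold is picked by maximizing F1 on out-of-fold predictions rather than by a real ROI calculation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_predict, train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data (assumption): about 5% responders
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling and a class-weighted model in one pipeline
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight='balanced', max_iter=1000)
)

# Stratified CV, scored by PR-AUC (average precision) instead of accuracy
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(pipe, {'logisticregression__C': [0.01, 0.1, 1, 10]},
                    cv=cv, scoring='average_precision')
grid.fit(X_train, y_train)

# Choose a threshold on out-of-fold training predictions (F1 as an ROI proxy)
oof = cross_val_predict(grid.best_estimator_, X_train, y_train,
                        cv=cv, method='predict_proba')[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, oof)
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = float(thresholds[np.argmax(f1)])

# Final check on the untouched test split
test_probs = grid.predict_proba(X_test)[:, 1]
pr_auc = average_precision_score(y_test, test_probs)
y_pred = (test_probs >= best_threshold).astype(int)

print("Best C:", grid.best_params_['logisticregression__C'])
print(f"Chosen threshold: {best_threshold:.2f}")
print(f"Test PR-AUC: {pr_auc:.2f} (baseline {y_test.mean():.2f})")
print("Customers flagged for the campaign:", int(y_pred.sum()))
```

In a real project the F1 criterion would be replaced by an expected-profit calculation using the actual contact cost and response value.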