Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

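Answer: Logistic Regression is a statistical method for classification problems, where the output is categorical (e.g., yes/no, true/false). It models the probability that an input belongs to a particular class by passing a linear combination of the features through the Sigmoid function. It is most often used for binary classification and extends to multiclass problems through one-vs-rest or multinomial (softmax) formulations.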

Differences from Linear Regression:

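Output: Linear Regression predicts continuous, unbounded values; Logistic Regression predicts probabilities between 0 and 1.
Function Used: Linear Regression outputs the linear combination wᵀx + b directly; Logistic Regression passes wᵀx + b through the Sigmoid function.
Loss Function: Linear Regression is usually fitted by minimizing squared error; Logistic Regression is fitted by minimizing log loss (cross-entropy), i.e. by maximum likelihood.
Application: Linear Regression is used for regression tasks (continuous targets); Logistic Regression is used for classification tasks.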
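To make the output difference concrete, here is a minimal sketch assuming scikit-learn and a made-up six-point dataset (the data and the query points -5 and 10 are illustrative only). Fitted to the same 0/1 labels, LinearRegression extrapolates below 0 and above 1, while the probabilities from LogisticRegression.predict_proba stay within [0, 1].

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy 1-D data (assumed for illustration): the label flips from 0 to 1 as x grows
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

lin_model = LinearRegression().fit(X, y)
logit_model = LogisticRegression().fit(X, y)

X_new = np.array([[-5.0], [10.0]])               # points outside the training range
lin_pred = lin_model.predict(X_new)              # straight-line output, unbounded
log_proba = logit_model.predict_proba(X_new)[:, 1]  # sigmoid output, always in [0, 1]

print("Linear Regression output:", lin_pred)     # about [-1.69, 2.17]
print("Logistic Regression P(y=1):", log_proba)
```

The linear fit here is y ≈ 0.257x − 0.4, so it necessarily leaves the [0, 1] range once x moves far enough from the data.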

Question 2: Explain the role of the Sigmoid function in Logistic Regression.

Answer:

The Sigmoid function is a mathematical function used in Logistic Regression to convert the output of a linear equation into a probability value between 0 and 1. This makes it well suited to binary classification tasks.

Formula:

The Sigmoid function is defined as:

  σ(z) = 1 / (1 + e^(−z))

Where:

z = wᵀx + b
w is the weight vector
x is the input feature vector
b is the bias term
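The formula can be checked numerically. The sketch below is one possible NumPy implementation; the values of w, b and x are hypothetical, and the rewrite through np.logaddexp is just a common way to avoid overflow in e^(−z) for large negative z, not something the formula itself requires.

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), evaluated as exp(-log(1 + e^(-z))).

    np.logaddexp(0, -z) computes log(e^0 + e^(-z)) = log(1 + e^(-z))
    without overflowing when z is a large negative number.
    """
    return np.exp(-np.logaddexp(0.0, -np.asarray(z, dtype=float)))

# Hypothetical weights, bias and input (not taken from any trained model)
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([2.0, 1.0])

z = w @ x + b          # z = w^T x + b = 1.6 - 0.5 + 0.1 = 1.2
p = sigmoid(z)         # probability of the positive class
print(f"z = {z:.1f}, sigma(z) = {p:.3f}")          # z = 1.2, sigma(z) = 0.769
print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))   # extremes stay in [0, 1]
```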

Role in Classification:

The output of the Sigmoid function is interpreted as the probability that the input belongs to the positive class. A threshold (commonly 0.5) is applied:

If σ(z) ≥ 0.5, classify as positive
If σ(z) < 0.5, classify as negative
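This enables Logistic Regression to make decisions based on probability rather than raw scores. Because σ(z) ≥ 0.5 exactly when z ≥ 0, the default threshold corresponds to the linear decision boundary wᵀx + b = 0; the threshold can be moved when false positives and false negatives have different costs.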
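As a sketch of the thresholding step, assuming scikit-learn's bundled breast cancer dataset as a stand-in binary problem: decision_function returns z, predict_proba returns σ(z), and thresholding σ(z) at 0.5 gives the same labels as checking the sign of z. The 0.8 threshold is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary dataset (class 1 is treated as the "positive" class here)
X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

z = clf.decision_function(X)      # z = w^T x + b for every sample
p = clf.predict_proba(X)[:, 1]    # sigma(z): estimated probability of class 1

# sigma(z) >= 0.5 exactly when z >= 0, so the 0.5 rule is a linear boundary in x
pred_threshold = (p >= 0.5).astype(int)
pred_sign = (z >= 0).astype(int)
print("0.5 threshold agrees with sign of z:", np.array_equal(pred_threshold, pred_sign))

# A stricter, arbitrary threshold labels fewer samples as positive
pred_strict = (p >= 0.8).astype(int)
print("Positives at 0.5:", pred_threshold.sum(), "| positives at 0.8:", pred_strict.sum())
```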

Question 3: What is Regularization in Logistic Regression and why is it needed?

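Answer: Regularization is a technique that reduces overfitting by adding a penalty on the size of the coefficients to the model's loss function. In Logistic Regression, L1 (Lasso-style) and L2 (Ridge-style) penalties are the most common. In scikit-learn the strength is controlled by C, the inverse of the regularization strength: a smaller C means a stronger penalty.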

L1 encourages sparsity (some coefficients become exactly zero), which also acts as a form of feature selection.
L2 shrinks weights toward zero but does not set them exactly to zero.
Both improve generalization, i.e. performance on unseen data.
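A small experiment illustrates the L1/L2 difference. This sketch assumes scikit-learn, a synthetic dataset from make_classification (20 features, only 3 informative), and an arbitrary C=0.05. The 'liblinear' solver is used for L1 because 'lbfgs' does not support that penalty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (assumed setup): 20 features, only 3 carry signal
X, y = make_classification(n_samples=300, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

# C is the INVERSE of regularization strength: C=0.05 is a strong penalty
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
l2 = LogisticRegression(penalty="l2", solver="lbfgs", C=0.05).fit(X, y)

# L1 drives many uninformative weights to exactly zero; L2 only shrinks them
print("L1 zero coefficients:", int(np.sum(l1.coef_ == 0)), "of", l1.coef_.size)
print("L2 zero coefficients:", int(np.sum(l2.coef_ == 0)), "of", l2.coef_.size)
```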

Question 4: What are some common evaluation metrics for classification models, and why are they important?

Answer: Common metrics include:

Accuracy: Proportion of correct predictions.
Precision: True positives / (True positives + False positives).
Recall: True positives / (True positives + False negatives).
F1 Score: Harmonic mean of precision and recall.
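ROC-AUC: Area under the ROC curve (true positive rate vs. false positive rate across all thresholds); 0.5 corresponds to random guessing and 1.0 to perfect separation of the classes.
These metrics assess model performance beyond accuracy, which can be misleading on imbalanced datasets: if 95% of samples are negative, a model that always predicts negative reaches 95% accuracy but has zero recall.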
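The formulas can be verified on a hand-made example. The labels and scores below are hypothetical; the sketch computes precision, recall and F1 from the confusion matrix by hand, compares them with scikit-learn, and ends with the imbalanced-data pitfall of always predicting the majority class.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical labels and predicted probabilities for 10 samples (3 positives)
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.15, 0.25])
y_pred = (y_score >= 0.5).astype(int)   # -> [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()   # tn=6, fp=1, fn=1, tp=2

# The formulas from the list above, computed by hand
precision = tp / (tp + fp)                           # 2/3
recall = tp / (tp + fn)                              # 2/3
f1 = 2 * precision * recall / (precision + recall)   # 2/3

print("Accuracy :", accuracy_score(y_true, y_pred))  # 0.8
print("Precision:", precision, precision_score(y_true, y_pred))
print("Recall   :", recall, recall_score(y_true, y_pred))
print("F1       :", f1, f1_score(y_true, y_pred))
# ROC-AUC uses the scores, not thresholded labels: 19 of the 21
# (positive, negative) pairs are ranked correctly -> 19/21 ≈ 0.905
print("ROC-AUC  :", roc_auc_score(y_true, y_score))

# Accuracy paradox: always predicting the majority (negative) class
always_neg = np.zeros_like(y_true)
print("Always-negative accuracy:", accuracy_score(y_true, always_neg))  # 0.7
print("Always-negative recall  :", recall_score(y_true, always_neg))    # 0.0
```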

# Question 5: Python program to train Logistic Regression and print accuracy.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0
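The accuracy of 1.0 above comes from a single 30-sample test split, so each misclassified sample would move it by about 0.033. A sketch of a steadier estimate, assuming 5-fold cross-validation on the full Iris data is acceptable for this purpose, averages accuracy over five different splits:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5 train/validation splits (stratified by default for classifiers)
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```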


# Question 6: Train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

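# Train model with an L2 penalty (penalty='l2' is also the default). multi_class is not set:
# for multiclass targets the lbfgs solver fits a multinomial model, and multi_class is deprecated in recent scikit-learn.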
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)

# Output
print("Coefficients:", model.coef_)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))


Coefficients: [[-0.39347744  0.96248927 -2.37513361 -0.99874691]
 [ 0.50844553 -0.2548109  -0.21300984 -0.77574616]
 [-0.11496809 -0.70767836  2.58814346  1.77449307]]
Accuracy: 1.0
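To see what the L2 penalty does to these coefficients, one can refit at several values of C and compare the overall size of model.coef_. This sketch assumes standardized features (so lbfgs converges at every C) and three arbitrary C values; a smaller C means stronger regularization and therefore smaller weights.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaled so lbfgs converges at every C

norms = {}
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(penalty="l2", C=C, max_iter=5000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)  # overall size of the weight matrix
    print(f"C={C:<6} ||coef||={norms[C]:.3f}")
```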


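# Question 7: Train Logistic Regression for multiclass classification with a one-vs-rest (OvR) strategy.
# multi_class='ovr' is deprecated in recent scikit-learn, so the model is wrapped in OneVsRestClassifier instead.
# Reuses X_train, X_test, y_train, y_test from Question 6.

from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Wrap Logistic Regression in OneVsRestClassifier: one binary "this class vs. the rest" model per class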
model = OneVsRestClassifier(LogisticRegression(solver='lbfgs', max_iter=200))
model.fit(X_train, y_train)

# Predict and report
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))




              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
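The one-vs-rest mechanics can be inspected directly. In this sketch (same split as above, rebuilt so the snippet runs on its own), OneVsRestClassifier holds one binary Logistic Regression per class in estimators_, and its prediction matches picking, for each sample, the class whose sub-model gives the highest decision_function score. That argmax reconstruction reflects how OneVsRestClassifier combines scores for multiclass targets as far as I know, so treat it as a check rather than as documentation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=200)).fit(X_train, y_train)
print("Binary sub-models:", len(ovr.estimators_))  # one "class k vs. rest" model per class

# Each sub-model scores "is it class k?"; OvR predicts the class with the highest score
scores = np.column_stack([est.decision_function(X_test) for est in ovr.estimators_])
manual_pred = ovr.classes_[scores.argmax(axis=1)]
print("Matches ovr.predict:", np.array_equal(manual_pred, ovr.predict(X_test)))
```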



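# Question 8: Apply GridSearchCV to tune C and penalty.
# Reuses X_train and y_train from Question 6.

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Define parameter grid. The lbfgs solver supports only 'l2' (or no penalty), so only C is
# actually tuned here; searching over 'l1' as well needs a solver such as 'liblinear' or 'saga'.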
param_grid = {
    'C': [0.1, 1, 10],
    'penalty': ['l2'],
    'solver': ['lbfgs']
}

# Grid search
grid = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid.fit(X_train, y_train)

# Output best parameters and their mean 5-fold cross-validation accuracy (computed on the training data)
print("Best Parameters:", grid.best_params_)
print("Validation Accuracy:", grid.best_score_)


Best Parameters: {'C': 1, 'penalty': 'l2', 'solver': 'lbfgs'}
Validation Accuracy: 0.9666666666666666
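Because the grid above fixes penalty='l2', it effectively tunes only C. One way to search over the penalty as well, sketched here under the assumption that the 'saga' solver (which supports both 'l1' and 'l2') is acceptable, is to put a StandardScaler and the model in a Pipeline and prefix the grid keys with the step name. The C values, max_iter and random_state are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# saga handles both penalties; scaling helps it converge
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(solver="saga", max_iter=5000, random_state=0)),
])
# Keys use the "<step name>__<parameter>" convention of Pipeline
param_grid = {
    "clf__C": [0.1, 1, 10],
    "clf__penalty": ["l1", "l2"],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)
print("Test Accuracy:", grid.score(X_test, y_test))
```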


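# Question 9: Standardize features before training and compare accuracy.
# Reuses X_train, X_test, y_train, y_test and accuracy_score from Question 6.

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Without scaling (max_iter=200 as in Questions 5 and 6, since lbfgs may stop before
# converging within the default 100 iterations on unscaled features)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("Accuracy without scaling:", accuracy_score(y_test, model.predict(X_test)))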

# With scaling: fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression()
model_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", accuracy_score(y_test, model_scaled.predict(X_test_scaled)))


Accuracy without scaling: 1.0
Accuracy with scaling: 1.0
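Both runs reach 1.0 on this split, so Iris does not show a difference here, but scaling still matters for solver convergence and because the regularization penalty treats all coefficients alike regardless of feature scale. A common way to keep the scaling step leak-free is a Pipeline, sketched below with scikit-learn's make_pipeline; it produces the same predictions as the manual scaler above, and inside cross_val_score it refits the scaler on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits StandardScaler on the training data only, then applies
# the same transform to any data passed to predict/score
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
print("Pipeline test accuracy:", pipe.score(X_test, y_test))

# Inside cross-validation the scaler is refit on each training fold, so the
# held-out fold never influences the scaling statistics
cv_scores = cross_val_score(pipe, X, y, cv=5)
print("5-fold CV accuracy:", cv_scores.mean())
```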


Question 10: Approach for handling an imbalanced dataset in a marketing campaign prediction task.

Answer:

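Data Handling:

Use resampling such as SMOTE (oversampling the minority class) or random undersampling of the majority class, applied to the training data only; the test set should keep the real class distribution.
Use a stratified train/test split so both sets contain the same proportion of responders.
Handle missing values and outliers.

Feature Scaling:

Apply standardization or normalization so features are on comparable scales, fitting the scaler on the training data only.

Balancing Classes:

Use class weights in Logistic Regression (class_weight='balanced'), which weights each class inversely to its frequency.

Hyperparameter Tuning:

Use GridSearchCV to find the best C, penalty, and solver, with a scoring metric suited to imbalance (e.g., scoring='f1' or scoring='roc_auc') rather than accuracy.

Evaluation:

Use metrics such as Precision, Recall, F1 Score, and ROC-AUC.
Use a confusion matrix to see where the false positives and false negatives occur.
Together, these steps help produce a model that is robust and useful for real-world business decisions.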
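A compact sketch of several of these steps, assuming scikit-learn and a synthetic make_classification dataset with about 5% positives standing in for campaign responses (no real campaign data): a stratified split, scaling inside a pipeline, and a comparison of class_weight=None against class_weight='balanced' on minority-class recall and ROC-AUC. SMOTE is omitted because it needs the separate imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: about 5% responders (class 1); not real data
X, y = make_classification(n_samples=5000, n_features=10, n_informative=4,
                           weights=[0.95], flip_y=0.01, random_state=42)

# stratify=y keeps the responder rate the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

results = {}
for cw in [None, "balanced"]:
    # Scaling lives inside the pipeline, so it is fitted on the training data only
    model = make_pipeline(StandardScaler(), LogisticRegression(class_weight=cw))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[cw] = recall_score(y_test, y_pred)  # recall on the minority class (label 1)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"class_weight={cw}: minority recall={results[cw]:.3f}, ROC-AUC={auc:.3f}")
    print(classification_report(y_test, y_pred, digits=3))
```

Balanced weights usually raise minority recall at the cost of precision, which is why the evaluation step looks at several metrics rather than accuracy alone.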