Question 1: What is Logistic Regression, and how does it differ from Linear Regression?
Answer:
Logistic Regression is a classification method used to predict categories, most often two classes such as yes/no, pass/fail, or spam/not spam. It outputs a probability between 0 and 1, and a threshold (commonly 0.5) turns that probability into a class label.
It differs from Linear Regression in the kind of output it produces. Linear Regression predicts a continuous number that can take any value, while Logistic Regression passes a weighted sum of the features through the sigmoid function, which keeps the prediction between 0 and 1 so it can be read as a class probability.
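
The difference in outputs can be seen on a tiny pass/fail example. The sketch below is only illustrative: the data (hours studied versus pass/fail) is made up, and the variable names are assumptions rather than part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data (assumption): hours studied and pass (1) / fail (0).
hours = np.arange(1, 11).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(hours, passed)
log = LogisticRegression().fit(hours, passed)

# Predict for inputs inside and outside the training range.
new_hours = np.array([[0], [5], [15]])
lin_pred = lin.predict(new_hours)               # unbounded numbers
log_proba = log.predict_proba(new_hours)[:, 1]  # P(pass), always in (0, 1)

print("Linear Regression output:   ", lin_pred.round(3))
print("Logistic Regression P(pass):", log_proba.round(3))
```

On this data the linear model returns a value below 0 for 0 hours and above 1 for 15 hours, so it cannot be read as a probability, while the logistic model's outputs stay strictly between 0 and 1.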


Question 2: Explain the role of the Sigmoid function in Logistic Regression.
Answer:
The sigmoid function, sigmoid(z) = 1 / (1 + e^(-z)), maps any real number z to a value between 0 and 1. This matters because the model's raw output (a weighted sum of the features) is unbounded, while a probability must lie between 0 and 1.
If the output is close to 1, it means high probability for one class (like “yes”), and if it’s close to 0, it means high probability for the other class (like “no”). The sigmoid curve is S-shaped: it is steepest around z = 0, where the output is 0.5, so small changes in the input there produce the largest changes in probability, while very large or very small inputs flatten out toward 1 or 0.
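
As a minimal sketch of this mapping, the snippet below evaluates the sigmoid at a few inputs. The helper name `sigmoid` and the sample inputs are chosen here for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 sits exactly in the middle at 0.5.
for z in (-6, -1, 0, 1, 6):
    print(f"z = {z:>2}: sigmoid(z) = {sigmoid(z):.4f}")
```

Moving from z = -1 to z = 1 changes the output from about 0.27 to about 0.73, whereas moving from z = 4 to z = 6 changes it by less than 0.02, which is the S-shape described above.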



Question 3: What is Regularization in Logistic Regression and why is it needed?
Answer:
Regularization in Logistic Regression is a technique used to prevent the model from overfitting. Overfitting happens when the model fits the training data too closely, including its noise, and then performs badly on new data.
Regularization works by adding a penalty on the size of the model’s coefficients (weights) to the loss function, so they don’t become too large. An L2 (Ridge) penalty shrinks all weights toward zero, while an L1 (Lasso) penalty can set some weights exactly to zero. This keeps the model simpler and improves its ability to generalize to new, unseen data.
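
The shrinking effect can be checked directly. The sketch below assumes scikit-learn's `LogisticRegression`, where `C` is the inverse of the regularization strength (smaller `C` means a stronger penalty). The dataset and the three `C` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Fit the same model with strong, moderate, and weak L2 regularization
# and record the overall size of the learned coefficients.
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=5000).fit(X_scaled, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C = {C:>6}: L2 norm of coefficients = {coef_norms[C]:.3f}")
```

The coefficient norm grows as `C` increases, that is, as the penalty is relaxed.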

Question 4: What are some common evaluation metrics for classification models, and why are they important?
Answer:
Some common evaluation metrics for classification models are Accuracy, Precision, Recall, F1-score, and ROC-AUC.
They are important because each one describes a different aspect of how well the model is performing. Accuracy is the fraction of all predictions that are correct, Precision is the fraction of predicted positives that are actually positive, Recall is the fraction of actual positives that the model found, F1-score is the harmonic mean of Precision and Recall, and ROC-AUC measures how well the model's scores separate the two classes across all thresholds. Using these metrics together gives a more complete picture of model performance than relying on just one measure.
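
To make these definitions concrete, the sketch below computes each metric by hand for ten made-up samples. The labels and scores are hypothetical; in practice `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same purpose.

```python
# Hypothetical labels and predicted scores for 10 samples (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

# Confusion-matrix counts.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# ROC-AUC: probability that a random positive gets a higher score
# than a random negative (ties count as half).
pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} "
      f"Recall={recall:.3f} F1={f1:.3f} ROC-AUC={auc:.3f}")
```

Here Accuracy is 0.8 while Precision, Recall, and F1 are all 0.75 and ROC-AUC is about 0.958, which shows how the metrics can tell different stories about the same predictions.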





In [3]:
'''Q5. Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
'''
# Import libraries
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

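# Load dataset from sklearn (used here in place of a CSV file;
# for a real CSV, pd.read_csv would produce the DataFrame directly)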
data = load_breast_cancer()

# Create DataFrame from data
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Features (X) and Target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Logistic Regression model
model = LogisticRegression(max_iter=5000)

# Train model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy of Logistic Regression model:", accuracy)


Accuracy of Logistic Regression model: 0.956140350877193


In [4]:
'''Q6. Write a Python program to train a Logistic Regression model using L2 regularization
(Ridge) and print the model coefficients and accuracy.'''

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features for better convergence
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create Logistic Regression model with L2 regularization (Ridge).
# multi_class is left at its default ('auto'); the parameter is deprecated since scikit-learn 1.5.
model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=1000)

# Train model
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)

# Output results
print("Model Coefficients:")
print(model.coef_)
print("\nAccuracy of Logistic Regression (L2 regularization):", accuracy)


Model Coefficients:
[[-1.00316587  1.14487318 -1.8113482  -1.69251025]
 [ 0.52799044 -0.28319987 -0.34060665 -0.72013959]
 [ 0.47517543 -0.86167331  2.15195485  2.41264984]]

Accuracy of Logistic Regression (L2 regularization): 1.0




In [5]:
'''Q7. Write a Python program to train a Logistic Regression model for multiclass
classification using multi_class='ovr' and print the classification report.
(Use Dataset from sklearn package)'''

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

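# Create Logistic Regression model with One-vs-Rest strategy.
# Note: multi_class is deprecated since scikit-learn 1.5; on newer versions,
# OneVsRestClassifier(LogisticRegression(...)) is the suggested replacement.
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=1000)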

# Train model
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Print classification report
print("Classification Report for Logistic Regression (One-vs-Rest):")
print(classification_report(y_test, y_pred, target_names=iris.target_names))


Classification Report for Logistic Regression (One-vs-Rest):
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.89      0.94         9
   virginica       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30





In [7]:
'''Question 8: Write a Python program to apply GridSearchCV to tune C and penalty
hyperparameters for Logistic Regression and print the best parameters and validation
accuracy.
(Use Dataset from sklearn package)'''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define model
log_reg = LogisticRegression(solver='liblinear', multi_class='ovr')

# Define hyperparameters to tune
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2']
}

# Apply GridSearchCV
grid = GridSearchCV(log_reg, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Print results
print("Best Parameters:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)

# Test set accuracy
test_accuracy = grid.score(X_test, y_test)
print("Test Accuracy with Best Parameters:", test_accuracy)




Best Parameters: {'C': 10, 'penalty': 'l1'}
Best Cross-Validation Accuracy: 0.9583333333333334
Test Accuracy with Best Parameters: 1.0




In [8]:
'''Question 9: Write a Python program to standardize the features before training Logistic
Regression and compare the model's accuracy with and without scaling.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)'''

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression without scaling
model_no_scaling = LogisticRegression(max_iter=200)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Logistic Regression with scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=200)
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print results
print("Accuracy without scaling:", accuracy_no_scaling)
print("Accuracy with scaling:", accuracy_scaled)


Accuracy without scaling: 1.0
Accuracy with scaling: 1.0


In [None]:
'''Question 10: Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced
dataset (only 5% of customers respond), describe the approach you’d take to build a
Logistic Regression model — including data handling, feature scaling, balancing
classes, hyperparameter tuning, and evaluating the model for this real-world business
use case.'''

--Understand and prepare the data

Load and explore the dataset to check missing values, outliers, and feature types.

Encode categorical variables (e.g., OneHotEncoding).

Split into training and testing sets, stratified on the target so both sets keep the 5% response rate, to evaluate performance properly.

--Feature scaling

Use StandardScaler or MinMaxScaler, fitted on the training set only, so that all features are on a comparable scale. This helps the solver converge and keeps the regularization penalty from depending on each feature's units.

--Handle class imbalance

Use class_weight='balanced' in Logistic Regression to give more weight to the minority class.

Or apply oversampling (SMOTE) or undersampling techniques to balance the classes in the training set.

--Hyperparameter tuning

Use GridSearchCV or RandomizedSearchCV to tune parameters like:

C (regularization strength)

penalty (L1 or L2)

solver

Optimize for metrics like F1-score or ROC-AUC rather than accuracy.

--Model evaluation

Since accuracy can be misleading with imbalanced data (a model that always predicts "no response" would already be 95% accurate here), I’d use:

Confusion matrix (to see TP, FP, FN, TN)

Precision, Recall, F1-score (to balance false positives and false negatives)

ROC curve & AUC (to measure how well the model ranks responders above non-responders across all thresholds).

--Final deployment plan

Train the final model on the full training set with best-found hyperparameters.

Test on unseen data to validate.

Deploy, then monitor performance over time and retrain when needed.
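
The steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not a production pipeline: the campaign data is replaced by a synthetic dataset from `make_classification` with roughly 5% positives, only `C` is tuned, and class imbalance is handled with `class_weight='balanced'` rather than SMOTE.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data (assumption): about 5% responders.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           n_clusters_per_class=1, weights=[0.95, 0.05],
                           random_state=42)

# Stratified split keeps the response rate similar in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling lives inside the pipeline so it is re-fitted on each CV training fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune the regularization strength, optimizing ROC-AUC instead of accuracy.
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(pipe, param_grid, cv=cv, scoring="roc_auc")
grid.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = grid.predict(X_test)
y_proba = grid.predict_proba(X_test)[:, 1]
cm = confusion_matrix(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_proba)

print("Best parameters:", grid.best_params_)
print("Confusion matrix:\n", cm)
print(classification_report(y_test, y_pred, digits=3))
print("Test ROC-AUC:", round(test_auc, 3))
```

Putting the scaler inside the `Pipeline` means the test folds never influence the scaling statistics during cross-validation, which matches the advice to fit preprocessing on training data only.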