# Theory & Practical Questions

Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

Answer: Logistic Regression is a statistical method we use for predicting categorical outcomes, especially binary ones like yes/no or 0/1. Unlike Linear Regression, which predicts continuous values, Logistic Regression applies the sigmoid function to a linear combination of the features to output probabilities between 0 and 1, making it suitable for classification problems.
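To make the difference concrete, here is a minimal sketch on a made-up toy dataset (the "hours studied" values are hypothetical and not part of the assignment). It fits both models on the same 0/1 labels and compares their outputs on a few new points, including some far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: hours studied -> fail (0) / pass (1)
X = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [5.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(X, y)    # treats the labels as continuous values
log = LogisticRegression().fit(X, y)  # models P(y = 1 | x)

# New points, two of them well outside the training range
X_new = np.array([[-2.0], [2.5], [9.0]])
lin_pred = lin.predict(X_new)               # unbounded real numbers
log_prob = log.predict_proba(X_new)[:, 1]   # probabilities of class 1

print("Linear Regression outputs:", lin_pred)
print("Logistic Regression P(class 1):", log_prob)
```

The linear model's outputs fall below 0 and above 1 at the extreme points, so they cannot be read as probabilities, while the logistic model's outputs always stay between 0 and 1.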

Question 2: Explain the role of the Sigmoid function in Logistic Regression.

Answer: In Logistic Regression, we use the Sigmoid function to transform the linear combination of inputs into a value between 0 and 1. This output represents the probability of a data point belonging to the positive class, which we convert into a class label by applying a threshold (commonly 0.5).
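As a small illustration, the sketch below defines the sigmoid by hand and checks that applying it to the model's linear scores reproduces the probabilities scikit-learn reports for a binary model. The helper name `sigmoid` is our own; the dataset is the same Breast Cancer dataset used in the practical questions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(solver='liblinear', max_iter=1000).fit(X, y)

# Linear combination of inputs: w . x + b for the first five samples
z = model.decision_function(X[:5])

manual = sigmoid(z)                          # sigmoid applied by hand
library = model.predict_proba(X[:5])[:, 1]   # probability of class 1 from sklearn

print("sigmoid(0) =", sigmoid(0.0))
print("Manual :", manual)
print("Library:", library)
```

A score of 0 maps to a probability of exactly 0.5, which is why 0.5 is the natural default threshold.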

Question 3: What is Regularization in Logistic Regression and why is it needed?

Answer: Regularization in Logistic Regression is a technique we use to prevent overfitting by adding a penalty term to the loss function. It helps keep the model simpler by controlling large coefficient values, which improves generalization to new, unseen data.
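The effect of the penalty on coefficient size can be seen in a short sketch. In scikit-learn the parameter `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty. The three `C` values below are arbitrary choices for illustration, and the features are standardized only so that the coefficients are comparable.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

coef_norms = {}
for C in [0.01, 1, 100]:  # small C = strong L2 penalty, large C = weak penalty
    model = LogisticRegression(solver='liblinear', penalty='l2', C=C, max_iter=1000)
    model.fit(X_scaled, y)
    # Overall size of the coefficient vector
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C}: L2 norm of coefficients = {coef_norms[C]:.3f}")
```

The coefficient norm grows as `C` increases, which shows the penalty shrinking the weights when regularization is strong.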

Question 4: What are some common evaluation metrics for classification models, and why are they important?

Answer: Common evaluation metrics for classification models include the Confusion Matrix, Accuracy, Misclassification Rate, Precision, Recall, F1-Score, and F-Beta Score. We use these metrics to evaluate different aspects of performance: how well the model predicts each class, how it balances false positives and false negatives, and how it performs under different error priorities. Together they help us choose the most reliable model.
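These metrics can be computed directly with `sklearn.metrics`. The sketch below uses ten hand-made labels (hypothetical, chosen so the counts are easy to verify by hand: 2 true positives, 2 false negatives, 1 false positive, 5 true negatives).

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             fbeta_score, precision_score, recall_score)

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)        # rows = actual, columns = predicted
acc = accuracy_score(y_true, y_pred)         # (TP + TN) / total
misclassification_rate = 1 - acc             # (FP + FN) / total
prec = precision_score(y_true, y_pred)       # TP / (TP + FP)
rec = recall_score(y_true, y_pred)           # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
f2 = fbeta_score(y_true, y_pred, beta=2)     # beta > 1 weights recall more heavily

print("Confusion Matrix:\n", cm)
print("Accuracy:", acc)
print("Misclassification Rate:", misclassification_rate)
print("Precision:", prec)
print("Recall:", rec)
print("F1-Score:", f1)
print("F2-Score:", f2)
```

Here accuracy is 0.7 even though the model finds only half of the positives (recall 0.5), which is why we look at more than one metric.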

Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
(Use Dataset from sklearn package)

In [18]:
# Answer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.metrics import accuracy_score

# Load dataset from sklearn (Breast Cancer dataset)
data = datasets.load_breast_cancer()

# Create a DataFrame from the dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # Add target column

# Separate features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create and train Logistic Regression model
model = LogisticRegression(solver='liblinear', max_iter=1000)  # max_iter increased to ensure convergence
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate and print accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 0.956140350877193


Question 6: Write a Python program to train a Logistic Regression model using L2
regularization (Ridge) and print the model coefficients and accuracy.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [14]:
# Answer:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.metrics import accuracy_score

# Load dataset from sklearn (Breast Cancer dataset)
data = datasets.load_breast_cancer()

# Create a DataFrame from the dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # Add target column

# Separate features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create Logistic Regression model with L2 regularization (default)
model = LogisticRegression(solver='liblinear', penalty='l2', max_iter=1000)  # L2 = Ridge regularization
model.fit(X_train, y_train)

# Print model coefficients
print("Model Coeff:", model.coef_)
print("Intercept:", model.intercept_)

# Predict on the test set
y_pred = model.predict(X_test)

# Print accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))


Model Coeff: [[ 1.82137734  0.07112385 -0.12306166  0.00587456 -0.07595978 -0.32257247
  -0.49620872 -0.22286157 -0.11884256 -0.01391574  0.07814557  0.48484736
   0.34149697 -0.07132055 -0.01143128 -0.04997087 -0.10056425 -0.03328301
  -0.03276574 -0.00328304  1.82550882 -0.23308136 -0.15962553 -0.02971517
  -0.13358016 -0.87022834 -1.22100716 -0.41859203 -0.31544016 -0.06521608]]
Intercept: [0.3402689]
Accuracy: 0.956140350877193


Question 7: Write a Python program to train a Logistic Regression model for multiclass
classification using multi_class='ovr' and print the classification report.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [15]:
# Answer
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.metrics import classification_report

# Load a multiclass dataset from sklearn (Iris dataset)
data = datasets.load_iris()

# Create a DataFrame from the dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # Add target column

# Separate features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

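# Create Logistic Regression model for multiclass classification (One-vs-Rest)
# Note: the multi_class parameter is deprecated in recent scikit-learn releases (1.5 and later);
# wrapping the model in sklearn.multiclass.OneVsRestClassifier is the suggested replacement there
model = LogisticRegression(solver='liblinear', multi_class='ovr', max_iter=1000)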
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Print classification report
print("Classification Report:\n", classification_report(y_test, y_pred))


Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30





Question 8: Write a Python program to apply GridSearchCV to tune C and penalty
hyperparameters for Logistic Regression and print the best parameters and validation
accuracy.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [16]:
# Answer
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.metrics import accuracy_score

# Load dataset from sklearn (Breast Cancer dataset)
data = datasets.load_breast_cancer()

# Create DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # Add target column

# Separate features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create Logistic Regression model
log_reg = LogisticRegression(solver='liblinear', max_iter=1000)

# Define hyperparameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10],       # Regularization strength
    'penalty': ['l1', 'l2'],       # Regularization type
    'solver': ['liblinear']        # Supports both l1 and l2
}

# Apply GridSearchCV
grid_search = GridSearchCV(log_reg, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Print best parameters
print("Best Parameters:", grid_search.best_params_)

# Predict on test set using best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

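# Print accuracy of the best model on the held-out test set
# (the mean cross-validated accuracy from the grid search is available as grid_search.best_score_)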
print("Validation Accuracy:", accuracy_score(y_test, y_pred))


Best Parameters: {'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}
Validation Accuracy: 0.9649122807017544


Question 9: Write a Python program to standardize the features before training Logistic
Regression and compare the model's accuracy with and without scaling.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [17]:
# Answer
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
from sklearn.metrics import accuracy_score

# Load dataset from sklearn (Breast Cancer dataset)
data = datasets.load_breast_cancer()

# Create DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target  # Add target column

# Separate features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Logistic Regression without scaling
model_no_scaling = LogisticRegression(solver='liblinear', max_iter=1000)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression with scaling
model_scaling = LogisticRegression(solver='liblinear', max_iter=1000)
model_scaling.fit(X_train_scaled, y_train)
y_pred_scaling = model_scaling.predict(X_test_scaled)
accuracy_scaling = accuracy_score(y_test, y_pred_scaling)

# Print results
print("Accuracy without Scaling:", accuracy_no_scaling)
print("Accuracy with Scaling:", accuracy_scaling)


Accuracy without Scaling: 0.956140350877193
Accuracy with Scaling: 0.9824561403508771
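As a possible follow-up, scaling and the model can be combined in one `Pipeline`, so the scaler is always fitted on the training portion only, including inside cross-validation. This is a sketch of an alternative arrangement, not part of the original answer; the 5-fold setting is an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is re-fitted on the training folds of each split,
# so no information from the validation fold leaks into the scaling
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver='liblinear', max_iter=1000),
)

scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Averaging over several folds also gives a steadier comparison than a single train/test split.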


Question 10: Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced
dataset (only 5% of customers respond), describe the approach you’d take to build a
Logistic Regression model — including data handling, feature scaling, balancing
classes, hyperparameter tuning, and evaluating the model for this real-world business
use case.

Answer: If we get this kind of imbalanced dataset, we would first clean the data and remove or fix any missing values. Since only 5% of customers respond, we would balance the training data by either adding more samples of the smaller class (oversampling) or reducing the larger class (undersampling). We would also scale the features so the model can learn better. Then we would train a Logistic Regression model and try different values of C and penalty to see which works best. To evaluate the model, we would not rely on accuracy alone, because a model that predicts "no response" for every customer would already be 95% accurate. We would also look at Precision, Recall, F1-Score, and ROC-AUC so we know how well it is finding the customers who respond.
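The steps above could be sketched roughly as follows. Several things here are assumptions made for illustration: the data is synthetic (generated with `make_classification` at about 5% positives, standing in for real campaign data), `class_weight='balanced'` is used in place of the oversampling or undersampling described above, and `average_precision` is one possible tuning metric among several.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: roughly 5% responders (class 1)
X, y = make_classification(n_samples=4000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)

# Stratify so train and test keep the same class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling + model in one pipeline; class_weight='balanced' reweights the rare class
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear", class_weight="balanced",
                               max_iter=1000)),
])

# Tune C and penalty with stratified cross-validation on the training set only
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="average_precision")
search.fit(X_train, y_train)

# Evaluate on the untouched test set with metrics suited to imbalance
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, y_score)

print("Best Parameters:", search.best_params_)
print(classification_report(y_test, y_pred))
print("ROC-AUC:", test_auc)
```

For the business decision, the default 0.5 threshold need not be kept; it could be moved depending on whether missing a responder or contacting a non-responder is more costly.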