1)  What is Logistic Regression, and how does it differ from Linear Regression?
Ans. Logistic Regression is a statistical method used primarily for binary classification tasks, where the goal is to predict the probability of an outcome belonging to one of two categories—such as spam vs. not spam or disease vs. no disease. It works by applying a logistic (sigmoid) function to a linear combination of input features, which transforms the output into a probability value between 0 and 1. In contrast, Linear Regression is used for predicting continuous numerical values, like house prices or temperature, based on input variables. It models the relationship between the dependent and independent variables using a straight-line equation and can produce any real number as output. The key difference lies in their output and application: Linear Regression predicts quantities, while Logistic Regression predicts probabilities for classification. Additionally, Logistic Regression uses a log loss (cross-entropy) as its cost function, whereas Linear Regression uses mean squared error.
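
The difference in output range can be seen on a toy problem. The sketch below is only an illustration: the "hours studied vs. pass/fail" data is made up, and both models are fitted with scikit-learn defaults. Far from the training range, the linear model's prediction leaves the interval [0, 1], while the logistic model still returns a valid probability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (hypothetical): hours studied vs. pass (1) / fail (0).
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A new point well outside the training range.
X_new = np.array([[10.0]])

# Linear Regression treats the 0/1 labels as numbers and fits a straight line.
lin_pred = LinearRegression().fit(X, y).predict(X_new)

# Logistic Regression passes the linear score through the sigmoid.
log_prob = LogisticRegression().fit(X, y).predict_proba(X_new)[:, 1]

print("Linear Regression output:  ", lin_pred[0])   # greater than 1 here
print("Logistic Regression P(y=1):", log_prob[0])   # always within [0, 1]
```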

2) Explain the role of the Sigmoid function in Logistic Regression.
Ans. The sigmoid function maps the model's linear output z = w·x + b to sigma(z) = 1 / (1 + e^(-z)), a value strictly between 0 and 1. This value is interpreted as the probability that the sample belongs to the positive class, and comparing it with a threshold (commonly 0.5) turns it into a class label, which is what enables binary classification.
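
A minimal NumPy sketch of the sigmoid and the usual thresholding step. The input scores and the 0.5 threshold are chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linear scores z = w.x + b for three samples.
z = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(z)

# Convert probabilities to class labels with a 0.5 threshold.
labels = (probs >= 0.5).astype(int)

print(probs)   # the middle value is exactly 0.5
print(labels)  # [0 1 1]
```

Because sigmoid(-z) = 1 - sigmoid(z), a score of 0 sits exactly on the decision boundary, and large positive or negative scores saturate toward 1 or 0.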

3) What is Regularization in Logistic Regression and why is it needed?
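Ans. Regularization in logistic regression is a technique used to prevent overfitting by adding a penalty on the size of the model’s coefficients to the loss function. The L2 (Ridge) penalty shrinks coefficients toward zero, while the L1 (Lasso) penalty can set some of them exactly to zero. In scikit-learn the strength is controlled by C, the inverse of the regularization strength, so a smaller C means a stronger penalty. Regularization helps the model generalize to unseen data rather than memorize the training set.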
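
The shrinking effect can be checked directly. The sketch below assumes the breast cancer dataset from scikit-learn and three arbitrary values of C. It relies on scikit-learn's default penalty, which is L2, and reports the Euclidean norm of the fitted coefficients.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put features on a comparable scale

coef_norms = {}
for C in [0.01, 1.0, 100.0]:
    # Default penalty is L2; smaller C means stronger regularization.
    model = LogisticRegression(C=C, max_iter=5000)
    model.fit(X, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} ||coef||_2 = {coef_norms[C]:.3f}")
```

The coefficient norm should grow as C increases, because the penalty on large weights becomes weaker.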

4) What are some common evaluation metrics for classification models, and why are they important?
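Ans. Common evaluation metrics for classification models include accuracy, precision, recall, F1-score, ROC-AUC, and the confusion matrix. Accuracy alone can be misleading on imbalanced data, since a model that always predicts the majority class can still score highly. Precision and recall separate the two kinds of error (false positives and false negatives), F1 combines them into one number, and ROC-AUC measures how well the predicted scores rank positives above negatives across all thresholds. These metrics are essential for understanding how well a model performs, especially in real-world scenarios with imbalanced data or varying costs of misclassification.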
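
A small worked example of these metrics. The ten labels and scores below are invented so the numbers can be checked by hand: there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth and predicted probabilities for ten samples.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3, 0.2, 0.1])
y_pred = (y_score >= 0.5).astype(int)  # threshold at 0.5

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses scores, not thresholded labels

print(cm)                             # [[5 1], [1 3]]
print(f"Accuracy:  {accuracy:.2f}")   # 0.80
print(f"Precision: {precision:.2f}")  # 0.75
print(f"Recall:    {recall:.2f}")     # 0.75
print(f"F1-score:  {f1:.2f}")         # 0.75
print(f"ROC-AUC:   {auc:.3f}")        # 0.958 (23 of 24 positive/negative pairs ranked correctly)
```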


In [1]:
#5)  Write a Python program that loads a CSV file into a Pandas DataFrame,
#splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
#(Use Dataset from sklearn package)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Load dataset from sklearn
iris = load_iris()
X = iris.data
y = iris.target

# Step 2: Convert to Pandas DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = y

# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('target', axis=1),
    df['target'],
    test_size=0.2,
    random_state=42
)

# Step 4: Train Logistic Regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Step 5: Predict and calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Logistic Regression Accuracy: {accuracy:.2f}")

Logistic Regression Accuracy: 1.00


In [2]:
#6)  Write a Python program to train a Logistic Regression model using L2
#regularization (Ridge) and print the model coefficients and accuracy. (Use Dataset from sklearn package)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Load dataset from sklearn
iris = load_iris()
X = iris.data
y = iris.target

# Step 2: Convert to Pandas DataFrame
df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = y

# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('target', axis=1),
    df['target'],
    test_size=0.2,
    random_state=42
)

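# Step 4: Train Logistic Regression model with L2 (Ridge) regularization
# penalty='l2' is also the scikit-learn default; C is the inverse regularization strength
model = LogisticRegression(penalty='l2', C=1.0, max_iter=200)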
model.fit(X_train, y_train)

# Step 5: Predict and calculate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Logistic Regression Accuracy: {accuracy:.2f}")

Logistic Regression Accuracy: 1.00
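
The task also asks for the model coefficients. A minimal sketch of how they could be printed is given here, assuming the same Iris split as the cell above. It relies on L2 being scikit-learn's default penalty, and the name coef_table is introduced only for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# L2 is the default penalty; C is the inverse regularization strength.
model = LogisticRegression(C=1.0, max_iter=200)
model.fit(X_train, y_train)

# One row of coefficients per class, one column per feature.
coef_table = pd.DataFrame(
    model.coef_, index=iris.target_names, columns=iris.feature_names
)
print(coef_table.round(3))
print("Intercepts:", model.intercept_.round(3))
print(f"Accuracy: {model.score(X_test, y_test):.2f}")
```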


In [3]:
#7)  Write a Python program to train a Logistic Regression model for multiclass
#classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Step 1: Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Step 2: Convert to DataFrame (optional, for clarity)
df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = y

# Step 3: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('target', axis=1),
    df['target'],
    test_size=0.2,
    random_state=42
)

# Step 4: Train Logistic Regression with One-vs-Rest strategy
model = LogisticRegression(multi_class='ovr', solver='liblinear', max_iter=200)
model.fit(X_train, y_train)

# Step 5: Predict and print classification report
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Classification Report:\n")
print(report)

Classification Report:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
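
To the best of my knowledge, recent scikit-learn releases deprecate the multi_class argument and suggest wrapping the estimator instead. The sketch below shows that alternative with OneVsRestClassifier on the same Iris split. It uses the default lbfgs solver rather than liblinear, so the fitted models are not necessarily identical to the one above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# One binary Logistic Regression per class (class k vs. all other classes).
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The fitted wrapper keeps the three binary models in ovr_model.estimators_, which makes the One-vs-Rest structure explicit.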





In [4]:
#8)  Write a Python program to apply GridSearchCV to tune C and penalty
#hyperparameters for Logistic Regression and print the best parameters and validation
#accuracy.


import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Define parameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']  # 'liblinear' supports both l1 and l2
}

# Step 4: Apply GridSearchCV
grid = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid.fit(X_train, y_train)

# Step 5: Print best parameters and validation accuracy
print("Best Parameters:", grid.best_params_)
print(f"Best Cross-Validation Accuracy: {grid.best_score_:.2f}")

# Step 6: Evaluate on test set
y_pred = grid.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test Set Accuracy: {test_accuracy:.2f}")



Best Parameters: {'C': 100, 'penalty': 'l1', 'solver': 'liblinear'}
Best Cross-Validation Accuracy: 0.97
Test Set Accuracy: 0.98


In [5]:
#9)  Write a Python program to standardize the features before training Logistic
#Regression and compare the model's accuracy with and without scaling.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Step 1: Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: Train Logistic Regression WITHOUT scaling
model_no_scaling = LogisticRegression(max_iter=200)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Step 4: Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: Train Logistic Regression WITH scaling
model_scaled = LogisticRegression(max_iter=200)
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Step 6: Compare results
print(f"Accuracy without scaling: {accuracy_no_scaling:.2f}")
print(f"Accuracy with scaling:    {accuracy_scaled:.2f}")

Accuracy without scaling: 0.96
Accuracy with scaling:    0.97


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
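
The warning above comes from fitting on unscaled features. One common way to handle this, sketched here as an assumption rather than as part of the original answer, is to put the scaler and the model in a single Pipeline so that scaling is fitted only on the training folds during cross-validation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is re-fitted inside each training fold, so no test-fold
# statistics leak into the model.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

scores = cross_val_score(scaled_model, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f}")
```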


10) Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.
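Ans. To build a robust Logistic Regression model for predicting customer response in an imbalanced e-commerce dataset (5% responders), I would take the following step-by-step approach:
 1. Data Handling & Preprocessing: clean missing values, encode categorical features, and make a stratified train/test split so both sets keep the 5% response rate.
 2. Addressing Class Imbalance: use class weights (for example class_weight='balanced') or resample the training set only, never the test set.
 3. Feature Scaling: standardize numeric features, fitting the scaler on the training data only.
 4. Hyperparameter Tuning: tune C and the penalty type with stratified cross-validation, scoring on a metric suited to imbalance rather than accuracy.
 5. Model Evaluation: report precision, recall, F1, and PR-AUC or ROC-AUC, and choose the decision threshold based on the cost of contacting a non-responder versus missing a responder.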
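
The five steps can be sketched end to end. Everything below is an assumption made for illustration: the data is synthetic (make_classification with roughly 5% positives stands in for real campaign data), class weighting is used instead of resampling, and average precision (PR-AUC) is the tuning metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data handling: synthetic stand-in with about 5% responders.
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2 + 3. Class weighting and scaling live in one pipeline.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# 4. Tune C with stratified folds, scored by average precision.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    pipeline, {"clf__C": [0.01, 0.1, 1, 10]},
    scoring="average_precision", cv=cv,
)
search.fit(X_train, y_train)

# 5. Evaluate on the untouched test set.
y_score = search.predict_proba(X_test)[:, 1]
y_pred = search.predict(X_test)
pr_auc = average_precision_score(y_test, y_score)

print("Best parameters:", search.best_params_)
print(f"Test PR-AUC: {pr_auc:.3f}")
print(classification_report(y_test, y_pred, target_names=["no response", "response"]))
```

In a real campaign the 0.5 threshold used by predict would usually be replaced by one chosen from the precision-recall trade-off and the cost of each contact.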
