### Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

Answer:
Logistic Regression is a statistical method used for classification problems. It predicts the probability that an instance belongs to a particular class. Linear Regression outputs unbounded continuous values and is fit by minimizing squared error. Logistic Regression instead passes a linear combination of the features through the sigmoid (logistic) function, which gives a probability between 0 and 1. It is fit by minimizing log loss (cross-entropy) and applies a threshold (commonly 0.5) to decide class labels.

Linear Regression → Predicts numeric continuous outcomes.

Logistic Regression → Predicts categorical outcomes (binary or multi-class).
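The sketch below makes the contrast concrete. It uses a hypothetical 1-D toy dataset (the arrays are invented for illustration), fits both scikit-learn models, and predicts for inputs outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: the label is 1 once x is large enough
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lin_model = LinearRegression().fit(X, y)
logit_model = LogisticRegression().fit(X, y)

X_new = np.array([[-10], [4.5], [20]])
lin_out = lin_model.predict(X_new)                  # unbounded: can fall below 0 or rise above 1
log_proba = logit_model.predict_proba(X_new)[:, 1]  # P(class 1), always within [0, 1]
log_label = logit_model.predict(X_new)              # class labels after thresholding

print("Linear Regression output:", lin_out)
print("Logistic P(class 1):     ", log_proba)
print("Logistic class labels:   ", log_label)
```

For the extreme inputs, the linear model returns values below 0 and above 1, which cannot be read as probabilities. The logistic model's outputs stay in range.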


### Question 2: Explain the role of the Sigmoid function in Logistic Regression.

Answer:
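The sigmoid function, σ(z) = 1 / (1 + e^(-z)), transforms the linear combination of input features, z = w·x + b, into a probability between 0 and 1.

Ensures outputs are valid probabilities.

Maps the unbounded linear score (the log-odds) onto a probability, which turns a linear model into a classifier.

Combined with a threshold, provides a decision boundary (e.g., probability ≥ 0.5 → Class 1, otherwise Class 0). Because σ(z) = 0.5 exactly when z = 0, this boundary is the hyperplane w·x + b = 0.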
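This relationship can be checked numerically. The sketch assumes scikit-learn's breast cancer dataset (also used later in this notebook) and a standardized binary model. `sigmoid` is a helper written here, not a library function.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

# Linear score z = w·x + b (the log-odds) for each sample
z = model.decision_function(X)
p = sigmoid(z)

# The sigmoid of the linear score reproduces predict_proba for class 1
print(np.allclose(p, model.predict_proba(X)[:, 1]))
# Thresholding p at 0.5 gives the same labels as model.predict
print(np.array_equal(model.predict(X), (p >= 0.5).astype(int)))
```

In practice `scipy.special.expit` computes the same function and is the safer choice for very large |z|.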

### Question 3: What is Regularization in Logistic Regression and why is it needed?

Answer:
Regularization is a technique used to prevent overfitting by adding a penalty term to the loss function.

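L1 (Lasso) → Adds the sum of absolute coefficient values to the loss. It drives some coefficients to exactly zero, encouraging sparsity (feature selection).

L2 (Ridge) → Adds the sum of squared coefficient values. It shrinks all coefficients toward zero, reducing variance.

It’s needed because:

Logistic Regression models may overfit when there are many features relative to the number of training samples.

On linearly separable data, unregularized coefficients can grow without bound, because the loss keeps decreasing as they get larger.

Regularization improves generalization and model stability.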
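Here is a minimal sketch of the L1/L2 difference. It assumes the breast cancer dataset, and `C=0.1` is chosen only for illustration. In scikit-learn, `C` is the inverse of regularization strength, so smaller values mean a stronger penalty.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # one common scale so the penalty treats features equally

# Same penalty strength, different penalty type (liblinear supports both)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X, y)

n_l1 = np.count_nonzero(l1_model.coef_)
n_l2 = np.count_nonzero(l2_model.coef_)
print(f"Non-zero coefficients with L1: {n_l1} of {X.shape[1]}")
print(f"Non-zero coefficients with L2: {n_l2} of {X.shape[1]}")
```

L1 zeroes out many of the 30 coefficients, while L2 keeps all of them and only shrinks their size.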

### Question 4: What are some common evaluation metrics for classification models, and why are they important?

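Answer:
These metrics matter because each one captures a different kind of error, and the right choice depends on which errors are costly. On imbalanced data, accuracy alone can look high even when most of the minority class is missed.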

Accuracy: Proportion of correct predictions. (Good when classes are balanced.)

Precision: Of all predicted positives, how many are correct. (Important when false positives are costly, e.g., fraud alerts and spam filtering.)

Recall (Sensitivity): Of all actual positives, how many are detected. (Important when false negatives are costly, e.g., medical diagnosis.)

F1-Score: Harmonic mean of precision and recall (balances both).

ROC-AUC: Measures how well the model ranks positives above negatives across all thresholds (1.0 = perfect ranking, 0.5 = random guessing).
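The worked example below uses hypothetical labels and scores (invented for illustration). It shows how each metric is computed and how the metrics differ on the same predictions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical labels and predicted probabilities: 4 positives, 6 negatives
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.05]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, FN, TN:", tp, fp, fn, tn)              # 2, 1, 2, 5
print("Accuracy :", accuracy_score(y_true, y_pred))   # (2 + 5) / 10 = 0.7
print("Precision:", precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.667
print("Recall   :", recall_score(y_true, y_pred))     # 2 / (2 + 2) = 0.5
print("F1       :", f1_score(y_true, y_pred))         # 2PR / (P + R) = 4/7 ≈ 0.571
print("ROC-AUC  :", roc_auc_score(y_true, y_score))   # 22 of 24 pos/neg pairs ranked correctly ≈ 0.917
```

ROC-AUC uses the raw scores rather than the thresholded labels. That is why it can stay high even when recall at the 0.5 threshold is only 0.5.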


### Write a Python program that loads a CSV file into a Pandas DataFrame, splits it into train/test sets, trains a Logistic Regression model, and prints its accuracy.
(Use a dataset from the sklearn package.)
(Include your Python code and output in the code box below.)

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create the CSV from sklearn's breast cancer dataset (features + "target" column)
load_breast_cancer(as_frame=True).frame.to_csv("breast_cancer.csv", index=False)

# Load the breast cancer CSV file into a DataFrame
df = pd.read_csv("breast_cancer.csv")

# Separate features and target
X = df.drop("target", axis=1)
y = df["target"]

# Split dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression model (high max_iter because the features are unscaled)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)


0.956140350877193

### Question 7: Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report.
(Use a dataset from the sklearn package.)

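from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train logistic regression with One-vs-Rest (one binary classifier per class).
# Note: multi_class was deprecated in scikit-learn 1.5 and may be rejected by later releases.
model = LogisticRegression(multi_class='ovr', max_iter=1000)
model.fit(X, y)

# Predict and report (on the training data itself, so these scores are optimistic)
y_pred = model.predict(X)
print(classification_report(y, y_pred))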


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.96      0.90      0.93        50
           2       0.91      0.96      0.93        50

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150
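The report above is computed on the same data the model was trained on, so it is optimistic. The sketch below runs a held-out evaluation instead. It uses scikit-learn's `OneVsRestClassifier` as an alternative way to get One-vs-Rest behavior, since `multi_class` is deprecated in recent releases. The 80/20 stratified split and `random_state=42` are assumptions chosen for illustration. New variable names are used so the breast cancer `X_train`/`y_train` from the earlier cell are not overwritten.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# One binary logistic regression per class (class k vs. all other classes)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_tr, y_tr)

print("Binary models fitted:", len(ovr.estimators_))
print(classification_report(y_te, ovr.predict(X_te), target_names=iris.target_names))
```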





### Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.


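from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Reuses X_train, y_train (breast cancer split) from the earlier cell.
# Define model and parameter grid (liblinear supports both l1 and l2 penalties)
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}
# 5-fold cross-validation; best_score_ is the mean validation accuracy of the best setting
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Validation accuracy:", grid.best_score_)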


Best parameters: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
Validation accuracy: 0.9626373626373628
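The grid search above runs on unscaled features and reports only cross-validated accuracy. One possible refinement is sketched below. It puts `StandardScaler` inside a `Pipeline`, so each CV fold is scaled using only its own training portion. It then checks the tuned model on the held-out test set. The split settings mirror the earlier cell, and the grid values are the same as above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_bc, y_bc, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear")),
])
# The "clf__" prefix routes each parameter to the pipeline step named "clf"
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", search.best_score_)
print("Held-out test accuracy:", search.score(X_te, y_te))
```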


### Question 9: Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.


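from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Reuses X_train, X_test, y_train, y_test (breast cancer split) from the earlier cell.

# Without scaling (lbfgs may stop at max_iter on the raw features)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Accuracy without scaling:", model.score(X_test, y_test))

# With scaling: fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", model_scaled.score(X_test_scaled, y_test))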


Accuracy without scaling: 0.956140350877193
Accuracy with scaling: 0.9736842105263158


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
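The convergence warning above comes from the unscaled fit. The sketch below assumes the same breast cancer split. It compares how many lbfgs iterations each model needs, reading scikit-learn's `n_iter_` attribute. It also wraps scaling in a pipeline so the scaler is fit on training data only.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_bc, y_bc, test_size=0.2, random_state=42)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)  # the unscaled fit may stop at max_iter
    raw = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# The scaler inside the pipeline is fit on X_tr only
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

print("lbfgs iterations without scaling:", raw.n_iter_[0])
print("lbfgs iterations with scaling:   ", scaled[-1].n_iter_[0])
print("Test accuracy with scaling:      ", scaled.score(X_te, y_te))
```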


### Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.

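Answer:
For predicting marketing campaign response (5% positive rate):

Data Handling: Impute missing values, encode categorical features (e.g., one-hot encoding), and use a stratified train/test split so both sets keep the 5% response rate.

Feature Scaling: Standardize numeric features, fitting the scaler on the training set only. This improves solver convergence and lets the regularization penalty treat all features on the same scale.

Balancing Classes: Use SMOTE (oversampling), undersampling, or class weights (class_weight='balanced') in Logistic Regression. Apply any resampling to the training data only, never to the test set.

Hyperparameter Tuning: Tune C, penalty, and solver with GridSearchCV, using stratified cross-validation and an imbalance-aware scoring metric such as F1.

Evaluation Metrics: Use Precision, Recall, F1, and ROC-AUC instead of accuracy. With only 5% responders, a model that predicts "no response" for every customer already reaches 95% accuracy while finding no responders at all.

Final Approach:

Train Logistic Regression with scaled features.

Apply class weighting to handle imbalance.

Evaluate using Recall/F1 so the campaign reaches as many actual responders as possible, while checking Precision to limit outreach to customers who will not respond.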
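Below is a minimal end-to-end sketch of this approach. Real customer data is not available here, so the data is a hypothetical stand-in. It is generated with `make_classification` (about 5% positives), plus an invented categorical column `channel` and injected missing values. Class weighting is used rather than SMOTE, since SMOTE requires the separate imbalanced-learn package.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for customer data: ~5% responders, 6 numeric features
rng = np.random.default_rng(0)
X_num, y = make_classification(n_samples=4000, n_features=6, n_informative=4,
                               weights=[0.95, 0.05], random_state=0)
X_num[rng.random(X_num.shape) < 0.02] = np.nan  # inject ~2% missing values
num_cols = [f"num_{i}" for i in range(6)]
customers = pd.DataFrame(X_num, columns=num_cols)
customers["channel"] = rng.choice(["email", "sms", "app"], size=len(customers))  # hypothetical column
cat_cols = ["channel"]

# Stratified split keeps the ~5% response rate in both sets
X_tr, X_te, y_tr, y_te = train_test_split(customers, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Impute + scale numeric columns, one-hot encode the categorical column
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

# class_weight="balanced" up-weights the rare responders in the loss
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune C with stratified 5-fold CV, scoring on F1 rather than accuracy
search = GridSearchCV(
    model,
    {"clf__C": [0.01, 0.1, 1, 10]},
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_tr, y_tr)

proba = search.predict_proba(X_te)[:, 1]
print("Best C:", search.best_params_["clf__C"])
print("Test ROC-AUC:", roc_auc_score(y_te, proba))
print(classification_report(y_te, search.predict(X_te), digits=3))
```

Because preprocessing lives inside the pipeline, imputation and scaling are refit on each CV training fold, which avoids leaking test-fold statistics. Adding `clf__penalty` (with a compatible `solver`) to the grid would also tune the penalty, as the answer suggests.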