# Logistic Regression | Assignment

# Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

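1. **Logistic Regression**

A classification model. It passes a linear combination of the features through the sigmoid function to estimate the probability of a class, then predicts a category (Yes/No, 0/1, True/False). It is fitted by minimizing log loss.

Example: Predicting whether a student will pass or fail based on study hours.


2. **Linear Regression**

A regression model. It predicts a continuous numeric value directly as a linear combination of the features, and is usually fitted by minimizing squared error.

Example: Predicting a person’s salary based on their years of experience.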
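The contrast between the two models can be sketched on the two examples above. This is a minimal illustration, not part of the assignment's required answers: the study-hours and salary numbers are made up, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data (invented for illustration): years of experience -> salary
years = np.array([[1], [2], [3], [4], [5]])
salary = np.array([40000, 45000, 50000, 55000, 60000])

# Linear Regression outputs a number on a continuous scale
lin = LinearRegression().fit(years, salary)
predicted_salary = lin.predict([[6]])[0]
print("Predicted salary for 6 years:", predicted_salary)

# Hypothetical data (invented for illustration): study hours -> pass (1) / fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Logistic Regression outputs a class, backed by a probability between 0 and 1
log = LogisticRegression().fit(hours, passed)
pass_class = log.predict([[7]])[0]
fail_class = log.predict([[2]])[0]
pass_probability = log.predict_proba([[7]])[0, 1]
print("Class for 7 study hours:", pass_class)
print("Class for 2 study hours:", fail_class)
print("Probability of passing with 7 study hours:", pass_probability)
```

The linear model returns a salary estimate directly, while the logistic model returns a class label together with the probability that supports it.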



# Question 2: Explain the role of the Sigmoid function in Logistic Regression.

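The sigmoid function, σ(z) = 1 / (1 + e^(-z)), converts the linear prediction z = w·x + b into a probability.

It ensures output values lie between 0 and 1, with σ(0) = 0.5.

It allows classification by comparing the probability with a threshold (commonly 0.5).

It is smooth and differentiable, which makes gradient-based model training possible.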
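The points above can be sketched in a few lines of plain Python. The helper names `sigmoid`, `sigmoid_grad`, and `predict_class` are hypothetical, chosen for this illustration, and the 0.5 threshold is an assumed default rather than a fixed rule.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score z to a value between 0 and 1."""
    # Two branches avoid overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid, which gradient-based training relies on."""
    s = sigmoid(z)
    return s * (1.0 - s)

def predict_class(z: float, threshold: float = 0.5) -> int:
    """Turn the probability into a class label using a threshold (0.5 is an assumption)."""
    return int(sigmoid(z) >= threshold)

for z in (-4.0, 0.0, 4.0):
    print(f"z = {z:+.1f} -> probability = {sigmoid(z):.3f}, class = {predict_class(z)}")
```

Large negative scores map close to 0, large positive scores map close to 1, and a score of 0 sits exactly on the 0.5 boundary.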


# Question 3: What is Regularization in Logistic Regression and why is it needed?


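Regularization adds a penalty on the size of the coefficients to the loss function that the model minimizes.

It prevents overfitting, which is most likely with many features, correlated features, or few training samples.

It penalizes large coefficients, so no single feature dominates the prediction.

It improves model stability and generalization to unseen data.

It is commonly applied via L1 (Lasso), L2 (Ridge), or Elastic Net (a mix of both). L1 can shrink some coefficients exactly to zero, while L2 shrinks them toward zero without removing them.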
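The effect of the penalty can be observed directly on fitted coefficients. The sketch below is an illustration under stated assumptions: it uses the breast cancer dataset from sklearn, fits on the full data (no train/test split, since only the coefficients are inspected), and the values of `C` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # one common scale, so the penalty treats features alike

# In scikit-learn, C is the INVERSE of regularization strength: smaller C = stronger penalty
coef_norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)  # default penalty is L2
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"L2, C={C:>6}: coefficient norm = {coef_norms[C]:.2f}")

# L1 can drive some coefficients exactly to zero (liblinear supports the L1 penalty)
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
n_zero = int(np.sum(l1_model.coef_ == 0))
print(f"L1, C=0.1: {n_zero} of {l1_model.coef_.size} coefficients are exactly zero")
```

A stronger L2 penalty (smaller `C`) yields a smaller coefficient norm, and the L1 model discards some features entirely, which acts as a form of feature selection.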

# Question 4: What are some common evaluation metrics for classification models and why are they important?


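1. Accuracy → Fraction of all predictions that are correct. Good for balanced datasets, misleading for imbalanced ones.

2. Precision & Recall → Precision is the share of predicted positives that are truly positive; recall is the share of actual positives that are found. Important when the cost of false positives (FP) or false negatives (FN) is high.

3. F1 Score → Harmonic mean of precision and recall; balances the two in a single number.

4. Confusion Matrix → Gives a detailed breakdown of true positives, false positives, true negatives, and false negatives.

5. ROC-AUC → Measures how well the model ranks positives above negatives; compares models regardless of threshold.

6. Log Loss → Evaluates the predicted probabilities themselves, penalizing confident wrong predictions heavily.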
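The metrics listed above can all be computed with `sklearn.metrics`. The labels and probabilities below are made-up values for illustration only, and the 0.5 threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score,
    log_loss, precision_score, recall_score, roc_auc_score,
)

# Hypothetical ground truth and predicted probabilities (invented for illustration)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1])
y_pred = (y_prob >= 0.5).astype(int)  # class labels at an assumed 0.5 threshold

# Threshold-dependent metrics use the class labels
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted class

# Probability-based metrics use the scores directly
auc = roc_auc_score(y_true, y_prob)
loss = log_loss(y_true, y_prob)

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
print("F1 Score :", f1)
print("Confusion Matrix:\n", cm)
print("ROC-AUC  :", auc)
print("Log Loss :", loss)
```

Accuracy, precision, recall, F1, and the confusion matrix change when the threshold changes, whereas ROC-AUC and log loss depend only on the predicted probabilities.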


# Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy. (Use Dataset from sklearn package)


In [1]:
# Import libraries
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset from sklearn (used here in place of a CSV file)
cancer = load_breast_cancer()

# Convert to Pandas DataFrame
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = cancer.target

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Train/Test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Logistic Regression model
model = LogisticRegression(max_iter=10000)  # increase iterations to ensure convergence
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Logistic Regression Model Accuracy:", accuracy)


Logistic Regression Model Accuracy: 0.956140350877193


# Question 6: Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy. (Use Dataset from sklearn package)

In [2]:
# Import libraries
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset from sklearn
cancer = load_breast_cancer()

# Convert to Pandas DataFrame
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['target'] = cancer.target

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Logistic Regression model with L2 regularization
model = LogisticRegression(penalty='l2', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print("Logistic Regression Model (L2 Regularization)")
print("Accuracy:", accuracy)
print("\nModel Coefficients:")
print(model.coef_)
print("\nIntercept:")
print(model.intercept_)


Logistic Regression Model (L2 Regularization)
Accuracy: 0.956140350877193

Model Coefficients:
[[ 2.13248406e+00  1.52771940e-01 -1.45091255e-01 -8.28669349e-04
  -1.42636015e-01 -4.15568847e-01 -6.51940282e-01 -3.44456106e-01
  -2.07613380e-01 -2.97739324e-02 -5.00338038e-02  1.44298427e+00
  -3.03857384e-01 -7.25692126e-02 -1.61591524e-02 -1.90655332e-03
  -4.48855442e-02 -3.77188737e-02 -4.17516190e-02  5.61347410e-03
   1.23214996e+00 -4.04581097e-01 -3.62091502e-02 -2.70867580e-02
  -2.62630530e-01 -1.20898539e+00 -1.61796947e+00 -6.15250835e-01
  -7.42763610e-01 -1.16960181e-01]]

Intercept:
[0.40847797]


# Question 7: Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)


In [3]:
# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load dataset from sklearn
iris = load_iris()

# Convert to Pandas DataFrame
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Logistic Regression model with One-vs-Rest strategy
model = LogisticRegression(multi_class='ovr', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Classification report
print("Logistic Regression Model (One-vs-Rest)")
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred, target_names=iris.target_names))


Logistic Regression Model (One-vs-Rest)

Classification Report:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
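In recent scikit-learn releases (1.5 and later, as far as the release notes indicate), the `multi_class` argument is deprecated and may emit a warning or stop working. A possible equivalent, sketched here as an alternative and not as the method the question asks for, wraps the model in `OneVsRestClassifier`. The name `ovr_model` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Wrap a binary Logistic Regression so that one classifier is fitted per class
ovr_model = OneVsRestClassifier(
    LogisticRegression(solver="liblinear", max_iter=1000)
)
ovr_model.fit(X_train, y_train)

# Each class gets its own binary model; the highest-scoring class is predicted
y_pred = ovr_model.predict(X_test)
print("Number of binary classifiers:", len(ovr_model.estimators_))
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The wrapper makes the One-vs-Rest strategy explicit and does not depend on the `multi_class` argument.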





# Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy. (Use Dataset from sklearn package)

In [4]:
# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic Regression model
log_reg = LogisticRegression(solver='liblinear', max_iter=1000)

# Define parameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],   # Inverse of regularization strength (smaller C = stronger penalty)
    'penalty': ['l1', 'l2']         # L1 = Lasso, L2 = Ridge
}

# GridSearchCV
grid = GridSearchCV(log_reg, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Best parameters
print("Best Parameters:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)

# Evaluate on test set
y_pred = grid.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy with Best Parameters:", test_accuracy)


Best Parameters: {'C': 10, 'penalty': 'l1'}
Best Cross-Validation Accuracy: 0.9583333333333334
Test Accuracy with Best Parameters: 1.0


# Question 9: Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling. (Use Dataset from sklearn package)



In [5]:
# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -------- Model WITHOUT scaling --------
model_no_scaling = LogisticRegression(max_iter=1000)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# -------- Model WITH scaling --------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaling = LogisticRegression(max_iter=1000)
model_scaling.fit(X_train_scaled, y_train)
y_pred_scaling = model_scaling.predict(X_test_scaled)
accuracy_scaling = accuracy_score(y_test, y_pred_scaling)

# -------- Results --------
print("Logistic Regression Accuracy (Without Scaling):", accuracy_no_scaling)
print("Logistic Regression Accuracy (With Scaling):   ", accuracy_scaling)


Logistic Regression Accuracy (Without Scaling): 1.0
Logistic Regression Accuracy (With Scaling):    1.0


# Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.


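1. Clean and preprocess the dataset: handle missing values, remove duplicates, and encode categorical features.

2. Make a stratified train/test split so both sets keep the 5% response rate.

3. Standardize features, fitting the scaler on the training set only to avoid data leakage.

4. Balance classes with class weights (`class_weight='balanced'`) or by resampling (e.g. SMOTE), applied to the training set only.

5. Train Logistic Regression and tune `C` and `penalty` with GridSearchCV, using stratified folds and a scoring metric suited to imbalance instead of accuracy.

6. Evaluate on the untouched test set using ROC-AUC and PR-AUC, plus precision and recall. Accuracy is misleading here because predicting "no response" for every customer already scores 95%.

7. Adjust the decision threshold based on business trade-offs, such as the cost of contacting a non-responder versus missing a responder.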
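The steps above can be sketched end to end as follows. This is an illustration under several assumptions, not a production recipe: the data is a synthetic stand-in generated with `make_classification`, class weights are used in place of SMOTE, the parameter grid is arbitrary, and maximizing F1 is only one possible rule for choosing the threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve, roc_auc_score
from sklearn.model_selection import (
    GridSearchCV, StratifiedKFold, cross_val_predict, train_test_split,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: roughly 5% responders (assumption)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5, n_clusters_per_class=1,
    weights=[0.95, 0.05], random_state=42,
)

# Stratified split keeps the response rate similar in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling sits inside the pipeline, so each CV fold fits the scaler on its own training part
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear", max_iter=1000)),
])

# Tune C and penalty with stratified folds, scored by PR-AUC instead of accuracy
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid = GridSearchCV(pipe, param_grid, cv=cv, scoring="average_precision")
grid.fit(X_train, y_train)

# Threshold-independent evaluation on the held-out test set
test_proba = grid.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, test_proba)
pr_auc = average_precision_score(y_test, test_proba)
print("Best parameters:", grid.best_params_)
print("ROC-AUC:", round(roc_auc, 3), "PR-AUC:", round(pr_auc, 3))

# Choose the threshold on out-of-fold training predictions, not on the test set
oof_proba = cross_val_predict(
    grid.best_estimator_, X_train, y_train, cv=cv, method="predict_proba"
)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, oof_proba)
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)
best_threshold = float(thresholds[np.argmax(f1)])  # F1-maximizing rule is an assumption

# Apply the chosen threshold to the test set
y_pred = (test_proba >= best_threshold).astype(int)
print("Chosen threshold:", round(best_threshold, 3))
print("Customers flagged as likely responders:", int(y_pred.sum()))
```

In a real campaign the threshold rule would come from the business costs, for example the cost of one contact against the expected revenue from one responder, in place of the F1 criterion used here.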






