# Theory

Question 1: What is Logistic Regression, and how does it differ from Linear Regression?
  - Logistic Regression is a classification algorithm that predicts the probability of a binary outcome.
  - It models the relationship between a categorical dependent variable and one or more independent variables by passing a linear combination of the features through the sigmoid function.
  - The output of a logistic regression model is a probability between 0 and 1, which is turned into a class label by applying a threshold (commonly 0.5).
  - Logistic Regression is used for classification problems, whereas Linear Regression is used for regression problems.
  - Linear Regression outputs an unbounded continuous value and is usually fit by minimizing squared error, whereas Logistic Regression is fit by minimizing log loss (maximum likelihood).
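
The difference in output range can be seen on a small example. The sketch below is illustrative only: the toy data (hours studied vs. pass/fail) is made up, and both models use scikit-learn defaults. It fits both models to the same binary labels and compares their predictions at points far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data, made up for illustration: hours studied -> fail (0) / pass (1)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

# Two extreme inputs and one in the middle of the training range
X_new = np.array([[-5.0], [4.5], [15.0]])

lin_pred = lin.predict(X_new)               # unbounded continuous values
log_prob = log.predict_proba(X_new)[:, 1]   # probabilities of class 1

print("Linear Regression predictions:", lin_pred.round(3))
print("Logistic Regression P(y=1):  ", log_prob.round(3))
```

The linear model's predictions fall below 0 and above 1 at the extreme inputs, so they cannot be read as probabilities, while the logistic model's outputs always stay inside (0, 1).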

Question 2: Explain the role of the Sigmoid function in Logistic Regression.
  - The Sigmoid function, σ(z) = 1 / (1 + e^(-z)), maps any real number to a value between 0 and 1.
  - Its output can be interpreted as the probability that a sample belongs to the positive class.
  - In Logistic Regression it transforms the linear output of the model (z = w·x + b) into that probability.
  - Comparing the probability with a threshold (commonly 0.5) then gives the predicted class, which makes classification easy.
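
A minimal sketch of the Sigmoid function in NumPy follows. The scores in `z` are hypothetical linear outputs chosen for illustration, and the 0.5 threshold is the common default rather than a requirement.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores z = w·x + b for five samples
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

probs = sigmoid(z)                     # probability of the positive class
labels = (probs >= 0.5).astype(int)    # threshold at 0.5 to get a class

print("Probabilities:", probs.round(3))
print("Predicted classes:", labels)
```

Large negative scores map close to 0, large positive scores map close to 1, and a score of exactly 0 maps to 0.5, which is why the sign of the linear output decides the class at the default threshold.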

Question 3: What is Regularization in Logistic Regression and why is it needed?
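  - Regularization is a technique used to prevent overfitting by adding a penalty term to the model’s loss function.
  - The penalty (L1, L2, or a combination of both as in Elastic Net) discourages large coefficients, which stops the model from fitting too closely to the training data.
  - It is needed so that the model generalizes to unseen data, especially when there are many or highly correlated features. L1 can also shrink some coefficients to exactly zero, which acts as feature selection.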
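
The shrinking effect of the penalty can be checked directly. In scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty. The sketch below assumes the built-in breast cancer dataset with standardized features and the default L2 penalty. It compares the size of the coefficient vector for three values of `C`, all chosen arbitrarily for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put all features on one scale

norms = {}
for C in [0.01, 1.0, 100.0]:
    # Smaller C = stronger L2 penalty on the coefficients
    model = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:>6}: ||w|| = {norms[C]:.3f}")
```

The norm of the coefficients grows as `C` increases, that is, as the penalty is relaxed.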

Question 4: What are some common evaluation metrics for classification models, and why are they important?
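  - Evaluation metrics:
    - Confusion Matrix - A table of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) that compares the model’s predictions with the actual outcomes.
    - Accuracy - The percentage of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN).
    - Precision - Of all samples predicted positive, the fraction that are actually positive: TP / (TP + FP).
    - Recall - Of all actual positives, the fraction that the model predicts correctly: TP / (TP + FN).
    - F1-Score - The harmonic mean of Precision and Recall.
  - They are important because each metric captures a different kind of error. Accuracy alone can be misleading on imbalanced data, so Precision, Recall and F1-Score are needed to judge how well the model handles the positive class.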
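
As a worked example, the metrics above can be computed with `sklearn.metrics`. The labels below are made up for illustration (4 actual positives, 6 actual negatives) and do not come from any dataset in this notebook.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels for illustration: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")

acc = accuracy_score(y_true, y_pred)     # (TP + TN) / total
prec = precision_score(y_true, y_pred)   # TP / (TP + FP)
rec = recall_score(y_true, y_pred)       # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)            # harmonic mean of prec and rec

print(f"Accuracy={acc:.2f}, Precision={prec:.2f}, Recall={rec:.2f}, F1={f1:.2f}")
```

Here TP=3, FP=1, FN=1 and TN=5, so accuracy is 8/10 = 0.80 while precision, recall and F1 are all 3/4 = 0.75.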

In [1]:
#Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

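# The built-in breast cancer dataset stands in for a CSV file here.
# With a real file, df would be built with pd.read_csv("your_file.csv") instead (hypothetical path).
data = load_breast_cancer()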

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Logistic Regression Model Accuracy:", round(accuracy * 100, 2), "%")


Logistic Regression Model Accuracy: 94.74 %


In [3]:
# Question 6:  Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.

data = load_breast_cancer()

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=10000)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy: {:.2f}%".format(accuracy * 100))
print("\nModel Coefficients:")
for feature, coef in zip(X.columns, model.coef_[0]):
    print(f"{feature}: {coef:.4f}")


Accuracy: 95.61%

Model Coefficients:
mean radius: 1.0274
mean texture: 0.2215
mean perimeter: -0.3621
mean area: 0.0255
mean smoothness: -0.1562
mean compactness: -0.2377
mean concavity: -0.5326
mean concave points: -0.2837
mean symmetry: -0.2267
mean fractal dimension: -0.0365
radius error: -0.0971
texture error: 1.3706
perimeter error: -0.1814
area error: -0.0872
smoothness error: -0.0225
compactness error: 0.0474
concavity error: -0.0429
concave points error: -0.0324
symmetry error: -0.0347
fractal dimension error: 0.0116
worst radius: 0.1117
worst texture: -0.5089
worst perimeter: -0.0156
worst area: -0.0169
worst smoothness: -0.3077
worst compactness: -0.7727
worst concavity: -1.4286
worst concave points: -0.5109
worst symmetry: -0.7469
worst fractal dimension: -0.1009


In [4]:
# Question 7: Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report.

from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

data = load_iris()

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

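# Note: multi_class is deprecated since scikit-learn 1.5; for one-vs-rest,
# OneVsRestClassifier(LogisticRegression(...)) is the suggested replacement.
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=10000)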

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred, target_names=data.target_names))


              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.89      0.94         9
   virginica       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30





In [5]:
# Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.

from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

data = load_breast_cancer()

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

log_reg = LogisticRegression(max_iter=10000, solver='liblinear')

param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2']
}

grid_search = GridSearchCV(estimator=log_reg, param_grid=param_grid, cv=5, scoring='accuracy')

grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best Cross-Validation Accuracy: {:.2f}%".format(grid_search.best_score_ * 100))

best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test Set Accuracy: {:.2f}%".format(test_accuracy * 100))


Best Parameters: {'C': 100, 'penalty': 'l1'}
Best Cross-Validation Accuracy: 96.70%
Test Set Accuracy: 98.25%


In [6]:
# Question 9: Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.

from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model_no_scale = LogisticRegression(max_iter=10000, solver='lbfgs')
model_no_scale.fit(X_train, y_train)
y_pred_no_scale = model_no_scale.predict(X_test)
acc_no_scale = accuracy_score(y_test, y_pred_no_scale)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=10000, solver='lbfgs')
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
acc_scaled = accuracy_score(y_test, y_pred_scaled)

print(f"Without Scaling: {acc_no_scale * 100:.2f}%")
print(f"With Scaling:    {acc_scaled * 100:.2f}%")


Without Scaling: 95.61%
With Scaling:    97.37%


Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.

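  - The aim is to build a Logistic Regression model that predicts customer responses from an imbalanced dataset.
  - Step 1:- Data Handling - Remove duplicates, impute missing values, encode categorical variables, and make a stratified train/test split so both sets keep the 5% response rate.
  - Step 2:- Feature Scaling - Standardize numerical features to zero mean and unit variance, fitting the scaler on the training set only.
  - Step 3:- Handling Class Imbalance - Use class weights (class_weight='balanced') or a resampling technique like SMOTE to oversample the minority class, applied to the training data only.
  - Step 4:- Model Training and Hyperparameter Tuning - Train the model on the prepared dataset and tune C and the penalty type with cross-validated grid search.
  - Step 5:- Model Evaluation - Evaluate the model using metrics suited for imbalanced classification like precision, recall and F1-score, and choose the decision threshold based on the cost of contacting a non-responder versus missing a responder.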
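
The steps above can be sketched as a single scikit-learn pipeline. This is a minimal sketch under stated assumptions, not a production setup: the data is synthetic (`make_classification` with roughly 5% positives stands in for real campaign data), the grid of `C` values is arbitrary, and `class_weight='balanced'` is used for the imbalance step instead of SMOTE because SMOTE comes from the separate imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: roughly 5% responders (class 1)
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=42)

# Stratify so train and test keep the same response rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling lives inside the pipeline, so it is re-fit on each CV training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune C with 5-fold CV, scoring on F1 rather than accuracy
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1")
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
y_score = grid.predict_proba(X_test)[:, 1]
pr_auc = average_precision_score(y_test, y_score)

print("Best C:", grid.best_params_["clf__C"])
print(classification_report(y_test, y_pred, digits=3))
print("Average precision (PR-AUC):", round(pr_auc, 3))
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training during cross-validation. The default 0.5 threshold used by `predict` can be replaced by a business-driven cutoff on `y_score`.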