#**Theoretical** **Questions**

1. What is Logistic Regression, and how does it differ from Linear Regression?

Answer:
Logistic Regression is a classification algorithm used to predict categorical outcomes (e.g., binary classification: yes/no, spam/not spam). It differs from Linear Regression in that:

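Linear Regression predicts continuous values, while Logistic Regression predicts the probability that an input belongs to the positive class; a threshold (commonly 0.5) then turns that probability into a class label.

Logistic Regression passes the linear combination of features through the sigmoid function, which maps it into the range (0, 1); Linear Regression outputs the linear combination directly, so its predictions are unbounded.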

Linear Regression minimizes Mean Squared Error (MSE), whereas Logistic Regression minimizes Log Loss (Binary Cross-Entropy Loss).
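The sketch below illustrates the difference on a made-up one-feature dataset, which is an assumption for demonstration only. A straight-line fit produces outputs below 0 and above 1. Logistic Regression's sigmoid keeps every predicted probability inside [0, 1].

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy 1-D binary problem (illustrative data): label is 1 whenever x > 0
x = np.linspace(-5, 5, 101).reshape(-1, 1)
y = (x.ravel() > 0).astype(int)

lin = LinearRegression().fit(x, y)
log_reg = LogisticRegression().fit(x, y)

lin_out = lin.predict(x)                  # unbounded straight-line outputs
log_out = log_reg.predict_proba(x)[:, 1]  # sigmoid outputs, P(y = 1 | x)

print(f"Linear Regression output range:   [{lin_out.min():.2f}, {lin_out.max():.2f}]")
print(f"Logistic Regression output range: [{log_out.min():.2f}, {log_out.max():.2f}]")
```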



---

2. What is the mathematical equation of Logistic Regression?

Answer:
The equation for Logistic Regression is:

$$p = \frac{1}{1 + e^{-(b_0 + b_1X_1 + b_2X_2 + \dots + b_nX_n)}}$$

where:

$p$ is the probability of the outcome being 1.

$b_0$ is the intercept (bias).

$b_1, \dots, b_n$ are the model coefficients.

$X_1, \dots, X_n$ are the input features.
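As a sanity check of the equation, the sketch below fits scikit-learn's `LogisticRegression` on its built-in breast cancer dataset. That dataset is chosen only for illustration. The code then recomputes $p$ by hand from `intercept_` ($b_0$) and `coef_` ($b_1, \dots, b_n$), and the result should match `predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scaling only helps the solver converge
clf = LogisticRegression().fit(X, y)

b0 = clf.intercept_[0]   # intercept b_0
b = clf.coef_[0]         # coefficients b_1 ... b_n

# p = 1 / (1 + e^{-(b_0 + b_1 X_1 + ... + b_n X_n)}), computed for every row
p_manual = 1.0 / (1.0 + np.exp(-(b0 + X @ b)))
p_sklearn = clf.predict_proba(X)[:, 1]  # probability of class 1

print("Manual p matches predict_proba:", np.allclose(p_manual, p_sklearn))
```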



---

3. Why do we use the Sigmoid function in Logistic Regression?

Answer:
The sigmoid function is used because it maps any real-valued number to a probability between 0 and 1, making it suitable for classification. The function is:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = b_0 + b_1X_1 + \dots + b_nX_n$ is the linear combination of input features. The sigmoid function helps in:

Converting raw scores into probabilities.

Ensuring the output remains interpretable as a probability.
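Here is a minimal NumPy version of the sigmoid. `scipy.special.expit` is SciPy's built-in, numerically stable equivalent, and it is used here only to cross-check the values. The symmetry $\sigma(-z) = 1 - \sigma(z)$ is checked as well.

```python
import numpy as np
from scipy.special import expit  # SciPy's sigmoid, used for cross-checking

def sigmoid(z):
    """Map any real number (or array of them) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probs = sigmoid(z)
print(np.round(probs, 4))  # sigmoid(0) is 0.5; large |z| pushes the output toward 0 or 1
print("Matches scipy.special.expit:", np.allclose(probs, expit(z)))
print("Symmetric, sigma(-z) = 1 - sigma(z):", np.allclose(sigmoid(-z), 1 - probs))
```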



---

4. What is the cost function of Logistic Regression?

Answer:
Logistic Regression uses the Log Loss (Binary Cross-Entropy Loss) function:

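$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log (p_i) + (1 - y_i) \log (1 - p_i) \right]$$

where:

$m$ is the number of samples.

$y_i$ is the actual class label (0 or 1) of sample $i$.

$p_i$ is the predicted probability that sample $i$ belongs to class 1.

$\theta$ denotes the model parameters ($b_0, b_1, \dots, b_n$).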


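Log Loss heavily penalizes confident wrong predictions. As the predicted probability of the true class approaches 0, the loss for that sample grows without bound. A confident correct prediction costs almost nothing.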
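The formula can be checked numerically. The sketch below uses a handful of made-up labels and probabilities, which are illustrative values only. It compares a direct NumPy implementation with `sklearn.metrics.log_loss`, then shows the per-sample cost of a confident right prediction versus a confident wrong one.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1])            # actual labels y_i (illustrative)
p_pred = np.array([0.9, 0.2, 0.6, 0.95])   # predicted P(y_i = 1), i.e. p_i

# J = -(1/m) * sum[ y_i*log(p_i) + (1 - y_i)*log(1 - p_i) ]
manual = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(f"manual: {manual:.4f}  sklearn: {log_loss(y_true, p_pred):.4f}")

# Per-sample loss when the true label is 1
loss_confident_right = -np.log(0.99)   # about 0.01
loss_confident_wrong = -np.log(0.01)   # about 4.61
print(loss_confident_right, loss_confident_wrong)
```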


---

5. What is Regularization in Logistic Regression? Why is it needed?

Answer:
Regularization reduces overfitting by adding a penalty on the size of the coefficients to the cost function. Large coefficients are discouraged, which keeps the model from becoming overly complex.

There are two main types of regularization:

L1 Regularization (Lasso): Can shrink some coefficients exactly to zero, which performs feature selection.

L2 Regularization (Ridge): Shrinks coefficients toward zero but rarely makes them exactly zero, which reduces variance.
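The sparsity difference is easy to observe. This sketch uses scikit-learn's built-in breast cancer dataset and C = 0.1; both choices are assumptions made for illustration. It fits an L1- and an L2-penalized model with the same regularization strength and counts the coefficients that are exactly zero.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)   # put features on a common scale

# Same regularization strength (scikit-learn's C is the inverse of lambda), different penalty
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2_model = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X, y)

l1_zeros = int(np.sum(l1_model.coef_ == 0))
l2_zeros = int(np.sum(l2_model.coef_ == 0))
print(f"L1: {l1_zeros} of {X.shape[1]} coefficients are exactly zero")
print(f"L2: {l2_zeros} of {X.shape[1]} coefficients are exactly zero")
```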



---

6. Explain the difference between Lasso, Ridge, and Elastic Net regression.

Answer:

Lasso Regression (L1): Penalizes the sum of absolute coefficient values. It can drive some coefficients exactly to zero, leading to feature selection.

Ridge Regression (L2): Penalizes the sum of squared coefficient values. It shrinks coefficients but keeps all features.

Elastic Net: Combines Lasso and Ridge, balancing feature selection and shrinkage.
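The three penalties can be written out as a sketch. They use the log loss $J(\theta)$ from Question 4 and the coefficients $b_j$ from Question 2. The intercept $b_0$ is left unpenalized here, which is a common convention. Constant factors (such as $\lambda/2$ or dividing by $m$) and the exact parameterization differ between textbooks and libraries, so treat these as illustrative. $\alpha \in [0, 1]$ is the mixing weight between the two penalties; scikit-learn exposes a comparable weight as `l1_ratio`, with `C` playing the role of $1/\lambda$.

$$J_{\text{Lasso}}(\theta) = J(\theta) + \lambda \sum_{j=1}^{n} |b_j|$$

$$J_{\text{Ridge}}(\theta) = J(\theta) + \lambda \sum_{j=1}^{n} b_j^2$$

$$J_{\text{Elastic Net}}(\theta) = J(\theta) + \lambda \left[ \alpha \sum_{j=1}^{n} |b_j| + (1 - \alpha) \sum_{j=1}^{n} b_j^2 \right]$$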



---

7. When should we use Elastic Net instead of Lasso or Ridge?

Answer:
Elastic Net should be used when:

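There are many correlated features. Lasso tends to keep one feature from a correlated group and drop the others somewhat arbitrarily, while the L2 part of Elastic Net spreads weight across the group.

We need feature selection (Lasso removes unimportant features).

We want a balance between Ridge and Lasso for better generalization.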



---

8. What is the impact of the regularization parameter (λ) in Logistic Regression?

Answer:

High λ (strong regularization): Shrinks coefficients, reducing overfitting but possibly underfitting.

Low λ (weak regularization): Allows large coefficients, increasing model complexity and overfitting risk.
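scikit-learn parameterizes regularization with `C`, which is roughly $1/\lambda$. The sketch below sweeps `C` and reports the size of the coefficient vector. It uses the built-in breast cancer dataset and the chosen `C` values purely as illustrative assumptions. Smaller `C` (larger $\lambda$) should give a smaller coefficient norm.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Small C = strong regularization (large lambda); large C = weak regularization
Cs = [0.001, 0.01, 0.1, 1, 10]
norms = []
for C in Cs:
    clf = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    norms.append(np.linalg.norm(clf.coef_))
    print(f"C={C:<6} (lambda ~ {1 / C:g})  ||b|| = {norms[-1]:.3f}  "
          f"train acc = {clf.score(X, y):.3f}")
```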



---

9. What are the key assumptions of Logistic Regression?

Answer:

Independent observations (each sample is not influenced by the others).

Linear relationship between independent variables and the log-odds.

Little or no multicollinearity (highly correlated predictors make coefficient estimates unstable).

No extreme outliers (they can heavily influence coefficients).

A sufficiently large sample size for reliable probability estimates.



---

10. What are some alternatives to Logistic Regression for classification tasks?

Answer:

Decision Trees (capture non-linear relationships and feature interactions).

Random Forests (ensembles of decision trees; less prone to overfitting than a single tree).

Support Vector Machines (SVM) (good for high-dimensional data).

Neural Networks (suitable for complex patterns).
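The sketch below compares these alternatives with 5-fold cross-validation. The breast cancer dataset and default hyperparameters are assumptions for illustration. The scores reflect this dataset only and are not a general ranking of the methods.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Scale-sensitive models get a StandardScaler inside their pipeline
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
    "Neural Network (MLP)": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
}

scores = {}
for name, est in models.items():
    scores[name] = cross_val_score(est, X, y, cv=5).mean()
    print(f"{name:<22} mean 5-fold accuracy: {scores[name]:.3f}")
```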



---

11. What are Classification Evaluation Metrics?

Answer:

Accuracy = Correct Predictions / Total Predictions.

Precision = TP / (TP + FP) → Measures how many predicted positives are actually positive.

Recall = TP / (TP + FN) → Measures how many actual positives are captured.

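F1-Score = 2 × Precision × Recall / (Precision + Recall) → the harmonic mean of Precision and Recall.

ROC-AUC Score = Area under the ROC curve (True Positive Rate vs. False Positive Rate across all classification thresholds). A score of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.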
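The formulas can be applied by hand to a small made-up set of labels and predictions (illustrative values only). The results match scikit-learn's metric functions.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
pairs = list(zip(y_true, y_pred))

# Confusion-matrix counts
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # 3
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # 2
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # 1
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # 2

accuracy = (tp + tn) / len(y_true)                   # 5/8 = 0.625
precision = tp / (tp + fp)                           # 3/5 = 0.6
recall = tp / (tp + fn)                              # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)   # 0.9/1.35, about 0.667

# scikit-learn gives the same values
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```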



---

12. How does class imbalance affect Logistic Regression?

Answer:
If one class dominates, the model may learn to predict the majority class almost always. Accuracy then stays high, but recall for the minority class drops.
Solutions:

Class weighting (class_weight='balanced' in scikit-learn).

Oversampling/undersampling.

SMOTE (Synthetic Minority Over-sampling Technique), available in the imbalanced-learn library.
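The effect of class weighting shows up on a synthetic imbalanced dataset. The sketch generates one with `make_classification` at about 95% / 5%; the dataset and its settings are illustrative assumptions. It compares minority-class recall with and without `class_weight='balanced'`. Higher recall usually comes at the cost of some precision on the minority class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data: about 95% class 0 and 5% class 1
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

plain = LogisticRegression().fit(X_train, y_train)
weighted = LogisticRegression(class_weight="balanced").fit(X_train, y_train)

# Recall on the minority class (label 1)
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))
print(f"Minority-class recall without weights:           {recall_plain:.2f}")
print(f"Minority-class recall with class_weight='balanced': {recall_weighted:.2f}")
```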



---

13. What is Hyperparameter Tuning in Logistic Regression?

Answer:
Finding the best values for parameters like:

C (inverse of λ): Controls regularization strength; a smaller C means stronger regularization.

Solver: Optimization algorithm (e.g., liblinear, saga).

Penalty type: L1, L2, Elastic Net.


Methods:

GridSearchCV (exhaustive search).

RandomizedSearchCV (random sampling).



---

14. What are different solvers in Logistic Regression? Which one should be used?

Answer:

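liblinear: Good for small datasets; supports L1 and L2, and handles multiclass problems only via One-vs-Rest.

lbfgs: The default solver; supports L2 (or no) penalty and the multinomial loss; a solid general-purpose choice.

saga: Supports L1, L2, and Elastic Net; often the fastest option on large datasets, and works best with features on a similar scale.

newton-cg: Supports L2 (or no) penalty and the multinomial loss; uses second-order (Hessian) information.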



---

15. How is Logistic Regression extended for multiclass classification?

Answer:

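One-vs-Rest (OvR): Trains one binary classifier per class (that class vs. all others) and predicts the class with the highest score.

Softmax Regression (Multinomial): Fits a single model that assigns a probability to every class via the softmax function; the probabilities sum to 1.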
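A sketch of both strategies on scikit-learn's built-in 3-class iris dataset, chosen here only for illustration. It uses `OneVsRestClassifier` for OvR. For the softmax version it relies on plain `LogisticRegression`, whose default `lbfgs` solver uses the multinomial formulation when there are more than two classes (true in recent scikit-learn versions).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# One-vs-Rest: one binary logistic model per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
# Multinomial (softmax): a single model over all classes
multinomial = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Binary models trained by OvR:", len(ovr.estimators_))
print(f"OvR accuracy:         {ovr.score(X_test, y_test):.2f}")
print(f"Multinomial accuracy: {multinomial.score(X_test, y_test):.2f}")
```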



---

16. What are the advantages and disadvantages of Logistic Regression?

Advantages:

Simple, interpretable, and computationally efficient.

Works well when the decision boundary between classes is approximately linear.

Can be regularized to prevent overfitting.


Disadvantages:

Struggles with non-linear decision boundaries unless features are engineered (e.g., polynomial or interaction terms).

Sensitive to outliers and correlated features.



---

17. What are some use cases of Logistic Regression?

Answer:

Medical diagnosis (disease prediction).

Spam detection.

Credit risk analysis.

Customer churn prediction.



---

18. What is the difference between Softmax Regression and Logistic Regression?

Answer:

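Logistic Regression models two classes, using the sigmoid function to output the probability of the positive class.

Softmax Regression generalizes it to K classes using the softmax function; with K = 2 it is equivalent to Logistic Regression.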
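The sketch below implements softmax in NumPy and checks it against a multinomial `LogisticRegression` on the iris dataset, which is an illustrative choice. It assumes a recent scikit-learn, where a multiclass `lbfgs` model is multinomial by default. The last line shows that softmax over the scores [0, z] reduces to sigmoid(z).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)   # multinomial for 3 classes

class_scores = clf.decision_function(X)   # one linear score per class, shape (150, 3)
manual_probs = softmax(class_scores)
print("softmax(scores) matches predict_proba:", np.allclose(manual_probs, clf.predict_proba(X)))

# Two classes: softmax over [0, z] gives the same probability as sigmoid(z)
z = 1.5
two_class = softmax(np.array([[0.0, z]]))[0, 1]
print("Two-class softmax equals sigmoid:", np.isclose(two_class, 1 / (1 + np.exp(-z))))
```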



---

19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?

Answer:

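OvR: Simple, trains one independent binary model per class, and works with solvers that only handle binary problems (e.g., liblinear).

Softmax: Models all classes jointly with probabilities that sum to 1; usually preferred when each sample belongs to exactly one class (mutually exclusive classes).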



---

20. How do we interpret coefficients in Logistic Regression?

Answer:
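Each coefficient $b_i$ is the change in log-odds for a one-unit increase in feature $X_i$, holding the other features constant. Exponentiating it gives the odds ratio, the factor by which the odds are multiplied for that one-unit increase:

$$\text{Odds Ratio}_i = e^{b_i}$$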
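The interpretation can be verified directly. The sketch uses the standardized breast cancer dataset, an illustrative assumption, so "one unit" here means one standard deviation. It takes a sample at the mean of every feature, increases feature 0 by one unit, and confirms that the odds are multiplied by $e^{b_0}$ for that feature's coefficient.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)   # one unit = one standard deviation
clf = LogisticRegression(max_iter=1000).fit(X, data.target)

odds_ratios = np.exp(clf.coef_[0])   # e^{b_i} for every feature

def odds(row):
    """Odds of class 1 for a single sample: p / (1 - p)."""
    p = clf.predict_proba(row)[0, 1]
    return p / (1 - p)

j = 0                                # feature to inspect
x = np.zeros((1, X.shape[1]))        # a sample at the mean of every feature
x_plus = x.copy()
x_plus[0, j] += 1.0                  # increase feature j by one unit, others fixed

ratio = odds(x_plus) / odds(x)
print(f"{data.feature_names[j]}: b = {clf.coef_[0, j]:.3f}, "
      f"e^b = {odds_ratios[j]:.3f}, measured odds ratio = {ratio:.3f}")
```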





#**Practical** **Questions**

1. Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic
Regression, and prints the model accuracy.


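import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# diabetes.csv was not found in the original run, so this uses scikit-learn's built-in
# breast cancer dataset (binary target) instead. To use your own CSV, load it with
# pd.read_csv(...) and set y to its label column.
data = load_breast_cancer(as_frame=True)
X = data.data    # feature DataFrame (keeps column names for later cells)
y = data.target  # 0 = malignant, 1 = benign

# Hold out 20% for testing; stratify so both splits keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# max_iter raised because lbfgs may not converge in 100 iterations on unscaled features
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")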

2. Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1')
and print the model accuracy.

# liblinear (or saga) supports the L1 penalty; the default lbfgs solver does not
model_l1 = LogisticRegression(penalty='l1', solver='liblinear')
model_l1.fit(X_train, y_train)
accuracy_l1 = model_l1.score(X_test, y_test)
print(f"L1 Regularization Accuracy: {accuracy_l1:.2f}")

3. Write a Python program to train Logistic Regression with L2 regularization (Ridge) using
LogisticRegression(penalty='l2'). Print model accuracy and coefficients.

model_l2 = LogisticRegression(penalty='l2', max_iter=5000)
model_l2.fit(X_train, y_train)

y_pred_l2 = model_l2.predict(X_test)
accuracy_l2 = accuracy_score(y_test, y_pred_l2)
print(f"L2 Regularization Accuracy: {accuracy_l2:.2f}")

# print() is needed; a bare expression only displays when it is the last line of a notebook cell
print("Coefficients:")
print(model_l2.coef_)

4. Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet').

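from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# saga is the solver that supports elasticnet; it converges much faster on standardized features
model_en = make_pipeline(StandardScaler(),
    LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=5000))
model_en.fit(X_train, y_train)
accuracy_en = model_en.score(X_test, y_test)
print(f"Elastic Net Accuracy: {accuracy_en:.2f}")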

5. Write a Python program to train a Logistic Regression model for multiclass classification using
multi_class='ovr'.

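from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
X_i_train, X_i_test, y_i_train, y_i_test = train_test_split(*load_iris(return_X_y=True), test_size=0.2, random_state=42)
# Iris has 3 classes (the earlier dataset is binary); multi_class='ovr' is deprecated since scikit-learn 1.5, so OneVsRestClassifier provides the same one-vs-rest scheme
model_ovr = OneVsRestClassifier(LogisticRegression(solver='liblinear')).fit(X_i_train, y_i_train)
print(f"Multiclass OvR Accuracy: {model_ovr.score(X_i_test, y_i_test):.2f}")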

6.Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic
Regression. Print the best parameters and accuracy.

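from sklearn.model_selection import GridSearchCV

# liblinear supports both the l1 and l2 penalties, so every combination in the grid is valid
param_grid = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(f"Best Parameters: {grid_search.best_params_}")
# best_score_ is the mean cross-validated accuracy on the training data, not test accuracy
print(f"Best CV Accuracy: {grid_search.best_score_:.2f}")
print(f"Test Accuracy: {grid_search.score(X_test, y_test):.2f}")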

7. Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the
average accuracy.

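from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stratified folds keep the class ratio in every fold; shuffle for a randomized split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(f"Cross-Validation Accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")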

8. Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its
accuracy.

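# Replace "data.csv" with your own file; this assumes the label column is named "target"
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Evaluate on a held-out split; scoring on the training data overstates accuracy
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")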

9. Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in
Logistic Regression. Print the best parameters and accuracy.

from sklearn.model_selection import RandomizedSearchCV

# Only liblinear is listed, so the solver is not really searched; adding 'saga' (ideally on scaled features) would tune it too
param_dist = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
random_search = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_dist, n_iter=5, cv=5, random_state=42)
random_search.fit(X_train, y_train)

print(f"Best Parameters: {random_search.best_params_}")
# best_score_ is the mean cross-validated accuracy, not accuracy on the test set
print(f"Best CV Accuracy: {random_search.best_score_:.2f}")

10. Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in
Logistic Regression. Print the best parameters and accuracy.

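from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

# Unlike the fixed list in the previous answer, sample C from a continuous log-uniform range;
# this is where RandomizedSearchCV has an edge over an exhaustive grid
param_dist = {'C': loguniform(1e-3, 1e2), 'penalty': ['l1', 'l2'], 'solver': ['liblinear']}
random_search = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_dist, n_iter=20, cv=5, random_state=42)
random_search.fit(X_train, y_train)

print(f"Best Parameters: {random_search.best_params_}")
print(f"Best CV Accuracy: {random_search.best_score_:.2f}")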

11. Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary
classification.

from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

sns.heatmap(cm, annot=True, fmt='d', cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()



12. Write a Python program to train a Logistic Regression model and evaluate its performance using Precision,
Recall, and F1-Score.

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))



13. Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to
improve model performance.

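from sklearn.metrics import classification_report
# 'balanced' weights each class by n_samples / (n_classes * class_count)
model_balanced = LogisticRegression(class_weight='balanced', max_iter=5000)
model_balanced.fit(X_train, y_train)
print(f"Balanced Model Accuracy: {model_balanced.score(X_test, y_test):.2f}")
print(classification_report(y_test, model_balanced.predict(X_test)))  # per-class recall matters more than accuracy here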

14. Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and
evaluate performance.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score

# Load the Titanic dataset (replace with your dataset path)
titanic_df = pd.read_csv("titanic.csv")

# Select features and target variable
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
target = "Survived"
X = titanic_df[features].copy()  # copy so the column assignments below don't trigger pandas' SettingWithCopyWarning
y = titanic_df[target]

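# Handle missing values (mean for numerical, most frequent value for categorical).
# Note: fitting the imputers before the train/test split lets test rows influence the
# fill values; fitting them on the training split only (e.g. inside a Pipeline) avoids this.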
imputer_num = SimpleImputer(strategy="mean")
imputer_cat = SimpleImputer(strategy="most_frequent")

numerical_cols = ["Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

X[numerical_cols] = imputer_num.fit_transform(X[numerical_cols])
X[categorical_cols] = imputer_cat.fit_transform(X[categorical_cols])

# Convert categorical features to numerical using one-hot encoding
X = pd.get_dummies(X, columns=["Sex", "Embarked"], drop_first=True)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression model
model = LogisticRegression(max_iter=1000)  # extra iterations avoid a ConvergenceWarning on unscaled features
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate model performance (accuracy)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")


15. Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression
model. Evaluate its accuracy and compare results with and without scaling.

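from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Separate models, so the unscaled `model` used by later cells is not overwritten
model_raw = LogisticRegression(max_iter=5000).fit(X_train, y_train)
model_scaled = LogisticRegression(max_iter=5000).fit(X_train_scaled, y_train)
print(f"Accuracy without Scaling: {model_raw.score(X_test, y_test):.2f}")
print(f"Accuracy with Scaling: {model_scaled.score(X_test_scaled, y_test):.2f}")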

16. Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score.

from sklearn.metrics import roc_auc_score

y_prob = model.predict_proba(X_test)[:, 1]
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_prob):.2f}")


17. Write a Python program to train Logistic Regression using a custom regularization parameter (C=0.5) and evaluate
accuracy. (C is the inverse of the regularization strength, not a learning rate.)

# C=0.5 regularizes more strongly than the default C=1.0
model_c = LogisticRegression(C=0.5, max_iter=5000)
model_c.fit(X_train, y_train)
print(f"Custom C Model Accuracy: {model_c.score(X_test, y_test):.2f}")


18. Write a Python program to train Logistic Regression and identify important features based on model
coefficients.

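# Coefficients are only comparable as importances when features share a scale (e.g. after
# standardization); rank by absolute value, since the sign only gives the direction of the effect
feature_importance = pd.Series(model.coef_[0], index=X_train.columns)
print("Feature Importance:")
print(feature_importance.sort_values(key=abs, ascending=False))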


19.  Write a Python program to train Logistic Regression and evaluate its performance using Cohen’s Kappa
Score.

from sklearn.metrics import cohen_kappa_score
# Kappa measures agreement between predictions and true labels, corrected for chance agreement
print(f"Cohen’s Kappa Score: {cohen_kappa_score(y_test, y_pred):.2f}")

20. Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary
classification.

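import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Recompute positive-class probabilities so this cell does not depend on y_prob from an earlier cell
y_prob = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, y_prob)
plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.show()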


21. Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare
their accuracy.
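This is a self-contained sketch that does not depend on variables from earlier cells. It assumes scikit-learn's built-in breast cancer dataset. Features are standardized inside a pipeline because saga and lbfgs converge much faster on scaled data, and all three solvers use the default L2 penalty here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

solver_acc = {}
for solver in ["liblinear", "saga", "lbfgs"]:
    clf = make_pipeline(StandardScaler(), LogisticRegression(solver=solver, max_iter=5000))
    clf.fit(X_train, y_train)
    solver_acc[solver] = clf.score(X_test, y_test)
    print(f"{solver:<10} test accuracy: {solver_acc[solver]:.3f}")
```

On a small, well-conditioned problem like this one the solvers usually reach very similar accuracy. The solver choice matters more for speed, supported penalties, and very large datasets.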

22. Write a Python program to train Logistic Regression and evaluate its performance using Matthews
Correlation Coefficient (MCC).
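Here is a self-contained sketch, again assuming the built-in breast cancer dataset. It uses `sklearn.metrics.matthews_corrcoef`, which stays informative on imbalanced data because it uses all four confusion-matrix cells.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect)
mcc = matthews_corrcoef(y_test, y_pred)
print(f"Matthews Correlation Coefficient: {mcc:.3f}")
```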

23. Write a Python program to train Logistic Regression on both raw and standardized data. Compare their
accuracy to see the impact of feature scaling.
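A self-contained sketch on the built-in breast cancer dataset, which is an assumption; its raw features span very different ranges. Besides accuracy, it reports the solver's iteration count. On unscaled data the effect of standardization often shows up more in convergence than in accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Same model and settings; the only difference is the StandardScaler step
raw_model = LogisticRegression(max_iter=10000).fit(X_train, y_train)
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000)).fit(X_train, y_train)

raw_acc = raw_model.score(X_test, y_test)
scaled_acc = scaled_model.score(X_test, y_test)
raw_iters = raw_model.n_iter_[0]
scaled_iters = scaled_model[-1].n_iter_[0]   # last pipeline step is the LogisticRegression

print(f"Raw data:          accuracy = {raw_acc:.3f}, solver iterations = {raw_iters}")
print(f"Standardized data: accuracy = {scaled_acc:.3f}, solver iterations = {scaled_iters}")
```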

24. Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using
cross-validation.
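A self-contained sketch assuming the built-in breast cancer dataset and a log-spaced grid of `C` values from 0.001 to 1000 (both illustrative choices). The scaler sits inside the pipeline, so each cross-validation fold is scaled using only its own training portion. scikit-learn's `LogisticRegressionCV` is an alternative that searches over `Cs` internally.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
C_grid = np.logspace(-3, 3, 13)   # 0.001 ... 1000
search = GridSearchCV(pipe, {"logisticregression__C": C_grid},
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
search.fit(X_train, y_train)

best_C = search.best_params_["logisticregression__C"]
test_acc = search.score(X_test, y_test)   # uses the model refit with best_C
print(f"Optimal C: {best_C:.4g}")
print(f"Mean CV accuracy at that C: {search.best_score_:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
```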

25. Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to
make predictions.
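A self-contained sketch assuming the built-in breast cancer dataset. It saves the whole fitted pipeline (scaler plus model), so the loaded object applies the same preprocessing. The file name is hypothetical, and a temporary directory is used so nothing is left behind. Loading should be done with the same scikit-learn version that saved the model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
original_pred = model.predict(X[:10])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "logistic_model.joblib")  # hypothetical file name
    joblib.dump(model, path)          # serialize the fitted pipeline
    loaded_model = joblib.load(path)  # restore it from disk

loaded_pred = loaded_model.predict(X[:10])
print("Predictions match:", np.array_equal(original_pred, loaded_pred))
```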