In [2]:
"""
1. What is Logistic Regression, and how does it differ from Linear Regression?
Logistic Regression is a classification algorithm used to predict binary or categorical outcomes.

Linear Regression predicts continuous values.

Logistic Regression passes a linear combination of the features through the sigmoid function, which maps it to a value between 0 and 1 that is interpreted as the probability of the positive class.

2. What is the mathematical equation of Logistic Regression?
$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = w^T x + b$$

3. Why do we use the Sigmoid function in Logistic Regression?
The sigmoid function maps any real-valued number into the (0, 1) interval.
It outputs probabilities, which are ideal for binary classification.
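
As a small illustration of the mapping described above, here is a minimal sketch of the sigmoid in plain Python. The function name `sigmoid` and the branch on the sign of `z` (used only to avoid overflow in `exp`) are choices made for this example, not something the notes prescribe.

```python
import math

def sigmoid(z: float) -> float:
    # Branch on the sign of z so that exp() is only ever called
    # with a non-positive argument, which cannot overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 sits exactly at 0.5.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

In floating point, very large |z| rounds to exactly 0.0 or 1.0, which is one reason log loss implementations usually clip probabilities.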

4. What is the cost function of Logistic Regression?
The log loss or binary cross-entropy:

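$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(\hat{y}^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$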
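
To make the formula concrete, the sketch below computes the binary cross-entropy for a handful of made-up labels and predicted probabilities. The function name `log_loss`, the clipping constant `eps`, and the sample values are assumptions for illustration only.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    # Mean binary cross-entropy. Probabilities are clipped to
    # [eps, 1 - eps] so that log(0) never occurs.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly right and fairly sure
uncertain = [0.5, 0.5, 0.5, 0.5]   # no information at all

print("confident:", log_loss(y_true, confident))
print("uncertain:", log_loss(y_true, uncertain))
```

Confident correct predictions give a lower loss than uninformative ones, and a confident wrong prediction is penalized heavily.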

5. What is Regularization in Logistic Regression? Why is it needed?
Regularization prevents overfitting by penalizing large weights.
It adds a penalty term to the cost function:

L1 (Lasso): $\lambda \sum_i |w_i|$

L2 (Ridge): $\lambda \sum_i w_i^2$

6. Explain the difference between Lasso, Ridge, and Elastic Net regression.
Lasso (L1): Shrinks some coefficients to zero → performs feature selection.

Ridge (L2): Shrinks coefficients but keeps all features.

Elastic Net: Combines L1 + L2, good when features are correlated.
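
One way to see why L1 produces exact zeros while L2 does not is to compare the shrinkage step each penalty induces on a single weight. The sketch below uses the standard proximal operators for `lam * |w|` and `lam * w**2`; the function names and the sample weights are hypothetical, and real solvers apply these ideas inside a larger optimization loop.

```python
def soft_threshold(v, lam):
    # Shrinkage step for the L1 penalty lam * |w|: moves v toward zero
    # by lam and returns exactly zero once |v| <= lam.
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

def ridge_shrink(v, lam):
    # Shrinkage step for the L2 penalty lam * w**2: rescales v, so a
    # non-zero weight gets smaller but never becomes exactly zero.
    return v / (1.0 + 2.0 * lam)

weights = [2.0, 0.3, -0.1, -1.5]
lam = 0.5

print("L1:", [soft_threshold(w, lam) for w in weights])
print("L2:", [ridge_shrink(w, lam) for w in weights])
```

The small weights vanish under the L1 step (feature selection), while the L2 step keeps every weight and only scales it down.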

7. When should we use Elastic Net instead of Lasso or Ridge?
Use Elastic Net when:

You have many correlated features.

You need both feature selection (from Lasso) and stability (from Ridge).
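
A minimal sketch of how the two penalties can be blended, using the L1 and L2 terms as defined earlier in these notes. The mixing parameter `l1_ratio` and the function name are assumptions for this example; libraries differ in how they scale the two terms (some put a factor of 1/2 on the L2 part), so treat the exact formula as illustrative.

```python
def elastic_net_penalty(weights, lam, l1_ratio):
    # l1_ratio = 1.0 reduces to the L1 penalty, 0.0 to the L2 penalty,
    # and values in between mix the two.
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)

weights = [1.0, -2.0, 0.5]   # hypothetical coefficients
for ratio in (0.0, 0.5, 1.0):
    print(ratio, elastic_net_penalty(weights, lam=0.1, l1_ratio=ratio))
```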

8. What is the impact of the regularization parameter (λ) in Logistic Regression?
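Higher λ: More regularization → smaller weights → less overfitting, but too large a value can underfit.

Lower λ: Less regularization → model may overfit.

λ is a hyperparameter and must be tuned. In scikit-learn it is set through C, the inverse of the regularization strength, so a smaller C means stronger regularization.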
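
The effect of λ on the weights can be seen with a tiny from-scratch fit. This is a sketch under stated assumptions: a made-up one-feature dataset, plain batch gradient descent, an L2 penalty of the form `lam * sum(w_j ** 2)`, and an unpenalized bias. The names `fit_l2`, `lr`, and `steps` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2(X, y, lam, lr=0.1, steps=2000):
    # Batch gradient descent on: mean log loss + lam * sum(w_j ** 2).
    # The bias b is not penalized.
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / m
            grad_b += err / m
        # The penalty contributes 2 * lam * w_j to each weight's gradient.
        w = [wj - lr * (gj + 2.0 * lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data with overlapping classes, so the unregularized fit stays finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

norms = []
for lam in (0.0, 0.1, 1.0):
    w, b = fit_l2(X, y, lam)
    norms.append(sum(abs(wj) for wj in w))
    print(f"lambda = {lam}: sum |w| = {norms[-1]:.4f}")
```

The printed weight magnitude shrinks as λ grows, which is the behaviour the bullet points above describe.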

9. What are the key assumptions of Logistic Regression?
Linearity of log-odds: the log-odds of the outcome are a linear function of the features (the features need not be linearly related to the output itself).

No or little multicollinearity among features.

Independence of observations.

Large sample size for reliable estimates.

10. What are some alternatives to Logistic Regression for classification tasks?
Decision Trees

Random Forest

Support Vector Machines (SVM)

k-Nearest Neighbors (k-NN)

Naive Bayes

Neural Networks

11. What are Classification Evaluation Metrics?
Accuracy

Precision

Recall

F1 Score

ROC-AUC

Confusion Matrix
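
Most of the metrics listed above follow directly from the four confusion-matrix counts. The sketch below derives them by hand for made-up labels; the function name `binary_metrics` is an assumption, and the matrix layout (rows = true class, columns = predicted class) follows the convention scikit-learn uses. ROC-AUC is omitted because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, value)
```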

12. How does class imbalance affect Logistic Regression?
It biases the model toward the majority class, because misclassifying the rare class costs little in the overall loss.

Evaluation metrics like accuracy become misleading.

Use techniques like:

Resampling (oversample minority, undersample majority)

Class weights

Use precision-recall or F1 score instead of accuracy.
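
As an illustration of the class-weight idea, the sketch below computes weights inversely proportional to class frequency, using the heuristic `n_samples / (n_classes * count)` that scikit-learn documents for `class_weight='balanced'`. The function name and the 90/10 label split are made up for this example.

```python
from collections import Counter

def balanced_class_weights(y):
    # weight_c = n_samples / (n_classes * count_c)
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}

y = [0] * 90 + [1] * 10   # hypothetical imbalanced labels
class_weights = balanced_class_weights(y)
print(class_weights)
```

With these weights each class contributes the same total weight to the loss, so errors on the minority class are no longer cheap.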

13. What is Hyperparameter Tuning in Logistic Regression?
It involves optimizing parameters like:

λ (regularization strength)

Penalty type (L1, L2)

Solver

Techniques: Grid Search, Random Search, Cross-validation

14. What are different solvers in Logistic Regression? Which one should be used?
liblinear: Good for small datasets, supports L1 and L2.

saga: Supports L1, L2, and Elastic Net penalties; efficient for large datasets.

lbfgs: Good for multiclass problems, fast, supports L2.

newton-cg: Efficient for L2-regularization.

Recommendation: Use saga for large datasets and lbfgs for multiclass.

15. How is Logistic Regression extended for multiclass classification?
One-vs-Rest (OvR): Trains a binary classifier for each class.

Softmax Regression (Multinomial Logistic Regression): Generalizes logistic regression to handle multiple classes directly.
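
The One-vs-Rest rule can be sketched in a few lines: score the sample with every per-class binary model and pick the class with the highest score. The weights, biases, and class labels below are hypothetical values chosen for illustration, not fitted parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ovr_predict(x, classifiers):
    # classifiers maps each class label to the (weights, bias) of the
    # binary model trained as "this class vs. all the others".
    scores = {
        label: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for label, (w, b) in classifiers.items()
    }
    return max(scores, key=scores.get), scores

classifiers = {
    "a": ([1.0, -1.0], 0.0),
    "b": ([-1.0, 1.0], 0.0),
    "c": ([0.5, 0.5], -2.0),
}

label, scores = ovr_predict([2.0, 0.5], classifiers)
print(label, scores)
```

Because each binary model is scored independently, the per-class scores need not sum to 1, unlike the softmax output.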

16. What are the advantages and disadvantages of Logistic Regression?
Advantages:

Simple and interpretable

Fast to train

Works well when the classes are approximately linearly separable

Disadvantages:

Assumes linearity in log-odds

Not good for complex patterns

Sensitive to outliers and multicollinearity

17. What are some use cases of Logistic Regression?
Email spam detection

Disease prediction (e.g., diabetes, cancer)

Credit scoring

Marketing (likelihood of purchase)

Customer churn prediction

18. What is the difference between Softmax Regression and Logistic Regression?
Logistic Regression: For binary classification.

Softmax Regression: For multiclass classification, outputs probability distribution over classes using softmax.
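
A minimal sketch of the softmax function, assuming the per-class linear scores (logits) have already been computed. Subtracting the maximum logit is a common stability trick and does not change the result; the function name and sample logits are made up for this example.

```python
import math

def softmax(logits):
    # Shift by the maximum so the largest exponent is exp(0) = 1,
    # which avoids overflow for large logits.
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# With two classes and logits [z, 0], softmax reduces to the sigmoid of z.
z = 1.3
print(softmax([z, 0.0])[0], 1.0 / (1.0 + math.exp(-z)))
```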

19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?
OvR: Trains one independent binary classifier per class; simple, and works well if classes are not highly overlapping.

Softmax: Fits all classes jointly, so the predicted probabilities sum to 1; usually the better choice when classes are mutually exclusive.

20. How do we interpret coefficients in Logistic Regression?
Coefficients represent the change in log-odds of the outcome with a unit change in the predictor.

$\exp(w_i)$ is the odds ratio: it tells how the odds change (e.g., if $\exp(w_i) = 2$, the odds double with a 1-unit increase in feature $x_i$, holding the other features fixed).
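
The odds interpretation can be checked numerically. The sketch below uses a hypothetical one-feature model whose coefficient is chosen so that exp(w) equals 2, then compares the odds at consecutive feature values; the bias value is arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical model: w is picked so that exp(w) == 2.
w, b = math.log(2.0), -1.0

def predict_proba(x):
    return sigmoid(w * x + b)

for x in (0.0, 1.0, 2.0):
    ratio = odds(predict_proba(x + 1.0)) / odds(predict_proba(x))
    print(f"x = {x}: p = {predict_proba(x):.4f}, odds ratio for +1 unit = {ratio:.4f}")
```

The probability itself changes by a different amount at each x, but the odds are multiplied by the same factor every time.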
"""


In [3]:
#1. Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic
#Regression, and prints the model accuracy
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic Regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy:", accuracy)


Model Accuracy: 0.956140350877193


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [4]:
#2. Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1')
#and print the model accuracy
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("L1 Regularized Logistic Regression Accuracy:", accuracy_score(y_test, y_pred))


L1 Regularized Logistic Regression Accuracy: 0.956140350877193


In [5]:
#3.Write a Python program to train Logistic Regression with L2 regularization (Ridge) using
#LogisticRegression(penalty='l2'). Print model accuracy and coefficients
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=1000)
model.fit(X_train, y_train)

print("L2 Regularized Logistic Regression Accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("Model Coefficients:", model.coef_)



L2 Regularized Logistic Regression Accuracy: 0.956140350877193
Model Coefficients: [[ 2.09981182  0.13248576 -0.10346836 -0.00255646 -0.17024348 -0.37984365
  -0.69120719 -0.4081069  -0.23506963 -0.02356426 -0.0854046   1.12246945
  -0.32575716 -0.06519356 -0.02371113  0.05960156  0.00452206 -0.04277587
  -0.04148042  0.01425051  0.96630267 -0.37712622 -0.05858253 -0.02395975
  -0.31765956 -1.00443507 -1.57134711 -0.69351401 -0.84095566 -0.09308282]]


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [6]:
#4.Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet')
model = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=1000)
model.fit(X_train, y_train)

print("Elastic Net Logistic Regression Accuracy:", accuracy_score(y_test, model.predict(X_test)))



Elastic Net Logistic Regression Accuracy: 0.9649122807017544




In [7]:
#5. Write a Python program to train a Logistic Regression model for multiclass classification using
#multi_class='ovr'
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(multi_class='ovr', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

print("Multiclass Logistic Regression (OvR) Accuracy:", accuracy_score(y_test, model.predict(X_test)))


Multiclass Logistic Regression (OvR) Accuracy: 0.9666666666666667




In [8]:
#6.Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic
#Regression. Print the best parameters and accuracy
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']  # Compatible with both L1 and L2
}

grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Accuracy:", grid.best_score_)


Best Parameters: {'C': 0.1, 'penalty': 'l1', 'solver': 'liblinear'}
Best Accuracy: 0.961393363623847


In [9]:
#7.Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the
#average accuracy
from sklearn.model_selection import StratifiedKFold, cross_val_score

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy')
print("Stratified K-Fold Accuracy Scores:", scores)
print("Average Accuracy:", scores.mean())


Stratified K-Fold Accuracy Scores: [0.96666667 0.96111111 0.97214485 0.96657382 0.96935933]
Average Accuracy: 0.9671711544413494


In [None]:
#8.Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its accuracy.
import pandas as pd

# Replace with your CSV file path
df = pd.read_csv('your_dataset.csv')

# Assuming last column is target
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))


In [None]:
#9.Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in
#Logistic Regression. Print the best parameters and accuracy
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

param_dist = {
    'C': np.logspace(-3, 3, 10),
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}

random_search = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_distributions=param_dist, n_iter=10, cv=5)
random_search.fit(X_train, y_train)

print("Best Parameters (Random Search):", random_search.best_params_)
print("Best Accuracy:", random_search.best_score_)



In [None]:
#10.Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy
from sklearn.multiclass import OneVsOneClassifier

model = OneVsOneClassifier(LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("OvO Multiclass Accuracy:", accuracy_score(y_test, model.predict(X_test)))



In [None]:
#11. Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary classification
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm).plot()
plt.title("Confusion Matrix")
plt.show()



In [None]:
#12.Write a Python program to train a Logistic Regression model and evaluate its performance using Precision,
#Recall, and F1-Score
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))



In [None]:
#13.Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to
#improve model performance
model = LogisticRegression(class_weight='balanced', max_iter=1000)
model.fit(X_train, y_train)
print("Accuracy with Class Weights:", accuracy_score(y_test, model.predict(X_test)))



In [None]:
#14. Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and
#evaluate performance
import seaborn as sns

# Load and preprocess Titanic dataset
df = sns.load_dataset('titanic')
df = df[['sex', 'age', 'fare', 'class', 'survived']].dropna()
df['sex'] = df['sex'].map({'male': 0, 'female': 1})
df['class'] = df['class'].map({'First': 1, 'Second': 2, 'Third': 3})

X = df.drop('survived', axis=1)
y = df['survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Titanic Logistic Regression Accuracy:", accuracy_score(y_test, model.predict(X_test)))



In [None]:
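#15. Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression model.
#Evaluate its accuracy and compare results with and without scaling
from sklearn.preprocessing import StandardScaler

# Split first so the scaler is fitted on the training data only (no test-set leakage)
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Baseline on the unscaled features
model_raw = LogisticRegression(max_iter=1000)
model_raw.fit(X_train_raw, y_train)
print("Accuracy without Scaling:", accuracy_score(y_test, model_raw.predict(X_test_raw)))

# Same model on the standardized features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

print("Accuracy with Scaling:", accuracy_score(y_test, model.predict(X_test)))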




In [None]:
#16.Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score
from sklearn.metrics import roc_auc_score

y_prob = model.predict_proba(X_test)[:, 1]
print("ROC-AUC Score:", roc_auc_score(y_test, y_prob))



In [None]:
#17. Write a Python program to train Logistic Regression using a custom learning rate (C=0.5) and evaluate accuracy
# Note: in scikit-learn, C is the inverse of the regularization strength, not a learning rate
model = LogisticRegression(C=0.5, max_iter=1000)
model.fit(X_train, y_train)

print("Accuracy with C=0.5:", accuracy_score(y_test, model.predict(X_test)))



In [None]:
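#18. Write a Python program to train Logistic Regression and identify important features based on model coefficients
import numpy as np

# Prefer the column names of the DataFrame the model was trained on;
# fall back to the loaded dataset's names when X is a plain array
feature_names = X.columns if hasattr(X, "columns") else data.feature_names
coeffs = model.coef_[0]

# List features from largest to smallest absolute coefficient
# (magnitudes are comparable only when the features share a similar scale)
for idx in np.argsort(np.abs(coeffs))[::-1]:
    print(f"{feature_names[idx]}: {coeffs[idx]:.4f}")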



In [None]:
#19. Write a Python program to train Logistic Regression and evaluate its performance using Cohen's Kappa Score
from sklearn.metrics import cohen_kappa_score

# Recompute predictions so they match the current model and test split
y_pred = model.predict(X_test)
print("Cohen's Kappa Score:", cohen_kappa_score(y_test, y_pred))


In [None]:
#20. Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary classification
from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay

precision, recall, _ = precision_recall_curve(y_test, y_prob)
disp = PrecisionRecallDisplay(precision=precision, recall=recall)
disp.plot()
plt.title("Precision-Recall Curve")
plt.show()


In [None]:

#21.Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare their accuracy
for solver in ['liblinear', 'saga', 'lbfgs']:
    try:
        model = LogisticRegression(solver=solver, max_iter=1000)
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"Solver = {solver}, Accuracy = {acc:.4f}")
    except Exception as e:
        print(f"Solver = {solver} failed: {e}")


In [None]:

#22. Write a Python program to train Logistic Regression and evaluate its performance using Matthews Correlation Coefficient (MCC)
from sklearn.metrics import matthews_corrcoef

# Recompute predictions so they match the current model and test split
y_pred = model.predict(X_test)
print("MCC:", matthews_corrcoef(y_test, y_pred))


In [None]:

#23. Write a Python program to train Logistic Regression on both raw and standardized data. Compare their accuracy to see the impact of feature scaling
# Without scaling: split the raw features so the baseline really is unscaled
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, random_state=42)
model_raw = LogisticRegression(max_iter=1000)
model_raw.fit(X_train_raw, y_train)
acc_raw = accuracy_score(y_test, model_raw.predict(X_test_raw))

# With scaling: fit the scaler on the training split only, then transform both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_raw)
X_test_scaled = scaler.transform(X_test_raw)

model_scaled = LogisticRegression(max_iter=1000)
model_scaled.fit(X_train_scaled, y_train)
acc_scaled = accuracy_score(y_test, model_scaled.predict(X_test_scaled))

print(f"Raw Accuracy: {acc_raw:.4f}")
print(f"Scaled Accuracy: {acc_scaled:.4f}")


In [None]:
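#24. Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using cross-validation
from sklearn.model_selection import cross_val_score

best_c, best_score = None, -1.0
for c in [0.01, 0.1, 1, 10, 100]:
    model = LogisticRegression(C=c, max_iter=1000)
    mean_score = cross_val_score(model, X, y, cv=5).mean()
    print(f"C = {c}, CV Accuracy = {mean_score:.4f}")
    # Keep the C with the highest mean cross-validated accuracy
    if mean_score > best_score:
        best_c, best_score = c, mean_score

print(f"Best C = {best_c} (CV Accuracy = {best_score:.4f})")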


In [None]:

#25.Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to make predictions
import joblib

# Train and save
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
joblib.dump(model, 'logistic_model.pkl')

# Load and predict
loaded_model = joblib.load('logistic_model.pkl')
y_pred_loaded = loaded_model.predict(X_test)

print("Accuracy from Loaded Model:", accuracy_score(y_test, y_pred_loaded))
