Q1) What is Logistic Regression, and how does it differ from Linear
Regression?

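ans-Logistic regression predicts categorical outcomes (such as Yes/No or Spam/Not Spam) by passing a linear combination of the features through a sigmoid curve to estimate a probability between 0 and 1. Linear regression instead predicts a continuous numerical value (such as price or temperature) directly from that linear combination, which is a straight line in the single-feature case. The two also differ in how they are fitted: logistic regression minimizes log loss (maximum likelihood), while linear regression usually minimizes squared error.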
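
The contrast can be seen on a small toy problem. The sketch below is only illustrative: the `hours` and `passed` arrays are made-up data, not part of the assignment. It fits both models to the same binary labels and compares their outputs for inputs far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (hypothetical): hours studied -> passed the exam (1) or not (0)
hours = np.arange(10).reshape(-1, 1)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(hours, passed)
logistic = LogisticRegression().fit(hours, passed)

# Inputs below, inside, and above the training range
new_hours = np.array([[-10], [4], [20]])
linear_out = linear.predict(new_hours)                   # unbounded real values
logistic_out = logistic.predict_proba(new_hours)[:, 1]   # probabilities of class 1

print("Linear regression output:    ", linear_out)
print("Logistic regression P(pass): ", logistic_out)
```

The linear model's outputs leave the 0 to 1 range for extreme inputs, so they cannot be read as probabilities, while the logistic model's outputs always stay inside it.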

Q2) Explain the role of the Sigmoid function in Logistic Regression.

ans-The sigmoid function, σ(z) = 1 / (1 + e^(-z)), maps the output z of the linear equation (any real number) to a value between 0 and 1, which the model interprets as the probability of the positive class. Because z equals the log-odds of that probability, each coefficient describes how a feature changes the log-odds. The S-shaped curve is smooth and saturates for extreme values of z, so very large or very small scores still give valid probabilities. A decision threshold (commonly 0.5) then turns the probability into a binary class label.
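
A minimal sketch of the function in plain Python follows. The helper names `sigmoid`, `log_odds`, and `predict_label` are chosen here for illustration, and the two-branch form is one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)

def log_odds(p: float) -> float:
    """Inverse of the sigmoid (the logit), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def predict_label(z: float, threshold: float = 0.5) -> int:
    """Turn a linear score z into a class label via the probability."""
    return int(sigmoid(z) >= threshold)

for z in (-5.0, 0.0, 5.0):
    print(f"z={z:+.1f}  probability={sigmoid(z):.4f}  label={predict_label(z)}")
```

A score of 0 corresponds to a probability of exactly 0.5, which is why the sign of the linear score decides the class at the default threshold.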

Q3) What is Regularization in Logistic Regression and why is it needed?

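ans-Regularization in Logistic Regression adds a penalty term on the size of the coefficients to the loss function. L2 (Ridge) regularization shrinks all coefficients toward zero, while L1 (Lasso) can set some of them exactly to zero. This keeps the model simpler and helps it generalize to new data. It is needed because an unregularized logistic model can fit noise in the training data, especially with many or correlated features, and its coefficients can grow without bound when the classes are perfectly separable. In scikit-learn the strength is set by C, the inverse of the regularization strength, so a smaller C means a stronger penalty.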
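
The shrinking effect can be checked directly. The sketch below is an illustration rather than part of the assignment: it fits the default L2-penalized model on standardized iris features for three values of C (chosen arbitrarily) and prints the overall size of the coefficient matrix.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put features on a common scale

norms = {}
for C in (100.0, 1.0, 0.01):           # weak -> strong regularization
    model = LogisticRegression(C=C, max_iter=5000)
    model.fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} coefficient norm = {norms[C]:.3f}")
```

The coefficient norm drops as C decreases, which is the penalty pulling the weights toward zero.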

Q4) What are some common evaluation metrics for classification models, and why are they important?

ans-Common Classification Metrics
Accuracy: (Correct Predictions / Total Predictions) - Good for balanced datasets, but misleading when classes are uneven (e.g., if only 1% of patients have a rare disease, a model that always predicts "healthy" reaches 99% accuracy while detecting no cases).
Precision: (True Positives / (True Positives + False Positives)) - How many predicted positives were actually positive? Important when False Positives are costly (e.g., flagging a benign tumor as malignant).
Recall (Sensitivity/True Positive Rate): (True Positives / (True Positives + False Negatives)) - How many actual positives did the model find? Key when False Negatives are severe (e.g., missing a cancer diagnosis).
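F1-Score: Harmonic mean of Precision and Recall, 2 × (Precision × Recall) / (Precision + Recall) - Provides a single score balancing both, useful for imbalanced datasets where the positive class matters most.
AUC-ROC (Area Under the ROC Curve): The ROC curve plots True Positive Rate vs. False Positive Rate across all thresholds, and AUC is the area under it - Measures overall discriminative ability independent of any single threshold. It is less affected by class imbalance than accuracy, but it can look optimistic when positives are very rare, where the precision-recall curve is often more informative.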
Log Loss (Cross-Entropy): Measures probability-based errors - Penalizes confident wrong predictions more heavily, good for models outputting probabilities (like logistic regression).
Why They Are Important
Contextual Performance: No single metric tells the whole story; different metrics highlight different strengths and weaknesses (e.g., high recall but low precision).
Cost of Errors: They help quantify the business or real-world cost of different errors (False Positives vs. False Negatives).
Class Imbalance: Metrics like Accuracy fail with imbalanced data; Precision, Recall, F1, and AUC become vital to ensure the minority class isn't ignored.
Threshold Tuning: AUC-ROC helps evaluate performance independent of a specific decision threshold, while metrics like Precision/Recall guide threshold selection to meet specific needs (e.g., prioritize precision for search results, recall for fraud detection).
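
The formulas above can be worked through by hand. The sketch below uses made-up confusion-matrix counts for 1,000 cases with 10 positives; the function name `classification_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when a ratio is undefined
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Model A predicts "negative" for everyone; model B finds 8 of the 10 positives
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)
finds_positives = classification_metrics(tp=8, fp=12, fn=2, tn=978)

print("Always negative:", always_negative)
print("Finds positives:", finds_positives)
```

Model A has the higher accuracy (0.99 against 0.986) yet a recall of 0, while model B reaches a recall of 0.8 with a precision of 0.4, which is the situation the imbalance warning describes.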

In [1]:
'''Q5 Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
(Use Dataset from sklearn package)
'''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd

# Load the iris data into a DataFrame with a 'target' column
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Separate features and label, then hold out 20% of the rows for testing
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model and measure accuracy on the held-out test set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

Accuracy: 1.0


In [2]:
''' Q6 Write a Python program to train a Logistic Regression model using L2
regularization (Ridge) and print the model coefficients and accuracy.
'''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd

# Load the iris data into a DataFrame with a 'target' column
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L2 (Ridge) penalty; C is the inverse regularization strength (smaller C = stronger penalty)
model = LogisticRegression(penalty='l2', C=1.0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# coef_ has one row per class and one column per feature
print(f"Coefficients: {model.coef_}")

Accuracy: 1.0
Coefficients: [[-0.39345607  0.96251768 -2.37512436 -0.99874594]
 [ 0.50843279 -0.25482714 -0.21301129 -0.77574766]
 [-0.11497673 -0.70769055  2.58813565  1.7744936 ]]


In [3]:
''' Q7 Write a Python program to train a Logistic Regression model for multiclass
classification using multi_class='ovr' and print the classification report.
(Use Dataset from sklearn package)
'''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import pandas as pd

# Load the iris data into a DataFrame with a 'target' column
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# One-vs-rest: fit one binary classifier per class.
# Note: the multi_class parameter is deprecated since scikit-learn 1.5.
model = LogisticRegression(multi_class='ovr')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, F1 and support on the test set
report = classification_report(y_test, y_pred)
print(report)


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
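
Because `multi_class` is deprecated in recent scikit-learn releases, the same one-vs-rest scheme can also be expressed with the `OneVsRestClassifier` wrapper. The sketch below is an alternative to the cell above, not the assignment's required form; `max_iter=1000` is an assumption added to give the solver room to converge, and no output is shown because the numbers may differ slightly from the report above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One binary logistic regression per class, wrapped explicitly
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr_model.fit(X_train, y_train)
y_pred = ovr_model.predict(X_test)

print(classification_report(y_test, y_pred))
```

After fitting, `ovr_model.estimators_` holds the three underlying binary models, one per iris class.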





In [6]:
''' Q8 Write a Python program to apply GridSearchCV to tune C and penalty
hyperparameters for Logistic Regression and print the best parameters and validation
accuracy.
(Use Dataset from sklearn package)
'''
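from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd
import warnings

# Caveat: this also hides the warnings about candidates that fail to fit (see below)
warnings.filterwarnings('ignore')

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Caveat: the default lbfgs solver does not support penalty='l1', so those
# candidates fail to fit and are scored NaN; only the 'l2' candidates are
# actually compared. A solver such as 'saga' supports both penalties.
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100], 'penalty': ['l1', 'l2']}
grid_search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Note: this is accuracy on the held-out test set. The mean cross-validated
# (validation) accuracy of the best candidate is grid_search.best_score_.
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Best Parameters: {best_params}")
print(f"Validation Accuracy: {accuracy}")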


Best Parameters: {'C': 1, 'penalty': 'l2'}
Validation Accuracy: 1.0
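
With the default `lbfgs` solver, the `'l1'` candidates in the grid above cannot be fitted. A variant that searches both penalties is sketched below, under a few assumptions that are not in the original cell: the `saga` solver (which supports L1 and L2), a `StandardScaler` inside a `Pipeline` because `saga` converges better on scaled features, and `max_iter=5000`. It reports both the cross-validated accuracy and the test accuracy; no output is shown since the selected parameters may differ from those printed above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline so it is refit on each CV training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="saga", max_iter=5000, random_state=0)),
])
param_grid = {
    "clf__C": [0.001, 0.01, 0.1, 1, 10, 100],
    "clf__penalty": ["l1", "l2"],
}
grid_search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

# Count candidates whose fits failed (scored NaN); expected to be zero here
n_failed = int(np.isnan(grid_search.cv_results_["mean_test_score"]).sum())

print("Best parameters:", grid_search.best_params_)
print("Cross-validated accuracy:", grid_search.best_score_)
print("Test accuracy:", grid_search.score(X_test, y_test))
print("Failed candidates:", n_failed)
```

Here `best_score_` is the validation figure the question asks for, and the test accuracy is kept as a separate final check.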


In [7]:
''' Q9 Write a Python program to standardize the features before training Logistic
Regression and compare the model's accuracy with and without scaling.
'''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Load the iris data into a DataFrame with a 'target' column
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model trained on standardized features
model_scaled = LogisticRegression()
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Model trained on the raw features
model_unscaled = LogisticRegression()
model_unscaled.fit(X_train, y_train)
y_pred_unscaled = model_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# On iris the two scores can coincide: all four features are in centimetres
# with similar ranges, and the test set has only 30 samples.
print(f"Accuracy with Scaling: {accuracy_scaled}")
print(f"Accuracy without Scaling: {accuracy_unscaled}")


Accuracy with Scaling: 1.0
Accuracy without Scaling: 1.0
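
A single 30-sample test split gives a coarse comparison. One possible refinement, sketched below as an assumption rather than as part of the assignment, is to compare the two setups with 5-fold cross-validation and to keep the scaler inside a pipeline so it is refit on each training fold. `max_iter=1000` is added here to give the unscaled model room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

unscaled = LogisticRegression(max_iter=1000)
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Accuracy on each of 5 folds for both setups
scores_unscaled = cross_val_score(unscaled, X, y, cv=5)
scores_scaled = cross_val_score(scaled, X, y, cv=5)

print(f"Without scaling: mean={scores_unscaled.mean():.3f} std={scores_unscaled.std():.3f}")
print(f"With scaling:    mean={scores_scaled.mean():.3f} std={scores_scaled.std():.3f}")
```

The spread across folds gives a sense of whether any gap between the two means is larger than the noise.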


Q10) Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced
dataset (only 5% of customers respond), describe the approach you’d take to build a
Logistic Regression model, including data handling, feature scaling, balancing
classes, hyperparameter tuning, and evaluating the model for this real-world business
use case.

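| Step                  | Why It Matters                                                                                |
| --------------------- | --------------------------------------------------------------------------------------------- |
| Stratified split      | Keeps the 5% responder rate in both the train and test sets                                   |
| Feature scaling       | Helps the solver converge and lets the regularization penalty treat features evenly           |
| Class weighting       | Stops the model from ignoring the 5% of customers who respond                                 |
| Hyperparameter tuning | Cross-validated search over C and penalty, scored with an imbalance-aware metric              |
| Proper metrics        | Accuracy is misleading (predicting "no response" for everyone scores 95%); use precision, recall, F1, and PR-AUC |
| Threshold tuning      | Aligns the precision/recall trade-off with business goals                                     |
| Interpretability      | Coefficients show which features drive response, which builds stakeholder trust               |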
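
The steps in the table could be combined roughly as follows. This is a sketch under stated assumptions, not a production recipe: the data is a synthetic stand-in generated with `make_classification` (about 5% positives), the grid of C values is arbitrary, and the 80% recall target is a hypothetical business requirement. The threshold is chosen from out-of-fold predictions on the training set so the test set stays untouched until the final report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, classification_report,
                             precision_recall_curve)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_predict, train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data (assumption): ~5% responders
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           n_clusters_per_class=1, weights=[0.95, 0.05],
                           random_state=42)

# Stratified split keeps the responder rate the same in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling + class-weighted logistic regression in one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune C with stratified CV, scored by PR-AUC (average precision)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]},
                      scoring="average_precision", cv=cv)
search.fit(X_train, y_train)

# Threshold tuning on out-of-fold training predictions:
# pick the highest threshold that still reaches the target recall
target_recall = 0.80  # hypothetical business requirement
train_proba = cross_val_predict(search.best_estimator_, X_train, y_train,
                                cv=cv, method="predict_proba")[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, train_proba)
meets_target = np.where(recall[:-1] >= target_recall)[0]
threshold = thresholds[meets_target[-1]]

# Final evaluation on the untouched test set
test_proba = search.predict_proba(X_test)[:, 1]
test_pr_auc = average_precision_score(y_test, test_proba)
y_pred = (test_proba >= threshold).astype(int)

print("Best C:", search.best_params_["clf__C"])
print(f"Chosen threshold: {threshold:.3f}")
print(f"Test PR-AUC: {test_pr_auc:.3f}")
print(classification_report(y_test, y_pred, digits=3))

# Interpretability: coefficients of the fitted model, one per (scaled) feature
print("Coefficients:", search.best_estimator_.named_steps["clf"].coef_.round(3))
```

The recall target would in practice come from campaign economics, for example how many missed responders the business can accept against the cost of contacting non-responders.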
