# **Question & Answer**

**Q1. What is Logistic Regression, and how does it differ from Linear Regression?**
- Logistic Regression and Linear Regression are both supervised learning algorithms, but they solve different types of problems and differ in purpose, output, and mathematical function.

1. Purpose

| Aspect       | Logistic Regression                         | Linear Regression                                  |
| ------------ | ------------------------------------------- | -------------------------------------------------- |
| **Used For** | Classification problems (e.g., yes/no, 0/1) | Regression problems (predicting continuous values) |

2. Output

| Aspect               | Logistic Regression                           | Linear Regression                            |
| -------------------- | --------------------------------------------- | -------------------------------------------- |
| **Output Type**      | Probability (between 0 and 1)                 | Continuous numerical value (any real number) |
| **Final Prediction** | Usually converted to 0 or 1 using a threshold | Used directly as a continuous value          |

3. Mathematical Function

Linear Regression Equation:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

This outputs any real number.

Logistic Regression Equation:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}$$

This uses the sigmoid function to squeeze the output between 0 and 1.
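To make the difference between the two equations concrete, here is a minimal sketch that computes both outputs for the same inputs. The coefficients and samples are hypothetical values chosen only for illustration; they are not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration
beta0 = -1.0                     # intercept
beta = np.array([2.0, -0.5])     # one weight per feature

# Three hypothetical samples with two features each
X = np.array([
    [3.0, 1.0],
    [0.5, 0.0],
    [-2.0, 4.0],
])

linear_output = beta0 + X @ beta       # linear regression form: any real number
probability = sigmoid(linear_output)   # logistic regression form: value in (0, 1)

for z, p in zip(linear_output, probability):
    print(f"linear output = {z:6.2f}  ->  probability = {p:.4f}")
```

The same linear combination is used in both cases; only the final sigmoid step turns it into a probability.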

4. Interpretation

- Linear Regression models the relationship between the inputs and a real-valued output.
- Logistic Regression models the probability that an instance belongs to a class.

5. Loss Function

- Linear Regression: uses Mean Squared Error (MSE).
- Logistic Regression: uses Log Loss (also called Cross-Entropy Loss).
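The two loss functions can be sketched in a few lines of NumPy. The targets and predictions below are made-up numbers used only to show how each loss behaves; the helper names are my own and are not taken from any library.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """MSE: average squared difference between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

def log_loss(y_true, p, eps=1e-15):
    """Binary cross-entropy; p is the predicted probability of class 1."""
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Regression example (hypothetical values)
y_reg = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 8.0])
mse = mean_squared_error(y_reg, y_hat)

# Classification example (hypothetical values)
y_cls = np.array([1, 0, 1])
p_confident_right = np.array([0.9, 0.1, 0.8])
p_confident_wrong = np.array([0.1, 0.9, 0.2])
loss_right = log_loss(y_cls, p_confident_right)
loss_wrong = log_loss(y_cls, p_confident_wrong)

print(f"MSE: {mse:.4f}")
print(f"Log loss (good probabilities): {loss_right:.4f}")
print(f"Log loss (bad probabilities) : {loss_wrong:.4f}")
```

Log loss grows quickly when the model is confident and wrong, which is why it suits probability outputs better than MSE.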

6. Example

| Problem                             | Suitable Algorithm  |
| ----------------------------------- | ------------------- |
| Predicting house prices             | Linear Regression   |
| Predicting whether an email is spam | Logistic Regression |
| Estimating a person's weight        | Linear Regression   |
| Classifying if a tumor is malignant | Logistic Regression |



**Q2. Explain the role of the Sigmoid function in Logistic Regression?**
-  The sigmoid function plays a central role in Logistic Regression: it converts a linear combination of inputs into a probability between 0 and 1, which is essential for classification tasks.

The sigmoid function (also called the logistic function) is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

Role of the Sigmoid Function in Logistic Regression
1. Transforms Linear Output into Probability:
- In logistic regression, we compute a linear combination of input features:

$$z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$

- This value z can be any real number.
- The sigmoid function maps z into the range (0, 1), which can be interpreted as a probability.

2. Helps Make Binary Decisions:
- After converting the output to a probability, we apply a threshold (commonly 0.5):

- If  σ(z)≥0.5: predict class 1   
- If  σ(z)<0.5: predict class 0

3. Supports Gradient-Based Optimization:

- The sigmoid function is differentiable, which means it works well with optimization algorithms like Gradient Descent, used to minimize the log loss in logistic regression.

4. Graph of the Sigmoid Function

- S-shaped (sigmoid curve)

- Output is close to 0 when z≪0

- Output is close to 1 when z≫0

- Output is 0.5 when z=0

In [None]:
     |
  1  |                              *
     |                         *
  0.5|---------------*-------------------> z
     |        *
  0  | *
     |
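The three roles listed above (probability, thresholding, gradient-based optimization) can be seen together in a small from-scratch sketch. This is an illustration under simple assumptions (toy 1-D data, plain batch gradient descent, a hypothetical helper named `fit_logistic_gd`), not how scikit-learn fits its models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iter=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)        # predicted probabilities
        error = p - y                 # gradient of the log loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

# Toy data (hypothetical): larger x values belong to class 1
X = np.array([[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

w, b = fit_logistic_gd(X, y)
proba = sigmoid(X @ w + b)            # values in (0, 1)
pred = (proba >= 0.5).astype(int)     # threshold at 0.5

print("weight:", w, "bias:", b)
print("probabilities:", np.round(proba, 3))
print("predictions  :", pred)
```

Because the sigmoid is differentiable, the gradient of the log loss with respect to z reduces to the simple term `p - y`, which is what the update step uses.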


**Q3. What is Regularization in Logistic Regression and why is it needed?**
-  Regularization is a technique used to prevent overfitting in machine learning models by penalizing large coefficients (weights) in the model.

 In the context of Logistic Regression, regularization adds a penalty term to the loss function to discourage the model from fitting the training data too closely (which could hurt performance on new, unseen data).

Regularization is needed in logistic regression because:
- Without regularization, a logistic regression model can:
  - Learn very large weights for some features.
  - Overfit the training data (especially if there are many features or noisy data).
- Overfitting means the model performs well on training data but poorly on unseen/test data.
- Regularization helps the model generalize better.
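A quick way to see the penalty at work is to fit the same model with different regularization strengths and compare the size of the coefficients. The sketch below assumes scikit-learn's `LogisticRegression`, where `C` is the inverse of the regularization strength (smaller `C` means a stronger penalty). The features are standardized on the full dataset only because the goal here is to inspect coefficients, not to evaluate the model.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

coef_norms = {}
for C in [0.01, 1.0, 100.0]:
    # Default penalty is L2; smaller C = stronger regularization
    model = LogisticRegression(C=C, max_iter=10000)
    model.fit(X_scaled, y)
    coef_norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:>6}: L2 norm of coefficients = {coef_norms[C]:.3f}")
```

The coefficient norm should shrink as `C` decreases, which is the "penalizing large weights" effect described above.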

**Q4. What are some common evaluation metrics for classification models, and why are they important?**
Evaluation metrics are important because:
- They show how well your model performs, beyond accuracy alone.
- They guide you in choosing the best model for your application.
- Different problems (e.g., fraud detection, medical diagnosis) require different priorities (e.g., minimizing false negatives vs. false positives).

Common metrics:
1. Accuracy: Proportion of correct predictions out of all predictions.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

- Use Case: When classes are balanced.
- Limitation: Misleading when classes are imbalanced.

2. Precision: Out of all predicted positives, how many were actually positive?

$$\text{Precision} = \frac{TP}{TP + FP}$$

- Use Case: Important when false positives are costly (e.g., spam filters).

3. Recall (Sensitivity or True Positive Rate): Out of all actual positives, how many were correctly predicted?

$$\text{Recall} = \frac{TP}{TP + FN}$$

- Use Case: Important when false negatives are costly (e.g., cancer detection).

4. F1 Score: Harmonic mean of precision and recall.

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

- Use Case: When you want a balance between precision and recall.
- Useful for imbalanced classes.

5. Confusion Matrix

- A table showing True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
- Helps you understand the types of errors the model makes.
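The metrics above can all be derived from the four confusion-matrix counts. The sketch below uses a small made-up set of labels and predictions, computes each metric by hand, and prints scikit-learn's values next to them as a cross-check.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels and predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"Accuracy : {accuracy:.2f} (sklearn: {accuracy_score(y_true, y_pred):.2f})")
print(f"Precision: {precision:.2f} (sklearn: {precision_score(y_true, y_pred):.2f})")
print(f"Recall   : {recall:.2f} (sklearn: {recall_score(y_true, y_pred):.2f})")
print(f"F1 Score : {f1:.2f} (sklearn: {f1_score(y_true, y_pred):.2f})")
```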

6. ROC Curve & AUC (Area Under the Curve)
- ROC Curve: Plots True Positive Rate vs. False Positive Rate at different thresholds.
- AUC (Area Under Curve): Measures the model’s ability to distinguish between classes.
- AUC = 1: Perfect classifier
- AUC = 0.5: Random guessing
- Use Case: Great for comparing models across different threshold settings.
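As a small illustration of ROC and AUC, the sketch below scores four hypothetical samples. Unlike the metrics above, AUC is computed from the predicted scores (probabilities), not from thresholded 0/1 predictions.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# A model that ranks every positive above every negative
perfect_auc = roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9])

print("FPR:", fpr)
print("TPR:", tpr)
print(f"AUC: {auc:.2f}")
print(f"AUC of a perfect ranking: {perfect_auc:.2f}")
```

Here one negative (0.4) is scored above one positive (0.35), so three of the four positive/negative pairs are ranked correctly and the AUC is 0.75.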


**Q5. Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy. (Use Dataset from sklearn package)**

In [1]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Test Accuracy: {accuracy:.4f}")

Test Accuracy: 0.9561
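The program above builds the DataFrame directly from the sklearn dataset. If the data must actually come from a CSV file, one possible variant is to write the dataset to a CSV first and then load it with `pd.read_csv`. This is only a sketch: the file name `breast_cancer.csv` and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# as_frame=True gives a DataFrame that already includes the 'target' column
frame = load_breast_cancer(as_frame=True).frame

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "breast_cancer.csv")  # hypothetical file name
    frame.to_csv(csv_path, index=False)   # create the CSV file
    df = pd.read_csv(csv_path)            # load it back into a DataFrame

X = df.drop(columns="target")
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test Accuracy: {accuracy:.4f}")
```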


**Q6. Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.**

In [2]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(penalty='l2', C=1.0, solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Coefficients:")
for feature, coef in zip(X.columns, model.coef_[0]):
    print(f"{feature}: {coef:.4f}")

print(f"\nTest Accuracy: {accuracy:.4f}")

Model Coefficients:
mean radius: 2.1325
mean texture: 0.1528
mean perimeter: -0.1451
mean area: -0.0008
mean smoothness: -0.1426
mean compactness: -0.4156
mean concavity: -0.6519
mean concave points: -0.3445
mean symmetry: -0.2076
mean fractal dimension: -0.0298
radius error: -0.0500
texture error: 1.4430
perimeter error: -0.3039
area error: -0.0726
smoothness error: -0.0162
compactness error: -0.0019
concavity error: -0.0449
concave points error: -0.0377
symmetry error: -0.0418
fractal dimension error: 0.0056
worst radius: 1.2321
worst texture: -0.4046
worst perimeter: -0.0362
worst area: -0.0271
worst smoothness: -0.2626
worst compactness: -1.2090
worst concavity: -1.6180
worst concave points: -0.6153
worst symmetry: -0.7428
worst fractal dimension: -0.1170

Test Accuracy: 0.9561


**Q7. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)**

In [3]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(multi_class='ovr', solver='liblinear', max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Classification Report:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
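The `multi_class` argument appears to be deprecated in recent scikit-learn releases, so the cell above may emit a warning or stop working depending on the installed version. One possible alternative that keeps the one-vs-rest scheme is to wrap the model in `OneVsRestClassifier`. This is a sketch of that alternative, not a replacement for what the question asks.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# One binary logistic regression is trained per class (class k vs. the rest)
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
ovr_accuracy = accuracy_score(y_test, y_pred)

print("Number of binary classifiers:", len(ovr_model.estimators_))
print(classification_report(y_test, y_pred, target_names=data.target_names))
```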





**Q8. Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy**

In [None]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters to search: regularization strength (C) and penalty type
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
}

# The liblinear solver supports both the L1 and L2 penalties
grid = GridSearchCV(
    LogisticRegression(solver='liblinear', max_iter=1000),
    param_grid,
    scoring='accuracy',
    cv=5,
)
grid.fit(X_train, y_train)

# best_score_ is the mean cross-validated (validation) accuracy of the best setting
print("Best Parameters:", grid.best_params_)
print(f"Best Validation Accuracy: {grid.best_score_:.4f}")

# Check the tuned model on the held-out test set
y_pred = grid.best_estimator_.predict(X_test)
print(f"Test Accuracy: {accuracy_score(y_test, y_pred):.4f}")



**Q9. Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling**

In [5]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model_no_scaling = LogisticRegression(max_iter=10000)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_with_scaling = LogisticRegression(max_iter=10000)
model_with_scaling.fit(X_train_scaled, y_train)
y_pred_with_scaling = model_with_scaling.predict(X_test_scaled)
accuracy_with_scaling = accuracy_score(y_test, y_pred_with_scaling)

print(f"Accuracy WITHOUT scaling: {accuracy_no_scaling:.4f}")
print(f"Accuracy WITH scaling   : {accuracy_with_scaling:.4f}")


Accuracy WITHOUT scaling: 0.9561
Accuracy WITH scaling   : 0.9737
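The scaling step can also be bundled with the model in a scikit-learn pipeline, so the scaler is always fitted on training data only, including inside cross-validation. The sketch below is one way to write this; the variable names are my own.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline standardizes the features, then fits the classifier
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=10000))
scaled_model.fit(X_train, y_train)
test_accuracy = scaled_model.score(X_test, y_test)

# In cross-validation the scaler is re-fitted on each training fold
cv_scores = cross_val_score(scaled_model, X_train, y_train, cv=5)

print(f"Test accuracy with scaling pipeline: {test_accuracy:.4f}")
print(f"Mean 5-fold CV accuracy            : {cv_scores.mean():.4f}")
```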


**Q10. Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model — including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.**

In [None]:
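# Assumption: X (customer features) and y (1 = responded, 0 = did not respond)
# are already loaded and cleaned (missing values handled, categoricals encoded).
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data handling: stratify so both splits keep the ~5% response rate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Feature scaling: done inside a pipeline so the scaler is fitted on training folds only
# Balancing classes: class_weight='balanced' up-weights the rare responders
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(solver='liblinear', class_weight='balanced', max_iter=1000)),
])

# Hyperparameter tuning: search over regularization strength and penalty type
param_grid = {
    'clf__C': [0.01, 0.1, 1, 10],
    'clf__penalty': ['l1', 'l2'],
}

# Optimize F1 instead of accuracy, which is misleading with only 5% positives
grid = GridSearchCV(pipeline, param_grid, scoring='f1', cv=5)
grid.fit(X_train, y_train)

best_model = grid.best_estimator_

# Evaluation: per-class precision/recall/F1 plus ROC-AUC on the held-out test set
y_pred = best_model.predict(X_test)
y_proba = best_model.predict_proba(X_test)[:, 1]

print("Best Parameters:", grid.best_params_)
print(classification_report(y_test, y_pred))
print("ROC-AUC:", roc_auc_score(y_test, y_proba))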
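Since the real campaign data is not available here, the effect of class weighting can be sketched on synthetic data. The dataset below is generated with `make_classification` at roughly a 5% positive rate; it is an assumption standing in for the customer data, and the numbers it produces say nothing about a real campaign.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: about 5% positives (responders)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

results = {}
for label, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight=class_weight, max_iter=1000),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[label] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
    }
    print(f"{label:>10}: precision={results[label]['precision']:.3f}, "
          f"recall={results[label]['recall']:.3f}")
```

With balanced weights the model typically finds more of the rare responders (higher recall) at the cost of more false positives (lower precision). Which trade-off is acceptable depends on the cost of contacting a customer versus missing a responder.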