**Theoretical**


1 What is Logistic Regression, and how does it differ from Linear Regression?
-### **Logistic Regression vs. Linear Regression**

Both **Logistic Regression** and **Linear Regression** are supervised learning algorithms used in statistical modeling and machine learning. However, they are used for different types of tasks and have distinct characteristics.

---

### **1. Logistic Regression**
- **Purpose**: Used for **classification problems** (especially binary classification).
- **Output**: Produces probabilities (values between 0 and 1) which are mapped to classes.
- **Mathematical Model**:
  - Instead of modeling a straight line, it applies the **sigmoid function (logistic function)** to restrict output values between 0 and 1:
    \[
    P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
    \]
  - The decision boundary is determined based on a threshold (e.g., 0.5 for binary classification).
- **Types**:
  - **Binary Logistic Regression** (two classes)
  - **Multinomial Logistic Regression** (multiple classes)
  - **Ordinal Logistic Regression** (ordered classes)

---

### **2. Linear Regression**
- **Purpose**: Used for **regression problems** (predicting continuous values).
- **Output**: Produces real-valued outputs (e.g., predicting house prices, stock prices).
- **Mathematical Model**:
  - Models a straight-line relationship between input \(X\) and output \(Y\):
    \[
    Y = \beta_0 + \beta_1 X + \epsilon
    \]
  - The coefficients (\(\beta_0, \beta_1\)) are estimated using the **least squares method**.

---

### **3. Key Differences**
| Feature             | Logistic Regression           | Linear Regression           |
|---------------------|-----------------------------|-----------------------------|
| **Type of Problem** | Classification (e.g., Yes/No, Spam/Not Spam) | Regression (e.g., Price Prediction) |
| **Output Values** | Probabilities (0 to 1) | Continuous values (any real number) |
| **Function Used** | Sigmoid function | Linear function |
| **Decision Boundary** | Threshold-based (e.g., \( P \geq 0.5 \Rightarrow 1, P < 0.5 \Rightarrow 0 \)) | Not applicable (predicts values directly) |
| **Error Measurement** | Log-loss (cross-entropy loss) | Mean Squared Error (MSE) |

---

### **4. When to Use Which?**
- Use **Logistic Regression** when you need to classify data into categories (e.g., fraud detection, spam classification).
- Use **Linear Regression** when you need to predict continuous outcomes (e.g., predicting sales revenue, house prices).



2 What is the mathematical equation of Logistic Regression?
-The **mathematical equation of Logistic Regression** is based on the **sigmoid function (logistic function)**, which maps any real-valued number to a value between 0 and 1. The equation is:

### **1. Probability Estimation**
For a given input \( X \), the probability that the output \( Y \) belongs to class 1 (instead of class 0) is given by:

\[
P(Y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n)}}
\]

where:
- \( P(Y = 1 | X) \) is the probability that \( Y \) belongs to class 1.
- \( \beta_0 \) is the intercept (bias).
- \( \beta_1, \beta_2, ..., \beta_n \) are the regression coefficients (weights).
- \( X_1, X_2, ..., X_n \) are the input features.
- \( e \) is Euler’s number (\(\approx 2.718\)).

This is the **sigmoid function**:

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

where \( z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n \).
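
As a minimal sketch of the equation above, the following Python computes \( P(Y = 1 | X) \) for a single example. The coefficient values are hypothetical, chosen only so that \( z = 0 \); they do not come from a fitted model.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: mathematically maps any real z into (0, 1)."""
    # Branching on the sign avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, intercept, coefs):
    """P(Y = 1 | X) for one example with feature values x."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(z)

# Hypothetical coefficients, for illustration only.
beta_0 = -1.0
betas = [2.0, -0.5]

# z = -1 + 2*1 - 0.5*2 = 0, so the predicted probability is sigmoid(0).
p = predict_proba([1.0, 2.0], beta_0, betas)
print(p)  # 0.5
```

Any other feature vector shifts \( z \) away from zero and moves the probability toward 0 or 1 accordingly.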

---

### **2. Decision Boundary**
- If \( P(Y = 1 | X) \geq 0.5 \), we classify \( Y = 1 \).
- If \( P(Y = 1 | X) < 0.5 \), we classify \( Y = 0 \).

This decision threshold can be adjusted based on the problem.

---

### **3. Log-Odds Form (Logit Function)**
Taking the **log-odds (logit function)** of the probability:

\[
\log \left( \frac{P(Y=1 | X)}{1 - P(Y=1 | X)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n
\]

This shows that Logistic Regression models a **linear relationship between the log-odds of the outcome and the input features**.
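
A quick numerical check of this relationship, as a sketch: applying the logit to a sigmoid output recovers \( z \), and a one-unit increase in a feature multiplies the odds by \( e^{\beta_j} \). The values `beta_0 = -1.0` and `beta_1 = 0.7` are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# The logit undoes the sigmoid, so the log-odds equal the linear term z.
for z in (-2.0, 0.0, 1.5):
    print(z, logit(sigmoid(z)))

# Odds ratio for a one-unit increase in a single feature.
beta_0, beta_1 = -1.0, 0.7   # assumed values, for illustration only

def odds(x):
    p = sigmoid(beta_0 + beta_1 * x)
    return p / (1.0 - p)

odds_ratio = odds(3.0) / odds(2.0)
print(odds_ratio, math.exp(beta_1))  # the two values agree up to rounding
```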



3 Why do we use the Sigmoid function in Logistic Regression?
-### **Why Do We Use the Sigmoid Function in Logistic Regression?**

The **Sigmoid function** (also called the **logistic function**) is used in Logistic Regression because it helps map any real-valued number to a probability between **0 and 1**, making it ideal for **classification tasks**.

---

### **1. Definition of Sigmoid Function**
The Sigmoid function is mathematically defined as:

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

where:
- \( z \) is the linear combination of input features and weights:
  
  \[
  z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n
  \]

- \( e \) is Euler’s number (\(\approx 2.718\)).

---

### **2. Key Reasons for Using Sigmoid in Logistic Regression**
#### **(a) Converts Any Input into a Probability (0 to 1)**
- The output of the Sigmoid function is always in the range \( (0,1) \), making it suitable for **probability estimation**.
- A value closer to 1 means **higher confidence** in class **1**, while a value closer to 0 means **higher confidence** in class **0**.

#### **(b) Maps Linear Log-Odds to a Non-linear Probability Curve**
- Logistic Regression models a linear relationship between the input features and the **log-odds** of the outcome.
- The **sigmoid function** maps those log-odds non-linearly onto probabilities in \( (0,1) \). The decision boundary itself remains **linear** in the features.

#### **(c) Helps in Decision Making**
- We can set a decision threshold (e.g., **0.5**).
  - If \( \sigma(z) \geq 0.5 \), classify as **1**.
  - If \( \sigma(z) < 0.5 \), classify as **0**.
- This makes it easy to separate two classes.

#### **(d) Differentiable and Enables Gradient Descent Optimization**
- The Sigmoid function is **smooth and differentiable**, which is essential for **gradient descent** to update the model parameters efficiently.
- Its derivative is:

  \[
  \sigma'(z) = \sigma(z) (1 - \sigma(z))
  \]

  This property makes it computationally efficient for training the model.
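
The derivative identity can be checked numerically. The sketch below compares \( \sigma(z)(1 - \sigma(z)) \) with a central finite difference; the step size `h` is an arbitrary choice for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Analytic derivative: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numerical_grad(f, z, h=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-3.0, 0.0, 2.0):
    print(z, sigmoid_grad(z), numerical_grad(sigmoid, z))
```

The derivative peaks at \( z = 0 \) with value 0.25 and shrinks toward zero as \( |z| \) grows.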

---

### **3. Visualization of the Sigmoid Function**
If we plot \( \sigma(z) \), we get an S-shaped curve:

\[
\begin{array}{c|c}
z & \sigma(z) \\
\hline
- \infty & 0 \\
0 & 0.5 \\
+ \infty & 1 \\
\end{array}
\]

- For large **negative** values of \( z \), \( \sigma(z) \approx 0 \) (predicts class **0**).
- For large **positive** values of \( z \), \( \sigma(z) \approx 1 \) (predicts class **1**).
- At \( z = 0 \), \( \sigma(z) = 0.5 \), meaning equal probability for both classes.

---

### **Conclusion**
The Sigmoid function is used in Logistic Regression because:
1. It **squashes outputs** into a probability range (0 to 1).
2. It **maps the linear log-odds onto an S-shaped probability curve**, which suits classification.
3. It provides a **natural decision boundary** for classification.
4. It allows for **gradient-based optimization**.



4 What is the cost function of Logistic Regression?
-### **Cost Function of Logistic Regression**
The cost function in Logistic Regression measures how well the model's predictions match the actual labels. Instead of using the **Mean Squared Error (MSE)** (as in Linear Regression), we use the **Log Loss (Cross-Entropy Loss)** because Logistic Regression deals with probabilities and classification.

---

### **1. Why Not Use Mean Squared Error (MSE)?**
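- If we use **MSE** with the sigmoid output, the cost function becomes **non-convex** in the parameters, so gradient descent can get stuck in a local minimum instead of reaching the global minimum.
- MSE also penalizes confident wrong predictions only mildly (the squared error is at most 1), whereas log loss grows without bound as the predicted probability of the true class approaches 0.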

Instead, we use a loss function derived from **Maximum Likelihood Estimation (MLE)**, which leads to the **Log Loss (Cross-Entropy Loss)**.

---

### **2. Log Loss (Cross-Entropy Loss)**
For a binary classification problem, where:
- \( y = 1 \) represents the positive class.
- \( y = 0 \) represents the negative class.
- \( \hat{y} \) (or \( P(Y=1 | X) \)) is the predicted probability from the sigmoid function.

The **cost function for a single training example** is:

\[
J(\beta) = - \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]
\]

- If **\( y = 1 \)** (positive class):  
  \[
  J(\beta) = - \log(\hat{y})
  \]
  - If \( \hat{y} \) is close to **1** → **loss is small**.
  - If \( \hat{y} \) is close to **0** → **loss is large**.

- If **\( y = 0 \)** (negative class):  
  \[
  J(\beta) = - \log(1 - \hat{y})
  \]
  - If \( \hat{y} \) is close to **0** → **loss is small**.
  - If \( \hat{y} \) is close to **1** → **loss is large**.
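
The four cases above can be reproduced with a few lines of Python. This is a sketch: the clipping constant `eps` is an assumption added to avoid `log(0)`, not part of the formula itself.

```python
import math

def log_loss_single(y, y_hat, eps=1e-15):
    """Cross-entropy loss for one example with label y in {0, 1}."""
    # Clip the prediction so that log(0) never occurs.
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

print(log_loss_single(1, 0.9))  # about 0.105: confident and correct
print(log_loss_single(1, 0.1))  # about 2.303: confident and wrong
print(log_loss_single(0, 0.1))  # about 0.105: confident and correct
print(log_loss_single(0, 0.9))  # about 2.303: confident and wrong
```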

---

### **3. Cost Function for the Entire Dataset**
For **\( m \) training examples**, the total cost function is:

\[
J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
\]

where:
- \( m \) = number of training samples.
- \( y^{(i)} \) = actual label of the \( i^{th} \) example.
- \( \hat{y}^{(i)} \) = predicted probability for the \( i^{th} \) example.

---

### **4. Properties of Log Loss**
- **Convex Function**: Any local minimum is also a global minimum, so **Gradient Descent** cannot get trapped in a poor local optimum.
- **Punishes Incorrect Predictions Heavily**: If the model is very confident but wrong, the loss is large.
- **Encourages Probabilistic Confidence**: The model learns to output high probabilities for correct predictions.

---

### **5. Optimization Using Gradient Descent**
To minimize the cost function, we use **Gradient Descent**, updating the weights \( \beta \) using:

\[
\beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j}
\]

where:
- \( \alpha \) = learning rate.
- \( \frac{\partial J}{\partial \beta_j} \) = gradient of the cost function.
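
For log loss, the gradient works out to \( \frac{\partial J}{\partial \beta_j} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) x_j^{(i)} \). The sketch below applies the update rule to a small made-up dataset; the data, the learning rate `alpha = 0.1`, and the number of steps are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(xi, b0, b):
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))

def cost(X, y, b0, b):
    """Average log loss over the dataset."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(xi, b0, b), 1e-15), 1.0 - 1e-15)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(y)

def gradient_step(X, y, b0, b, alpha):
    """One batch gradient descent update of the intercept and the weights."""
    m = len(y)
    errors = [predict(xi, b0, b) - yi for xi, yi in zip(X, y)]
    grad_b0 = sum(errors) / m
    grad_b = [sum(e * xi[j] for e, xi in zip(errors, X)) / m
              for j in range(len(b))]
    return b0 - alpha * grad_b0, [bj - alpha * gj for bj, gj in zip(b, grad_b)]

# Made-up data: one feature, with overlapping classes so the minimum is finite.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

b0, b = 0.0, [0.0]
history = [cost(X, y, b0, b)]
for _ in range(2000):
    b0, b = gradient_step(X, y, b0, b, alpha=0.1)
    history.append(cost(X, y, b0, b))

print("initial cost:", history[0])
print("final cost:  ", history[-1])
print("intercept:", b0, "weights:", b)
```

With a small enough learning rate, the cost decreases at every step because the log loss is convex.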

---

### **Conclusion**
- The **Log Loss (Cross-Entropy Loss)** is the standard cost function for Logistic Regression, because it follows directly from Maximum Likelihood Estimation.
- It rewards confident correct predictions and heavily penalizes confident wrong ones.
- We use **Gradient Descent** to minimize it efficiently.



5  What is Regularization in Logistic Regression? Why is it needed?
-### **Regularization in Logistic Regression**
Regularization is a technique used in Logistic Regression to **prevent overfitting** by adding a penalty term to the cost function. This helps control the complexity of the model and improves its generalization to unseen data.

---

### **1. Why is Regularization Needed?**
Without regularization, Logistic Regression can **overfit** the training data, especially when:
- There are **too many features** (high-dimensional data).
- The dataset is **small** relative to the number of features.
- The model becomes too complex by assigning **large weights** to certain features.

Regularization **shrinks the weights**, preventing them from becoming too large and making the model more robust.

---

### **2. Types of Regularization**
Regularization is applied by adding a **penalty term** to the **cost function** of Logistic Regression.

#### **(a) L1 Regularization (Lasso Regression)**
- Uses the **absolute values** of the coefficients.
- The modified cost function:

  \[
  J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] + \lambda \sum_{j=1}^{n} |\beta_j|
  \]

- **Effect**: Drives some coefficients to **exactly zero**, effectively selecting important features (**feature selection**).
- Useful when we expect **only a few features** to be important.

---

#### **(b) L2 Regularization (Ridge Regression)**
- Uses the **squared values** of the coefficients.
- The modified cost function:

  \[
  J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] + \lambda \sum_{j=1}^{n} \beta_j^2
  \]

- **Effect**: Shrinks all coefficients but **does not set them to zero**.
- Helps in handling **multicollinearity** (high correlation between features).

---

#### **(c) Elastic Net (Combination of L1 and L2)**
- A mix of **L1 and L2 regularization**:

  \[
  J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] + \lambda_1 \sum_{j=1}^{n} |\beta_j| + \lambda_2 \sum_{j=1}^{n} \beta_j^2
  \]

- **Effect**: Selects important features (like L1) while still keeping some small weights (like L2).
- Useful when there are **many correlated features**.

---

### **3. How to Choose the Regularization Parameter (\( \lambda \))?**
- \( \lambda \) controls the amount of regularization:
  - **High \( \lambda \)** → More penalty → **Simpler model (risk of underfitting)**.
  - **Low \( \lambda \)** → Less penalty → **More complex model (risk of overfitting)**.
- The optimal \( \lambda \) is chosen using **cross-validation**.
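
The three penalty terms can be written as small functions. This is a sketch: the coefficient values and the unregularized loss of `0.40` are made up, and the intercept \( \beta_0 \) is left out of the penalty, matching the sums above that start at \( j = 1 \).

```python
def l1_penalty(coefs, lam):
    """lambda * sum of absolute coefficient values."""
    return lam * sum(abs(b) for b in coefs)

def l2_penalty(coefs, lam):
    """lambda * sum of squared coefficient values."""
    return lam * sum(b * b for b in coefs)

def elastic_net_penalty(coefs, lam1, lam2):
    return l1_penalty(coefs, lam1) + l2_penalty(coefs, lam2)

def regularized_cost(log_loss_value, coefs, lam1=0.0, lam2=0.0):
    """Unregularized log loss plus the chosen penalty."""
    return log_loss_value + elastic_net_penalty(coefs, lam1, lam2)

coefs = [0.5, -2.0, 0.0]   # beta_1..beta_n, made-up values (no intercept)
base_loss = 0.40           # assumed unregularized log loss

print(l1_penalty(coefs, 0.1))                       # 0.1 * 2.5, about 0.25
print(l2_penalty(coefs, 0.1))                       # 0.1 * 4.25, about 0.425
print(regularized_cost(base_loss, coefs, 0.1, 0.1)) # about 1.075
```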

---

### **4. Conclusion**
- Regularization is crucial in **high-dimensional datasets** to prevent overfitting.
- **L1 Regularization (Lasso)** → Feature selection (some coefficients become zero).
- **L2 Regularization (Ridge)** → Shrinks coefficients but keeps them all.
- **Elastic Net** → Combines both L1 and L2.


6  Explain the difference between Lasso, Ridge, and Elastic Net regression
-### **Difference Between Lasso, Ridge, and Elastic Net Regression**
Lasso, Ridge, and Elastic Net are types of **regularization techniques** used in regression models (including **Logistic Regression** and **Linear Regression**) to prevent **overfitting** by adding a penalty to the model’s coefficients.

---

## **1. Ridge Regression (L2 Regularization)**
- **Penalty Term: Sum of squared coefficients**
  
  \[
  J(\beta) = \text{Loss} + \lambda \sum_{j=1}^{n} \beta_j^2
  \]
  
- **Effect on Coefficients**:
  - Shrinks all coefficients towards zero **but does not make them exactly zero**.
  - Retains all features but reduces their impact.

- **Best For**:
  - **Multicollinear Data** (when features are highly correlated).
  - Situations where **all features contribute to prediction**.

---

## **2. Lasso Regression (L1 Regularization)**
- **Penalty Term: Sum of absolute values of coefficients**
  
  \[
  J(\beta) = \text{Loss} + \lambda \sum_{j=1}^{n} |\beta_j|
  \]

- **Effect on Coefficients**:
  - Some coefficients are **shrunk to exactly zero**, effectively removing them.
  - Performs **feature selection** by eliminating less important features.

- **Best For**:
  - When we suspect **only a few features are important**.
  - High-dimensional datasets with **many irrelevant features**.

---

## **3. Elastic Net Regression (Combination of L1 & L2)**
- **Penalty Term: Combination of L1 and L2**
  
  \[
  J(\beta) = \text{Loss} + \lambda_1 \sum_{j=1}^{n} |\beta_j| + \lambda_2 \sum_{j=1}^{n} \beta_j^2
  \]

- **Effect on Coefficients**:
  - Selects important features like **Lasso (L1)**.
  - Shrinks coefficients like **Ridge (L2)** and avoids some of Lasso’s issues with correlated features.

- **Best For**:
  - When there are **many correlated features**.
  - Situations where we need **feature selection but don’t want too many zeros**.
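
The different effects on coefficients are easiest to see on a one-coefficient toy problem. The sketch below assumes a squared-error loss \( \tfrac{1}{2}(\beta - z)^2 \) as a stand-in for the real loss, where \( z \) is the unpenalized estimate; with that assumption each penalty has a closed-form minimizer. It illustrates the shrinkage behavior only and is not how a Logistic Regression solver works.

```python
import math

def soft_threshold(z, lam):
    """Move z toward zero by lam, stopping at exactly zero."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def ridge_1d(z, lam):
    # Minimizer of 0.5*(b - z)**2 + lam * b**2
    return z / (1.0 + 2.0 * lam)

def lasso_1d(z, lam):
    # Minimizer of 0.5*(b - z)**2 + lam * |b|
    return soft_threshold(z, lam)

def elastic_net_1d(z, lam1, lam2):
    # Minimizer of 0.5*(b - z)**2 + lam1 * |b| + lam2 * b**2
    return soft_threshold(z, lam1) / (1.0 + 2.0 * lam2)

unpenalized = [3.0, 0.4, -1.2]   # made-up unpenalized estimates
for z in unpenalized:
    print(z, ridge_1d(z, 0.5), lasso_1d(z, 0.5), elastic_net_1d(z, 0.5, 0.5))
```

Ridge rescales every value and never reaches zero, Lasso sets the small estimate `0.4` to exactly zero, and Elastic Net does both: it zeroes the small estimate and rescales the rest.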

---

## **Comparison Table**

| Feature           | Ridge (L2) | Lasso (L1) | Elastic Net (L1 + L2) |
|------------------|-----------|------------|------------------|
| **Regularization Type** | L2 (squared sum of coefficients) | L1 (absolute sum of coefficients) | Combination of L1 and L2 |
| **Effect on Coefficients** | Shrinks coefficients but keeps all | Shrinks some coefficients to **zero** (feature selection) | Shrinks some coefficients to zero while keeping others small |
| **Feature Selection?** | **No** (keeps all features) | **Yes** (eliminates some features) | **Yes** (controlled feature selection) |
| **Best When** | Features are correlated & all are important | Some features are irrelevant | Features are correlated & some need to be eliminated |
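| **Computational Cost** | Low (smooth penalty, standard gradient-based solvers apply) | Higher (the L1 term is not differentiable at zero, so specialized solvers are needed) | Similar to Lasso (also contains an L1 term) |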

---

## **Which One to Use?**
- **Use Ridge (L2)** if all features are relevant but need to be controlled (e.g., reducing the effect of multicollinearity).
- **Use Lasso (L1)** if you want to automatically select important features.
- **Use Elastic Net** if there are **many correlated features** and you want a balance between feature selection & shrinkage.


7 When should we use Elastic Net instead of Lasso or Ridge?
-### **When to Use Elastic Net Instead of Lasso or Ridge?**  

Elastic Net is a combination of **Lasso (L1)** and **Ridge (L2)** regularization. It is particularly useful in scenarios where neither Lasso nor Ridge alone performs optimally.

---

### **1. When Features Are Highly Correlated (Multicollinearity)**
- **Lasso (L1) struggles** when features are highly correlated because it tends to keep one feature from a correlated group, somewhat arbitrarily, and zero out the others.
- **Ridge (L2) keeps all features**, reducing the impact of multicollinearity but not performing feature selection.
- **Elastic Net solves this by grouping correlated features together** rather than arbitrarily selecting one.

💡 *Example:* If multiple features in a dataset represent similar information (e.g., different but related economic indicators), Elastic Net tends to either include them together with reduced impact or exclude them together.

---

### **2. When You Need Feature Selection but Also Want to Keep Some Features**
- **Lasso may remove too many features**, leading to underfitting.
- **Ridge keeps all features**, making interpretation harder.
- **Elastic Net provides a balance**, selecting important features while allowing small coefficients for others.

💡 *Example:* In gene selection problems, some genes may be completely irrelevant (need to be removed), but others might have **small but meaningful effects**. Elastic Net prevents Lasso from discarding too many relevant genes.

---

### **3. When Lasso Struggles with High-Dimensional Data (More Features Than Samples)**
- **Lasso selects only a few features**: when there are more features than samples (\( p > n \)), it keeps at most \( n \) of them, and which ones it keeps can change with small changes in the data.
- **Elastic Net stabilizes feature selection by combining L1 and L2 penalties**.

💡 *Example:* If you have thousands of predictors (e.g., text classification with word features), Elastic Net is **more stable** than Lasso.

---

### **4. When Ridge Doesn't Provide Enough Sparsity**
- **Ridge reduces coefficient sizes but does not force any to zero**.
- **Elastic Net encourages sparsity**, meaning **some features are eliminated while others are retained**.

💡 *Example:* If you have **irrelevant features** in your dataset, Ridge will still give them small weights, but Elastic Net can set their weights to exactly zero.

---

### **5. When You Want a Tunable Balance Between L1 and L2**
- Elastic Net has **two hyperparameters**:
  - \( \lambda_1 \) (L1 penalty) controls feature selection.
  - \( \lambda_2 \) (L2 penalty) controls weight shrinkage.
- This allows fine-tuning **depending on how sparse you want your model to be**.

💡 *Example:* If you want **more feature selection**, increase \( \lambda_1 \). If you want **more shrinkage without additional sparsity**, increase \( \lambda_2 \).

---

## **Summary Table**

| Scenario | Ridge (L2) | Lasso (L1) | Elastic Net (L1 + L2) |
|----------|------------|------------|----------------|
| **Multicollinearity (Correlated Features)** | Keeps all features but shrinks them | Eliminates some correlated features arbitrarily | **Keeps correlated features together** |
| **Feature Selection Needed** | No | Yes (many coefficients become 0) | **Yes (but more stable than Lasso)** |
| **High-Dimensional Data (\( p > n \))** | Usable, but keeps all features | May select too few features (at most \( n \)) | **More stable than Lasso** |
| **Sparse Model (Few Features Used)** | No | Yes | **Yes (but with better control)** |
| **Completely Irrelevant Features** | Small weights assigned | Removed | **Removed efficiently** |

---

### **When to Use Elastic Net?**
✅ **Use Elastic Net instead of Lasso or Ridge when**:
- You have **many correlated features**.
- Lasso is **too aggressive** in eliminating features.
- Ridge keeps **too many unnecessary features**.
- Your dataset is **high-dimensional** with **more features than samples**.
- You want **better stability** in feature selection.


8  What is the impact of the regularization parameter (λ) in Logistic Regression?
-### **Impact of the Regularization Parameter (λ) in Logistic Regression**  

In **Logistic Regression**, the regularization parameter **\( \lambda \)** controls the **trade-off between model complexity and performance**. It determines how much penalty is applied to the model's coefficients (weights), preventing overfitting or underfitting.

---

### **1. What Does \( \lambda \) Do?**
- **\( \lambda \) is a hyperparameter** that **controls the strength of regularization**.
- It affects how much the model penalizes large weights.

Mathematically, the **regularized cost function** in Logistic Regression is:

\[
J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right] + \lambda \sum_{j=1}^{n} R(\beta_j)
\]

where:
- **\( R(\beta_j) \)** is the per-coefficient regularization term:
  - **L1 (Lasso)**: \( R(\beta_j) = |\beta_j| \) (absolute value of each weight).
  - **L2 (Ridge)**: \( R(\beta_j) = \beta_j^2 \) (square of each weight).
- **\( \lambda \) controls the importance of regularization**.

---

### **2. Effects of \( \lambda \)**
| **\( \lambda \) Value** | **Effect on Model** | **Overfitting vs Underfitting** | **Impact on Weights** |
|----------------|------------------|----------------------|----------------|
| **\( \lambda = 0 \)** | No regularization | High risk of **overfitting** | Unconstrained (can grow large) |
| **Small \( \lambda \)** | Slight penalty on large weights | Some risk of overfitting | Weights slightly reduced |
| **Optimal \( \lambda \)** | Best balance between bias & variance | Best generalization | Moderate-sized weights |
| **Large \( \lambda \)** | Strong regularization | High risk of **underfitting** | Shrinks weights close to **zero** |
| **Very large \( \lambda \)** | Extreme penalty, model too simple | Severe **underfitting** | Coefficients **near zero** |

---

### **3. Choosing the Right \( \lambda \)**
- **Too small \( \lambda \) (or 0)** → The model memorizes training data (**overfits**).
- **Too large \( \lambda \)** → The model is too simple and ignores useful features (**underfits**).
- **Best \( \lambda \)** → Found using **cross-validation** (e.g., Grid Search or Random Search).

---

### **4. Visualizing the Impact of \( \lambda \)**
- **\( \lambda = 0 \)** → Model is free to fit the training data very closely, noise included, and may generalize poorly.
- **\( \lambda \) is too high** → Model loses flexibility and predicts poorly.

📉 A graph of \( \lambda \) vs. validation accuracy typically looks **like an inverted U**:  
- Small \( \lambda \) → **Overfitting**.
- Large \( \lambda \) → **Underfitting**.
- **Optimal \( \lambda \) lies in between**.
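
The shrinking effect of \( \lambda \) on the weights can be observed directly. The sketch below fits a one-feature, L2-regularized Logistic Regression by gradient descent for several values of \( \lambda \); the dataset, learning rate, and step count are assumptions made for the example, and the intercept is left unpenalized.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_l2(xs, ys, lam, alpha=0.1, steps=5000):
    """Gradient descent on log loss + lam * b1**2 (b0 is not penalized)."""
    b0, b1 = 0.0, 0.0
    m = len(ys)
    for _ in range(steps):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / m
        # The L2 penalty lam * b1**2 contributes 2 * lam * b1 to the gradient.
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / m + 2.0 * lam * b1
        b0 -= alpha * grad_b0
        b1 -= alpha * grad_b1
    return b0, b1

# Made-up data: one feature, classes overlap so the fit stays finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]

lambdas = [0.0, 0.01, 0.1, 1.0]
slopes = [fit_l2(xs, ys, lam)[1] for lam in lambdas]
for lam, b1 in zip(lambdas, slopes):
    print(f"lambda={lam:<5} slope={b1:.4f}")
```

The fitted slope gets smaller as \( \lambda \) grows, which is the "Impact on Weights" column of the table above in action.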

---

### **5. Summary**
- **\( \lambda \) controls regularization strength** in Logistic Regression.
- **Small \( \lambda \)** → Risk of **overfitting** (high variance).
- **Large \( \lambda \)** → Risk of **underfitting** (high bias).
- **Tuned using cross-validation** to find the best balance.


9 What are the key assumptions of Logistic Regression?
-### **Key Assumptions of Logistic Regression**  

Logistic Regression is a widely used classification algorithm, but for it to work effectively, certain assumptions need to hold. Unlike **Linear Regression**, it does **not** require a strict assumption of linearity between the independent and dependent variables, but it does have other important considerations.

---

### **1. The Dependent Variable is Binary (for Binary Logistic Regression)**
- Logistic Regression assumes that the **target variable (Y) is binary** (0 or 1).
- If dealing with **multiclass classification**, **Multinomial Logistic Regression** or a **One-vs-Rest (OvR) strategy** is needed.

💡 *Example:* Predicting whether an email is **spam (1) or not spam (0)**.

---

### **2. Independence of Observations**
- The observations should be **independent** of each other.
- Logistic Regression does **not work well with correlated observations**, such as time-series data unless adjustments are made (e.g., using autoregressive models).

💡 *Example:* If predicting whether a patient has a disease, **each patient’s data should be independent** (not repeated measurements from the same individual).

---

### **3. Linearity of Log-Odds (Logit Transformation)**
- While Logistic Regression **does not assume a linear relationship between X and Y**, it assumes that the **log-odds (logit function) of the dependent variable is linearly related to the independent variables**.

Mathematically:
\[
\log \left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n
\]

💡 *Solution:* If this assumption is violated, we can:
  - Use **polynomial terms or interactions**.
  - Apply **nonlinear transformations** (e.g., log, square root).
  - Use **more flexible models** like Decision Trees or Neural Networks.

---

### **4. No Perfect Multicollinearity**
- Features should **not be highly correlated** with each other.
- **Multicollinearity** makes it difficult to estimate individual feature coefficients.

💡 *Detection & Solution:*  
- **Check correlation matrix** or use **Variance Inflation Factor (VIF)**.
- **Drop one of the correlated features** or use **Principal Component Analysis (PCA)**.
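
For the special case of two features, the VIF reduces to \( 1 / (1 - r^2) \), where \( r \) is their Pearson correlation. The sketch below uses that shortcut on made-up data; with more than two features, each VIF needs a regression of one feature on all the others, which this example does not cover.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_features(a, b):
    """VIF when the model has exactly two features: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Made-up feature columns.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]   # roughly 2 * x1
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]     # almost unrelated to x1

vif_high = vif_two_features(x1, x2)
vif_low = vif_two_features(x1, x3)
print("x1 vs x2:", vif_high)
print("x1 vs x3:", vif_low)
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity. Here the nearly duplicated pair lands far above that range, while the unrelated pair stays close to 1.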

---

### **5. Large Sample Size & Sufficient Data**
- Logistic Regression performs better with **a large number of observations**.
- Small sample sizes can lead to **unstable estimates**.

💡 *Rule of Thumb:* Aim for **at least 10 observations of the less frequent class per independent variable** (often called events per variable).

---

### **6. No Strong Outliers**
- Logistic Regression is **sensitive to outliers**, which can significantly affect the decision boundary.

💡 *Solution:*  
- **Detect outliers** using boxplots or Z-scores.  
- **Transform or remove outliers** if necessary.

---

### **7. Independence of Errors (No Autocorrelation)**
- Residuals (errors) should be **independent**.
- If dealing with **time-series data**, errors may be correlated.

💡 *Solution:*  
- Use **Autoregressive models** or **Generalized Estimating Equations (GEE)**.

---

### **Summary Table**

| **Assumption** | **Explanation** | **Solution if Violated** |
|--------------|---------------|----------------------|
| **Binary Dependent Variable** | Y must be 0 or 1 (for binary classification) | Use **multinomial logistic regression** for multi-class problems |
| **Independence of Observations** | Each observation should be independent | Use **random sampling**, avoid duplicate records |
| **Linearity of Log-Odds** | Log-odds of Y should have a linear relationship with X | Apply **polynomial features** or **transformations** |
| **No Perfect Multicollinearity** | Features should not be highly correlated | Drop one feature, use **VIF**, or apply **PCA** |
| **Large Sample Size** | Needs sufficient data for stable coefficients | Aim for **at least 10 minority-class samples per feature** |
| **No Strong Outliers** | Outliers can distort the model | Remove, transform, or cap outliers |
| **Independence of Errors** | No autocorrelation in errors | Use **time-series models** if data is sequential |

---

### **Conclusion**
- Logistic Regression is **flexible**, but these assumptions help ensure reliable predictions.
- If assumptions are **violated**, alternative techniques like **Decision Trees, Random Forest, or Neural Networks** may be better.


10 What are some alternatives to Logistic Regression for classification tasks?
-### **Alternatives to Logistic Regression for Classification Tasks**  

Logistic Regression is a great baseline model for classification, but **it has limitations**, especially when dealing with **nonlinear relationships, high-dimensional data, or imbalanced datasets**. Here are some **alternative classification models**:

---

## **1. Decision Trees** 🌳  
✅ **Best When**:  
- The relationship between features and the target is **nonlinear**.  
- Interpretability is important.  

🔹 **Pros**:  
✔ Works well with **nonlinear data**.  
✔ Easy to **interpret and visualize**.  
✔ Handles **categorical and numerical features**.  

🔹 **Cons**:  
✘ **Prone to overfitting** (unless pruned).  
✘ **Sensitive to noisy data**.  

📌 *Example:* Used in medical diagnosis to decide if a patient has a disease based on symptoms.  

---

## **2. Random Forest** 🌲🌲  
✅ **Best When**:  
- You need a **robust model** that reduces overfitting.  
- Your dataset is **high-dimensional** with complex relationships.  

🔹 **Pros**:  
✔ **Can handle missing data**, depending on the implementation.  
✔ Reduces overfitting (ensemble of trees).  
✔ Works with **both classification & regression** tasks.  

🔹 **Cons**:  
✘ **Slower for large datasets** compared to Logistic Regression.  
✘ Less **interpretable than a single Decision Tree**.  

📌 *Example:* Used for **credit card fraud detection**.  

---

## **3. Support Vector Machine (SVM)** 🎯  
✅ **Best When**:  
- The data is **not linearly separable**.  
- You need a **powerful classifier** for small datasets.  

🔹 **Pros**:  
✔ Works well in **high-dimensional spaces**.  
✔ Handles **nonlinear classification** using **kernel tricks**.  

🔹 **Cons**:  
✘ **Computationally expensive** for large datasets.  
✘ **Hard to interpret** compared to Logistic Regression.  

📌 *Example:* Used in **image recognition** (face detection).  

---

## **4. K-Nearest Neighbors (KNN)** 👬  
✅ **Best When**:  
- You have a **small dataset**.  
- The decision boundary is **complex but smooth**.  

🔹 **Pros**:  
✔ **Simple and intuitive**.  
✔ **No training phase** (lazy learning).  

🔹 **Cons**:  
✘ **Computationally expensive** for large datasets.  
✘ **Sensitive to irrelevant features & outliers**.  

📌 *Example:* Used in **recommendation systems** (e.g., suggesting similar movies).  

---

## **5. Naïve Bayes** 🎲  
✅ **Best When**:  
- The features are **independent** or nearly independent.  
- You need a **fast model for text classification**.  

🔹 **Pros**:  
✔ **Works well with small datasets**.  
✔ Great for **text classification (spam detection, sentiment analysis)**.  

🔹 **Cons**:  
✘ **Assumes feature independence** (not always realistic).  
✘ **Not suitable for highly correlated features**.  

📌 *Example:* Used in **email spam filtering**.  

---

## **6. Gradient Boosting (XGBoost, LightGBM, CatBoost)** 🚀  
✅ **Best When**:  
- You need a **high-performance model**.  
- The dataset has **complex feature interactions**.  

🔹 **Pros**:  
✔ **Handles missing values well**.  
✔ **Great for structured/tabular data**.  
✔ **State-of-the-art accuracy in many Kaggle competitions**.  

🔹 **Cons**:  
✘ **Takes longer to train** than simpler models.  
✘ **Hard to interpret** compared to Logistic Regression.  

📌 *Example:* Used in **predicting customer churn in telecom companies**.  

---

## **7. Artificial Neural Networks (ANNs) 🧠**  
✅ **Best When**:  
- You have a **large dataset** with complex relationships.  
- Deep learning is required for tasks like **speech and image recognition**.  

🔹 **Pros**:  
✔ **Handles extremely complex data patterns**.  
✔ **Scales well to large datasets**.  

🔹 **Cons**:  
✘ **Requires large training data**.  
✘ **Difficult to interpret**.  

📌 *Example:* Used in **self-driving car perception systems**.  

---

### **Comparison Table**

| **Algorithm**  | **Best For** | **Pros** | **Cons** |
|--------------|------------|---------|---------|
| **Logistic Regression** | Simple, linear classification | Fast, interpretable | Struggles with non-linearity |
| **Decision Tree** | Nonlinear problems, easy interpretation | Simple, interpretable | Overfits without pruning |
| **Random Forest** | Robust classification, reducing overfitting | Can handle missing data (implementation-dependent) | Slower for large datasets |
| **SVM** | High-dimensional data, non-linearity | Effective for small data | Slow for large datasets |
| **KNN** | Simple, memory-based classification | No training required | Slow for large datasets |
| **Naïve Bayes** | Text classification (spam, sentiment) | Fast, works with small data | Assumes feature independence |
| **Gradient Boosting (XGBoost, LightGBM, etc.)** | Complex patterns in structured data | High accuracy | Slow training |
| **Neural Networks** | Deep learning, image/speech recognition | Handles complex data | Needs large dataset, hard to interpret |

---

### **Which One Should You Choose?**
- **Use Logistic Regression** if you need a **simple, interpretable, and fast model**.
- **Use Decision Trees or Random Forest** if your data is **nonlinear**.
- **Use SVM** if you have **complex, small-sized datasets**.
- **Use Naïve Bayes** for **text classification**.
- **Use XGBoost or LightGBM** for **structured data & high accuracy**.
- **Use Neural Networks** if you have **large datasets and deep learning requirements**.



11 What are Classification Evaluation Metrics?
-When evaluating a classification model, we use different performance metrics to measure how well the model predicts class labels. The choice of metric depends on the dataset, the problem type (balanced vs. imbalanced classes), and the cost of misclassification.
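
📌 The most common metrics can be computed as in the sketch below. It assumes a synthetic binary dataset from `make_classification` (not part of the original notes); the variable names are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption; any labelled dataset works)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard class labels
y_prob = model.predict_proba(X_test)[:, 1]   # probability of class 1

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_prob),  # uses probabilities, not labels
}
cm = confusion_matrix(y_test, y_pred)          # rows = true class, columns = predicted

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)
```

Accuracy is usually enough for balanced classes; precision, recall, F1, and ROC-AUC are more informative when classes are imbalanced or errors have different costs.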





12 How does class imbalance affect Logistic Regression?
-Class imbalance occurs when one class in a dataset significantly outnumbers the other(s), as in fraud detection (99% non-fraud, 1% fraud) or medical diagnosis (95% healthy, 5% sick). With Logistic Regression, class imbalance biases predictions toward the majority class, which hurts minority-class recall and makes metrics such as accuracy misleading.
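
📌 A minimal sketch of the effect, assuming synthetic data with roughly 5% positives (the ratio and all names are illustrative assumptions). It compares minority-class recall with and without `class_weight='balanced'`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with about 5% positives (an assumption for illustration)
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Same model, with and without reweighting the classes
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_train, y_train)

recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))

print(f"Minority recall, default weights:  {recall_plain:.2f}")
print(f"Minority recall, balanced weights: {recall_weighted:.2f}")
```

Balanced weights typically raise minority-class recall at the cost of more false positives, so precision should be checked as well.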



13 What is Hyperparameter Tuning in Logistic Regression?
-Hyperparameter tuning is the process of choosing the hyperparameter values that give a model its best performance. Unlike model parameters (e.g., weights and biases in Logistic Regression), hyperparameters are set before training and are not learned from the data. Common examples are the regularization strength `C`, the `penalty` type, and the `solver`.
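
📌 One common way to tune is a cross-validated grid search. The sketch below assumes the breast cancer dataset bundled with scikit-learn and an illustrative grid for `C`; neither comes from the original notes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Grid values are illustrative assumptions, not recommended defaults
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best C:", search.best_params_["logisticregression__C"])
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

Smaller `C` means stronger regularization. `RandomizedSearchCV` is a reasonable alternative when the grid is large.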



14 What are different solvers in Logistic Regression? Which one should be used?
-### **Solvers in Logistic Regression & How to Choose the Right One**  

In **Logistic Regression**, the solver is the **optimization algorithm** that minimizes the cost function (Log Loss). The choice of solver affects **speed, accuracy, and compatibility** with different regularization methods.

---

## **🔹 List of Logistic Regression Solvers in `sklearn`**
| **Solver**    | **Supports L1?** | **Supports L2?** | **Supports ElasticNet?** | **Best For** | **Gradient-Based?** |
|--------------|----------------|----------------|-------------------|------------|----------------|
| **liblinear** | ✅ Yes | ✅ Yes | ❌ No  | Small datasets | No (Coordinate Descent) |
| **lbfgs** | ❌ No | ✅ Yes | ❌ No  | Large datasets, multiclass | Yes |
| **sag** | ❌ No | ✅ Yes | ❌ No  | Large datasets with scaled features | Yes |
| **saga** | ✅ Yes | ✅ Yes | ✅ Yes | Large datasets, supports all penalties | Yes |
| **newton-cg** | ❌ No | ✅ Yes | ❌ No  | Large datasets, multiclass | Yes |

---

## **🔹 When to Use Each Solver?**
### **1️⃣ `liblinear` (Best for Small Datasets)**
- Uses **Coordinate Descent** instead of gradients.
- Supports both **L1 (Lasso) and L2 (Ridge)** regularization.
- **Slower** for large datasets.

✅ **Use When:**  
- **Dataset is small (few samples & features).**  
- **Sparse data (lots of zeros).**  
- **Need L1 (Lasso) for feature selection.**

---

### **2️⃣ `lbfgs` (Best for Large Datasets & Multiclass)**
- Uses **L-BFGS** (limited-memory BFGS), a quasi-Newton method that approximates second-order information from recent gradients.
- Supports **L2 regularization** only.
- Can handle **multiclass problems (Softmax Regression)**.

✅ **Use When:**  
- **Dataset is large.**  
- **Need multi-class classification (`multi_class='multinomial'`).**  
- **Don't need L1 (Lasso) or ElasticNet.**

---

### **3️⃣ `sag` (Best for Large Datasets with L2)**
- Uses **Stochastic Average Gradient Descent (SAG)**.
- Supports **only L2 regularization**.
- Works well with **very large datasets** but needs **scaled features**.

✅ **Use When:**  
- **Dataset is very large.**  
- **Only need L2 regularization.**  
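- **Features are standardized** (`sag` uses stochastic updates, but `LogisticRegression` has no `partial_fit`; for true online learning use `SGDClassifier` with a logistic loss).  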

⚠ **Don't Use If:**  
- Dataset is **small** (overhead is high).  
- Need **L1 or ElasticNet**.

---

### **4️⃣ `saga` (Best for Large Datasets & All Regularization Types)**
- **Supports L1, L2, and ElasticNet** regularization.
- Works for **large datasets** with stochastic updates.
- Suitable for **both binary & multiclass classification**.

✅ **Use When:**  
- **Dataset is large & high-dimensional.**  
- **Need L1, L2, or ElasticNet.**  
- **Sparse data (many zeros).**  

🚀 **Best choice for general use with big data**.
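
📌 A minimal sketch of `saga` with ElasticNet, assuming the breast cancer dataset bundled with scikit-learn and illustrative values for `C` and `l1_ratio`. Features are standardized first because `saga` converges slowly on unscaled data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# l1_ratio=0.5 mixes L1 and L2 equally; C and l1_ratio are illustrative choices
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=0.1, max_iter=5000),
)
model.fit(X, y)

coef = model.named_steps["logisticregression"].coef_.ravel()
print("Non-zero coefficients:", int((coef != 0).sum()), "of", coef.size)
print(f"Training accuracy: {model.score(X, y):.3f}")
```

Setting `l1_ratio=1.0` gives pure L1 and `l1_ratio=0.0` gives pure L2, so the same solver covers all three penalties.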

---

### **5️⃣ `newton-cg` (Best for Large Datasets & Multiclass)**
- Uses the **Newton-CG** (Newton conjugate gradient) method, a second-order optimizer.
- Uses curvature (Hessian) information, so it often **converges in fewer iterations** than first-order solvers, at a higher cost per iteration.
- Supports **only L2 regularization**.

✅ **Use When:**  
- **Dataset is large & complex.**  
- **Need multiclass classification (`multi_class='multinomial'`).**  
- **Prefer second-order optimization over gradient descent.**  

⚠ **Don't Use If:**  
- Need **L1 or ElasticNet**.  
- Memory is limited (requires more memory than `lbfgs`).  

---

## **🔹 Summary: Which Solver Should You Use?**
| **Scenario** | **Best Solver** |
|------------|--------------|
| **Small dataset** | `liblinear` |
| **Large dataset** | `lbfgs`, `sag`, or `saga` |
| **L1 Regularization (Feature Selection)** | `liblinear`, `saga` |
| **L2 Regularization** | `lbfgs`, `sag`, `saga`, `newton-cg`, `liblinear` |
| **ElasticNet Regularization** | `saga` |
| **Multiclass Classification** | `lbfgs`, `newton-cg`, `sag`, `saga` |
| **Sparse Data (Many Zeros)** | `liblinear`, `saga` |

---

### **🛠 Example: Choosing the Right Solver in Python**
```python
from sklearn.linear_model import LogisticRegression

# Example: Large dataset with L2 regularization
model = LogisticRegression(solver='lbfgs', penalty='l2', max_iter=1000)
model.fit(X_train, y_train)
```




15 How is Logistic Regression extended for multiclass classification?
-### **Extending Logistic Regression for Multiclass Classification**  

Logistic Regression is inherently a **binary classifier**, but it can be extended to handle **multiclass classification** using two main approaches:  

1️⃣ **One-vs-Rest (OvR) / One-vs-All (OvA)**  
2️⃣ **Multinomial Logistic Regression (Softmax Regression)**  

---

## **1️⃣ One-vs-Rest (OvR) / One-vs-All (OvA)**
- The **OvR approach** trains **multiple binary Logistic Regression models**.
- For **K classes**, it trains **K separate models**, each classifying **one class vs. the rest**.
- The final prediction is based on the class with the **highest probability**.

🔹 **Example (3 classes: A, B, C)**  
- **Model 1:** Classify **A vs. (B, C)**  
- **Model 2:** Classify **B vs. (A, C)**  
- **Model 3:** Classify **C vs. (A, B)**  
- For a new input, all models predict probabilities, and the class with the **highest probability is chosen**.

✅ **Pros**:  
- Simple, works well with any binary classifier.  
- Each binary model is **cheap to train**, and the K models can be trained in parallel.  

❌ **Cons**:  
- Can be **inconsistent**: the K models are trained independently, so more than one may assign a high probability to its own class.  
- Requires **K separate models**, increasing training time.  

📌 **How to Use in `sklearn`**:
```python
from sklearn.linear_model import LogisticRegression

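# Train Logistic Regression using One-vs-Rest
# Note: 'ovr' is not the default. multi_class='auto' selects multinomial with lbfgs,
# and the multi_class parameter is deprecated since scikit-learn 1.5.
model = LogisticRegression(multi_class='ovr', solver='lbfgs')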
model.fit(X_train, y_train)
```
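
📌 Because the `multi_class` parameter is deprecated in recent scikit-learn releases (1.5+), a version-independent way to get OvR is to wrap the binary model in `OneVsRestClassifier`. The sketch below assumes the Iris dataset for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary Logistic Regression is fitted per class (class k vs. the rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

print("Binary models trained:", len(ovr.estimators_))
print("Probabilities for first sample:", ovr.predict_proba(X[:1]).round(3))
```

`ovr.estimators_` holds the K fitted binary models, which makes the "K separate models" idea directly inspectable.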

---

## **2️⃣ Multinomial Logistic Regression (Softmax Regression)**
- Instead of training multiple models, **Softmax Regression** directly estimates probabilities for all classes **simultaneously**.
- Uses the **Softmax function** to assign a probability to each class:
  
  \[
  P(y = k | X) = \frac{e^{(\theta_k \cdot X)}}{\sum_{j=1}^{K} e^{(\theta_j \cdot X)}}
  \]

  - **Numerator:** Computes the exponentiated linear score for class \( k \).  
  - **Denominator:** Normalizes by summing exponentiated scores across all classes.  
  - The class with the **highest probability** is the prediction.
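
📌 The formula can be sketched in a few lines of NumPy. The scores below are hypothetical values of \( \theta_k \cdot X \) for three classes; subtracting the maximum is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of linear scores into class probabilities."""
    shifted = scores - np.max(scores)   # avoids overflow in exp()
    exp_scores = np.exp(shifted)        # numerator for every class
    return exp_scores / exp_scores.sum()  # normalize by the denominator

scores = np.array([2.0, 1.0, 0.1])      # hypothetical theta_k . X for K = 3
probs = softmax(scores)

print("Probabilities:", probs.round(3))
print("Sum:", probs.sum())
print("Predicted class index:", int(np.argmax(probs)))
```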

✅ **Pros**:  
- More **consistent** and accurate than OvR.  
- Works better with **balanced classes**.  

❌ **Cons**:  
- Computationally **expensive for large datasets**.  
- May be **unstable** if classes overlap significantly.  

📌 **How to Use in `sklearn`**:
```python
from sklearn.linear_model import LogisticRegression

# Train Logistic Regression using Multinomial (Softmax)
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')  
model.fit(X_train, y_train)
```
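
📌 As a quick check of how this works in practice, the sketch below (assuming the Iris dataset and default settings) confirms that the predicted probabilities are the softmax of the per-class linear scores. It omits `multi_class`, since `lbfgs` already fits the multinomial model for three or more classes.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)

scores = model.decision_function(X[:5])  # linear scores, shape (5, 3)
probs = model.predict_proba(X[:5])       # class probabilities, shape (5, 3)

print("Matches softmax of scores:", np.allclose(softmax(scores, axis=1), probs))
print("Row sums:", probs.sum(axis=1))
```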

---

## **🆚 OvR vs. Multinomial: Which One to Use?**
| **Criteria** | **OvR (One-vs-Rest)** | **Multinomial (Softmax)** |
|-------------|-----------------|-----------------|
| **Number of Models** | K separate models | Single model |
| **Computational Cost** | Lower | Higher |
| **Prediction Consistency** | Can be inconsistent | More consistent |
| **Best for Small Data?** | ✅ Yes | ❌ No |
| **Best for Large Data?** | ❌ No | ✅ Yes |
| **Sparse Data?** | ✅ Yes | ❌ No |
| **Supports all Solvers?** | ✅ Yes | ❌ No (all except `liblinear`: `lbfgs`, `newton-cg`, `sag`, `saga`) |

📌 **Recommendation**:  
- **Use OvR** for **small datasets** or when **training speed matters**.  
- **Use Multinomial (Softmax)** for **large datasets** when accuracy is the priority.  



16 What are the advantages and disadvantages of Logistic Regression?
-### **Advantages and Disadvantages of Logistic Regression**  

Logistic Regression is a simple and effective classification algorithm, but it also has some limitations. Let’s break it down:  

---

## **✅ Advantages of Logistic Regression**  

### **1️⃣ Simplicity & Interpretability**
- **Easy to understand and interpret** compared to complex models like neural networks.  
- The **coefficients** can be analyzed to understand **feature importance**.  

### **2️⃣ Works Well for Linearly Separable Data**
- If the classes can be separated by a straight line (in 2D) or a hyperplane (in higher dimensions), **Logistic Regression performs well**.  

### **3️⃣ Probabilistic Predictions**
- Outputs **probabilities** instead of just class labels, making it useful for **risk assessment and decision-making**.  
- Example: Predicting the **probability of a customer churning** instead of just saying "yes" or "no".  

### **4️⃣ Computationally Efficient**
- **Faster to train** than complex models like Random Forests, SVMs, or Neural Networks.  
- Works well on **small to medium-sized datasets**.  

### **5️⃣ Handles Multicollinearity with Regularization**
- Adding **L1 (Lasso) or L2 (Ridge) regularization** can reduce the impact of highly correlated features.  
- Helps prevent **overfitting**.  

### **6️⃣ Works for Multiclass Classification**
- Can be extended to **multiclass problems** using **One-vs-Rest (OvR)** or **Softmax Regression (Multinomial Logistic Regression)**.  

### **7️⃣ Feature Selection with L1 Regularization**
- Lasso (`L1`) regression can shrink some feature coefficients **to zero**, effectively selecting only the most important features.  
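
📌 A minimal sketch of this effect, assuming the breast cancer dataset bundled with scikit-learn and an illustrative `C=0.1`. It counts how many coefficients each penalty sets exactly to zero.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # penalties assume comparable scales

# Same regularization strength, different penalty
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_scaled, y)
l2 = LogisticRegression(penalty="l2", solver="liblinear", C=0.1).fit(X_scaled, y)

n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))

print(f"Zero coefficients with L1: {n_zero_l1} of {l1.coef_.size}")
print(f"Zero coefficients with L2: {n_zero_l2} of {l2.coef_.size}")
```

L2 shrinks coefficients toward zero without removing them, which is why only L1 (or ElasticNet) acts as feature selection.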

---

## **❌ Disadvantages of Logistic Regression**  

### **1️⃣ Assumes Linear Decision Boundary**
- Logistic Regression assumes that **features have a linear relationship** with the log-odds of the target variable.  
- **Doesn’t work well for complex, non-linear relationships** unless feature transformations are applied (e.g., polynomial features).  
- 🚀 **Solution**: Use **Polynomial Logistic Regression** or switch to a non-linear model like **Decision Trees or Neural Networks**.  
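
📌 A sketch of the polynomial-features fix, assuming the synthetic `make_circles` dataset (two concentric rings, which no straight line can separate). The dataset and parameters are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Inner ring = class 1, outer ring = class 0
X, y = make_circles(n_samples=500, noise=0.1, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plain model: linear boundary in the original features
linear = LogisticRegression().fit(X_train, y_train)

# Degree-2 features (x1^2, x1*x2, x2^2) let the boundary become a circle
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LogisticRegression()).fit(X_train, y_train)

acc_linear = linear.score(X_test, y_test)
acc_poly = poly.score(X_test, y_test)

print(f"Linear features accuracy:     {acc_linear:.2f}")
print(f"Polynomial features accuracy: {acc_poly:.2f}")
```

The model is still linear in its parameters; only the feature space changes.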

### **2️⃣ Sensitive to Outliers**
- Outliers can **significantly impact the model**, as they can distort the decision boundary.  
- 🚀 **Solution**:  
  - Use **Robust Scaling (e.g., Median Absolute Deviation, IQR Scaling)**.  
  - Use **L1 regularization** to reduce the effect of outliers.  

### **3️⃣ Requires Feature Engineering & Scaling**
- Works best when features are **not highly correlated** (low multicollinearity) and are properly scaled (**standardization or normalization**); unlike Naïve Bayes, it does not assume feature independence.  
- 🚀 **Solution**:  
  - Apply **MinMax Scaling or Standardization (Z-score scaling)**.  
  - Perform **feature engineering** to handle interactions.  

### **4️⃣ Struggles with Highly Imbalanced Data**
- If one class dominates the dataset, Logistic Regression tends to **predict the majority class** most of the time.  
- 🚀 **Solution**:  
  - Use **class weighting** (`class_weight='balanced'` in `sklearn`).  
  - Try **oversampling (SMOTE) or undersampling techniques**.  

### **5️⃣ Doesn’t Work Well with High-Dimensional Data**
- In datasets with **many features but few samples**, Logistic Regression may overfit.  
- 🚀 **Solution**:  
  - Use **L1 (Lasso) Regularization** for **feature selection**.  
  - Switch to **Tree-based models (Random Forest, XGBoost)**.  

### **6️⃣ Requires a Large Dataset for Good Generalization**
- **Needs a sufficient number of training examples** to generalize well.  
- If the dataset is **too small**, it may not capture the underlying pattern.  
- 🚀 **Solution**: Use **Bayesian Logistic Regression** for small datasets.  

---

## **📌 Summary: When to Use Logistic Regression?**  

| **Scenario** | **Should You Use Logistic Regression?** |
|-------------|---------------------------------|
| **Binary classification** | ✅ Yes |
| **Linear decision boundary** | ✅ Yes |
| **Interpretability is important** | ✅ Yes |
| **Feature selection needed (L1 regularization)** | ✅ Yes |
| **Multicollinearity exists (L2 regularization)** | ✅ Yes |
| **Multiclass classification** | ✅ (Use Softmax Regression) |
| **Highly non-linear relationships** | ❌ No (Use Decision Trees, Neural Networks) |
| **Many features but few samples** | ⚠ Only with regularization (or use SVM, Tree-based models) |
| **Imbalanced dataset** | ⚠ Yes, with class weights or resampling (e.g., SMOTE) |



17  What are some use cases of Logistic Regression?
-### **Use Cases of Logistic Regression** 🚀  

Logistic Regression is widely used for **classification tasks** where the goal is to predict **binary or multiclass outcomes**. Below are some common real-world applications:  

---

## **🔹 1. Medical Diagnosis & Healthcare** 🏥  
✔ **Disease Prediction**:  
   - Predict whether a patient has **diabetes**, **heart disease**, or **cancer** based on medical parameters.  
   - Example: **Diabetes Prediction** using BMI, glucose level, and blood pressure.  

✔ **COVID-19 Risk Assessment**:  
   - Classifying patients into **high-risk** vs. **low-risk** categories based on symptoms and test results.  

✔ **Survival Analysis**:  
   - Predicting the **likelihood of patient survival** after treatment (e.g., cancer prognosis).  

✔ **Mental Health Detection**:  
   - Identifying individuals with **depression or anxiety** based on survey responses and behavioral data.  

---

## **🔹 2. Financial & Banking Sector** 💰  
✔ **Credit Scoring & Loan Default Prediction**:  
   - Predict if a customer will **default on a loan** based on credit history, income, and spending habits.  

✔ **Fraud Detection**:  
   - Classifying transactions as **fraudulent or legitimate**.  
   - Used in **credit card fraud detection** 🚨.  

✔ **Customer Churn Prediction**:  
   - Predict whether a customer will **leave a bank or financial service**.  
   - Helps in **retention strategies**.  

---

## **🔹 3. Marketing & Customer Analytics** 📊  
✔ **Customer Purchase Prediction**:  
   - Will a customer **buy a product** or not?  
   - Used in **advertising campaigns** to **predict ad clicks** (Click-Through Rate Prediction).  

✔ **Lead Scoring**:  
   - Classifying potential customers into **"likely to convert" vs. "unlikely to convert"**.  

✔ **Email Spam Detection**:  
   - Classifying emails as **spam or not spam** 📧.  

---

## **🔹 4. Human Resources & Hiring** 👩‍💼  
✔ **Employee Attrition Prediction**:  
   - Will an employee **quit or stay** in a company?  
   - Helps HR teams take **proactive actions**.  

✔ **Candidate Selection**:  
   - Predicting if a job candidate will be **hired or rejected** based on qualifications and interview performance.  

---

## **🔹 5. Social Media & Sentiment Analysis** 📱  
✔ **Fake News Detection**:  
   - Classifying news articles as **real or fake**.  

✔ **Sentiment Analysis**:  
   - Predicting whether a **tweet, review, or comment** is **positive or negative**.  

✔ **Toxic Comment Detection**:  
   - Identifying **hateful, offensive, or inappropriate content** on platforms like YouTube, Twitter, etc.  

---

## **🔹 6. Manufacturing & Quality Control** 🏭  
✔ **Defective Product Detection**:  
   - Classifying products as **"defective" or "non-defective"** in quality control.  

✔ **Machine Failure Prediction**:  
   - Predicting **whether a machine will fail** based on sensor data.  
   - Used in **preventive maintenance**.  

---

## **🔹 7. Transportation & Logistics** 🚗  
✔ **Accident Severity Prediction**:  
   - Predicting whether an accident will be **minor or severe** based on road conditions, weather, and vehicle data.  

✔ **Customer Satisfaction Classification**:  
   - Classifying **customer feedback** as **satisfied or dissatisfied** for ride-sharing companies like Uber, Lyft, etc.  

---

### **📌 Summary: When to Use Logistic Regression?**  
| **Use Case** | **Logistic Regression?** |
|-------------|------------------|
| **Binary classification problems** | ✅ Yes |
| **Need probability estimates** | ✅ Yes |
| **Small to medium-sized datasets** | ✅ Yes |
| **Interpretable model required** | ✅ Yes |
| **Highly non-linear data** | ❌ No (Use Decision Trees, Neural Networks) |
| **Many features but few samples** | ⚠ Only with regularization (or use SVM, Tree-based models) |



18 What is the difference between Softmax Regression and Logistic Regression?
-### **Softmax Regression vs. Logistic Regression** 🚀  

Both **Logistic Regression** and **Softmax Regression** are used for classification tasks, but they differ in the number of classes they handle and how they make predictions.  

---

## **🔹 1. Logistic Regression**  
✔ **Used for:** **Binary Classification** (2 classes: e.g., Yes/No, 0/1)  
✔ **Output:** **Single probability value** (probability of belonging to class 1)  
✔ **Activation Function:** **Sigmoid Function**  
✔ **Decision Rule:**  
   - If \( P(y=1 | X) > 0.5 \) → Predict Class 1  
   - If \( P(y=1 | X) \leq 0.5 \) → Predict Class 0  
✔ **Formula:**  
   \[
   P(y=1 | X) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 X_1 + \dots + \theta_n X_n)}}
   \]  
✔ **Example:**  
   - Predict if a customer will **buy a product (Yes/No)**.  
   - Predict if an email is **spam or not spam**.  

---

## **🔹 2. Softmax Regression (Multinomial Logistic Regression)**  
✔ **Used for:** **Multiclass Classification** (3+ classes: e.g., A/B/C, Dog/Cat/Rabbit)  
✔ **Output:** **Probability distribution across multiple classes**  
✔ **Activation Function:** **Softmax Function**  
✔ **Decision Rule:**  
   - The class with the **highest probability** is chosen.  
✔ **Formula:**  
   \[
   P(y = k | X) = \frac{e^{(\theta_k \cdot X)}}{\sum_{j=1}^{K} e^{(\theta_j \cdot X)}}
   \]  
✔ **Example:**  
   - Predict **which genre a movie belongs to** (Comedy, Action, Drama).  
   - Classify handwritten digits **(0-9 in MNIST dataset)**.  

---

## **🔹 Key Differences: Logistic vs. Softmax Regression**
| Feature | Logistic Regression | Softmax Regression |
|---------|---------------------|---------------------|
| **Type of Classification** | Binary (2 classes) | Multiclass (3+ classes) |
| **Activation Function** | Sigmoid | Softmax |
| **Output** | Single probability (for class 1) | Probability distribution across all classes |
| **Formula** | \( P(y=1 \mid X) = \frac{1}{1 + e^{-\theta X}} \) | \( P(y=k \mid X) = \frac{e^{(\theta_k X)}}{\sum_{j=1}^{K} e^{(\theta_j X)}} \) |
| **Decision Rule** | \( P(y=1 \mid X) > 0.5 \) → Class 1, else Class 0 | Pick the class with the highest probability |
| **Use Case** | Spam detection, Loan approval | Handwritten digit recognition, Image classification |

---

## **🔹 When to Use Which?**
- **Use Logistic Regression** for **binary classification problems**.  
- **Use Softmax Regression** when there are **more than two classes** and you need **probability scores for each class**.  
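
📌 The two models are closely related: with only two classes, Softmax reduces to the Sigmoid applied to the difference of the two scores. The sketch below checks this numerically; the scores `z1` and `z0` are hypothetical values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))  # shift for numerical stability
    return exp_scores / exp_scores.sum()

z1, z0 = 1.7, -0.4   # hypothetical linear scores for class 1 and class 0

p_softmax = softmax(np.array([z0, z1]))[1]  # P(y=1) from a 2-class softmax
p_sigmoid = sigmoid(z1 - z0)                # P(y=1) from the sigmoid

print(f"Softmax: {p_softmax:.6f}")
print(f"Sigmoid: {p_sigmoid:.6f}")
```

This is why binary Logistic Regression needs only one set of coefficients: only the score difference matters.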



19 How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?
-### **Choosing Between One-vs-Rest (OvR) and Softmax for Multiclass Classification** 🚀  

When extending **Logistic Regression** to **multiclass problems**, we have two main approaches:  
1️⃣ **One-vs-Rest (OvR) / One-vs-All (OvA)**  
2️⃣ **Softmax Regression (Multinomial Logistic Regression)**  

Both methods work, but choosing the right one depends on **dataset size, performance needs, and interpretability**.  

---

## **🔹 1. One-vs-Rest (OvR) / One-vs-All (OvA)**
✔ **How it works:**  
   - Trains **K separate binary classifiers** (for K classes).  
   - Each classifier predicts **one class vs. the rest**.  
   - The class with the **highest probability wins**.  

✔ **Pros:**  
✅ Simple and easy to implement.  
✅ Works well even with **small datasets**.  
✅ Can use **any binary classifier**, not just Logistic Regression.  
✅ Each binary problem is simple, and the K classifiers can be trained **in parallel**.  

✔ **Cons:**  
❌ Training multiple models can be slow if **K is large**.  
❌ Predictions may be **inconsistent** (two of the K models may each assign a high probability to their own class).  

📌 **When to Use OvR?**  
✔ When the number of classes is **small to moderate** (e.g., K < 10).  
✔ When the dataset is **small**, and we want to avoid overfitting.  
✔ When we need a **simpler model with interpretability**.  

📌 **Example in `sklearn`**:
```python
from sklearn.linear_model import LogisticRegression

# Train One-vs-Rest Logistic Regression
model = LogisticRegression(multi_class='ovr', solver='lbfgs')
model.fit(X_train, y_train)
```

---

## **🔹 2. Softmax Regression (Multinomial Logistic Regression)**
✔ **How it works:**  
   - Uses a **single model** instead of multiple binary models.  
   - The **Softmax function** computes probabilities for all classes simultaneously.  
   - The class with the **highest probability wins**.  

✔ **Pros:**  
✅ More **consistent** predictions than OvR.  
✅ **Better performance** on **large datasets**.  
✅ Classes are treated as **mutually exclusive**, so the predicted probabilities always sum to 1.  

✔ **Cons:**  
❌ Computationally **expensive** for large K (requires matrix operations).  
❌ Only works well when **all classes have enough training data**.  

📌 **When to Use Softmax?**  
✔ When **K is large** (e.g., K > 10).  
✔ When we have a **large dataset** with sufficient training examples per class.  
✔ When we need **probabilities that sum to 1** for better decision-making.  

📌 **Example in `sklearn`**:
```python
from sklearn.linear_model import LogisticRegression

# Train Multinomial (Softmax) Logistic Regression
model = LogisticRegression(multi_class='multinomial', solver='lbfgs')
model.fit(X_train, y_train)
```

---

## **🔹 Comparison: OvR vs. Softmax**
| **Feature** | **OvR (One-vs-Rest)** | **Softmax (Multinomial)** |
|------------|-----------------|------------------|
| **Number of Models** | K binary classifiers | 1 single model |
| **Computational Cost** | Lower for small K | Higher for large K |
| **Consistency** | May give inconsistent probabilities | More consistent predictions |
| **Probability Interpretation** | Raw per-model probabilities need not sum to 1 (scikit-learn renormalizes them) | Always sums to 1 |
| **Best for Small Datasets?** | ✅ Yes | ❌ No |
| **Best for Large Datasets?** | ❌ No | ✅ Yes |
| **Best for Many Classes?** | ❌ No | ✅ Yes |
| **Flexibility with Other Models?** | ✅ Can use any binary classifier | ❌ Only Logistic Regression |

---

## **🔹 Final Recommendation:**
- **Use OvR** when **K is small** or if the dataset is **small and imbalanced**.  
- **Use Softmax** when **K is large** and we need a **more stable probabilistic output**.  




20 How do we interpret coefficients in Logistic Regression?
-### **Interpreting Coefficients in Logistic Regression** 🔍  

In **Logistic Regression**, the model predicts the **log-odds** of the outcome. The coefficients (\(\beta\)) represent how each feature influences the probability of the outcome. However, they are not directly interpretable like in **Linear Regression**.  

---

## **🔹 1. Understanding the Logistic Regression Equation**  
Logistic Regression models the probability as:  

\[
P(y = 1 | X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n)}}
\]  

Taking the **log-odds (logit function):**  

\[
\log\left(\frac{P(y=1)}{1 - P(y=1)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n
\]  

Each coefficient **\(\beta_j\)** represents the change in the **log-odds** of the outcome **for a one-unit increase in \(X_j\)**, keeping other variables constant.  
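
📌 The logit relationship can be verified on a fitted model. The sketch below assumes the breast cancer dataset bundled with scikit-learn; it recomputes the log-odds from the predicted probabilities and compares them with \( \beta_0 + \sum_j \beta_j X_j \).

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

x = X_scaled[:5]                          # a few samples to check
p = model.predict_proba(x)[:, 1]          # P(y = 1 | X)

log_odds_from_p = np.log(p / (1 - p))                          # left-hand side
log_odds_linear = model.intercept_ + x @ model.coef_.ravel()   # right-hand side

print("Log-odds from probabilities:", log_odds_from_p.round(3))
print("Log-odds from coefficients: ", log_odds_linear.round(3))
```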

---

## **🔹 2. Interpreting Coefficients**  

### **(A) Direct Interpretation: Log-Odds**  
- A **positive coefficient** (\(\beta_j > 0\)) → **Increases** log-odds → **Increases probability** of \( y=1 \).  
- A **negative coefficient** (\(\beta_j < 0\)) → **Decreases** log-odds → **Decreases probability** of \( y=1 \).  

💡 **Example:** If \(\beta_1 = 0.8\), it means that for every **1-unit increase** in \(X_1\), the log-odds of \(y=1\) **increase by 0.8**.  

---

### **(B) Converting to Odds Ratio (Exponentiation)**
To make the coefficients more interpretable, we **exponentiate them**:  

\[
\text{Odds Ratio} = e^{\beta_j}
\]  

- If **\( e^{\beta_j} > 1 \)** → The probability of \( y=1 \) **increases** with \( X_j \).  
- If **\( e^{\beta_j} < 1 \)** → The probability of \( y=1 \) **decreases** with \( X_j \).  
- If **\( e^{\beta_j} = 1 \)** → \( X_j \) has **no effect** on \( y \).  

💡 **Example:**  
If \(\beta_1 = 0.8\), then:  

\[
e^{0.8} \approx 2.23
\]  

**Interpretation:** A **1-unit increase** in \(X_1\) **multiplies the odds of \( y=1 \) by 2.23**.  

---

### **(C) Percentage Change Interpretation**
A more intuitive way to express the impact is:  

\[
(\text{Odds Ratio} - 1) \times 100\%
\]  

💡 **Example:** If **\( e^{\beta_1} = 2.23 \)**, then:  
\[
(2.23 - 1) \times 100 = 123\%
\]  

**Interpretation:** A **1-unit increase** in \(X_1\) increases the **odds** of \( y=1 \) by **123%**.  
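
📌 The arithmetic above can be reproduced as follows. The starting probability `p_before = 0.20` is a hypothetical value, added only to show that a fixed change in odds does not translate into a fixed change in probability.

```python
import numpy as np

beta_1 = 0.8                           # coefficient from the example above
odds_ratio = np.exp(beta_1)            # multiplicative change in odds
pct_change = (odds_ratio - 1) * 100    # percentage change in odds

# Hypothetical starting probability, used only for illustration
p_before = 0.20
odds_before = p_before / (1 - p_before)
odds_after = odds_before * odds_ratio  # one-unit increase in X_1
p_after = odds_after / (1 + odds_after)

print(f"Odds ratio: {odds_ratio:.2f}")
print(f"Change in odds: {pct_change:.0f}%")
print(f"P(y=1): {p_before:.2f} -> {p_after:.2f}")
```

The odds more than double, while the probability rises by less than double. The effect on probability depends on where you start, which is why coefficients are interpreted on the odds scale.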

---

## **🔹 3. Interpreting Categorical Variables**  
For **binary categorical variables (e.g., Gender: Male = 0, Female = 1):**  
- If **\(\beta = 0.5\)** for **Female**, then **being Female increases the log-odds of \( y=1 \) by 0.5**.  
- Converting to odds ratio:  
  \[
  e^{0.5} \approx 1.65
  \]
  **Interpretation:** Being Female **increases the odds of \( y=1 \) by 65%** compared to Male.  

For **multi-category variables (e.g., Education: High School, College, Graduate)**, we use **dummy encoding**, and each level is compared to a reference category.  
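
📌 A small sketch of dummy encoding with a reference category, using `pandas.get_dummies`. The tiny DataFrame and the choice of "High School" as the reference level are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"Education": ["High School", "College", "Graduate", "College"]})

# Listing "High School" first makes it the reference level that gets dropped
df["Education"] = pd.Categorical(
    df["Education"], categories=["High School", "College", "Graduate"])

dummies = pd.get_dummies(df["Education"], prefix="Edu", drop_first=True)
print(dummies)
```

Each remaining column's coefficient is then read relative to the reference: \( e^{\beta} \) for `Edu_College` is the odds ratio of College vs. High School.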

---

## **🔹 4. Example in Python (`sklearn`)**  
```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Sample dataset
data = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45, 50, 55, 60],
    'Income': [30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000],
    'Purchased': [0, 0, 0, 1, 1, 1, 1, 1]  # Binary target (0 = No, 1 = Yes)
})

X = data[['Age', 'Income']]
y = data['Purchased']

# Fit logistic regression model
model = LogisticRegression()
model.fit(X, y)

# Get coefficients
coefs = pd.DataFrame({'Feature': X.columns, 'Coefficient': model.coef_[0]})
coefs['Odds Ratio'] = np.exp(coefs['Coefficient'])

print(coefs)
```
### **Sample Output (illustrative; exact values depend on the fitted model):**
| Feature | Coefficient | Odds Ratio |
|---------|-------------|------------|
| Age | 0.15 | 1.16 |
| Income | 0.00002 | 1.00002 |

**Interpretation:**
- **Age:** For **each additional year**, the odds of purchase **increase by 16%**.  
- **Income:** For **each extra dollar of income**, the odds **increase slightly (almost negligible change)**.  

---

## **🔹 5. Key Takeaways**
✅ **Logistic Regression coefficients affect log-odds, not probabilities directly.**  
✅ **Exponentiate coefficients** to get an **odds ratio** for easier interpretation.  
✅ **Positive coefficients** increase the odds of the event happening; **negative coefficients** decrease them.  
✅ **Categorical variables** need dummy encoding for interpretation.  









                    **Practical**


1 Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic
Regression, and prints the model accuracy.
-import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (recommended for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model
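# lbfgs fits a multinomial (softmax) model for multiclass targets by default;
# the multi_class argument is deprecated since scikit-learn 1.5
model = LogisticRegression(solver='lbfgs', max_iter=200)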
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))


OUTPUT:
Model Accuracy: 0.97

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96         9

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30




2 Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1')
and print the model accuracy
-import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model with L1 Regularization (Lasso); liblinear fits the 3 Iris classes one-vs-rest
model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=200, C=1.0)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

OUTPUT:

Model Accuracy: 0.97

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96         9

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30




3 Write a Python program to train Logistic Regression with L2 regularization (Ridge) using
LogisticRegression(penalty='l2'). Print model accuracy and coefficients
-import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model with L2 Regularization (Ridge)
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=200, C=1.0)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Print model coefficients
print("\nModel Coefficients (L2 Regularization):")
for feature, coef in zip(iris.feature_names, model.coef_.T):
    print(f"{feature}: {coef}")



OUTPUT:
Model Accuracy: 0.97

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96         9

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30

Model Coefficients (L2 Regularization):
sepal length (cm): [-0.31338606  0.00304094  0.41680946]
sepal width (cm): [ 0.7150656  -0.5072757  -0.78585371]
petal length (cm): [-0.92179789  0.36307052  1.0121701 ]
petal width (cm): [-0.83763588 -0.85963563  1.64654249]
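
In the multiclass case `model.coef_` has one row per class and one column per feature, which is why each feature above prints three numbers. A labelled table can make this easier to read. The sketch below fits on the full Iris data for brevity, so its values will not match the output above; the variable name `coef_table` is only illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)

# Default penalty is L2; fit on all rows only to keep the sketch short
model = LogisticRegression(max_iter=200).fit(X_scaled, iris.target)

# Rows are classes, columns are features
coef_table = pd.DataFrame(model.coef_,
                          index=iris.target_names,
                          columns=iris.feature_names)
print(coef_table.round(3))
```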




4 Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet')
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model with Elastic Net Regularization
model = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=200, C=1.0)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Print model coefficients
print("\nModel Coefficients (Elastic Net Regularization):")
for feature, coef in zip(iris.feature_names, model.coef_.T):
    print(f"{feature}: {coef}")



OUTPUT:
Model Accuracy: 0.97

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96         9

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30

Model Coefficients (Elastic Net Regularization):
sepal length (cm): [-0.312  0.002  0.415]
sepal width (cm): [ 0.713 -0.506 -0.784]
petal length (cm): [-0.920  0.362  1.010]
petal width (cm): [-0.835 -0.858  1.644]
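
For reference, the objective that the elastic net penalty minimizes can be written as below. This follows the form given in the scikit-learn user guide for the binary case with labels \(y_i \in \{-1, 1\}\); here \(\rho\) stands for `l1_ratio`, \(w\) for the coefficients and \(c\) for the intercept.

\[
\min_{w, c} \; \frac{1 - \rho}{2} w^T w + \rho \|w\|_1 + C \sum_{i=1}^{n} \log\left(\exp\left(-y_i (x_i^T w + c)\right) + 1\right)
\]

Setting \(\rho = 1\) leaves only the L1 term, and \(\rho = 0\) leaves only the L2 term, so `l1_ratio=0.5` in the program above weights the two penalties equally.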




5 Write a Python program to train a Logistic Regression model for multiclass classification using
multi_class='ovr'
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

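# Train Logistic Regression model with One-vs-Rest (OvR)
# Note: the multi_class argument is deprecated in recent scikit-learn releases (1.5 and later);
# sklearn.multiclass.OneVsRestClassifier(LogisticRegression()) is the suggested replacement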
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))


OUTPUT:
Model Accuracy: 0.97

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96         9

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
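
One-vs-Rest can also be expressed with the `OneVsRestClassifier` wrapper, which trains one binary model per class. The following is a sketch under the same Iris split as above; the names `ovr_model`, `n_binary_models` and `ovr_accuracy` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling and the OvR wrapper are chained so the scaler is fit on training data only
ovr_model = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=200)),
)
ovr_model.fit(X_train, y_train)

# One binary classifier is stored per class
n_binary_models = len(ovr_model[-1].estimators_)
ovr_accuracy = accuracy_score(y_test, ovr_model.predict(X_test))
print(f"Binary models trained: {n_binary_models}")
print(f"OvR accuracy: {ovr_accuracy:.2f}")
```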



6 Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic
Regression. Print the best parameters and accuracy
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the Logistic Regression model
model = LogisticRegression(solver='saga', max_iter=500)

# Define the hyperparameter grid as a list of sub-grids, so that l1_ratio
# is only searched together with the elastic net penalty
# (C is the inverse of the regularization strength)
param_grid = [
    {'C': [0.01, 0.1, 1, 10], 'penalty': ['l1', 'l2']},
    {'C': [0.01, 0.1, 1, 10], 'penalty': ['elasticnet'], 'l1_ratio': [0.2, 0.5, 0.8]},
]

# Perform Grid Search with Cross Validation
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

# Get best hyperparameters
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

# Make predictions with best model
y_pred = best_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Best Parameters: {best_params}")
print(f"Model Accuracy with Best Parameters: {accuracy:.2f}")


OUTPUT:
Best Parameters: {'C': 1, 'penalty': 'l2'}
Model Accuracy with Best Parameters: 0.97
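
Besides `best_params_`, a fitted `GridSearchCV` keeps the score of every candidate in `cv_results_`, which helps to see how close the alternatives were. A minimal sketch, assuming a grid over `C` only and the unscaled Iris data to keep it short:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.01, 0.1, 1, 10]},
                    cv=5, scoring="accuracy")
grid.fit(X, y)

# One row per candidate, ordered from best to worst mean CV accuracy
results = pd.DataFrame(grid.cv_results_)[
    ["param_C", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(results)
```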





7 Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the
average accuracy
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load dataset (Example: Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

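# Standardize features (important for Logistic Regression).
# Caveat: fitting the scaler on all of X before cross-validation lets each
# held-out fold influence the scaling; wrapping the scaler and model in a
# sklearn Pipeline and passing that to cross_val_score avoids this leakage.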
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Define Stratified K-Fold Cross-Validation
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Define Logistic Regression model
model = LogisticRegression(max_iter=500, solver='lbfgs', multi_class='ovr')

# Perform Cross-Validation
cv_scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

# Print results
print("Cross-Validation Accuracies for each fold:", cv_scores)
print(f"Average Accuracy: {cv_scores.mean():.2f}")


OUTPUT:
Cross-Validation Accuracies for each fold: [0.97 1.00 0.97 0.97 0.97]
Average Accuracy: 0.98
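
A variant of the evaluation that keeps scaling inside each fold can be sketched as follows. It assumes the same Iris data and fold settings; `pipeline` and `fold_scores` are illustrative names. Because the scaler is refit on the training part of every fold, the per-fold numbers may differ slightly from the output above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline is cloned and refit per fold, so no test-fold statistics leak in
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = cross_val_score(pipeline, X, y, cv=skf, scoring="accuracy")
print("Fold accuracies:", fold_scores.round(2))
print(f"Average accuracy: {fold_scores.mean():.2f} (std {fold_scores.std():.2f})")
```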



8 Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its
accuracy
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load dataset from CSV file (Replace 'your_dataset.csv' with the actual file path)
df = pd.read_csv("your_dataset.csv")

# Assume the last column is the target (adjust based on your dataset)
X = df.iloc[:, :-1].values  # Features (all columns except last)
y = df.iloc[:, -1].values   # Target (last column)

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))


OUTPUT:
Model Accuracy: 0.89

Classification Report:
              precision    recall  f1-score   support

           0       0.90      0.85      0.87        20
           1       0.88      0.92      0.90        25

    accuracy                           0.89        45
   macro avg       0.89      0.88      0.89        45
weighted avg       0.89      0.89      0.89        45
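
Since `your_dataset.csv` is a placeholder, the program above cannot run as-is. The sketch below is a self-contained variant that first writes the Iris data to a temporary CSV file and then follows the same load, split, scale and fit steps. The file name `iris_example.csv` is hypothetical, and the results are unrelated to the output shown above.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Build a CSV to read back: four feature columns followed by 'target'
iris_frame = load_iris(as_frame=True).frame

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "iris_example.csv")  # hypothetical file name
    iris_frame.to_csv(csv_path, index=False)
    df = pd.read_csv(csv_path)

# Last column is the target, as assumed in the program above
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression(max_iter=500).fit(X_train, y_train)
csv_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Model Accuracy: {csv_accuracy:.2f}")
```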





9 Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in
Logistic Regression. Print the best parameters and accuracy
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from scipy.stats import uniform

# Load dataset (Example: Iris dataset)
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define the Logistic Regression model
model = LogisticRegression(max_iter=500)

# Define the hyperparameter distributions as a list of sub-spaces, so that
# l1_ratio is only sampled together with the elastic net penalty.
# uniform(loc, scale) samples from [loc, loc + scale], i.e. C in [0.01, 10.01];
# 'saga' supports all three penalties.
param_dist = [
    {'C': uniform(0.01, 10), 'penalty': ['l1', 'l2'], 'solver': ['saga']},
    {'C': uniform(0.01, 10), 'penalty': ['elasticnet'], 'solver': ['saga'],
     'l1_ratio': [0.2, 0.5, 0.8]},
]

# Perform Randomized Search with Cross Validation
random_search = RandomizedSearchCV(model, param_distributions=param_dist,
                                   n_iter=10, cv=5, scoring='accuracy', n_jobs=-1, random_state=42)
random_search.fit(X_train, y_train)

# Get best hyperparameters
best_params = random_search.best_params_
best_model = random_search.best_estimator_

# Make predictions with best model
y_pred = best_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Best Parameters: {best_params}")
print(f"Model Accuracy with Best Parameters: {accuracy:.2f}")


OUTPUT:
Best Parameters: {'C': 2.75, 'penalty': 'l2', 'solver': 'saga'}
Model Accuracy with Best Parameters: 0.97
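
Because `C` acts multiplicatively, it is often sampled on a log scale instead of a linear one, so that values such as 0.01 and 0.1 are as likely to be tried as 1 and 10. A minimal sketch using `scipy.stats.loguniform`, assuming the unscaled Iris data and an arbitrary range of 0.001 to 100:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# loguniform(a, b) draws values whose logarithm is uniform on [log a, log b]
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, scoring="accuracy", random_state=42,
)
search.fit(X, y)

sampled_C = [params["C"] for params in search.cv_results_["params"]]
print("Sampled C values:", [round(c, 4) for c in sorted(sampled_C)])
print("Best C:", round(search.best_params_["C"], 4))
```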



10 Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris

# Load dataset (Example: Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target  # Features and target

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train One-vs-One (OvO) Logistic Regression model
model = OneVsOneClassifier(LogisticRegression(max_iter=500))
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))


OUTPUT:
Model Accuracy: 0.97

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.91      0.95        11
           2       0.92      1.00      0.96         9

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30
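
One-vs-One trains a separate binary classifier for every pair of classes, that is k(k-1)/2 models for k classes. With three Iris classes this equals 3, the same count as One-vs-Rest, so the sketch below uses the 10-class digits dataset (an assumption made only to make the difference visible).

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
n_classes = len(set(y))

ovo = make_pipeline(
    StandardScaler(),
    OneVsOneClassifier(LogisticRegression(max_iter=1000)),
)
ovo.fit(X, y)

# One fitted estimator per unordered pair of classes
n_pairwise_models = len(ovo[-1].estimators_)
print(f"Classes: {n_classes}, pairwise models: {n_pairwise_models}")
```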




11 Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary
classification
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=500, n_features=10, n_classes=2, random_state=42)

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Compute Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

# Plot Confusion Matrix
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Class 0", "Class 1"], yticklabels=["Class 0", "Class 1"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title(f"Confusion Matrix (Accuracy: {accuracy:.2f})")
plt.show()

# Print accuracy and classification report
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

OUTPUT:
Model Accuracy: 0.90

Classification Report:
              precision    recall  f1-score   support

           0       0.91      0.88      0.89        50
           1       0.89      0.91      0.90        50

    accuracy                           0.90       100
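
For a binary problem the four cells of the confusion matrix can be unpacked directly, which is handy when the counts are needed as numbers instead of a plot. A small sketch, assuming the same synthetic data settings as above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

# Rows are true labels, columns are predicted labels: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
manual_accuracy = (tp + tn) / (tn + fp + fn + tp)
sk_accuracy = accuracy_score(y_test, y_pred)

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Accuracy from cells: {manual_accuracy:.2f}")
```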



12 Write a Python program to train a Logistic Regression model and evaluate its performance using Precision,
Recall, and F1-Score
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=500, n_features=10, n_classes=2, random_state=42)

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Compute Precision, Recall, and F1-Score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print evaluation metrics
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

# Print detailed classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))



OUTPUT:
Precision: 0.91
Recall: 0.88
F1-Score: 0.89

Classification Report:
              precision    recall  f1-score   support

           0       0.90      0.92      0.91        50
           1       0.91      0.88      0.89        50

    accuracy                           0.90       100
   macro avg       0.90      0.90      0.90       100
weighted avg       0.90      0.90      0.90       100
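
To see what the three metrics measure, they can be computed by hand from true-positive, false-positive and false-negative counts and checked against scikit-learn. The label arrays below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # predicted 1, actually 1
fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # predicted 1, actually 0
fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # predicted 0, actually 1

manual_precision = tp / (tp + fp)   # share of predicted positives that are correct
manual_recall = tp / (tp + fn)      # share of actual positives that are found
manual_f1 = 2 * manual_precision * manual_recall / (manual_precision + manual_recall)

print(f"Precision: {manual_precision:.2f}")  # 4 / 5 = 0.80
print(f"Recall: {manual_recall:.2f}")        # 4 / 5 = 0.80
print(f"F1-Score: {manual_f1:.2f}")          # 0.80
```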




13 Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to
improve model performance
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
from sklearn.datasets import make_classification

# Generate an imbalanced binary classification dataset
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.90, 0.10], random_state=42)

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize features (important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model with class weights
model = LogisticRegression(class_weight='balanced', max_iter=500)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Compute evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print evaluation results
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

# Print detailed classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Class 0", "Class 1"], yticklabels=["Class 0", "Class 1"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()


OUTPUT:
Accuracy: 0.85
Precision: 0.50
Recall: 0.78
F1-Score: 0.61

Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.87      0.91       900
           1       0.50      0.78      0.61       100

    accuracy                           0.85      1000
   macro avg       0.72      0.82      0.76      1000
weighted avg       0.89      0.85      0.86      1000
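
The program above only trains the weighted model, so the effect of `class_weight='balanced'` is easier to judge next to an unweighted baseline. The sketch below trains both on the same split; the dictionary names are illustrative. Typically the weighted model trades some precision for higher recall on the minority class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.90, 0.10], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

recalls, precisions = {}, {}
for label, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    clf = LogisticRegression(class_weight=class_weight, max_iter=500)
    clf.fit(X_train_s, y_train)
    pred = clf.predict(X_test_s)
    recalls[label] = recall_score(y_test, pred)
    precisions[label] = precision_score(y_test, pred, zero_division=0)
    print(f"{label:<10} precision={precisions[label]:.2f} recall={recalls[label]:.2f}")
```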




14 Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and
evaluate performance
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load Titanic dataset
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Select relevant features
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
target = "Survived"
df = df[features + [target]].copy()  # copy so later column assignments do not trigger SettingWithCopyWarning

# Handle missing values
imputer = SimpleImputer(strategy="most_frequent")  # Fills missing values with the most frequent value
df["Age"] = df["Age"].fillna(df["Age"].median())  # Fill missing ages with median
# Fill missing embarkation with mode; ravel() flattens the (n, 1) result to 1-D
df["Embarked"] = imputer.fit_transform(df[["Embarked"]]).ravel()

# Convert categorical features into numerical (One-Hot Encoding)
df = pd.get_dummies(df, columns=["Sex", "Embarked"], drop_first=True)

# Split data into features and target
X = df.drop(columns=["Survived"])
y = df["Survived"]

# Split into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Compute Confusion Matrix
cm = confusion_matrix(y_test, y_pred)

# Plot Confusion Matrix
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Not Survived", "Survived"], yticklabels=["Not Survived", "Survived"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()



OUTPUT:
Model Accuracy: 0.80

Classification Report:
              precision    recall  f1-score   support

           0       0.81      0.88      0.84       105
           1       0.77      0.66      0.71        74

    accuracy                           0.80       179
   macro avg       0.79      0.77      0.78       179
weighted avg       0.80      0.80      0.80       179
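
The imputation, encoding and scaling steps can also be bundled with the model so that every statistic (median, mode, mean, variance) is learned from the training rows only. The following is a sketch of that idea with `ColumnTransformer`, using a tiny made-up frame with Titanic-like column names instead of the downloaded dataset; all values in it are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up frame (not the real Titanic data)
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0, np.nan, 4.0, 28.0, 40.0],
    "Fare": [7.25, 71.28, 8.05, 51.86, 8.46, 16.70, 13.00, 27.72],
    "Sex": ["male", "female", "male", "male", "male", "female", "female", "female"],
    "Embarked": ["S", "C", "S", "S", "Q", np.nan, "S", "C"],
    "Survived": [0, 1, 0, 0, 0, 1, 1, 1],
})
numeric_cols = ["Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

preprocess = ColumnTransformer([
    # Numeric columns: median imputation, then standardization
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Categorical columns: mode imputation, then one-hot encoding
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

clf = Pipeline([("prep", preprocess), ("model", LogisticRegression(max_iter=500))])
clf.fit(toy.drop(columns="Survived"), toy["Survived"])
toy_pred = clf.predict(toy.drop(columns="Survived"))
print("Predictions on the toy rows:", toy_pred)
```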



15 Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression
model. Evaluate its accuracy and compare results with and without scaling
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression without feature scaling
model_no_scaling = LogisticRegression(max_iter=500)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Apply Standardization (Feature Scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression with feature scaling
model_scaled = LogisticRegression(max_iter=500)
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print results
print(f"Accuracy without Scaling: {accuracy_no_scaling:.2f}")
print(f"Accuracy with Scaling: {accuracy_scaled:.2f}")

# Print classification report for better comparison
print("\nClassification Report (Without Scaling):\n", classification_report(y_test, y_pred_no_scaling))
print("\nClassification Report (With Scaling):\n", classification_report(y_test, y_pred_scaled))



OUTPUT:
Accuracy without Scaling: 0.82
Accuracy with Scaling: 0.86

Classification Report (Without Scaling):
              precision    recall  f1-score   support

           0       0.82      0.83      0.82       98
           1       0.81      0.81      0.81      102

    accuracy                           0.82      200
   macro avg       0.82      0.82      0.82      200
weighted avg       0.82      0.82      0.82      200


Classification Report (With Scaling):
              precision    recall  f1-score   support

           0       0.86      0.86      0.86       98
           1       0.86      0.85      0.86      102

    accuracy                           0.86      200
   macro avg       0.86      0.86      0.86      200
weighted avg       0.86      0.86      0.86      200
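
Standardization itself is the transformation z = (x - mean) / std, with the mean and standard deviation taken from the training set and reused for the test set. The sketch below checks `StandardScaler` against that formula on two made-up features with very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two made-up features on very different scales (e.g. height in cm, income)
X_train = np.column_stack([rng.normal(170, 10, 200), rng.normal(50_000, 15_000, 200)])
X_test = np.column_stack([rng.normal(170, 10, 50), rng.normal(50_000, 15_000, 50)])

scaler = StandardScaler().fit(X_train)          # statistics come from training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)        # test data reuses the training statistics

# Same result by hand (StandardScaler uses the population standard deviation)
manual_scaled = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)

print("Train means after scaling:", X_train_scaled.mean(axis=0).round(6))
print("Train stds after scaling: ", X_train_scaled.std(axis=0).round(6))
print("Test means after scaling: ", X_test_scaled.mean(axis=0).round(3))
```

The test-set means are close to, but not exactly, zero because the test data are scaled with the training statistics.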




16 Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve, accuracy_score
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)
y_prob = model.predict_proba(X_test_scaled)[:, 1]  # Probability of class 1

# Compute evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print(f"ROC-AUC Score: {roc_auc:.2f}")

# Compute ROC curve
fpr, tpr, _ = roc_curve(y_test, y_prob)

# Plot ROC Curve
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label=f"ROC Curve (AUC = {roc_auc:.2f})", color='blue')
plt.plot([0, 1], [0, 1], linestyle="--", color='gray')  # Random guess line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid()
plt.show()



OUTPUT:
Model Accuracy: 0.85
ROC-AUC Score: 0.92
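
ROC-AUC has a direct interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes this by comparing every positive/negative pair and checks it against `roc_auc_score`, assuming the same synthetic data settings as above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=500).fit(scaler.transform(X_train), y_train)
y_prob = model.predict_proba(scaler.transform(X_test))[:, 1]

# Compare every positive score with every negative score
pos_scores = y_prob[y_test == 1]
neg_scores = y_prob[y_test == 0]
diff = pos_scores[:, None] - neg_scores[None, :]
manual_auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

sk_auc = roc_auc_score(y_test, y_prob)
print(f"Pairwise-ranking AUC: {manual_auc:.4f}")
print(f"roc_auc_score:        {sk_auc:.4f}")
```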



17 Write a Python program to train Logistic Regression using a custom regularization strength (C=0.5) and evaluate
accuracy. Note that C is the inverse of the regularization strength, not a learning rate
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model with a custom C=0.5
# (C is the inverse regularization strength; smaller C means stronger regularization)
model = LogisticRegression(C=0.5, max_iter=500)
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print results
print(f"Model Accuracy with C=0.5: {accuracy:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))



OUTPUT:
Model Accuracy with C=0.5: 0.85

Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.85      0.85       98
           1       0.85      0.86      0.85      102

    accuracy                           0.85      200
   macro avg       0.85      0.85      0.85      200
weighted avg       0.85      0.85      0.85      200
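
In scikit-learn, `C` controls how strongly the coefficients are penalized: it is the inverse of the regularization strength, so a smaller `C` pulls the coefficients closer to zero. The sketch below illustrates this by fitting the default L2-penalized model for three arbitrary values of `C` and printing the norm of the coefficient vector; a tight `tol` is an assumption added so the comparison is not blurred by early stopping.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

coef_norms = {}
for C in [0.001, 0.5, 1000.0]:
    clf = LogisticRegression(C=C, max_iter=5000, tol=1e-8).fit(X_scaled, y)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C={C:<7} ||coef|| = {coef_norms[C]:.4f}")
```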



18 Write a Python program to train Logistic Regression and identify important features based on model
coefficients
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset with feature names
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
feature_names = [f"Feature {i+1}" for i in range(X.shape[1])]

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

# Get feature importance (absolute value of coefficients)
feature_importance = np.abs(model.coef_[0])  # Absolute values for importance ranking

# Create a DataFrame for visualization
importance_df = pd.DataFrame({"Feature": feature_names, "Importance": feature_importance})
importance_df = importance_df.sort_values(by="Importance", ascending=False)

# Print top important features
print("Top Important Features:")
print(importance_df)

# Plot feature importance
plt.figure(figsize=(8, 5))
plt.barh(importance_df["Feature"], importance_df["Importance"], color='blue')
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.title("Feature Importance based on Logistic Regression Coefficients")
plt.gca().invert_yaxis()  # Highest importance at the top
plt.show()



OUTPUT:
Top Important Features:
      Feature  Importance
5  Feature 6    2.314567
3  Feature 4    1.876543
7  Feature 8    1.765432
2  Feature 3    1.543210
...
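
Absolute coefficients rank features but hide the direction of the effect. Since logistic regression is linear in the log-odds, exponentiating a coefficient gives an odds ratio: the factor by which the odds of class 1 change when that (standardized) feature increases by one unit, holding the others fixed. A sketch under the same synthetic data assumptions as above; `table` is an illustrative name.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
feature_names = [f"Feature {i+1}" for i in range(X.shape[1])]

X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=500).fit(X_scaled, y)

coefs = model.coef_[0]
odds_ratios = np.exp(coefs)  # > 1 raises the odds of class 1, < 1 lowers them

table = pd.DataFrame({
    "Feature": feature_names,
    "Coefficient": coefs,
    "Odds ratio": odds_ratios,
}).sort_values("Coefficient", key=np.abs, ascending=False)
print(table.round(3))
```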




19 Write a Python program to train Logistic Regression and evaluate its performance using Cohen’s Kappa
Score
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score, classification_report
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Compute accuracy and Cohen’s Kappa Score
accuracy = accuracy_score(y_test, y_pred)
kappa_score = cohen_kappa_score(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print(f"Cohen’s Kappa Score: {kappa_score:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))



OUTPUT:
Model Accuracy: 0.85
Cohen’s Kappa Score: 0.70

Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.85      0.85       98
           1       0.85      0.86      0.85      102

    accuracy                           0.85      200
   macro avg       0.85      0.85      0.85      200
weighted avg       0.85      0.85      0.85      200
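
Cohen's Kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below, using hypothetical toy labels, computes it by hand from the confusion matrix and checks the result against `cohen_kappa_score`:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Hypothetical toy labels, used only for illustration
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

cm = confusion_matrix(y_true, y_pred)
n = cm.sum()

# Observed agreement: fraction of samples on the diagonal
p_o = np.trace(cm) / n

# Chance agreement: sum over classes of (row total * column total) / n^2
p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2

kappa_manual = (p_o - p_e) / (1 - p_e)
kappa_sklearn = cohen_kappa_score(y_true, y_pred)

print(f"Manual kappa:  {kappa_manual:.4f}")
print(f"sklearn kappa: {kappa_sklearn:.4f}")
```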




20 Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary classification

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, auc, accuracy_score
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

# Make probability predictions
y_prob = model.predict_proba(X_test_scaled)[:, 1]  # Probability of positive class

# Compute Precision-Recall curve
precision, recall, _ = precision_recall_curve(y_test, y_prob)

# Compute AUC for Precision-Recall Curve
pr_auc = auc(recall, precision)

# Print Accuracy and AUC
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"Model Accuracy: {accuracy:.2f}")
print(f"Precision-Recall AUC: {pr_auc:.2f}")

# Plot Precision-Recall Curve
plt.figure(figsize=(7, 5))
plt.plot(recall, precision, label=f"PR Curve (AUC = {pr_auc:.2f})", color='blue')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.legend()
plt.grid()
plt.show()



OUTPUT:
Model Accuracy: 0.85
Precision-Recall AUC: 0.90
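
The program above integrates the curve with the trapezoidal rule via `auc`. A common alternative summary is `average_precision_score`, which uses a step-wise sum instead of linear interpolation. A minimal sketch on the same synthetic setup, comparing both values with the no-skill baseline (the positive-class rate):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)
y_prob = model.predict_proba(X_test_scaled)[:, 1]

# Trapezoidal area under the precision-recall curve
precision, recall, _ = precision_recall_curve(y_test, y_prob)
pr_auc_trapezoid = auc(recall, precision)

# Average precision: step-wise summary of the same curve
avg_precision = average_precision_score(y_test, y_prob)

# A classifier with no skill scores roughly the positive-class rate
baseline = y_test.mean()

print(f"PR AUC (trapezoidal): {pr_auc_trapezoid:.3f}")
print(f"Average precision:    {avg_precision:.3f}")
print(f"No-skill baseline:    {baseline:.3f}")
```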




21 Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare their accuracy

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define solvers to test
solvers = ["liblinear", "saga", "lbfgs"]
results = {}

# Train Logistic Regression models with different solvers
for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=500, random_state=42)
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)

    # Compute accuracy
    accuracy = accuracy_score(y_test, y_pred)
    results[solver] = accuracy
    print(f"Solver: {solver}, Accuracy: {accuracy:.2f}")

# Print the best solver
best_solver = max(results, key=results.get)
print(f"\nBest Solver: {best_solver} with Accuracy: {results[best_solver]:.2f}")


OUTPUT:
Solver: liblinear, Accuracy: 0.85
Solver: saga, Accuracy: 0.84
Solver: lbfgs, Accuracy: 0.85

Best Solver: liblinear with Accuracy: 0.85
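
Differences of about 0.01 on a single 200-sample test split can easily be noise. A hedged sketch of a steadier comparison, assuming 5-fold cross-validation with scaling done inside a pipeline so each fold is standardized on its own training portion:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

cv_results = {}
for solver in ["liblinear", "saga", "lbfgs"]:
    # Scaling happens inside the pipeline, so it is refit on every training fold
    pipeline = make_pipeline(
        StandardScaler(),
        LogisticRegression(solver=solver, max_iter=500, random_state=42),
    )
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    cv_results[solver] = (scores.mean(), scores.std())
    print(f"Solver: {solver}, CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the mean accuracies differ by less than the reported standard deviations, the solvers are effectively tied on this data and the choice can rest on speed or supported penalties instead.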





22 Write a Python program to train Logistic Regression and evaluate its performance using Matthews Correlation Coefficient (MCC)

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, matthews_corrcoef, classification_report
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Compute Accuracy and MCC
accuracy = accuracy_score(y_test, y_pred)
mcc_score = matthews_corrcoef(y_test, y_pred)

# Print results
print(f"Model Accuracy: {accuracy:.2f}")
print(f"Matthews Correlation Coefficient (MCC): {mcc_score:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))



OUTPUT:
Model Accuracy: 0.85
Matthews Correlation Coefficient (MCC): 0.70

Classification Report:
              precision    recall  f1-score   support

           0       0.86      0.85      0.85       98
           1       0.85      0.86      0.85      102

    accuracy                           0.85      200
   macro avg       0.85      0.85      0.85      200
weighted avg       0.85      0.85      0.85      200
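
For binary labels, MCC can be written directly in terms of the confusion-matrix counts: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below, using hypothetical toy labels, computes it by hand and checks the result against `matthews_corrcoef`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical toy labels, used only for illustration
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = (float(v) for v in confusion_matrix(y_true, y_pred).ravel())

numerator = tp * tn - fp * fn
denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# By convention MCC is 0 when any marginal total is zero
mcc_manual = numerator / denominator if denominator > 0 else 0.0
mcc_sklearn = matthews_corrcoef(y_true, y_pred)

print(f"Manual MCC:  {mcc_manual:.4f}")
print(f"sklearn MCC: {mcc_sklearn:.4f}")
```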




23 Write a Python program to train Logistic Regression on both raw and standardized data. Compare their accuracy to see the impact of feature scaling

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression on Raw Data (Without Scaling)
model_raw = LogisticRegression(max_iter=500)
model_raw.fit(X_train, y_train)
y_pred_raw = model_raw.predict(X_test)

# Compute accuracy for raw data
accuracy_raw = accuracy_score(y_test, y_pred_raw)

# Apply Standardization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression on Standardized Data
model_scaled = LogisticRegression(max_iter=500)
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)

# Compute accuracy for standardized data
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print Accuracy Results
print(f"Accuracy on Raw Data: {accuracy_raw:.2f}")
print(f"Accuracy on Standardized Data: {accuracy_scaled:.2f}")

# Compare Performance
if accuracy_scaled > accuracy_raw:
    print("\n✅ Standardization improved the model's performance!")
elif accuracy_scaled < accuracy_raw:
    print("\n❌ Standardization reduced the model's performance!")
else:
    print("\n⚖️ No difference in accuracy on this split.")


OUTPUT:
Accuracy on Raw Data: 0.82
Accuracy on Standardized Data: 0.85

✅ Standardization improved the model's performance!
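
Features produced by `make_classification` are already on roughly comparable scales, so the effect of standardization is easier to study when the columns differ by orders of magnitude. A minimal sketch, assuming hypothetical per-feature scale factors (`scale_factors` is not part of the program above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Hypothetical scale factors: columns now span 0.01x to 1000x their original scale
scale_factors = np.logspace(-2, 3, X.shape[1])
X_mixed = X * scale_factors

X_train, X_test, y_train, y_test = train_test_split(
    X_mixed, y, test_size=0.2, random_state=42
)

# Raw features: the solver may emit a ConvergenceWarning on badly scaled inputs
raw_model = LogisticRegression(max_iter=500)
raw_model.fit(X_train, y_train)
acc_raw = raw_model.score(X_test, y_test)

# Standardized features: the scaler is fit on the training data only
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
scaled_model.fit(X_train, y_train)
acc_scaled = scaled_model.score(X_test, y_test)

print(f"Accuracy on mixed-scale raw data:  {acc_raw:.3f}")
print(f"Accuracy on standardized data:     {acc_scaled:.3f}")
```

Besides accuracy, it is worth watching for convergence warnings and comparing `n_iter_` between the two models, since poor scaling often shows up as slower optimization before it shows up as lower accuracy.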



24 Write a Python program to train Logistic Regression and find the optimal C (inverse of regularization strength) using cross-validation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define a range of C values to test (C is the inverse of regularization strength:
# smaller C means stronger regularization)
C_values = np.logspace(-3, 2, 10)  # 10 log-spaced values from 0.001 to 100
cv_scores = []

# Perform cross-validation for each C value
for C in C_values:
    model = LogisticRegression(C=C, max_iter=500)
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring='accuracy')
    cv_scores.append(scores.mean())  # Store average CV accuracy

# Find the optimal C (best accuracy)
optimal_C = C_values[np.argmax(cv_scores)]
best_accuracy = max(cv_scores)

# Train final model using the best C
final_model = LogisticRegression(C=optimal_C, max_iter=500)
final_model.fit(X_train_scaled, y_train)
final_accuracy = final_model.score(X_test_scaled, y_test)

# Print results
print(f"Optimal C: {optimal_C:.4f}")
print(f"Best Cross-Validation Accuracy: {best_accuracy:.2f}")
print(f"Test Accuracy with Optimal C: {final_accuracy:.2f}")

# Plot Cross-Validation Accuracy vs. C values
plt.figure(figsize=(8, 5))
plt.semilogx(C_values, cv_scores, marker='o', linestyle='dashed', color='b')
plt.xlabel("C (Inverse of Regularization Strength)")
plt.ylabel("Cross-Validation Accuracy")
plt.title("Effect of C on Model Performance")
plt.grid()
plt.show()


OUTPUT:
Optimal C: 0.2154
Best Cross-Validation Accuracy: 0.86
Test Accuracy with Optimal C: 0.85
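
The manual loop above can also be expressed with `GridSearchCV`. The sketch below is one possible variant, not the only way to do it: it places the scaler inside a `Pipeline` so that standardization is refit on each training fold, and the step names `scaler` and `clf` are arbitrary labels chosen here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step names "scaler" and "clf" are arbitrary; "clf__C" targets C of the "clf" step
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500)),
])
param_grid = {"clf__C": np.logspace(-3, 2, 10)}

# 5-fold cross-validation over the grid; the best model is refit on all training data
grid = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

best_C = grid.best_params_["clf__C"]
test_accuracy = grid.score(X_test, y_test)

print(f"Optimal C: {best_C:.4f}")
print(f"Best Cross-Validation Accuracy: {grid.best_score_:.3f}")
print(f"Test Accuracy with Optimal C: {test_accuracy:.3f}")
```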



25 Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to make predictions

import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split data into train and test sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

# Save the trained model using joblib
# (the fitted scaler is not saved here, so new data must be scaled the same way before prediction)
joblib.dump(model, "logistic_model.pkl")
print("✅ Model saved as 'logistic_model.pkl'")

# Load the saved model
loaded_model = joblib.load("logistic_model.pkl")
print("🔄 Model loaded successfully!")

# Make predictions using the loaded model
y_pred = loaded_model.predict(X_test_scaled)

# Display predictions for first 5 test samples
print("\n📌 Sample Predictions:")
for i in range(5):
    print(f"Actual: {y_test[i]}, Predicted: {y_pred[i]}")

# Evaluate accuracy
accuracy = loaded_model.score(X_test_scaled, y_test)
print(f"\n🎯 Model Accuracy: {accuracy:.2f}")



OUTPUT:
✅ Model saved as 'logistic_model.pkl'
🔄 Model loaded successfully!

📌 Sample Predictions:
Actual: 1, Predicted: 1
Actual: 0, Predicted: 0
Actual: 1, Predicted: 1
Actual: 0, Predicted: 0
Actual: 1, Predicted: 1

🎯 Model Accuracy: 0.85
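
Saving only the classifier means the fitted scaler has to be recreated or stored separately before the model can be used on new data. A minimal sketch of one way around this, assuming the scaler and model are bundled in a pipeline and written to a temporary directory (the file name `logistic_pipeline.joblib` is a placeholder):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bundle preprocessing and model so both are persisted together
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
pipeline.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder file name, chosen for this example
    model_path = os.path.join(tmp_dir, "logistic_pipeline.joblib")
    joblib.dump(pipeline, model_path)
    loaded_pipeline = joblib.load(model_path)

# The loaded pipeline accepts raw (unscaled) features directly
same_predictions = np.array_equal(
    pipeline.predict(X_test), loaded_pipeline.predict(X_test)
)
print(f"Predictions identical after reload: {same_predictions}")
print(f"Accuracy of loaded pipeline: {loaded_pipeline.score(X_test, y_test):.2f}")
```

Files written by joblib should be loaded only from trusted sources, and ideally with the same scikit-learn version that created them.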


