**Question 1:  What is Logistic Regression, and how does it differ from Linear Regression?**

Logistic Regression is a supervised machine learning algorithm for classification, most commonly binary classification, where the outcome is one of two categories (e.g., yes/no, spam/not spam, pass/fail, or 0/1). Unlike linear regression, which predicts continuous values, it predicts the probability that an input belongs to a specific class. It does this by passing a linear combination of the inputs through the sigmoid function, which maps it to a value between 0 and 1.

**Key Differences from Linear Regression**

**Output Type**

- Linear Regression: Predicts continuous numeric values (like house prices or temperature).
- Logistic Regression: Predicts probabilities between 0 and 1, which are then converted to categorical predictions.

**Use Case**
- Linear Regression: Used for regression tasks.
- Logistic Regression: Used for classification tasks.

**Prediction Function**
- Linear Regression: A straight line, i.e., y = wX + b
- Logistic Regression: The sigmoid of a linear function, i.e., y = 1 / (1 + e^(-(wX + b)))

**Target Variable**
- Linear Regression: Continuous (house price)
- Logistic Regression: Binary (0 or 1, yes or no)

**Loss Function**
- Linear Regression: Mean Squared Error (MSE)
- Logistic Regression: Binary Cross-Entropy (Log Loss)
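
The difference in prediction functions can be made concrete with a small sketch. The weight `w`, bias `b`, and inputs below are made-up values for illustration, not fitted parameters, and the 0.5 threshold is only a common default.

```python
import math

def linear_predict(x, w, b):
    """Linear regression: the output is the raw linear score."""
    return w * x + b

def logistic_predict_proba(x, w, b):
    """Logistic regression: the same score passed through the sigmoid."""
    z = w * x + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration
w, b = 0.8, -2.0

for x in [-5.0, 0.0, 2.5, 10.0]:
    score = linear_predict(x, w, b)
    prob = logistic_predict_proba(x, w, b)
    label = int(prob >= 0.5)  # threshold the probability to get a class
    print(f"x={x:5.1f}  linear={score:6.2f}  probability={prob:.3f}  class={label}")
```

The linear score is unbounded, while the logistic output always stays between 0 and 1 and only becomes a class label once a threshold is applied.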


**Question 2: Explain the role of the Sigmoid function in Logistic Regression.**

The sigmoid function is the mathematical heart of logistic regression: it links the linear combination of input features to a probability prediction, which is what makes binary classification possible.

The sigmoid function, defined as σ(z) = 1 / (1 + e^(-z)), is an S-shaped curve that maps any real number into the range (0, 1). This property makes it well suited to modeling probabilities, as the output can be interpreted as the likelihood that a given input belongs to a particular class.

**Role in Logistic Regression**

- **Bounded Output (0 to 1):**
The sigmoid function naturally constrains all outputs between 0 and 1, making it perfect for representing probabilities. No matter how large or small the input z becomes, the output will always be a valid probability value.

- **S-Shaped Curve:**
The function creates a smooth S-shaped (sigmoidal) curve that transitions gradually from 0 to 1. This provides a natural decision boundary and avoids abrupt jumps that would be problematic for classification.

- **Monotonic and Differentiable:**
The function is always increasing and smooth everywhere, which ensures that larger input values always correspond to higher probabilities. The differentiability is crucial for optimization algorithms like gradient descent.

- **Probability Mapping:**
The sigmoid transforms the linear predictor z (which can range from -∞ to +∞) into a probability between 0 and 1. This probability represents the likelihood of belonging to the positive class.

- **Decision Boundary:**
When σ(z) = 0.5, we have z = 0, which creates a natural decision boundary. Values above 0.5 typically predict class 1, while values below predict class 0.

- **Interpretability:**
The sigmoid's output directly represents confidence in the prediction. A value near 0.9 indicates high confidence for class 1, while 0.1 indicates high confidence for class 0.
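
The properties listed above can be checked numerically. The sketch below uses a numerically stable formulation of the sigmoid (an implementation choice made here, not something the formula requires), and the input values are arbitrary.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid: avoids overflow in exp() for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_derivative(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

zs = [-1000.0, -5.0, -1.0, 0.0, 1.0, 5.0, 1000.0]
probs = [sigmoid(z) for z in zs]

# Bounded: every output lies in [0, 1], even for extreme inputs
# (in floating point the extremes round to exactly 0.0 or 1.0)
assert all(0.0 <= p <= 1.0 for p in probs)

# Monotonic: a larger z never gives a smaller probability
assert all(a <= b for a, b in zip(probs, probs[1:]))

# Decision boundary: z = 0 maps to probability 0.5
assert sigmoid(0.0) == 0.5

# Differentiable: the slope peaks at z = 0, where it equals 0.25
assert sigmoid_derivative(0.0) == 0.25

for z, p in zip(zs, probs):
    print(f"z={z:8.1f}  sigmoid={p:.6f}")
```

The simple closed form of the derivative is what makes gradient-based training of logistic regression cheap to compute.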

**Question 3: What is Regularization in Logistic Regression and why is it needed?**

Regularization is a fundamental technique used in logistic regression to prevent overfitting by adding a penalty term to the cost function. This penalty term constrains the magnitude of the model parameters (weights), encouraging the model to find simpler solutions that generalize better to unseen data. The core idea is to balance the model's ability to fit the training data with its complexity, following the principle that simpler models are often better at generalizing.

The penalty is added to the objective function (also called the loss or cost function) that the model minimizes. Its strength is controlled by a hyperparameter: a larger value means stronger regularization and a simpler model, while a smaller value lets the model be more complex. In scikit-learn's `LogisticRegression`, this hyperparameter is exposed as `C`, the inverse of the regularization strength, so a smaller `C` means stronger regularization.
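
To illustrate how the penalty changes the objective, here is a minimal sketch of an L2-penalized log loss. The dataset, the weight vectors, and the name `lam` for the regularization strength are all made up for illustration; leaving the bias unpenalized is a common convention assumed here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, X, y):
    """Mean binary cross-entropy of a logistic model on (X, y)."""
    total = 0.0
    for features, label in zip(X, y):
        z = sum(w * x for w, x in zip(weights, features)) + bias
        p = sigmoid(z)
        total += -(label * math.log(p) + (1 - label) * math.log(1 - p))
    return total / len(y)

def l2_penalized_loss(weights, bias, X, y, lam):
    """Log loss plus an L2 penalty; lam is the regularization strength."""
    penalty = 0.5 * lam * sum(w * w for w in weights)
    return log_loss(weights, bias, X, y) + penalty

# Tiny made-up dataset: two features, four samples
X = [[0.5, 1.0], [1.5, 0.2], [-1.0, -0.5], [-0.3, -1.2]]
y = [1, 1, 0, 0]

small_w = [1.0, 1.0]
large_w = [10.0, 10.0]

for lam in [0.0, 0.1, 1.0]:
    small = l2_penalized_loss(small_w, 0.0, X, y, lam)
    large = l2_penalized_loss(large_w, 0.0, X, y, lam)
    print(f"lam={lam}: small weights -> {small:.4f}, large weights -> {large:.4f}")
```

Without a penalty (`lam=0`), the large weights fit this separable toy data better; once the penalty is switched on, the same weights become far more expensive, which is what pushes the optimizer toward simpler solutions.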


**Why is Regularization Needed?**  

**Preventing overfitting** is the main reason for regularization. Without it, logistic regression can memorize training data, including noise. This leads to poor performance on new data. Regularization limits this flexibility. It forces the model to learn general patterns instead of specific training examples.  

**Handling high-dimensional data** is important when features outnumber samples. In these situations, unregularized models can become unstable and prone to overfitting. Regularization keeps the parameter space in check, which makes the model more stable and trustworthy.  

**Managing multicollinearity** matters when features are highly correlated, because correlated features make parameter estimates unstable. Regularization stabilizes these estimates and stops individual parameters from becoming too large.  

**Numerical stability** improves when parameter values stay within reasonable limits. On perfectly separable data, for example, unregularized weights can grow without bound; the penalty keeps them finite, which helps avoid overflow and makes optimization more reliable.
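
The shrinking effect can be observed directly. The sketch below, which assumes scikit-learn's default L2 penalty and uses the breast cancer dataset purely as an example, fits the same model at three arbitrary values of `C` and reports the size of the coefficient vector.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

norms = {}
for C in [0.01, 1.0, 100.0]:
    # In scikit-learn, C is the INVERSE of the regularization strength
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=5000))
    model.fit(X, y)
    coef = model.named_steps["logisticregression"].coef_
    norms[C] = float(np.linalg.norm(coef))
    print(f"C={C:<6}  L2 norm of coefficients = {norms[C]:.3f}")
```

Smaller `C` (stronger regularization) should yield a smaller coefficient norm, and larger `C` should let the weights grow.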

**Question 4: What are some common evaluation metrics for classification models, and why are they important?**

When building machine learning models, it’s important to understand how well they perform, and evaluation metrics are how we measure that.
There is no universally best metric; the right choice depends on the problem, the data distribution, and business needs. The following metrics are commonly used to evaluate classification models:

**1. Accuracy**
Accuracy is a fundamental metric used for evaluating the performance of a classification model. It tells us the proportion of correct predictions made by the model out of all predictions.


**2. Precision**
Precision measures how many of the positive predictions made by the model are actually correct. It's useful when the cost of false positives is high, such as in medical diagnoses, where predicting a disease that is not present can have serious consequences.

**3. Recall**
Recall or Sensitivity measures how many of the actual positive cases were correctly identified by the model. It is important when missing a positive case (false negative) is more costly than false positives.

**4. F1 Score**
The F1 Score is the harmonic mean of precision and recall. It is useful when we need a balance between precision and recall as it combines both into a single number. A high F1 score means the model performs well on both metrics. Its range is [0,1].

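A model with high precision but low recall is rarely wrong when it predicts the positive class, but it misses a large number of actual positives. The higher the F1 score, the better the balance between the two. It can be expressed mathematically as:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$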

**5. Logarithmic Loss (Log Loss)**
Log loss measures the uncertainty of the model’s predictions. It penalizes the model for assigning low probabilities to the correct classes. This metric also applies to multi-class classification and is helpful when we want to assess a model’s confidence in its predictions. For N samples and M classes, the log loss is:

$$\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})$$

where $y_{ij}$ is 1 if sample $i$ belongs to class $j$ and 0 otherwise, and $p_{ij}$ is the predicted probability that sample $i$ belongs to class $j$.
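
As a rough illustration of how these metrics relate, the sketch below computes them by hand for the binary case. The labels and probabilities are made up, the 0.5 threshold is an assumption, and in practice `sklearn.metrics` provides equivalent functions.

```python
import math

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Made-up labels and predicted probabilities for the positive class
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
print(f"log loss: {binary_log_loss(y_true, y_prob):.4f}")
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, whereas log loss is computed from the probabilities themselves.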


**Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame, 
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy. 
(Use Dataset from sklearn package)**

import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)


iris = load_iris()
# Create a DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

df.to_csv("iris.csv", index=False)
df = pd.read_csv("iris.csv")

X_iris = df.drop('target', axis=1)
y_iris = df['target']

# Split into train/test sets
X_train_iris, X_test_iris, y_train_iris, y_test_iris = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris)

# Train Logistic Regression model
lr_iris = LogisticRegression(random_state=42, max_iter=200)
lr_iris.fit(X_train_iris, y_train_iris)

# Make predictions and calculate accuracy
y_pred_iris = lr_iris.predict(X_test_iris)
accuracy_iris = accuracy_score(y_test_iris, y_pred_iris)

print("Iris Dataset Results:")
print(f"Training set size: {len(X_train_iris)}")
print(f"Test set size: {len(X_test_iris)}")
print(f"Accuracy: {accuracy_iris:.4f}")
print("Classification Report:")
print(classification_report(y_test_iris, y_pred_iris, target_names=iris.target_names))



Iris Dataset Results:
Training set size: 120
Test set size: 30
Accuracy: 0.9667
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.90      0.95        10
   virginica       0.91      1.00      0.95        10

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30



**Question 6: Write a Python program to train a Logistic Regression model using L2 
regularization (Ridge) and print the model coefficients and accuracy.**

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)


# Load dataset
cancer = load_breast_cancer()
df = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
df['target'] = cancer.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression with L2 regularization
model = LogisticRegression(
    penalty='l2',
    solver='lbfgs',
    max_iter=1000,
    random_state=42
)
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)

# Output
print("Model Coefficients:")
for feature, coef in zip(X.columns, model.coef_[0]):
    print(f"{feature}: {coef:.4f}")

print("\nIntercept:", model.intercept_[0])
print(f"\nModel Accuracy: {accuracy:.4f}")


Model Coefficients:
mean radius: -0.5115
mean texture: -0.5527
mean perimeter: -0.4763
mean area: -0.5411
mean smoothness: -0.2125
mean compactness: 0.6483
mean concavity: -0.6021
mean concave points: -0.7042
mean symmetry: -0.1672
mean fractal dimension: 0.1997
radius error: -1.0830
texture error: 0.2488
perimeter error: -0.5443
area error: -0.9291
smoothness error: -0.1603
compactness error: 0.6472
concavity error: 0.1606
concave points error: -0.4438
symmetry error: 0.3605
fractal dimension error: 0.4379
worst radius: -0.9476
worst texture: -1.2551
worst perimeter: -0.7632
worst area: -0.9478
worst smoothness: -0.7466
worst compactness: 0.0555
worst concavity: -0.8232
worst concave points: -0.9537
worst symmetry: -0.9392
worst fractal dimension: -0.1873

Intercept: 0.3022075735370298

Model Accuracy: 0.9825


**Question 7: Write a Python program to train a Logistic Regression model for multiclass 
classification using multi_class='ovr' and print the classification report. 
(Use Dataset from sklearn package)**

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import warnings
warnings.filterwarnings("ignore")


# Load dataset
wine = load_wine()
df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
df['target'] = wine.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

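# Logistic Regression with One-vs-Rest strategy
# (multi_class is deprecated in newer scikit-learn releases; the blanket
# warnings filter above hides that notice)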
model = LogisticRegression(
    multi_class='ovr',
    solver='lbfgs',
    max_iter=2000,
    random_state=42
)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))


Classification Report:
              precision    recall  f1-score   support

     class_0       1.00      1.00      1.00        12
     class_1       0.88      1.00      0.93        14
     class_2       1.00      0.80      0.89        10

    accuracy                           0.94        36
   macro avg       0.96      0.93      0.94        36
weighted avg       0.95      0.94      0.94        36
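
As far as I know, the `multi_class` argument of `LogisticRegression` has been deprecated since scikit-learn 1.5, with `OneVsRestClassifier` as the suggested replacement for the one-vs-rest strategy. A minimal sketch of that alternative on the same wine split follows. It is not the code that produced the report above, and its numbers may differ slightly.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# One binary logistic regression is fitted per class
ovr_model = OneVsRestClassifier(LogisticRegression(solver="lbfgs", max_iter=2000))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
print(classification_report(y_test, y_pred))
print("Binary classifiers fitted:", len(ovr_model.estimators_))
```

Because the features are not scaled here, the solver may still emit convergence warnings; wrapping the estimator in a pipeline with `StandardScaler` is one way to address that.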



**Question 8: Write a Python program to apply GridSearchCV to tune C and penalty 
hyperparameters for Logistic Regression and print the best parameters and validation 
accuracy. 
(Use Dataset from sklearn package)**

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)


# Load dataset
wine = load_wine()
df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
df['target'] = wine.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Create pipeline (Scaling + Logistic Regression)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('logreg', LogisticRegression(max_iter=1000, solver='liblinear', multi_class='ovr'))
])

# Define parameter grid
param_grid = {
    'logreg__C': [0.01, 0.1, 1, 10, 100],     # Inverse of regularization strength (smaller = stronger penalty)
    'logreg__penalty': ['l1', 'l2']           # Penalty type
}

# GridSearchCV
grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

# Best parameters and accuracy
print("Best Parameters:", grid_search.best_params_)
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.4f}")


Best Parameters: {'logreg__C': 0.1, 'logreg__penalty': 'l2'}
Best Cross-Validation Accuracy: 0.9862


**Question 9: Write a Python program to standardize the features before training Logistic 
Regression and compare the model's accuracy with and without scaling. 
(Use Dataset from sklearn package)**

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# Load dataset
wine = load_wine()
df = pd.DataFrame(data=wine.data, columns=wine.feature_names)
df['target'] = wine.target

# Features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Logistic Regression without scaling
model_no_scale = LogisticRegression(max_iter=1000, solver='lbfgs', multi_class='ovr')
model_no_scale.fit(X_train, y_train)
y_pred_no_scale = model_no_scale.predict(X_test)
acc_no_scale = accuracy_score(y_test, y_pred_no_scale)

# Logistic Regression with scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=1000, solver='lbfgs', multi_class='ovr')
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
acc_scaled = accuracy_score(y_test, y_pred_scaled)

# Print results
print(f"Accuracy without Scaling: {acc_no_scale:.4f}")
print(f"Accuracy with Scaling:    {acc_scaled:.4f}")


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


Accuracy without Scaling: 0.9444
Accuracy with Scaling:    1.0000


**Question 10: Imagine you are working at an e-commerce company that wants to 
predict which customers will respond to a marketing campaign. Given an imbalanced 
dataset (only 5% of customers respond), describe the approach you’d take to build a 
Logistic Regression model — including data handling, feature scaling, balancing 
classes, hyperparameter tuning, and evaluating the model for this real-world business 
use case.**

**Data Handling & Preprocessing**  
I would start by loading the dataset and doing some initial exploratory data analysis (EDA). This involves checking for missing values, understanding data types, and spotting any outliers. For categorical features, I would use techniques like one-hot encoding to change them into a numerical format that the model can use. Next, I would split the data into a training set and a testing set, usually in a 70/30 or 80/20 ratio. It's important to keep the original class distribution in the split, which can be achieved through a stratified split.  

**Feature Scaling**  
Regularized Logistic Regression is sensitive to the scale of the features: the penalty acts on coefficient magnitudes, so features on different scales would be penalized unevenly, and unscaled inputs can also slow solver convergence. So, I would scale all numerical features. Standardization (Z-score normalization) is a good option here since it rescales each feature to have a mean of 0 and a standard deviation of 1. The scaling parameters (mean and standard deviation) would be learned from the training data only and then applied to both the training and testing sets to avoid data leakage.  

**Balancing Classes**  
Since only 5% of customers respond, the dataset is very imbalanced. Training a model on this data would likely cause it to mostly predict the majority class (non-responders), leading to poor performance on the minority class. To fix this, I would use a method like oversampling the minority class. A common approach is SMOTE (Synthetic Minority Over-sampling Technique), which generates synthetic samples of the minority class to help balance the dataset. I would only apply this to the training data so the testing data accurately reflects the real-world distribution.  
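
An alternative to resampling, named plainly here because it is a different technique from SMOTE, is to weight the classes in the loss function. The sketch below computes the weighting heuristic that, to my understanding, scikit-learn applies for `class_weight='balanced'`; the label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical campaign data: 5% responders (1), 95% non-responders (0)
labels = [1] * 50 + [0] * 950
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(f"class {cls}: weight = {weights[cls]:.4f}")
```

With a 5% response rate, each responder counts roughly 19 times as much as a non-responder in the loss, which has a similar effect to oversampling without creating synthetic rows.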

**Model Training & Hyperparameter Tuning**  
I would use a Logistic Regression model with a penalty to prevent overfitting. The two common types are L1 (Lasso) and L2 (Ridge) penalties. L1 can help with feature selection by driving some coefficients exactly to zero, while L2 shrinks all coefficients toward zero without eliminating any, which works well when many features each carry a little signal. I would use cross-validation on the training data to tune the hyperparameters, mainly `C`, which in scikit-learn is the inverse of the regularization strength. A Grid Search or Randomized Search would help find the best `C` and `penalty` combination.  

**Model Evaluation**  
Given the imbalanced nature of the data, accuracy is not a reliable measure. Instead, I would focus on metrics that provide better insights for imbalanced datasets:  

- **Precision**: The percentage of positive predictions that were correct. This tells us how often a customer we predict will respond actually does.  
- **Recall**: The percentage of actual responders that were correctly identified. This is vital for capturing potential responders.  
- **F1-Score**: The harmonic mean of precision and recall. It offers a single score that balances both metrics.  
- **Area Under the Receiver Operating Characteristic Curve (AUC-ROC)**: This metric assesses the model's ability to rank responders above non-responders across all thresholds. An AUC of 1.0 indicates a perfect model, while 0.5 is no better than random guessing.
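
The steps above can be sketched end to end. Everything data-related here is an assumption: the dataset is synthetic (`make_classification` with about 5% positives), class weighting stands in for SMOTE to keep the example within scikit-learn, and the parameter grid and `roc_auc` scoring are illustrative choices rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: roughly 5% positives (responders)
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=6,
    weights=[0.95, 0.05], random_state=42,
)

# Stratified split keeps the 5% response rate in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is refit on each CV training fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression(
        class_weight="balanced", solver="liblinear", max_iter=1000)),
])

param_grid = {
    "logreg__C": [0.01, 0.1, 1, 10],   # inverse regularization strength
    "logreg__penalty": ["l1", "l2"],
}

search = GridSearchCV(
    pipeline, param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Evaluate once on the untouched test set
y_prob = search.predict_proba(X_test)[:, 1]
y_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print(f"Test ROC AUC: {roc_auc_score(y_test, y_prob):.4f}")
print(classification_report(y_test, y_pred, target_names=["no response", "response"]))
```

On real campaign data, the decision threshold would then be chosen from the predicted probabilities according to the relative cost of contacting a non-responder versus missing a responder.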