**Question 1: What is Logistic Regression, and how does it differ from Linear Regression?**

->Logistic Regression is a statistical and machine learning algorithm used for classification problems, where the dependent variable is categorical.
1. Linear Regression is used for predicting continuous numerical values, while Logistic Regression is used for predicting discrete class labels.
2. Linear Regression can output any real number, whereas Logistic Regression outputs a probability between 0 and 1, which is then compared with a threshold (typically 0.5) to produce a class label.
3. Linear Regression uses a linear function and Mean Squared Error as its loss function, while Logistic Regression uses a logistic function and Log Loss (Cross-Entropy Loss) for training.

**Question 2: Explain the role of the Sigmoid function in Logistic Regression.**

->In Logistic Regression, the Sigmoid function converts the linear combination of input features, which can take any value from negative to positive infinity, into a probability between 0 and 1: σ(z) = 1 / (1 + e^(-z)), where z = w·x + b.
This transformation is essential because Logistic Regression predicts the likelihood that an instance belongs to a particular class; that probability is then compared with a threshold (commonly 0.5) to assign the class label.
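
A minimal sketch of this mapping, using NumPy only. The scores below are made-up values standing in for the linear combination w·x + b; they are not taken from any fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linear scores (w.x + b); any real number is allowed here
scores = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

# The sigmoid squashes the scores into probabilities
probs = sigmoid(scores)
print("Probabilities:", np.round(probs, 3))

# Default decision rule: predict class 1 when the probability is >= 0.5
labels = (probs >= 0.5).astype(int)
print("Predicted labels:", labels)
```

A score of 0 maps to exactly 0.5, so the sign of the linear score decides the predicted class under the default threshold.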




**Question 3: What is Regularization in Logistic Regression and why is it needed?**

-> Regularization is a technique used to prevent overfitting by adding a penalty term to the cost function that discourages the model from assigning excessively large weights to features.

Overfitting occurs when the model learns noise or irrelevant patterns from the training data, leading to poor performance on new, unseen data. Regularization works by controlling the complexity of the model, effectively simplifying it so that it generalizes better.
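
The effect of the penalty can be sketched by fitting the same model with different regularization strengths. In scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty. The choice of the binary iris subset and the three `C` values below are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Binary subset of iris (classes 0 and 1), chosen only for illustration
X, y = load_iris(return_X_y=True)
mask = y != 2
X, y = X[mask], y[mask]

# Fit the same L2-regularized model with a strong, default, and weak penalty
norms = {}
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(C=C, max_iter=1000)  # default penalty is L2
    model.fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:>6}: ||w|| = {norms[C]:.3f}")
```

The weight norm should grow as `C` increases, which is the "excessively large weights" behaviour the penalty is meant to restrain.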

**Question 4: What are some common evaluation metrics for classification models, and why are they important?**

->Common evaluation metrics for classification models include: Accuracy, Precision, Recall, F1-score, and the Area Under the ROC Curve (AUC-ROC).

1. Accuracy measures the proportion of correctly classified instances out of all instances, but it can be misleading if the data is imbalanced.
2. Precision measures the proportion of true positive predictions out of all positive predictions, which is important when the cost of false positives is high.
3. Recall (or Sensitivity) measures the proportion of true positives out of all actual positives, which is important when the cost of false negatives is high.
4. The F1-score is the harmonic mean of Precision and Recall, providing a balance between them, especially when there is an uneven class distribution.
5. The AUC-ROC score measures the model’s ability to distinguish between classes across different classification thresholds, with higher values indicating better performance.

These metrics are important because they give a more complete picture of a model’s effectiveness than accuracy alone, helping to choose the right model and tune it based on the specific needs of the problem.
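
As a small worked example, the metrics above can be computed with `sklearn.metrics` on a hand-made set of ten labels and scores. The numbers are invented for illustration, not output from a real model.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
)

# Ten made-up examples: 6 negatives followed by 4 positives
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
# Made-up predicted probabilities for the positive class
y_score = np.array([0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.9, 0.8, 0.45, 0.35])
# Hard predictions at the usual 0.5 threshold: TP=2, FP=1, FN=2, TN=5
y_pred = (y_score >= 0.5).astype(int)

acc = accuracy_score(y_true, y_pred)      # (TP + TN) / total = 7/10
prec = precision_score(y_true, y_pred)    # TP / (TP + FP)    = 2/3
rec = recall_score(y_true, y_pred)        # TP / (TP + FN)    = 2/4
f1 = f1_score(y_true, y_pred)             # harmonic mean     = 4/7
auc = roc_auc_score(y_true, y_score)      # threshold-free    = 21/24

print(f"Accuracy:  {acc:.3f}")
print(f"Precision: {prec:.3f}")
print(f"Recall:    {rec:.3f}")
print(f"F1-score:  {f1:.3f}")
print(f"ROC-AUC:   {auc:.3f}")
```

Note that ROC-AUC is computed from the scores rather than the thresholded predictions, which is why it can look better than recall at a fixed threshold.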



In [1]:
"""
Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model,
and prints its accuracy. (Use Dataset from sklearn package)
"""
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris dataset into a DataFrame
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Keep only classes 0 and 1 so that this is a binary problem
df = df[df['target'] != 2]

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model and evaluate it on the held-out set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Accuracy: 1.00
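
The cell above builds the DataFrame directly from `load_iris()`. Since the question mentions a CSV file, here is a hedged sketch of the same workflow with an actual CSV round trip. The file name `iris.csv` and the temporary directory are hypothetical, chosen only so the example is self-contained.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write iris to a temporary CSV so there is a real file to load
    csv_path = os.path.join(tmp_dir, "iris.csv")  # hypothetical file name
    load_iris(as_frame=True).frame.to_csv(csv_path, index=False)

    # Load the CSV file into a Pandas DataFrame
    df_csv = pd.read_csv(csv_path)

# Keep classes 0 and 1 only, matching the binary setup used above
df_csv = df_csv[df_csv["target"] != 2]
X_csv = df_csv.drop(columns="target")
y_csv = df_csv["target"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_csv, y_csv, test_size=0.2, random_state=42
)

clf = LogisticRegression()
clf.fit(X_tr, y_tr)
acc_csv = accuracy_score(y_te, clf.predict(X_te))
print(f"Accuracy: {acc_csv:.2f}")
```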


In [2]:
"""
Question 6: Write a Python program to train a Logistic Regression model using L2 regularization (Ridge)
and print the model coefficients and accuracy. (Use Dataset from sklearn package)
"""
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris dataset into a DataFrame
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Keep only classes 0 and 1 so that this is a binary problem
df = df[df['target'] != 2]

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# L2 (Ridge) penalty; C keeps its default of 1.0 (inverse regularization strength)
model = LogisticRegression(penalty='l2', solver='liblinear')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Coefficients:", model.coef_)
print(f"Accuracy: {accuracy:.2f}")


Model Coefficients: [[-0.3753915  -1.39664105  2.15250857  0.96423532]]
Accuracy: 1.00


In [3]:
"""
Question 7: Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr'
and print the classification report. (Use Dataset from sklearn package)
"""
import warnings

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# multi_class is deprecated in recent scikit-learn releases (1.5+);
# silence the resulting FutureWarning so the report prints cleanly
warnings.filterwarnings('ignore', category=FutureWarning)

# Load all three iris classes into a DataFrame
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One-vs-rest: one binary classifier is fitted per class
model = LogisticRegression(multi_class='ovr', solver='liblinear')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
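
In scikit-learn 1.5 and later the `multi_class` argument is deprecated. The sketch below shows what should be an equivalent one-vs-rest setup using `OneVsRestClassifier`; it is an alternative under that assumption, not the approach the question asks for.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Same data and split as the cell above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Wrap a binary Logistic Regression; one copy is fitted per class
ovr_model = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
ovr_model.fit(X_train, y_train)

ovr_pred = ovr_model.predict(X_test)
ovr_accuracy = accuracy_score(y_test, ovr_pred)
print("Number of binary classifiers:", len(ovr_model.estimators_))
print("Classification Report:\n", classification_report(y_test, ovr_pred))
```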



In [4]:
"""
Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression
and print the best parameters and validation accuracy. (Use Dataset from sklearn package)
"""
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Load dataset from sklearn
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split into train and test sets; the grid search only sees the training set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# liblinear supports both the L1 and the L2 penalty
model = LogisticRegression(solver='liblinear')

# Candidate values: C is the inverse regularization strength
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2']
}

# Apply GridSearchCV with 5-fold cross-validation
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Print best parameters and the mean cross-validated accuracy they achieved
print("Best Parameters:", grid_search.best_params_)
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.2f}")

Best Parameters: {'C': 10, 'penalty': 'l1'}
Best Cross-Validation Accuracy: 0.96


In [6]:
"""
Question 9: Write a Python program to standardize the features before training Logistic Regression
and compare the model's accuracy with and without scaling. (Use Dataset from sklearn package)
"""
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset from sklearn
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic Regression without scaling
model_no_scaling = LogisticRegression(max_iter=200)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Standardize the features: fit on the training set only, then apply to both
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression with scaling
model_scaling = LogisticRegression(max_iter=200)
model_scaling.fit(X_train_scaled, y_train)
y_pred_scaling = model_scaling.predict(X_test_scaled)
accuracy_scaling = accuracy_score(y_test, y_pred_scaling)

# Print results
print(f"Accuracy without scaling: {accuracy_no_scaling:.2f}")
print(f"Accuracy with scaling: {accuracy_scaling:.2f}")


Accuracy without scaling: 1.00
Accuracy with scaling: 1.00


**Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model — including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.**

-> For this problem, I would follow a structured approach so that the Logistic Regression model handles the severe class imbalance and produces reliable predictions.

First, I would start with data understanding and preprocessing: checking for missing values, outliers, and feature distributions. Because the regularization penalty treats all coefficients alike and the solvers converge faster when features are on comparable scales, I would standardize numerical features with StandardScaler, fitted on the training data only. For categorical variables, I would apply one-hot encoding.

Next, to address the class imbalance (only 5% positive responses), I would try class weighting (class_weight='balanced' in Logistic Regression), oversampling the minority class with SMOTE, or undersampling the majority class, and keep whichever works best in cross-validation. Any resampling would be applied to the training folds only, so that the validation data keeps the real 5% response rate.

During model training, I would tune hyperparameters with GridSearchCV, covering C (inverse regularization strength), penalty (L1/L2), and the solver, using stratified cross-validation so that every fold contains responders.

Since this is a business-critical imbalanced problem, I would not rely on accuracy: a model that predicts "no response" for everyone already scores 95%. Instead, I would evaluate the model with Precision, Recall, F1-score, ROC-AUC, and especially the Precision-Recall curve, because they better reflect performance on the minority class.

Finally, I would not fix the classification threshold at 0.5 but choose it from business requirements: for example, maximizing Recall to reach more potential responders, or optimizing Precision to avoid wasting campaign resources. This end-to-end process would produce a well-tuned Logistic Regression model suited to the marketing campaign's goals.
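
The approach described above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the data is synthetic (`make_classification` with about 5% positives, standing in for real campaign data), the parameter grid is arbitrary, and the 80% recall target is a hypothetical business requirement. Class weighting is used as the balancing method; SMOTE is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import (
    GridSearchCV, StratifiedKFold, cross_val_predict, train_test_split
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: roughly 5% responders (assumption)
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is re-fitted on each training fold
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear")),
])
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Optimize average precision (area under the PR curve), not accuracy
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="average_precision")
search.fit(X_train, y_train)

# Choose the threshold on out-of-fold training predictions, not on the test set
oof_probs = cross_val_predict(
    search.best_estimator_, X_train, y_train, cv=cv, method="predict_proba"
)[:, 1]
target_recall = 0.80  # hypothetical business requirement
precision, recall, thresholds = precision_recall_curve(y_train, oof_probs)
# recall has one more entry than thresholds; drop the final point to align them
threshold = thresholds[recall[:-1] >= target_recall].max()

# Final evaluation on the untouched test set
test_probs = search.predict_proba(X_test)[:, 1]
ap = average_precision_score(y_test, test_probs)
test_recall = (test_probs[y_test == 1] >= threshold).mean()

print("Best parameters:", search.best_params_)
print(f"Chosen threshold: {threshold:.3f}")
print(f"Test average precision: {ap:.3f}")
print(f"Test recall at chosen threshold: {test_recall:.3f}")
```

Picking the highest threshold that still meets the recall target keeps precision as high as possible for that level of reach; a precision-driven campaign would invert the rule and take the lowest threshold that meets a precision target.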

