### Q1. Difference Between Linear Regression and Logistic Regression

# Linear Regression:
# - Used for predicting continuous dependent variables.
# - Output is a continuous value.
# - Fits a line (a hyperplane when there are several features) by ordinary least squares, i.e., by minimizing the sum of squared residuals.
# - Example: Predicting house prices based on area and location.
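# A minimal NumPy sketch of the least-squares fit described above. The house-price data is
# synthetic and invented purely for illustration (price = 50 + 3 * area + noise):

```python
import numpy as np

# Synthetic data (assumption for illustration): price grows linearly with area
rng = np.random.default_rng(0)
area = rng.uniform(50, 200, size=100)
price = 50 + 3 * area + rng.normal(0, 10, size=100)

# Design matrix with a column of ones so the model learns an intercept
X = np.column_stack([np.ones_like(area), area])

# Ordinary least squares: minimize ||X @ beta - price||^2
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = beta

# Predictions are unbounded continuous values (prices), not probabilities
pred = X @ beta
print(f"intercept={intercept:.1f}, slope={slope:.2f}")
```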

# Logistic Regression:
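# - Used for predicting categorical dependent variables: binary in its basic form, multi-class via extensions such as multinomial (softmax) regression.
# - Output is a probability: the sigmoid function maps the linear score \( \theta^T x \) into (0, 1), and a decision threshold (0.5 by default) turns that probability into a class label.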
# - Example: Classifying emails as spam or not spam.
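# A short sketch of the two steps described above: sigmoid, then threshold. The scores are
# hypothetical values of theta^T x for four emails, not output from a trained model:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores theta^T x for four emails
scores = np.array([-3.0, -0.5, 0.0, 2.0])

# Step 1: the sigmoid turns scores into P(spam | x)
probs = sigmoid(scores)

# Step 2: a 0.5 threshold turns probabilities into class labels (1 = spam)
labels = (probs >= 0.5).astype(int)
print(probs.round(3), labels)
```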

### Q2. Cost Function in Logistic Regression

# - The cost function used is the Log Loss or Binary Cross-Entropy.
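# - Formula: \( J(\theta) = - \frac{1}{m} \sum_{i=1}^m \big[y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i))\big] \), where \( h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} \) and \( m \) is the number of training examples.
# - The cost is convex in \( \theta \). It is minimized with gradient descent or related solvers (e.g., L-BFGS, scikit-learn's default) to find the parameters with the lowest cost.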
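# A minimal sketch of batch gradient descent on the log loss, using the gradient
# \( \nabla J(\theta) = \frac{1}{m} X^T (h_\theta(X) - y) \). The 1-D toy data, the learning rate
# of 0.5 and the 500 iterations are arbitrary choices made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_cost(theta, X, y):
    """Binary cross-entropy J(theta); X already contains an intercept column."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def gradient(theta, X, y):
    """dJ/dtheta = (1/m) * X^T (h - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Toy data (assumption): class 1 becomes more likely as x grows
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + rng.normal(scale=0.5, size=200) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])

theta = np.zeros(2)
initial_cost = log_loss_cost(theta, X, y)  # approximately log(2) when theta = 0
learning_rate = 0.5
for _ in range(500):
    theta -= learning_rate * gradient(theta, X, y)

final_cost = log_loss_cost(theta, X, y)
print(f"cost: {initial_cost:.4f} -> {final_cost:.4f}, theta = {theta.round(2)}")
```

# Because the cost is convex, a small enough fixed learning rate steadily lowers it. Library
# solvers such as L-BFGS reach the same minimum in far fewer iterations.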

### Q3. Regularization in Logistic Regression

# - Regularization adds a penalty term to the cost function to prevent overfitting.
# - Types:
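#   1. L1 Regularization (Lasso): Adds \(\lambda \sum_{j=1}^{n} |\theta_j|\). This can shrink some coefficients exactly to zero.
#   2. L2 Regularization (Ridge): Adds \(\lambda \sum_{j=1}^{n} \theta_j^2\). This shrinks all coefficients smoothly toward zero.
# - The intercept \( \theta_0 \) is conventionally left unpenalized. A larger \( \lambda \) means a stronger penalty.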
# - Helps by penalizing large coefficients, encouraging simpler models.
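# A sketch of the penalized cost under the conventions above: log loss plus \( \lambda \) times
# an L1 or L2 term, with theta[0] as the unpenalized intercept. The tiny X, y and theta are
# made-up values chosen so the penalty terms are easy to check by hand:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam, kind="l2"):
    """Log loss plus an L1 or L2 penalty; theta[0] (intercept) is not penalized."""
    h = sigmoid(X @ theta)
    eps = 1e-12
    data_term = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    weights = theta[1:]
    if kind == "l1":
        penalty = lam * np.sum(np.abs(weights))
    elif kind == "l2":
        penalty = lam * np.sum(weights ** 2)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return data_term + penalty

# Made-up example: first column of X is the intercept feature
X = np.array([[1.0, 0.5, -1.0], [1.0, -1.5, 2.0], [1.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.2, 1.0, -2.0])

base = regularized_cost(theta, X, y, lam=0.0)
l1 = regularized_cost(theta, X, y, lam=0.1, kind="l1")  # adds 0.1 * (1 + 2)
l2 = regularized_cost(theta, X, y, lam=0.1, kind="l2")  # adds 0.1 * (1 + 4)
print(base, l1, l2)
```

# scikit-learn's LogisticRegression exposes the strength through C, an inverse regularization
# strength: a smaller C means a stronger penalty.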

### Q4. ROC Curve

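# - The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies.
# - Area Under the Curve (AUC) measures the model's ability to distinguish between classes. It equals the probability that a randomly chosen positive is scored higher than a randomly chosen negative.
# - A perfect classifier has AUC = 1; random guessing yields an AUC of about 0.5.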
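# A from-scratch sketch of the threshold sweep described above. It computes AUC with the
# trapezoidal rule and checks the result against scikit-learn's roc_auc_score. The eight labels
# and scores are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def roc_points(y_true, scores):
    """Return (fpr, tpr) obtained by sweeping the threshold over every distinct score."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    thresholds = np.unique(scores)[::-1]  # highest to lowest
    pos = np.sum(y_true == 1)
    neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for t in thresholds:
        pred = scores >= t
        tpr.append(np.sum(pred & (y_true == 1)) / pos)
        fpr.append(np.sum(pred & (y_true == 0)) / neg)
    return np.array(fpr), np.array(tpr)

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

fpr, tpr = roc_points(y_true, scores)
# Trapezoidal rule: sum of (width * average height) over consecutive points
manual_auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(manual_auc, roc_auc_score(y_true, scores))
```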

### Q5. Feature Selection in Logistic Regression

# Common techniques:
# - Recursive Feature Elimination (RFE): Iteratively removes the least important features.
# - L1 Regularization: Shrinks irrelevant feature coefficients to zero.
# - Mutual Information: Measures the dependency between features and the target variable.
# - These techniques reduce model complexity and can improve interpretability and generalization.
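# A sketch of the three techniques above on scikit-learn's breast-cancer dataset. Two values
# are arbitrary illustration choices: n_features_to_select=10 and C=0.1. For brevity, scaling
# and selection are fit on the full data here. In practice they should be fit on training
# data only (e.g., inside cross-validation) to avoid leakage:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scale so coefficient magnitudes are comparable

# 1. RFE: repeatedly refit and drop the feature with the smallest |coefficient|
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(X, y)
rfe_selected = np.flatnonzero(rfe.support_)

# 2. L1 penalty: the liblinear solver supports penalty='l1'; smaller C = stronger penalty
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_model.fit(X, y)
l1_selected = np.flatnonzero(l1_model.coef_[0])  # features with non-zero weight

# 3. Mutual information: model-free dependency score between each feature and y
mi = mutual_info_classif(X, y, random_state=0)
mi_top10 = np.argsort(mi)[::-1][:10]

print("RFE:", rfe_selected)
print("L1 :", l1_selected)
print("MI :", np.sort(mi_top10))
```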

### Q6. Handling Imbalanced Datasets

# Strategies:
# - Resampling:
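#   1. Oversampling the minority class (e.g., random duplication, or SMOTE, which synthesizes new minority samples).
#   2. Undersampling the majority class.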
# - Adjusting class weights: Assign higher weights to minority class samples.
# - Using appropriate evaluation metrics: Precision, Recall, F1-Score, and AUC-ROC.
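# A sketch of two of the strategies above on hypothetical labels (90 negatives, 10 positives).
# It shows the 'balanced' class weights that scikit-learn computes and simple random
# oversampling with sklearn.utils.resample. SMOTE itself lives in the separate
# imbalanced-learn package and is not shown. Resample only the training split, never the
# test set:

```python
import numpy as np
from sklearn.utils import resample
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels and placeholder features
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y), dtype=float).reshape(-1, 1)

# 'balanced' weights: n_samples / (n_classes * samples_in_class)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
# -> approximately [0.556, 5.0]: the minority class counts 9x more per sample

# Random oversampling: draw minority rows with replacement until classes are equal
X_min, y_min = X[y == 1], y[y == 1]
X_min_up, y_min_up = resample(X_min, y_min, replace=True, n_samples=90, random_state=0)
X_bal = np.vstack([X[y == 0], X_min_up])
y_bal = np.concatenate([y[y == 0], y_min_up])
print(weights, np.bincount(y_bal))
```

# Passing class_weight='balanced' to LogisticRegression applies these weights inside the loss,
# so the data itself does not have to be duplicated.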

### Q7. Common Issues and Challenges

# - Multicollinearity:
#   1. Use Variance Inflation Factor (VIF) to detect it.
#   2. Drop correlated features or use PCA.
# - Non-linearity:
#   1. Logistic regression assumes a linear relationship between the features and the log-odds of the outcome. Consider feature engineering (e.g., polynomial or interaction terms) or non-linear models.
# - Overfitting:
#   1. Use regularization or cross-validation.
# - Outliers:
#   1. Detect using box plots or IQR and consider removing them.
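# A NumPy sketch of the two detection checks above, on synthetic data. It computes VIF as
# 1 / (1 - R^2_j) by regressing each feature on the others, and applies the IQR rule, which
# is the same rule box-plot whiskers use. statsmodels offers
# statsmodels.stats.outliers_influence.variance_inflation_factor as a ready-made alternative:

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

def iqr_outliers(x, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly collinear with x1
x3 = rng.normal(size=500)                  # independent of both
vifs = vif(np.column_stack([x1, x2, x3]))

values = np.append(rng.normal(size=200), [8.0, -9.0])  # two injected outliers
mask = iqr_outliers(values)
print(vifs.round(1), np.flatnonzero(mask))
```

# A common rule of thumb treats VIF above 5 to 10 as problematic. Flagged outliers should be
# inspected before removal, since extreme values can be genuine.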

### Example Code:
```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, classification_report
from sklearn.preprocessing import StandardScaler

# Sample Data
X, y = datasets.load_breast_cancer(return_X_y=True)

# Splitting Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling Features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Logistic Regression
model = LogisticRegression(penalty='l2', class_weight='balanced')
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# ROC Curve (built from predicted probabilities of the positive class, not hard labels)
fpr, tpr, thresholds = roc_curve(y_test, model.predict_proba(X_test)[:, 1])
roc_auc = auc(fpr, tpr)

# Plotting ROC
plt.figure()
plt.plot(fpr, tpr, label=f'ROC curve (area = {roc_auc:.2f})')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()

# Classification Report
print(classification_report(y_test, y_pred))
```
