In [3]:
# Q1. Purpose of Grid Search CV
# Grid Search CV is used to find the optimal hyperparameters for a machine learning model.
# It works by exhaustively searching over a specified parameter grid.
# The performance of each combination is evaluated using cross-validation.
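
# A minimal sketch of the idea above. The iris dataset, the SVC estimator, and
# the grid values are assumptions chosen for illustration; none of them come
# from the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data; any estimator and dataset would do.
X, y = load_iris(return_X_y=True)

# Illustrative grid: 3 values of C x 2 kernels = 6 candidate combinations.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each candidate is scored with 5-fold cross-validation.
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
grid_search.fit(X, y)

print(grid_search.best_params_)
print(round(grid_search.best_score_, 3))
```

# best_params_ holds the winning combination and best_score_ its mean
# cross-validated accuracy.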

In [4]:
# Q2. Difference between Grid Search CV and Randomized Search CV
# Grid Search CV tests all possible combinations of hyperparameters.
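# This guarantees the best combination within the grid (not necessarily the global optimum),
# but the cost grows multiplicatively with every parameter added.
# Randomized Search CV, on the other hand, samples a fixed number of random combinations
# from the specified lists or distributions.
# It is usually faster for large parameter spaces, at the risk of missing the best grid point.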
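
# A hedged sketch of the randomized variant. The dataset, the estimator, the
# log-uniform range for C, and n_iter=8 are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# C is drawn from a continuous log-uniform distribution instead of a fixed list.
param_distributions = {"C": loguniform(1e-2, 1e2), "kernel": ["linear", "rbf"]}

# n_iter caps the number of sampled combinations; random_state makes the draw repeatable.
random_search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=8, cv=5, random_state=0
)
random_search.fit(X, y)

print(random_search.best_params_)
print(len(random_search.cv_results_["params"]))
```

# The search cost is set by n_iter rather than by the size of the parameter space.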

In [5]:
# Q3. What is Data Leakage?
# Data leakage occurs when the model is trained with information that would not be
# available at prediction time, such as test-set statistics or the target itself.
# This produces overly optimistic validation metrics that do not hold up on new data.
# Example: Using future data or features derived from the target variable in training.
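
# One way leakage can inflate scores, sketched on synthetic noise. The data
# shape, k=20, and the LogisticRegression model are assumptions for
# illustration. The labels are random, so an honest estimate should sit near
# chance level.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Pure noise: the labels are independent of every feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))
y = rng.integers(0, 2, size=100)

# Leaky: features are selected using all rows, including future validation folds.
X_selected = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_score = cross_val_score(
    LogisticRegression(max_iter=1000), X_selected, y, cv=5
).mean()

# Leak-free: selection is repeated inside each training fold.
pipeline = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest_score = cross_val_score(pipeline, X, y, cv=5).mean()

print(round(leaky_score, 2), round(honest_score, 2))
```

# The leaky estimate looks far better than the leak-free one even though
# there is no signal to learn.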

In [6]:
# Q4. Preventing Data Leakage
# 1. Split data into train/test sets before preprocessing or feature selection.
# 2. Fit preprocessing steps, such as scaling or encoding, on the training data only.
# 3. Apply those already-fitted transformations to the test data without refitting.
# 4. Exclude features that are derived from the target or unavailable at prediction time.
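
# The steps above can be sketched with a scikit-learn Pipeline. The breast
# cancer dataset, the 25% test split, and the scaler/classifier pair are
# assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Split first so the test rows never influence any fitted statistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The pipeline fits the scaler on training data only and reuses it on test data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Inside cross-validation the scaler is re-fitted on each training fold.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(round(test_accuracy, 3), cv_scores.round(3))
```

# Wrapping preprocessing and model in one object keeps the fit/transform
# separation automatic, including inside cross-validation.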

In [7]:
# Q5. Confusion Matrix
# A confusion matrix is a table that summarizes the performance of a classification model.
# It displays counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

# Example:
from sklearn.metrics import confusion_matrix

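# Actual labels (y_true) and model predictions (y_pred).
# In the printed matrix, rows are actual classes and columns are predicted classes.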
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
print(cm)

[[2 0]
 [1 3]]


In [8]:
# Q6. Precision vs Recall
# Precision is the fraction of predicted positives that are actually positive.
# It is calculated as: Precision = TP / (TP + FP)
# Recall is the fraction of actual positives that the model correctly identifies.
# It is calculated as: Recall = TP / (TP + FN)
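
# A short worked example of both formulas, reusing the label lists from Q5
# (redefined here so the snippet runs on its own). Label 1 is treated as the
# positive class.

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

# Count the outcomes for the positive class (label 1) by hand.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)  # 1.0 0.75

# The library functions should agree with the manual calculation.
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```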

In [9]:
# Q7. Interpreting Confusion Matrix Errors
# False Positives (FP): Cases that were predicted as positive but are actually negative.
# These are also known as Type I errors.
# False Negatives (FN): Cases that were predicted as negative but are actually positive.
# These are also known as Type II errors.
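
# A small sketch of reading the four counts directly from the matrix, reusing
# the Q5 labels (redefined here so the snippet is self-contained).

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

# For binary labels 0/1, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 0 1 3
```

# Here there are no Type I errors (FP = 0) and one Type II error (FN = 1).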

In [10]:
# Q8. Metrics Derived from Confusion Matrix
# 1. Accuracy: Measures the fraction of all predictions that are correct.
# Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
# 2. Precision: Measures the accuracy of positive predictions.
# Formula: Precision = TP / (TP + FP)
# 3. Recall: Measures the ability to capture all positive cases.
# Formula: Recall = TP / (TP + FN)
# 4. F1-Score: Harmonic mean of precision and recall.
# Formula: F1-Score = 2 * (Precision * Recall) / (Precision + Recall)

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       1.00      0.75      0.86         4

    accuracy                           0.83         6
   macro avg       0.83      0.88      0.83         6
weighted avg       0.89      0.83      0.84         6



In [11]:
# Q9. Relationship Between Accuracy and Confusion Matrix
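# Accuracy is the sum of the matrix diagonal (TP + TN) divided by the sum of all cells.
# However, it can be misleading for imbalanced datasets: a model that always predicts
# the majority class scores high accuracy while missing every minority case.
# Metrics like Precision, Recall, and F1-Score are better indicators in such cases.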
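
# A hedged illustration with made-up labels: 95 negatives, 5 positives, and a
# degenerate model that always predicts the majority class. The numbers are
# hypothetical, not taken from any real dataset.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical imbalanced labels: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)
# zero_division=0 silences the warning raised when no positives are predicted.
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)
print(accuracy, recall, f1)  # 0.95 0.0 0.0
```

# Accuracy looks strong, yet recall and F1 for the positive class are zero.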

In [12]:
# Q10. Identifying Bias or Limitations
# The row totals of the confusion matrix show how many samples each class has, exposing class imbalance.
# For example, a high False Negative count may indicate poor performance on a minority class.
# Per-class metrics, like precision and recall, can help identify which classes the model is biased against.
# It is also essential to inspect the misclassified samples to determine where the model needs improvement.
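
# One possible way to surface per-class weaknesses, sketched on the Q5 labels
# (redefined here so the snippet runs on its own): normalize each row of the
# matrix so the diagonal reads as per-class recall.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

# normalize="true" divides each row by its class count,
# so the diagonal holds the recall of each class.
cm_normalized = confusion_matrix(y_true, y_pred, normalize="true")
per_class_recall = cm_normalized.diagonal()
print(per_class_recall.tolist())  # [1.0, 0.75]
```

# A class whose diagonal value is much lower than the others is one the model
# tends to miss.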