### Q1. Purpose of Grid Search CV in Machine Learning
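**Grid Search Cross-Validation (CV)** is used to find the best-performing hyperparameters for a machine learning model from a predefined set of candidates. It exhaustively evaluates every combination in a user-specified grid of hyperparameter values. For each combination, the model is trained and scored with k-fold cross-validation, and the combination with the best average validation score on a chosen metric (e.g., accuracy or F1) is selected. The final model is then typically retrained on the full training set with those hyperparameters.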
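
The procedure can be sketched without any ML library. This is a minimal illustration under several assumptions: a hypothetical toy 1-D k-nearest-neighbours classifier (`knn_predict`), accuracy as the metric, contiguous (unshuffled) folds, and made-up data. In practice, scikit-learn's `GridSearchCV` (in `sklearn.model_selection`) runs the same exhaustive loop with many more options.

```python
from itertools import product
from statistics import mean


def knn_predict(train_X, train_y, x, n_neighbors, weights):
    """Hypothetical toy model: 1-D k-nearest-neighbours with a weighted vote."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = {0: 0.0, 1: 0.0}
    for xi, yi in nearest:
        # "distance" weighting gives closer neighbours a larger vote
        votes[yi] += 1.0 if weights == "uniform" else 1.0 / (abs(xi - x) + 1e-9)
    return max(votes, key=votes.get)


def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        yield [i for i in range(n) if not lo <= i < hi], list(range(lo, hi))


def grid_search_cv(X, y, param_grid, k=3):
    """Score every combination in param_grid by mean k-fold accuracy; return the best."""
    names = list(param_grid)
    results = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, val in k_fold_splits(len(X), k):
            tX, ty = [X[i] for i in train], [y[i] for i in train]
            correct = sum(knn_predict(tX, ty, X[i], **params) == y[i] for i in val)
            fold_scores.append(correct / len(val))
        results[values] = mean(fold_scores)  # average validation score across folds
    best = max(results, key=results.get)  # first combination with the highest mean score
    return dict(zip(names, best)), results


X = [0.1, 2.1, 0.4, 1.9, 0.3, 2.5, 0.8, 1.6, 0.2, 2.2, 1.1, 1.4]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
param_grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
best_params, results = grid_search_cv(X, y, param_grid)
print(best_params, results)
```

With 3 values for `n_neighbors` and 2 for `weights`, the grid has 3 × 2 = 6 combinations, each trained and scored once per fold.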

### Q2. Difference Between Grid Search CV and Randomized Search CV
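- **Grid Search CV**: Evaluates every combination of the specified hyperparameter values. The number of combinations is the product of the number of values per hyperparameter, so the cost grows multiplicatively as hyperparameters or values are added.
- **Randomized Search CV**: Evaluates a fixed number of combinations (`n_iter` in scikit-learn), sampled at random from specified value lists or distributions. Its cost is set by that number rather than by the size of the search space, which makes it more efficient for large spaces, at the risk of missing the single best combination.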

**When to Choose**:
- Use **Grid Search CV** when the grid is small (few hyperparameters, each with few candidate values) and the compute budget allows an exhaustive search.
- Use **Randomized Search CV** when the search space is large or includes continuous ranges, or when you need good results quickly without an exhaustive search.
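
The cost difference can be made concrete by counting candidates. This sketch assumes a hypothetical three-parameter search space and 5-fold CV; for simplicity it samples distinct combinations from the same discrete grid, whereas real randomized search (e.g., scikit-learn's `RandomizedSearchCV`) can also draw values from continuous distributions.

```python
import random
from itertools import product

# Hypothetical search space: three hyperparameters with discrete candidate values
param_space = {
    "max_depth": [2, 3, 4, 5, 6, 7, 8, 9, 10],
    "min_samples_leaf": [1, 2, 5, 10, 20],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
}


def grid_candidates(space):
    """Grid search: every combination of the listed values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


def random_candidates(space, n_iter, seed=0):
    """Randomized search: n_iter distinct combinations drawn uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(grid_candidates(space), n_iter)


n_folds = 5
grid = grid_candidates(param_space)
sampled = random_candidates(param_space, n_iter=20)
print("grid search fits:", len(grid) * n_folds)           # 180 combinations x 5 folds = 900
print("randomized search fits:", len(sampled) * n_folds)  # 20 combinations x 5 folds = 100
```

Adding one more hyperparameter with 10 values would multiply the grid's cost by 10, while the randomized search stays at `n_iter` × folds fits.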

### Q3. What is Data Leakage, and Why is it a Problem?
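**Data leakage** occurs when a model is trained using information that would not be available at prediction time, such as data from the test set or information derived from the target. It is a problem because it makes evaluation scores (validation or test performance) over-optimistic, while the model generalizes poorly to genuinely new data.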

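Examples: including a feature that encodes the target or is recorded only after the outcome is known ("future" information), or computing preprocessing statistics, such as the mean used for scaling, on the full dataset before splitting it into training and test sets.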
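
The preprocessing leak can be seen directly with made-up numbers. This sketch assumes a single hypothetical feature whose last two rows form the test set, and compares standardization statistics computed on all rows with those computed on the training rows only.

```python
from statistics import mean, pstdev

# Hypothetical 1-D feature; the last two rows are the held-out test set
values = [10.0, 12.0, 11.0, 13.0, 12.0, 50.0, 55.0]
train, test = values[:5], values[5:]

# Leaky: standardization statistics computed on ALL rows, test rows included
leaky_mean, leaky_std = mean(values), pstdev(values)
# Leak-free: statistics computed on the training rows only
train_mean, train_std = mean(train), pstdev(train)

leaky_train = [(v - leaky_mean) / leaky_std for v in train]
clean_train = [(v - train_mean) / train_std for v in train]

print(round(leaky_mean, 2), round(train_mean, 2))  # 23.29 11.6
```

Every leaky training value ends up negative because the test rows pulled the mean upward, so the training features now carry information about the test set.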

### Q4. How to Prevent Data Leakage
To prevent data leakage:
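- **Split the data first**: Create the training, validation, and test sets before any preprocessing, and ensure that no test-set rows or statistics reach the training process.
- **Fit transformations on the training data only**: Learn preprocessing parameters (e.g., scaling statistics, encoding categories) from the training set, then apply the fitted transformation to the validation and test sets. Within cross-validation, refit them on each training fold.
- **Choose features carefully**: Exclude features that would not be available at prediction time in a real-world scenario.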
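
Fitting preprocessing inside each cross-validation fold can be sketched as follows. The `Scaler` and `NearestCentroid` classes and the data are hypothetical stand-ins chosen to keep the example self-contained; in scikit-learn this is usually done by wrapping the scaler and model in a `Pipeline` and passing it to the cross-validation routine.

```python
from statistics import mean, pstdev


class Scaler:
    """Standardizes a 1-D feature using statistics learned in fit()."""

    def fit(self, xs):
        self.mu = mean(xs)
        self.sigma = pstdev(xs) or 1.0  # guard against a zero-variance feature
        return self

    def transform(self, xs):
        return [(x - self.mu) / self.sigma for x in xs]


class NearestCentroid:
    """Hypothetical toy classifier: predicts the class whose mean (centroid) is closest."""

    def fit(self, xs, ys):
        self.centroids = {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}
        return self

    def predict(self, xs):
        return [min(self.centroids, key=lambda c: abs(x - self.centroids[c])) for x in xs]


def leak_free_cv(X, y, k):
    """k-fold CV in which the scaler is re-fit on each training fold only."""
    n = len(X)
    bounds = [round(i * n / k) for i in range(k + 1)]
    scores = []
    for lo, hi in zip(bounds, bounds[1:]):
        train_idx = [i for i in range(n) if not lo <= i < hi]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_val, y_val = X[lo:hi], y[lo:hi]
        scaler = Scaler().fit(X_tr)  # statistics come from the training fold only
        model = NearestCentroid().fit(scaler.transform(X_tr), y_tr)
        preds = model.predict(scaler.transform(X_val))  # validation fold is only transformed
        scores.append(sum(p == t for p, t in zip(preds, y_val)) / len(y_val))
    return scores


X = [1.0, 5.0, 1.2, 5.5, 0.8, 4.9, 1.1, 5.2, 0.9]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0]
print(leak_free_cv(X, y, k=3))  # [1.0, 1.0, 1.0]
```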

### Q5. What is a Confusion Matrix?
A **Confusion Matrix** is a table that summarizes the performance of a classification model by displaying the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). It helps assess how well the model differentiates between classes.
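
The four counts can be tallied directly from true and predicted labels. This is a minimal sketch for the binary case with made-up labels, assuming `1` is the positive class; the 2×2 layout (rows are actual classes, columns are predicted classes, negative first) matches what scikit-learn's `confusion_matrix` returns for labels `0` and `1`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, TN, FP, FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
c = confusion_counts(y_true, y_pred)
# Rows = actual class, columns = predicted class (negative first, then positive)
matrix = [[c["TN"], c["FP"]],
          [c["FN"], c["TP"]]]
print(matrix)  # [[2, 2], [1, 3]]
```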

### Q6. Difference Between Precision and Recall
- **Precision**: The proportion of true positives among the predicted positives. It reflects how many predicted positive results were correct.
  \[
  \text{Precision} = \frac{TP}{TP + FP}
  \]
- **Recall** (Sensitivity): The proportion of true positives among the actual positives. It reflects how many actual positives were correctly identified.
  \[
  \text{Recall} = \frac{TP}{TP + FN}
  \]
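
The two formulas can be checked numerically. This sketch assumes hypothetical counts (TP = 8, FP = 2, FN = 4) and returns 0.0 when a denominator is zero, which is a common convention rather than part of the definitions.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 when nothing is predicted positive


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if tp + fn else 0.0  # convention: 0.0 when there are no actual positives


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
print(precision(8, 2))         # 0.8
print(round(recall(8, 4), 3))  # 0.667
```

Here the model is right 80% of the time when it predicts positive, but it finds only two thirds of the actual positives.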

### Q7. Interpreting a Confusion Matrix for Error Types
From a confusion matrix, you can:
- Identify **false positives (Type I errors)**, where the model incorrectly classifies a negative instance as positive.
- Identify **false negatives (Type II errors)**, where the model incorrectly classifies a positive instance as negative.

Comparing these counts shows which type of error the model is more prone to, which matters when the two errors have different costs (e.g., a missed diagnosis versus a false alarm).
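
Beyond the counts, it is often useful to locate the misclassified instances for inspection. This sketch uses made-up binary labels and assumes `1` is the positive class.

```python
def error_indices(y_true, y_pred, positive=1):
    """Return the indices of false positives (Type I) and false negatives (Type II)."""
    pairs = list(enumerate(zip(y_true, y_pred)))
    false_pos = [i for i, (t, p) in pairs if p == positive and t != positive]
    false_neg = [i for i, (t, p) in pairs if p != positive and t == positive]
    return false_pos, false_neg


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
fp_idx, fn_idx = error_indices(y_true, y_pred)
print("Type I (FP):", fp_idx, "Type II (FN):", fn_idx)  # Type I (FP): [4, 5] Type II (FN): [2]
```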

### Q8. Common Metrics Derived from a Confusion Matrix
- **Accuracy**: The proportion of correctly classified instances out of all instances.
  \[
  \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
  \]
- **F1 Score**: The harmonic mean of precision and recall, balancing the two.
  \[
  F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
  \]
- **Specificity**: The proportion of true negatives among all actual negatives (the true negative rate).
  \[
  \text{Specificity} = \frac{TN}{TN + FP}
  \]
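
The three formulas can be evaluated together from a single set of counts. This sketch assumes hypothetical counts for 100 instances and nonzero denominators.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, F1 and specificity computed from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    specificity = tn / (tn + fp)
    return {"accuracy": accuracy, "f1": f1, "specificity": specificity}


# Hypothetical counts for 100 instances
m = metrics_from_counts(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.85, 'f1': 0.842, 'specificity': 0.9}
```

The F1 value also equals 2TP / (2TP + FP + FN) = 80 / 95, an equivalent form obtained by substituting the precision and recall formulas.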

### Q9. Relationship Between Accuracy and Confusion Matrix Values
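Accuracy is the number of correct predictions (true positives plus true negatives) divided by the total number of instances. Because it pools both classes, it does not distinguish between false positives and false negatives. This makes it misleading on imbalanced datasets: a model that always predicts the majority class can achieve high accuracy while never detecting the minority class, so precision and recall provide more insight there.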
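
The effect of class imbalance can be made concrete with made-up labels. This sketch assumes a test set with 5 positives and 95 negatives and a trivial model that always predicts the negative class.

```python
# Hypothetical imbalanced test set: 5 positives, 95 negatives
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # trivial model that always predicts the majority (negative) class

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.95 0.0
```

The 95% accuracy comes entirely from the true negatives; recall exposes that not a single positive was found.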

### Q10. Using a Confusion Matrix to Identify Biases or Limitations
A confusion matrix can reveal potential biases by:
- **Many false positives**: The model may be overly inclined to predict the positive class.
- **Many false negatives**: The model may be overly conservative, favoring the negative class.

By examining the confusion matrix, you can identify whether the model favors certain classes and then mitigate the bias, for example by adjusting the decision threshold or class weights, or by rebalancing the dataset.
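
For a model that outputs scores, one way to adjust it is to move the decision threshold and watch how the error counts shift. This sketch assumes hypothetical scores (probability of the positive class) and labels, and predicts positive when the score is at least the threshold.

```python
def fp_fn_at_threshold(scores, y_true, threshold):
    """Count false positives and false negatives when predicting positive for score >= threshold."""
    fp = sum(1 for s, t in zip(scores, y_true) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, y_true) if s < threshold and t == 1)
    return fp, fn


# Hypothetical model scores and true labels
scores = [0.95, 0.80, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10, 0.05]
y_true = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

for threshold in (0.7, 0.5, 0.3):
    print(threshold, fp_fn_at_threshold(scores, y_true, threshold))
# 0.7 (0, 3)
# 0.5 (1, 2)
# 0.3 (3, 1)
```

Lowering the threshold trades false negatives for false positives, so a model that is too conservative (many false negatives) can be shifted toward the positive class without retraining.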