**Q1. Purpose of Grid Search CV:**

Grid Search Cross-Validation (Grid Search CV) is a technique used for hyperparameter tuning in machine learning. The purpose is to systematically search through a predefined set of hyperparameter combinations, evaluate the model's performance for each combination using cross-validation, and determine the hyperparameters that yield the best performance.

**How it works:**
1. Define a grid of hyperparameter values to explore.
2. Train and evaluate the model for each combination using cross-validation.
3. Select the hyperparameter combination that performs the best based on a specified metric (e.g., accuracy, F1 score).
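
The three steps above can be sketched with scikit-learn's `GridSearchCV`. This is a minimal illustration, not a recommended configuration: the iris dataset, the `SVC` estimator, and the values in `param_grid` are all assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative dataset (an assumption, not from the original notes).
X, y = load_iris(return_X_y=True)

# Step 1: define the grid of candidate values (illustrative choices).
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Steps 2 and 3: run 5-fold cross-validation for each of the
# 3 x 2 = 6 combinations and rank them by mean accuracy.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)           # best combination found
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
```

With 6 combinations and 5 folds, this fits 30 models, plus one final refit on the full data with the best combination.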

**Q2. Difference between Grid Search CV and Randomized Search CV:**

- **Grid Search CV:** Evaluates every hyperparameter combination in the specified grid. The number of model fits is the product of the number of values per hyperparameter, multiplied by the number of cross-validation folds, so it becomes computationally expensive quickly as the grid grows.

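- **Randomized Search CV:** Randomly samples a specified number of hyperparameter combinations from the parameter space, which can be given as lists of values or as probability distributions. The cost is set by the number of samples rather than the size of the space, so it is more efficient for large search spaces and when computational resources are limited.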

Choose between them based on the computational budget, the size of the hyperparameter space, and the desired level of exploration.
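
As a rough sketch of the randomized alternative, the example below draws 10 combinations from continuous distributions instead of enumerating a grid. The dataset, the estimator, the log-uniform ranges, and the budget `n_iter=10` are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Illustrative dataset (an assumption, not from the original notes).
X, y = load_iris(return_X_y=True)

# Distributions to sample from, rather than fixed lists of values.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}

# n_iter is the search budget: only 10 combinations are tried,
# regardless of how large the parameter space is.
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=0,  # fixed seed so the sampled combinations are repeatable
)
search.fit(X, y)

print(search.best_params_)
```

Raising `n_iter` trades more computation for broader exploration, which is the budget decision described above.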

**Q3. Data Leakage:**

Data leakage occurs when information that would not be available at prediction time, such as data from the test set or from the future, is used to train or tune a machine learning model. It leads to overly optimistic performance estimates during development but poor generalization to new, unseen data.

**Example:**
If information from the test set reaches the training process, the model may learn patterns that do not generalize. For instance, in time-series data, building features from target variable values that occur after the time point of prediction is a form of data leakage. Another common case is fitting a scaler or imputer on the full dataset before splitting it, which lets test-set statistics influence training.
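
For the time-series case, one way to keep future observations out of training is an ordered split. The sketch below uses scikit-learn's `TimeSeriesSplit` on a toy array; the 12 observations and `n_splits=3` are assumptions, and the data is assumed to be sorted by time already.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve toy observations, assumed to be already sorted by time.
X = np.arange(12).reshape(-1, 1)

splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, test_idx in splits:
    # Every training index precedes every validation index,
    # so no future observation is used to fit the model.
    print("train up to", train_idx.max(), "| validate from", test_idx.min())
```

A shuffled K-fold split would not give this guarantee, because later observations could land in the training portion of a fold.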

**Q4. Preventing Data Leakage:**

- **Separation of Training and Test Sets:** Ensure that the training set and test set are independent, with no overlap in time or information.
  
- **Feature Engineering:** Be cautious when creating features to avoid inadvertently including information that the model wouldn't have at prediction time.

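- **Cross-Validation:** Use cross-validation on the training data to evaluate and select models, and fit every preprocessing step (scaling, imputation, feature selection) inside each fold on that fold's training portion only. Keep the test set out of the process until the final evaluation.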
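
One common way to apply these precautions together is to hold out a test set first and wrap preprocessing and model in a single scikit-learn pipeline. This is a sketch under assumed choices: the breast cancer dataset, `StandardScaler`, and `LogisticRegression` stand in for whatever data and model are actually used.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (an assumption, not from the original notes).
X, y = load_breast_cancer(return_X_y=True)

# Hold out the test set first; it is not touched until the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler is part of the pipeline, so within each fold its mean and
# variance are learned from that fold's training portion only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(cv_scores.mean())

# Final step: fit on all training data, then score once on the held-out set.
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)
print(test_score)
```

Scaling `X` before the split, by contrast, would let test-set statistics shape the training features.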

**Q5. Confusion Matrix:**

A confusion matrix is a table that describes the performance of a classification model by comparing the predicted class labels with the actual class labels. For binary classification it is a 2×2 table that counts true positives, true negatives, false positives, and false negatives; for a problem with more classes it has one row and one column per class.
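
As a small illustration, the binary case can be computed with scikit-learn's `confusion_matrix`. The label lists below are made-up values for demonstration. In scikit-learn's layout, rows are actual classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 1]
#  [1 4]]

# For binary labels, ravel() yields the four cells in this order.
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 4 1 1 4
```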

**Q6. Precision and Recall:**

- **Precision:** The ratio of true positives to the total predicted positives. It measures the accuracy of positive predictions.
\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}} \]

- **Recall (Sensitivity):** The ratio of true positives to the total actual positives. It measures the ability of the model to capture all the relevant instances.
\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}} \]
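
The two formulas translate directly into code. In this sketch the counts are hypothetical, and returning `0.0` when a denominator is zero is an assumed convention, not part of the definitions.

```python
def precision(tp: int, fp: int) -> float:
    """True positives divided by all predicted positives."""
    predicted_positives = tp + fp
    # Assumed convention: report 0.0 when nothing was predicted positive.
    return tp / predicted_positives if predicted_positives else 0.0


def recall(tp: int, fn: int) -> float:
    """True positives divided by all actual positives."""
    actual_positives = tp + fn
    # Assumed convention: report 0.0 when there are no actual positives.
    return tp / actual_positives if actual_positives else 0.0


# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
print(precision(tp=40, fp=10))         # 0.8
print(round(recall(tp=40, fn=20), 3))  # 0.667
```

Here the model is right about 80% of its positive predictions but finds only about two thirds of the actual positives.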

**Q7. Interpreting Confusion Matrix for Errors:**

- **False Positive (Type I Error):** Model predicts positive when it's actually negative.
  
- **False Negative (Type II Error):** Model predicts negative when it's actually positive.

**Q8. Common Metrics from Confusion Matrix:**

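- **Accuracy:** The fraction of all predictions that are correct.
\[ \text{Accuracy} = \frac{\text{True Positives + True Negatives}}{\text{Total Population}} \]

- **F1 Score:** The harmonic mean of precision and recall, which is high only when both are high.
\[ \text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision + Recall}} \]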
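
A short worked example of both metrics, using hypothetical counts (40 true positives, 30 true negatives, 10 false positives, 20 false negatives). The function names are illustrative, and the zero-denominator fallback is an assumed convention.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions divided by the total population."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    denominator = precision + recall
    # Assumed convention: report 0.0 when both inputs are zero.
    return 2 * precision * recall / denominator if denominator else 0.0


# Hypothetical counts.
tp, tn, fp, fn = 40, 30, 10, 20

acc = accuracy(tp, tn, fp, fn)                # 70 / 100
f1 = f1_score(tp / (tp + fp), tp / (tp + fn)) # precision 0.8, recall 2/3

print(acc)           # 0.7
print(round(f1, 3))  # 0.727
```

The F1 value can be cross-checked with the equivalent count form 2·TP / (2·TP + FP + FN) = 80 / 110.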

**Q9. Accuracy and Confusion Matrix:**

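Accuracy is the overall correctness of the model and is calculated directly from the confusion matrix: the correctly predicted instances (true positives and true negatives, the diagonal cells) divided by the total number of instances (the sum of all cells). Because every instance counts equally, accuracy can look high on an imbalanced dataset even when most minority-class instances are misclassified.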

\[ \text{Accuracy} = \frac{\text{True Positives + True Negatives}}{\text{Total Population}} \]

**Q10. Using Confusion Matrix to Identify Biases or Limitations:**

Examine the confusion matrix for each class, especially in imbalanced datasets. If there are significant discrepancies in precision and recall across classes, it may indicate biases or limitations. Additionally, investigate the impact of false positives and false negatives on different classes to understand potential sources of bias in predictions.
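
To see how such a discrepancy shows up in practice, the sketch below builds a made-up imbalanced example (90 negatives, 10 positives) and compares overall accuracy with per-class precision and recall using scikit-learn. The labels and the predictions of the hypothetical model are fabricated purely for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up imbalanced labels: 90 negatives followed by 10 positives.
y_true = [0] * 90 + [1] * 10
# A hypothetical model: 1 false alarm among the negatives,
# and only 2 of the 10 positives detected.
y_pred = [0] * 89 + [1] * 1 + [1] * 2 + [0] * 8

cm = confusion_matrix(y_true, y_pred)
accuracy = cm.trace() / cm.sum()  # diagonal cells over all cells

# Arrays with one entry per class, in label order (class 0, class 1).
precision, recall, _, support = precision_recall_fscore_support(y_true, y_pred)

print(cm)
print("accuracy:", accuracy)
for label, p, r, n in zip((0, 1), precision, recall, support):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} support={n}")
```

Accuracy comes out at 0.91, yet recall for class 1 is only 0.2: the headline number hides that the model misses most of the minority class.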