### Q1

- **Grid Search CV**:
  - Exhaustively searches over all combinations of specified hyperparameters.
  - Evaluates each combination using cross-validation.
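  - Selects the combination with the best mean cross-validation score among those in the grid.
  - Computationally expensive, because the number of combinations multiplies with each added hyperparameter; best suited to small hyperparameter spaces.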

- **Randomized Search CV**:
  - Randomly samples a fixed number of hyperparameter combinations.
  - More efficient for larger hyperparameter spaces.
  - Faster, but it may miss the best combination because most of the space is never evaluated.
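
The two strategies can be sketched with the standard library alone. This is a minimal illustration, not a full implementation: `param_grid` and its values are hypothetical, and `cv_score` is a stand-in for the mean cross-validation score that a real search would compute by training a model on each fold. scikit-learn packages the same ideas as `GridSearchCV` and `RandomizedSearchCV`.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
param_grid = {
    "max_depth": [2, 4, 8],
    "learning_rate": [0.01, 0.1, 0.3, 1.0],
}


def cv_score(params):
    """Stand-in for a mean cross-validation score (higher is better).

    A real search would fit and validate a model on each fold here.
    """
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)


def grid_search(grid):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]
    return max(candidates, key=cv_score), len(candidates)


def randomized_search(grid, n_iter, seed=0):
    """Score only n_iter randomly drawn combinations (duplicates possible)."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=cv_score), len(candidates)


best_grid, n_grid = grid_search(param_grid)
best_rand, n_rand = randomized_search(param_grid, n_iter=5)

print("grid search:      ", best_grid, "after", n_grid, "evaluations")
print("randomized search:", best_rand, "after", n_rand, "evaluations")
```

The grid search always evaluates all 12 combinations, while the randomized search evaluates only 5 and can never score higher than the grid search on the same space.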

### Q2

- **Grid Search CV**:
  - Evaluates all possible combinations of hyperparameters.
  - Guarantees finding the best combination within the specified grid (as measured by cross-validation score), not the global optimum; computationally intensive.

- **Randomized Search CV**:
  - Samples random combinations of hyperparameters.
  - Quicker and less resource-intensive.
  - Ideal for large hyperparameter spaces or limited computational resources.
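
The cost difference can be made concrete by counting model fits. The sketch below assumes k-fold cross-validation and ignores any final refit on the full training set; the grid shown (`C`, `gamma`, `kernel`) is a hypothetical example, not taken from a specific model.

```python
import math


def n_grid_fits(grid, n_folds):
    """Model fits needed by an exhaustive grid search with k-fold CV."""
    n_combinations = math.prod(len(values) for values in grid.values())
    return n_combinations * n_folds


def n_random_fits(n_iter, n_folds):
    """Model fits needed by a randomized search with a fixed budget."""
    return n_iter * n_folds


# Hypothetical grid: 4 * 3 * 2 = 24 combinations.
grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1],
    "kernel": ["rbf", "poly"],
}

grid_fits = n_grid_fits(grid, n_folds=5)
random_fits = n_random_fits(n_iter=10, n_folds=5)

print(grid_fits)    # 24 combinations * 5 folds
print(random_fits)  # 10 sampled combinations * 5 folds
```

Adding one more hyperparameter with 5 candidate values would multiply the grid search cost by 5, whereas the randomized search cost stays fixed by `n_iter`.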

### Q3

- **Data Leakage**:
  - Occurs when the model is trained with information that will not be available at prediction time, such as a proxy for the target or statistics computed from the test set.
  - Leads to overly optimistic performance estimates.
  - Results in poor generalization to new data.

- **Example**:
  - Including future sales data to predict current customer behavior introduces leakage.

### Q4

- **Prevention Strategies**:
  - Use pipelines so that each transformation is fitted on the training data only and then applied unchanged to the test data.
  - Select only features that will be available at prediction time.
  - Perform the train-test split before fitting any preprocessing step (scaling, imputation, feature selection).
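
The "split first, fit on train only" rule can be sketched with a hand-written standardizer. This is an illustration using synthetic data and hypothetical helper names (`fit_scaler`, `transform`); in scikit-learn, wrapping the scaler and model in a `Pipeline` applies the same pattern automatically inside cross-validation.

```python
import random
import statistics


def fit_scaler(values):
    """Learn standardization statistics from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Apply previously learned statistics without re-estimating them."""
    return [(v - mean) / std for v in values]


# Synthetic feature values, for illustration only.
rng = random.Random(0)
data = [rng.gauss(50, 10) for _ in range(100)]

# 1. Split before any preprocessing is fitted.
train, test = data[:80], data[80:]

# 2. Fit the scaler on the training portion only.
mean, std = fit_scaler(train)

# 3. Apply the same training statistics to both portions.
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)

# Leaky variant, shown for contrast: statistics computed on all rows
# let information about the test set influence the training features.
leaky_mean, leaky_std = fit_scaler(data)

print("train-only mean:", mean)
print("leaky mean:     ", leaky_mean)
```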

### Q5

- **Confusion Matrix**:
  - Summarizes classification performance by tabulating predicted labels against actual labels.
  - Components: True Positives (TP), True Negatives (TN), False Positives (FP), False Negatives (FN).
  - Shows which kinds of errors the model makes, not just how many.
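
For a binary problem, the four components can be counted directly from paired labels. The following is a minimal sketch with made-up labels; the helper name `confusion_counts` and the choice of `1` as the positive class are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```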

### Q6

- **Precision**:
  - Ratio of true positive predictions to total predicted positives.
  - Focuses on the accuracy of positive predictions.
  - Formula: \( \text{Precision} = \frac{TP}{TP + FP} \).

- **Recall**:
  - Ratio of true positive predictions to the total number of actual positives.
  - Emphasizes capturing all positive instances.
  - Formula: \( \text{Recall} = \frac{TP}{TP + FN} \).
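
As a worked example with made-up counts (TP = 40, FP = 10, FN = 20), the two formulas give:

- \( \text{Precision} = \frac{40}{40 + 10} = 0.80 \)
- \( \text{Recall} = \frac{40}{40 + 20} \approx 0.67 \)

Here 80% of the positive predictions are correct, but the model finds only about two thirds of the actual positives.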

### Q7

- **Interpreting Errors**:
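  - **False Positives (FP)**: Negative cases incorrectly predicted as positive (false alarms); most costly where acting on a false alarm is expensive, such as a spam filter blocking legitimate email.
  - **False Negatives (FN)**: Positive cases the model missed; most costly where a miss is dangerous, such as medical diagnosis.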

### Q8

- **Metrics from Confusion Matrix**:
  - **Accuracy**: \(\frac{TP + TN}{TP + TN + FP + FN}\) - Overall model correctness.
  - **Precision**: \(\frac{TP}{TP + FP}\) - Accuracy of positive predictions.
  - **Recall**: \(\frac{TP}{TP + FN}\) - Ability to find all positive instances.
  - **F1-Score**: \(2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\) - Balance of precision and recall.
  - **Specificity**: \(\frac{TN}{TN + FP}\) - Ability to identify negative instances.
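
The formulas above translate directly into code. This sketch uses made-up counts; returning `0.0` when a denominator is zero is a convention chosen here for simplicity, not a universal rule.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from the four confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Assumption: report 0.0 when the metric is undefined (zero denominator).
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up counts for illustration.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```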

### Q9

- **Accuracy and Confusion Matrix**:
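  - Accuracy is the ratio of correct predictions (TP + TN) to total predictions, so it can be read directly from the matrix.
  - May be misleading on imbalanced datasets, where a model that always predicts the majority class still scores highly.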
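
The imbalance problem is easy to reproduce. In this sketch the data is made up: 5 positives, 95 negatives, and a trivial model that always predicts the negative class.

```python
# Made-up imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the majority (negative) class.
y_pred = [0] * 100

tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

Accuracy looks strong even though the model never identifies a single positive case, which is why recall and precision should be checked alongside it.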

### Q10

- **Identifying Biases**:
  - Analyze false positive and false negative rates.
  - High false positive rate: Potential bias towards predicting the positive class.
  - High false negative rate: Potential bias towards predicting the negative class.
  - Consider the problem context to prioritize which errors to address.
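
The two error rates can be computed from the confusion-matrix counts and compared. The counts below are made up, and the helper name `error_rates` is hypothetical; what counts as a meaningful gap between the rates depends on the problem.

```python
def error_rates(tp, tn, fp, fn):
    """Return (false positive rate, false negative rate)."""
    # FPR: share of actual negatives that were predicted positive.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of actual positives that were predicted negative.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Made-up counts for illustration.
fpr, fnr = error_rates(tp=45, tn=60, fp=40, fn=5)

print(f"FPR = {fpr:.2f}, FNR = {fnr:.2f}")
if fpr > fnr:
    print("Errors lean towards predicting the positive class.")
elif fnr > fpr:
    print("Errors lean towards predicting the negative class.")
```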
