### 1. What is the difference between precision and recall?

#### **Precision**:
- Precision measures how many of the predicted **positive** cases are actually positive.
- **Formula**:  
  \[
  \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
  \]
- Precision is high when there are few **false positives**; it does not account for false negatives.
- **Example**:  
  If a spam filter identifies 100 emails as spam, and 90 are truly spam, the precision is \(90/100 = 0.9\) or 90%.

#### **Recall**:
- Recall measures how many of the actual **positive** cases were correctly identified by the model.
- **Formula**:  
  \[
  \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}
  \]
- Recall is high when there are few **false negatives**; it does not account for false positives.
- **Example**:  
  If there are 100 spam emails, and the model catches 90 of them, the recall is \(90/100 = 0.9\) or 90%.
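
The two formulas can be checked with a minimal sketch. The helper name `precision_recall` and the combined counts (TP = 90, FP = 10, FN = 10) are assumptions chosen to match the two spam examples above, not part of any library.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    # Fall back to 0.0 when a denominator is zero
    # (no predicted positives, or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Assumed counts: 100 emails flagged as spam, 90 of them correctly (FP = 10);
# 100 actual spam emails, 90 of them caught (FN = 10).
precision, recall = precision_recall(tp=90, fp=10, fn=10)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints "precision=0.90, recall=0.90"
```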

---

#### **Key Difference**:
| Aspect       | Precision                              | Recall                                 |
|--------------|----------------------------------------|----------------------------------------|
| **Focus**    | Correctness of **positive predictions** | Completeness of identifying positives |
| **Goal**     | Reduce **false positives**             | Reduce **false negatives**            |
| **Use Case** | When false positives are costly (e.g., spam filtering, where a legitimate email sent to the spam folder may never be read). | When false negatives are costly (e.g., disease screening, where a missed case goes untreated). |
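
In practice, the choice between the two is often made by moving the decision threshold of a classifier that outputs scores. The sketch below illustrates this under stated assumptions: the scores and labels are made up, and `precision_recall_at` is a hypothetical helper, not a library function.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up model scores and true labels (1 = spam, 0 = not spam).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

for threshold in (0.25, 0.50, 0.85):
    p, r = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.85 moves precision from about 0.71 to 1.00 while recall drops from 1.00 to 0.40, which is the trade-off the table describes.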

---

### 2. What is cross-validation, and why is it important in binary classification?

#### **Cross-Validation**:
- Cross-validation is a statistical technique for evaluating a machine learning model by repeatedly splitting the data into **training** and **testing** subsets, so that every evaluation is made on data the model was not trained on.

#### **Common Types**:
1. **K-Fold Cross-Validation**:  
   Divides the dataset into \( k \) subsets (folds). The model is trained on \( k-1 \) folds and tested on the remaining fold. This is repeated \( k \) times so that each fold serves as the test set exactly once.
   
2. **Stratified K-Fold Cross-Validation**:  
   Ensures that each fold has a proportional representation of each class (important for imbalanced datasets).
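
Both splitting schemes can be sketched in plain Python. This is a simplified illustration, assuming no shuffling and a round-robin assignment of samples to folds; the function names are hypothetical. Libraries such as scikit-learn provide `KFold` and `StratifiedKFold` for real use.

```python
from collections import defaultdict


def k_fold_indices(n_samples: int, k: int):
    """Yield (train_idx, test_idx) for each of k folds over range(n_samples)."""
    # Assign sample i to fold i % k (round-robin, no shuffling).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def stratified_k_fold_indices(labels, k: int):
    """Like k_fold_indices, but deals each class's samples evenly across folds."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Imbalanced toy labels: 2 positives out of 10 samples (20% positive).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for train_idx, test_idx in stratified_k_fold_indices(labels, k=2):
    positives = sum(labels[i] for i in test_idx)
    print(f"test={test_idx}  positives in test fold={positives}")
```

With these labels, each test fold of five samples contains exactly one of the two positives, so the 20% class ratio is preserved in both folds.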

---

#### **Why is it important in binary classification?**
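1. **Detects Overfitting**:  
   Cross-validation does not prevent overfitting by itself, but it exposes it: because the model is always scored on data it was not trained on, and across several splits, a model that only memorizes its training data shows poor test scores. The estimate also does not depend on one lucky or unlucky train-test split.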

2. **Reliable Performance Metrics**:  
   By averaging metrics across folds (e.g., accuracy, precision, recall, F1-score), you get a robust estimate of how the model performs on unseen data.

3. **Handles Imbalanced Datasets**:  
   Stratified cross-validation keeps the class proportions roughly the same in every fold, so each test fold contains minority-class examples and metrics such as precision and recall can be computed meaningfully.

4. **Utilizes Data Efficiently**:  
   Every sample is used for testing exactly once and for training in the other \( k-1 \) rounds, so no data is permanently set aside. This matters most when data is limited.
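
The evaluation loop behind these points can be sketched as follows. Everything here is illustrative: the data is made up, and the "model" is a toy threshold classifier (the midpoint between the two class means) standing in for a real learner.

```python
from statistics import mean


def fit_threshold(xs, ys):
    """Toy 'model': the midpoint between the mean of each class."""
    mean_neg = mean([x for x, y in zip(xs, ys) if y == 0])
    mean_pos = mean([x for x, y in zip(xs, ys) if y == 1])
    return (mean_neg + mean_pos) / 2


def cross_val_accuracy(xs, ys, k):
    """Return the test accuracy of each of k folds (round-robin split)."""
    n = len(xs)
    scores = []
    for i in range(k):
        test_idx = set(range(i, n, k))
        train_x = [xs[j] for j in range(n) if j not in test_idx]
        train_y = [ys[j] for j in range(n) if j not in test_idx]
        threshold = fit_threshold(train_x, train_y)  # train on k-1 folds
        correct = sum(1 for j in test_idx if int(xs[j] >= threshold) == ys[j])
        scores.append(correct / len(test_idx))  # score on the held-out fold
    return scores


# Made-up 1-D feature values and labels; two samples are deliberately noisy.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.6, 0.7, 0.4]
ys = [0, 1, 0, 1, 1, 0, 1, 0]

scores = cross_val_accuracy(xs, ys, k=4)
print("per-fold accuracy:", scores)   # [0.5, 0.5, 1.0, 1.0]
print("mean accuracy:", mean(scores))  # 0.75
```

The per-fold scores range from 0.5 to 1.0, which shows why a single train-test split can be misleading and why the average across folds is the number usually reported.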

---

#### **Example**:
In a binary classification problem (e.g., spam detection), cross-validation tests the model on several different subsets of the data. This makes the results less dependent on any single train-test split, so the performance metrics better reflect the model's ability to generalize to unseen data. This is especially valuable when the dataset is imbalanced or small.
