### K-Nearest Neighbors (KNN) Algorithm

**Key Concepts of KNN**

- **Instance-Based Learning**: 
    - KNN is a type of instance-based learning where the algorithm does not explicitly learn a model.
    - Instead, it memorizes the training data and makes predictions by comparing new data points to the stored instances.
    
- **Distance Metric**: 
    - KNN relies on a distance metric to find the nearest neighbors. Common distance metrics include:
        - **Euclidean Distance**: \( \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \)
        - **Manhattan Distance**: \( \sum_{i=1}^{n} |x_i - y_i| \)
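        - **Minkowski Distance**: \( \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} \), a generalized form that reduces to Manhattan distance for \( p = 1 \) and Euclidean distance for \( p = 2 \).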
    
- **Parameter K**: 
    - The parameter \( k \) represents the number of nearest neighbors to consider when making a prediction. 
    - The choice of \( k \) can significantly affect the performance of the algorithm. 
    - A smaller \( k \) makes predictions sensitive to noise and outliers, while a larger \( k \) produces a smoother decision boundary at the risk of blurring genuine distinctions between classes.
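
The distance formulas above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation; the function names (`euclidean_distance`, `manhattan_distance`, `minkowski_distance`) and the sample points are assumptions chosen for this example.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski_distance(x, y, p=2):
    """Generalized distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of features")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Hypothetical sample points, used only for illustration.
a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean_distance(a, b))
print(manhattan_distance(a, b))
print(minkowski_distance(a, b, p=1), minkowski_distance(a, b, p=2))
```

Because all three metrics sum over raw coordinate differences, features on larger scales dominate the result, so it is common to standardize features before applying KNN.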


    
**How KNN Works**

- **Regression**:
    - **Training Phase**: The algorithm stores the training data.
    - **Prediction Phase**:
        - For a new data point, calculate the distance between this point and all the points in the training set.
        - Select the \( k \) training points that are closest to the new data point.
        - Predict the value as the average of the values of these \( k \) neighbors.

- **Classification**:
    - **Training Phase**: The algorithm stores the training data.
    - **Prediction Phase**:
        - For a new data point, calculate the distance between this point and all the points in the training set.
        - Select the \( k \) training points that are closest to the new data point.
        - Assign the most common class among these \( k \) neighbors to the new data point.
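
The prediction steps for both regression and classification can be sketched as follows. This is a brute-force illustration under stated assumptions: it uses Euclidean distance via `math.dist` (Python 3.8+), the helper names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are hypothetical, and ties in the vote go to whichever tied label appears first among the sorted neighbors.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k):
    """Return indices of the k training points closest to query."""
    distances = [(math.dist(x, query), i) for i, x in enumerate(train_X)]
    distances.sort()  # ascending by distance, then by index
    return [i for _, i in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k nearest neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Unweighted mean of the k nearest neighbors' target values."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / len(neighbors)


# "Training" is just storing the data (toy values, for illustration only).
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 1.2, 1.1, 5.0, 5.5, 5.2]

print(knn_classify(X, labels, (2.0, 2.0), k=3))
print(knn_regress(X, values, (6.0, 7.0), k=3))
```

Every prediction scans the whole training set, which is the cost of skipping an explicit training step; tree-based or approximate neighbor indexes are the usual remedy for large datasets.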

### Evaluation of KNN

1. **Accuracy**
    - Measures the proportion of correctly classified instances out of the total instances.
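    - Most informative on balanced datasets; on imbalanced data, a model can score high accuracy simply by always predicting the majority class.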
    - Formula: 
      \[
      \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}
      \]

2. **Precision**
    - Measures the proportion of true positive instances out of the total predicted positive instances.
    - Important for scenarios where false positives are costly.
    - Formula:
      \[
      \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
      \]

3. **Recall (Sensitivity)**
    - Measures the proportion of true positive instances out of the total actual positive instances.
    - Important for scenarios where false negatives are costly.
    - Formula:
      \[
      \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
      \]

4. **F1 Score**
    - Harmonic mean of precision and recall.
    - Balances precision and recall in a single number, which is useful when the classes are imbalanced.
    - Formula:
      \[
      \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
      \]

5. **Confusion Matrix**
    - A table used to describe the performance of a classification model.
    - Contains the counts of true positives, true negatives, false positives, and false negatives.
    - Example:

      |                | Predicted Positive | Predicted Negative |
      |----------------|--------------------|--------------------|
      | **Actual Positive** | True Positive (TP)    | False Negative (FN)   |
      | **Actual Negative** | False Positive (FP)   | True Negative (TN)    |

6. **ROC Curve (Receiver Operating Characteristic Curve)**
    - A plot of the true positive rate (recall) against the false positive rate at various threshold settings.
    - Helps visualize the performance of a classifier.

7. **AUC (Area Under the Curve)**
    - Represents the area under the ROC curve.
    - Values range from 0 to 1: a value of 0.5 corresponds to random guessing, and a value closer to 1 indicates better performance.

8. **Cross-Validation**
    - Technique to evaluate a model by partitioning the data into subsets (folds), training on some folds and testing on the held-out fold in turn.
    - Common methods include k-fold cross-validation and stratified k-fold cross-validation.
    - Helps in assessing the model's generalizability to unseen data.
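
To connect the formulas for accuracy, precision, recall, and F1 to the confusion matrix, here is a small sketch that derives all four from the TP, FP, FN, and TN counts. The function names and the example labels are assumptions for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed 0.0 if undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # assumed 0.0 if undefined
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
print(classification_metrics(*counts))
```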

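The ROC curve and AUC need a score per instance rather than a hard label. One common choice for KNN, assumed here, is the fraction of the \( k \) neighbors that belong to the positive class. The sketch below sweeps the threshold from high to low, groups tied scores into a single step, and integrates with the trapezoidal rule; the function names and sample scores are hypothetical.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points as the threshold sweeps from high to low."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every instance tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores: fraction of positive neighbors with k = 5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [1.0, 0.8, 0.6, 0.6, 0.2, 0.0]

points = roc_curve_points(y_true, scores)
print(points)
print(auc(points))
```

With a small \( k \), the scores can take only \( k + 1 \) distinct values, so the resulting ROC curve has few steps and many ties.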

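Cross-validation is also a common way to choose \( k \) for KNN. The following sketch runs a plain (unstratified, unshuffled) k-fold loop around a compact KNN classifier and reports mean accuracy for several candidate values. The helper names, the interleaved fold assignment, and the toy data are assumptions for illustration; the code also assumes there are at least as many samples as folds.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def k_fold_indices(n_samples, n_folds):
    """Assign indices to folds in an interleaved fashion (no shuffling)."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


def cross_val_accuracy(X, y, k, n_folds=3):
    """Mean held-out accuracy of KNN with k neighbors over n_folds folds."""
    fold_scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores)


# Toy data: two well-separated clusters with alternating labels.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (1.2, 1.8), (1.8, 1.1), (2.2, 2.0)]
cluster_b = [(6.0, 6.0), (6.5, 7.0), (7.0, 6.5), (6.2, 6.8), (6.8, 6.1), (7.2, 7.0)]
X = [p for pair in zip(cluster_a, cluster_b) for p in pair]
y = ["a", "b"] * 6

for k in (1, 3, 5):
    print(k, cross_val_accuracy(X, y, k))
```

On real data, shuffling before splitting, or using stratified folds so that each fold preserves the class proportions, usually gives a more reliable estimate.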