## 1. What are the key hyperparameters in KNN?

The key hyperparameters in the K-Nearest Neighbors (KNN) algorithm, using the parameter names from scikit-learn's `KNeighborsClassifier`, are:

- **n_neighbors**: 
  - Defines the number of nearest neighbors consulted for each prediction. A small value makes the model sensitive to noise (high variance), whereas a large value smooths the decision boundary and can underfit.

- **weights**: 
  - Specifies the weight function used in prediction. It can take the following values:
    - `'uniform'`: All points in each neighborhood are weighted equally.
    - `'distance'`: Closer neighbors have a greater influence on the prediction.

- **metric**: 
  - Specifies the distance metric used to measure the distance between points. It can be any metric supported by the implementation (e.g., Euclidean, Manhattan); scikit-learn defaults to `'minkowski'`.

- **algorithm**: 
  - Specifies the algorithm used to compute the nearest neighbors. Common options include:
    - `'auto'`: Automatically selects the best algorithm based on the dataset.
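    - `'ball_tree'`: A tree of nested hyperspheres for efficient neighbor search; it tends to cope better than a KD-tree as dimensionality grows.
    - `'kd_tree'`: A tree that partitions the space along coordinate axes. It is faster than brute force on low-dimensional data but loses its advantage in high dimensions.
    - `'brute'`: Computes the distance from the query to every training point.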

- **leaf_size**: 
  - Relevant when using the `ball_tree` or `kd_tree` algorithms. It controls the size of leaf nodes in the tree, impacting the speed of the query and the memory used by the model.

- **p**: 
  - Defines the power parameter for the Minkowski distance metric. When `p = 1`, it uses Manhattan distance, and when `p = 2`, it uses Euclidean distance.

- **metric_params**: 
  - Passes additional keyword arguments to the distance metric. It is rarely needed unless the metric takes extra parameters (for example, the covariance matrix for Mahalanobis distance) or you supply a custom metric.
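
To make the effect of `n_neighbors`, `weights`, and `p` concrete, here is a minimal pure-Python sketch of a brute-force KNN classifier. The function names `minkowski` and `knn_predict` and the toy data are assumptions made for illustration; this is not scikit-learn's implementation.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(X_train, y_train, query, n_neighbors=5, weights="uniform", p=2):
    """Classify `query` by a (possibly weighted) vote of its nearest neighbors."""
    # Brute-force search: measure the distance to every training point.
    dists = sorted(
        (minkowski(x, query, p), label) for x, label in zip(X_train, y_train)
    )
    votes = defaultdict(float)
    for dist, label in dists[:n_neighbors]:
        if weights == "uniform":
            votes[label] += 1.0  # every neighbor counts equally
        elif weights == "distance":
            # Inverse-distance weighting; an exact match dominates the vote.
            votes[label] += 1.0 / dist if dist > 0 else float("inf")
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(votes, key=votes.get)


# Toy 1-D data: one "a" point very close to the query, two "b" points farther away.
X_train = [(1.9,), (2.6,), (2.7,), (5.0,), (6.0,)]
y_train = ["a", "b", "b", "a", "a"]
query = (2.0,)

pred_k1 = knn_predict(X_train, y_train, query, n_neighbors=1)
pred_k3_uniform = knn_predict(X_train, y_train, query, n_neighbors=3, weights="uniform")
pred_k3_distance = knn_predict(X_train, y_train, query, n_neighbors=3, weights="distance")
```

With three neighbors, uniform voting lets the two `"b"` points outvote the single nearby `"a"` point, while distance weighting gives the closest point enough influence to win.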

---
## 2. What distance metrics can be used in KNN?

In KNN, the distance metric defines how the "closeness" of points is measured. The most commonly used distance metrics are:

- **Euclidean Distance (p=2)**: 
  - The straight-line distance between two points in the space, calculated as:

  $$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

  - This is the default in scikit-learn's KNN estimators, which use the Minkowski metric with `p = 2`.

- **Manhattan Distance (p=1)**: 
  - The sum of the absolute differences between the coordinates of two points, calculated as:

  $$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

  - This is useful for grid-like pathfinding problems.

- **Minkowski Distance**: 
  - A generalization of both Euclidean and Manhattan distances, calculated as:

  $$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

  - By varying the value of `p`, it can represent both the Manhattan and Euclidean distances.

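- **Cosine Similarity**: 
  - Measures the cosine of the angle between two vectors. Because it is a similarity rather than a distance, KNN typically uses the cosine distance (1 minus the cosine similarity). It is common for text data (e.g., document similarity), where only the direction of the vectors matters, not their magnitude.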

- **Hamming Distance**: 
  - Used for categorical or binary features. It counts the positions at which two equal-length sequences differ; some implementations report this count as a fraction of the sequence length.

- **Chebyshev Distance**: 
  - This distance metric is defined as the maximum of the absolute differences between the coordinates of two points:

  $$d(x, y) = \max_{i} |x_i - y_i|$$

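  - It is useful when the largest difference along any single dimension should determine the distance, such as counting king moves on a chessboard. It is the limit of the Minkowski distance as `p` goes to infinity.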

- **Mahalanobis Distance**: 
  - This distance takes into account the correlations of the data and is useful when the data features are correlated. It is calculated as:

  $$d(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}$$

  - Here \( S \) is the covariance matrix of the dataset; when \( S \) is the identity matrix, this reduces to the Euclidean distance.
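
As a rough illustration of the formulas above, the following pure-Python sketch implements several of these metrics and evaluates them on small inputs. The function names and sample vectors are assumptions chosen for this example; `hamming` here returns a raw count, whereas some libraries return a fraction.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Largest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, y))


def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 minus the cosine of the angle between x and y (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)  # coordinate differences: 3, 4, 0

manhattan = minkowski(x, y, p=1)   # 3 + 4 + 0 = 7
euclidean = minkowski(x, y, p=2)   # sqrt(9 + 16 + 0) = 5
largest = chebyshev(x, y)          # max(3, 4, 0) = 4
mismatches = hamming("karolin", "kathrin")            # differs at 3 positions
orthogonal = cosine_distance((1.0, 0.0), (0.0, 1.0))  # perpendicular vectors: 1
parallel = cosine_distance((1.0, 2.0), (2.0, 4.0))    # same direction: about 0
```

Note that the two parallel vectors have a cosine distance of approximately zero even though their Euclidean distance is not, which is why cosine-based measures suit data where magnitude is irrelevant.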

These are the most common distance metrics, but KNN can also be customized to use other distance metrics depending on the specific problem.
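
One way to picture such customization is a neighbor search that accepts any callable as its metric. The sketch below is a minimal pure-Python version under stated assumptions: `make_mahalanobis` and `k_nearest` are hypothetical helpers, and the inverse covariance matrix is made up for illustration (two uncorrelated features with variances 100 and 1) rather than estimated from data.

```python
import math


def make_mahalanobis(inv_cov):
    """Build a metric d(x, y) = sqrt((x - y)^T S^{-1} (x - y)) from a given S^{-1}."""
    def metric(x, y):
        diff = [a - b for a, b in zip(x, y)]
        n = len(diff)
        # Quadratic form diff^T * inv_cov * diff, written out as a double sum.
        total = sum(
            diff[i] * inv_cov[i][j] * diff[j] for i in range(n) for j in range(n)
        )
        return math.sqrt(total)
    return metric


def k_nearest(points, query, k, metric):
    """Brute force: indices of the k points closest to `query` under `metric`."""
    order = sorted(range(len(points)), key=lambda i: metric(points[i], query))
    return order[:k]


# Assumed inverse covariance: feature 0 has variance 100, feature 1 has variance 1.
mahalanobis = make_mahalanobis([[0.01, 0.0], [0.0, 1.0]])
# With the identity matrix the same formula is just the Euclidean distance.
euclidean = make_mahalanobis([[1.0, 0.0], [0.0, 1.0]])

points = [(10.0, 0.0), (0.0, 2.0), (30.0, 0.0)]
query = (0.0, 0.0)

nearest_euclidean = k_nearest(points, query, 1, euclidean)
nearest_mahalanobis = k_nearest(points, query, 1, mahalanobis)
```

Under the Euclidean metric the nearest point is `(0.0, 2.0)`, but once the large variance of the first feature is taken into account, `(10.0, 0.0)` becomes the closest, so the choice of metric changes which neighbors vote.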

