# KNN Algorithm - Assignment

## Q1. What is the KNN algorithm?
K-Nearest Neighbors (KNN) is a simple, non-parametric, instance-based algorithm used for classification and regression. To make a prediction for a new input, it finds the `k` training points closest to that input under a chosen distance metric and returns the majority label of those neighbors (for classification) or the average of their values (for regression).
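The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not an optimized implementation: the helper names (`nearest_targets`, `knn_classify`, `knn_regress`) and the toy data are made up for this example, and Euclidean distance is assumed.

```python
import math
from collections import Counter
from statistics import mean

def nearest_targets(train_X, train_y, query, k):
    """Return the targets of the k training points closest to `query`."""
    # Rank every training point by its Euclidean distance to the query.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return [target for _, target in ranked[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(nearest_targets(train_X, train_y, query, k)).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: average of the k nearest neighbors' values."""
    return mean(nearest_targets(train_X, train_y, query, k))

# Toy data (hypothetical): two small clusters in 2D.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]
train_values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

label = knn_classify(train_X, train_labels, query=(2.0, 2.0), k=3)
value = knn_regress(train_X, train_values, query=(2.0, 2.0), k=3)
print(label, value)
```

Both functions share the same neighbor search and differ only in how they combine the neighbors' targets.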

---

## Q2. How do you choose the value of K in KNN?
The value of `K` is usually chosen with cross-validation, because the optimal value depends on the dataset. A small `K` (for example, `K = 1`) follows noise in the training data and tends to overfit, while a large `K` smooths the decision boundary and can underfit. A common practice is to try a range of `K` values and select the one with the best validation performance. For binary classification, an odd `K` avoids tied votes.
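As a rough sketch of this selection loop, the snippet below scores a few candidate values of `K` with leave-one-out cross-validation, where each point is predicted from all the others. The data, the candidate list, and the helper names (`knn_classify`, `loo_accuracy`) are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: hold out each point once and predict it from the rest."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_classify(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

# Toy data (hypothetical): two clusters, with one "a" point sitting inside the "b" cluster.
X = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.2, 2.4), (3.0, 2.0), (5.5, 5.0),
     (5.0, 5.5), (6.0, 6.2), (6.5, 5.8), (7.0, 7.1), (5.8, 6.9)]
y = ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

# Odd candidates avoid tied votes in this two-class problem.
scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 5, 7)}
# max() keeps the first best entry, so ties resolve to the smallest K.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On a real dataset, k-fold cross-validation is usually cheaper than leave-one-out, but the selection logic is the same.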

---

## Q3. What is the difference between KNN classifier and KNN regressor?
- **KNN Classifier**: Predicts the class of a data point based on the majority class of the nearest neighbors.
- **KNN Regressor**: Predicts a continuous value by averaging the values of the nearest neighbors.

The main difference is that the classifier deals with categorical outcomes, while the regressor deals with continuous outcomes.

---

## Q4. How do you measure the performance of KNN?
- **For classification**: Performance can be measured using accuracy, precision, recall, F1-score, or a confusion matrix.
- **For regression**: Metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), or R-squared are commonly used to evaluate performance.
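To make these metrics concrete, here is a small sketch that computes them by hand. The function names and the example predictions are hypothetical, and the classification metrics assume a binary problem with one designated positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MSE, MAE, and R-squared. R-squared is undefined if y_true is constant."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"mse": ss_res / n, "mae": sum(abs(e) for e in errors) / n, "r2": 1 - ss_res / ss_tot}

# Hypothetical predictions from a KNN classifier and a KNN regressor.
clf = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(clf)
print(reg)
```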

---

## Q5. What is the curse of dimensionality in KNN?
The curse of dimensionality refers to the phenomenon where the performance of KNN deteriorates as the number of features (dimensions) increases. In high-dimensional spaces the data become sparse and distances concentrate: the nearest and farthest points end up at nearly the same distance from a query, so the "nearest" neighbors are barely closer than any other point. This makes the neighbors less informative and leads to poor predictions.
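The effect can be observed with a small experiment. The sketch below draws uniformly random points in the unit hypercube and reports the relative contrast, (farthest - nearest) / nearest, between a random query and those points. The function name, the sample size, and the seed are arbitrary choices for illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, contrast in contrasts.items():
    print(f"dim={dim:>4}  relative contrast={contrast:.3f}")
```

The contrast shrinks as the dimension grows, which means the nearest neighbor is only marginally closer than the farthest point.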

---

## Q6. How do you handle missing values in KNN?
In KNN, missing values can be handled by:
1. **Imputation**: Filling in the missing values with the mean, median, or mode of the column.
2. **KNN imputation**: Estimating the missing value from the `k` most similar rows, where similarity is measured on the features that are present. The estimate is the average of those neighbors' values, optionally weighted so that closer neighbors count more.
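The second option can be sketched as follows. This is a simplified illustration under stated assumptions: missing entries are encoded as `None`, only fully observed rows are used as donors, and neighbors are weighted by inverse distance. The function name `knn_impute` and the data are hypothetical.

```python
import math

def knn_impute(rows, target_row, missing_idx, k=2):
    """Estimate rows[target_row][missing_idx] from the k most similar complete rows."""
    target = rows[target_row]
    # Compare rows only on the features the target actually has.
    observed = [j for j, v in enumerate(target) if v is not None]
    candidates = []
    for i, row in enumerate(rows):
        if i == target_row or any(v is None for v in row):
            continue  # only fully observed rows act as donors
        d = math.sqrt(sum((row[j] - target[j]) ** 2 for j in observed))
        candidates.append((d, row[missing_idx]))
    nearest = sorted(candidates)[:k]
    # Inverse-distance weights: closer rows count more. The epsilon avoids division by zero.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

rows = [
    [1.0, 2.0, 10.0],
    [1.2, 1.9, 12.0],
    [8.0, 9.0, 50.0],
    [1.1, 2.1, None],  # third feature is missing
]
imputed = knn_impute(rows, target_row=3, missing_idx=2, k=2)
print(round(imputed, 2))
```

In practice, a library routine such as scikit-learn's `KNNImputer` would normally be used instead of hand-written code.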

---

## Q7. Compare and contrast the performance of the KNN classifier and regressor. Which one is better for which type of problem?
- **KNN Classifier**: Better suited for classification tasks where the output is categorical.
- **KNN Regressor**: Best for problems where the output is a continuous variable.

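Neither variant is better in general; the choice follows from the type of target. A majority vote is less affected by a single outlying neighbor than an average is, so the classifier tends to be somewhat more robust to outliers, while the regressor's averaged prediction can be pulled by one extreme value. Both struggle in high-dimensional spaces and with noisy data.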
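The difference in outlier sensitivity can be seen by swapping one neighbor for an extreme one. The neighbor labels and values below are invented for illustration.

```python
from collections import Counter
from statistics import mean

def vote(labels):
    """Classifier aggregation: most common label among the neighbors."""
    return Counter(labels).most_common(1)[0][0]

# Three neighbors of some query point (hypothetical).
clean_labels, clean_values = ["a", "a", "a"], [10.0, 11.0, 12.0]
# Same neighborhood, but the third neighbor is now an outlier.
noisy_labels, noisy_values = ["a", "a", "b"], [10.0, 11.0, 120.0]

print(vote(clean_labels), vote(noisy_labels))   # the vote is unchanged
print(mean(clean_values), mean(noisy_values))   # the average shifts substantially
```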

---

## Q8. What are the strengths and weaknesses of the KNN algorithm for classification and regression tasks, and how can these be addressed?
### Strengths:
- **Simple**: Easy to understand and implement.
- **No explicit training phase**: It's a lazy learner; fitting only stores the training data, and the work is deferred to prediction time.
  
### Weaknesses:
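- **Computationally expensive at prediction time**: A brute-force search calculates the distance to every training point for each prediction.
- **Sensitive to irrelevant features and feature scale**: Every feature enters the distance calculation, so irrelevant or unscaled features can distort which neighbors are selected.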
- **Poor performance in high dimensions**: Due to the curse of dimensionality.

### Addressing weaknesses:
- Use dimensionality reduction techniques like PCA.
- Scale features (and drop irrelevant ones) so that distances are meaningful.
- Use approximate nearest neighbors algorithms to reduce computation time.

---

## Q9. What is the difference between Euclidean distance and Manhattan distance in KNN?
- **Euclidean distance**: The straight-line distance between two points in space. It is computed as the square root of the sum of the squared differences between the coordinates of the points.
  
- **Manhattan distance**: The distance between two points measured along axes at right angles. It is computed as the sum of the absolute differences between the coordinates of the points.

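The choice of distance metric can change which points count as nearest. Euclidean distance squares each difference, so a large gap in a single feature dominates the result; Manhattan distance grows linearly with each difference and is less sensitive to such gaps. Which metric works better depends on the data, so it is worth comparing both with cross-validation.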
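The two definitions translate directly into code. The points in this sketch are chosen (hypothetically) so that the nearest neighbor of the query differs between the two metrics.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))

# The same query can have different nearest neighbors under each metric.
query = (0.0, 0.0)
points = {"A": (3.0, 3.0), "B": (0.0, 4.5)}
nearest_euclidean = min(points, key=lambda name: euclidean(query, points[name]))
nearest_manhattan = min(points, key=lambda name: manhattan(query, points[name]))
print(nearest_euclidean, nearest_manhattan)
```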

---

## Q10. What is the role of feature scaling in KNN?
Feature scaling is crucial in KNN because the algorithm relies on distance calculations. If the features are on different scales, features with larger numeric ranges dominate the distance, so the neighbors are effectively chosen by those features alone, which can lead to inaccurate predictions. Scaling methods like Min-Max normalization or Standardization (Z-score normalization) should be applied so that all features contribute comparably to the distance calculation.
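Both scaling methods are short enough to write by hand. The sketch below assumes each column has at least two distinct values; the column names and numbers are made up for illustration.

```python
import math
from statistics import mean, pstdev

def min_max_scale(column):
    """Min-Max normalization: map values linearly onto [0, 1]. Assumes max > min."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def standardize(column):
    """Z-score standardization: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

ages = [25, 35, 45, 55]
incomes = [30000, 60000, 90000, 120000]

# Unscaled, the income gap swamps the age gap in the distance between the first two people.
raw_distance = math.dist((ages[0], incomes[0]), (ages[1], incomes[1]))

scaled_ages, scaled_incomes = min_max_scale(ages), min_max_scale(incomes)
scaled_distance = math.dist(
    (scaled_ages[0], scaled_incomes[0]), (scaled_ages[1], scaled_incomes[1])
)
print(raw_distance, scaled_distance)
print(standardize(ages))
```

After Min-Max scaling, both features span the same range, so each contributes the same amount to the distance in this example.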

---

