###[Q1.] What is the main difference between the Euclidean distance metric and the Manhattan distance metric in KNN? How might this difference affect the performance of a KNN classifier or regressor?
#####[Ans]
- Euclidean Distance measures the straight-line (or "as the crow flies") distance between two points in space. Because each difference is squared, a large gap in a single feature dominates the result.
Formula:
$$
\text{Distance} = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}
$$

- Manhattan Distance measures the distance between two points along axes at right angles, following a grid-like path. Because each difference enters linearly, a single large gap has less influence, so it is less sensitive to outliers.
Formula:
$$
\text{Distance} = \sum_{i=1}^n |x_i - y_i|
$$
- Impact on Performance:

  - Euclidean distance works well when features are on comparable scales and all contribute roughly equally to the target value.
  - Manhattan distance is often preferred for high-dimensional data or data with outliers. Neither metric compensates for features on different scales, so scale the features first in either case.
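
The difference between the two formulas is easiest to see on a small example. The sketch below uses only the standard library; the function names and the sample points are illustrative, not taken from any particular library.

```python
import math

def euclidean(x, y):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Grid-path (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

origin = (0.0, 0.0, 0.0)
spread = (2.0, 2.0, 2.0)  # total difference of 6 spread evenly over three features
spiked = (6.0, 0.0, 0.0)  # same total difference, all in one feature

print(manhattan(origin, spread), manhattan(origin, spiked))  # 6.0 6.0
print(euclidean(origin, spread), euclidean(origin, spiked))  # about 3.46, then 6.0
```

Manhattan distance treats both points as equally far from the origin, while Euclidean distance ranks the point with one large gap as much farther away. In KNN this can change which training points count as neighbors.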

###[Q2.] How do you choose the optimal value of k for a KNN classifier or regressor? What techniques can be used to determine the optimal k value?
#####[Ans]
- The value of 𝑘 controls the number of neighbors considered during prediction.
- Techniques to find optimal 𝑘:
  1. Cross-Validation: Test different values of 𝑘 using validation datasets to determine the value that minimizes the error rate.
  2. Grid Search: Automate the search over a range of 𝑘 values to find the one yielding the best performance.
  3. Elbow Method: Plot the error rate against 𝑘 and choose the 𝑘 where the error stabilizes or is minimized.
- Lower 𝑘 might result in high variance (overfitting), while higher 𝑘 can lead to high bias (underfitting). For binary classification, an odd 𝑘 avoids tied votes.
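
As a rough illustration of choosing 𝑘 by cross-validation, the sketch below runs leave-one-out validation over a few candidate values. The helper names (`knn_predict`, `loo_error`) and the toy dataset are assumptions made for this example; a real project would more likely use a library routine.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_error(data, k):
    """Leave-one-out error rate: predict each point from all the others."""
    mistakes = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, x, k) != label:
            mistakes += 1
    return mistakes / len(data)

# Hypothetical toy data: two clusters plus one mislabeled point inside cluster "a".
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.1), "a"), ((1.1, 1.3), "a"),
    ((1.0, 0.9), "b"),  # label noise
    ((4.0, 4.0), "b"), ((4.2, 3.9), "b"), ((3.8, 4.1), "b"), ((4.1, 4.3), "b"),
]

# Evaluate each candidate k and keep the one with the lowest validation error.
errors = {k: loo_error(data, k) for k in (1, 3, 5, 7)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

On this toy data, 𝑘 = 1 is thrown off by the mislabeled point and 𝑘 = 7 reaches into the other cluster, so the intermediate values give the lowest error. Plotting `errors` against 𝑘 gives the curve used in the elbow method.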

###[Q3.] How does the choice of distance metric affect the performance of a KNN classifier or regressor? In what situations might you choose one distance metric over the other?
#####[Ans]
- Impact of Distance Metric:

  - The distance metric determines how neighbors are selected.
  - Euclidean distance performs better when feature importance is uniform and differences in magnitude matter.
  - Manhattan distance is better suited for high-dimensional or sparse data.
- Choosing Distance Metric:

  - Use Euclidean distance for continuous data and low dimensions.
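  - Use Manhattan distance for high-dimensional data or data with outliers. For categorical features a metric such as Hamming distance is usually a better fit, and for features that vary widely in scale, rescale them rather than relying on the metric.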
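
To make "the metric determines how neighbors are selected" concrete, here is a minimal sketch in which the nearest neighbor of the same query changes with the metric. The points and the helper names (`minkowski`, `nearest_under`) are made up for illustration.

```python
def minkowski(x, y, p):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

query = (0.0, 0.0)
candidates = {"diagonal": (3.0, 3.0), "axis": (0.0, 5.5)}

def nearest_under(p):
    """Name of the candidate closest to `query` under Minkowski order p."""
    return min(candidates, key=lambda name: minkowski(query, candidates[name], p))

print("Manhattan ->", nearest_under(1))  # axis     (5.5 vs 6.0)
print("Euclidean ->", nearest_under(2))  # diagonal (about 4.24 vs 5.5)
```

With 𝑘 = 1, a classifier would return a different label for this query depending only on the metric, which is why the metric is worth treating as a tunable choice.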

###[Q4.] What are some common hyperparameters in KNN classifiers and regressors, and how do they affect the performance of the model? How might you go about tuning these hyperparameters to improve model performance?
#####[Ans]
- Common Hyperparameters:

  1. k (Number of Neighbors): Controls the size of the neighborhood considered.
     - Lower 𝑘: Sensitive to noise, low bias, high variance.
     - Higher 𝑘: Smoother decision boundaries, high bias, low variance.
  2. Distance Metric: Defines how neighbors are calculated (e.g., Euclidean, Manhattan, Minkowski).
  3. Weighting Scheme: Uniform weighting or distance-weighted neighbors.
     - Distance weighting can improve performance by giving closer neighbors more influence.
- Hyperparameter Tuning:

  - Use grid search or random search to explore different combinations of 𝑘, distance metrics, and weighting schemes.
  - Employ cross-validation to evaluate the performance of different configurations.
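
One possible way to combine grid search and cross-validation, assuming scikit-learn is installed, is sketched below. The Iris dataset, the candidate values in `param_grid`, and the step names `scale` and `knn` are choices made for this example, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],  # Minkowski order: 1 = Manhattan, 2 = Euclidean
}

# 5-fold cross-validation over every combination in the grid.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`best_params_` holds the winning combination and `best_score_` its mean cross-validated accuracy. For larger grids, `RandomizedSearchCV` samples combinations instead of trying all of them.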

###[Q5.] How does the size of the training set affect the performance of a KNN classifier or regressor? What techniques can be used to optimize the size of the training set?
#####[Ans]
- Effect of Training Set Size:

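  - Larger training sets generally improve accuracy because each query has closer, more representative neighbors, but they also increase memory use and prediction time, since KNN stores and searches the whole training set.
  - Smaller training sets leave neighborhoods sparse, so predictions rely on distant or unrepresentative points and generalize poorly.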
  
- Techniques to Optimize Training Set Size:

  1. Use sampling techniques to select representative subsets.
  2. Apply dimensionality reduction (e.g., PCA) to cut the number of features rather than the number of samples; this reduces computation, usually with little loss in performance.
  3. Ensure stratification to maintain class distributions in classification problems.
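
The sampling and stratification points can be combined into a stratified subsample, which shrinks the training set while keeping class proportions. The function below is a minimal standard-library sketch; `stratified_subsample` and the spam/ham labels are hypothetical names for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a subsample that keeps each class's share.

    At least one example per class is always kept.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for indices in by_class.values():
        n_keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, n_keep))
    return sorted(chosen)

labels = ["spam"] * 20 + ["ham"] * 80
subset = stratified_subsample(labels, fraction=0.25)
print(Counter(labels[i] for i in subset))  # 5 spam, 20 ham: still a 20/80 split
```

If scikit-learn is available, `train_test_split` with its `stratify` argument serves the same purpose.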

###[Q6.] What are some potential drawbacks of using KNN as a classifier or regressor? How might you overcome these drawbacks to improve the performance of the model?
#####[Ans]
- Drawbacks:

  1. High Computation Cost: KNN is computationally expensive for large datasets as it requires storing and searching the entire training set.
  2. Curse of Dimensionality: Performance degrades with high-dimensional data because distance metrics become less meaningful.
  3. Sensitive to Noise: KNN is prone to errors if the dataset contains noisy points.
  4. Feature Scaling: Sensitive to differences in scale, requiring careful preprocessing.
- Solutions:

  1. Use KD-Trees or Ball Trees for faster neighbor searches; these help most in low to moderate dimensions.
  2. Apply feature scaling (e.g., standardization or normalization).
  3. Use dimensionality reduction to mitigate the curse of dimensionality.
  4. Employ distance-weighted KNN, or a larger 𝑘, to reduce the impact of noisy or distant points.
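
The effect of feature scaling can be seen in a small sketch like the one below, which standardizes each column and checks which point is nearest to the first row before and after. The rows (age in years, income in dollars) and the helper names are invented for illustration.

```python
import math
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def nearest_to_first(points):
    """Index of the point closest (Euclidean) to points[0], excluding itself."""
    return min(range(1, len(points)), key=lambda i: math.dist(points[0], points[i]))

# Hypothetical rows: (age in years, income in dollars).
rows = [(25, 50_000), (60, 50_500), (26, 53_000), (40, 90_000), (55, 20_000)]

print(nearest_to_first(rows))               # 1: income differences dominate
print(nearest_to_first(standardize(rows)))  # 2: age now counts as well
```

On the raw data the 60-year-old with a similar income is the nearest neighbor of the 25-year-old; after standardization the 26-year-old is, because income no longer swamps age in the distance.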