**Q1: What is the KNN algorithm?**

K-Nearest Neighbors (KNN) is a simple, non-parametric, lazy learning algorithm used for classification and regression. It builds no explicit model during training; instead it stores the training data and, for a given query point, finds the K nearest training examples in the feature space. The prediction is the majority label of these neighbors (for classification) or the average of their target values (for regression).
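
As a rough illustration of the idea, here is a minimal from-scratch sketch of a KNN classifier in plain Python. The function names (`euclidean`, `knn_predict`) and the toy data are made up for this example, and Euclidean distance is assumed as the metric.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Lazy" learning: no fitting step, all work happens at prediction time.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated clusters in 2-D.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))
```

The query `(2.0, 2.0)` sits inside the first cluster, so its three nearest neighbors all carry the label `"a"`.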



**Q2: How do you choose the value of K in KNN?**

Choosing the value of K in KNN means trading off bias against variance:

* Small K values can lead to high variance and overfitting, as the model becomes sensitive to noise in the data.
* Large K values can lead to high bias and underfitting, as the model becomes too smooth and may ignore important patterns.

A common approach is to use cross-validation to test different K values and select the one that gives the best average performance on the held-out folds.
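
One way this selection might look in code is sketched below, using leave-one-out cross-validation (each point is predicted from all the others). The helper names (`loo_accuracy`, `choose_k`) and the toy data, which includes one deliberately mislabeled point, are assumptions for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

def choose_k(X, y, candidates):
    """Score every candidate K and return the best one with all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # on ties, the earliest candidate wins
    return best_k, scores

# Toy data (hypothetical): two clusters plus one noisy "b" inside the "a" cluster.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7), (2.2, 2.2)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

best_k, scores = choose_k(X, y, candidates=[1, 3, 5, 7])
print(best_k, scores)
```

On this toy set, K=1 is misled by the noisy point (high variance), while K=7 covers so much of the data that the majority class outvotes the local neighbors (high bias); an intermediate K scores best.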

**Q3: What is the difference between KNN classifier and KNN regressor?**

* KNN Classifier: Predicts the class label for a given input based on the majority class among its K nearest neighbors.
* KNN Regressor: Predicts a continuous value for a given input by averaging the values of its K nearest neighbors.
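
The two variants share the neighbor search and differ only in how the neighbors are aggregated. A minimal sketch, with hypothetical helper names and made-up one-dimensional data:

```python
import math
from collections import Counter
from statistics import mean

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean)."""
    return sorted(range(len(train_X)),
                  key=lambda i: math.dist(train_X[i], query))[:k]

def knn_classify(train_X, labels, query, k):
    """Classifier: majority vote over the neighbors' labels."""
    idx = k_nearest(train_X, query, k)
    return Counter(labels[i] for i in idx).most_common(1)[0][0]

def knn_regress(train_X, targets, query, k):
    """Regressor: unweighted mean of the neighbors' target values."""
    idx = k_nearest(train_X, query, k)
    return mean(targets[i] for i in idx)

# Toy data (hypothetical): the same inputs with a label and a numeric target.
train_X = [(1.0,), (2.0,), (3.0,), (8.0,), (9.0,)]
labels = ["low", "low", "low", "high", "high"]
targets = [10.0, 20.0, 30.0, 80.0, 90.0]

query = (2.5,)
print(knn_classify(train_X, labels, query, k=3))
print(knn_regress(train_X, targets, query, k=3))
```

For the query `2.5`, the three nearest inputs are `2.0`, `3.0`, and `1.0`, so the classifier votes over their labels and the regressor averages their targets.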

**Q4: How do you measure the performance of KNN?**

* Classification: Performance can be measured using metrics such as accuracy, precision, recall, and F1-score, often read alongside a confusion matrix.
* Regression: Performance can be measured using metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared.
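
These metrics are independent of KNN itself; they only compare predictions with true values. The sketch below computes them by hand for a binary classification case and a small regression case. The function names and the sample predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

def regression_metrics(y_true, y_pred):
    """MSE, MAE, and R-squared (1 - residual sum of squares / total sum of squares)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot if ss_tot else 0.0
    return {"mse": mse, "mae": mae, "r2": r2}

# Hypothetical predictions, e.g. as produced by a KNN model.
clf = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
reg = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(clf)
print(reg)
```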

**Q5: What is the curse of dimensionality in KNN?**

The curse of dimensionality refers to the phenomenon where the performance of KNN deteriorates as the number of dimensions (features) increases. In high-dimensional spaces, the distances between points become less meaningful because data points tend to become equidistant from each other, making it difficult for the KNN algorithm to identify meaningful neighbors.
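
The "equidistant" effect can be observed with a small experiment. The sketch below draws uniform random points and measures how much farther the farthest point is than the nearest one, relative to the nearest. The function name `relative_contrast`, the uniform distribution, the sample size, and the seed are all assumptions chosen for illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one query point
    to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the contrast shrinks toward zero: the nearest and farthest points end up at nearly the same distance, so "nearest" carries less information.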



**Q6: How do you handle missing values in KNN?**

Handling missing values in KNN can be done using:

* Imputation: Filling in missing values with statistical measures (mean, median) or using models to predict the missing values.
* Removing instances: Dropping instances with missing values, although this might not be feasible if too much data is lost.
* KNN imputation: Filling each missing value with the average of that feature over the row's nearest neighbors, where neighbors are found using only the features that are present.
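
A simplified sketch of nearest-neighbor imputation is shown below. It assumes missing entries are marked with `None`, that at least one other row has the feature observed, and that distances are computed only over features present in both rows. The names `partial_distance` and `knn_impute` are hypothetical; library implementations such as scikit-learn's `KNNImputer` handle more cases.

```python
import math
from statistics import mean

def partial_distance(a, b):
    """Euclidean distance over the features observed (not None) in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest
    rows in which the feature is observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows where feature j is known.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            donors.sort(key=lambda r: partial_distance(row, r))
            filled[i][j] = mean(r[j] for r in donors[:k])
    return filled

# Toy data (hypothetical): the third row is missing its second feature.
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, None), (9.0, 90.0)]
filled = knn_impute(rows, k=2)
print(filled[2])
```

Here the incomplete row is closest to the first two rows on its observed feature, so the gap is filled with the mean of their second-feature values.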

**Q7: Compare and contrast the performance of the KNN classifier and regressor. Which one is better for which type of problem?**

* KNN Classifier: Suitable for classification tasks where the goal is to assign a label to an input. It performs well when there are clear decision boundaries and sufficient labeled data.
* KNN Regressor: Suitable for regression tasks where the goal is to predict a continuous value. It works well when the relationship between input features and the target variable is smooth and not overly complex.

**Q8: What are the strengths and weaknesses of the KNN algorithm for classification and regression tasks, and how can these be addressed?**


`Strengths:`

* Simple and easy to implement.
* No assumptions about the underlying data distribution.
* Can perform well when plenty of representative training data is available.

`Weaknesses:`

* Computationally expensive at prediction time, especially with large datasets.
* Sensitive to the choice of K and the distance metric.
* Can suffer from the curse of dimensionality.

`Addressing weaknesses:`

* Use dimensionality reduction techniques (e.g., PCA) to mitigate the curse of dimensionality.
* Use efficient data structures (e.g., KD-trees, Ball-trees) to speed up neighbor searches.
* Employ cross-validation to select the optimal K and distance metric.


**Q9: What is the difference between Euclidean distance and Manhattan distance in KNN?**


* Euclidean Distance: Measures the straight-line distance between two points in Euclidean space. It is computed as the square root of the sum of the squared differences between corresponding features.

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

* Manhattan Distance: Measures the distance between two points by summing the absolute differences between corresponding features.

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
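
Both definitions translate directly into a few lines of Python. This is a minimal sketch; the function names and the sample points are chosen only for illustration.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared feature differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean_distance(p, q))  # 5.0 (the 3-4-5 right triangle)
print(manhattan_distance(p, q))  # 7.0 (3 along one axis plus 4 along the other)
```

Manhattan distance is never smaller than Euclidean distance for the same pair of points, since it measures a path along the axes and not the straight line.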

**Q10: What is the role of feature scaling in KNN?**

Feature scaling is crucial in KNN because the algorithm relies on distance calculations. Features with larger numeric ranges can dominate the distance, so the neighbors end up being chosen almost entirely by those features. Scaling features to a similar range (e.g., using standardization or min-max normalization) lets each feature contribute comparably to the distance calculations, which typically improves the performance and reliability of the KNN algorithm.
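
The effect can be seen on a small made-up example with two features on very different scales (an age in years and an income in dollars). The sketch below uses z-score standardization; the helper names and the data are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std dev."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas))
            for row in rows]

def nearest_index(rows, query_idx):
    """Index of the row closest (Euclidean) to rows[query_idx], excluding itself."""
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: math.dist(rows[i], rows[query_idx]))

# Toy data (hypothetical): (age in years, income in dollars).
rows = [(25, 50000), (27, 52000), (60, 50500), (45, 90000), (35, 20000)]

print(nearest_index(rows, 0))               # neighbor of row 0 on raw features
print(nearest_index(standardize(rows), 0))  # neighbor of row 0 after scaling
```

On the raw features, the income column dominates, so the 25-year-old is matched with the 60-year-old whose income happens to be closest. After standardization, the match becomes the 27-year-old with a similar income.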






