In [None]:
Q1. What is K‑Nearest Neighbors (KNN) and how does it work?
Answer: KNN is a non‑parametric, instance‑based learning algorithm. It classifies or predicts outcomes by finding the K closest
data points (neighbors) to a query point using a distance metric, then making decisions based on those neighbors (majority vote for classification, 
average for regression).
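The procedure described above can be sketched from scratch in a few lines. This is a minimal illustration, assuming Euclidean distance and a made-up toy dataset; the helper name `knn_predict` is hypothetical and not part of any library.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters in 2D
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3)
pred_b = knn_predict(X_toy, y_toy, np.array([5.1, 5.0]), k=3)
print(pred_a, pred_b)  # prints: 0 1
```

Note that there is no training step: all the work happens at query time, which is why KNN is called instance-based (or "lazy") learning.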

===================================================================================================================================
Q2. What is the difference between KNN Classification and KNN Regression?
Answer:
- Classification: Predicts categorical labels using majority voting among neighbors.
- Regression: Predicts continuous values using the average (or weighted average) of neighbors.
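A small sketch of the difference, using scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor` on a made-up one-dimensional dataset. The data values and the query point are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Four training points on a line (toy data for illustration)
X_line = np.array([[0.0], [1.0], [2.0], [3.0]])
query = np.array([[1.25]])

# Classification: majority vote among the 3 nearest points (1.0, 2.0, 0.0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_line, ["a", "a", "b", "b"])
label = clf.predict(query)[0]

# Regression: plain average of the 2 nearest targets (10 and 20)
y_line = np.array([0.0, 10.0, 20.0, 30.0])
reg_uniform = KNeighborsRegressor(n_neighbors=2, weights="uniform").fit(X_line, y_line)
avg = reg_uniform.predict(query)[0]

# Regression with inverse-distance weights: the closer neighbor counts more
reg_weighted = KNeighborsRegressor(n_neighbors=2, weights="distance").fit(X_line, y_line)
weighted_avg = reg_weighted.predict(query)[0]

print(label, avg, weighted_avg)  # prints: a 15.0 12.5
```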

===================================================================================================================================    
Q3. What is the role of the distance metric in KNN?
Answer: The distance metric (e.g., Euclidean, Manhattan, Minkowski, or cosine distance, which is 1 minus cosine similarity) determines how “closeness” between points is measured.
The choice of metric directly affects which neighbors are selected and therefore model accuracy.
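To make the metrics concrete, here is a short NumPy sketch that computes each one for a single pair of made-up vectors. The vectors `a` and `b` are illustrative assumptions; cosine is shown as a distance (1 minus the similarity) so that smaller always means closer.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
diff = np.abs(a - b)  # [3, 4, 0]

# Euclidean (Minkowski with p=2): square root of summed squared differences
euclidean = np.sqrt(np.sum(diff ** 2))

# Manhattan (Minkowski with p=1): sum of absolute differences
manhattan = np.sum(diff)

# General Minkowski distance, here with p=3
p = 3
minkowski3 = np.sum(diff ** p) ** (1 / p)

# Cosine distance: 1 - cos(angle between a and b); ignores vector length
cosine_dist = 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, manhattan)  # prints: 5.0 7.0
print(round(minkowski3, 4), round(cosine_dist, 4))
```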

===================================================================================================================================
Q4. What is the Curse of Dimensionality in KNN?
Answer: As the number of features increases, distances between points become less meaningful: the nearest and farthest points end up almost equally far from a query.
This reduces KNN’s effectiveness and increases computational cost.
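The effect can be observed numerically. The sketch below draws uniformly random points (an assumption made purely for illustration) and measures the relative gap between the farthest and nearest point from a random query; the helper name `relative_contrast` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(n_dims, n_points=500):
    """(farthest - nearest) / nearest distance from a random query point."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()


contrasts = {n_dims: relative_contrast(n_dims) for n_dims in (2, 10, 100, 1000)}
for n_dims, value in contrasts.items():
    print(f"{n_dims:>5} dims: relative contrast = {value:.3f}")
```

In low dimensions the nearest point is many times closer than the farthest one; as the dimension grows the contrast shrinks toward zero, so "nearest" carries less and less information.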

===================================================================================================================================
Q5. How can we choose the best value of K in KNN?
Answer:
- Use cross‑validation to test different K values.
- Small K → sensitive to noise, high variance.
- Large K → smoother decision boundary, higher bias.
- For binary classification, K is often chosen as an odd number to avoid tied votes.
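A minimal sketch of the cross-validation approach, assuming the Iris dataset and an arbitrary list of candidate K values (both are illustrative choices, not recommendations).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# Candidate K values (an arbitrary, illustrative selection)
k_values = [1, 3, 5, 7, 9, 11, 15, 21]

cv_scores = {}
for k in k_values:
    # Mean accuracy over 5 folds for this K
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_iris, y_iris, cv=5)
    cv_scores[k] = scores.mean()

best_k = max(cv_scores, key=cv_scores.get)
print("Mean CV accuracy per K:", cv_scores)
print("Best K:", best_k)
```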

===================================================================================================================================
Q6. What are KD Tree and Ball Tree in KNN?
Answer:
- KD Tree: A binary tree that partitions space along feature axes, efficient for low‑dimensional data.
- Ball Tree: A tree that partitions space into hyperspheres (balls), better for high‑dimensional data.

===================================================================================================================================
Q7. When should you use KD Tree vs. Ball Tree?
Answer:
- KD Tree: Best for low-dimensional data (as a rule of thumb, fewer than about 20 features).
- Ball Tree: Preferred for higher dimensions or when data distribution is irregular.
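In scikit-learn the search structure is selected with the `algorithm` parameter. The sketch below, on made-up random data, suggests that the choice affects speed rather than results: brute force, KD Tree, and Ball Tree should return the same neighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
points = rng.random((200, 3))   # toy data: 200 points in 3 dimensions
queries = rng.random((5, 3))

results = {}
for algo in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=4, algorithm=algo).fit(points)
    # kneighbors returns (distances, indices), each of shape (n_queries, 4)
    results[algo] = nn.kneighbors(queries)

for algo, (dist, idx) in results.items():
    print(algo, idx[0])
```

Leaving `algorithm="auto"` (the default) lets scikit-learn pick a structure based on the data.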

===================================================================================================================================
Q8. What are the disadvantages of KNN?
Answer:
- Computationally expensive at prediction time.
- Sensitive to irrelevant/noisy features.
- Poor performance in high dimensions.
- Requires feature scaling.

===================================================================================================================================
Q9. How does feature scaling affect KNN?
Answer: Since KNN relies on distance metrics, features with larger ranges dominate. Scaling (standardization or normalization)
    ensures fair contribution of all features.
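A small sketch of why this matters, using a made-up table of (age, income) rows. The numbers and the helper `nearest_to_first` are illustrative assumptions. Before scaling, income differences swamp age differences; after standardization the nearest neighbor changes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: age (years), income (currency units); toy values for illustration
data = np.array([
    [25, 50_000],   # row 0: the query
    [60, 50_100],   # row 1: very different age, almost the same income
    [26, 52_000],   # row 2: almost the same age, somewhat different income
    [40, 30_000],
    [35, 80_000],
], dtype=float)


def nearest_to_first(rows):
    """Index of the row closest (Euclidean) to row 0, excluding row 0 itself."""
    d = np.linalg.norm(rows[1:] - rows[0], axis=1)
    return int(np.argmin(d)) + 1


nearest_raw = nearest_to_first(data)
nearest_scaled = nearest_to_first(StandardScaler().fit_transform(data))
print(nearest_raw, nearest_scaled)  # prints: 1 2
```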

===================================================================================================================================
Q10. How does KNN handle missing values in a dataset?
Answer: KNN can impute missing values by finding nearest neighbors and replacing missing entries with the mean/median/mode of those neighbors.
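scikit-learn provides `KNNImputer` for this. A minimal sketch on a made-up matrix follows; note that `KNNImputer` fills with the (optionally distance-weighted) mean of the neighbors, so median or mode imputation would need a different approach.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix with one missing entry in row 1
X_missing = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [8.0, 16.0],
])

# Neighbors are found using only the features that are present
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X_missing)

# Rows 0 and 2 are nearest to row 1, so the gap becomes mean(2, 6) = 4
print(X_filled[1])  # prints: [2. 4.]
```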

===================================================================================================================================
# PCA (Principal Component Analysis)

Q11. What is PCA (Principal Component Analysis)?
Answer: PCA is a dimensionality reduction technique that transforms correlated features into a smaller set of uncorrelated variables
    called principal components, capturing maximum variance.

===================================================================================================================================
Q12. How does PCA work?
Answer:
- Standardize data.
- Compute covariance matrix.
- Find eigenvalues and eigenvectors.
- Sort eigenvectors by eigenvalues (variance explained).
- Project data onto top components.
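The five steps above can be sketched directly in NumPy. This is an illustrative from-scratch version, assuming the Iris features as input; in practice `sklearn.decomposition.PCA` does the equivalent work (it uses an SVD internally rather than an explicit covariance matrix).

```python
import numpy as np
from sklearn.datasets import load_iris

X_raw = load_iris().data  # shape (150, 4)

# 1. Standardize each feature to zero mean and unit variance
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# 2. Covariance matrix of the features (4 x 4)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition (eigh is meant for symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort eigenvectors by eigenvalue, largest first
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Project the data onto the top 2 components
X_proj = X_std @ eigvecs[:, :2]

explained_ratio = eigvals / eigvals.sum()
print("Projected shape:", X_proj.shape)
print("Explained variance ratio:", explained_ratio.round(4))
```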

===================================================================================================================================
Q13. What is the geometric intuition behind PCA?
Answer: PCA finds new axes (principal components) that maximize variance. Geometrically, it rotates the coordinate system to align with
directions of greatest spread in the data.

===================================================================================================================================
Q14. What is the difference between Feature Selection and Feature Extraction?
Answer:
- Feature Selection: Chooses a subset of original features.
- Feature Extraction: Creates new features (like PCA components) from transformations of original features.
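A short sketch contrasting the two on Iris. `SelectKBest` with `f_classif` is used here as one possible selection method (an illustrative choice); PCA stands in for extraction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X_iris, y_iris = load_iris(return_X_y=True)

# Feature selection: keep 2 of the 4 original columns, unchanged
selector = SelectKBest(score_func=f_classif, k=2).fit(X_iris, y_iris)
X_selected = selector.transform(X_iris)
kept_columns = selector.get_support(indices=True)

# Feature extraction: build 2 new columns, each a mix of all 4 originals
X_extracted = PCA(n_components=2).fit_transform(X_iris)

print("Kept original columns:", kept_columns)
print("Selected equals original columns:",
      np.array_equal(X_selected, X_iris[:, kept_columns]))
print("Extracted shape:", X_extracted.shape)
```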

===================================================================================================================================
Q15. What are Eigenvalues and Eigenvectors in PCA?
Answer:
- Eigenvectors: Directions of principal components (axes of maximum variance).
- Eigenvalues: Magnitude of variance captured along each eigenvector.
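A tiny worked example, assuming a hand-picked 2 x 2 covariance matrix so the result can be checked by hand: the eigenvalues are 1 and 3, and the direction of greatest variance is the diagonal [1, 1].

```python
import numpy as np

# Hand-picked covariance matrix: equal variances, positive correlation
cov = np.array([[2.0, 1.0],
                [1.0, 2.0]])

# eigh returns eigenvalues in ascending order; eigenvectors are the columns
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)  # prints: [1. 3.]

# The last column is the first principal component (largest eigenvalue)
top_direction = eigvecs[:, -1]
print(np.abs(top_direction).round(4))  # prints: [0.7071 0.7071]

# Defining property: cov @ v equals eigenvalue * v
print(np.allclose(cov @ top_direction, eigvals[-1] * top_direction))  # prints: True
```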

===================================================================================================================================
Q16. How do you decide the number of components to keep in PCA?
Answer:
- Use the explained variance ratio (e.g., keep enough components to explain 95% of the variance).
- Scree plot (elbow method).
- Cross‑validation for downstream tasks.
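The explained-variance rule can be sketched as follows, assuming scikit-learn's digits dataset (64 pixel features) as example input and 95% as the threshold. Passing a float between 0 and 1 as `n_components` asks scikit-learn's `PCA` to pick the count itself.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X_digits = load_digits().data  # shape (1797, 64)

# Manual route: fit all components, then find where the running total hits 95%
cumulative = np.cumsum(PCA().fit(X_digits).explained_variance_ratio_)
n_manual = int(np.searchsorted(cumulative, 0.95) + 1)

# Built-in route: a float n_components is read as a variance threshold
pca_95 = PCA(n_components=0.95).fit(X_digits)

print("Components needed (manual):", n_manual)
print("Components chosen by PCA(n_components=0.95):", pca_95.n_components_)
```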

===================================================================================================================================
Q17. Can PCA be used for classification?
Answer: PCA itself is unsupervised, but the reduced features can be fed into classifiers (like KNN or SVM), which can improve efficiency and may reduce overfitting.

===================================================================================================================================
Q18. What are the limitations of PCA?
Answer:
- Assumes linear relationships.
- Sensitive to scaling.
- May discard features important for classification but low in variance.
- Hard to interpret transformed components.

===================================================================================================================================
Q19. How do KNN and PCA complement each other?
Answer: PCA reduces dimensionality, mitigating the curse of dimensionality and speeding up KNN. KNN then uses the reduced feature space for
more effective classification/regression.
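One way to combine them is a scikit-learn pipeline that scales, reduces, and then classifies. The sketch below assumes the digits dataset and an arbitrary choice of 20 components; both are illustrative, and the right number of components depends on the data.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_d, y_d = load_digits(return_X_y=True)  # 64 features per sample
X_tr, X_te, y_tr, y_te = train_test_split(
    X_d, y_d, test_size=0.3, random_state=42, stratify=y_d)

# Scale -> reduce 64 features to 20 components -> KNN on the reduced space
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),
    KNeighborsClassifier(n_neighbors=5),
)
pipe.fit(X_tr, y_tr)

pca_knn_accuracy = pipe.score(X_te, y_te)
print("Accuracy with PCA + KNN:", round(pca_knn_accuracy, 3))
```

Wrapping the steps in a pipeline means the scaler and PCA are fitted on the training split only, which avoids leaking test data into the preprocessing.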

===================================================================================================================================
Q20. What are the key differences between PCA and Linear Discriminant Analysis (LDA)?
Answer:
- PCA: Unsupervised, maximizes variance, doesn’t use class labels.
- LDA: Supervised, maximizes class separability, uses labels to find discriminant axes.
- Both reduce dimensionality: PCA does so without labels, while LDA yields at most (number of classes minus 1) axes chosen to help classification.
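The difference shows up directly in the scikit-learn calls: PCA is fitted on the features alone, while `LinearDiscriminantAnalysis` also needs the labels. A minimal sketch on Iris (chosen here only as a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_iris, y_iris = load_iris(return_X_y=True)

# PCA: unsupervised, so only X is passed
X_pca = PCA(n_components=2).fit_transform(X_iris)

# LDA: supervised, so y is required; with 3 classes at most 2 axes exist
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_iris, y_iris)

print("PCA output shape:", X_pca.shape)
print("LDA output shape:", X_lda.shape)
```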



# PRACTICALS

In [None]:
#Q21. Train a KNN Classifier on the Iris dataset and print model accuracy
#Answer:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train KNN Classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Predict and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

#===================================================================================================================================



#Q22. Train a KNN Regressor on a synthetic dataset and evaluate using Mean Squared Error (MSE)
#Answer:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Synthetic dataset: noisy sine wave (seeded so the MSE is reproducible)
rng = np.random.default_rng(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train KNN Regressor
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_train, y_train)

# Predict and evaluate
y_pred = knn_reg.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))

#===================================================================================================================================


#Q23. Train a KNN Classifier using different distance metrics (Euclidean and Manhattan) and compare accuracy
#Answer:
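from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Re-split Iris: Q22 left X_train/y_train holding continuous regression targets,
# which a classifier cannot be fitted on
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)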

# Euclidean distance
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_euclidean.fit(X_train, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test)
print("Euclidean Accuracy:", accuracy_score(y_test, y_pred_euclidean))

# Manhattan distance
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn_manhattan.fit(X_train, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test)
print("Manhattan Accuracy:", accuracy_score(y_test, y_pred_manhattan))


#===================================================================================================================================
#Q24. Train a KNN Classifier with different values of K and visualize decision boundaries
#Answer:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Use only first two features for visualization
X_vis, y_vis = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X_vis, y_vis, test_size=0.3, random_state=42)

# Decision boundary plot function
def plot_decision_boundary(knn, X, y, title):
    h = .02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(('red','green','blue')))
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=ListedColormap(('red','green','blue')))
    plt.title(title)
    plt.show()

# Try different K values
for k in [1, 5, 10]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    plot_decision_boundary(knn, X_vis, y_vis, f"K={k}")

#===================================================================================================================================

#Q25. Apply Feature Scaling before training a KNN model and compare results with unscaled data
#Answer:
from sklearn.preprocessing import StandardScaler

# Without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=5)
knn_unscaled.fit(X_train, y_train)
print("Unscaled Accuracy:", accuracy_score(y_test, knn_unscaled.predict(X_test)))

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
print("Scaled Accuracy:", accuracy_score(y_test, knn_scaled.predict(X_test_scaled)))

#===================================================================================================================================
#Q26. Train a PCA model on synthetic data and print the explained variance ratio for each component
#Answer:
from sklearn.decomposition import PCA

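# Synthetic dataset: 5 independent uniform features (seeded for reproducibility),
# so each component should explain a roughly similar share of the variance
rng = np.random.default_rng(42)
X = rng.random((100, 5))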

# PCA
pca = PCA(n_components=5)
pca.fit(X)

print("Explained Variance Ratio:", pca.explained_variance_ratio_)


#===================================================================================================================================
#Q27. Apply PCA before training a KNN Classifier and compare accuracy with and without PCA
#Answer:
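# Use all four Iris features: X_train from Q24 holds only two,
# so PCA(n_components=2) on it would merely rotate the data
X_train4, X_test4, y_train4, y_test4 = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Without PCA
knn_no_pca = KNeighborsClassifier(n_neighbors=5)
knn_no_pca.fit(X_train4, y_train4)
print("Accuracy without PCA:", accuracy_score(y_test4, knn_no_pca.predict(X_test4)))

# With PCA (4 features -> 2 components)
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train4)
X_test_pca = pca.transform(X_test4)

knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train4)
print("Accuracy with PCA:", accuracy_score(y_test4, knn_pca.predict(X_test_pca)))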

#===================================================================================================================================

#Q28. Perform Hyperparameter Tuning on a KNN Classifier using GridSearchCV
#Answer:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Accuracy:", grid.best_score_)


#===================================================================================================================================
#Q29. Train a KNN Classifier and check the number of misclassified samples
#Answer:
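# Train a fresh classifier instead of reusing the last model from the Q24 loop
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
misclassified = (y_test != y_pred).sum()
print("Number of misclassified samples:", misclassified)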


#===================================================================================================================================
#Q30. Train a PCA model and visualize the cumulative explained variance
#Answer:
import matplotlib.pyplot as plt
import numpy as np

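# X is the synthetic dataset from Q26
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Start the x-axis at 1 so it matches the number of components kept
plt.plot(np.arange(1, len(cumulative) + 1), cumulative, marker='o')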
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('PCA - Cumulative Explained Variance')
plt.grid(True)
plt.show()




