## Machine Learning Task: KNN Classification on the Breast Cancer Dataset

In this exercise, we will build a simple **K-Nearest Neighbors (KNN)** classification model using the **Breast Cancer dataset** from `scikit-learn`.

### Steps to Complete

1. **Load the dataset** using `sklearn.datasets.load_breast_cancer`.
2. **Split the data** into:
   - 80% training set  
   - 20% test set  
3. **Implement KNN from scratch**:
   - Use **Euclidean distance**
   - Use **k = 3**
   - Predict the label of each test point by **majority vote** among its 3 nearest neighbors
4. **Train and evaluate scikit-learn’s `KNeighborsClassifier`** using the same value, `k = 3`
5. **Compare results**:
   - Print accuracy for:
     - Your custom implementation
     - scikit-learn’s implementation  
   - Check whether predictions from both methods are identical
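
To make step 3 concrete, here is a minimal sketch of a single KNN prediction on a tiny made-up dataset. The arrays `X_toy`, `y_toy`, and `query` are hypothetical values chosen for illustration and are not part of the Breast Cancer data.

```python
import numpy as np

# Hypothetical toy data (not from the Breast Cancer dataset)
X_toy = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 5.0], [1.0, 3.0]])
y_toy = np.array([0, 0, 1, 1, 1])
query = np.array([1.5, 1.5])

# Euclidean distance from the query point to every training point
distances = np.sqrt(np.sum((X_toy - query) ** 2, axis=1))

# Indices of the k = 3 closest training points
k = 3
nearest = np.argsort(distances)[:k]

# Majority vote among the labels of those neighbors
labels, counts = np.unique(y_toy[nearest], return_counts=True)
toy_prediction = labels[np.argmax(counts)]
print(toy_prediction)
```

Here the three nearest points carry the labels 0, 0, and 1, so the vote is two to one in favor of class 0. With two classes and an odd `k`, a tie cannot occur.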

### Expected Outcome

By the end of this task, you should have:
- A working custom KNN classifier  
- A comparison of performance with scikit-learn’s KNN  
- A printed conclusion on accuracy and prediction similarity  

## Answer:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define Euclidean distance function
def euclidean_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Implement KNN from scratch
def knn_predict(X_train, y_train, X_test, k=3):
    predictions = []
    for test_point in X_test:
        # Calculate distances to all training points
        distances = [euclidean_distance(test_point, train_point) for train_point in X_train]
        # Find the k nearest neighbors
        k_indices = np.argsort(distances)[:k]
        k_nearest_labels = y_train[k_indices]
        # Majority vote
        unique, counts = np.unique(k_nearest_labels, return_counts=True)
        predicted_label = unique[np.argmax(counts)]
        predictions.append(predicted_label)
    return np.array(predictions)

# Predict using the custom KNN
y_pred_custom = knn_predict(X_train, y_train, X_test, k=3)

# Predict using scikit-learn's KNN
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred_sklearn = knn.predict(X_test)

# Calculate and print accuracies
accuracy_custom = accuracy_score(y_test, y_pred_custom)
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
predictions_same = np.array_equal(y_pred_custom, y_pred_sklearn)

print(f"Custom KNN Accuracy: {accuracy_custom:.4f}")
print(f"Scikit-learn KNN Accuracy: {accuracy_sklearn:.4f}")
print(f"Do both predictions match exactly? {'Yes' if predictions_same else 'No'}")
```

```text
Custom KNN Accuracy: 0.9298
Scikit-learn KNN Accuracy: 0.9298
Do both predictions match exactly? Yes
```
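
The loop-based implementation computes one distance at a time in Python, which becomes slow on larger datasets. As an optional variation, the sketch below computes all pairwise distances at once with NumPy broadcasting. The function name `knn_predict_vectorized` and the synthetic demo arrays are hypothetical and are not part of the original answer. The sketch assumes that labels are non-negative integers (true for this dataset's 0/1 targets) and that the full `(n_test, n_train, n_features)` difference array fits in memory.

```python
import numpy as np

def knn_predict_vectorized(X_train, y_train, X_test, k=3):
    # Pairwise differences via broadcasting: shape (n_test, n_train, n_features)
    diffs = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Squared Euclidean distances: shape (n_test, n_train)
    sq_dists = np.sum(diffs ** 2, axis=2)
    # The square root is monotonic, so squared distances rank neighbors the same way
    k_indices = np.argsort(sq_dists, axis=1)[:, :k]
    k_labels = y_train[k_indices]
    # Majority vote per test point (assumes non-negative integer labels)
    return np.array([np.bincount(row).argmax() for row in k_labels])

# Small synthetic demo (made-up data, for illustration only)
rng = np.random.default_rng(0)
X_demo_train = rng.normal(size=(50, 4))
y_demo_train = (X_demo_train[:, 0] > 0).astype(int)
X_demo_test = rng.normal(size=(10, 4))

demo_pred = knn_predict_vectorized(X_demo_train, y_demo_train, X_demo_test, k=3)
print(demo_pred.shape)
```

Called as `knn_predict_vectorized(X_train, y_train, X_test, k=3)` on the split above, it should select the same neighbors as the loop version, except possibly where two training points are exactly equidistant from a test point.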
