# Practical Exercise: Classifying Animals with k-Nearest Neighbors
In this exercise, you will use the **k-Nearest Neighbors (k-NN)** algorithm to classify animals based on physical features.

**Dataset features:**
- `height_cm`: height of the animal, in centimeters
- `weight_kg`: weight of the animal, in kilograms
- `has_tail`: 1 if the animal has a tail, 0 otherwise

The goal is to predict the `animal_type`:
- 0 = Bird
- 1 = Mammal


## Step 1: Load Required Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
print("✅ Libraries loaded.")

✅ Libraries loaded.


## Step 2: Load and Inspect the Dataset

In [2]:
df = pd.read_csv('data/knn_animals_data.csv')
print("📄 First rows of the dataset:")
print(df.head())

print("\n📊 Dataset description:")
print(df.describe())

📄 First rows of the dataset:
   height_cm  weight_kg  has_tail  animal_type
0       76.5       19.7         0            0
1       56.0       28.6         0            1
2       64.7       16.3         1            1
3       83.6       15.9         1            1
4       78.0       19.5         0            0

📊 Dataset description:
        height_cm   weight_kg    has_tail  animal_type
count  150.000000  150.000000  150.000000   150.000000
mean    51.880000   19.648000    0.493333     0.580000
std     15.350859    4.870301    0.501630     0.495212
min     11.700000    6.100000    0.000000     0.000000
25%     40.050000   16.625000    0.000000     0.000000
50%     52.050000   19.500000    0.000000     1.000000
75%     62.750000   23.075000    1.000000     1.000000
max     85.700000   31.500000    1.000000     1.000000


## Step 3: Prepare the Data
We separate the features (`X`) from the target variable (`y`), then split the data into a training set and a test set, holding out 30% of the rows (45 of 150) for testing.

In [3]:
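# Features: the three physical measurements; target: the class label
X = df[['height_cm', 'weight_kg', 'has_tail']]
y = df['animal_type']

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("✅ Data split completed.")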

✅ Data split completed.
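
The split above is purely random, so the Bird/Mammal ratio in the test set can drift away from the ratio in the full dataset. A minimal sketch of a stratified split follows. It runs on a small synthetic stand-in (the `demo` frame and its values are invented for illustration, not taken from `knn_animals_data.csv`) and relies on the `stratify` parameter of `train_test_split`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the animals table (values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "height_cm": rng.normal(52, 15, 150).round(1),
    "weight_kg": rng.normal(20, 5, 150).round(1),
    "has_tail": rng.integers(0, 2, 150),
    "animal_type": np.repeat([0, 1], [63, 87]),  # 42% birds, 58% mammals
})

X_demo = demo[["height_cm", "weight_kg", "has_tail"]]
y_demo = demo["animal_type"]

# stratify=y_demo keeps the class ratio nearly identical in both subsets.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(ys_train.value_counts(normalize=True).round(2))
print(ys_test.value_counts(normalize=True).round(2))
```

With a dataset this small, stratifying is a cheap way to make sure the test metrics are computed on a class mix that resembles the whole dataset.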


## Step 4: Train the k-NN Model

In [4]:
# Each prediction is a majority vote among the 5 nearest training points
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
print("✅ Model training complete.")

✅ Model training complete.
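
For k-NN, "training" mostly means storing the training data. The real work happens at prediction time, when each new point is assigned the majority label of its `n_neighbors` closest training points (Euclidean distance by default). The NumPy sketch below illustrates that vote. `knn_predict_one` and the toy points are hypothetical, chosen for illustration only; this is not how scikit-learn implements it internally.

```python
import numpy as np

def knn_predict_one(X_train, y_train, query, k=5):
    """Classify one query point by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - np.asarray(query, dtype=float), axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: [height_cm, weight_kg, has_tail] -> 0 = Bird, 1 = Mammal
toy_X = [[20, 1, 1], [25, 2, 1], [30, 3, 1], [60, 25, 1], [70, 30, 1], [80, 35, 0]]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict_one(toy_X, toy_y, [65, 28, 1], k=3))
```

Note that the distance adds up raw differences in every column, so a feature measured in tens of centimeters contributes far more than a 0/1 flag such as `has_tail`.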


## Step 5: Predict and Evaluate the Model

In [5]:
# Predict labels for the held-out rows and compare them with the true labels
y_pred = model.predict(X_test)
print(f"🔍 Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

🔍 Accuracy: 0.53

Classification Report:
              precision    recall  f1-score   support

           0       0.48      0.50      0.49        20
           1       0.58      0.56      0.57        25

    accuracy                           0.53        45
   macro avg       0.53      0.53      0.53        45
weighted avg       0.54      0.53      0.53        45
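
An accuracy of 0.53 is no better than guessing: always predicting Mammal would score 25/45 ≈ 0.56 on this test set. Two things commonly tried with k-NN are putting the features on a common scale (otherwise distances are dominated by `height_cm`, which has the largest spread) and comparing several values of `k` with cross-validation. The sketch below does both on synthetic stand-in data. That data is invented here, so its scores say nothing about the real dataset, and the results above cannot tell us whether either step would help on `knn_animals_data.csv`.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 150 rows, 3 features on very different scales.
rng = np.random.default_rng(42)
y_demo = np.repeat([0, 1], [63, 87])
X_demo = np.column_stack([
    rng.normal(52, 15, 150),              # height-like feature
    rng.normal(18, 4, 150) + 4 * y_demo,  # weight-like feature, shifted by class
    rng.integers(0, 2, 150),              # binary feature
])

cv_scores = {}
for k in (1, 3, 5, 7, 9, 11):
    # Scaling lives inside the pipeline, so it is re-fit on each training fold.
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(pipe, X_demo, y_demo, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
for k, score in cv_scores.items():
    print(f"k={k:>2}  mean CV accuracy={score:.2f}")
print("best k:", best_k)
```

Keeping the scaler inside the pipeline matters: fitting it on the full dataset before splitting would leak information from the validation folds into training.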



## Step 6: Try a Custom Animal Profile

In [6]:
print("🔍 Enter animal features to classify:")
height = float(input("Height (cm): "))
weight = float(input("Weight (kg): "))
tail = int(input("Has tail? (0 = No, 1 = Yes): "))

# Wrap the inputs in a one-row DataFrame with the same column names used for training
animal_df = pd.DataFrame([[height, weight, tail]], columns=X.columns)
pred = model.predict(animal_df)[0]
animal_name = 'Mammal' if pred == 1 else 'Bird'
print(f"Prediction: 🐾 This animal is likely a **{animal_name}**")

🔍 Enter animal features to classify:


Height (cm):  120
Weight (kg):  100
Has tail? (0 = No, 1 = Yes):  y


ValueError: invalid literal for int() with base 10: 'y'
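
The run above stops because `int()` cannot parse `y`, even though that is a natural answer to a yes/no prompt. One way to make the cell more forgiving is to normalize the answer before converting it. The helpers below are a sketch; `parse_yes_no`, `ask_yes_no`, and the set of accepted spellings are assumptions, not part of the original notebook.

```python
def parse_yes_no(text):
    """Map a yes/no style answer to 1 or 0; raise ValueError on anything else."""
    answer = text.strip().lower()
    if answer in {"1", "y", "yes"}:
        return 1
    if answer in {"0", "n", "no"}:
        return 0
    raise ValueError(f"Expected 0/1 or y/n, got {text!r}")

def ask_yes_no(prompt):
    """Keep prompting until the user gives an answer that parse_yes_no accepts."""
    while True:
        try:
            return parse_yes_no(input(prompt))
        except ValueError as err:
            print(f"{err}. Please try again.")
```

In the Step 6 cell, `tail = ask_yes_no("Has tail? (0 = No, 1 = Yes): ")` could then stand in for the `int(input(...))` line. The same try/except pattern would work for the two `float(input(...))` prompts.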