# Activity Classification - KNeighborsClassifier Training

This notebook trains a KNeighborsClassifier on the physical activity dataset using GridSearchCV for hyperparameter tuning.

## Load Data and Prepare Training Set


In [2]:
%reset -f

import importlib

import activity_functions
importlib.reload(activity_functions)
from activity_functions import *

In [3]:
activity = load_data()

Loaded from Kaggle: /Users/tamimdostyar/.cache/kagglehub/datasets/diegosilvadefrana/fisical-activity-dataset/versions/4/dataset2.csv


In [4]:
df_train, df_test = create_train_test(activity, test_ratio=0.2)
print(df_train.shape)
print(df_test.shape)

(2291244, 33)
(572812, 33)
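
`create_train_test` lives in `activity_functions` and is not shown in this notebook. Purely as an illustration, a row-wise random split that takes the same `test_ratio` argument could look like the sketch below. The name `create_train_test_sketch`, the `seed` parameter, and the toy DataFrame are assumptions, not the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def create_train_test_sketch(df, test_ratio=0.2, seed=42):
    """Hypothetical stand-in for create_train_test: random row-wise split."""
    df_train, df_test = train_test_split(
        df, test_size=test_ratio, random_state=seed
    )
    return df_train, df_test


# Toy data only; the real notebook splits the activity DataFrame.
toy = pd.DataFrame({"feature": np.arange(10), "label": [0, 1] * 5})
toy_train, toy_test = create_train_test_sketch(toy, test_ratio=0.2)
print(toy_train.shape, toy_test.shape)
```

The real helper may split differently (for example by subject or by time), so this should be read as one possible shape of the function rather than a description of it.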


In [5]:
X_train, y_train, X_test, y_test = prepare_for_train(df_train, df_test)

## Hyperparameter Tuning with GridSearchCV

In [6]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

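def grid_searchCV(X, y):
    """Run a 3-fold grid search over KNN hyperparameters and return the fitted search."""
    model = KNeighborsClassifier()
    param = {
        "n_neighbors": [3, 5, 7, 9, 11],
        "weights": ["uniform", "distance"],
        # Note: "minkowski" with the default p=2 is the same distance as "euclidean"
        "metric": ["euclidean", "manhattan", "minkowski"]
    }

    grid = GridSearchCV(
        model,
        param,
        verbose=1,
        refit=True,  # refit the best candidate on all of X, y
        cv=3,
        scoring='accuracy',
        return_train_score=True,  # needed for mean_train_score in cv_results_
        n_jobs=-1
    )

    grid.fit(X, y)
    return grid


best_model = grid_searchCV(X_train, y_train)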

Fitting 3 folds for each of 30 candidates, totalling 90 fits


Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 188, in _get_module_lock
KeyError: 'numpy.lib.ufunclike'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/DS420/lib/python3.10/runpy.py", line 187, in _run_module_as_main
Exception ignored in: <module 'collections.abc' from '/opt/anaconda3/envs/DS420/lib/python3.10/collections/abc.py'>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
KeyboardInterrupt: 
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/opt/anaconda3/envs/DS420/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/opt/anaconda3/envs/DS420/lib/python3.10/site-packages/joblib/__init__.py", line 115, in <module>
Traceback (most recent call last):
  File "/opt/anaconda3/envs/DS420/lib/python3.10/runpy.py", line 187, in _run_module_as_main
Traceback (most recent call last):

KeyboardInterrupt: 
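
One detail of the grid above is worth checking before committing to a long search: in scikit-learn, `metric="minkowski"` uses `p=2` by default, which is the Euclidean distance, so those two metric settings should give the same neighbors and predictions. The sketch below checks this on synthetic data; the toy dataset and variable names are assumptions for illustration, not the activity data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic problem standing in for the real features.
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)

euclid = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X_toy, y_toy)
minkow = KNeighborsClassifier(n_neighbors=5, metric="minkowski").fit(X_toy, y_toy)  # p=2 by default

# Compare neighbor distances and predictions from the two models.
dist_e, _ = euclid.kneighbors(X_toy[:20])
dist_m, _ = minkow.kneighbors(X_toy[:20])
same_distances = np.allclose(dist_e, dist_m)
same_predictions = np.array_equal(euclid.predict(X_toy), minkow.predict(X_toy))
print(same_distances, same_predictions)
```

If the two settings agree on the real data as well, dropping one of them would reduce the grid from 30 to 20 candidates without changing which model can be found.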

In [None]:
import pandas as pd  # explicit import; may already be in scope via activity_functions

cv_result = pd.DataFrame(best_model.cv_results_)
columns = ['params', 'rank_test_score', 'mean_train_score', 'mean_test_score']
cv_result = cv_result[columns]
cv_result.sort_values(by='rank_test_score')
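
Note that `cv_results_` contains `mean_train_score` only when the search was created with `return_train_score=True`; the default is `False`, in which case selecting that column raises a `KeyError`. The following sketch shows the difference on a small synthetic dataset (the toy data and the two-value grid are assumptions chosen to keep the example fast).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X_toy, y_toy = make_classification(n_samples=300, n_features=6, random_state=0)
toy_param = {"n_neighbors": [3, 5]}

# Default search: train scores are not recorded.
default_search = GridSearchCV(KNeighborsClassifier(), toy_param, cv=3, scoring="accuracy")
default_search.fit(X_toy, y_toy)
default_cols = pd.DataFrame(default_search.cv_results_).columns

# Search that also records train scores for each candidate.
train_search = GridSearchCV(
    KNeighborsClassifier(), toy_param, cv=3, scoring="accuracy", return_train_score=True
)
train_search.fit(X_toy, y_toy)
toy_results = pd.DataFrame(train_search.cv_results_)

print("mean_train_score" in default_cols)         # False
print("mean_train_score" in toy_results.columns)  # True
```

Recording train scores costs extra prediction time per fold, which can be noticeable for KNN on a large training set.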

## Best Hyperparameters Found

Display the best hyperparameters found by GridSearchCV:


In [None]:
print("Best Hyperparameters:")
print(best_model.best_params_)
print(f"\nBest Cross-Validation Accuracy: {best_model.best_score_:.4f}")


## Model Evaluation

Evaluate the best model on the test set:


In [None]:
from sklearn.metrics import classification_report

# Predict on test set
y_test_hat = best_model.predict(X_test)

# Calculate metrics
compute_scores(y_test, y_test_hat, verbose=True)
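
`compute_scores` is defined in `activity_functions` and its body is not shown here. As a hedged sketch of what such a helper might compute, the function below gathers accuracy, precision, recall, and F1 with scikit-learn. The name `compute_scores_sketch`, the macro averaging, and the returned dictionary are assumptions, not the project's implementation.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def compute_scores_sketch(y_true, y_pred, average="macro", verbose=False):
    """Hypothetical stand-in for compute_scores: common classification metrics."""
    scores = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
    }
    if verbose:
        for name, value in scores.items():
            print(f"{name}: {value:.4f}")
    return scores


# Toy labels only; 5 of the 6 predictions are correct.
demo_scores = compute_scores_sketch([0, 1, 2, 2, 1, 0], [0, 1, 2, 1, 1, 0], verbose=True)
```

Macro averaging weights every activity class equally; if the classes are imbalanced, `average="weighted"` may be the more informative choice.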


In [None]:
print("\nDetailed Classification Report:")
print(classification_report(y_test, y_test_hat))


## Summary

The KNeighborsClassifier was tuned using GridSearchCV with the following hyperparameter grid:
- **n_neighbors**: [3, 5, 7, 9, 11]
- **weights**: ["uniform", "distance"]
- **metric**: ["euclidean", "manhattan", "minkowski"]

This resulted in **30 candidate models** evaluated with **3-fold cross-validation** (90 total fits).
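
The candidate and fit counts can be checked without running the search. A minimal sketch, using scikit-learn's `ParameterGrid` to expand the same grid (the variable names are chosen for illustration):

```python
from sklearn.model_selection import ParameterGrid

param = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan", "minkowski"],
}

n_candidates = len(ParameterGrid(param))  # 5 * 2 * 3 combinations
n_folds = 3
total_fits = n_candidates * n_folds

print(n_candidates, total_fits)  # 30 90
```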

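`GridSearchCV` selects the best candidate by mean cross-validated accuracy and, with `refit=True`, refits it on the full training set before it is evaluated on the held-out test set. In the run recorded above, the search was interrupted (`KeyboardInterrupt`) before it finished, so the result and evaluation cells show no output.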
