# Assignment: kNN Classification on Wine Quality Dataset

**Group:** 2nd‑year IT‑tradenomi students
**Dataset:** WineQT (Kaggle)
**Goal:** Understand and experiment with kNN classification


## Part 1 — Familiarization and Basic kNN

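This part loads the WineQT data, uses every column except `quality` and `Id` as features, and trains a baseline kNN classifier (k=5) with scikit-learn on an 80/20 train/test split.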
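
To make the idea behind kNN concrete, here is a minimal pure-Python sketch of the prediction step: measure the distance from a query point to every training point, keep the k closest, and take a majority vote of their labels. The function name `knn_predict` and the toy data are made up for illustration and are not part of the assignment or of scikit-learn. `KNeighborsClassifier` with default settings also uses Euclidean distance and unweighted votes, but its internals and its tie handling may differ from this sketch.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with that point's label
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical, not taken from WineQT): two features, two quality labels
train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [5, 5, 5, 7, 7, 7]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # 5
print(knn_predict(train_X, train_y, (5.1, 5.0), k=3))  # 7
```

Because the prediction depends only on distances, the choice of k directly controls how many neighbours get a vote, which is what Part 2 varies.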


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("WineQT.csv")
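X = df.drop(['quality','Id'], axis=1)  # features: every column except the target and the row Id
y = df['quality']  # target: wine quality score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80/20 split, fixed seed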
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy_score(y_test, y_pred)

0.5152838427947598

## Part 2 — Experiments with k values, splits, and k‑fold


In [2]:
results = []
for k in [1,3,5,7,9,11,15]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results.append([k, accuracy_score(y_test, pred)])
pd.DataFrame(results, columns=['k','accuracy'])

|   | k  | accuracy |
|---|----|----------|
| 0 | 1  | 0.606987 |
| 1 | 3  | 0.502183 |
| 2 | 5  | 0.515284 |
| 3 | 7  | 0.528384 |
| 4 | 9  | 0.537118 |
| 5 | 11 | 0.524017 |
| 6 | 15 | 0.550218 |


In [3]:
splits = [0.2,0.3,0.4]
rows = []
for s in splits:
    X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, test_size=s, random_state=42)
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train2, y_train2)
    pred2 = knn.predict(X_test2)
    rows.append([s, accuracy_score(y_test2, pred2)])
pd.DataFrame(rows, columns=['test_size','accuracy'])

|   | test_size | accuracy |
|---|-----------|----------|
| 0 | 0.2       | 0.515284 |
| 1 | 0.3       | 0.495627 |
| 2 | 0.4       | 0.482533 |


In [4]:
from sklearn.model_selection import KFold
scores = []
kf = KFold(n_splits=5, shuffle=True, random_state=42)  # 5 shuffled folds, fixed seed
for train_idx, test_idx in kf.split(X):
    X_tr, X_te = X.iloc[train_idx], X.iloc[test_idx]  # select rows by position for this fold
    y_tr, y_te = y.iloc[train_idx], y.iloc[test_idx]
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X_tr, y_tr)
    preds = model.predict(X_te)
    scores.append(accuracy_score(y_te, preds))
sum(scores)/len(scores)  # mean accuracy over the 5 folds

0.47769861334559105
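
To show what the k-fold loop above is doing with the row indices, here is a minimal pure-Python sketch of a shuffled k-fold split. The helper `kfold_indices` is hypothetical and written only for illustration. It is not scikit-learn's implementation, and its seed will not reproduce the folds that `KFold(random_state=42)` produces. The point it illustrates is that every sample lands in exactly one test fold, so each row is used for testing once and for training in all the other folds.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) pairs; each sample appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once before cutting into folds
    # Spread the remainder over the first folds so fold sizes differ by at most one
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 12 samples in 5 folds: three folds cannot all be equal, so sizes are 3, 3, 2, 2, 2
folds = list(kfold_indices(n_samples=12, n_splits=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Averaging the per-fold accuracies, as the notebook cell does, gives an estimate that depends less on one particular split than a single train/test split does.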

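### ✅ Conclusion

On the single 80/20 split, test accuracy ranged from 0.502 (k=3) to 0.607 (k=1) across the tested k values. With k=5, accuracy fell from 0.515 to 0.483 as the test share grew from 0.2 to 0.4, and 5‑fold cross‑validation gave a mean accuracy of 0.478.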