## 🧶 Touch and User Classification Using Smart Fabric

In this task, we classify both the user and the type of touch interaction from smart-textile sensor data. Each of the 2,056 samples carries 3,200 sensor readings along with a labeled user identity (`user_id`) and interaction type (`touch_type`). Our goal is to evaluate model performance on each classification task and combine the two accuracies into a composite score.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import VarianceThreshold, SelectFromModel
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## 🔍 Data Exploration & Preprocessing

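We first remove low-variance features (threshold 0.01), which are unlikely to contribute meaningfully to classification. We then fit a `RandomForestClassifier` on the `user_id` labels and use it for embedded feature selection, keeping only the features whose importance is at or above the median. The same selected features are reused for both classification tasks.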
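
To make the two stages concrete, here is a minimal sketch on synthetic data. The arrays `X_demo` and `y_demo`, their sizes, and the choice of which columns carry signal are made up for illustration and are not the fabric dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, VarianceThreshold

rng = np.random.default_rng(13)

# Synthetic stand-in (assumption): 200 samples, 20 features.
X_demo = rng.normal(size=(200, 20))
X_demo[:, 0] = 1.0  # a constant column, so its variance is exactly 0
# Labels depend only on columns 1 and 2.
y_demo = (X_demo[:, 1] + X_demo[:, 2] > 0).astype(int)

# Stage 1: drop near-constant features.
vt = VarianceThreshold(threshold=0.01)
X_vt = vt.fit_transform(X_demo)

# Stage 2: keep features whose forest importance is at or above the median.
forest = RandomForestClassifier(n_estimators=100, random_state=13)
forest.fit(X_vt, y_demo)
sfm = SelectFromModel(forest, threshold="median", prefit=True)
X_sel = sfm.transform(X_vt)

print(X_demo.shape[1], "->", X_vt.shape[1], "->", X_sel.shape[1])
```

Because the cut-off is the median importance, the second stage keeps roughly half of whatever survives the first stage, regardless of how informative the features are.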


In [2]:
# === Load Dataset ===
print("Reading dataset...")
data = pd.read_excel("03-Touch and User Classification from Smart Fabric.xlsx")
print(f"Shape of data: {data.shape}")

Reading dataset...
Shape of data: (2056, 3206)


In [3]:
# === Features and Targets ===
# The 3,200 sensor columns are labeled with the integers 1..3200.
feature_indices = list(range(1, 3201))
X = data[feature_indices]
target_user = data['user_id']
target_touch = data['touch_type']

# === Variance Threshold ===
print("Applying variance threshold...")
selector = VarianceThreshold(threshold=0.01)
X_filtered = selector.fit_transform(X)
print(f"Features after variance filtering: {X_filtered.shape[1]}")

# === Feature Selection via RandomForest ===
print("Selecting features using RandomForestClassifier...")
forest_selector = RandomForestClassifier(random_state=13)
forest_selector.fit(X_filtered, target_user)

model_selector = SelectFromModel(forest_selector, threshold="median", prefit=True)
X_selected = model_selector.transform(X_filtered)
print(f"Selected features (median threshold): {X_selected.shape[1]}")

Applying variance threshold...
Features after variance filtering: 3200
Selecting features using RandomForestClassifier...
Selected features (median threshold): 1600


## 🤖 Model Training and Evaluation

We train two separate classifiers:
- One to predict the `user_id`
- Another to predict the `touch_type`

We use a **Gradient Boosting Classifier** for both tasks because it tends to perform well on tabular features. Each model is trained on a stratified 70/30 train/test split, and performance is measured as accuracy on the held-out 30%.
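
The feature selection earlier in this notebook is fit on all rows before any split, so the test rows influence which features are kept. One possible alternative, sketched here on synthetic data, wraps selection and classifier in a scikit-learn `Pipeline` so that every step is fit on the training split only. The helper name `make_pipeline_sketch` and the arrays `X_demo` and `y_demo` are hypothetical; this is an illustration under those assumptions, not the method used in this notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def make_pipeline_sketch(seed=13):
    """Hypothetical helper: selection and classifier bundled into one estimator."""
    return Pipeline([
        ("variance", VarianceThreshold(threshold=0.01)),
        ("select", SelectFromModel(
            RandomForestClassifier(n_estimators=50, random_state=seed),
            threshold="median",
        )),
        ("clf", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ])


# Synthetic stand-in for the fabric data (assumption): 300 samples, 30 features.
rng = np.random.default_rng(13)
X_demo = rng.normal(size=(300, 30))
y_demo = (X_demo[:, 0] - X_demo[:, 1] > 0).astype(int)

# Same split settings as the notebook: 30% test, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=13, stratify=y_demo
)

pipe = make_pipeline_sketch()
pipe.fit(X_train, y_train)  # selection is learned from the training rows only
demo_acc = accuracy_score(y_test, pipe.predict(X_test))
print(f"Demo accuracy: {demo_acc:.4f}")
```

With this layout, a separate pipeline per target would also let `touch_type` get its own selected features instead of reusing those chosen for `user_id`.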


In [None]:
# === Helper function for training and evaluation ===
def train_and_evaluate(X, y, label):
    print(f"\nTraining classifier for: {label}")
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=13, stratify=y
    )
    clf = GradientBoostingClassifier(n_estimators=50, random_state=13)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"{label} Accuracy: {acc:.4f}")
    return acc


# === Run both classifiers ===
acc_user = train_and_evaluate(X_selected, target_user, "User ID")
acc_touch = train_and_evaluate(X_selected, target_touch, "Touch Type")


Training classifier for: User ID


## 🧮 Final Score

The final project score is the product of the two model accuracies:

`Final Score = Accuracy(user_id) × Accuracy(touch_type)`
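
As a worked example of the formula, the sketch below computes the product for two hypothetical accuracies. The function name `composite_score` and the values 0.90 and 0.80 are illustrative assumptions, not results from this notebook.

```python
def composite_score(acc_user: float, acc_touch: float) -> float:
    """Return the product of two accuracies, each expected to lie in [0, 1]."""
    for name, value in (("acc_user", acc_user), ("acc_touch", acc_touch)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return acc_user * acc_touch


# Hypothetical accuracies, chosen only to illustrate the arithmetic.
example = composite_score(0.90, 0.80)
print(f"{example:.4f}")  # prints 0.7200
```

Since both factors are at most 1, the composite score can never exceed the lower of the two accuracies, so a weak model on either task pulls the final score down.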


In [None]:
final_score = acc_user * acc_touch
print(f"\n🧠 Final Composite Score (User * Touch): {final_score:.4f}")

## 📊 Accuracy Visualization

The following bar chart summarizes the classification accuracy of both models for better interpretability.


In [None]:
# === Plot Accuracy ===
labels = ['User ID', 'Touch Type']
accuracies = [acc_user, acc_touch]

plt.figure(figsize=(7, 5))
bars = plt.bar(labels, accuracies, color=['steelblue', 'tomato'])
plt.ylim(0, 1)
plt.ylabel("Accuracy")
# Plain-text title: Matplotlib's default font has no glyph for the chart emoji.
plt.title("Smart Fabric Classification Accuracy")
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 0.02, f"{yval:.2f}", ha='center')
plt.grid(True, axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
