# Binary Prediction of Smoker Status using Bio-Signals

This notebook trains an XGBoost classifier to predict smoking status from health metrics (bio-signals). The pipeline scales the features with MinMaxScaler, trains the model (reported accuracy: 80%), generates predictions for the test data, and saves them to a submission file.

Dataset: https://www.kaggle.com/competitions/playground-series-s3e24/data

Hugging Face: https://huggingface.co/spaces/alperugurcan/smoking-predictor
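
As a rough illustration of the MinMaxScaler preprocessing step mentioned above, the sketch below reproduces min-max scaling by hand with NumPy. The helper names `fit_min_max` and `apply_min_max` are hypothetical (they are not scikit-learn API), and the toy values are made up. The point is that the minimum and range are learned from the training rows only and then reused for test rows, so a test value can land outside [0, 1].

```python
import numpy as np

def fit_min_max(X):
    """Return the per-column minimum and range learned from X."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(X, col_min, col_range):
    """Scale X with statistics learned elsewhere (e.g. on the training rows)."""
    return (X - col_min) / col_range

# Toy values, made up for illustration (three rows, two features)
train_toy = np.array([[150.0, 50.0], [170.0, 70.0], [190.0, 90.0]])
test_toy = np.array([[160.0, 100.0]])

col_min, col_range = fit_min_max(train_toy)
train_scaled = apply_min_max(train_toy, col_min, col_range)
test_scaled = apply_min_max(test_toy, col_min, col_range)

print(train_scaled)  # every training column spans 0 to 1
print(test_scaled)   # second value exceeds 1 because 100 is above the training maximum
```

Tree-based models such as XGBoost are largely insensitive to this kind of monotone rescaling, so the step mainly matters for keeping training and inference inputs consistent.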

In [1]:
# Import essential libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load data
train = pd.read_csv('/kaggle/input/playground-series-s3e24/train.csv')
test = pd.read_csv('/kaggle/input/playground-series-s3e24/test.csv')

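# Prepare training data
X = train.drop(['id', 'smoking'], axis=1)
y = train['smoking']

# Split data before scaling so the validation split stays unseen by the scaler
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (fit on the training split only, then reuse for validation)
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# Train XGBoost (best performing model from original code)
xg = XGBClassifier(n_estimators=100, max_depth=5, min_child_weight=1, max_delta_step=0, random_state=42)
xg.fit(X_train, y_train)

# Evaluate on the held-out validation split
y_val_pred = xg.predict(X_val)
print('Validation accuracy:', accuracy_score(y_val, y_val_pred))
print(classification_report(y_val, y_val_pred))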

# Make predictions on test data
test_processed = scaler.transform(test.drop('id', axis=1))
test_pred = xg.predict(test_processed)  # predicted labels, not ground truth

# Create submission file
submission = pd.DataFrame({'id': test['id'], 'smoking': test_pred})
submission.to_csv('submission.csv', index=False)
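
The submission above contains hard 0/1 labels from `predict`. If the competition is scored by ROC AUC (an assumption worth checking on the competition page), class probabilities from `predict_proba` usually rank rows better than hard labels. The following is a minimal sketch on synthetic data, not the Kaggle data. It uses scikit-learn's `GradientBoostingClassifier` as a stand-in so that it runs without XGBoost; `XGBClassifier` exposes the same `predict` and `predict_proba` methods and could be swapped in.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bio-signal features (not the competition data)
X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42)

# Stand-in model; XGBClassifier could be used here instead
model = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=42)
model.fit(X_tr, y_tr)

labels = model.predict(X_va)             # hard 0/1 decisions
proba = model.predict_proba(X_va)[:, 1]  # probability of class 1

print("accuracy:", accuracy_score(y_va, labels))
print("AUC from labels:", roc_auc_score(y_va, labels))
print("AUC from probabilities:", roc_auc_score(y_va, proba))

# A probability-based submission frame (the ids here are placeholders)
submission_sketch = pd.DataFrame({"id": np.arange(len(proba)), "smoking": proba})
```

In the notebook itself, the equivalent change would be to write `xg.predict_proba(test_processed)[:, 1]` into the `smoking` column, assuming probabilities are what the scoring expects.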

In [3]:
# Save the model and scaler
import joblib

# Get feature names before scaling
feature_names = train.drop(['id', 'smoking'], axis=1).columns.tolist()

# Save the model
joblib.dump(xg, 'smoking_predictor_model.joblib')
# Save the scaler
joblib.dump(scaler, 'scaler.joblib')
# Save feature names
joblib.dump(feature_names, 'feature_names.joblib')

['feature_names.joblib']
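
The three saved artifacts are meant to be loaded together at inference time. The sketch below shows one way that round trip could look. To stay self-contained it first writes tiny stand-in artifacts to a temporary directory: the feature names `feat_a` and `feat_b`, the toy rows, and the `LogisticRegression` model are all placeholders for this illustration, not the notebook's real features or its XGBoost model.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Placeholder feature names and toy rows, made up for this sketch
feature_names = ["feat_a", "feat_b"]
toy = pd.DataFrame({"feat_a": [1.0, 2.0, 3.0, 4.0],
                    "feat_b": [10.0, 20.0, 30.0, 40.0]})
toy_target = [0, 0, 1, 1]

# Fit stand-in artifacts (LogisticRegression replaces the XGBoost model here)
scaler = MinMaxScaler().fit(toy[feature_names])
model = LogisticRegression().fit(scaler.transform(toy[feature_names]), toy_target)

# Save them the same way the notebook does, but into a temporary directory
workdir = tempfile.mkdtemp()
joblib.dump(model, os.path.join(workdir, "model.joblib"))
joblib.dump(scaler, os.path.join(workdir, "scaler.joblib"))
joblib.dump(feature_names, os.path.join(workdir, "feature_names.joblib"))

# Later, for example in a separate app: load everything back
loaded_model = joblib.load(os.path.join(workdir, "model.joblib"))
loaded_scaler = joblib.load(os.path.join(workdir, "scaler.joblib"))
loaded_names = joblib.load(os.path.join(workdir, "feature_names.joblib"))

# An incoming record whose columns arrive in a different order
record = pd.DataFrame([{"feat_b": 35.0, "feat_a": 3.5}])
record = record[loaded_names]  # restore the training column order

prediction = loaded_model.predict(loaded_scaler.transform(record))
print(prediction)
```

Reordering by the saved feature names matters because the scaler and model were fitted on columns in one specific order; a record with shuffled columns would otherwise be scaled with the wrong statistics.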