Boosting

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

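# Load the numeric feature table.
numeric_df = pd.read_csv("numeric_dataset.csv")

# Simulate a binary target. The labels are drawn at random, independently of
# the features, so accuracy near 0.5 (chance level) is the expected outcome.
np.random.seed(42)  # fix the seed so the random labels are reproducible
numeric_df["target"] = np.random.choice([0, 1], size=len(numeric_df))

# Separate features and target.
X = numeric_df.drop("target", axis=1)
y = numeric_df["target"]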

# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# adaboost
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
ada_pred = ada.predict(X_test)
print("AdaBoost Accuracy (Numeric):", accuracy_score(y_test, ada_pred))

# gradient boosting
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_train, y_train)
gb_pred = gb.predict(X_test)
print("GradientBoosting Accuracy (Numeric):", accuracy_score(y_test, gb_pred))


AdaBoost Accuracy (Numeric): 0.5154166666666666
GradientBoosting Accuracy (Numeric): 0.50125


The script applies two boosting ensemble techniques, AdaBoost and Gradient Boosting, to a numeric dataset. It starts by importing pandas and numpy for data manipulation, and scikit-learn modules for model building, training, and evaluation. The dataset is loaded from "numeric_dataset.csv" into a DataFrame called numeric_df. To simulate a target variable, a synthetic binary column "target" is created with numpy.random.choice, and a fixed random seed (np.random.seed(42)) makes the generated labels reproducible. Because the labels are drawn at random, they are unrelated to the feature columns.
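The label simulation step can also be written with a locally seeded generator instead of the global seed. The following is a minimal sketch, assuming NumPy's default_rng; the helper name make_random_labels and the sample size of 1000 are illustrative and do not appear in the original script. It shows that a fixed seed reproduces the same labels and that the two classes come out roughly balanced.

```python
import numpy as np


def make_random_labels(n_rows, seed=42):
    """Draw n_rows binary labels from a locally seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # local generator, global NumPy state is untouched
    return rng.choice([0, 1], size=n_rows)


labels_a = make_random_labels(1000)
labels_b = make_random_labels(1000)

# Same seed, same labels: the draw is reproducible.
print("identical draws:", np.array_equal(labels_a, labels_b))
# The classes are roughly balanced; the labels are pure noise with respect to any features.
print("share of class 1:", labels_a.mean())
```

Note that default_rng produces a different random stream than np.random.seed followed by np.random.choice, so this sketch will not reproduce the exact labels used in the script.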

Features (X) and target (y) are separated, and the data is split into training and test sets in an 80:20 ratio with train_test_split (test_size=0.2, random_state=42). An AdaBoostClassifier with 100 estimators (n_estimators=100) is trained on the training set, predictions are generated for the test set, and accuracy is computed with accuracy_score. A GradientBoostingClassifier with 100 estimators is trained and evaluated in the same way.
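The same split, fit, predict, and score steps can be condensed into a loop over models. This is a sketch under stated assumptions, not the original script: it swaps "numeric_dataset.csv" for synthetic data from scikit-learn's make_classification, so that the labels actually depend on the features, and it adds a depth-1 decision tree as a baseline. The names models and scores are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data whose labels depend on the features.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Single stump": DecisionTreeClassifier(max_depth=1, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=42),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} accuracy: {scores[name]:.3f}")
```

On data like this, where a signal exists, the boosted models would typically be expected to score well above chance; the exact numbers depend on the data and the scikit-learn version.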

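Both AdaBoost and Gradient Boosting are sequential ensemble methods in which each new model concentrates on the errors of the models before it. AdaBoost increases the weights of misclassified samples so that the next weak learner focuses on them, while Gradient Boosting fits each new model to the negative gradient of the loss, which for squared error is simply the residuals of the current ensemble. Boosting mainly reduces bias, and on data with a learnable signal it generally outperforms a single decision tree. That is not what the output above shows: the accuracies of about 0.515 (AdaBoost) and 0.501 (Gradient Boosting) are at chance level, which is expected because the target was generated at random and is independent of the features. The script therefore demonstrates the boosting workflow and the two scikit-learn APIs rather than the predictive power of either method; a meaningful comparison of adaptive and gradient-based boosting requires a dataset with real labels.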
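The two update rules can be illustrated on toy data. The following is a minimal sketch, not scikit-learn's internal implementation: adaboost_reweight performs one round of the textbook discrete AdaBoost weight update with labels in {-1, +1}, and boost_residuals runs squared-loss gradient boosting with depth-1 regression trees. The function names, the toy arrays, and the sine target are all assumptions made for illustration. GradientBoostingClassifier uses a classification loss rather than squared error, so the residual view applies only by analogy.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting; labels are in {-1, +1}."""
    miss = y_true != y_pred
    err = weights[miss].sum()                 # weighted error of the weak learner
    alpha = 0.5 * np.log((1.0 - err) / err)   # learner's vote; assumes 0 < err < 0.5
    # y_true * y_pred is -1 for a miss and +1 for a hit, so misses are up-weighted.
    new_w = weights * np.exp(-alpha * y_true * y_pred)
    return new_w / new_w.sum(), alpha


y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])         # the weak learner misses the second sample
w0 = np.full(5, 0.2)                          # start from uniform weights
w1, alpha = adaboost_reweight(w0, y_true, y_pred)
print("updated weights:", w1)


def boost_residuals(X, y, n_rounds=20, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fit to the current residuals."""
    pred = np.full(len(y), y.mean())          # start from a constant prediction
    mse_per_round = []
    for _ in range(n_rounds):
        residuals = y - pred                  # negative gradient of 0.5 * (y - pred)**2
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
        pred = pred + learning_rate * stump.predict(X)
        mse_per_round.append(np.mean((y - pred) ** 2))
    return mse_per_round


rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0])
mse_per_round = boost_residuals(X_demo, y_demo)
print("training MSE, first and last round:", mse_per_round[0], mse_per_round[-1])
```

In the AdaBoost part, the single misclassified sample ends up carrying half of the total weight after normalization, which is what forces the next weak learner to attend to it. In the gradient boosting part, the training error shrinks round by round because each stump removes part of what the current ensemble still gets wrong.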