# Introduction
Boosting algorithms combine many weak learners into a stronger ensemble by training them in sequence, with each new learner concentrating on the errors of the ones before it. In this tutorial, we use AdaBoost, XGBoost, and Gradient Boosting to predict whether a client will subscribe to a term deposit, using the Bank Marketing dataset. This dataset records the direct marketing campaigns of a Portuguese banking institution.

## AdaBoost Tutorial


### Step 1: Import Required Libraries
First, import the necessary libraries for data manipulation, model training, and evaluation.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')

### Step 2: Load and Preprocess the Dataset
Load the Bank Marketing dataset, one-hot encode its categorical variables, and separate the features from the target variable. The code below does not impute or drop missing values.

In [3]:
# Load the dataset
#url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"
#!wget $url # Download the zip file
#!unzip bank-additional.zip # Unzip the file
# Path to the extracted CSV on the author's machine; adjust it to your own download location
file = "C:\\Users\\abo_O\\Downloads\\bank-additional\\bank-additional\\bank-additional-full.csv"

data = pd.read_csv(file, delimiter=';') # Load the data

# One-hot encode categorical variables; drop_first=True turns the yes/no target 'y' into a single 'y_yes' column
data = pd.get_dummies(data, drop_first=True)

# Split the data into features and target variable
X = data.drop('y_yes', axis=1)
y = data['y_yes']
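
To see where the `y_yes` column comes from, here is a minimal sketch of the same encoding on a tiny, made-up frame. The three rows and their values are hypothetical; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Hypothetical three-row sample; the column names mimic the dataset but the values are made up
toy = pd.DataFrame({
    "age": [30, 45, 52],
    "marital": ["married", "single", "married"],
    "y": ["no", "yes", "no"],
})

# drop_first=True drops the alphabetically first category of each encoded column,
# so 'marital_married' and 'y_no' are removed
encoded = pd.get_dummies(toy, drop_first=True)
encoded_columns = list(encoded.columns)
print(encoded_columns)

# Features and target are then separated the same way as in the cell above
X_toy = encoded.drop("y_yes", axis=1)
y_toy = encoded["y_yes"]
```

Because `no` sorts before `yes`, the dropped column is `y_no`, and the remaining `y_yes` column is true exactly for the clients who subscribed.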

### Step 3: Split the Dataset
Split the dataset into training and testing sets to evaluate the performance of the models.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
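
The split above is purely random. When one class is much rarer than the other, a stratified split is a common alternative that keeps the class ratio identical in both partitions. The sketch below uses synthetic data with an assumed 10% positive rate, not the real dataset, and the `*_demo` names are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1,000 rows with 10% positives (an assumed ratio)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 100 + [0] * 900)

# stratify=y_demo keeps the class ratio the same in the training and test partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.15, random_state=42, stratify=y_demo
)

print(f"positive rate, train: {y_tr.mean():.3f}")
print(f"positive rate, test:  {y_te.mean():.3f}")
```

Whether stratification changes the reported accuracy on the Bank Marketing data has not been checked here; it mainly makes the test set more representative when the target is imbalanced.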


### Step 4: Initialize and Train the AdaBoost Classifier
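Initialize a depth-1 decision tree (a decision stump) and use it as the base estimator for the AdaBoost classifier, which trains 50 such stumps in sequence. Then evaluate the trained model on the test set.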

In [7]:
# Initialize base classifier and AdaBoost Meta-estimator
base_estimator = DecisionTreeClassifier(max_depth=1)
adaboost_classifier = AdaBoostClassifier(base_estimator, n_estimators=50, random_state=42)

# Train the classifier on the training data
adaboost_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = adaboost_classifier.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'AdaBoost Classifier Model Accuracy: {accuracy * 100:.2f}%')


AdaBoost Classifier Model Accuracy: 90.84%
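
To make the role of the decision stump more concrete, the following sketch walks through a single round of the classic discrete AdaBoost update by hand. It is an illustration under assumptions: the data is synthetic, the labels are mapped to -1 and +1, and scikit-learn's `AdaBoostClassifier` may use a different variant internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem standing in for the real data; labels mapped to {-1, +1}
X_demo, y01 = make_classification(n_samples=500, n_features=5, random_state=42)
y_pm = np.where(y01 == 1, 1, -1)

# Round 1: every sample starts with the same weight
weights = np.full(len(y_pm), 1.0 / len(y_pm))
stump = DecisionTreeClassifier(max_depth=1, random_state=42)
stump.fit(X_demo, y_pm, sample_weight=weights)
pred = stump.predict(X_demo)

# Weighted error of the stump and its vote weight (alpha)
miss = pred != y_pm
error = weights[miss].sum()
alpha = 0.5 * np.log((1.0 - error) / error)

# Misclassified samples are up-weighted, correct ones down-weighted, then renormalised
new_weights = weights * np.exp(-alpha * y_pm * pred)
new_weights /= new_weights.sum()

print(f"weighted error: {error:.3f}, alpha: {alpha:.3f}")
print(f"weight share of misclassified samples after update: {new_weights[miss].sum():.3f}")
```

After the update, the samples this stump got wrong carry half of the total weight, so the next stump is pushed toward the cases the current one misses.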


## XGBoost Tutorial


### Step 1: Import Required Libraries
First, install XGBoost if it is not already available, then import the libraries needed for data manipulation, model training, and evaluation.

In [9]:
!pip install xgboost

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

### Step 2: Load and Preprocess the Dataset
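Reuse the `data` DataFrame loaded in the AdaBoost section; the download and load commands are left commented out for reference. Because the categorical variables were already encoded there, the `get_dummies` call below leaves the data unchanged. The features and target variable are then separated as before.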

In [11]:
# # Load the dataset
# url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"
# !wget $url # Download the zip file
# !unzip bank-additional.zip # Unzip the file
# data = pd.read_csv('bank-additional/bank-additional-full.csv', delimiter=';') # Load the data

# Encode categorical variables (a no-op here, since `data` was already encoded in the AdaBoost section)
data = pd.get_dummies(data, drop_first=True)

# Split the data into features and target variable
X = data.drop('y_yes', axis=1)
y = data['y_yes']

### Step 3: Split the Dataset
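Split the dataset into training and testing sets to evaluate the performance of the model. This section holds out 30% of the data, whereas the AdaBoost section held out 15%, so the two accuracy figures are not measured on the same test set.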

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Step 4: Initialize and Train the XGBoost Classifier
Initialize an XGBoost classifier with 50 boosting rounds, train it on the training data, and evaluate its accuracy on the test set.

In [13]:
# Initialize and train the XGBoost classifier
xgb_classifier = XGBClassifier(n_estimators=50, random_state=42)
xgb_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = xgb_classifier.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'XGBoost Classifier Model Accuracy: {accuracy * 100:.2f}%')

XGBoost Classifier Model Accuracy: 91.58%
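
Accuracy figures like the ones reported in this tutorial are easier to interpret next to a majority-class baseline, because a model can score high accuracy on an imbalanced target by always predicting the frequent class. The sketch below shows one way to compute such a baseline. The labels are synthetic with an assumed 10% positive rate; on the real data, `y_train` and `y_test` would take their place, and the actual class balance should be checked with `y.mean()`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Synthetic labels with an assumed 10% positive rate; replace with y_train / y_test
y_train_demo = np.array([1] * 100 + [0] * 900)
y_test_demo = np.array([1] * 30 + [0] * 270)
X_train_demo = np.zeros((len(y_train_demo), 1))  # features are ignored by the dummy model
X_test_demo = np.zeros((len(y_test_demo), 1))

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_demo, y_train_demo)
baseline_pred = baseline.predict(X_test_demo)

baseline_accuracy = accuracy_score(y_test_demo, baseline_pred)
baseline_balanced = balanced_accuracy_score(y_test_demo, baseline_pred)
print(f"Baseline accuracy: {baseline_accuracy * 100:.2f}%")
print(f"Baseline balanced accuracy: {baseline_balanced * 100:.2f}%")
```

In this synthetic setup the baseline reaches 90% accuracy but only 50% balanced accuracy, which is why reporting a second metric alongside accuracy can be informative.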


## Gradient Boosting Tutorial


### Step 1: Import Required Libraries
First, import the necessary libraries for data manipulation, model training, and evaluation.

In [14]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

### Step 2: Load and Preprocess the Dataset
As in the XGBoost section, reuse the `data` DataFrame loaded earlier; the download and load commands remain commented out for reference. The data is already encoded, so the `get_dummies` call below has no effect, and the features and target variable are separated as before.

In [15]:
# # Load the dataset
# url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"
# !wget $url # Download the zip file
# !unzip bank-additional.zip # Unzip the file
# data = pd.read_csv('bank-additional/bank-additional-full.csv', delimiter=';') # Load the data

# Encode categorical variables (a no-op here, since `data` was already encoded in the AdaBoost section)
data = pd.get_dummies(data, drop_first=True)

# Split the data into features and target variable
X = data.drop('y_yes', axis=1)
y = data['y_yes']

### Step 3: Split the Dataset
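Split the dataset into training and testing sets to evaluate the performance of the model. This section holds out 25% of the data, which differs from the 15% and 30% used in the AdaBoost and XGBoost sections, so the three accuracy figures are not measured on the same test set.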

In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

### Step 4: Initialize and Train the Gradient Boosting Classifier
Initialize a Gradient Boosting classifier with 50 boosting stages, train it on the training data, and evaluate its accuracy on the test set.

In [17]:
# Initialize and train the Gradient Boosting classifier
gradient_boosting_classifier = GradientBoostingClassifier(n_estimators=50, random_state=42)
gradient_boosting_classifier.fit(X_train, y_train)

# Make predictions on the test data
predictions = gradient_boosting_classifier.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'Gradient Boosting Classifier Model Accuracy: {accuracy * 100:.2f}%')

Gradient Boosting Classifier Model Accuracy: 91.62%
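
The idea behind gradient boosting can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add a scaled version of its output. The example below is a simplified regression analogue with squared-error loss on synthetic data. It is not what `GradientBoostingClassifier` does internally for classification, and the learning rate and tree depth are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression problem; squared-error loss keeps the residuals simple
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1   # illustrative value
n_rounds = 50         # mirrors n_estimators=50 above

# Start from a constant prediction, then repeatedly fit a small tree to the residuals
prediction = np.full_like(y_demo, y_demo.mean())
mse_history = [np.mean((y_demo - prediction) ** 2)]
for _ in range(n_rounds):
    residuals = y_demo - prediction          # negative gradient of 0.5 * squared error
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_demo, residuals)
    prediction += learning_rate * tree.predict(X_demo)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print(f"training MSE before boosting: {mse_history[0]:.4f}")
print(f"training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}")
```

Each round can only lower or maintain the training error in this setup, which is why the number of rounds and the learning rate are usually tuned against held-out data rather than the training set.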
