### **AdaBoost Regressor**

**AdaBoost (Adaptive Boosting)** is an ensemble learning technique that combines the outputs of multiple weak learners (typically shallow decision trees) to improve overall model performance. The core idea is to build a strong model from several weak learners that individually perform only slightly better than random guessing. The learners are trained sequentially, and each one gives more weight to the samples its predecessors handled poorly.

AdaBoost can also be adapted for regression tasks, where the model predicts continuous values rather than discrete classes. The **AdaBoost Regressor** works similarly to AdaBoost for classification, but it reweights samples according to the size of their prediction error rather than whether they were misclassified.
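
To make the reweighting idea concrete, the sketch below performs a single round in the style of AdaBoost.R2 (the algorithm that scikit-learn's `AdaBoostRegressor` is based on) with a linear loss. The function name `adaboost_r2_round` and the toy arrays are illustrative assumptions and are not part of this notebook or of scikit-learn's API.

```python
import numpy as np

def adaboost_r2_round(y_true, y_pred, sample_weight, learning_rate=1.0):
    """One AdaBoost.R2-style reweighting step with linear loss (illustrative only).

    Assumes the weighted average loss lies strictly between 0 and 0.5;
    a full implementation would stop boosting outside that range.
    """
    abs_err = np.abs(y_true - y_pred)
    # Normalise each error by the largest error so losses lie in [0, 1]
    loss = abs_err / abs_err.max()
    # Weighted average loss of this weak learner
    avg_loss = np.sum(sample_weight * loss)
    # beta < 1 when the learner does better than an average loss of 0.5
    beta = avg_loss / (1.0 - avg_loss)
    # Well-predicted samples (small loss) are shrunk the most,
    # so poorly predicted samples gain relative weight
    new_weight = sample_weight * beta ** ((1.0 - loss) * learning_rate)
    return new_weight / new_weight.sum(), beta

# Hypothetical toy data: the last sample is predicted badly
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.0, 3.0, 5.0])
weights = np.full(4, 0.25)

new_weights, beta = adaboost_r2_round(y_true, y_pred, weights)
print(new_weights, beta)
```

After the update, the badly predicted sample carries the largest weight, so the next weak learner concentrates on it.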

**Step 1: Import Libraries**


In [67]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error, r2_score
import xlrd  # not used directly; pandas needs it installed to read .xls files


**Step 2: Load the Dataset**

In [68]:
# Load the dataset
df = pd.read_excel('E:\\Machine Learning\\global_superstore\\Global Superstore.xls')

# Display the first few rows
df.head()


Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,City,State,...,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,Shipping Cost,Order Priority
0,32298,CA-2012-124891,2012-07-31,2012-07-31,Same Day,RH-19495,Rick Hansen,Consumer,New York City,New York,...,TEC-AC-10003033,Technology,Accessories,Plantronics CS510 - Over-the-Head monaural Wir...,2309.65,7,0.0,762.1845,933.57,Critical
1,26341,IN-2013-77878,2013-02-05,2013-02-07,Second Class,JR-16210,Justin Ritter,Corporate,Wollongong,New South Wales,...,FUR-CH-10003950,Furniture,Chairs,"Novimex Executive Leather Armchair, Black",3709.395,9,0.1,-288.765,923.63,Critical
2,25330,IN-2013-71249,2013-10-17,2013-10-18,First Class,CR-12730,Craig Reiter,Consumer,Brisbane,Queensland,...,TEC-PH-10004664,Technology,Phones,"Nokia Smart Phone, with Caller ID",5175.171,9,0.1,919.971,915.49,Medium
3,13524,ES-2013-1579342,2013-01-28,2013-01-30,First Class,KM-16375,Katherine Murray,Home Office,Berlin,Berlin,...,TEC-PH-10004583,Technology,Phones,"Motorola Smart Phone, Cordless",2892.51,5,0.1,-96.54,910.16,Medium
4,47221,SG-2013-4320,2013-11-05,2013-11-06,Same Day,RH-9495,Rick Hansen,Consumer,Dakar,Dakar,...,TEC-SHA-10000501,Technology,Copiers,"Sharp Wireless Fax, High-Speed",2832.96,8,0.0,311.52,903.04,Critical


**Step 3: Data Preprocessing**

In [69]:
# Drop rows where the target (Profit) is missing
df = df.dropna(subset=['Profit'])

# Categorical columns to convert to integer codes
categorical_columns = ['Ship Mode', 'Customer Name', 'Segment', 'City', 'State', 'Country',
                       'Market', 'Region', 'Category', 'Sub-Category', 'Product Name',
                       'Order Priority', 'Product ID']

# Fit one LabelEncoder per column and keep it so the codes can be decoded later
label_encoders = {}
for column in categorical_columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le

df.head()


Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,City,State,...,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,Shipping Cost,Order Priority
0,32298,CA-2012-124891,2012-07-31,2012-07-31,1,RH-19495,632,0,2290,703,...,8246,2,0,2750,2309.65,7,0.0,762.1845,933.57,0
1,26341,IN-2013-77878,2013-02-05,2013-02-07,2,JR-16210,413,1,3518,702,...,907,0,5,2525,3709.395,9,0.1,-288.765,923.63,0
2,25330,IN-2013-71249,2013-10-17,2013-10-18,0,CR-12730,181,0,497,820,...,10157,2,13,2502,5175.171,9,0.1,919.971,915.49,3
3,13524,ES-2013-1579342,2013-01-28,2013-01-30,0,KM-16375,424,2,375,145,...,10146,2,13,2414,2892.51,5,0.1,-96.54,910.16,3
4,47221,SG-2013-4320,2013-11-05,2013-11-06,1,RH-9495,632,0,857,270,...,10249,2,6,3158,2832.96,8,0.0,311.52,903.04,0
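
The encoders stored in `label_encoders` can map the integer codes back to the original labels. The sketch below shows this on a small hypothetical frame (`toy` is an assumption made for illustration; it is not the Global Superstore data). `LabelEncoder` assigns codes in sorted label order, which is why `Same Day` becomes 1 and `First Class` becomes 0 in the output above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical mini-dataset with one categorical column
toy = pd.DataFrame(
    {"Ship Mode": ["Same Day", "Second Class", "First Class", "First Class"]}
)

le = LabelEncoder()
toy["Ship Mode code"] = le.fit_transform(toy["Ship Mode"])

# classes_ lists the labels in the order of their integer codes
print(list(le.classes_))

# inverse_transform recovers the original strings from the codes
decoded = le.inverse_transform(toy["Ship Mode code"])
print(toy.assign(decoded=decoded))
```

The codes imply an ordering that has no real meaning. Tree-based models can still split on them, but the order should not be read as a ranking.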


In [70]:
df_num = df.select_dtypes(include=[np.number])
df_num.columns

Index(['Row ID', 'Ship Mode', 'Customer Name', 'Segment', 'City', 'State',
       'Country', 'Postal Code', 'Market', 'Region', 'Product ID', 'Category',
       'Sub-Category', 'Product Name', 'Sales', 'Quantity', 'Discount',
       'Profit', 'Shipping Cost', 'Order Priority'],
      dtype='object')

In [71]:
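# Pairwise correlation matrix of the numeric columns
df_num_corr = df_num.corr()
# Keep columns whose correlation with Profit is above 0.3 or below -0.3
# (Profit itself is included because its self-correlation is 1)
df_num_columns = []
df_num_columns.extend(df_num_corr[(df_num_corr['Profit'] > 0.3)].index.values)
df_num_columns.extend(df_num_corr[(df_num_corr['Profit'] < -0.3)].index.values)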

In [72]:
df_num_columns

['Sales', 'Profit', 'Shipping Cost', 'Discount']
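
The same selection can be written more compactly with an absolute-value threshold, dropping the target so that it does not select itself. This is a sketch on synthetic data; the frame `toy`, its column values, and the 0.3 cut-off applied to it are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data: Profit depends strongly on Sales and not at all on Quantity
sales = rng.normal(size=n)
noise = rng.normal(size=n)
toy = pd.DataFrame({
    "Sales": sales,
    "Quantity": rng.normal(size=n),
    "Profit": 0.8 * sales + 0.2 * noise,
})

# Correlation of every feature with the target, excluding the target itself
corr_with_target = toy.corr()["Profit"].drop("Profit")

# Keep features whose absolute correlation exceeds the threshold
selected = corr_with_target[corr_with_target.abs() > 0.3].index.tolist()
print(selected)
```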

In [73]:
from sklearn.preprocessing import StandardScaler

# Standardise the columns most correlated with Profit
# (note that this also rescales the target column, Profit)
columns_to_scale = ['Sales', 'Profit', 'Shipping Cost', 'Discount']

standard_scaler = StandardScaler()
df = df.copy()
df[columns_to_scale] = standard_scaler.fit_transform(df[columns_to_scale])


In [74]:
df.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,City,State,...,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,Shipping Cost,Order Priority
0,32298,CA-2012-124891,2012-07-31,2012-07-31,1,RH-19495,632,0,2290,703,...,8246,2,0,2750,4.231596,7,-0.67321,4.207735,15.833395,0
1,26341,IN-2013-77878,2013-02-05,2013-02-07,2,JR-16210,413,1,3518,702,...,907,0,5,2525,7.102511,9,-0.202129,-1.82045,15.659911,0
2,25330,IN-2013-71249,2013-10-17,2013-10-18,0,CR-12730,181,0,497,820,...,10157,2,13,2502,10.108857,9,-0.202129,5.11279,15.517842,3
3,13524,ES-2013-1579342,2013-01-28,2013-01-30,0,KM-16375,424,2,375,145,...,10146,2,13,2414,5.427057,5,-0.202129,-0.717859,15.424817,3
4,47221,SG-2013-4320,2013-11-05,2013-11-06,1,RH-9495,632,0,857,270,...,10249,2,6,3158,5.304919,8,-0.67321,1.62275,15.300551,0
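
The scaling cell above fits `StandardScaler` on the full dataset before it is split, so the test rows influence the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that ordering on synthetic data; the frame `toy` and its values are assumptions and not the notebook's data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-ins for two of the scaled columns
toy = pd.DataFrame({
    "Sales": rng.gamma(2.0, 100.0, 200),
    "Shipping Cost": rng.gamma(2.0, 10.0, 200),
})

# Split first, so the scaler never sees the test rows
train, test = train_test_split(toy, test_size=0.2, random_state=42)

scaler = StandardScaler()
train_scaled = pd.DataFrame(
    scaler.fit_transform(train), columns=train.columns, index=train.index
)
# Reuse the training mean and standard deviation on the test rows
test_scaled = pd.DataFrame(
    scaler.transform(test), columns=test.columns, index=test.index
)

print(train_scaled.mean().round(6))
print(test_scaled.mean().round(3))
```

Decision trees split on thresholds, so they are insensitive to this kind of monotonic feature rescaling. The ordering matters more for models that depend on feature magnitudes.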


**Step 4: Split the Data into Training and Testing Sets**


In [75]:
# Select features (X) and target (y)
X = df[['Row ID', 'Order ID', 'Order Date', 'Ship Date', 'Ship Mode', 'Customer ID', 
        'Customer Name', 'Segment', 'City', 'State', 'Country', 'Market', 'Region', 
        'Product ID', 'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 
        'Shipping Cost', 'Order Priority','Sales']]

# Target variable (for regression)
y = df['Profit']

In [76]:
# Drop the row/order identifiers and the raw date columns
X = X.drop(columns=['Row ID', 'Order ID', 'Order Date', 'Ship Date'])

In [77]:
# Drop the remaining identifier columns
X = X.drop(columns=['Product ID', 'Customer ID'])

In [78]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


**Step 5: Train the Decision Tree Regressor**

In [79]:
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)


**Step 6: Make Predictions and Evaluate the Model**

In [80]:
# Make predictions on the test set
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)

r2

0.42710148473356513
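
The cell above reports only R², although `mean_squared_error` was imported in Step 1. The sketch below shows how the two metrics could be reported together for a decision tree. It runs on synthetic data from `make_regression`, and all variable names are assumptions chosen to avoid clashing with the notebook's own variables.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the real dataset
X_demo, y_demo = make_regression(
    n_samples=400, n_features=6, noise=15.0, random_state=0
)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

tree = DecisionTreeRegressor(random_state=42).fit(Xtr, ytr)
pred = tree.predict(Xte)

mse = mean_squared_error(yte, pred)   # average squared error
rmse = np.sqrt(mse)                   # same units as the target
r2_demo = r2_score(yte, pred)         # 1 - MSE / variance of the test targets

print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2_demo:.3f}")
```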

**Step 7: Check the scikit-learn Version**

In [81]:
import sklearn
print(sklearn.__version__)

1.5.1


**Step 8: Configure the AdaBoost Regressor**

In [82]:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

base_estimator = DecisionTreeRegressor(max_depth=4)  # Weak learner: a shallow tree
adaboost_regressor = AdaBoostRegressor(
    estimator=base_estimator,  # 'estimator' replaces the older 'base_estimator' argument (renamed in scikit-learn 1.2)
    n_estimators=50,  # Maximum number of boosting stages
    learning_rate=0.1,  # Shrinks the contribution of each weak learner
    random_state=42
)


**Step 9: Train the AdaBoost Regressor and Make Predictions**

In [83]:
adaboost_regressor.fit(X_train, y_train)

y_pred = adaboost_regressor.predict(X_test)

**Step 10: Evaluate the AdaBoost Model**

In [84]:
r2 = r2_score(y_test, y_pred)
r2

0.6087992168256989