# Gradient Boosting (Regression)

Data Source: [NASA - Airfoil Self-Noise](https://archive.ics.uci.edu/ml/datasets/airfoil+self-noise)

**Attribute Information**

This problem has the following inputs:
1. Frequency, in hertz.
2. Angle of attack, in degrees.
3. Chord length, in meters.
4. Free-stream velocity, in meters per second.
5. Suction side displacement thickness, in meters.

The only output is:
6. Scaled sound pressure level, in decibels.

In [1]:
# Importing the necessary packages
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor, ExtraTreesRegressor
from sklearn.ensemble import RandomForestRegressor, BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from math import sqrt
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings("ignore")

In [2]:
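# Load and read the dataset (tab-separated; the file has no header row,
# so with the default settings pandas uses the first record as column names)
airfoil = pd.read_csv("./airfoil/airfoil_self_noise.dat", sep = "\t")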
airfoil.head()

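|   | 800 | 0 | 0.3048 | 71.3 | 0.00266337 | 126.201 |
|---|------|-----|--------|------|----------|---------|
| 0 | 1000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.201 |
| 1 | 1250 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.951 |
| 2 | 1600 | 0.0 | 0.3048 | 71.3 | 0.002663 | 127.591 |
| 3 | 2000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 127.461 |
| 4 | 2500 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.571 |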


In [3]:
# Rename the columns as per data source
airfoil.columns = ["frequency", "angle", "length", "velocity", "displacement", "pressure"]
airfoil.head()

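|   | frequency | angle | length | velocity | displacement | pressure |
|---|------|-----|--------|------|----------|---------|
| 0 | 1000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.201 |
| 1 | 1250 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.951 |
| 2 | 1600 | 0.0 | 0.3048 | 71.3 | 0.002663 | 127.591 |
| 3 | 2000 | 0.0 | 0.3048 | 71.3 | 0.002663 | 127.461 |
| 4 | 2500 | 0.0 | 0.3048 | 71.3 | 0.002663 | 125.571 |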


In [4]:
# Display the characteristics of the dataset
print("Dimensions of dataset are: ", airfoil.shape)
print("The variables present in dataset are: ", airfoil.columns)

Dimensions of dataset are:  (1502, 6)
The variables present in dataset are:  Index(['frequency', 'angle', 'length', 'velocity', 'displacement', 'pressure'], dtype='object')
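The shape above reports 1502 rows because the first record of the file was consumed as the header in `In [2]`. A minimal sketch of one way to keep that record, assuming the file really has no header line: pass `header=None` and supply the column names directly. The in-memory `sample` string and the name `airfoil_sample` are hypothetical stand-ins for the real file, with rows modeled on the output shown earlier.

```python
import io

import pandas as pd

# Column names follow the attribute list at the top of the notebook.
columns = ["frequency", "angle", "length", "velocity", "displacement", "pressure"]

# Three tab-separated rows standing in for airfoil_self_noise.dat
# (hypothetical sample; the real file is much longer).
sample = (
    "800\t0\t0.3048\t71.3\t0.00266337\t126.201\n"
    "1000\t0\t0.3048\t71.3\t0.002663\t125.201\n"
    "1250\t0\t0.3048\t71.3\t0.002663\t125.951\n"
)

# header=None tells pandas that the first line is data, not column names;
# names=... assigns the column labels in the same call.
airfoil_sample = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=columns)

print(airfoil_sample.shape)
print(airfoil_sample.head())
```

Loading the real file this way would change the row count and therefore every score reported in the rest of this notebook, so the recorded outputs would no longer match exactly.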


In [5]:
# Set a random seed so that the train-test split below is reproducible
np.random.seed(3000)

In [6]:
# Train-Test Split
training, test = train_test_split(airfoil, test_size = 0.3)

x_trg = training.drop("pressure", axis = 1)
y_trg = training["pressure"]

x_test = test.drop("pressure", axis = 1)
y_test = test["pressure"]
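As an alternative to seeding NumPy globally, `train_test_split` accepts a `random_state` argument and can split features and target in a single call. The sketch below uses a small synthetic DataFrame (`demo`, with made-up values) rather than the airfoil data, so the names and numbers are illustrative assumptions only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the airfoil DataFrame (hypothetical values).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "pressure"])

X = demo.drop("pressure", axis=1)
y = demo["pressure"]

# random_state pins the shuffle for this call only, independent of global state.
X_trg, X_test, y_trg, y_test = train_test_split(
    X, y, test_size=0.3, random_state=3000
)

# Repeating the call with the same random_state yields the same rows.
X_trg_2, X_test_2, y_trg_2, y_test_2 = train_test_split(
    X, y, test_size=0.3, random_state=3000
)

print(X_trg.shape, X_test.shape)
print(X_test.index.equals(X_test_2.index))
```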

### Creating Gradient Boosting model

In [7]:
# Model building - Gradient Boosting
airfoil_grad = GradientBoostingRegressor()

# Fit the model
airfoil_grad.fit(x_trg, y_trg)
# Note: for regressors, score() returns the coefficient of determination (R^2),
# not a classification accuracy
print("Accuracy of Gradient Boosting model on training set is: ", airfoil_grad.score(x_trg, y_trg))
print("Accuracy of Gradient Boosting model on test set is: ", airfoil_grad.score(x_test, y_test))

# Prediction via Gradient Boosting
airfoil_grad_pred = airfoil_grad.predict(x_test)

# Compute the RMSE of Gradient Boosting
airfoil_grad_rmse = sqrt(mean_squared_error(y_test, airfoil_grad_pred))
print("RMSE value of Gradient Boosting model is: ", airfoil_grad_rmse)

Accuracy of Gradient Boosting model on training set is:  0.8861890217507379
Accuracy of Gradient Boosting model on test set is:  0.848131736398517
RMSE value of Gradient Boosting model is:  2.7067274078012558
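The value printed as "accuracy" comes from the regressor's `score` method, which for scikit-learn regressors is the coefficient of determination R². The sketch below checks this against `r2_score` and recomputes the RMSE by hand. It runs on synthetic data from `make_regression`, not on the airfoil set, so its numbers are not comparable to the output above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem (illustrative assumption, not the airfoil data).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_trg, X_test, y_trg, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_trg, y_trg)
pred = model.predict(X_test)

# score() on a regressor is R^2, the same quantity r2_score computes.
r2_from_score = model.score(X_test, y_test)
r2_from_metric = r2_score(y_test, pred)

# RMSE is the square root of the mean squared residual.
rmse = np.sqrt(mean_squared_error(y_test, pred))
rmse_manual = np.sqrt(np.mean((y_test - pred) ** 2))

print(r2_from_score, r2_from_metric)
print(rmse, rmse_manual)
```

Because RMSE is expressed in the units of the target, the RMSE values in this notebook are in decibels.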


#### Creating a new Gradient Boosting model with Grid Search

In [8]:
# Import the necessary package
from sklearn.model_selection import GridSearchCV

In [9]:
# Setting the parameters
param_grid = {"max_depth" : [3,4,5], "n_estimators" : [50,100,200], "learning_rate" : [0.5,0.7,0.9,1.0]}
airfoil_grad_grid = GradientBoostingRegressor()
airfoil_grad_CV = GridSearchCV(estimator = airfoil_grad_grid, param_grid = param_grid, cv = 5)

In [10]:
# Fit the model
airfoil_grad_result = airfoil_grad_CV.fit(x_trg, y_trg)
print("Best Parameters are: \n", airfoil_grad_CV.best_params_)

Best Parameters are: 
 {'learning_rate': 0.7, 'max_depth': 3, 'n_estimators': 200}


#### Creating the model with best parameters

In [11]:
# Model building - Gradient Boosting with the best parameters from the grid search
airfoil_grad_best = GradientBoostingRegressor(
                    max_depth = airfoil_grad_result.best_params_["max_depth"],
                    n_estimators = airfoil_grad_result.best_params_["n_estimators"],
                    learning_rate = airfoil_grad_result.best_params_["learning_rate"])

#### Evaluating the model with best parameters

In [12]:
# Fit the model
airfoil_grad_best.fit(x_trg, y_trg)
print("Accuracy of GB model with best parameter on training set is: ", airfoil_grad_best.score(x_trg, y_trg))
print("Accuracy of GB model with best parameter on test set is: ", airfoil_grad_best.score(x_test, y_test))

# Prediction via GB model with best parameters
airfoil_grad_pred_2 = airfoil_grad_best.predict(x_test)

# Compute the RMSE value of GB best model
airfoil_grad_rmse_2 = sqrt(mean_squared_error(y_test, airfoil_grad_pred_2))
print("RMSE value of new Gradient Boosting model is: ", airfoil_grad_rmse_2)

Accuracy of GB model with best parameter on training set is:  0.9847755313947395
Accuracy of GB model with best parameter on test set is:  0.9265134864776614
RMSE value of new Gradient Boosting model is:  1.8828461232305436
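Rebuilding the model from `best_params_` works, but with the default `refit=True` a fitted `GridSearchCV` already holds a model retrained on the full training set with the winning parameters, exposed as `best_estimator_`. The sketch below illustrates this on synthetic data with a deliberately small grid; `small_grid`, `search`, and `best_model` are hypothetical names, and the grid is not the one used above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression problem (illustrative assumption, not the airfoil data).
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_trg, X_test, y_trg, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A small grid keeps the example fast.
small_grid = {"max_depth": [2, 3], "n_estimators": [20, 40]}
search = GridSearchCV(GradientBoostingRegressor(random_state=0), small_grid, cv=3)
search.fit(X_trg, y_trg)

# With refit=True (the default), the best configuration is retrained on all of
# X_trg and stored here, so no second model needs to be constructed.
best_model = search.best_estimator_

print("Best parameters:", search.best_params_)
print("Mean cross-validated R^2 of best parameters:", search.best_score_)
print("Test R^2 of refit model:", best_model.score(X_test, y_test))
```

`best_score_` is the mean cross-validated score on the training folds, so it is a separate estimate from the held-out test score.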


#### Creating AdaBoost model

In [13]:
# Model building - AdaBoost
airfoil_ada = AdaBoostRegressor()

# Fit the model
airfoil_ada.fit(x_trg, y_trg)
print("Accuracy of AdaBoost model on training set is: ", airfoil_ada.score(x_trg, y_trg))
print("Accuracy of AdaBoost model on test set is: ", airfoil_ada.score(x_test, y_test))

# Prediction via AdaBoost
airfoil_ada_pred = airfoil_ada.predict(x_test)

# Compute the RMSE value of AdaBoost
airfoil_ada_rmse = sqrt(mean_squared_error(y_test, airfoil_ada_pred))
print("RMSE value of AdaBoost model is: ", airfoil_ada_rmse)

Accuracy of AdaBoost model on training set is:  0.7047355559502793
Accuracy of AdaBoost model on test set is:  0.7099200113293812
RMSE value of AdaBoost model is:  3.7408465325768563
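The fit, score, and RMSE steps are the same for every model in this notebook, so they could be wrapped in a small helper. The function `evaluate` below is a hypothetical convenience, not part of scikit-learn, and the data is synthetic, so the printed numbers do not correspond to the airfoil results.

```python
from math import sqrt

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def evaluate(model, X_trg, y_trg, X_test, y_test):
    """Fit `model` and return train R^2, test R^2, and test RMSE (hypothetical helper)."""
    model.fit(X_trg, y_trg)
    pred = model.predict(X_test)
    return {
        "train_r2": model.score(X_trg, y_trg),
        "test_r2": model.score(X_test, y_test),
        "rmse": sqrt(mean_squared_error(y_test, pred)),
    }


# Synthetic regression problem (illustrative assumption, not the airfoil data).
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_trg, X_test, y_trg, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {name: evaluate(m, X_trg, y_trg, X_test, y_test) for name, m in models.items()}
for name, metrics in results.items():
    print(name, metrics)
```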


#### Creating Extra Trees model

In [14]:
# Model building - Extra Trees
airfoil_extratree = ExtraTreesRegressor()

# Fit the model
airfoil_extratree.fit(x_trg, y_trg)
print("Accuracy of Extra Tree model on training set is: ", airfoil_extratree.score(x_trg, y_trg))
print("Accuracy of Extra Tree model on test set is: ", airfoil_extratree.score(x_test, y_test))

# Prediction via Extra Tree
airfoil_extratree_pred = airfoil_extratree.predict(x_test)

# Compute the RMSE value of Extra Tree
airfoil_extratree_rmse = sqrt(mean_squared_error(y_test, airfoil_extratree_pred))
print("RMSE value of Extra Tree model is: ", airfoil_extratree_rmse)

Accuracy of Extra Tree model on training set is:  0.9999999850780376
Accuracy of Extra Tree model on test set is:  0.9401578153704178
RMSE value of Extra Tree model is:  1.6990835167059037


#### Creating Random Forest model

In [15]:
# Model building - Random Forest
airfoil_forest = RandomForestRegressor(random_state = 0)

# Fit the model
airfoil_forest.fit(x_trg, y_trg)
print("Accuracy of Random Forest on training set is: ", airfoil_forest.score(x_trg, y_trg))
print("Accuracy of Random Forest on test set is: ", airfoil_forest.score(x_test, y_test))

# Prediction via Random Forest
airfoil_forest_pred = airfoil_forest.predict(x_test)

# Compute the RMSE value of Random Forest
airfoil_forest_rmse = sqrt(mean_squared_error(y_test, airfoil_forest_pred))
print("RMSE value of Random Forest model is: ", airfoil_forest_rmse)

Accuracy of Random Forest on training set is:  0.9888531636649133
Accuracy of Random Forest on test set is:  0.9236843627017386
RMSE value of Random Forest model is:  1.918747268097239


#### Creating Bagging model

In [16]:
# Model building - Bagging
# The base estimator is left at its default (a decision tree)
airfoil_bag = BaggingRegressor(n_estimators = 10, max_samples = 1.0,
                              max_features = 1.0, bootstrap = True)

# Fit the model
airfoil_bag.fit(x_trg, y_trg)
print("Accuracy of Bagging model on training set is: ", airfoil_bag.score(x_trg, y_trg))
print("Accuracy of Bagging model on test set is: ", airfoil_bag.score(x_test, y_test))

# Prediction via Bagging
airfoil_bag_pred = airfoil_bag.predict(x_test)

# Compute the RMSE value of Bagging
airfoil_bag_rmse = sqrt(mean_squared_error(y_test, airfoil_bag_pred))
print("RMSE value of Bagging model is: ", airfoil_bag_rmse)

Accuracy of Bagging model on training set is:  0.9811483801536135
Accuracy of Bagging model on test set is:  0.9158311075043963
RMSE value of Bagging model is:  2.0150545762967678


#### Creating Decision Tree model

In [17]:
# Model building - Decision Tree
airfoil_tree = DecisionTreeRegressor(random_state = 0)

# Fit the model
airfoil_tree.fit(x_trg, y_trg)
print("Accuracy of Decision Tree model on training set is: ", airfoil_tree.score(x_trg, y_trg))
print("Accuracy of Decision Tree model on test set is: ", airfoil_tree.score(x_test, y_test))

# Prediction via Decision Tree
airfoil_tree_pred = airfoil_tree.predict(x_test)

# Compute the RMSE of Decision Tree
airfoil_tree_rmse = sqrt(mean_squared_error(y_test, airfoil_tree_pred))
print("RMSE value of Decision Tree model is: ", airfoil_tree_rmse)

Accuracy of Decision Tree model on training set is:  1.0
Accuracy of Decision Tree model on test set is:  0.850105137254568
RMSE value of Decision Tree model is:  2.689084077782543
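To compare the models side by side, the test-set results printed in the cells above can be collected into one table and sorted by RMSE. The sketch below hard-codes those reported values, rounded to four decimals; the names `summary` and `ranked` are hypothetical.

```python
import pandas as pd

# Test-set R^2 and RMSE as printed in the cells above, rounded to four decimals.
summary = pd.DataFrame(
    [
        ("Gradient Boosting (default)", 0.8481, 2.7067),
        ("Gradient Boosting (grid search)", 0.9265, 1.8828),
        ("AdaBoost", 0.7099, 3.7408),
        ("Extra Trees", 0.9402, 1.6991),
        ("Random Forest", 0.9237, 1.9187),
        ("Bagging", 0.9158, 2.0151),
        ("Decision Tree", 0.8501, 2.6891),
    ],
    columns=["model", "test_r2", "rmse"],
)

# Lower RMSE is better, so sort ascending.
ranked = summary.sort_values("rmse").reset_index(drop=True)
print(ranked)
```

On this particular split, Extra Trees has the lowest RMSE and AdaBoost the highest. Since these figures come from a single train-test split, the ordering of closely matched models may differ on another split.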
