# Tuning hyper-parameters using Grid Search and ManufacturingNet
##### To learn more about ManufacturingNet, please visit http://manufacturingnet.io/.

In this tutorial, we will use Grid Search to tune the hyper-parameters of an XGBoost regression model trained on the Mercedes-Benz Greener Manufacturing dataset included in ManufacturingNet.

### Imports

In [1]:
from ManufacturingNet import datasets
from ManufacturingNet.models import XGBoost
import numpy as np

First, we import the datasets and XGBoost modules from ManufacturingNet. The former will help us download and prepare the Mercedes-Benz Greener Manufacturing dataset, and the latter will provide the XGBoost model. We'll also import numpy to read in the dataset.



###### Note that all of the package's dependencies must be installed in your environment. Check the documentation for a comprehensive list of ManufacturingNet's dependencies.

### Getting the Dataset

To download the dataset, simply call the MercedesData() method in the datasets module.

In [2]:
datasets.MercedesData()

Please check your working directory; if you see a new folder called "Mercedes_files", the method worked!

### Reading the Data

The Mercedes-Benz Greener Manufacturing dataset contains many permutations of Mercedes-Benz vehicle features; these features include whether a car has four-wheel drive, air suspension, or a heads-up display. Your task is to predict how much time a car will spend on the test bench given its features. In this tutorial, we will tackle this using an XGBoost regression model.

The Mercedes_files/ folder in your working directory contains two files: merc_features.npy and merc_labels.npy. The former contains the cars' features, and the latter contains the time spent testing each car. We use numpy's load() function to load each file into the program.

In [3]:
X = np.load('./Mercedes_files/merc_features.npy', allow_pickle = True)
Y = np.load('./Mercedes_files/merc_labels.npy', allow_pickle = True)
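Before building a model, it can help to confirm that the features and labels line up. The sketch below is a minimal, optional sanity check; `describe_dataset` is a hypothetical helper written for this tutorial (it is not part of ManufacturingNet), and the small arrays are stand-ins so the snippet runs on its own.

```python
import numpy as np


def describe_dataset(features, labels):
    """Return (n_samples, n_features) after checking the arrays line up.

    Hypothetical helper for illustration; not part of ManufacturingNet.
    """
    features = np.asarray(features)
    labels = np.asarray(labels)
    if features.ndim != 2:
        raise ValueError("features should be a 2-D array (samples x features)")
    if features.shape[0] != labels.shape[0]:
        raise ValueError("features and labels have different numbers of rows")
    return features.shape[0], features.shape[1]


# Stand-in arrays (NOT the real Mercedes data) so the sketch is self-contained.
demo_X = np.zeros((5, 3))
demo_Y = np.arange(5.0)
print(describe_dataset(demo_X, demo_Y))
```

In the notebook, you would call `describe_dataset(X, Y)` on the arrays loaded above instead of the stand-ins.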

### Creating the Model

Now, we can create the XGBoost model. To instantiate the model, we call the XGBoost constructor and pass in our features and labels from above.

In [4]:
model = XGBoost(X, Y)

### Optimizing Hyper-Parameters with Grid Search

To start building the model, we call the run_regressor() method on our XGBoost model. When this line runs, a command-line interface in your terminal will guide you through the parameter inputs.

The interface will first ask you if you'd like to use all default values. To use Grid Search, we input 'n' to continue to parameter inputs.

For the first parameter, test_size, enter your preferred testing set size as a fraction of the dataset, or press enter to accept the default of 0.25.

When prompted to use Grid Search, we input 'y'.

Now, we may select multiple boosters, learning rates, gamma values, numbers of trees, and maximum tree depths to try. We have entered some potential candidates below.

After finding the best combination of the candidate hyper-parameters, Grid Search will save these values, and the model's parameter inputs will continue. For simplicity's sake, we'll enter 'q' at the next prompt to use default values for the remaining inputs.
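To get a feel for how much work Grid Search does, the sketch below enumerates every combination of the candidate values used in this tutorial. It uses only the Python standard library; the dictionary keys mirror the names printed in the "Best GridSearch Parameters" output and are not necessarily how ManufacturingNet stores them internally.

```python
from itertools import product

# Candidate values entered at the Grid Search prompts in this tutorial.
param_grid = {
    "booster": ["gbtree", "dart"],
    "learning_rate": [0.1, 0.01],
    "gamma": [0.5],
    "n_estimators": [100],
    "max_depth": [3, 4],
}

# Grid Search evaluates the Cartesian product of all candidate lists:
# each combination is one candidate model.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

print(len(combinations))  # 2 * 2 * 1 * 1 * 2 = 8
for combo in combinations:
    print(combo)
```

The grid grows multiplicatively: adding a third learning rate here would raise the count from 8 to 12. If the search also cross-validates each candidate, the number of model fits is multiplied again by the number of folds.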

In [5]:
model.run_regressor()


= XGBRegressor Parameter Inputs =

Default values:
test_size = 0.25
cv = 5
objective = 'reg:squarederror'
n_estimators = 100
max_depth = 3
learning_rate = 0.1
booster = 'gbtree'
n_jobs = 1
nthread = None
gamma = 0
min_child_weight = 1
max_delta_step = 0
subsample = 1
colsample_bytree = 1
colsample_bylevel = 1
reg_alpha = 0
reg_lambda = 1
scale_pos_weight = 1
base_score = 0.5
random_state = 42
missing = None
verbosity = False



Use default parameters (Y/n)?  n



If you are unsure about a parameter, press enter to use its default value.
If you finish entering parameters early, enter 'q' to skip ahead.




What fraction of the dataset should be the testing set (0,1)?  


test_size = 0.25



Use GridSearch to find the best hyperparameters (y/N)?  y



= GridSearch Parameter Inputs =

Enter 'q' to skip GridSearch.

Enter the types of boosters.
Options: 1-'gbtree', 2-'gblinear' or 3-'dart'. Enter 'all' for all options.
Example input: 1,2,3


 1,3


boosters: ['gbtree', 'dart']

Enter a list of learning rates to try out.
Example input: 0.1,0.01,0.001


 0.1,0.01


learning_rates: [0.1, 0.01]

Enter a list of gamma values/minimum loss reductions to try out.
Example input: 0.5,1,1.5


 0.5


gammas: [0.5]

Enter a list of number of trees to try out.
Example input: 1,2,3


 100


n_estimators: [100]

Enter a list of max tree depths to try out.
Example input: 1,2,3


 3,4


max_depths: [3, 4]

= End of GridSearch inputs. =


Best GridSearch Parameters:
 {'booster': 'dart', 'gamma': 0.5, 'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100} 




Enter the number of folds for cross validation [2,):  q


cv = None

= End of inputs; press enter to continue. =


 



= XGBRegressor Results =

Mean Squared Error:  64.53131275651117   

R2 Score:            0.5664428235586155  

R Score:             0.7526239589320921  

Cross Validation Scores: [0.43206355 0.43682215 0.59214757 0.54374362 0.4988061 ]

Feature Importances: [3.4976748e-03 3.1012250e-03 2.1990235e-03 2.6222060e-03 1.4178248e-03
 0.0000000e+00 2.4875063e-03 1.9911753e-03 3.2474571e-03 0.0000000e+00
 0.0000000e+00 0.0000000e+00 1.5964758e-03 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 2.7632935e-03 2.6914261e-03
 0.0000000e+00 0.0000000e+00 1.5125168e-03 0.0000000e+00 0.0000000e+00
 2.2695516e-03 2.7928289e-03 8.3022177e-02 0.0000000e+00 5.5022151e-03
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 2.7438751e-03 0.0000000e+00 0.0000000e+00 0.0000000e+00
 8.4623070e-03 2.2385695e-03 6.3477730e-04 5.6067770e-03 0.0000000e+00
 3.8269758e-03 0.0000000e+00 8

After training, the results show how the XGBoost model performed with the best hyper-parameters found among the candidates we entered.
To keep the processing time reasonable, we tried only a few candidate values for each hyper-parameter. In real-world usage, you may want to try many more values.
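The reported metrics are easier to interpret with a little arithmetic. The sketch below copies the numbers from the results output above and derives the root mean squared error, checks that the "R Score" is consistent with being the square root of the R2 score (an observation from the printed values, not a statement about ManufacturingNet's internals), and averages the cross validation scores.

```python
import math
from statistics import mean

# Values copied from the "XGBRegressor Results" output in this tutorial.
mse = 64.53131275651117
r2 = 0.5664428235586155
reported_r = 0.7526239589320921
cv_scores = [0.43206355, 0.43682215, 0.59214757, 0.54374362, 0.4988061]

rmse = math.sqrt(mse)       # error magnitude in the same units as the labels
r_from_r2 = math.sqrt(r2)   # compare against the reported "R Score"
mean_cv = mean(cv_scores)   # average score across the five folds

print(f"RMSE:          {rmse:.3f}")
print(f"sqrt(R2):      {r_from_r2:.4f} (reported R Score: {reported_r:.4f})")
print(f"Mean CV score: {mean_cv:.4f}")
```

Comparing the mean cross validation score with the single test-set R2 score gives a rough sense of how much the result depends on one particular train/test split.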

At this point, you have a trained XGBoost model with hyper-parameters tuned by Grid Search! To check which ManufacturingNet models support Grid Search, visit our documentation: https://manufacturingnet.readthedocs.io/en/latest/.