# Tuning hyper-parameters using Grid Search and ManufacturingNet
##### To learn more about ManufacturingNet, please visit http://manufacturingnet.io/.

In this tutorial, we will use Grid Search to tune the hyper-parameters of an XGBoost regression model trained on the Mercedes-Benz Greener Manufacturing dataset included in ManufacturingNet.

### Imports

In [2]:
from ManufacturingNet import datasets
from ManufacturingNet.models import XGBoost
import numpy as np

First, we import the datasets and XGBoost modules from ManufacturingNet. The former will help us download and prepare the Mercedes-Benz Greener Manufacturing dataset, and the latter will provide the XGBoost model. We'll also import numpy to read in the dataset.



###### It is important to note that all of the package's dependencies must be installed in your environment. Check the documentation for a comprehensive list of ManufacturingNet's dependencies.

### Getting the Dataset

To download the dataset, call the MercedesData() function in the datasets module.

In [4]:
datasets.MercedesData()

Downloading...
From: https://drive.google.com/uc?id=1D7eQDV4h6lEXnNE1Cbk1kRU62Dn9xMnb
To: /home/cmu/ManufacturingNet/tutorials/MercedesData.zip
100%|██████████| 220k/220k [00:00<00:00, 2.73MB/s]


Please check your working directory; if you see a new folder called "Mercedes_files," the method worked!

### Reading the Data

The Mercedes-Benz Greener Manufacturing dataset contains many permutations of Mercedes-Benz vehicle features; these features include whether a car has four-wheel drive, air suspension, or a heads-up display. Your task is to predict how much time a car will spend on the test bench given its features. In this tutorial, we will tackle this using an XGBoost regression model.

The Mercedes_files/ folder in your working directory contains two files: merc_features.npy and merc_labels.npy. The former contains the cars' features, and the latter contains the time spent testing each car. We use numpy's load() function to load each file into the program.

In [5]:
X = np.load('./Mercedes_files/merc_features.npy', allow_pickle = True)
Y = np.load('./Mercedes_files/merc_labels.npy', allow_pickle = True)

In [6]:
X.shape, Y.shape

((4209, 377), (4209,))
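Before passing the arrays to a model, it can help to confirm that the features are two-dimensional, the labels are one-dimensional, and both have the same number of rows. The following is a minimal sketch of such a check; the helper name `check_features_and_labels` is hypothetical and not part of ManufacturingNet, and the zero-filled arrays stand in for the real data with the shapes reported above.

```python
import numpy as np


def check_features_and_labels(X, Y):
    """Hypothetical helper: validate array shapes before model creation."""
    if X.ndim != 2:
        raise ValueError(f"expected 2-D features, got {X.ndim}-D")
    if Y.ndim != 1:
        raise ValueError(f"expected 1-D labels, got {Y.ndim}-D")
    if X.shape[0] != Y.shape[0]:
        raise ValueError(
            f"row mismatch: {X.shape[0]} feature rows vs {Y.shape[0]} labels"
        )
    # Return (number of samples, number of features)
    return X.shape[0], X.shape[1]


# Stand-in arrays with the same shapes as the loaded dataset
X_demo = np.zeros((4209, 377))
Y_demo = np.zeros(4209)

n_samples, n_features = check_features_and_labels(X_demo, Y_demo)
print(n_samples, n_features)
```

With the real data, you would pass `X` and `Y` from the cell above instead of the stand-in arrays.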

### Creating the Model

Now, we can create the XGBoost model. To instantiate the model, we simply call the XGBoost constructor, and pass in our features and labels from above.

In [7]:
model = XGBoost(X, Y)

### Optimizing Hyper-Parameters with Grid Search

To start building the model, we call the run_regressor() method on our XGBoost model. When this line runs, a command-line interface in your terminal will guide you through the parameter inputs.

The interface will first ask you if you'd like to use all default values. To use Grid Search, we input 'n' to continue to parameter inputs.

For the first parameter, test_size, enter your preferred testing set size, or press enter to continue.

When prompted to use Grid Search, we input 'y'.

Now, we may select multiple boosters, learning rates, gamma values, numbers of trees, and tree depths to try. We have entered some potential candidates below.

After finding the best-performing combination of the candidate hyper-parameters, Grid Search will save these values, and the model's parameter inputs will continue. For simplicity's sake, we'll use default values for the remaining inputs.
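Conceptually, a grid search tries every combination of the candidate values and keeps the one with the best score. The sketch below illustrates that idea in plain Python. It is not ManufacturingNet's implementation: the candidate values are illustrative, and `toy_score` is a hypothetical stand-in for "train a model with these parameters and return its validation score".

```python
from itertools import product

# Illustrative candidates; the keys follow XGBoost's usual parameter names
param_grid = {
    "booster": ["gbtree", "dart"],
    "learning_rate": [0.1, 0.3],
    "gamma": [0, 1],
    "n_estimators": [50, 100],
    "max_depth": [3, 6],
}


def toy_score(params):
    """Hypothetical scoring function; higher is better."""
    penalty = abs(params["learning_rate"] - 0.1)
    penalty += 0.01 * abs(params["max_depth"] - 3)
    penalty += 0.05 * params["gamma"]
    penalty += 0.001 * abs(params["n_estimators"] - 100)
    if params["booster"] == "dart":
        penalty += 0.02
    return -penalty


def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

In a real search, the scoring function would fit the model on the training data and evaluate it on held-out data, which is why each extra candidate value adds noticeably to the running time.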

In [None]:
model.run_regressor()

After training the model, we can see how the XGBoost model performed with the hyper-parameters selected by Grid Search.
To keep the processing time reasonable, we were conservative in choosing which hyper-parameter values to try. In real-world usage, you may want to try many more values.
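The cost of trying more values grows multiplicatively, because Grid Search evaluates every combination. As a rough sketch (the candidate lists here are illustrative assumptions, not the values used in this tutorial), the number of combinations is the product of the lengths of the candidate lists:

```python
from math import prod


def grid_size(grid):
    """Number of parameter combinations a full grid search would evaluate."""
    return prod(len(values) for values in grid.values())


# A conservative grid: two candidates per hyper-parameter
small_grid = {
    "booster": ["gbtree", "dart"],
    "learning_rate": [0.1, 0.3],
    "gamma": [0, 1],
    "n_estimators": [50, 100],
    "max_depth": [3, 6],
}

# A more thorough grid with several candidates per hyper-parameter
larger_grid = {
    "booster": ["gbtree", "gblinear", "dart"],
    "learning_rate": [0.01, 0.05, 0.1, 0.2, 0.3],
    "gamma": [0, 0.5, 1, 2],
    "n_estimators": [50, 100, 200, 400, 800],
    "max_depth": [2, 3, 4, 6, 8, 10],
}

print(grid_size(small_grid))   # 32
print(grid_size(larger_grid))  # 1800
```

Each combination requires training at least one model, so it is worth estimating the grid size before starting a long search.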

At this point, you have a trained XGBoost model that uses the best hyper-parameters among the candidates you supplied, courtesy of Grid Search! To check which ManufacturingNet models support Grid Search, visit our documentation: https://manufacturingnet.readthedocs.io/en/latest/.