# Automatic tuning of XGBoost parameters using XGBTune
Thanks to the work of [Romain Picard](https://github.com/MainRo) there is now a package, called [XGBTune](https://github.com/MainRo/xgbtune), that automatically tunes the parameters of [XGBoost](https://xgboost.readthedocs.io/en/latest/parameter.html).
From the GitHub page:

## Tuning steps

The tuning is done in the following steps:

*    compute best round
*    tune max_depth and min_child_weight
*    tune gamma
*    re-compute best round
*    tune subsample and colsample_bytree
*    fine tune subsample and colsample_bytree
*    tune alpha and lambda
*    tune seed

These steps can be repeated several times. By default, two passes are done.
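
As a rough illustration of this staged approach (a sketch only, not XGBTune's actual implementation), the code below tunes one group of parameters at a time while holding the others fixed, then repeats the whole sequence for a number of passes. The helper name `staged_search`, the `toy_loss` function, and the candidate grids are all hypothetical stand-ins; in practice the loss would come from cross-validating an XGBoost model.

```python
from itertools import product

def staged_search(loss, params, stages, passes=2):
    """Tune one group of parameters at a time, keeping the rest fixed."""
    best = dict(params)
    best_loss = loss(best)
    for _ in range(passes):
        for grid in stages:
            names = list(grid)
            # try every combination of candidate values for this group only
            for values in product(*(grid[n] for n in names)):
                trial = {**best, **dict(zip(names, values))}
                trial_loss = loss(trial)
                if trial_loss < best_loss:
                    best, best_loss = trial, trial_loss
    return best, best_loss

# Hypothetical toy loss with its minimum at max_depth=6, gamma=0.2, subsample=0.8
def toy_loss(p):
    return ((p["max_depth"] - 6) ** 2
            + (p["gamma"] - 0.2) ** 2
            + (p["subsample"] - 0.8) ** 2)

# Each stage is one group of parameters with its candidate values (assumed grids)
stages = [
    {"max_depth": [3, 6, 9]},
    {"gamma": [0.0, 0.2, 0.4]},
    {"subsample": [0.6, 0.8, 1.0]},
]
start = {"max_depth": 3, "gamma": 0.0, "subsample": 1.0}
best_params, best_loss = staged_search(toy_loss, start, stages)
```

Searching one group at a time is much cheaper than a full grid over all parameters, but because the groups interact, a second pass can pick up improvements that the first pass missed.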

Here we shall use the [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques) data as an example.

### Install `XGBTune`

In [1]:
!pip install xgbtune

Collecting xgbtune
  Downloading xgbtune-1.1.0.tar.gz (5.0 kB)
Building wheels for collected packages: xgbtune
  Building wheel for xgbtune (setup.py) ... done
  Created wheel for xgbtune: filename=xgbtune-1.1.0-py2.py3-none-any.whl size=4852 sha256=e78791d567bbfc6df43826f7cf0d6909244104f6b279c5ce9dfa411eb109da81
  Stored in directory: /root/.cache/pip/wheels/0b/fc/d2/806972c7b07e47bc31b3714680fc8407c6a3174f49e45b19ef
Successfully built xgbtune
Installing collected packages: xgbtune
Successfully installed xgbtune-1.1.0


### Set up the House Prices data

In [2]:
import pandas  as pd
import xgboost as xgb

#===========================================================================
# read in the data
#===========================================================================
train_data = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
test_data  = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

#===========================================================================
# select some features
#===========================================================================
features = ['MSSubClass', 'LotArea', 'OverallQual', 'OverallCond', 
        'YearBuilt', 'YearRemodAdd', 'BsmtFinSF1', 'BsmtFinSF2', 
        'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 
        'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 
        'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 
        'TotRmsAbvGrd', 'Fireplaces', 'GarageCars', 'GarageArea', 
        'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 
        'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold']

#===========================================================================
# build the training features, the training target, and the test features
#===========================================================================
X_train       = train_data[features]
y_train       = train_data["SalePrice"]
X_test        = test_data[features]

### Run `xgbtune`
Here we use the root mean squared logarithmic error (`rmsle`) as the evaluation metric, as per the competition requirements.
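
For reference, the metric can be written out by hand. The sketch below assumes the usual definition, the square root of the mean of `(log(1 + prediction) - log(1 + target))**2`, which is how XGBoost's `rmsle` metric is commonly described. The function name and the sample prices are hypothetical and serve only as an illustration.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Hypothetical sale prices, for illustration only
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 320000.0]
score = rmsle(actual, predicted)
```

Because the error is taken on the log scale, it penalises relative rather than absolute differences, so a 10,000 error on a cheap house counts for more than the same error on an expensive one.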

In [3]:
from xgbtune import tune_xgb_model
params = {'eval_metric': 'rmsle'}
params, round_count = tune_xgb_model(params, X_train, y_train)

tuning pass 0...
computing best round...
best round: 20
tuning max_depth and min_child_weight ...
best loss: 0.0342
best max_depth: 8
best min_child_weight: 1
tuning gamma ...
best loss: 0.0342
best gamma: 0.0
re-computing best round...
best round: 20
tuning subsample and colsample_bytree ...
best loss: 0.0342
best subsample: 1.0
best colsample_bytree: 1.0
fine tuning subsample and colsample_bytree ...
best loss: 0.0336
best subsample: 0.95
best colsample_bytree: 1.0
tuning alpha and lambda ...
best loss: 0.0336
best alpha: 0
best lambda: 1
tuning seed ...
best loss: 0.0336
best seed: 0
{'eval_metric': 'rmsle', 'max_depth': 8, 'min_child_weight': 1, 'gamma': 0.0, 'subsample': 0.95, 'colsample_bytree': 1.0, 'alpha': 0, 'lambda': 1, 'seed': 0}
tuning pass 1...
computing best round...
best round: 22
tuning max_depth and min_child_weight ...
best loss: 0.0314
best max_depth: 8
best min_child_weight: 1
tuning gamma ...
best loss: 0.0314
best gamma: 0.0
re-computing best round...
best round:

### Now fit using the parameters, predict, and write out the `submission.csv` file

In [4]:
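#===========================================================================
# now use the parameters from XGBTune; round_count is the tuned number of
# boosting rounds, so it is passed as n_estimators
#===========================================================================
regressor = xgb.XGBRegressor(n_estimators=round_count, **params)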

regressor.fit(X_train, y_train)

#===========================================================================
# use the fit to predict the prices for the test data
#===========================================================================
predictions = regressor.predict(X_test)

#===========================================================================
# write out CSV submission file
#===========================================================================
output = pd.DataFrame({"Id":test_data.Id, "SalePrice":predictions})
output.to_csv('submission.csv', index=False)