# GENERAL INTRODUCTION TO HYPERPARAMETERS:

**Hyperparameters are settings chosen before training that control how an algorithm learns. Unlike model weights, they are not learned from the data.
XGBoost has become a go-to algorithm for many data scientists. It is a sophisticated algorithm that copes with many kinds of irregularities in data, and it is especially strong where speed and accuracy matter. Building a model with XGBoost is easy, but improving that model is harder because XGBoost exposes many parameters to tune.**

# XGBOOST PARAMETERS
**GENERAL PARAMETERS:** **They define the overall functionality of the XGBoost algorithm. These include the following parameters:**

**a) booster [default=gbtree]**

**b) silent [default=0] (deprecated in recent XGBoost releases in favor of verbosity [default=1])**

**c) nthread [default to maximum number of threads available if not set]**

**BOOSTER PARAMETERS:-** **Guide the individual booster (tree/regression) at each step**

**a) eta [default=0.3]**

**b) min_child_weight [default=1]**

**c) max_depth [default=6]**

**d) max_leaves [default=0]**

**e) gamma [default=0]**

**f) max_delta_step [default=0]**

**g) subsample [default=1]**

**h) colsample_bytree [default=1]**

**i) colsample_bylevel [default=1]**

**j) lambda [default=1]**

**k) alpha [default=0]**

**l) scale_pos_weight [default=1]**

**LEARNING TASK PARAMETERS:-** **They guide the optimization performed**

**a) objective [default=reg:squarederror, named reg:linear in older releases]**

**b) eval_metric [ default according to objective ]**

**c) seed [default=0]**
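
**As an illustration of how the three groups fit together, the sketch below collects a few of the parameters listed above into plain dictionaries and merges them into one configuration. The variable names and the helper `merge_params` are hypothetical; in practice such a dictionary would be handed to XGBoost, for example as the `params` argument of `xgb.train`.**

```python
# A few parameters grouped as in the lists above; values are the defaults listed there.
general_params = {"booster": "gbtree"}
booster_params = {
    "eta": 0.3,
    "min_child_weight": 1,
    "gamma": 0,
    "subsample": 1,
    "colsample_bytree": 1,
    "lambda": 1,
    "alpha": 0,
}
learning_task_params = {"seed": 0}


def merge_params(*groups):
    """Merge parameter groups into one flat dict (hypothetical helper).

    Raises ValueError if the same parameter appears in more than one group.
    """
    merged = {}
    for group in groups:
        for name, value in group.items():
            if name in merged:
                raise ValueError(f"duplicate parameter: {name}")
            merged[name] = value
    return merged


params = merge_params(general_params, booster_params, learning_task_params)
print(params)
```

**Keeping the groups separate makes it easier to see which settings shape the trees and which describe the learning task.**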

# Hyperparameter Tuning with an example

**Let us now tune hyperparameters on a bank customer churn dataset to see how the process works.**

**Importing the libraries**

In [None]:
import pandas as pd
import numpy as np
import xgboost as xgb
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import cross_validate   # additional scikit-learn functions
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

In [None]:
import warnings
warnings.filterwarnings('ignore')  # silence library warnings to keep the output readable
%matplotlib inline
from matplotlib import rcParams
rcParams['figure.figsize'] = 12, 4  # default figure size (width, height) in inches

**Importing and reading the data**

In [None]:
train = pd.read_csv('../input/bank-customers/Churn Modeling.csv')
target = 'Exited'  # name of the label column
IDcol = 'ID'

In [None]:
train

In [None]:
train.shape

In [None]:
train.drop(columns=['Surname', 'Geography'], inplace=True)

In [None]:
train['Gender']=train['Gender'].map({'Female':0, 'Male':1})

In [None]:
train

In [None]:
train.info()

In [None]:
train.shape

In [None]:
train.columns

In [None]:
# RowNumber and CustomerId are row identifiers, not customer attributes, so they are left out
features=['CreditScore', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']

In [None]:
y=train['Exited']
X=train[features]

In [None]:
y.shape

In [None]:
X.shape

In [None]:
X.info()

**Splitting into train and test set**

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)

**Correlation heatmap**

In [None]:
corrmat = train.corr()  # pairwise correlations between the numeric columns
plt.figure(figsize=(20, 20))
# A diverging colormap makes positive and negative correlations easy to tell apart
fig = sns.heatmap(corrmat, annot=True, cmap="coolwarm")

# Bayesian Optimisation with HYPEROPT

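**Optimization is the process of finding the minimum of a cost function. In hyperparameter tuning the cost measures how poorly a model performs, so a lower cost means a model that does better on both the training set and the test set.**

**Bayesian optimization is a strategy for finding the best hyperparameters of a machine learning or deep learning algorithm. It uses the results of earlier trials to decide which hyperparameter values to try next, instead of sampling blindly.**

**Here, we train the model with hyperparameters drawn from specified ranges, for a fixed number of trials, and keep the combination that scores best.**

**Hyperparameter tuning thus determines the best-performing hyperparameter values, from which the final model can be fit.**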
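
**The trial loop described above can be sketched without any library. The example below is a minimal sketch that uses plain random sampling (not Bayesian optimization) on a made-up one-dimensional loss, only to show the shape of the search: propose a candidate, score it, keep the best. The names `toy_loss` and `random_search` are hypothetical.**

```python
import random


def toy_loss(x):
    """Made-up loss with its minimum at x = 3 (stands in for a model's validation loss)."""
    return (x - 3.0) ** 2


def random_search(loss_fn, low, high, n_trials, seed=0):
    """Sample n_trials candidates uniformly in [low, high] and return the best one."""
    rng = random.Random(seed)
    best_x, best_loss = None, float("inf")
    for _ in range(n_trials):
        x = rng.uniform(low, high)   # propose a candidate
        loss = loss_fn(x)            # evaluate it
        if loss < best_loss:         # keep it if it beats the best so far
            best_x, best_loss = x, loss
    return best_x, best_loss


best_x, best_loss = random_search(toy_loss, low=-10.0, high=10.0, n_trials=200)
print("best x:", best_x, "loss:", best_loss)
```

**A Bayesian method replaces the blind `rng.uniform` proposal with one informed by the losses of earlier trials, which usually reaches a good region in fewer evaluations.**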

# HYPEROPT
**HYPEROPT is a powerful Python library that searches through a hyperparameter space and finds the values that yield the minimum of the loss function.**

**Here, Hyperopt carries out the Bayesian optimization: its Tree-structured Parzen Estimator algorithm (`tpe.suggest`) proposes the hyperparameter values to try in each trial.**

**Initializing Domain space for range of values**

In [None]:
space={'max_depth': hp.quniform("max_depth", 3, 18, 1),
        'gamma': hp.uniform ('gamma', 1,9),
        'reg_alpha' : hp.quniform('reg_alpha', 40,180,1),
        'reg_lambda' : hp.uniform('reg_lambda', 0,1),
        'colsample_bytree' : hp.uniform('colsample_bytree', 0.5,1),
        'min_child_weight' : hp.quniform('min_child_weight', 0, 10, 1),
        'n_estimators': 180,
        'seed': 0
    }
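
**One detail of this space matters for the objective function: `hp.quniform(label, low, high, q)` draws a uniform value and rounds it to a multiple of `q`, but still returns a float (for example `7.0`), while `hp.uniform` returns an arbitrary float in the range. The sketch below imitates that behaviour with the standard library to show what an `int()` cast does to each kind of value. `quniform_like` is a hypothetical stand-in, not Hyperopt's own code.**

```python
import random

rng = random.Random(0)


def quniform_like(low, high, q):
    """Imitate hp.quniform: round(uniform(low, high) / q) * q, returned as a float."""
    return float(round(rng.uniform(low, high) / q) * q)


max_depth = quniform_like(3, 18, 1)      # integer-valued, but still a float such as 12.0
colsample = rng.uniform(0.5, 1)          # a genuine fraction, like hp.uniform

print(max_depth, int(max_depth))         # the cast only changes the type
print(colsample, int(colsample))         # the cast truncates a fraction below 1 to 0
```

**This is why integer parameters such as `max_depth` are cast with `int()`, whereas fractional ones such as `colsample_bytree` and `gamma` should be passed through unchanged.**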

**Define Objective Function**

In [None]:
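def objective(space):
    # hp.quniform returns floats, so integer-valued parameters are cast with int();
    # colsample_bytree is a fraction in (0, 1] and must stay a float.
    clf = xgb.XGBClassifier(
        n_estimators=space['n_estimators'],
        max_depth=int(space['max_depth']),
        gamma=space['gamma'],
        reg_alpha=int(space['reg_alpha']),
        reg_lambda=space['reg_lambda'],
        min_child_weight=int(space['min_child_weight']),
        colsample_bytree=space['colsample_bytree'],
        random_state=space['seed'],
        # XGBoost >= 1.6 expects these in the constructor rather than in fit()
        eval_metric="auc",
        early_stopping_rounds=10)

    # The last entry of eval_set (here the test set) drives early stopping
    evaluation = [(X_train, y_train), (X_test, y_test)]

    clf.fit(X_train, y_train, eval_set=evaluation, verbose=False)

    pred = clf.predict(X_test)  # predict() already returns class labels
    accuracy = accuracy_score(y_test, pred)
    print("SCORE:", accuracy)
    # Hyperopt minimizes the loss, so return the negative accuracy
    return {'loss': -accuracy, 'status': STATUS_OK}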

**Optimization Algorithm**

In [None]:
trials = Trials()

best_hyperparams = fmin(fn = objective,
                        space = space,
                        algo = tpe.suggest,
                        max_evals = 100,
                        trials = trials)
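
**`fmin` returns a dictionary with one entry per `hp` expression in the search space, and entries sampled with `hp.quniform` come back as floats. Before refitting a final model it helps to cast those back to integers. The sketch below shows one way to do that; the helper `cast_best_params` and the sample values are hypothetical, and the sample dictionary only imitates the shape of what `fmin` returns.**

```python
# Parameters sampled with hp.quniform, which the objective function casts to int.
INTEGER_PARAMS = ("max_depth", "min_child_weight", "reg_alpha")


def cast_best_params(best, integer_params=INTEGER_PARAMS):
    """Return a copy of `best` with integer-valued entries cast from float to int."""
    return {
        name: int(value) if name in integer_params else value
        for name, value in best.items()
    }


# Made-up values in the shape returned by fmin (not real tuning results).
example_best = {
    "colsample_bytree": 0.73,
    "gamma": 2.5,
    "max_depth": 7.0,
    "min_child_weight": 4.0,
    "reg_alpha": 52.0,
    "reg_lambda": 0.31,
}

final_params = cast_best_params(example_best)
print(final_params)
```

**The resulting dictionary could then be passed as keyword arguments to `xgb.XGBClassifier`, together with the fixed `n_estimators=180`, and the model fit on the training data.**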

**Results**

In [None]:
print("The best hyperparameters are : ","\n")
print(best_hyperparams)