<a href="https://colab.research.google.com/github/dcruzsteven/autotuning/blob/master/BayesianHyperparameterTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Automated Hyperparameter Tuning**

## Installation of Required Dependencies

In [0]:
# Download and install kaggle, xgboost and scikit-learn
!pip install kaggle
!pip install xgboost
# The PyPI package is named scikit-learn; the "sklearn" alias is deprecated
!pip install scikit-learn

## Required Imports and Library Definitions

In [0]:
import os # Library for operating system manipulation
import numpy as np # Library for processing numeric vectors
import imageio # Library for dealing with images
import matplotlib.pyplot as plt # Library for plotting images
import pandas as pd # Library for pandas dataframe support
from google.colab import files # Library for colab file upload/download
import xgboost as xgb # Library for gradient-boosted trees
from sklearn.model_selection import KFold # Function for k-fold CV
from sklearn.model_selection import cross_val_score # Function for CV assessment
from sklearn.model_selection import train_test_split # Function for partitioning data

## Process Kaggle Credentials

### Specify kaggle.json (downloaded from Kaggle Profile)

In [0]:
# When prompted, select the kaggle.json file downloaded from your Kaggle profile
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"dcruzsteven","key":"<redacted>"}'}

### Configure Kaggle Environment

This auto-tuning walkthrough uses an older house price dataset
(meant for regression exploration) from the Kaggle competition
`house-prices-advanced-regression-techniques`. To download the
data, you must first register for the competition using the
Kaggle account associated with the kaggle.json file above.

In [0]:
# Construct .kaggle subdirectory (required by the kaggle python library)
!mkdir -p ~/.kaggle
# Move downloaded kaggle.json file to .kaggle directory
!mv kaggle.json ~/.kaggle/
# Restrict the file to owner read/write so the API key stays private
!chmod 600 ~/.kaggle/kaggle.json
# Download the competition data
!kaggle competitions download -c house-prices-advanced-regression-techniques

Downloading sample_submission.csv to /content
  0% 0.00/31.2k [00:00<?, ?B/s]
100% 31.2k/31.2k [00:00<00:00, 30.7MB/s]
Downloading test.csv to /content
  0% 0.00/441k [00:00<?, ?B/s]
100% 441k/441k [00:00<00:00, 60.2MB/s]
Downloading train.csv to /content
  0% 0.00/450k [00:00<?, ?B/s]
100% 450k/450k [00:00<00:00, 59.0MB/s]
Downloading data_description.txt to /content
  0% 0.00/13.1k [00:00<?, ?B/s]
100% 13.1k/13.1k [00:00<00:00, 12.8MB/s]
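
The credential steps above (create the directory, move the file, restrict its permissions) can also be expressed in plain Python. This is a minimal sketch, not part of the kaggle package: `install_kaggle_credentials` and its `config_dir` parameter are hypothetical names, and the demonstration writes placeholder credentials to a temporary directory rather than to the real home directory.

```python
import json
import os
import stat
import tempfile
from pathlib import Path


def install_kaggle_credentials(source, config_dir=None):
    """Copy a kaggle.json file into config_dir with owner-only permissions.

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    source = Path(source)
    # Default to ~/.kaggle, the directory used by the shell commands above.
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)

    # Fail early if the file lacks the two fields seen in kaggle.json.
    credentials = json.loads(source.read_text())
    missing = {"username", "key"} - set(credentials)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    target = config_dir / "kaggle.json"
    target.write_text(json.dumps(credentials))
    # Equivalent of `chmod 600`: read/write for the owner only.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# Demonstration with placeholder credentials in a temporary directory,
# so nothing under the real home directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "kaggle.json"
    src.write_text(json.dumps({"username": "example-user", "key": "example-key"}))
    installed = install_kaggle_credentials(src, Path(tmp) / ".kaggle")
    mode = stat.S_IMODE(installed.stat().st_mode)
    print(installed.name, oct(mode))
```

Validating the JSON before copying surfaces a wrong or truncated upload immediately, instead of at the first `kaggle` command.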


## Necessary Data Manipulation

### Process Raw Data

In [0]:
# Import raw dataset
rawData = pd.read_csv("train.csv")
# Remove entries with missing sale prices
rawData = rawData[~rawData.SalePrice.isna()]
# Extract labels
dataLabels = rawData['SalePrice']
# Extract features
dataFeatures = rawData.drop(['SalePrice'], axis=1)

### Partition Raw Data

In [0]:
# Construct Training+Validation and Test Partitions
# (DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
dataFeaturesTrainValid, dataFeaturesTest, \
dataLabelsTrainValid, dataLabelsTest = train_test_split(
    dataFeatures.to_numpy(), dataLabels.to_numpy(), test_size=0.15)
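
The notebook imports `KFold` but does not yet use it. As a rough sketch of how the Training+Validation partition might be divided into train/validation folds, the example below runs on synthetic arrays. The array shapes, `n_splits=5`, and the fixed `random_state` values are assumptions chosen for illustration, not settings taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the house-price features and labels (assumption:
# 100 rows, 4 numeric columns); the notebook would use its own arrays.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 4))
labels = rng.normal(size=100)

# Hold out 15% of the rows as a test partition, as in the cell above.
# random_state is fixed here only to make the sketch reproducible.
features_train_valid, features_test, labels_train_valid, labels_test = train_test_split(
    features, labels, test_size=0.15, random_state=0
)

# Split the remaining rows into 5 train/validation folds.
folds = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, valid_idx in folds.split(features_train_valid):
    fold_sizes.append((len(train_idx), len(valid_idx)))

print(len(features_test), fold_sizes)
```

Keeping the test partition outside the folds means it is never seen during tuning and can give an unbiased final estimate.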

## Construct Model Template

### Define Objective Function (for Auto-Tuner to Minimize)



In [0]:
def objective():
  """
  Construct the objective function for the optimizer to minimize
  """

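
The `objective` stub above is left empty in the notebook. One possible shape for it, offered as a sketch under stated assumptions rather than the author's implementation, is a function that scores a single hyperparameter setting by cross-validated RMSE. The parameters `params`, `features`, `labels`, and `make_model` are hypothetical additions to the signature. scikit-learn's `GradientBoostingRegressor` is swapped in as the default model so the block depends only on scikit-learn; because `xgb.XGBRegressor` follows the scikit-learn estimator interface, it could be passed as `make_model` instead. The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score


def objective(params, features, labels, make_model=GradientBoostingRegressor):
    """Return the mean cross-validated RMSE for one hyperparameter setting.

    Lower is better, so an optimizer can minimize the returned value directly.
    The argument names are illustrative assumptions, not from the notebook.
    """
    model = make_model(**params)
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    # scikit-learn scorers are "higher is better", so RMSE is reported negated.
    scores = cross_val_score(
        model, features, labels, cv=folds, scoring="neg_root_mean_squared_error"
    )
    return float(-scores.mean())


# Synthetic regression data (assumption) so the sketch runs on its own.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
labels = features @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Compare a deliberately weak setting with a stronger one.
shallow = objective({"n_estimators": 5, "max_depth": 1}, features, labels)
deeper = objective({"n_estimators": 100, "max_depth": 3}, features, labels)
print(f"shallow: {shallow:.3f}  deeper: {deeper:.3f}")
```

Returning a plain float keeps the function usable with any tuner that minimizes a scalar, whichever optimizer the notebook eventually adopts.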
In [0]:
rawData[:5]

In [0]:
!cat data_description.txt