<a href="https://colab.research.google.com/github/dcruzsteven/autotuning/blob/master/autotuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Automated Hyperparameter Tuning**

## Installation of Required Dependencies

In [1]:
# Install the Kaggle API client and XGBoost
!pip install kaggle
!pip install xgboost



## Required Imports and Library Definitions

In [0]:
import os # Library for operating system manipulation
import numpy as np # Library for processing numeric vectors
import imageio # Library for dealing with images
import matplotlib.pyplot as plt # Library for plotting images
import pandas as pd # Library for pandas dataframe support
from google.colab import files # Library for colab file upload/download
import xgboost as xgb # Library for gradient-boosted trees

## Process Kaggle Credentials

### Specify kaggle.json (downloaded from Kaggle Profile)

In [3]:
# Browse to and select the kaggle.json API token downloaded from your Kaggle profile
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"dcruzsteven","key":"<redacted>"}'}
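
Because `files.upload()` is the last expression in the cell, its return value (a dict mapping each file name to its raw bytes) is echoed into the output, API key included. The sketch below is one way to confirm the upload without displaying the key. The helper name `check_kaggle_credentials` is hypothetical and not part of the Kaggle or Colab libraries; it only assumes that kaggle.json contains the `username` and `key` fields seen above.

```python
import json


def check_kaggle_credentials(raw: bytes) -> str:
    """Validate the raw bytes of a kaggle.json file and return the username.

    Hypothetical helper: the API key is checked for presence but is never
    returned or printed.
    """
    creds = json.loads(raw.decode("utf-8"))
    if not isinstance(creds, dict):
        raise ValueError("kaggle.json must contain a JSON object")
    missing = [field for field in ("username", "key") if not creds.get(field)]
    if missing:
        raise ValueError("kaggle.json is missing fields: " + ", ".join(missing))
    return creds["username"]


# Usage in Colab (assumes `files` was imported from google.colab as above):
# uploaded = files.upload()  # assigning the result keeps the dict out of the cell output
# print("Credentials loaded for", check_kaggle_credentials(uploaded["kaggle.json"]))
```

Assigning the result of `files.upload()` to a variable is what keeps the dictionary from being displayed; the file itself is still saved to the working directory.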

### Configure Kaggle Environment

This auto-tuning walkthrough uses an older Kaggle house price
dataset intended for regression exploration
(`house-prices-advanced-regression-techniques`). To download
the data, you must first register for the competition using the
Kaggle account associated with the kaggle.json file above.

In [5]:
# Create the ~/.kaggle directory (where the kaggle Python library looks for credentials)
!mkdir -p ~/.kaggle
# Move the uploaded kaggle.json file into ~/.kaggle
!mv kaggle.json ~/.kaggle/
# Restrict the credentials file to owner read/write only
!chmod 600 ~/.kaggle/kaggle.json
# Download the competition data files
!kaggle competitions download -c house-prices-advanced-regression-techniques

mv: cannot stat 'kaggle.json': No such file or directory
Downloading sample_submission.csv to /content
  0% 0.00/31.2k [00:00<?, ?B/s]
100% 31.2k/31.2k [00:00<00:00, 28.3MB/s]
Downloading test.csv to /content
  0% 0.00/441k [00:00<?, ?B/s]
100% 441k/441k [00:00<00:00, 59.9MB/s]
Downloading train.csv to /content
  0% 0.00/450k [00:00<?, ?B/s]
100% 450k/450k [00:00<00:00, 58.2MB/s]
Downloading data_description.txt to /content
  0% 0.00/13.1k [00:00<?, ?B/s]
100% 13.1k/13.1k [00:00<00:00, 9.24MB/s]
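
The credential setup above can also be sketched in plain Python so that the cell is safe to run more than once. A `mv: cannot stat` message like the one in the output is what the shell version prints when kaggle.json is no longer in the working directory, for example because an earlier run already moved it. The function name `install_kaggle_credentials` and its `source` and `home` parameters are hypothetical, chosen for illustration; this is a minimal sketch assuming a POSIX file system such as the one Colab provides.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(source="kaggle.json", home=None):
    """Place kaggle.json in <home>/.kaggle with owner-only permissions.

    Hypothetical helper. If `source` is missing but the destination already
    exists (e.g. the cell was run before), the existing file is kept instead
    of failing. Returns the destination path.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    target = target_dir / "kaggle.json"

    source = Path(source)
    if source.exists():
        shutil.move(str(source), str(target))  # same effect as mv
    elif not target.exists():
        raise FileNotFoundError(f"no credentials at {source} or {target}")

    target.chmod(stat.S_IRUSR | stat.S_IWUSR)  # same effect as chmod 600
    return target


# Usage: install_kaggle_credentials() and then run the download command as before.
```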


## Necessary Data Manipulation

### Create Training and Test Datasets

In [7]:
!cat data_description.txt

MSSubClass: Identifies the type of dwelling involved in the sale.	

        20	1-STORY 1946 & NEWER ALL STYLES
        30	1-STORY 1945 & OLDER
        40	1-STORY W/FINISHED ATTIC ALL AGES
        45	1-1/2 STORY - UNFINISHED ALL AGES
        50	1-1/2 STORY FINISHED ALL AGES
        60	2-STORY 1946 & NEWER
        70	2-STORY 1945 & OLDER
        75	2-1/2 STORY ALL AGES
        80	SPLIT OR MULTI-LEVEL
        85	SPLIT FOYER
        90	DUPLEX - ALL STYLES AND AGES
       120	1-STORY PUD (Planned Unit Development) - 1946 & NEWER
       150	1-1/2 STORY PUD - ALL AGES
       160	2-STORY PUD - 1946 & NEWER
       180	PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
       190	2 FAMILY CONVERSION - ALL STYLES AND AGES

MSZoning: Identifies the general zoning classification of the sale.
		
       A	Agriculture
       C	Commercial
       FV	Floating Village Residential
       I	Industrial
       RH	Residential High Density
       RL	Residential Low Density
       RP	Residential Low Density Park 
       RM