# Boston Housing Data
We are going to work with the Boston housing dataset and predict the median value of owner-occupied homes, in thousands of dollars ($1000s). I'm going to use a neural network regressor.

## Data
Samples: 506
Features: 13, real-valued and non-negative
Total size = [506, 14] (13 features plus the label column)

We need to separate the label (the median value, column 14) from the features:
x: [506, 13]
y: [506, 1]

data = { [x, y] }
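
As a minimal sketch of pulling out the label, assuming the raw table is a NumPy array with the label in the last column (the array below is random placeholder data, not the real dataset):

```python
import numpy as np

# Placeholder table with the same shape as the housing data:
# 13 feature columns followed by 1 label column.
rng = np.random.default_rng(0)
table = rng.random((506, 14))

x = table[:, :13]   # features, shape (506, 13)
y = table[:, 13:]   # label column kept 2-D, shape (506, 1)

print(x.shape, y.shape)
```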

Then we need to split the data into a training set and a test set, using an 80%/20% ratio.

Training set: 80%
training_set = { x_train, y_train }

Test set: 20%
test_set = { x_test, y_test }
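
The split can be sketched with a shuffled index, again on placeholder arrays. The notebook code itself relies on `model_selection.train_test_split`; this version only illustrates the resulting sizes, and it assumes the 20% test share is rounded up to a whole number of samples:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random((506, 13))   # placeholder features
y = rng.random((506, 1))    # placeholder labels

n_samples = x.shape[0]
n_test = int(np.ceil(0.2 * n_samples))  # 20% of 506, rounded up

# Shuffle the row indices once, then cut them into test and training parts.
order = rng.permutation(n_samples)
test_idx, train_idx = order[:n_test], order[n_test:]

x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

print(x_train.shape, x_test.shape)  # (404, 13) (102, 13)
```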

## Implementation
1. Get the data
2. Preprocess the data
3. Set up the model
4. Set up the input pipelines
5. Train the model
6. Test the model
7. Make a prediction
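
To make the seven steps concrete, here is a rough end-to-end sketch. It is not the TensorFlow code of this notebook: it swaps in plain NumPy, synthetic data, and a hand-written one-hidden-layer network trained with mini-batch SGD. The hidden size, learning rate, and all helper names (`forward`, `mse`) are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Get the data (synthetic stand-in shaped like the housing table).
n_samples, n_features = 506, 13
x = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
true_w = rng.normal(size=(n_features, 1))
y = x @ true_w + 0.1 * rng.normal(size=(n_samples, 1))

# 2. Preprocess: 80%/20% split, then standardize with training statistics.
order = rng.permutation(n_samples)
n_test = int(np.ceil(0.2 * n_samples))
x_test, y_test = x[order[:n_test]], y[order[:n_test]]
x_train, y_train = x[order[n_test:]], y[order[n_test:]]
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train, x_test = (x_train - mean) / std, (x_test - mean) / std

# 3. Set up the model: one hidden ReLU layer and a linear output.
hidden_units = 16
w1 = rng.normal(scale=0.1, size=(n_features, hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(scale=0.1, size=(hidden_units, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Return hidden activations and predictions for a batch."""
    hidden = np.maximum(0.0, inputs @ w1 + b1)
    return hidden, hidden @ w2 + b2


def mse(pred, target):
    """Mean squared error as a Python float."""
    return float(np.mean((pred - target) ** 2))


mse_before = mse(forward(x_test)[1], y_test)

# 4. + 5. Input pipeline and training: random mini-batches, plain SGD.
batch_size, num_steps, learning_rate = 10, 2000, 0.01
for step in range(num_steps):
    idx = rng.integers(0, len(x_train), size=batch_size)
    xb, yb = x_train[idx], y_train[idx]
    hidden, pred = forward(xb)
    grad_pred = 2.0 * (pred - yb) / batch_size       # d(MSE)/d(pred)
    grad_hidden = (grad_pred @ w2.T) * (hidden > 0)  # backprop through ReLU
    w2 -= learning_rate * (hidden.T @ grad_pred)
    b2 -= learning_rate * grad_pred.sum(axis=0)
    w1 -= learning_rate * (xb.T @ grad_hidden)
    b1 -= learning_rate * grad_hidden.sum(axis=0)

# 6. Test the model on the held-out set.
mse_after = mse(forward(x_test)[1], y_test)

# 7. Make a prediction for one (already standardized) example.
prediction = forward(x_test[:1])[1]
print(mse_before, mse_after, prediction.shape)
```

The features are standardized with statistics from the training set only, so no information from the test set leaks into training.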

In [1]:
# Dependencies
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import os
import collections
import time
import numpy as np
import requests
import csv

## New stuff
from sklearn import datasets
from sklearn import model_selection

In [2]:
# Configurations
# Define the Dataset container type (features in `data`, labels in `target`)
Dataset = collections.namedtuple('Dataset', ['data', 'target'])
    
# TF Logging
tf.logging.set_verbosity(tf.logging.INFO)

# Directories
model_dir_root = '/tmp/housing'
# model_dir = os.path.join(model_dir_root, "linear_" + str(int(time.time())))
model_dir = os.path.join(model_dir_root, "deep_" + str(int(time.time())))
data_file = 'boston_housing.csv'

# hyper parameters
batch_size = 10
num_steps = 2000
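

For a sense of scale, these two hyperparameters can be converted into passes over the training data. The sketch assumes each step consumes exactly one batch and that the training set holds 404 samples (506 minus a 20% test set of 102); both are assumptions, not values printed by this notebook.

```python
batch_size = 10
num_steps = 2000
n_train = 404  # assumed training-set size after the 80%/20% split

samples_seen = batch_size * num_steps   # total examples fed to the model
epochs = samples_seen / n_train         # approximate passes over the data

print(samples_seen, round(epochs, 1))  # 20000 49.5
```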

In [3]:
def get_data ():
#     raw_data = '[0,0,0,0,0,0,0,0,0,0,0,0,0,0]'
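    # Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
    # so this call needs an older scikit-learn release.
    raw_data = datasets.load_boston()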
    
    return raw_data 

In [4]:
def explore_data (data):
    print('Description: {}\n'.format(data.DESCR))
    print('Features: {}\n'.format(data.feature_names))
    print('Example 0: {}\n'.format(data.data[0]))
    print('Target 0: {}\n'.format(data.target[0]))

In [5]:
def split_dataset (data, test_size):
    # set x and y
    x, y = data.data, data.target
    
    # split data
    x_train, x_test, y_train, y_test = model_selection.train_test_split(
        x, y, test_size=test_size, random_state=42)
    
    # Create the training set
    train_set = create_dataset(x_train, y_train)
    
    # Create the test set
    test_set = create_dataset(x_test, y_test)
    
    return test_set, train_set

In [6]:
def create_dataset (x, y):
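    # convert to NumPy arrays; keep the target as float because this is a
    # regression task (an integer dtype would truncate the median values)
    target = np.array(y, dtype=float)
    data = np.array(x)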
    
    # merge into Dataset object
    new_dataset = Dataset(data=data, target=target)
    
    return new_dataset
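

Since the target is a continuous price, the dtype chosen for the target array determines the labels the model sees. A small sketch with illustrative values (made up for this example, not copied from the notebook output):

```python
import numpy as np

prices = [24.0, 21.6, 34.7]  # illustrative median values in $1000s

as_int = np.array(prices, dtype=int)      # fractional part is dropped
as_float = np.array(prices, dtype=float)  # values are preserved

print(as_int)  # [24 21 34]
print(as_float)
```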

In [7]:
def construct_dataset ():
    # get the data
    raw_data = get_data()
    
    # explore the data
    explore_data(raw_data)
    
    # split the dataset 
    # 80%/20%
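    test_set, training_set = split_dataset(raw_data, .2)

    # hand both sets back to the caller so later steps can use them
    return test_set, training_set

In [8]:
# Main Program
test_set, training_set = construct_dataset()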

Description: Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio b

In [9]:
# tensorboard --logdir=/tmp/housing