# Problem Statement

## About the Dataset

**California Housing Data**

This dataset describes every block group in California as recorded in the 1990 Census. In this sample, a block group contains 1425.5 people on average, living in a geographically compact area.

*The Features:*
 
* housingMedianAge: continuous. 
* totalRooms: continuous. 
* totalBedrooms: continuous. 
* population: continuous. 
* households: continuous. 
* medianIncome: continuous. 
* medianHouseValue: continuous. 

## Task 
The task is to approximate the median house value of each block group from the values of the other variables.

The data was obtained from the LIACC repository. The original page where the dataset can be found is: http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.
 

## Importing Required Packages

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import tensorflow.compat.v1 as tf

### Required Package Setup

In [2]:
tf.disable_eager_execution()

## Reading CSV

**Performing the following actions:**

* Importing the cal_housing_clean.csv file with pandas.
* Separating it into a training set (70%) and a testing set (30%).

In [3]:
cal_housing_data = pd.read_csv('cal_housing_clean.csv')

### Printing Dataset Head

In [4]:
cal_housing_data.head()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome | medianHouseValue |
|---|---|---|---|---|---|---|---|
| 0 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 |
| 1 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 |
| 2 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 |
| 3 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 |
| 4 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 |


### Describing the Dataset

In [5]:
cal_housing_data.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| housingMedianAge | 20640.0 | 28.639486 | 12.585558 | 1.0 | 18.0 | 29.0 | 37.0 | 52.0 |
| totalRooms | 20640.0 | 2635.763081 | 2181.615252 | 2.0 | 1447.75 | 2127.0 | 3148.0 | 39320.0 |
| totalBedrooms | 20640.0 | 537.898014 | 421.247906 | 1.0 | 295.0 | 435.0 | 647.0 | 6445.0 |
| population | 20640.0 | 1425.476744 | 1132.462122 | 3.0 | 787.0 | 1166.0 | 1725.0 | 35682.0 |
| households | 20640.0 | 499.53968 | 382.329753 | 1.0 | 280.0 | 409.0 | 605.0 | 6082.0 |
| medianIncome | 20640.0 | 3.870671 | 1.899822 | 0.4999 | 2.5634 | 3.5348 | 4.74325 | 15.0001 |
| medianHouseValue | 20640.0 | 206855.816909 | 115395.615874 | 14999.0 | 119600.0 | 179700.0 | 264725.0 | 500001.0 |


### Separating Features from Predicted Values

In [6]:
medianHouseValue = cal_housing_data['medianHouseValue']

In [7]:
X_data = cal_housing_data.drop(columns=['medianHouseValue'])

### Performing Train-Test Split

In [8]:
from sklearn.model_selection import train_test_split

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X_data, medianHouseValue, test_size=0.3, random_state=101)
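
To make the 70/30 split concrete, here is a minimal sketch in plain Python of what a shuffled hold-out split does to the row indices. The helper `shuffled_split` is hypothetical and is not scikit-learn's implementation, so the rows it selects for a given seed will differ from those chosen by `train_test_split` with `random_state=101`. Only the set sizes are meant to be comparable.

```python
import random

def shuffled_split(n_rows, test_fraction, seed):
    """Return (train_indices, test_indices) for a shuffled hold-out split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_test = round(n_rows * test_fraction)    # rows held out for testing
    return indices[n_test:], indices[:n_test]

# 20640 is the row count reported by describe() above.
train_idx, test_idx = shuffled_split(n_rows=20640, test_fraction=0.3, seed=101)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split identical across runs, which is what `random_state` does in the cell above.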

## Data Preprocessing

### Scaling the Feature Data

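**Using sklearn's preprocessing module to create a MinMaxScaler for the feature data. The scaler is fit on X_train only and then used to transform both X_train and X_test, so that no information from the test set leaks into the scaling.**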

In [10]:
from sklearn.preprocessing import MinMaxScaler

In [11]:
scaler = MinMaxScaler(copy=True)

In [12]:
scaler.fit(X_train)

MinMaxScaler()

In [13]:
scaled_X_train = scaler.transform(X_train)
scaled_X_train_df = pd.DataFrame(scaled_X_train, columns=X_train.columns, index=X_train.index)

In [14]:
scaled_X_train_df.head()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome |
|---|---|---|---|---|---|---|
| 6761 | 0.352941 | 0.069688 | 0.117163 | 0.048769 | 0.115442 | 0.142508 |
| 3010 | 0.607843 | 0.011242 | 0.015673 | 0.008367 | 0.014142 | 0.045027 |
| 7812 | 0.666667 | 0.02523 | 0.031347 | 0.020971 | 0.030258 | 0.212866 |
| 8480 | 0.666667 | 0.03253 | 0.03383 | 0.024752 | 0.030094 | 0.298651 |
| 1051 | 0.294118 | 0.031919 | 0.035692 | 0.019466 | 0.034863 | 0.272631 |


In [15]:
scaled_X_test = scaler.transform(X_test)
scaled_X_test_df = pd.DataFrame(scaled_X_test, columns=X_test.columns, index=X_test.index)

In [16]:
scaled_X_test_df.head()

| | housingMedianAge | totalRooms | totalBedrooms | population | households | medianIncome |
|---|---|---|---|---|---|---|
| 16086 | 0.686275 | 0.046264 | 0.045158 | 0.025873 | 0.048841 | 0.353133 |
| 8816 | 0.705882 | 0.027417 | 0.020795 | 0.012709 | 0.023187 | 0.770182 |
| 7175 | 0.901961 | 0.032326 | 0.040813 | 0.041662 | 0.042592 | 0.133626 |
| 16714 | 0.313725 | 0.043212 | 0.046089 | 0.03284 | 0.048018 | 0.263576 |
| 14491 | 0.411765 | 0.088433 | 0.069367 | 0.043728 | 0.072192 | 0.660046 |
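
The scaling applied above can be sketched in plain Python to make the arithmetic visible. Each column is mapped with x' = (x - min) / (max - min), where min and max come from the training rows only. The helpers `fit_min_max` and `transform_min_max` and the toy numbers are made up for illustration. This is a simplified sketch, not scikit-learn's code, and it does not guard against a constant column (which would divide by zero).

```python
def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows, col_min, col_max):
    """Apply x' = (x - min) / (max - min) column by column."""
    return [
        [(x - lo) / (hi - lo) for x, lo, hi in zip(row, col_min, col_max)]
        for row in rows
    ]

# Toy data (made up for illustration): columns are [age, income].
train_rows = [[10.0, 2.0], [30.0, 4.0], [50.0, 6.0]]
test_rows = [[40.0, 7.0]]

col_min, col_max = fit_min_max(train_rows)            # statistics from train only
scaled_train = transform_min_max(train_rows, col_min, col_max)
scaled_test = transform_min_max(test_rows, col_min, col_max)
print(scaled_train)
print(scaled_test)
```

Because the test rows reuse the training minimum and maximum, a scaled test value can fall outside the range 0 to 1, as the income value in the toy test row does.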


## Training the model

### Creating Feature Columns

In [17]:
cal_housing_data.columns

Index(['housingMedianAge', 'totalRooms', 'totalBedrooms', 'population',
       'households', 'medianIncome', 'medianHouseValue'],
      dtype='object')

In [18]:
feat_cols = {}
for col in scaled_X_train_df.columns:
    feat_cols[col] = tf.feature_column.numeric_column(col)

### Creating Input Function

**Creating the input function for the estimator object.**

In [19]:
train_input_func = tf.estimator.inputs.pandas_input_fn(x=scaled_X_train_df, y=y_train, batch_size=10, num_epochs=1000, shuffle=True)





### Creating the Estimator Model

**Creating the estimator model, using the DNNRegressor.**

In [20]:
regressor_model = tf.estimator.DNNRegressor(hidden_units=[6, 6, 6], feature_columns=list(feat_cols.values()))

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\niks8\\AppData\\Local\\Temp\\tmpge217w4s', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


### Running the Training Simulation

**Training the model.**

In [21]:
regressor_model.train(input_fn=train_input_func, steps=25000)

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\niks8\AppData\Local\Temp\tmpge217w4s\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoi

INFO:tensorflow:global_step/sec: 306.714
INFO:tensorflow:loss = 58222350000.0, step = 5700 (0.318 sec)
INFO:tensorflow:global_step/sec: 336.655
INFO:tensorflow:loss = 50018566000.0, step = 5800 (0.307 sec)
INFO:tensorflow:global_step/sec: 312.262
INFO:tensorflow:loss = 91043610000.0, step = 5900 (0.329 sec)
INFO:tensorflow:global_step/sec: 256.629
INFO:tensorflow:loss = 123272280000.0, step = 6000 (0.376 sec)
INFO:tensorflow:global_step/sec: 284.544
INFO:tensorflow:loss = 105223740000.0, step = 6100 (0.357 sec)
INFO:tensorflow:global_step/sec: 286.681
INFO:tensorflow:loss = 67487486000.0, step = 6200 (0.338 sec)
INFO:tensorflow:global_step/sec: 236.617
INFO:tensorflow:loss = 155310180000.0, step = 6300 (0.423 sec)
INFO:tensorflow:global_step/sec: 256.29
INFO:tensorflow:loss = 102417600000.0, step = 6400 (0.404 sec)
INFO:tensorflow:global_step/sec: 275.277
INFO:tensorflow:loss = 110649990000.0, step = 6500 (0.356 sec)
INFO:tensorflow:global_step/sec: 299.571
INFO:tensorflow:loss = 15816

INFO:tensorflow:global_step/sec: 312.489
INFO:tensorflow:loss = 72377260000.0, step = 13600 (0.324 sec)
INFO:tensorflow:global_step/sec: 357.53
INFO:tensorflow:loss = 49766433000.0, step = 13700 (0.286 sec)
INFO:tensorflow:global_step/sec: 333.312
INFO:tensorflow:loss = 45937836000.0, step = 13800 (0.300 sec)
INFO:tensorflow:global_step/sec: 333.445
INFO:tensorflow:loss = 215395910000.0, step = 13900 (0.300 sec)
INFO:tensorflow:global_step/sec: 333.281
INFO:tensorflow:loss = 55581450000.0, step = 14000 (0.300 sec)
INFO:tensorflow:global_step/sec: 333.071
INFO:tensorflow:loss = 42609046000.0, step = 14100 (0.300 sec)
INFO:tensorflow:global_step/sec: 333.265
INFO:tensorflow:loss = 190684640000.0, step = 14200 (0.293 sec)
INFO:tensorflow:global_step/sec: 333.608
INFO:tensorflow:loss = 158195250000.0, step = 14300 (0.303 sec)
INFO:tensorflow:global_step/sec: 333.136
INFO:tensorflow:loss = 119274730000.0, step = 14400 (0.304 sec)
INFO:tensorflow:global_step/sec: 333.3
INFO:tensorflow:loss =

INFO:tensorflow:loss = 100383870000.0, step = 20900 (0.305 sec)
INFO:tensorflow:global_step/sec: 312.224
INFO:tensorflow:loss = 197423400000.0, step = 21000 (0.322 sec)
INFO:tensorflow:global_step/sec: 321.957
INFO:tensorflow:loss = 114947965000.0, step = 21100 (0.304 sec)
INFO:tensorflow:global_step/sec: 317.842
INFO:tensorflow:loss = 47187620000.0, step = 21200 (0.314 sec)
INFO:tensorflow:global_step/sec: 308.102
INFO:tensorflow:loss = 84838010000.0, step = 21300 (0.320 sec)
INFO:tensorflow:global_step/sec: 322.903
INFO:tensorflow:loss = 200026260000.0, step = 21400 (0.310 sec)
INFO:tensorflow:global_step/sec: 332.689
INFO:tensorflow:loss = 115525860000.0, step = 21500 (0.310 sec)
INFO:tensorflow:global_step/sec: 316.515
INFO:tensorflow:loss = 113987780000.0, step = 21600 (0.312 sec)
INFO:tensorflow:global_step/sec: 308.524
INFO:tensorflow:loss = 78432000000.0, step = 21700 (0.329 sec)
INFO:tensorflow:global_step/sec: 308.785
INFO:tensorflow:loss = 28307196000.0, step = 21800 (0.319 

<tensorflow_estimator.python.estimator.canned.dnn.DNNRegressor at 0x2001e62df70>
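
It can help to relate `steps=25000` to the input function's `batch_size=10` and `num_epochs=1000`. The sketch below works through that arithmetic. The training-set size of 14448 rows is an assumption (70% of the 20640 rows reported earlier), and the calculation assumes each training step consumes exactly one batch.

```python
import math

train_rows = 14448   # assumed: 70% of the 20640 rows in the dataset
batch_size = 10      # from the training input function
num_epochs = 1000    # from the training input function
steps = 25000        # from the train() call

# Batches needed for one full pass over the training data.
steps_per_epoch = math.ceil(train_rows / batch_size)

# Batches the input function can supply before it runs out of data.
steps_available = math.ceil(train_rows * num_epochs / batch_size)

# Approximate number of passes made in the requested number of steps.
epochs_trained = steps * batch_size / train_rows

print(steps_per_epoch, steps_available, round(epochs_trained, 1))
```

Under these assumptions the input function supplies far more batches than the 25000 steps requested, so `steps` is what ends training, after roughly 17 passes over the data.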

## Predictions on the model

### Creating Prediction Input Function

**Creating a prediction input function.**

In [22]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(x=scaled_X_test_df, batch_size=10, num_epochs=1, shuffle=False)

### Running the Prediction on the model

In [23]:
predictions = regressor_model.predict(input_fn = pred_input_func)

### Conversion of Results

In [24]:
predictions = [pred['predictions'] for pred in list(predictions)]

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\niks8\AppData\Local\Temp\tmpge217w4s\model.ckpt-25000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


## Calculating RMSE over Predictions

In [25]:
from sklearn.metrics import mean_squared_error

In [26]:
mean_squared_error(y_test, predictions)**0.5

99729.88207803387
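
The value above is a root mean squared error (RMSE): the square root of the mean squared error, expressed in the same units as the target (dollars). A minimal sketch of the calculation in plain Python follows. The function name `rmse` and the toy values are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values (made up for illustration), in dollars.
actual = [200000.0, 300000.0, 100000.0]
predicted = [210000.0, 280000.0, 100000.0]
error = rmse(actual, predicted)
print(round(error, 2))
```

Squaring penalizes large errors more heavily than small ones, so a few badly mispredicted blocks can dominate the score.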

## Conclusion

In [27]:
print(f"The final RMSE on the test set is {mean_squared_error(y_test, predictions)**0.5}")

The final RMSE on the test set is 99729.88207803387
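
One rough way to put this number in context is to compare it with the spread of the target itself. A model that always predicts the mean house value has an RMSE approximately equal to the target's standard deviation. The sketch below uses the standard deviation of medianHouseValue from the describe() output above. That figure covers all rows rather than the test set alone, so the comparison is only indicative.

```python
model_rmse = 99729.88207803387   # test-set RMSE reported above
target_std = 115395.615874       # std of medianHouseValue from describe(), all rows

# Ratio below 1.0 means the model beats the mean-only baseline.
ratio = model_rmse / target_std
improvement_pct = (1 - ratio) * 100

print(f"RMSE / std = {ratio:.3f} ({improvement_pct:.1f}% below the mean-only baseline)")
```

A ratio this close to 1 suggests that the network improves only modestly on predicting the mean for every block group.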
