# CIS 535 Analysis of Dataset "V"

Here, we consume file `./data/v.txt` to obtain a set of 1000 observations. The dataset has an unknown origin, and goals are not specified on whether we should expect any correlations in the data.

Each observation with 10 columns. The first 9 columns representing inputs, and the last column respresenting the output.

We will use Tensorflow to analyze these records and create a neural network.

In order to render graphs in our notebook, we need to install `matplotlib` into our tensorflow conda environment.


In [23]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

# Perhaps not necessary
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

## Reading in Data

Here, we read our data from `./data/v.txt` into two separate variables: `v_inputs` and `v_output`.

In [21]:
# Data sets
V_TRAINING = "data/v_training.csv"
V_TEST     = "data/v_test.csv"
V_PREDICT  = "data/v_predict.csv"

# We do not know exactly what our data represents,
# but we do know our label is the last column.
COLUMNS  = "a b c d e f g h i j label".split(" ")
FEATURES = COLUMNS[:-1]
LABEL    = COLUMNS[-1]

print("COLUMNS", COLUMNS)
print("FEATURES", FEATURES)
print("LABEL", LABEL)

COLUMNS ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'label']
FEATURES ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
LABEL label


In [24]:
# Read in data

# skip header row
training_set = pd.read_csv(V_TRAINING, skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
test_set = pd.read_csv(V_TEST, skipinitialspace=True,
                       skiprows=1, names=COLUMNS)
prediction_set = pd.read_csv(V_PREDICT, skipinitialspace=True,
                             skiprows=1, names=COLUMNS)


### Feature Columns

Now that we have our data, we will create a list of `FeatureColumn` for the input data.

All of our columns consist of continuous values, so we may declare all of them using:

```python
feature_cols = [tf.contrib.layers.real_valued_column(k)
                  for k in FEATURES]
```

In [25]:
feature_cols = [tf.contrib.layers.real_valued_column(k)
                for k in FEATURES]

### Regressor

And, we can create our regressor.

> Now, instantiate a `DNNRegressor` for the neural network regression model. You'll need to provide two arguments here: `hidden_units`, a hyperparameter specifying the number of nodes in each hidden layer (here, two hidden layers with 10 nodes each), and `feature_columns`, containing the list of FeatureColumns you just defined:



In [31]:
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_cols, hidden_units=[10, 10])

Explicitly set `enable_centered_bias` to 'True' if you want to keep existing behaviour.
INFO:tensorflow:Using config: {'num_ps_replicas': 0, 'keep_checkpoint_every_n_hours': 10000, 'evaluation_master': '', 'tf_random_seed': None, '_is_chief': True, 'save_summary_steps': 100, 'master': '', 'tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, 'keep_checkpoint_max': 5, 'save_checkpoints_secs': 600, 'task': 0, '_job_name': None, 'cluster_spec': None}


### Input Function

The body of the input function contains the specific logic for preprocessing your input data, such as scrubbing out bad examples or feature scaling.

In [32]:
def input_fn(data_set):
    # Values of feature columns
    feature_cols = {k: tf.constant(data_set[k].values)
                    for k in FEATURES }
    # Return the label values of the passed data_set
    label_tensor = tf.constant(data_set[LABEL].values)
    
    # Return your data split apart for the Regressor,
    # This allows the columns to be any order, and to quickly change which
    # columns you want to observe.
    return feature_cols, label_tensor


Note that the input data is passed into input_fn in the data_set argument, which means the function can process any of the DataFrames you've imported: training_set, test_set, and prediction_set.

### Training the Regressor

In [33]:
regressor.fit(input_fn=lambda: input_fn(training_set), steps=5000)

INFO:tensorflow:Setting feature info to {'b': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'c': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'e': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'i': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'd': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'j': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'f': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'g': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'a': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False), 'h': TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(900)]), is_sparse=False)}
INFO:tensorflow:Set

DNNRegressor(dropout=None, feature_columns=[_RealValuedColumn(column_name='a', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='b', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='c', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='d', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='e', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='f', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='g', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='h', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(column_name='i', dimension=1, default_value=None, dtype=tf.float32, normalizer=None), _RealValuedColumn(co

### Evaluating the Model

> Next, see how the trained model performs against the test data set. Run evaluate, and this time pass the test_set to the input_fn:

In [34]:
test_results

INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='a', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='b', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='c', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='d', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='e', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='f', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='g', dimension=1, default_value=None, dtype=

Now, we have created an evaluation, we can output a loss score from our `ev` results.

In [35]:
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))

Loss: 0.001041


### Making Predictions

So, now that we have trained and evaluated our model, we can start to try making predictions with it!

We can use `regressor.predict` to only read from the feature columns, and forecast the labels.

In [48]:
y = regressor.predict(input_fn=lambda: input_fn(prediction_set), as_iterable=False)

print("Predictions: {}".format(str(y)))

Instructions for updating:
The default behavior of predict() is changing. The default value for
as_iterable will change to True, and then the flag will be removed
altogether. The behavior of this flag is described below.
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='a', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='b', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='c', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='d', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='e', dimension=1, default_value=None, dtype=tf.float32, normalizer=None)
INFO:tensorflow:Transforming feature_column 

Predictions: [-0.03536016 -0.49997067 -0.0100856  -0.42367306 -0.36693686]


In [None]:
# Training Data
# first 9 columns
f = open('data/v.txt', 'r')
lns = f.readlines()

# Create a 2D array of the rows, and values
v_data = [ [ float(f)
        for f in l.split('\t') ]
            for l in lns ]

v_inputs = [record[0:10] for record in v_data]
v_output = [record[10] for record in v_data]


In [None]:

# Let us start building our neural network with a model

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")



In [None]:
# Construct a linear model
pred = tf.add(tf.mul(X, W), b)


In [28]:
#self.name = f.readline().rstrip()
#self.age = int(f.readline())
#self.height = float(f.readline())
#self.weight = int(f.readline())

# this input is of the 2D shape [any length], [9]
rX = tf.placeholder(tf.float32, [None, 9])
rY = tf.placeholder(tf.float32)


train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])

n_samples = train_X.shape[0]

In [8]:
print(train_Y)
print(n_samples)

[ 1.7    2.76   2.09   3.19   1.694  1.573  3.366  2.596  2.53   1.221
  2.827  3.465  1.65   2.904  2.42   2.94   1.3  ]
17


In [2]:
"""Model training for Iris data set using Validation Monitor."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.metric_spec import MetricSpec

tf.logging.set_verbosity(tf.logging.INFO)

# Data sets
V_TRAINING = "data/v_training.csv"
V_TEST = "data/v_test.csv"

# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=V_TRAINING, target_dtype=np.float, features_dtype=np.float)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=V_TEST, target_dtype=np.float, features_dtype=np.float)

validation_metrics = {
    "accuracy": MetricSpec(
                        metric_fn=tf.contrib.metrics.streaming_accuracy,
                        prediction_key="classes"),
    "recall": MetricSpec(
                        metric_fn=tf.contrib.metrics.streaming_recall,
                        prediction_key="classes"),
    "precision": MetricSpec(
                        metric_fn=tf.contrib.metrics.streaming_precision,
                        prediction_key="classes")
                      }
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
    test_set.data,
    test_set.target,
    every_n_steps=50,
    metrics=validation_metrics,
    early_stopping_metric="loss",
    early_stopping_metric_minimize=True,
    early_stopping_rounds=200)

# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=10)]

# Build 3 layer DNN with 10, 20, 10 units respectively.
estimator = tf.contrib.learn.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[10, 20, 10],
                                          n_classes=3,
                                          model_dir="/tmp/v_model",
                                          config=tf.contrib.learn.RunConfig(
                                              save_checkpoints_secs=1))

# Fit model.
estimator.fit(x=training_set.data,
               y=training_set.target,
               steps=2000,
               monitors=[validation_monitor])

# Evaluate accuracy.
accuracy_score = estimator.evaluate(x=test_set.data,
                                     y=test_set.target)["accuracy"]
print("Accuracy: {0:f}".format(accuracy_score))

# Classify two new flower samples.
new_samples = np.array(
    [[0.865874708,0.820106596,0.150704431,0.627245984,0.19453678,0.913716318,0.569297894,0.643728125,0.275551608,0.649479039],
     # expecting 0.177251929
     [0.343617366,0.140894357,0.9011357,0.416433035,0.714652943,0.953001468,0.2806044,0.538037583,0.123400512,0.181083975]
     # expecting -0.633059251
    ], dtype=float)

y = list(estimator.predict(new_samples, as_iterable=True))
print("Predictions: {}".format(str(y)))

Explicitly set `enable_centered_bias` to 'True' if you want to keep existing behaviour.
INFO:tensorflow:Using config: {'num_ps_replicas': 0, 'keep_checkpoint_every_n_hours': 10000, 'evaluation_master': '', 'tf_random_seed': None, '_is_chief': True, 'save_summary_steps': 100, 'master': '', 'tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, 'keep_checkpoint_max': 5, 'save_checkpoints_secs': 1, 'task': 0, '_job_name': None, 'cluster_spec': None}
INFO:tensorflow:Setting feature info to TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(None), Dimension(10)]), is_sparse=False)
INFO:tensorflow:Setting targets info to TensorSignature(dtype=tf.float64, shape=TensorShape([Dimension(None)]), is_sparse=False)
INFO:tensorflow:Transforming feature_column _RealValuedColumn(column_name='', dimension=10, default_value=None, dtype=tf.float32, normalizer=None)


TypeError: DataType float64 for attr 'Tlabels' not in list of allowed values: int32, int64