## Challenge Exercise

Create a neural network that is capable of finding the volume of a cylinder given the radius of its base (r) and its height (h). Assume that the radius and height of the cylinder are both in the range 0.5 to 2.0. Unlike in the challenge exercise for b_estimator.ipynb, assume that your measurements of r, h and V are all rounded off to the nearest 0.1. Simulate the necessary training dataset. This time, you will need a lot more data to get a good predictor.

Hint (highlight to see):
<p style='color:white'>
Create random values for r and h and compute V. Then, round off r, h and V (i.e., the volume is computed from the true value of r and h; it's only your measurement that is rounded off). Your dataset will consist of the round values of r, h and V. Do this for both the training and evaluation datasets.
</p>

Now modify the "noise" so that instead of just rounding off the value, there is up to a 10% error (uniformly distributed) in the measurement followed by rounding off.

In [39]:
import numpy as np
import pandas as pd
import tensorflow as tf
import math
import shutil

In [40]:
df_train = pd.DataFrame(np.random.uniform(0.5, 2, 1000000), columns = ['radius'])
df_train['height'] = np.random.uniform(0.5, 2, 1000000)

df_valid = pd.DataFrame(np.random.uniform(0.5, 2, 10000), columns = ['radius'])
df_valid['height'] = np.random.uniform(0.5, 2, 10000)

df_test = pd.DataFrame(np.random.uniform(0.5, 2, 10000), columns = ['radius'])
df_test['height'] = np.random.uniform(0.5, 2, 10000)

In [41]:
labels = []
for index, row in df_train.iterrows():
  volume = math.pi * row['radius']**2 * row['height']
  labels.append(volume)
df_train['volume'] = labels

labels.clear()
for index, row in df_valid.iterrows():
  volume = math.pi * row['radius']**2 * row['height']
  labels.append(volume)
df_valid['volume'] = labels

labels.clear()
for index, row in df_test.iterrows():
  volume = math.pi * row['radius']**2 * row['height']
  labels.append(volume)
df_test['volume'] = labels

In [42]:
df_train = round(df_train, 1)
df_valid = round(df_valid, 1)
df_test = round(df_test, 1)

In [47]:
df_train.to_csv('df_train.csv', header = False)
df_valid.to_csv('df_valid.csv', header = False)
df_test.to_csv('df_test.csv', header = False)

In [48]:
CSV_COLUMNS = ['radius', 'height', 'volume', 'key']
DEFAULTS = [[1.0], [1.0], [5.0], ['nokey']]

def read_dataset(filename, mode, batch_size = 512) :
  def decode_csv(row):
    columns = tf.decode_csv(row, record_defaults = DEFAULTS)
    features = dict(zip(CSV_COLUMNS, columns))
    label = features.pop('volume')
    return features, label
  
  filenames_dataset = tf.data.Dataset.list_files(filename)
  textlines_dataset = filenames_dataset.flat_map(tf.data.TextLineDataset)
  dataset = textlines_dataset.map(decode_csv)
  
  if mode == tf.estimator.ModeKeys.TRAIN:
    num_epochs = None
    dataset = dataset.shuffle(buffer_size = 10 * batch_size)
  else:
    num_epochs = 1

  dataset = dataset.repeat(num_epochs).batch(batch_size)
  
  return dataset

def get_train_input_fn():
  return read_dataset('./df_train.csv', mode = tf.estimator.ModeKeys.TRAIN)

def get_valid_input_fn():
  return read_dataset('./df_valid.csv', mode = tf.estimator.ModeKeys.EVAL)

In [49]:
INPUT_COLUMNS = [
  tf.feature_column.numeric_column('radius'),
  tf.feature_column.numeric_column('height'),
]

def add_more_features(feats):
  # nothing yet
  return feats

feature_cols = add_more_features(INPUT_COLUMNS)

In [51]:
tf.logging.set_verbosity(tf.logging.INFO)
OUTDIR = "df_trained"
shutil.rmtree(OUTDIR, ignore_errors = True)
model = tf.estimator.DNNRegressor(
  hidden_units = [32, 8, 2],
  feature_columns = feature_cols,
  model_dir = OUTDIR)
model.train(input_fn = get_train_input_fn, steps = 1000)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_service': None, '_session_config': None, '_save_checkpoints_secs': 600, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f6c0e7e8c88>, '_evaluation_master': '', '_train_distribute': None, '_task_type': 'worker', '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_is_chief': True, '_global_id_in_cluster': 0, '_tf_random_seed': None, '_log_step_count_steps': 100, '_keep_checkpoint_every_n_hours': 10000, '_num_ps_replicas': 0, '_model_dir': 'df_trained', '_master': '', '_task_id': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into df_trained/model.ckpt.
INFO:tensorflow:loss = 881.8299, step = 

<tensorflow.python.estimator.canned.dnn.DNNRegressor at 0x7f6c0e7e8be0>

In [52]:
metrics = model.evaluate(input_fn = get_valid_input_fn, steps = None)
print('RMSE on dataset = {}'.format(np.sqrt(metrics['average_loss'])))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-11-14-12:29:32
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from df_trained/model.ckpt-1000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-11-14-12:29:33
INFO:tensorflow:Saving dict for global step 1000: average_loss = 0.19052127, global_step = 1000, loss = 95.260635
RMSE on dataset = 0.43648743629455566


Now modify the "noise" so that instead of just rounding off the value, there is up to a 10% error (uniformly distributed) in the measurement followed by rounding off.

In [None]:
# We go back to the generator step and add some noise after the round off

noise = np.random.uniform(-1, 1, 10000)

df_train2['volume'] = round(df_train['volume'] + noise*0.1, 1) 

# check 2d code lab to see how it works


Copyright 2017 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License