# TensorFlow Workbook - Example
This example uses the data set compiled by J. Czerniak and H. Zarzycki.  
See "Application of rough sets in the presumptive diagnosis of urinary system diseases", Artificial Intelligence and Security in Computing Systems: ACS'2002 9th International Conference Proceedings, Kluwer Academic Publishers, 2003, pp. 41-51.  
See also "Acute Inflammations" at OpenML: http://www.openml.org/d/1455

> # Select Data File
Assumes a Linux environment with Jupyter Notebook and TensorFlow installed.  
Easiest to install Jupyter with Anaconda.  See http://jupyter.org/install.html  
TensorFlow is pretty painless too.  See https://www.tensorflow.org/get_started/os_setup  
Place the CSV data file in the current directory.  
Assumes:   
-comma-separated fields  
-column headers, also separated by commas  
-no empty or NaN values  
-all text fields converted to numbers  
-target (dependent variable) is in final column  
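
The assumptions listed above can be checked before going any further. The following is a minimal sketch, not part of the original workbook: `check_csv_assumptions` is a hypothetical helper name, the 0/1 check on the final column is an assumption based on the binary target used later, and the inline sample rows are made up for illustration.

```python
import io

import pandas


def check_csv_assumptions(csv_source):
    """Return a list of problems found; an empty list means the file looks usable.

    Hypothetical helper, not part of the original workbook.
    """
    df = pandas.read_csv(csv_source)
    problems = []
    # No empty or NaN values anywhere in the table
    if df.isnull().values.any():
        problems.append('file contains empty or NaN values')
    # Every column must already be numeric (text converted to numbers)
    non_numeric = [name for name in df.columns
                   if not pandas.api.types.is_numeric_dtype(df[name])]
    if non_numeric:
        problems.append('non-numeric columns: ' + ', '.join(non_numeric))
    # Assumption: the target is the final column and holds only 0 or 1
    target = df.iloc[:, -1]
    if not set(target.dropna().unique()) <= {0, 1}:
        problems.append('final column is not a 0/1 target')
    return problems


# Made-up sample rows, for illustration only
good_csv = io.StringIO('temp,nausea,target\n35.5,0,0\n36.0,1,1\n')
bad_csv = io.StringIO('temp,nausea,target\n35.5,no,0\n,1,1\n')
print(check_csv_assumptions(good_csv))
print(check_csv_assumptions(bad_csv))
```

An empty list for the first file and a list of messages for the second is the intended behaviour; a real file would be passed by name instead of as a `StringIO` object.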

In [64]:
import tensorflow
import numpy
import pandas
import ipywidgets as widgets
from IPython.display import display
# List only files whose names end in .csv (the dot is escaped so it is matched literally)
csv_file_list = !ls | egrep '\.csv$'
d_csv_choice = widgets.Dropdown(
    options = csv_file_list,
    description = 'Available .csv\'s',
)
display(d_csv_choice)
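
The shell pipeline in the cell above only works inside IPython on a system that has `ls` and `egrep`. As a possible alternative, the sketch below builds the same kind of list with the standard library. `list_csv_files` is a hypothetical helper name, not something from the original workbook.

```python
import glob
import os


def list_csv_files(directory='.'):
    """Return the sorted names of files ending in .csv in the given directory."""
    pattern = os.path.join(directory, '*.csv')
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


csv_file_list = list_csv_files()
print(csv_file_list)
```

The resulting list could be passed to the `options` argument of the dropdown in the same way as the shell output.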

> # Inspect The Data
Make sure your data looks right  
The pandas library assigns an index on the left; that is expected.

In [59]:
df = pandas.read_csv(d_csv_choice.value)
df

Unnamed: 0,temp,nausea,lumbar,pushing,micturition,burning,bladder_or_nephritis
0,35.5,0,1,0,0,0,0
1,35.9,0,0,1,1,1,1
2,35.9,0,1,0,0,0,0
3,36.0,0,0,1,1,1,1
4,36.0,0,1,0,0,0,0
5,36.0,0,1,0,0,0,0
6,36.2,0,0,1,1,1,1
7,36.2,0,1,0,0,0,0
8,36.3,0,0,1,1,1,1
9,36.6,0,0,1,1,1,1


> # Select 75/25 Split for Training/Test Data
This splits the labeled data into training and testing sets.

In [60]:
# Randomly sample 75% of the rows for training; the remaining rows become the test set
training_df = df.sample(frac = .75)
training_indices_list = list(training_df.index)
the_range = len(df)
total_indices_list = list(range(the_range))
testing_indices_list = list(set(total_indices_list) - set(training_indices_list))
print(training_indices_list)
print(testing_indices_list)
# Drop the sampled rows by index label to get the test set
testing_df = df.drop(training_df.index)
testing_rows = len(testing_df)
training_rows = len(training_df)
# Write both sets without index or column names; a header row is added in the next step
training_df.to_csv('training_data.csv', index = False, header = False)
testing_df.to_csv('testing_data.csv', index = False, header = False)
# Number of feature columns: every column except the target in the final column
the_columns = len(df.columns) - 1
training_length = len(training_df.index)
testing_length = len(testing_df.index)
print(the_columns)

[5, 112, 56, 85, 82, 53, 73, 43, 88, 94, 47, 68, 74, 40, 116, 26, 44, 0, 25, 108, 1, 59, 60, 64, 17, 57, 22, 42, 54, 2, 20, 98, 45, 110, 101, 28, 66, 6, 3, 31, 27, 117, 104, 113, 78, 7, 24, 89, 39, 33, 95, 97, 63, 93, 38, 114, 9, 70, 13, 51, 12, 106, 65, 19, 14, 81, 37, 32, 91, 118, 11, 105, 69, 46, 55, 41, 52, 90, 50, 86, 34, 107, 58, 71, 67, 16, 111, 49, 76, 109]
[4, 8, 10, 15, 18, 21, 23, 29, 30, 35, 36, 48, 61, 62, 72, 75, 77, 79, 80, 83, 84, 87, 92, 96, 99, 100, 102, 103, 115, 119]
6
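
Because the sample is drawn at random, the split above changes on every run. The sketch below shows one way to make it repeatable and to confirm that the two sets do not overlap. It is an illustration only: the small DataFrame is made up, and the seed value 42 is an arbitrary choice.

```python
import pandas

# Made-up stand-in for the workbook's DataFrame, for illustration only
df = pandas.DataFrame({'temp': [35.5 + 0.1 * i for i in range(20)],
                       'target': [i % 2 for i in range(20)]})

# random_state fixes the sample so the same rows are chosen on every run
training_df = df.sample(frac=0.75, random_state=42)
# Dropping the sampled index labels leaves the remaining rows for testing
testing_df = df.drop(training_df.index)

# The two sets should not overlap and together should cover every row
overlap = set(training_df.index) & set(testing_df.index)
print(len(training_df), len(testing_df), len(overlap))
```

With a fixed seed, accuracy figures from different runs become comparable, which helps when an unexpected result needs a second look.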


> # Set Dependent Variable
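The TensorFlow CSV loader used below expects a first row giving the number of rows, the number of feature columns, and the class names. This step prepends that row to each file and records what the 0 and 1 values of the target mean.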

In [61]:
one = input('What is the "1" state of the thing being predicted? ')
zero = input('what is the "0" state of the thing being predicted? ')
training_string = str(training_length) + ',' + str(the_columns) + ',' + str(zero) + ',' + str(one)
testing_string = str(testing_length) + ',' + str(the_columns) + ',' + str(zero) + ',' + str(one)
! sed -i '1 i\{training_string}' training_data.csv
! sed -i '1 i\{testing_string}' testing_data.csv


What is the "1" state of the thing being predicted? bladder_inflammation
what is the "0" state of the thing being predicted? renal_nephritis
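
The `sed -i` calls above depend on GNU sed and on IPython's shell escape. A portable alternative, sketched here under the assumption that the files are small enough to read into memory, is to prepend the header line in Python. `prepend_header` is a hypothetical helper name, and the two data rows are made up for illustration.

```python
import os
import tempfile


def prepend_header(csv_path, n_rows, n_features, class_names):
    """Insert 'n_rows,n_features,name0,name1' as the first line of csv_path."""
    header = ','.join([str(n_rows), str(n_features)] + list(class_names))
    with open(csv_path) as handle:
        body = handle.read()
    with open(csv_path, 'w') as handle:
        handle.write(header + '\n' + body)
    return header


# Made-up two-row file, for illustration only
demo_path = os.path.join(tempfile.mkdtemp(), 'training_data.csv')
with open(demo_path, 'w') as handle:
    handle.write('35.5,0,1,0,0,0,0\n35.9,0,0,1,1,1,1\n')

print(prepend_header(demo_path, 2, 6, ['renal_nephritis', 'bladder_inflammation']))
```

In the workbook this would be called with `training_length`, `the_columns`, and `[zero, one]` for the training file, and likewise with `testing_length` for the testing file.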


> # Feed the Data Sets to TensorFlow

In [67]:
TRAINING_DATA = 'training_data.csv'
TESTING_DATA = 'testing_data.csv'
# The built-in int replaces numpy.int, an alias for int that was removed in NumPy 1.24
training_set = tensorflow.contrib.learn.datasets.base.load_csv_with_header(
    filename=TRAINING_DATA,
    target_dtype=int,
    features_dtype=numpy.float32)
test_set = tensorflow.contrib.learn.datasets.base.load_csv_with_header(
    filename=TESTING_DATA,
    target_dtype=int,
    features_dtype=numpy.float32)
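
As a rough illustration of what a loader for this header format has to do, the sketch below reads the row and feature counts from the first line and then splits each data row into features and a target. `load_counted_csv` is a hypothetical name; this is a stand-in written for this workbook, not TensorFlow's own implementation, and the sample rows are made up.

```python
import csv
import io

import numpy


def load_counted_csv(handle, target_dtype=int, features_dtype=numpy.float32):
    """Read a CSV whose first row starts with the row count and feature count.

    Hypothetical stand-in for illustration; not TensorFlow's own code.
    """
    reader = csv.reader(handle)
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    data = numpy.zeros((n_samples, n_features), dtype=features_dtype)
    target = numpy.zeros((n_samples,), dtype=target_dtype)
    for i, row in enumerate(reader):
        # The final field is the target; everything before it is a feature
        target[i] = target_dtype(row[-1])
        data[i] = numpy.asarray(row[:-1], dtype=features_dtype)
    return data, target


# Made-up file contents, for illustration only
demo = io.StringIO('2,3,renal_nephritis,bladder_inflammation\n'
                   '35.5,0,1,0\n'
                   '35.9,1,0,1\n')
data, target = load_counted_csv(demo)
print(data.shape, target.tolist())
```

The sketch makes clear why the counts in the header must match the file: a row count that is too small or a wrong feature count would make the array assignments fail.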

In [71]:
feature_columns = [tensorflow.contrib.layers.real_valued_column("", dimension=the_columns)]
classifier = tensorflow.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10,20,10],
                                            n_classes=2,
                                            model_dir="/tmp/cancer_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'keep_checkpoint_every_n_hours': 10000, 'save_checkpoints_steps': None, '_task_id': 0, 'keep_checkpoint_max': 5, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fa6c41259e8>, '_num_ps_replicas': 0, 'tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, 'save_summary_steps': 100, '_task_type': None, 'tf_random_seed': None, '_environment': 'local', '_master': '', 'save_checkpoints_secs': 600, '_evaluation_master': '', '_is_chief': True}


> # Tell TensorFlow to Commence Training the Model

In [72]:
classifier.fit(x = training_set.data,
               y = training_set.target,
               steps = 2000)

Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
INFO:tensorflow:Summary name dnn/hiddenlayer_0:fraction_of_zero_values is illegal; using

<tensorflow.contrib.learn.python.learn.estimators.dnn.DNNClassifier at 0x7fa6c4125710>

> # Generate Accuracy Score
Note: this result needs further checking.  
The reported accuracy is, ahem, a little high, even for TF!

In [74]:
accuracy_score = classifier.evaluate(x=test_set.data,
                                     y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))

Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
INFO:tensorflow:Summary name dnn/hiddenlayer_0:fraction_of_zero_values is illegal; using
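
One simple way to put a suspiciously high accuracy in context is to compare it with the score obtained by always predicting the most common class. The sketch below does that on made-up labels and predictions. The helper names `accuracy` and `majority_baseline` are hypothetical, and nothing here reproduces the workbook's actual results.

```python
from collections import Counter


def accuracy(predictions, targets):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for p, t in zip(predictions, targets) if p == t)
    return matches / len(targets)


def majority_baseline(targets):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(targets).most_common(1)[0][1]
    return most_common_count / len(targets)


# Made-up labels and predictions, for illustration only
targets = [0, 1, 1, 0, 1, 1, 1, 0]
predictions = [0, 1, 1, 0, 1, 0, 1, 0]
print('Model accuracy: {0:f}'.format(accuracy(predictions, targets)))
print('Majority baseline: {0:f}'.format(majority_baseline(targets)))
```

In the workbook, `targets` would come from `test_set.target`. A model score far above the baseline on a test set of only a few dozen rows is worth re-checking across several different random splits.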