# Acute Inflammations Dataset 

This notebook uses the Acute Inflammations dataset and TensorFlow's linear classifier (`tf.estimator.LinearClassifier`) to build a prediction model.  

* Data source: https://archive.ics.uci.edu/ml/datasets/Acute+Inflammations  


Data description (from the UCI website):  

**-- Attribute lines:**  
    For example, '35,9	no	no	yes	yes	yes	yes	no'  
    Where:  
    '35,9'	Temperature of patient  
    'no'	Occurrence of nausea  
    'no'	Lumbar pain  
    'yes'	Urine pushing (continuous need for urination)  
    'yes'	Micturition pains  
    'yes'	Burning of urethra, itch, swelling of urethra outlet  
    'yes'	decision: Inflammation of urinary bladder  
    'no'	decision: Nephritis of renal pelvis origin 

**Attribute Information:**

    a1	Temperature of patient { 35C-42C }	
    a2	Occurrence of nausea { yes, no }	
    a3	Lumbar pain { yes, no }	
    a4	Urine pushing (continuous need for urination) { yes, no }	
    a5	Micturition pains { yes, no }	
    a6	Burning of urethra, itch, swelling of urethra outlet { yes, no }	
    d1	decision: Inflammation of urinary bladder { yes, no }	
    d2	decision: Nephritis of renal pelvis origin { yes, no }
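The attribute description above can be turned into a small parser. The following is only a sketch: the function name `parse_record` and the choice to store yes/no answers as booleans are assumptions made for illustration, not part of the dataset documentation.

```python
def parse_record(line):
    """Parse one tab-separated line of the raw file into a dict (illustrative helper)."""
    names = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'd1', 'd2']
    fields = line.strip().split('\t')
    if len(fields) != len(names):
        raise ValueError('expected {} fields, got {}'.format(len(names), len(fields)))
    # a1 uses a comma as the decimal separator, e.g. '35,9' means 35.9
    record = {'a1': float(fields[0].replace(',', '.'))}
    for name, value in zip(names[1:], fields[1:]):
        if value not in ('yes', 'no'):
            raise ValueError('unexpected value {!r} for {}'.format(value, name))
        record[name] = (value == 'yes')
    return record


# The example line quoted in the description above
example = parse_record('35,9\tno\tno\tyes\tyes\tyes\tyes\tno')
print(example)
```

Here `a1` becomes a float and every other attribute becomes `True` for "yes" and `False` for "no".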

In [5]:
# Import the necessary libraries and configure the pandas display options
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

%pylab inline
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Populating the interactive namespace from numpy and matplotlib


`%matplotlib` prevents importing * from pylab and numpy
  "\n`%matplotlib` prevents importing * from pylab and numpy"


In [4]:
# print tensorflow version
print('TensorFlow Version: {}'.format(tf.VERSION))

TensorFlow Version: 1.8.0


# Data path and reading the data

In [6]:
# data path
data_path = 'C:/Users/AdamChang/Documents/Python/TensorFlow/Data/diagnosis.csv'

In [7]:
# Read the data and assign the column names
col_name = ['Temperature','Nausea','LumbarPain','UrinePushing','MicturitionPains',
           'BurningOf', 'd1Decision', 'd2Decision']
df = pd.read_csv(data_path,sep='\t', names=col_name)

In [8]:
#view the data
df.head()

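|   | Temperature | Nausea | LumbarPain | UrinePushing | MicturitionPains | BurningOf | d1Decision | d2Decision |
|---|---|---|---|---|---|---|---|---|
| 0 | 35,5 | no | yes | no | no | no | no | no |
| 1 | 35,9 | no | no | yes | yes | yes | yes | no |
| 2 | 35,9 | no | yes | no | no | no | no | no |
| 3 | 36,0 | no | no | yes | yes | yes | yes | no |
| 4 | 36,0 | no | yes | no | no | no | no | no |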


In [9]:
# Check the data type of every column
df.dtypes

Temperature         object
Nausea              object
LumbarPain          object
UrinePushing        object
MicturitionPains    object
BurningOf           object
d1Decision          object
d2Decision          object
dtype: object

In [10]:
#view the data summary
df.describe()

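|   | Temperature | Nausea | LumbarPain | UrinePushing | MicturitionPains | BurningOf | d1Decision | d2Decision |
|---|---|---|---|---|---|---|---|---|
| count | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
| unique | 44 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| top | 37,0 | no | yes | yes | no | no | no | no |
| freq | 8 | 91 | 70 | 80 | 61 | 70 | 61 | 70 |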


# Data cleaning

In [11]:
# Clean the Temperature and decision columns.
# Temperature is read as a string that uses ',' as the decimal separator,
# so replace ',' with '.' and convert the column to float.
df['Temperature'] = df['Temperature'].str.replace(',','.')
df['Temperature'] = df['Temperature'].astype('float')

# Convert the labels to numeric values: no -> 0 and yes -> 1.
# The estimator needs numeric labels.
df['d1Decision'] = df['d1Decision'].replace(['no','yes'],['0','1'])
df['d2Decision'] = df['d2Decision'].replace(['no','yes'],['0','1'])
# Change the data type to float
df['d1Decision'] = df['d1Decision'].astype('float')
df['d2Decision'] = df['d2Decision'].astype('float')

In [13]:
#Check the data again
df.head()

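|   | Temperature | Nausea | LumbarPain | UrinePushing | MicturitionPains | BurningOf | d1Decision | d2Decision |
|---|---|---|---|---|---|---|---|---|
| 0 | 35.5 | no | yes | no | no | no | 0.0 | 0.0 |
| 1 | 35.9 | no | no | yes | yes | yes | 1.0 | 0.0 |
| 2 | 35.9 | no | yes | no | no | no | 0.0 | 0.0 |
| 3 | 36.0 | no | no | yes | yes | yes | 1.0 | 0.0 |
| 4 | 36.0 | no | yes | no | no | no | 0.0 | 0.0 |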
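As an alternative to the string replacement in the cleaning cell, the decimal comma could be handled while reading the file. The sketch below assumes pandas' `decimal` argument of `read_csv` and uses a two-row in-memory sample (the first two rows shown by `df.head()`) instead of the real file; the name `sample` is illustrative.

```python
import io

import pandas as pd

# Two raw rows, tab-separated, with ',' as the decimal separator
raw = io.StringIO(
    '35,5\tno\tyes\tno\tno\tno\tno\tno\n'
    '35,9\tno\tno\tyes\tyes\tyes\tyes\tno\n'
)
col_name = ['Temperature', 'Nausea', 'LumbarPain', 'UrinePushing', 'MicturitionPains',
            'BurningOf', 'd1Decision', 'd2Decision']

# decimal=',' lets pandas parse '35,5' directly as the float 35.5
sample = pd.read_csv(raw, sep='\t', names=col_name, decimal=',')

# Map the two label columns to 0.0 / 1.0 in one step
for col in ['d1Decision', 'd2Decision']:
    sample[col] = sample[col].map({'no': 0.0, 'yes': 1.0})

print(sample.dtypes)
```

With this approach `Temperature` is already numeric after reading, so no separate `astype` call is needed for it.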


# Splitting the data into training and test sets

In [None]:
#Shuffle the dataset
df = df.sample(frac=1)

In [16]:
# Split the data into a training set (first 96 rows) and a test set (remaining 24 rows)
train = df[:96]
test = df[96:]
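The shuffle above is unseeded, so the 96/24 split differs from run to run. A minimal sketch of a reproducible split follows; it works on row indices with the standard library only, and the function name `split_indices` and the seed value are assumptions for illustration. In pandas, passing `random_state` to `df.sample` should serve the same purpose.

```python
import random


def split_indices(n_rows, n_train, seed):
    """Shuffle row indices with a fixed seed and split them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


# 120 rows in total, 96 for training and 24 for testing, as in this notebook
train_idx, test_idx = split_indices(n_rows=120, n_train=96, seed=42)
print(len(train_idx), len(test_idx))
```

The index lists could then be used with `df.iloc[train_idx]` and `df.iloc[test_idx]`.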

In [18]:
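# Split each set into features and labels.
# pop() removes the label column from the frame in place, so train_feature and test_feature keep only the features.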
train_feature, train_label1, train_label2 = train, train.pop('d1Decision'), train.pop('d2Decision')
test_feature, test_label1, test_label2 = test, test.pop('d1Decision'), test.pop('d2Decision')

In [20]:
# One-hot encode the categorical (yes/no) columns
train_feature = pd.get_dummies(train_feature)
test_feature = pd.get_dummies(test_feature)

In [21]:
# View the feature columns
train_feature.keys()

Index(['Temperature', 'Nausea_no', 'Nausea_yes', 'LumbarPain_no',
       'LumbarPain_yes', 'UrinePushing_no', 'UrinePushing_yes',
       'MicturitionPains_no', 'MicturitionPains_yes', 'BurningOf_no',
       'BurningOf_yes'],
      dtype='object')
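Because the training and test sets are encoded separately, a small test set that happens to contain only one value of a symptom would end up with fewer columns than the training set. One possible guard, sketched here on toy frames (`train_part` and `test_part` are made-up names and values), is to reindex the test columns to match the training columns.

```python
import pandas as pd

# Toy frames: the test part only contains 'no' for Nausea
train_part = pd.DataFrame({'Temperature': [37.3, 39.4], 'Nausea': ['no', 'yes']})
test_part = pd.DataFrame({'Temperature': [36.7], 'Nausea': ['no']})

train_encoded = pd.get_dummies(train_part)
# Align the test columns with the training columns; missing dummies are filled with 0
test_encoded = pd.get_dummies(test_part).reindex(columns=train_encoded.columns, fill_value=0)

print(list(test_encoded.columns))
```

In this notebook both sets happen to contain every value, so the columns already match.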

In [22]:
# Check that the training set looks correct
train_feature.head()

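|   | Temperature | Nausea_no | Nausea_yes | LumbarPain_no | LumbarPain_yes | UrinePushing_no | UrinePushing_yes | MicturitionPains_no | MicturitionPains_yes | BurningOf_no | BurningOf_yes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 37.3 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 14 | 36.7 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 68 | 39.4 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 115 | 41.4 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 65 | 38.7 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |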


In [23]:
#Training set data type
train_feature.dtypes

Temperature             float64
Nausea_no                 uint8
Nausea_yes                uint8
LumbarPain_no             uint8
LumbarPain_yes            uint8
UrinePushing_no           uint8
UrinePushing_yes          uint8
MicturitionPains_no       uint8
MicturitionPains_yes      uint8
BurningOf_no              uint8
BurningOf_yes             uint8
dtype: object

# Building the Iterator for the dataset input

In [24]:
# Define the training input function.
# It converts the features and labels into tensors,
# the data type TensorFlow uses to train the model.

# This is the basic step for building the training dataset.
# The same function can also be used as the input function when evaluating the model.

# feature: the features of the training dataset
# label: the labels (targets) of the training dataset
# batch_size: how many examples are used per training step
# num_epochs: how many times the model passes over the dataset
def train_input_fn(feature, label, batch_size, num_epochs):
    # from_tensor_slices converts the data into a dataset of tensors
    dataset = tf.data.Dataset.from_tensor_slices((dict(feature),label))
    # .shuffle shuffles the data (buffer_size should be at least the number of examples in the dataset)
    # .repeat sets how many times the dataset is repeated
    # .batch sets how many examples go into each batch
    dataset = dataset.shuffle(buffer_size=1000).repeat(num_epochs).batch(batch_size)
    # .make_one_shot_iterator creates an iterator over the dataset
    iterator = dataset.make_one_shot_iterator()
    # .get_next returns the next batch
    features, labels = iterator.get_next()
    # Return the features and labels used to train the model
    return features, labels
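To see how `repeat` and `batch` interact, here is a plain-Python sketch of the same repeat-then-batch order (shuffling is left out, and `repeat_then_batch` is a made-up helper, not a TensorFlow function). With 96 training examples, 201 epochs and a batch size of 32, it yields 96 × 201 / 32 = 603 batches, which agrees with the global step advancing by 603 per `train` call in the training logs of this notebook (for example from 1503 to 2106).

```python
import itertools


def repeat_then_batch(examples, num_epochs, batch_size):
    """Yield batches from the examples repeated num_epochs times (no shuffling)."""
    stream = itertools.chain.from_iterable(itertools.repeat(examples, num_epochs))
    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return
        yield batch


batches = list(repeat_then_batch(list(range(96)), num_epochs=201, batch_size=32))
print(len(batches))
```

Because repetition happens before batching, a batch may contain examples from two consecutive epochs.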

# Building feature columns

In [25]:
# Describe every input column as a numeric feature column so the model can use the raw data
fc = []
for key in train_feature.keys():
    fc.append(tf.feature_column.numeric_column(key=key,dtype=tf.float32))

In [26]:
#see all feature column
fc

[_NumericColumn(key='Temperature', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Nausea_no', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='Nausea_yes', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='LumbarPain_no', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='LumbarPain_yes', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='UrinePushing_no', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='UrinePushing_yes', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='MicturitionPains_no', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='MicturitionPains_yes', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(ke

# Declaring the model to use

# Linear Classifier parameters
__init__(
    feature_columns,
    model_dir=None,
    n_classes=2,
    weight_column=None,
    label_vocabulary=None,
    optimizer='Ftrl',
    config=None,
    partitioner=None,
    warm_start_from=None,
    loss_reduction=losses.Reduction.SUM
)
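For intuition, a binary linear classifier of this kind can be thought of as a weighted sum of the features passed through a sigmoid. The sketch below illustrates that idea in plain Python; the weights and bias are invented for illustration and are not the values learned in this notebook, and `predict_proba` is a hypothetical helper rather than part of the estimator API.

```python
import math


def predict_proba(weights, bias, features):
    """Return [P(class 0), P(class 1)] for one example under a logistic linear model."""
    logit = bias + sum(weights[name] * value for name, value in features.items())
    p_yes = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p_yes, p_yes]


# Hypothetical parameters, chosen only to make the arithmetic easy to follow
weights = {'Temperature': 0.0, 'UrinePushing_yes': 2.0, 'MicturitionPains_yes': 1.5}
bias = -1.0
example = {'Temperature': 40.0, 'UrinePushing_yes': 1.0, 'MicturitionPains_yes': 0.0}

probabilities = predict_proba(weights, bias, example)
predicted_class = int(probabilities[1] > 0.5)
print(probabilities, predicted_class)
```

Here the logit is -1.0 + 2.0 = 1.0, so the probability of class 1 is above 0.5 and the predicted class is 1.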

In [27]:
# Choose the model from the estimator API and pass in the feature columns
classifier = tf.estimator.LinearClassifier(feature_columns=fc)

# These parameters can also be added:
#optimizer=tf.train.FtrlOptimizer(
#      learning_rate=0.1,
#      l1_regularization_strength=0.001
#)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_steps': None, '_task_type': 'worker', '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_evaluation_master': '', '_keep_checkpoint_max': 5, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001BF29D1FEF0>, '_global_id_in_cluster': 0, '_num_ps_replicas': 0, '_session_config': None, '_task_id': 0, '_train_distribute': None, '_keep_checkpoint_every_n_hours': 10000, '_is_chief': True, '_num_worker_replicas': 1, '_master': '', '_save_summary_steps': 100, '_service': None, '_tf_random_seed': None, '_model_dir': 'C:\\Users\\ADAMCH~1\\AppData\\Local\\Temp\\tmpzyodp7c5'}


# Model training

In [29]:
# Train the model
# Use train_input_fn to turn the training set into a dataset that TensorFlow accepts
# Batch size = 32, number of epochs = 201
classifier.train(input_fn=lambda:train_input_fn(train_feature, train_label1, 32, 201))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt-1503
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1504 into C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt.
INFO:tensorflow:step = 1504, loss = 0.4573232
INFO:tensorflow:global_step/sec: 361.085
INFO:tensorflow:step = 1604, loss = 0.39714018 (0.278 sec)
INFO:tensorflow:global_step/sec: 797.877
INFO:tensorflow:step = 1704, loss = 0.347484 (0.125 sec)
INFO:tensorflow:global_step/sec: 764.253
INFO:tensorflow:step = 1804, loss = 0.3749317 (0.131 sec)
INFO:tensorflow:global_step/sec: 797.871
INFO:tensorflow:step = 1904, loss = 0.31738678 (0.126 sec)
INFO:tensorflow:global_step/sec: 807.563
INFO:tensorflow:step = 2004, loss = 0.2993583 (0.124 sec)
INFO

<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x1bf29faf5f8>

# Evaluation

In [30]:
# Use the test set to evaluate the model
result = classifier.evaluate(input_fn=lambda: train_input_fn(test_feature, test_label1, 32, 201))
# Print each evaluation metric
for key in sorted(result):
    print('{}:{}'.format(key, result[key]))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-05-11-13:33:09
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt-2106
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-05-11-13:33:10
INFO:tensorflow:Saving dict for global step 2106: accuracy = 1.0, accuracy_baseline = 0.5416666, auc = 1.0, auc_precision_recall = 1.0, average_loss = 0.009259224, global_step = 2106, label/mean = 0.45833334, loss = 0.29580462, precision = 1.0, prediction/mean = 0.4563088, recall = 1.0
accuracy:1.0
accuracy_baseline:0.5416666269302368
auc:1.0
auc_precision_recall:1.0
average_loss:0.009259223937988281
global_step:2106
label/mean:0.4583333432674408
loss:0.29580461978912354
precision:1.0
prediction/mean:0.4563088119029999
recall:1.0
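Several of these metrics can be reproduced by hand. The sketch below assumes the usual definitions (accuracy baseline taken as the accuracy of always predicting the majority class) and a test set of 24 rows with 11 positive labels, which is inferred from `label/mean` ≈ 0.4583 rather than read from the data; `binary_metrics` is an illustrative helper.

```python
def binary_metrics(labels, predictions):
    """Compute a few binary classification metrics from 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    label_mean = sum(labels) / len(labels)
    return {
        'accuracy': correct / len(labels),
        # Accuracy of always predicting the majority class
        'accuracy_baseline': max(label_mean, 1.0 - label_mean),
        'label/mean': label_mean,
        'precision': tp / (tp + fp) if (tp + fp) else 0.0,
        'recall': tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Assumed test set: 11 positives and 13 negatives, all predicted correctly
labels = [1] * 11 + [0] * 13
predictions = list(labels)
metrics = binary_metrics(labels, predictions)
for key in sorted(metrics):
    print('{}:{}'.format(key, metrics[key]))
```

Under these assumptions `label/mean` is 11/24 and `accuracy_baseline` is 13/24, in line with the values reported above.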


# Prediction

In [33]:
# Build one example to predict; the values follow the column order of test_feature
predict = [40,1,0,0,1,0,1,1,0,1,0]

# Pair every feature name with its value
pre = dict(zip(test_feature.keys(), predict))

In [34]:
#see the predict data
pre

{'BurningOf_no': 1,
 'BurningOf_yes': 0,
 'LumbarPain_no': 0,
 'LumbarPain_yes': 1,
 'MicturitionPains_no': 1,
 'MicturitionPains_yes': 0,
 'Nausea_no': 1,
 'Nausea_yes': 0,
 'Temperature': 40,
 'UrinePushing_no': 0,
 'UrinePushing_yes': 1}

In [35]:
# Convert it to a DataFrame
pre2 = pd.DataFrame(pre, index=[0])

In [36]:
pre2

|   | BurningOf_no | BurningOf_yes | LumbarPain_no | LumbarPain_yes | MicturitionPains_no | MicturitionPains_yes | Nausea_no | Nausea_yes | Temperature | UrinePushing_no | UrinePushing_yes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 40 | 0 | 1 |
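Writing the one-hot values by hand is easy to get wrong, so a small helper could build the row from the raw yes/no answers instead. This is a sketch; `encode_patient` and `SYMPTOMS` are names introduced here for illustration, and the column naming follows the `<column>_no` / `<column>_yes` pattern produced by `pd.get_dummies` above.

```python
SYMPTOMS = ['Nausea', 'LumbarPain', 'UrinePushing', 'MicturitionPains', 'BurningOf']


def encode_patient(temperature, answers):
    """Build a one-hot encoded feature row from a temperature and yes/no answers."""
    row = {'Temperature': float(temperature)}
    for symptom in SYMPTOMS:
        answer = answers[symptom]
        if answer not in ('yes', 'no'):
            raise ValueError('unexpected answer {!r} for {}'.format(answer, symptom))
        row[symptom + '_no'] = int(answer == 'no')
        row[symptom + '_yes'] = int(answer == 'yes')
    return row


# The same patient as the hand-written example above
row = encode_patient(40, {
    'Nausea': 'no',
    'LumbarPain': 'yes',
    'UrinePushing': 'yes',
    'MicturitionPains': 'no',
    'BurningOf': 'no',
})
print(row)
```

The resulting dict could be passed to `pd.DataFrame(row, index=[0])` in the same way as `pre`.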


In [38]:
# Define an input function that converts the prediction data into tensors.
# It is kept as a record of another way to build the input data.
def eval_fn(features, labels=None, batch_size=None, repeat_num=1):
    # Check whether labels were provided.
    # When predicting a class only the features are needed, so labels can be None.
    if labels is None:
        inputs = dict(features)
    else:
        inputs = (dict(features), labels)

    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    assert batch_size is not None, "batch_size must not be None"
    # repeat_num: how many times the dataset is repeated
    dataset = dataset.repeat(repeat_num)
    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    # Return the next batch of tensors
    return iterator.get_next()

In [39]:
#test the function
a = eval_fn(pre2, labels=None, batch_size=1, repeat_num=1)
a

{'BurningOf_no': <tf.Tensor 'IteratorGetNext:0' shape=(?,) dtype=int64>,
 'BurningOf_yes': <tf.Tensor 'IteratorGetNext:1' shape=(?,) dtype=int64>,
 'LumbarPain_no': <tf.Tensor 'IteratorGetNext:2' shape=(?,) dtype=int64>,
 'LumbarPain_yes': <tf.Tensor 'IteratorGetNext:3' shape=(?,) dtype=int64>,
 'MicturitionPains_no': <tf.Tensor 'IteratorGetNext:4' shape=(?,) dtype=int64>,
 'MicturitionPains_yes': <tf.Tensor 'IteratorGetNext:5' shape=(?,) dtype=int64>,
 'Nausea_no': <tf.Tensor 'IteratorGetNext:6' shape=(?,) dtype=int64>,
 'Nausea_yes': <tf.Tensor 'IteratorGetNext:7' shape=(?,) dtype=int64>,
 'Temperature': <tf.Tensor 'IteratorGetNext:8' shape=(?,) dtype=int64>,
 'UrinePushing_no': <tf.Tensor 'IteratorGetNext:9' shape=(?,) dtype=int64>,
 'UrinePushing_yes': <tf.Tensor 'IteratorGetNext:10' shape=(?,) dtype=int64>}

In [47]:
#Predict the class from the model
pre_iter = classifier.predict(input_fn=lambda:eval_fn(pre2, labels=None, batch_size=1, repeat_num=1))

In [48]:
label_list = ['no','yes']
# predict returns an iterator, so loop over it to get the predictions
for i,p in enumerate(pre_iter):
    print("Prediction : {}, Probablities : {}".format(p['classes'], p["probabilities"]))
    # or use 'class_ids' to get the prediction id

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt-2106
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction : [b'0'], Probablities : [0.95707625 0.04292373]
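The `classes` entry comes back as a byte string, so it can be mapped to a readable label through `label_list`. A minimal sketch follows; `describe_prediction` is a hypothetical helper, and `sample_prediction` is a plain dict that mimics only the two fields printed above rather than a real estimator output.

```python
label_list = ['no', 'yes']


def describe_prediction(prediction):
    """Return the readable label and its probability for one prediction dict."""
    # 'classes' holds a byte string such as b'0'; int() accepts bytes in Python 3
    class_id = int(prediction['classes'][0])
    probabilities = [float(p) for p in prediction['probabilities']]
    return label_list[class_id], probabilities[class_id]


# Values copied from the prediction printed above
sample_prediction = {'classes': [b'0'], 'probabilities': [0.95707625, 0.04292373]}
label, confidence = describe_prediction(sample_prediction)
print('Prediction: {} ({:.1%})'.format(label, confidence))
```

For this example the model predicts "no" for the first decision (inflammation of urinary bladder).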


# Predicting another case

In [49]:
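# Train on the second label (d2Decision).
# This reuses the same classifier, so training continues from the checkpoint fitted to d1Decision.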
classifier.train(input_fn=lambda: train_input_fn(train_feature, train_label2, 32, 201))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt-2106
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 2107 into C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt.
INFO:tensorflow:step = 2107, loss = 95.645164
INFO:tensorflow:global_step/sec: 304.157
INFO:tensorflow:step = 2207, loss = 23.70279 (0.330 sec)
INFO:tensorflow:global_step/sec: 720.103
INFO:tensorflow:step = 2307, loss = 5.788143 (0.138 sec)
INFO:tensorflow:global_step/sec: 801.071
INFO:tensorflow:step = 2407, loss = 3.2477956 (0.125 sec)
INFO:tensorflow:global_step/sec: 588.395
INFO:tensorflow:step = 2507, loss = 1.7364392 (0.170 sec)
INFO:tensorflow:global_step/sec: 727.997
INFO:tensorflow:step = 2607, loss = 1.9186767 (0.138 sec)
INFO:te

<tensorflow.python.estimator.canned.linear.LinearClassifier at 0x1bf29faf5f8>

In [50]:
# Use the test set to evaluate the model
result2 = classifier.evaluate(input_fn=lambda: train_input_fn(test_feature, test_label2, 32, 201))
# Print each evaluation metric for the second label
for key in sorted(result2):
    print('{}:{}'.format(key, result2[key]))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-05-11-13:47:33
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt-2709
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-05-11-13:47:34
INFO:tensorflow:Saving dict for global step 2709: accuracy = 1.0, accuracy_baseline = 0.6666666, auc = 1.0, auc_precision_recall = 1.0, average_loss = 0.045237172, global_step = 2709, label/mean = 0.33333334, loss = 1.4451928, precision = 1.0, prediction/mean = 0.33876723, recall = 1.0
accuracy:1.0
accuracy_baseline:0.6666666
auc:1.0
auc_precision_recall:1.0
average_loss:0.045237172
global_step:2709
label/mean:0.33333334
loss:1.4451928
precision:1.0
prediction/mean:0.33876723
recall:1.0


In [53]:
pre_iter2 = classifier.predict(input_fn=lambda:eval_fn(pre2, labels=None, batch_size=1, repeat_num=1))

In [54]:
for i,p in enumerate(pre_iter2):
    print("Prediction : {}, Probablities : {}".format(p['classes'], p["probabilities"]))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\ADAMCH~1\AppData\Local\Temp\tmpzyodp7c5\model.ckpt-2709
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction : [b'1'], Probablities : [0.3870985 0.6129015]
