# Problem set: TensorFlow
> Tangqi Feng

These problems relate to the Python package [TensorFlow](https://www.tensorflow.org/).
We will again use the famous [iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set).


# Load and convert data

In [2]:
# Adapted from https://github.com/antonrufino/TensorFlow-IrisNN/blob/master/iris_nn.py
# import numpy
import numpy as np
# load the Iris data set as strings (the species column is text), skipping the header row
Iris = np.loadtxt("iris.csv", dtype=str, delimiter=",", skiprows=1)
# replace the species name (column index 4) with a numeric code so it can be used in calculations
#  setosa     ->   0
#  versicolor ->   1
#  virginica  ->   2
Iris[Iris[:,4] == 'setosa',4] = '0'
Iris[Iris[:,4] == 'versicolor',4] = '1'
Iris[Iris[:,4] == 'virginica',4] = '2'
# convert the array from str to float (np.float was removed in NumPy 1.24; the builtin float is equivalent)
Iris = Iris.astype(float)
# the first five rows of the converted data:
for i in range(5):
    print(Iris[i,:])

[ 5.1  3.5  1.4  0.2  1. ]
[ 4.9  3.   1.4  0.2  1. ]
[ 4.7  3.2  1.3  0.2  1. ]
[ 4.6  3.1  1.5  0.2  1. ]
[ 5.   3.6  1.4  0.2  1. ]
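
The name-to-number conversion can also be written as a lookup table, which keeps the mapping in one place and fails loudly on an unexpected species name. This is a minimal plain-Python sketch, assuming the species column holds exactly the three names used in the code above; `SPECIES_CODES` and `encode_species` are illustrative names that do not appear in the original notebook.

```python
# Hypothetical helper (not from the original notebook): map species names
# to the integer codes assigned in the loading cell.
SPECIES_CODES = {"setosa": 0, "versicolor": 1, "virginica": 2}


def encode_species(names):
    """Return the integer code for each species name.

    Raises KeyError for a name that is not in SPECIES_CODES, so a typo in
    the data is noticed instead of being silently left as a string.
    """
    return [SPECIES_CODES[name] for name in names]


print(encode_species(["setosa", "virginica", "versicolor"]))  # [0, 2, 1]
```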


## 1. Use TensorFlow to create a model
Use TensorFlow to create a model to predict the species of Iris from a flower's sepal width, sepal length, petal width, and petal length.

In [2]:
# import tensorflow (this notebook uses the TensorFlow 1.x graph API: tf.placeholder, tf.Session)
import tensorflow as tf
# each row of data contains: sepal_length, sepal_width, petal_length, petal_width and species
# create the model
x = tf.placeholder(tf.float32, [None, 4])  # input  (sepal_length, sepal_width, petal_length, petal_width)
y = tf.placeholder(tf.float32, [None, 1])  # target (species code)
# tf.truncated_normal is documented at:
# https://www.tensorflow.org/api_docs/python/tf/truncated_normal
W = tf.Variable(tf.truncated_normal([4, 1], stddev=0.1))  # weights ([4,1]: 4 inputs and 1 output)
b = tf.Variable(tf.zeros([1]) + 1)                        # bias, initialised to 1
# note: softmax normalises across the output columns; with a single column it returns 1.0 for every row
prediction = tf.nn.softmax(tf.matmul(x, W) + b)
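
Softmax turns a row of scores into probabilities that sum to 1, so its behaviour depends on how many output columns the model has. The sketch below uses plain Python rather than TensorFlow so that it runs on its own. It shows that a single score always maps to 1.0, while three scores (one per species) give a distribution that can be compared with a one-hot label. The helper names `softmax` and `one_hot` are illustrative and are not part of the notebook.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of scores."""
    m = max(logits)                      # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes=3):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# One output column: the result is 1.0 whatever the score is.
print(softmax([2.7]))    # [1.0]
print(softmax([-5.0]))   # [1.0]

# Three output columns: a probability for each class.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# A one-hot target for class 2, in the same shape as the three-way output.
print(one_hot(2))        # [0.0, 0.0, 1.0]
```

Under this reading, a three-species classifier would typically use a weight matrix of shape `[4, 3]` and one-hot targets of shape `[None, 3]`; that is a suggestion based on the sketch, not something the notebook does.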

## 2. Split the data into training and testing
Split the data set into a training set and a testing set.
You should investigate the best way to do this, and list any online references used in your notebook.
If you wish to, you can write some code to randomly separate the data on the fly.


In [3]:
# Randomly split the data into training and testing sets
# Adapted from: https://stackoverflow.com/questions/17412439/how-to-split-data-into-trainset-and-testset-randomly
np.random.shuffle(Iris)
# use the first 100 shuffled rows for training and the remaining 50 for testing
training, test = Iris[:100], Iris[100:]
print(training.shape)
print(test.shape)

(100, 5)
(50, 5)
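
A plain shuffle gives a different split on every run and can leave the three species unevenly represented in the testing set. One common alternative is a seeded, stratified split, which shuffles within each class and takes the same fraction from each. The sketch below is plain Python on toy rows laid out like the `Iris` array (four features followed by the label); `stratified_split`, its parameters, and the seed value are assumptions made for illustration, not the method used in the notebook.

```python
import random


def stratified_split(rows, label_index, train_fraction, seed=0):
    """Split rows into (train, test), keeping class proportions equal.

    rows           -- list of rows; row[label_index] is the class label
    train_fraction -- fraction of each class that goes to the training set
    seed           -- fixed seed so the split is reproducible
    """
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_index], []).append(row)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = int(round(len(members) * train_fraction))
        train.extend(members[:cut])
        test.extend(members[cut:])

    # Shuffle again so the classes are not grouped together.
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy data: 150 rows, 50 per class, with placeholder feature values.
rows = [[0.0, 0.0, 0.0, 0.0, float(label)] for label in (0, 1, 2) for _ in range(50)]
train, test = stratified_split(rows, label_index=4, train_fraction=2 / 3)
print(len(train), len(test))  # 99 51
```

With 50 rows per class and a fraction of 2/3, each class contributes 33 training rows and 17 testing rows, which is why the sizes are 99 and 51 rather than 100 and 50.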


## 3. Train the model
Use the training set to train your model.

In [4]:
# split the training set into inputs and targets
train_in = training[:,:4]    # (sepal_length, sepal_width, petal_length, petal_width)
train_out = training[:,4:]   # (species)
# cross-entropy loss via tf.nn.softmax_cross_entropy_with_logits:
# https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits
# (this op applies softmax internally, so `logits` is meant to receive unnormalised scores)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=prediction))
# train with GradientDescentOptimizer (learning rate 0.01)
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# initialise global variables
init = tf.global_variables_initializer()
# training loop
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(100):    # number of passes over the whole training set
        sess.run(train_step,{x:train_in, y:train_out})
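
To make the loss concrete, the sketch below computes softmax cross-entropy for a single row in plain Python, starting from unnormalised scores as `tf.nn.softmax_cross_entropy_with_logits` expects. It is an illustration of the formula, not TensorFlow's implementation, and the function name `softmax_cross_entropy` and the example numbers are made up for the demonstration.

```python
import math


def softmax_cross_entropy(labels, logits):
    """Cross-entropy between a label row and softmax(logits).

    Uses the log-sum-exp form so that large scores do not overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(labels, log_probs))


target = [0.0, 1.0, 0.0]  # one-hot label for class 1

# Three classes: the loss is small when the correct class has the top score
# and larger when another class is favoured.
good = softmax_cross_entropy(target, [0.5, 2.0, -1.0])
bad = softmax_cross_entropy(target, [2.0, 0.5, -1.0])
print(good, bad, good < bad)

# One class: log(softmax) is 0 for any score, so the loss is zero and
# gradient descent has nothing to minimise.
single = softmax_cross_entropy([2.0], [3.7])
print(single == 0.0)  # True
```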


## 4. Test the model
Use the testing set to test your model, clearly calculating and displaying the error rate.

In [5]:
# split the testing set into inputs and targets
test_in = test[:,:4]    # (sepal_length, sepal_width, petal_length, petal_width)
test_out = test[:,4:]   # (species)
# calculate accuracy
correct_prediction = tf.equal(y,prediction) # element-wise: True where the output equals the label, otherwise False
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) # True -> 1.0, False -> 0.0
# train again in a fresh session, evaluating on the testing set after every epoch
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(100):    # number of passes over the whole training set
        sess.run(train_step,{x:train_in, y:train_out})                  # train
        acc = sess.run(accuracy,{x:test_in, y:test_out})                # test
    # report the result after the final epoch
    print("Iter " + str(epoch) + ",Testing Accuracy " + str(acc) + ",Testing Error Rate " + str(1-acc))


Iter 99,Testing Accuracy 0.44,Testing Error Rate 0.560000002384
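
For a classifier with one score per class, the error rate is usually computed by taking the highest-scoring class as the prediction and counting how often it differs from the true label. The sketch below shows that calculation in plain Python on four made-up rows; `predicted_class`, `error_rate`, and the sample scores are illustrative and do not come from the notebook's results.

```python
def predicted_class(scores):
    """Index of the largest score (the argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def error_rate(score_rows, labels):
    """Fraction of rows whose predicted class differs from the true label."""
    wrong = sum(
        1 for scores, label in zip(score_rows, labels)
        if predicted_class(scores) != label
    )
    return wrong / len(labels)


# Made-up class scores for four flowers and their true species codes.
scores = [
    [0.8, 0.1, 0.1],  # predicts 0
    [0.2, 0.5, 0.3],  # predicts 1
    [0.3, 0.4, 0.3],  # predicts 1
    [0.1, 0.2, 0.7],  # predicts 2
]
labels = [0, 1, 2, 2]

rate = error_rate(scores, labels)
print("accuracy", 1 - rate, "error rate", rate)  # accuracy 0.75 error rate 0.25
```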


## Analysis
Several factors affect the accuracy:
* how each species is encoded as a number (strings cannot be used in the calculation)
* the random split into training and testing sets
* the choice of cost function
* the choice of optimizer
* ... ...

# End