# A First Example

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

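If you read the TensorFlow source code, you will see these three lines a lot. The reason is simply that TensorFlow's own <a href="https://www.tensorflow.org/community/style_guide?hl=zh-cn">style guide</a> asks for them.
<br>Some readers may wonder about `__future__`, so here are the what and the why.
<br>1) What is it?
<br>Answer: <a href = "https://docs.python.org/3/library/__future__.html">a module called `__future__`</a>
<br>2) Why use it?
<br>Answer: to smooth over incompatibilities between Python versions. Say I am lazy and never upgrade after installing Python, so everyone else is on 3.99 while I am still on 3.00. Suppose print in 3.99 appended ten newlines to every call. By importing print_function from the `__future__` module, lazy me on the old interpreter could still use the fancy ten-newline print from 3.99. In practice, the three imports above make Python 2 behave like Python 3 for absolute imports, true division and the print function.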


In [2]:
import tensorflow as tf
import iris_data
import pandas as pd

The data is already preprocessed in <a href> iris_data</a>, but it is still worth taking a look at it ourselves.

In [3]:
TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
#s=requests.get(TRAIN_URL).content
c=pd.read_csv(TRAIN_URL)
c.head(10)

Unnamed: 0,120,4,setosa,versicolor,virginica
0,6.4,2.8,5.6,2.2,2
1,5.0,2.3,3.3,1.0,1
2,4.9,2.5,4.5,1.7,2
3,4.9,3.1,1.5,0.1,0
4,5.7,3.8,1.7,0.3,0
5,4.4,3.2,1.3,0.2,0
6,5.4,3.4,1.5,0.4,0
7,6.9,3.1,5.1,2.3,2
8,6.7,3.1,4.4,1.4,1
9,5.1,3.7,1.5,0.4,0


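At first glance the header row does not look right.
<br>That is because the first line of the CSV file is a metadata row (120 examples, 4 features, then the three class names) rather than a list of column names, and pd.read_csv treats it as the header by default.
<br>According to <a>the official explanation</a>, the five columns are:
<br>sepal length; sepal width; petal length; petal width; species (the label) // Note: the label is 0, 1 or 2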
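
One way to get a readable header is to pass explicit column names to `pd.read_csv` and have it discard the metadata line. A minimal sketch, assuming the column names below (the four feature names match the keys used later in this notebook, while `Species` is an illustrative name for the label column) and using an inline copy of the first rows instead of downloading the file:

```python
import io

import pandas as pd

COLUMN_NAMES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]

# The metadata line plus the first three data rows shown above.
sample_csv = io.StringIO(
    "120,4,setosa,versicolor,virginica\n"
    "6.4,2.8,5.6,2.2,2\n"
    "5.0,2.3,3.3,1.0,1\n"
    "4.9,2.5,4.5,1.7,2\n"
)

# header=0 marks the first line as a header row, and names= replaces it.
df = pd.read_csv(sample_csv, names=COLUMN_NAMES, header=0)
print(df)
```

The same call should work with TRAIN_URL in place of `sample_csv`.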

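Well, now that the four features have been revealed, our task is:
<br>predict the species of an iris from its sepal length, sepal width, petal length and petal width.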

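One note: this tutorial is built on <a>Estimator</a>. For more explanation and code about Estimators, see another tutorial that I translated.
### Next, we need to do the following:
#### (1) Create the input functions
#### (2) Define the model's feature columns
#### (3) Instantiate an Estimator, specifying the feature columns and various hyperparameters (if you do not know what a hyperparameter is, click <a>here</a>)
#### (4) Call methods on the Estimator (train, evaluate, predict), passing in the appropriate input function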

In [4]:
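import numpy as np  # needed by this cell; not imported in the cells above

def input_evaluation_set():
    """Return a tiny hand-written (features, labels) pair with two examples."""
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels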

In [5]:
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)

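The above is just a sample copied from the official site. Normally we first fetch the data, split it in input_evaluation_set() into a features part and a label part, and then pass both to train_input_fn(), which uses the <a>tf.data.Dataset</a> API; its output can then serve as the input to the actual training process.
<BR>For convenience, here we simply call iris_data.load_data to get the training and test sets.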
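
The split into features and labels can be sketched with pandas alone. This is an assumption about what a loader such as iris_data.load_data might do internally, not its confirmed implementation; the `Species` column name and the three-row DataFrame are illustrative.

```python
import pandas as pd

# Three illustrative rows in the same layout as the training CSV.
train = pd.DataFrame({
    "SepalLength": [6.4, 5.0, 4.9],
    "SepalWidth": [2.8, 2.3, 2.5],
    "PetalLength": [5.6, 3.3, 4.5],
    "PetalWidth": [2.2, 1.0, 1.7],
    "Species": [2, 1, 2],
})

# pop() removes the label column from the DataFrame and returns it,
# leaving only the four feature columns behind.
train_y = train.pop("Species")
train_x = train

print(list(train_x.keys()))
print(train_y.tolist())
```

Because `train_x` is a DataFrame keyed by column name, `dict(train_x)` yields the name-to-column mapping that the input function passes to `from_tensor_slices`.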

In [7]:
(train_x, train_y), (test_x, test_y) = iris_data.load_data()

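Next we define the feature columns. With real data, handling feature columns is more involved, but this is only an example.
<br> Also, tf.feature_column is used here; for details, see another tutorial<a> </a>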

In [8]:
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

You may want to see what the feature columns contain.

In [9]:
my_feature_columns

[_NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Next comes step three: defining an Estimator.

In [10]:
# Build a DNN with 2 hidden layers and 10 nodes in each hidden layer.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/8l/j0ddzz_x75n85vq8tp86dwpr0000gn/T/tmptzmtf89p', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x120686c18>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


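### If this is your first time using TensorFlow, congratulations: your first DNN is now defined ~
<br> I think it is fairly obvious, but let me explain anyway: you just defined a beginner-level DNN classifier with two hidden layers of 10 units each and an output layer of three units (one per class), whose feature columns are the my_feature_columns you just saw -v-
<BR>Next, set batch_size and train_steps.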
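
To make the layer sizes concrete, here is a NumPy sketch of a network with the same shape: 4 inputs, two hidden layers of 10 units, and 3 outputs. It is an illustration only, not how DNNClassifier is implemented; the random weights, the ReLU activation and the softmax output are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 10, 10, 3]  # inputs, hidden, hidden, classes

# One (weights, bias) pair per connection between consecutive layers.
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Return class probabilities for a batch of shape (n, 4)."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on the hidden layers only
    x = x - x.max(axis=1, keepdims=True)  # stabilise the softmax
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

n_params = sum(w.size + b.size for w, b in params)
probs = forward(np.array([[6.4, 2.8, 5.6, 2.2], [5.0, 2.3, 3.3, 1.0]]))
print(n_params)     # (4*10+10) + (10*10+10) + (10*3+3) = 193
print(probs.shape)  # (2, 3)
```

Each row of `probs` sums to 1, one probability per iris species.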

In [11]:
batch_size = 100
train_steps = 1000
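
These two numbers determine how much data training consumes. A quick calculation, assuming the 120 training examples indicated by the first line of the CSV shown earlier:

```python
batch_size = 100
train_steps = 1000
n_train_examples = 120  # from the first line of iris_training.csv

examples_seen = batch_size * train_steps
epochs = examples_seen / n_train_examples
print(examples_seen)     # 100000
print(round(epochs, 1))  # 833.3
```

Training can run for this many passes only because the input function calls repeat() on the dataset.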

### The next steps are:
#### 1) Train the model
#### 2) Evaluate the accuracy of the trained model
#### 3) Use the trained model to make predictions

In [12]:
classifier.train(
    input_fn=lambda:iris_data.train_input_fn(train_x, train_y, batch_size),
    steps=train_steps)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/8l/j0ddzz_x75n85vq8tp86dwpr0000gn/T/tmptzmtf89p/model.ckpt.
INFO:tensorflow:loss = 161.19363, step = 1
INFO:tensorflow:global_step/sec: 562.711
INFO:tensorflow:loss = 11.6809025, step = 101 (0.179 sec)
INFO:tensorflow:global_step/sec: 631.298
INFO:tensorflow:loss = 9.929311, step = 201 (0.160 sec)
INFO:tensorflow:global_step/sec: 557.128
INFO:tensorflow:loss = 5.9321713, step = 301 (0.177 sec)
INFO:tensorflow:global_step/sec: 573.302
INFO:tensorflow:loss = 5.851366, step = 401 (0.176 sec)
INFO:tensorflow:global_step/sec: 581.393
INFO:tensorflow:loss = 2.5129728, step = 501 (0.171 sec)
INFO:tensorflow:global_step/sec: 579.041
INFO:tensorflow:loss = 5.1492662, step = 601 (0.173 sec)
INFO:tensorflow:global_step/sec: 515.368
INFO:tensorflow:loss = 4.1985073, step = 701 (0.194 sec)
INFO:tensorflow:global_step/sec: 623.516
INFO:tensorflow:loss = 3.329792, step = 801 (0.160 sec)
INFO:tensorf

<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x1206867f0>

### Once training is done, we use the test set to evaluate the model's accuracy

In [13]:
eval_result = classifier.evaluate(
    input_fn=lambda:iris_data.eval_input_fn(test_x, test_y, batch_size = batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

INFO:tensorflow:Starting evaluation at 2018-03-09-09:16:30
INFO:tensorflow:Restoring parameters from /var/folders/8l/j0ddzz_x75n85vq8tp86dwpr0000gn/T/tmptzmtf89p/model.ckpt-1000
INFO:tensorflow:Finished evaluation at 2018-03-09-09:16:31
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.96666664, average_loss = 0.054533932, global_step = 1000, loss = 1.6360179

Test set accuracy: 0.967



# 96.7% accuracy!
(That is actually not especially high; you may well get a better result than mine.)
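
The numbers in the evaluation log fit together. A small check, using only values printed in the log and assuming that loss is the sum of the per-example losses over a single batch (the batch size of 100 exceeds the size of the test set):

```python
accuracy = 0.96666664
average_loss = 0.054533932
loss = 1.6360179

# If loss is a sum over one batch, loss / average_loss is the example count.
n_test_examples = round(loss / average_loss)
n_correct = round(accuracy * n_test_examples)
print(n_test_examples)  # 30
print(n_correct)        # 29
print("{:.1%}".format(n_correct / n_test_examples))  # 96.7%
```

So a single misclassified flower out of 30 accounts for the reported accuracy.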

Next, let's make a prediction.

In [15]:
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

predictions = classifier.predict(
    input_fn=lambda:iris_data.eval_input_fn(predict_x,
                                            labels=None,
                                            batch_size=batch_size))

In [16]:
predictions

<generator object Estimator.predict at 0x12104d570>

predict returns a Python iterable (a generator, as the output above shows) (⊙v⊙)
<br>You may have questions about this data type
<br>Or you may not
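
In short, a generator produces its items lazily and can be consumed only once. A minimal pure-Python sketch (the function below is a hypothetical stand-in, not the Estimator API):

```python
def fake_predictions():
    """Yield one result at a time, like a generator of predictions."""
    for class_id in (0, 1, 2):
        yield {"class_ids": [class_id]}

gen = fake_predictions()
first_pass = [p["class_ids"][0] for p in gen]
second_pass = [p["class_ids"][0] for p in gen]  # the generator is now exhausted

print(first_pass)   # [0, 1, 2]
print(second_pass)  # []
```

If the results are needed more than once, store them in a list on the first pass or call predict again.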

In [None]:
for pred_dict, expec in zip(predictions, expected):
    #print(pred_dict)
    template = ('\nPrediction is "{}" ({:.1f}%), expected "{}"')

    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print(template.format(iris_data.SPECIES[class_id],
                          100 * probability, expec))
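
Each pred_dict in the loop above is indexed by 'class_ids' and 'probabilities'. The sketch below mimics one such dictionary with made-up numbers to show how the two entries relate; the values and the local SPECIES list (mirroring the expected list above) are illustrative, not output from the model.

```python
import numpy as np

SPECIES = ["Setosa", "Versicolor", "Virginica"]  # same order as labels 0, 1, 2

# Made-up prediction for one example: one probability per class.
pred_dict = {
    "probabilities": np.array([0.02, 0.93, 0.05]),
    "class_ids": np.array([1]),
}

class_id = pred_dict["class_ids"][0]
probability = pred_dict["probabilities"][class_id]

# In this sketch the predicted class is the one with the highest probability.
is_argmax = class_id == np.argmax(pred_dict["probabilities"])
message = 'Prediction is "{}" ({:.1f}%)'.format(SPECIES[class_id], 100 * probability)
print(is_argmax, message)
```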