# Wide and Deep Model

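## Introduction

The earlier code showed how to build flexible neural networks from TensorFlow's built-in ops. Here we use TensorFlow's high-level API to build a wide & deep model in a simpler way.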

The typical wide & deep model architecture published by Google looks like this:
![](https://img-blog.csdn.net/20170502135611349)

A more general concatenation-based architecture for CTR prediction can look like this:
![](https://yxzf.github.io/images/deeplearning/dnn_ctr/embeding.png)

## Importing libraries

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time

import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)
print("Using TensorFlow version %s\n" % (tf.__version__))

# We use the Criteo dataset here; the features consist of 13 continuous columns and 26 categorical columns
CONTINUOUS_COLUMNS =  ["I"+str(i) for i in range(1,14)] # 1-13 inclusive
CATEGORICAL_COLUMNS = ["C"+str(i) for i in range(1,27)] # 1-26 inclusive
# The label is "clicked"
LABEL_COLUMN = ["clicked"]

# The training data consists of the label column + continuous columns + categorical columns
TRAIN_DATA_COLUMNS = LABEL_COLUMN + CONTINUOUS_COLUMNS + CATEGORICAL_COLUMNS
#TEST_DATA_COLUMNS = CONTINUOUS_COLUMNS + CATEGORICAL_COLUMNS

# The feature columns are the continuous columns + categorical columns
FEATURE_COLUMNS = CONTINUOUS_COLUMNS + CATEGORICAL_COLUMNS

# Print some information
print('Feature columns are: ', FEATURE_COLUMNS, '\n')

# Sample record (features only, no label)
sample = [ 0, 127, 1, 3, 1683, 19, 26, 17, 475, 0, 9, 0, 3, "05db9164", "8947f767", "11c9d79e", "52a787c8", "4cf72387", "fbad5c96", "18671b18", "0b153874", "a73ee510", "ceb10289", "77212bd7", "79507c6b", "7203f04e", "07d13a8f", "2c14c412", "49013ffe", "8efede7f", "bd17c3da", "f6a3e43b", "a458ea53", "35cd95c9", "ad3062eb", "c7dc6720", "3fdb382b", "010f6491", "49d68486"]

print('Columns and data as a dict: ', dict(zip(FEATURE_COLUMNS, sample)), '\n')

Using TensorFlow version 1.12.0

Feature columns are:  ['I1', 'I2', 'I3', 'I4', 'I5', 'I6', 'I7', 'I8', 'I9', 'I10', 'I11', 'I12', 'I13', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21', 'C22', 'C23', 'C24', 'C25', 'C26'] 

Columns and data as a dict:  {'C19': 'f6a3e43b', 'C18': 'bd17c3da', 'C13': '7203f04e', 'C12': '79507c6b', 'C11': '77212bd7', 'C10': 'ceb10289', 'C17': '8efede7f', 'C16': '49013ffe', 'C15': '2c14c412', 'C14': '07d13a8f', 'I9': 475, 'I8': 17, 'I1': 0, 'I3': 1, 'I2': 127, 'I5': 1683, 'I4': 3, 'I7': 26, 'I6': 19, 'C9': 'a73ee510', 'C8': '0b153874', 'C3': '11c9d79e', 'C2': '8947f767', 'C1': '05db9164', 'C7': '18671b18', 'C6': 'fbad5c96', 'C5': '4cf72387', 'C4': '52a787c8', 'C22': 'ad3062eb', 'C23': 'c7dc6720', 'C20': 'a458ea53', 'C21': '35cd95c9', 'C26': '49d68486', 'C24': '3fdb382b', 'C25': '010f6491', 'I11': 9, 'I10': 0, 'I13': 3, 'I12': 0} 



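## Parsing the input file

We feed the data to a `Reader` and read one batch at a time from the file.

The `_input_fn()` function is wrapped in an outer function, which makes it easier to reuse for different input files.

Note that the file is read directly by TensorFlow: we do not use a tool such as pandas, and we never load all of the data into memory at once. This makes the approach suitable for training on very large data files.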
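
To make the batching idea concrete without TensorFlow, here is a minimal pure-Python sketch of reading a text file one batch of lines at a time. The helper name `read_batches` and the small demo file are hypothetical and only serve the illustration; the notebook itself relies on `tf.TextLineReader` for this.

```python
import itertools
import os
import tempfile


def read_batches(path, batch_size):
    """Yield lists of at most batch_size lines without loading the whole file."""
    with open(path) as f:
        while True:
            # islice pulls only the next batch_size lines from the file iterator
            batch = list(itertools.islice(f, batch_size))
            if not batch:
                return
            yield [line.rstrip("\n") for line in batch]


# Demo on a small temporary file (hypothetical data, 5 lines).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join("row%d" % i for i in range(5)) + "\n")
    demo_path = tmp.name

batch_sizes = [len(b) for b in read_batches(demo_path, batch_size=2)]
os.remove(demo_path)
print(batch_sizes)  # [2, 2, 1]
```

Only one batch is held in memory at any time, which is the property that matters for very large files.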

### About the input_fn function

This function defines how we read data for training and evaluation. It returns a pair: the first element is a dict mapping each column name to its values, and the second element is the sequence of labels.

In the abstract it looks like this: `(map(column_name => [Tensor of values]), [Tensor of labels])`

A concrete example:

    { 
      'age':            [ 39, 50, 38, 53, 28, … ], 
      'marital_status': [ 'Married-civ-spouse', 'Never-married', 'Widowed', 'Widowed', … ],
       ...
      'gender':         [ 'Male', 'Female', 'Male', 'Male', 'Female', … ], 
    } , 
    [ 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1 ]

### High-level structure of input functions for CSV-style data
1. Queue file(s)
2. Read a batch of data from the next file
3. Create record defaults, generally 0 for continuous values, and "" for categorical. You can use named types if you prefer
4. Decode the CSV and restructure it to be appropriate for the graph's input format
    * `zip()` column headers with the data
    * `pop()` off the label column(s)
    * Remove/pop any unneeded column(s)
    * Run `tf.expand_dims()` on categorical columns
5. Return the pair: `(feature_dict, label_array)`
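
The decoding steps above can be sketched in plain Python for a single tab-separated record. This is only an illustration under assumptions: the helper `parse_record` and the sample record are hypothetical, and the column lists are repeated so that the block is self-contained. The notebook performs the same steps on whole batches of tensors with `tf.decode_csv`.

```python
CONTINUOUS_COLUMNS = ["I" + str(i) for i in range(1, 14)]
CATEGORICAL_COLUMNS = ["C" + str(i) for i in range(1, 27)]
LABEL_COLUMN = ["clicked"]
TRAIN_DATA_COLUMNS = LABEL_COLUMN + CONTINUOUS_COLUMNS + CATEGORICAL_COLUMNS


def parse_record(line, delim="\t"):
    """Turn one delimited line into a (feature_dict, label) pair."""
    fields = line.rstrip("\n").split(delim)
    if len(fields) != len(TRAIN_DATA_COLUMNS):
        raise ValueError("expected %d fields, got %d"
                         % (len(TRAIN_DATA_COLUMNS), len(fields)))
    # zip() the column headers with the data, then pop() off the label
    row = dict(zip(TRAIN_DATA_COLUMNS, fields))
    label = int(row.pop(LABEL_COLUMN[0]) or 0)
    features = {}
    for name in CONTINUOUS_COLUMNS:
        features[name] = int(row[name] or 0)  # missing value -> default 0
    for name in CATEGORICAL_COLUMNS:
        features[name] = row[name] or " "     # missing value -> default " "
    return features, label


# Hypothetical record: label 1, I2 missing, C3 missing.
ints = [str(i) for i in range(1, 14)]
ints[1] = ""
cats = ["c%d" % i for i in range(1, 27)]
cats[2] = ""
line = "\t".join(["1"] + ints + cats)

features, label = parse_record(line)
print(label, features["I1"], features["I2"], repr(features["C3"]), len(features))
```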
    

In [2]:
BATCH_SIZE = 2000

def generate_input_fn(filename, batch_size=BATCH_SIZE):
    def _input_fn():
        filename_queue = tf.train.string_input_producer([filename])
        reader = tf.TextLineReader()
        # Read at most batch_size lines
        key, value = reader.read_up_to(filename_queue, num_records=batch_size)
        
        # 1 int label, 13 continuous values, 26 strings
        cont_defaults = [ [0] for i in range(1,14) ]
        cate_defaults = [ [" "] for i in range(1,27) ]
        label_defaults = [ [0] ]
        column_headers = TRAIN_DATA_COLUMNS
        
        # The first column is the label
        record_defaults = label_defaults + cont_defaults + cate_defaults

        # Decode the CSV lines that were read
        # We have to zip the data with the headers ourselves
        columns = tf.decode_csv(
            value, record_defaults=record_defaults, field_delim='\t')
        
        # The result is a dict mapping column names to data tensors
        all_columns = dict(zip(column_headers, columns))
        
        # Pop off and keep the label
        labels = all_columns.pop(LABEL_COLUMN[0])
        
        # The remaining columns are the features
        features = all_columns 

        # Categorical columns need an extra trailing dimension: shape [batch] becomes [batch, 1]
        for feature_name in CATEGORICAL_COLUMNS:
            features[feature_name] = tf.expand_dims(features[feature_name], -1)

        return features, labels

    return _input_fn

print('input function configured')

input function configured


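## Building feature columns
In this part we look at how TensorFlow's high-level API makes feature processing convenient.

#### Sparse columns
We first build the sparse columns (for categorical features).

When all category values are known in advance, use `sparse_column_with_keys()`.

When there are too many categories to enumerate, use `sparse_column_with_hash_bucket()` to do the mapping.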
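
As a rough illustration of what hash bucketing does, the sketch below maps a category string to an integer id in a fixed range. It is an assumption-laden stand-in: the helper `hash_bucket` is hypothetical and uses MD5 for a stable hash, whereas TensorFlow uses its own fingerprint function, so the actual ids will differ.

```python
import hashlib


def hash_bucket(value, hash_bucket_size):
    """Map a string to an integer id in [0, hash_bucket_size)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# The same value always lands in the same bucket; different values may collide.
ids = [hash_bucket(v, 1000) for v in ["05db9164", "8947f767", "05db9164"]]
print(ids)
```

The trade-off is that no vocabulary has to be built up front, at the cost of occasional collisions between unrelated categories. A larger `hash_bucket_size` makes collisions rarer.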

In [3]:
# Sparse base columns.
# C1 = tf.contrib.layers.sparse_column_with_hash_bucket('C1', hash_bucket_size=1000)
# C2 = tf.contrib.layers.sparse_column_with_hash_bucket('C2', hash_bucket_size=1000)
# C3 = tf.contrib.layers.sparse_column_with_hash_bucket('C3', hash_bucket_size=1000)
# ...
# Cn = tf.contrib.layers.sparse_column_with_hash_bucket('Cn', hash_bucket_size=1000)
# wide_columns = [C1, C2, C3, ... , Cn]

wide_columns = []
for name in CATEGORICAL_COLUMNS:
    wide_columns.append(tf.contrib.layers.sparse_column_with_hash_bucket(
            name, hash_bucket_size=1000))

print('Wide/Sparse columns configured')

Wide/Sparse columns configured


#### Continuous columns
Continuous columns are declared with `real_valued_column()`.

In [4]:
# Continuous base columns.
# I1 = tf.contrib.layers.real_valued_column("I1")
# I2 = tf.contrib.layers.real_valued_column("I2")
# I3 = tf.contrib.layers.real_valued_column("I3")
# ...
# In = tf.contrib.layers.real_valued_column("In")
# deep_columns = [I1, I2, I3, ... , In]

deep_columns = []
for name in CONTINUOUS_COLUMNS:
    deep_columns.append(tf.contrib.layers.real_valued_column(name))

print('deep/continuous columns configured')

deep/continuous columns configured


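#### Feature engineering transformations
Because this dataset has been anonymized, we do not know what the columns mean, so no transformations are applied here. Two commonly used transformations are:
 
* **Bucketizing**: discretize a continuous value into buckets
* **Feature crossing**: build combined features from two or more columns (only discrete features can be crossed, so a continuous feature must be bucketized first)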
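
The two transformations can be sketched in plain Python as follows. The helpers `bucketize` and `cross` are hypothetical illustrations rather than the TensorFlow implementation, and the MD5 hash is a stand-in for TensorFlow's own hashing. The boundaries are the ones from the commented-out `age_buckets` example.

```python
import bisect
import hashlib


def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into (left-inclusive buckets)."""
    return bisect.bisect_right(boundaries, value)


def cross(values, hash_bucket_size):
    """Hash the combination of several discrete values into a single id."""
    key = "_X_".join(str(v) for v in values)
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % hash_bucket_size


boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
age_bucket = bucketize(39, boundaries)
print(age_bucket)  # 4, i.e. the bucket [35, 40)

# A continuous value is bucketized first and only then crossed with a category.
crossed_id = cross([age_bucket, "Bachelors"], hash_bucket_size=10000)
print(0 <= crossed_id < 10000)  # True
```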

In [5]:
# No known Transformations. Can add some if desired. 
# Examples from other datasets are shown below.

# age_buckets = tf.contrib.layers.bucketized_column(age,
#             boundaries=[ 18, 25, 30, 35, 40, 45, 50, 55, 60, 65 ])
# education_occupation = tf.contrib.layers.crossed_column([education, occupation], 
#                                                         hash_bucket_size=int(1e4))
# age_race_occupation = tf.contrib.layers.crossed_column([age_buckets, race, occupation], 
#                                                        hash_bucket_size=int(1e6))
# country_occupation = tf.contrib.layers.crossed_column([native_country, occupation], 
#                                                       hash_bucket_size=int(1e4))

print('Transformations complete')

Transformations complete


### Group feature columns into 2 objects

The wide columns are the sparse, categorical columns that we specified, as well as our hashed, bucketized, and feature-crossed columns. 

The deep columns are composed of embedded categorical columns along with the continuous real-valued columns. **Column embeddings** transform a sparse, categorical tensor into a low-dimensional and dense real-valued vector. The embedding values are also trained along with the rest of the model. For more information about embeddings, see the TensorFlow tutorial on [Vector Representations of Words](https://www.tensorflow.org/tutorials/word2vec/), or [Word Embedding](https://en.wikipedia.org/wiki/Word_embedding) on Wikipedia.

The higher the dimension of the embedding is, the more degrees of freedom the model will have to learn the representations of the features. We are starting with an 8-dimension embedding for simplicity, but later you can come back and increase the dimensionality if you wish.
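
Conceptually, an embedding column is a lookup into a trainable table with one row per hash bucket. The sketch below shows the lookup and the resulting width of the deep input, assuming the sizes used in this notebook (1000 hash buckets, 8 dimensions, 13 continuous and 26 categorical columns). The table here is randomly initialised and never trained; the names are hypothetical.

```python
import random

HASH_BUCKET_SIZE = 1000  # as in sparse_column_with_hash_bucket(..., hash_bucket_size=1000)
EMBEDDING_DIM = 8        # as in embedding_column(..., dimension=8)

rng = random.Random(0)
# One table per categorical column: HASH_BUCKET_SIZE rows of EMBEDDING_DIM floats.
embedding_table = [[rng.uniform(-0.1, 0.1) for _ in range(EMBEDDING_DIM)]
                   for _ in range(HASH_BUCKET_SIZE)]


def embed(bucket_id):
    """Look up the dense vector for a hashed category id."""
    return embedding_table[bucket_id]


vector = embed(42)
# 13 continuous values plus one 8-dimensional embedding per categorical column.
deep_input_width = 13 + 26 * EMBEDDING_DIM
print(len(vector), deep_input_width)  # 8 221
```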



In [6]:
# Wide columns and deep columns.
# wide_columns = [gender, race, native_country,
#       education, occupation, workclass,
#       marital_status, relationship,
#       age_buckets, education_occupation,
#       age_race_occupation, country_occupation]

# deep_columns = [
#   tf.contrib.layers.embedding_column(workclass, dimension=8),
#   tf.contrib.layers.embedding_column(education, dimension=8),
#   tf.contrib.layers.embedding_column(marital_status, dimension=8),
#   tf.contrib.layers.embedding_column(gender, dimension=8),
#   tf.contrib.layers.embedding_column(relationship, dimension=8),
#   tf.contrib.layers.embedding_column(race, dimension=8),
#   tf.contrib.layers.embedding_column(native_country, dimension=8),
#   tf.contrib.layers.embedding_column(occupation, dimension=8),
#   age,
#   education_num,
#   capital_gain,
#   capital_loss,
#   hours_per_week,
# ]

# Embeddings for wide columns into deep columns
for col in wide_columns:
    deep_columns.append(tf.contrib.layers.embedding_column(col, 
                                                           dimension=8))

print('wide and deep columns configured')

wide and deep columns configured


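## Building the model

Depending on your needs, you can build a wide model, a deep model, or a wide & deep model:

* **Wide**: equivalent to logistic regression
* **Deep**: equivalent to a multilayer perceptron
* **Wide & Deep**: combines the two structures

The `hidden_units` parameter (or `dnn_hidden_units` for the combined model) sets the number of nodes in each hidden layer. For example, `[12, 20, 15]` builds 3 hidden layers with 12, 20, and 15 neurons respectively.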
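
To show how the two parts fit together, here is a toy forward pass in plain Python: the wide part is a linear model, the deep part is a small multilayer perceptron built from `hidden_units`, and their logits are summed before a sigmoid. All weights, inputs, and helper names are made up for illustration; this is a sketch of the idea, not the estimator's implementation.

```python
import math
import random


def dense(inputs, weights, biases, relu=True):
    """One fully connected layer; weights[i][j] connects input i to output j."""
    outputs = []
    for j in range(len(biases)):
        z = sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        outputs.append(max(z, 0.0) if relu else z)
    return outputs


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for a layer of shape n_in x n_out."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


rng = random.Random(0)
hidden_units = [12, 20, 15]                    # three hidden layers
deep_input = [rng.random() for _ in range(6)]  # toy dense input
wide_input = [1.0, 0.0, 1.0]                   # toy sparse indicators

# Deep part: input -> 12 -> 20 -> 15 -> 1 logit
sizes = [len(deep_input)] + hidden_units
layers = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
activation = deep_input
for weights, biases in layers:
    activation = dense(activation, weights, biases)
out_w, out_b = init_layer(rng, hidden_units[-1], 1)
deep_logit = dense(activation, out_w, out_b, relu=False)[0]

# Wide part: a plain linear model over the sparse indicators
wide_weights = [0.3, -0.2, 0.1]
wide_logit = sum(w * x for w, x in zip(wide_weights, wide_input))

# Wide & deep: sum the two logits, then squash to a click probability
probability = 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit)))
print(len(layers), probability)
```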

In [7]:
def create_model_dir(model_type):
    # Returns something like models/model_WIDE_AND_DEEP_1493043407
    return './models/model_' + model_type + '_' + str(int(time.time()))

# Build the estimator for the given model type and model directory
def get_model(model_type, model_dir):
    print("Model directory = %s" % model_dir)
    
    # Checkpoint settings: save every 100 steps instead of on a timer
    runconfig = tf.estimator.RunConfig(
        save_checkpoints_secs=None,
        save_checkpoints_steps = 100,
    )
    
    m = None
    
    # tf.contrib.learn estimators are used for all three types because the
    # cells below call fit() and predict_classes(), which tf.estimator lacks

    # Wide model
    if model_type == 'WIDE':
        m = tf.contrib.learn.LinearClassifier(
            model_dir=model_dir, 
            feature_columns=wide_columns)

    # Deep model
    if model_type == 'DEEP':
        m = tf.contrib.learn.DNNClassifier(
            model_dir=model_dir,
            feature_columns=deep_columns,
            hidden_units=[100, 50, 25])

    # Wide & deep model
    if model_type == 'WIDE_AND_DEEP':
        m = tf.contrib.learn.DNNLinearCombinedClassifier(
            model_dir=model_dir,
            linear_feature_columns=wide_columns,
            dnn_feature_columns=deep_columns,
            dnn_hidden_units=[100, 70, 50, 25],
            config=runconfig)
        
    print('estimator built')
    
    return m
    

MODEL_TYPE = 'WIDE_AND_DEEP'
model_dir = create_model_dir(model_type=MODEL_TYPE)
m = get_model(model_type=MODEL_TYPE, model_dir=model_dir)

Model directory = ./models/model_WIDE_AND_DEEP_1551328373
Instructions for updating:
Please set fix_global_step_increment_bug=True and update training steps in your pipeline. See pydoc for details.
Instructions for updating:
Please switch to tf.contrib.estimator.*_head.
Instructions for updating:
Please replace uses of any Estimator from tf.contrib.learn with an Estimator from tf.estimator.*
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2f31b0f9d0>, '_model_dir': './models/model_WIDE_AND_DEEP_1551328373', '_protocol': None, '_save_checkpoints_steps': 100, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '

In [8]:
# Check that the estimator is both evaluable and trainable
from tensorflow.contrib.learn.python.learn import evaluable, trainable
isinstance(m, evaluable.Evaluable)
isinstance(m, trainable.Trainable)

True

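## Fitting and training the model

Call `fit()` to train the model. Try different values of `train_steps` and `BATCH_SIZE`: both affect training speed and the final result.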
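
The relationship between sample count, batch size, and step count is simple arithmetic. The hypothetical helper below makes it explicit, using the numbers from this notebook (2,000,000 training rows, 500,000 evaluation rows, batches of 2000, and 20 passes over the training data). The interpretation of the factor 20 as a number of passes is an assumption based on how `train_steps` is computed.

```python
def steps_for(sample_size, batch_size, epochs=1):
    """Number of batches needed to pass over sample_size rows `epochs` times."""
    steps_per_epoch = -(-sample_size // batch_size)  # ceiling division
    return steps_per_epoch * epochs


print(steps_for(2000000, 2000, epochs=20))  # 20000
print(steps_for(500000, 2000))              # 250
```

A larger batch size means fewer steps per pass over the data, but each step processes more rows.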

In [9]:
# Training and evaluation files
train_file = "./data/criteo/criteo_data/train.txt"
# Note: eval_file points to the same file as train_file here
eval_file  = "./data/criteo/criteo_data/train.txt"
!head -5 ./data/criteo/criteo_data/criteo_train.txt

test_file= "./data/criteo/criteo_data/test.txt"


0,4,14,7,28,36,28,4,43,47,2,2,0,28,05db9164,39dfaa0d,dd17c91c,82a61820,25c83c98,fe6b92e5,8f99333a,5b392875,a73ee510,3b08e48b,3ad41aaa,75529ad8,4ca13ee8,07d13a8f,60fa10e5,5eea53aa,e5ba7672,df4fffb7,21ddcdc9,5840adea,0f78ab39,,32c7478e,cafb4e4d,010f6491,99f4f64c
1,,0,11,,87896,,,7,,,,,,05db9164,0a519c5c,ad4b77ff,d16679b9,25c83c98,7e0ccccf,fa44c4cf,0b153874,7cc72ec2,30fdb872,2b9f131d,a2f4e8b5,aca10c14,07d13a8f,5a7d5bd8,89052618,d4bb7bd8,eea3ab97,,,d4703ebd,,3a171ecb,aee52b6f,,
0,2,2,28,7,1,1,21,7,393,1,4,0,1,05db9164,bccb7a1a,3cc14b5b,8e9c10ae,25c83c98,7e0ccccf,36b21dc8,51d76abe,a73ee510,451bd4e4,0f1fa8b8,8527be14,e4e9ce3a,b28479f6,1302f720,3aaae0a8,07c540c4,d51975d7,21ddcdc9,5840adea,6a4bdd9b,,3a171ecb,340d03c3,e8b83407,96911ece
1,,-1,4,25,32258,111,2,36,80,,1,0,25,05db9164,8084ee93,d032c263,c18be181,25c83c98,fbad5c96,3fbde16c,0b153874,a73ee510,83ff688a,087dfcfd,dfbb09fb,5317f239,07d13a8f,422c8577,84898b2a,d4bb7bd8,52e44668,,,0014c32a,,3a171ecb,3b183c5c,,
0,0,0,28,7,1457,29,5,14,232,

In [10]:
# The number of rows can be found with
# wc -l train.txt
train_sample_size = 2000000
# Integer division, since the step count must be a whole number
train_steps = train_sample_size // BATCH_SIZE * 20

m.fit(input_fn=generate_input_fn(train_file, BATCH_SIZE), steps=train_steps)

print('fit done')

Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs)`. If `shuffle=False`, omit the `.shuffle(...)`.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.Dataset.from_tensors(tensor).repeat(num_epochs)`.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
Queue-based input pipelines have been replaced by `tf.data`. Use `tf.data.TextLineDataset`.
Instructions for updating:
When switc

INFO:tensorflow:loss = 0.500255, step = 1200 (8.425 sec)
INFO:tensorflow:Saving checkpoints for 1224 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 23.7949
INFO:tensorflow:Saving checkpoints for 1326 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.3801
INFO:tensorflow:loss = 0.494852, step = 1400 (7.396 sec)
INFO:tensorflow:Saving checkpoints for 1428 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 25.0794
INFO:tensorflow:Saving checkpoints for 1530 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 26.7893
INFO:tensorflow:loss = 0.495444, step = 1600 (7.787 sec)
INFO:tensorflow:Saving checkpoints for 1632 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 24.66
INFO:tensorflow:Saving checkpoints for 1734 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:glob

INFO:tensorflow:Saving checkpoints for 6120 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 26.0768
INFO:tensorflow:loss = 0.489546, step = 6200 (7.537 sec)
INFO:tensorflow:Saving checkpoints for 6222 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.1431
INFO:tensorflow:Saving checkpoints for 6324 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.8287
INFO:tensorflow:loss = 0.483443, step = 6400 (7.322 sec)
INFO:tensorflow:Saving checkpoints for 6426 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 25.9824
INFO:tensorflow:Saving checkpoints for 6528 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.351
INFO:tensorflow:loss = 0.469083, step = 6600 (7.493 sec)
INFO:tensorflow:Saving checkpoints for 6630 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:glo

INFO:tensorflow:Saving checkpoints for 11016 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.2331
INFO:tensorflow:Saving checkpoints for 11118 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 26.2125
INFO:tensorflow:loss = 0.473511, step = 11200 (7.439 sec)
INFO:tensorflow:Saving checkpoints for 11220 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.4216
INFO:tensorflow:Saving checkpoints for 11322 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.3704
INFO:tensorflow:loss = 0.47522, step = 11400 (7.293 sec)
INFO:tensorflow:Saving checkpoints for 11424 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 26.2469
INFO:tensorflow:Saving checkpoints for 11526 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.0522
INFO:tensorflow:loss = 0.47

INFO:tensorflow:global_step/sec: 25.953
INFO:tensorflow:Saving checkpoints for 15912 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.1723
INFO:tensorflow:loss = 0.506171, step = 16000 (7.508 sec)
INFO:tensorflow:Saving checkpoints for 16014 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.8632
INFO:tensorflow:Saving checkpoints for 16116 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 25.1423
INFO:tensorflow:loss = 0.486923, step = 16200 (7.669 sec)
INFO:tensorflow:Saving checkpoints for 16218 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.2721
INFO:tensorflow:Saving checkpoints for 16320 into ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt.
INFO:tensorflow:global_step/sec: 27.6035
INFO:tensorflow:loss = 0.479029, step = 16400 (7.295 sec)
INFO:tensorflow:Saving checkpoints for 16422 into ./models/model_WIDE_A

## Evaluating model accuracy
Evaluate the accuracy of the trained model:

In [11]:
eval_sample_size = 500000 # this can be found with a 'wc -l eval.csv'
eval_steps = eval_sample_size // BATCH_SIZE

results = m.evaluate(input_fn=generate_input_fn(eval_file), 
                     steps=eval_steps)
print('evaluate done')

print('Accuracy: %s' % results['accuracy'])
print(results)

INFO:tensorflow:Starting evaluation at 2019-02-28-04:46:02
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./models/model_WIDE_AND_DEEP_1551328373/model.ckpt-20002
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Evaluation [25/250]
INFO:tensorflow:Evaluation [50/250]
INFO:tensorflow:Evaluation [75/250]
INFO:tensorflow:Evaluation [100/250]
INFO:tensorflow:Evaluation [125/250]
INFO:tensorflow:Evaluation [150/250]
INFO:tensorflow:Evaluation [175/250]
INFO:tensorflow:Evaluation [200/250]
INFO:tensorflow:Evaluation [225/250]
INFO:tensorflow:Evaluation [250/250]
INFO:tensorflow:Finished evaluation at 2019-02-28-04:46:16
INFO:tensorflow:Saving dict for global step 20002: accuracy = 0.77797, accuracy/baseline_label_mean = 0.25339, accuracy/threshold_0.500000_mean = 0.77797, auc = 0.762874, auc_precision_recall = 0.540307, global_step = 20002, labels/actual_label_mean = 0.25339, labels/prediction_mean = 0.262542, 
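
When reading these metrics, it helps to compare the accuracy with a trivial baseline. Since only about 25% of the labels are positive, always predicting "not clicked" is already right most of the time. The short calculation below uses only the numbers from the log line above.

```python
label_mean = 0.25339      # labels/actual_label_mean from the evaluation log
model_accuracy = 0.77797  # accuracy from the evaluation log

# Always predicting the majority class is correct for every example of that class.
baseline_accuracy = max(label_mean, 1.0 - label_mean)
lift = model_accuracy - baseline_accuracy
print(round(baseline_accuracy, 5), round(lift, 5))  # 0.74661 0.03136
```

The model beats the majority-class baseline by roughly three percentage points, which is why a metric such as AUC is usually more informative than raw accuracy on imbalanced click data.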

Making a prediction

In [12]:
def pred_input_fn():
    def _input_fn():
        # A single sample: 13 continuous values followed by 26 categorical strings (no label)
        sample = [0, 127, 1, 3, 1683, 19, 26, 17, 475, 0, 9, 0, 3, "05db9164", "8947f767", "11c9d79e", "52a787c8", "4cf72387", "fbad5c96", "18671b18", "0b153874", "a73ee510", "ceb10289", "77212bd7", "79507c6b", "7203f04e", "07d13a8f", "2c14c412", "49013ffe", "8efede7f", "bd17c3da", "f6a3e43b", "a458ea53", "35cd95c9", "ad3062eb", "c7dc6720", "3fdb382b", "010f6491", "49d68486"]
        sample_dict = dict(zip(FEATURE_COLUMNS, sample))

        # The estimator expects tensors, not Python values, so build a batch of one
        features = {}
        # Continuous columns: int32 tensors of shape [1], the same dtype as in training
        for feature_name in CONTINUOUS_COLUMNS:
            features[feature_name] = tf.constant(
                [sample_dict[feature_name]], dtype=tf.int32)
        # Categorical columns: string tensors of shape [1, 1], matching the expand_dims used in training
        for feature_name in CATEGORICAL_COLUMNS:
            features[feature_name] = tf.constant(
                [[sample_dict[feature_name]]], dtype=tf.string)

        return features
    return _input_fn

# Pass the function through input_fn; passing it as x raises
# AttributeError: 'function' object has no attribute 'dtype'.
# as_iterable=False runs a single batch, since this input never signals end of data.
result = m.predict_classes(input_fn=pred_input_fn(), as_iterable=False)
print(result)