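# Build a linear model with Estimators

This tutorial uses the tf.estimator API in TensorFlow to solve a benchmark binary classification problem. Estimators are TensorFlow's most scalable and production-oriented model type. For details, see the [Estimators guide](https://www.tensorflow.org/guide/estimators).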

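## Overview

Using census data that contains a person's age, education, marital status, and occupation (the features), we will try to predict whether the person earns more than 50,000 dollars a year (the target label). We will train a logistic regression model that, given an individual's information, outputs a number between 0 and 1. This number can be interpreted as the probability that the individual has an annual income of over 50,000 dollars.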
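To make the "number between 0 and 1" concrete, here is a minimal sketch of how a logistic regression turns a weighted sum of features into a probability. The function names, feature values, and weights are hypothetical and chosen only for illustration; they are not taken from the census model, which learns its own weights during training.

```python
import math


def sigmoid(z):
    """Map a real-valued score (logit) to a value between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum plus a bias."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical numeric features and weights (illustration only).
p = predict_probability(features=[0.5, 1.0], weights=[2.0, -1.0], bias=0.0)
print(p)  # the weighted sum is 0.0 here, so the probability is 0.5
```

A probability above a chosen threshold (0.5 is a common default) would then be read as a prediction that the income exceeds 50,000 dollars.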

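- Key point: As a modeler and developer, think about how this data is used and about the potential benefits and harm a model's predictions can cause. A model like this could reinforce societal biases and widen disparities. Is each feature relevant to the problem you want to solve, or will it introduce bias? For more information, read about ML fairness.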

## Setup

Import TensorFlow, feature column support, and supporting modules:


In [3]:
import tensorflow as tf
import tensorflow.feature_column as fc

import os
import sys

import matplotlib.pyplot as plt
from IPython.display import clear_output


Then enable eager execution so that the program can be inspected as it runs:

In [4]:
tf.enable_eager_execution()

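## Download the official implementation

We will use the [wide and deep model](https://github.com/tensorflow/models/tree/master/official/wide_deep/) available in the TensorFlow [model repository](https://github.com/tensorflow/models/). Download the code, add the root directory to your Python path, and jump to the wide_deep directory: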

In [5]:
! pip install -q requests
! git clone --depth 1 https://github.com/tensorflow/models

fatal: destination path 'models' already exists and is not an empty directory.
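The `fatal: destination path 'models' already exists` message above appears when the clone cell is re-run after the repository has already been downloaded. One way to make the cell safe to re-run is to skip the clone when the target directory is already populated. The helper below is a hypothetical sketch, not part of the tutorial code; the name `clone_if_missing` and its `run` parameter are assumptions introduced for illustration.

```python
import os
import subprocess


def clone_if_missing(repo_url, dest, run=subprocess.check_call):
    """Shallow-clone repo_url into dest unless dest already has content.

    Returns True if a clone was started, False if it was skipped.
    `run` is injectable so the logic can be exercised without network access.
    """
    if os.path.isdir(dest) and os.listdir(dest):
        # Directory exists and is non-empty: git would refuse to clone into it.
        return False
    run(["git", "clone", "--depth", "1", repo_url, dest])
    return True


# Usage (assumes git is installed and the network is reachable):
# clone_if_missing("https://github.com/tensorflow/models", "models")
```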


In [6]:
# Add the root directory of the repository to the Python path:
models_path = os.path.join(os.getcwd(), 'models')

sys.path.append(models_path)

In [7]:
# Download the dataset:
from official.wide_deep import census_dataset
from official.wide_deep import census_main

census_dataset.download("/tmp/census_data/")

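### Command-line usage

The repository includes a complete program for experimenting with this type of model.

To run the tutorial code from the command line, first add the path to tensorflow/models to your PYTHONPATH.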

In [8]:
# Shell equivalent: export PYTHONPATH=${PYTHONPATH}:"$(pwd)/models"
# When running from Python, set os.environ instead; otherwise the
# subprocess will not see the directory.

if "PYTHONPATH" in os.environ:
    os.environ['PYTHONPATH'] += os.pathsep + models_path
else:
    os.environ['PYTHONPATH'] = models_path
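Re-running the cell above appends the same directory to `PYTHONPATH` each time. As a hedged alternative, the sketch below adds a path to a path-style variable only if it is not already listed. The helper name `append_to_path_var` and the example paths are hypothetical; it operates on any dict-like environment, so passing `os.environ` would affect subprocesses started afterwards.

```python
import os


def append_to_path_var(env, name, new_path):
    """Append new_path to the path-list variable `name` in `env`, without duplicates."""
    current = env.get(name)
    # Split the existing value on the platform separator (':' or ';').
    parts = current.split(os.pathsep) if current else []
    if new_path not in parts:
        parts.append(new_path)
    env[name] = os.pathsep.join(parts)
    return env[name]


# Demonstration on a plain dict (hypothetical path); calling twice adds it once.
demo_env = {}
append_to_path_var(demo_env, "PYTHONPATH", "/work/models")
append_to_path_var(demo_env, "PYTHONPATH", "/work/models")
print(demo_env["PYTHONPATH"])  # /work/models
```

In the notebook this would be called as `append_to_path_var(os.environ, "PYTHONPATH", models_path)`.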

Use --help to see the available command-line options:

In [9]:
!python -m official.wide_deep.census_main --help

Train DNN on census income dataset.
flags:


2018-11-20 09:14:24.534483: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Traceback (most recent call last):
  File "D:\QQPCmgr\anaconda\envs\tensorflow\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\QQPCmgr\anaconda\envs\tensorflow\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Administrator\Documents\GitHub\LearnMLfromScratch\tf\ML at production scale\models\official\wide_deep\census_main.py", line 116, in <module>
    absl_app.run(main)
  File "D:\QQPCmgr\anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 294, in run
    flags_parser,
  File "D:\QQPCmgr\anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 351, in _run_init
    flags_parser=flags_parser,
  File "D:\QQPCmgr\anaconda\envs\tensorflow\lib\site-packages\absl\app.py", line 213, in _register_and_parse_flags_with_usage
    args_to_main = f