##### Copyright 2018 The AdaNet Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# The AdaNet objective

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/adanet/blob/master/adanet/examples/tutorials/adanet_objective.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/adanet/blob/master/adanet/examples/tutorials/adanet_objective.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

One of the key contributions from *AdaNet: Adaptive Structural Learning of Neural
Networks* [[Cortes et al., ICML 2017](https://arxiv.org/abs/1607.01097)] is
defining an algorithm that aims to directly minimize the DeepBoost
generalization bound from *Deep Boosting*
[[Cortes et al., ICML 2014](http://proceedings.mlr.press/v32/cortesb14.pdf)]
when applied to neural networks. This algorithm, called **AdaNet**, adaptively
grows a neural network as an ensemble of subnetworks that minimizes the AdaNet
objective (a.k.a. AdaNet loss):

$$F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi \left(\sum_{j=1}^{N}w_jh_j(x_i), y_i \right) + \sum_{j=1}^{N} \left(\lambda r(h_j) + \beta \right) |w_j| $$

where $w$ is the set of mixture weights, one per subnetwork $h$,
$\Phi$ is a surrogate loss function such as logistic loss or MSE, $r$ is a
function for measuring a subnetwork's complexity, and $\lambda$ and $\beta$
are hyperparameters.

## Mixture weights

So what are mixture weights? When forming an ensemble $f$ of subnetworks $h$,
we need to somehow combine their predictions. This is done by multiplying
the output of each subnetwork $h_j$ by its mixture weight $w_j$, and summing the
results:

$$f(x) = \sum_{j=1}^{N}w_jh_j(x)$$

In practice, the most commonly used set of mixture weights is **uniform average
weighting**:

$$f(x) = \frac{1}{N}\sum_{j=1}^{N}h_j(x)$$
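The two formulas above can be sketched in a few lines of plain Python. This is only an illustration: the subnetwork outputs and the non-uniform weights are made-up numbers, and `ensemble_prediction` is a hypothetical helper, not part of the AdaNet API.

```python
def ensemble_prediction(subnetwork_outputs, mixture_weights):
    """Returns f(x) = sum_j w_j * h_j(x) for a single example x."""
    return sum(w * h for w, h in zip(mixture_weights, subnetwork_outputs))


# Hypothetical outputs h_j(x) of three subnetworks for one example x.
outputs = [3.0, 3.3, 2.7]

# Uniform average weighting: every w_j equals 1/N.
num_subnetworks = len(outputs)
uniform_weights = [1.0 / num_subnetworks] * num_subnetworks
uniform_prediction = ensemble_prediction(outputs, uniform_weights)

# Arbitrary non-uniform weights, chosen by hand for illustration only.
other_weights = [0.5, 0.4, 0.1]
other_prediction = ensemble_prediction(outputs, other_weights)

print(uniform_prediction, other_prediction)
```

With uniform weights the ensemble output is simply the mean of the subnetwork outputs; any other weight vector shifts the prediction toward the subnetworks with larger weights.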

However, we can also solve a convex optimization problem to learn the mixture
weights that minimize the loss function $\Phi$:

$$F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi \left(\sum_{j=1}^{N}w_jh_j(x_i), y_i \right)$$

This is the first term in the AdaNet objective. The second term applies L1
regularization to the mixture weights:

$$\sum_{j=1}^{N} \left(\lambda r(h_j) + \beta \right) |w_j|$$

When $\lambda > 0$ this penalty serves to prevent the optimization from
assigning too much weight to more complex subnetworks according to the
complexity measure function $r$.
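As a rough sketch of how the two terms add up, the snippet below evaluates $F(w)$ on toy data in plain Python, assuming squared error for $\Phi$. The function name `adanet_objective`, the toy outputs, and the complexity values are all assumptions made for illustration; the library computes this loss internally and may differ in detail.

```python
import math


def adanet_objective(weights, outputs, labels, complexities,
                     adanet_lambda, adanet_beta):
    """Evaluates F(w) with squared error standing in for Phi.

    outputs[j][i] is the prediction h_j(x_i) of subnetwork j on example i.
    """
    num_examples = len(labels)
    surrogate_loss = 0.0
    for i in range(num_examples):
        ensemble_output = sum(w * h[i] for w, h in zip(weights, outputs))
        surrogate_loss += (ensemble_output - labels[i]) ** 2
    surrogate_loss /= num_examples

    # Second term: complexity-weighted L1 penalty on the mixture weights.
    penalty = sum((adanet_lambda * r + adanet_beta) * abs(w)
                  for w, r in zip(weights, complexities))
    return surrogate_loss + penalty


# Hypothetical toy values: two subnetworks evaluated on two examples.
toy_outputs = [[1.0, 2.0], [1.0, 3.0]]
toy_labels = [1.0, 2.5]
toy_weights = [0.5, 0.5]
toy_complexities = [1.0, math.sqrt(2.0)]

unregularized = adanet_objective(
    toy_weights, toy_outputs, toy_labels, toy_complexities, 0.0, 0.0)
regularized = adanet_objective(
    toy_weights, toy_outputs, toy_labels, toy_complexities, 0.1, 0.01)
print(unregularized, regularized)
```

Here the ensemble fits the toy labels exactly, so the unregularized objective is zero and the regularized value consists only of the penalty, to which the more complex subnetwork contributes the larger share.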

## How AdaNet uses the objective

This objective function serves two purposes:

1.  To **learn to scale/transform the outputs of each subnetwork $h$** as part
    of the ensemble.
2.  To **select the best candidate subnetwork $h$** at each AdaNet iteration
    to include in the ensemble.

Effectively, when learning mixture weights $w$, AdaNet solves a convex
optimization problem over the outputs of the frozen subnetworks $h$. For
$\lambda > 0$, AdaNet penalizes more complex subnetworks with greater L1
regularization on their mixture weights, and is therefore less likely to select
more complex subnetworks to add to the ensemble at each iteration.
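The selection step can be pictured with a small sketch: each candidate ensemble is scored by its training loss plus the complexity penalty, and the candidate with the lowest score wins. The candidate dictionaries, loss values, complexities, and the helper names `adanet_loss` and `select_candidate` below are hypothetical and only meant to show how $\lambda$ can change the outcome.

```python
def adanet_loss(candidate, adanet_lambda, adanet_beta=0.0):
    """Training loss plus the complexity-weighted L1 penalty."""
    penalty = sum(
        (adanet_lambda * r + adanet_beta) * abs(w)
        for w, r in zip(candidate["weights"], candidate["complexities"]))
    return candidate["train_loss"] + penalty


def select_candidate(candidates, adanet_lambda):
    """Returns the name of the candidate with the lowest AdaNet loss."""
    best = min(candidates, key=lambda c: adanet_loss(c, adanet_lambda))
    return best["name"]


# Hypothetical candidates: the second adds a more complex subnetwork and
# reaches a slightly lower training loss.
candidates = [
    {"name": "simpler", "train_loss": 0.040,
     "weights": [0.5, 0.5], "complexities": [1.0, 1.0]},
    {"name": "more_complex", "train_loss": 0.035,
     "weights": [0.5, 0.5], "complexities": [1.0, 1.5]},
]

choice_without_penalty = select_candidate(candidates, adanet_lambda=0.0)
choice_with_penalty = select_candidate(candidates, adanet_lambda=0.1)
print(choice_without_penalty, choice_with_penalty)
```

Without the penalty the lower training loss decides; with $\lambda = 0.1$ the extra penalty on the more complex subnetwork outweighs its small loss advantage in this toy setup.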

In this tutorial, you will observe the benefits of using AdaNet to learn the
ensemble's mixture weights and to perform candidate selection.



In [2]:
#@test {"skip": true}
# If you're running this in Colab, first install the adanet package:
!pip install adanet



In [3]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import functools
import os
import shutil

import adanet
import tensorflow as tf

# The random seed to use.
RANDOM_SEED = 42

LOG_DIR = '/tmp/models'

## Boston Housing dataset

In this example, we will solve a regression task on the [Boston Housing dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html): predicting the price of suburban houses in Boston, MA in the 1970s. There are 13 numerical features, the labels are in thousands of dollars, and there are only 506 examples.


## Download the data
Conveniently, the data is available via Keras:

In [6]:
(x_train, y_train), (x_test, y_test) = (
    tf.keras.datasets.boston_housing.load_data())

# Preview the first example from the training data
print('Model inputs: %s \n' % x_train[0])
print('Model output (house price): $%s ' % (y_train[0] * 1000))


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz
Model inputs: [   1.23247    0.         8.14       0.         0.538      6.142     91.7
    3.9769     4.       307.        21.       396.9       18.72   ] 

Model output (house price): $15200.0 


## Supply the data in TensorFlow

Our first task is to supply the data in TensorFlow. Using the
tf.estimator.Estimator convention, we will define a function that returns an
input_fn which returns feature and label Tensors.

We will also use the tf.data.Dataset API to feed the data into our models.

Also, as a preprocessing step, we will apply `tf.log1p` to log-scale the
features and labels for improved numerical stability during training. To recover
the model's predictions in the correct scale, you can apply `tf.math.expm1` to the
prediction.
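The round trip between the raw and the log scale can be checked without TensorFlow. The sketch below uses `math.log1p` and `math.expm1` from the standard library as stand-ins for the TensorFlow ops, applied to the first training label printed above (15.2, in thousands of dollars).

```python
import math

# First training label from the preview above, in thousands of dollars.
price_in_thousands = 15.2

# log1p(x) = log(1 + x) is the scaling applied to features and labels.
scaled_label = math.log1p(price_in_thousands)

# expm1(y) = exp(y) - 1 inverts it, recovering the original scale.
recovered_price = math.expm1(scaled_label)

print(scaled_label, recovered_price)
```

Because the model is trained on log-scaled labels, the losses reported during training are measured in that scaled space, not in dollars.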

In [7]:
FEATURES_KEY = "x"


def input_fn(partition, training, batch_size):
  """Generate an input function for the Estimator."""

  def _input_fn():

    if partition == "train":
      dataset = tf.data.Dataset.from_tensor_slices(({
          FEATURES_KEY: tf.log1p(x_train)
      }, tf.log1p(y_train)))
    else:
      dataset = tf.data.Dataset.from_tensor_slices(({
          FEATURES_KEY: tf.log1p(x_test)
      }, tf.log1p(y_test)))

    # We call repeat after shuffling, rather than before, to prevent separate
    # epochs from blending together.
    if training:
      dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()

    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels

  return _input_fn

## Define the subnetwork generator

Let's define a subnetwork generator similar to the one in
[[Cortes et al., ICML 2017](https://arxiv.org/abs/1607.01097)] and in
`simple_dnn.py` which creates two candidate fully-connected neural networks at
each iteration with the same width, but one with an additional hidden layer. To make
our generator *adaptive*, each subnetwork will have at least the same number
of hidden layers as the most recently added subnetwork to the
`previous_ensemble`.

We define the complexity measure function $r$ to be $r(h) = \sqrt{d(h)}$, where
$d$ is the number of hidden layers in the neural network $h$, to approximate the
Rademacher bounds from
[[Golowich et al., 2017](https://arxiv.org/abs/1712.06541)]. So subnetworks
with more hidden layers, and therefore more capacity, will have more heavily
regularized mixture weights.
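To get a feel for how this measure scales, the sketch below tabulates $r(h) = \sqrt{d(h)}$ and the resulting L1 coefficient $\lambda r(h) + \beta$ for a few depths. The helper names and the value $\lambda = 0.015$ are arbitrary choices for illustration, not recommended settings.

```python
import math


def complexity(num_hidden_layers):
    """r(h) = sqrt(d(h)), where d is the number of hidden layers."""
    return math.sqrt(num_hidden_layers)


def l1_coefficient(num_hidden_layers, adanet_lambda, adanet_beta=0.0):
    """Coefficient multiplying |w_j| in the AdaNet penalty term."""
    return adanet_lambda * complexity(num_hidden_layers) + adanet_beta


# Arbitrary lambda, used only to show how the penalty grows with depth.
example_lambda = 0.015
coefficients = [l1_coefficient(d, example_lambda) for d in range(5)]
for depth, coefficient in enumerate(coefficients):
    print("depth=%d r=%.3f coefficient=%.4f"
          % (depth, complexity(depth), coefficient))
```

Note that a linear model (zero hidden layers) has $r(h) = 0$, so with $\beta = 0$ its mixture weight is not regularized at all under this measure.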

In [8]:
_NUM_LAYERS_KEY = "num_layers"


class _SimpleDNNBuilder(adanet.subnetwork.Builder):
  """Builds a DNN subnetwork for AdaNet."""

  def __init__(self, optimizer, layer_size, num_layers, learn_mixture_weights,
               seed):
    """Initializes a `_SimpleDNNBuilder`.

    Args:
      optimizer: An `Optimizer` instance for training both the subnetwork and
        the mixture weights.
      layer_size: The number of nodes to output at each hidden layer.
      num_layers: The number of hidden layers.
      learn_mixture_weights: Whether to solve a learning problem to find the
        best mixture weights, or use their default value according to the
        mixture weight type. When `False`, the subnetworks will return a no_op
        for the mixture weight train op.
      seed: A random seed.

    Returns:
      An instance of `_SimpleDNNBuilder`.
    """

    self._optimizer = optimizer
    self._layer_size = layer_size
    self._num_layers = num_layers
    self._learn_mixture_weights = learn_mixture_weights
    self._seed = seed

  def build_subnetwork(self,
                       features,
                       logits_dimension,
                       training,
                       iteration_step,
                       summary,
                       previous_ensemble=None):
    """See `adanet.subnetwork.Builder`."""

    input_layer = tf.to_float(features[FEATURES_KEY])
    kernel_initializer = tf.glorot_uniform_initializer(seed=self._seed)
    last_layer = input_layer
    for _ in range(self._num_layers):
      last_layer = tf.layers.dense(
          last_layer,
          units=self._layer_size,
          activation=tf.nn.relu,
          kernel_initializer=kernel_initializer)
    logits = tf.layers.dense(
        last_layer,
        units=logits_dimension,
        kernel_initializer=kernel_initializer)

    persisted_tensors = {_NUM_LAYERS_KEY: tf.constant(self._num_layers)}
    return adanet.Subnetwork(
        last_layer=last_layer,
        logits=logits,
        complexity=self._measure_complexity(),
        persisted_tensors=persisted_tensors)

  def _measure_complexity(self):
    """Approximates Rademacher complexity as the square-root of the depth."""
    return tf.sqrt(tf.to_float(self._num_layers))

  def build_subnetwork_train_op(self, subnetwork, loss, var_list, labels,
                                iteration_step, summary, previous_ensemble):
    """See `adanet.subnetwork.Builder`."""
    return self._optimizer.minimize(loss=loss, var_list=var_list)

  def build_mixture_weights_train_op(self, loss, var_list, logits, labels,
                                     iteration_step, summary):
    """See `adanet.subnetwork.Builder`."""

    if not self._learn_mixture_weights:
      return tf.no_op()
    return self._optimizer.minimize(loss=loss, var_list=var_list)

  @property
  def name(self):
    """See `adanet.subnetwork.Builder`."""

    if self._num_layers == 0:
      # A DNN with no hidden layers is a linear model.
      return "linear"
    return "{}_layer_dnn".format(self._num_layers)


class SimpleDNNGenerator(adanet.subnetwork.Generator):
  """Generates two DNN subnetworks at each iteration.

  The first DNN has an identical shape to the most recently added subnetwork
  in `previous_ensemble`. The second has the same shape plus one more dense
  layer on top. This is similar to the adaptive network presented in Figure 2 of
  [Cortes et al. ICML 2017](https://arxiv.org/abs/1607.01097), without the
  connections to hidden layers of networks from previous iterations.
  """

  def __init__(self,
               optimizer,
               layer_size=64,
               learn_mixture_weights=False,
               seed=None):
    """Initializes a DNN `Generator`.

    Args:
      optimizer: An `Optimizer` instance for training both the subnetwork and
        the mixture weights.
      layer_size: Number of nodes in each hidden layer of the subnetwork
        candidates. Note that this parameter is ignored in a DNN with no hidden
        layers.
      learn_mixture_weights: Whether to solve a learning problem to find the
        best mixture weights, or use their default value according to the
        mixture weight type. When `False`, the subnetworks will return a no_op
        for the mixture weight train op.
      seed: A random seed.

    Returns:
      An instance of `Generator`.
    """

    self._seed = seed
    self._dnn_builder_fn = functools.partial(
        _SimpleDNNBuilder,
        optimizer=optimizer,
        layer_size=layer_size,
        learn_mixture_weights=learn_mixture_weights)

  def generate_candidates(self, previous_ensemble, iteration_number,
                          previous_ensemble_reports, all_reports):
    """See `adanet.subnetwork.Generator`."""

    num_layers = 0
    seed = self._seed
    if previous_ensemble:
      num_layers = tf.contrib.util.constant_value(
          previous_ensemble.weighted_subnetworks[
              -1].subnetwork.persisted_tensors[_NUM_LAYERS_KEY])
    if seed is not None:
      seed += iteration_number
    return [
        self._dnn_builder_fn(num_layers=num_layers, seed=seed),
        self._dnn_builder_fn(num_layers=num_layers + 1, seed=seed),
    ]
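The depth logic in `generate_candidates` can be traced without TensorFlow. The following sketch mirrors it in plain Python; `candidate_depths` and `candidate_name` are hypothetical helpers that reproduce the arithmetic and naming of the classes above, with `None` standing in for a missing `previous_ensemble`.

```python
def candidate_depths(last_added_depth):
    """Depths of the two candidates proposed at one iteration.

    last_added_depth is the number of hidden layers of the most recently
    added subnetwork, or None when there is no previous ensemble yet.
    """
    num_layers = 0 if last_added_depth is None else last_added_depth
    return [num_layers, num_layers + 1]


def candidate_name(num_layers):
    """Mirrors the `name` property of the builder above."""
    if num_layers == 0:
        return "linear"
    return "{}_layer_dnn".format(num_layers)


# Iteration 0: no previous ensemble, so a linear model competes with a
# one-hidden-layer DNN.
first_names = [candidate_name(d) for d in candidate_depths(None)]

# If the one-hidden-layer DNN is selected, the next iteration proposes
# networks with one and two hidden layers.
second_names = [candidate_name(d) for d in candidate_depths(1)]

print(first_names, second_names)
```

The search can therefore only keep the depth or grow it by one layer per iteration, depending on which candidate was added last.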

## Launch TensorBoard

Let's run [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) to visualize model training over time. We'll use [ngrok](https://ngrok.com/) to tunnel traffic to localhost.

*The instructions for setting up TensorBoard were obtained from https://www.dlology.com/blog/quick-guide-to-run-tensorboard-in-google-colab/*

Run the next cells and follow the link to see the TensorBoard in a new tab.

In [9]:
#@test {"skip": true}

get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

# Install ngrok binary.
! wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
! unzip ngrok-stable-linux-amd64.zip

# Delete old logs dir.
shutil.rmtree(LOG_DIR, ignore_errors=True)

print("Follow this link to open TensorBoard in a new tab.")
get_ipython().system_raw('./ngrok http 6006 &')
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"



--2019-01-05 19:09:07--  https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
Resolving bin.equinox.io (bin.equinox.io)... 52.22.145.207, 52.22.34.127, 52.3.53.115, ...
Connecting to bin.equinox.io (bin.equinox.io)|52.22.145.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5363700 (5.1M) [application/octet-stream]
Saving to: ‘ngrok-stable-linux-amd64.zip’


2019-01-05 19:10:12 (85.0 KB/s) - ‘ngrok-stable-linux-amd64.zip’ saved [5363700/5363700]

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   
Follow this link to open TensorBoard in a new tab.
/bin/sh: 1: curl: not found
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/d

## Train and evaluate

Next we create an `adanet.Estimator` using the `SimpleDNNGenerator` we just defined.

In this section we will show the effects of two hyperparameters: **learning mixture weights** and **complexity regularization**.

On the right-hand side you will be able to play with the hyperparameters of this model. Until you reach the end of this section, we ask that you not change them.

At first we will not learn the mixture weights, using their default initial value. Here they will be scalars initialized to $1/N$ where $N$ is the number of subnetworks in the ensemble, effectively creating a **uniform average ensemble**.

In [10]:
#@title AdaNet parameters
LEARNING_RATE = 0.001  #@param {type:"number"}
TRAIN_STEPS = 60000  #@param {type:"integer"}
BATCH_SIZE = 32  #@param {type:"integer"}

LEARN_MIXTURE_WEIGHTS = False  #@param {type:"boolean"}
ADANET_LAMBDA = 0  #@param {type:"number"}
ADANET_ITERATIONS = 3  #@param {type:"integer"}


def train_and_evaluate(experiment_name, learn_mixture_weights=LEARN_MIXTURE_WEIGHTS,
                       adanet_lambda=ADANET_LAMBDA):
  """Trains an `adanet.Estimator` to predict housing prices."""

  model_dir = os.path.join(LOG_DIR, experiment_name)

  estimator = adanet.Estimator(
      # Since we are predicting housing prices, we'll use a regression
      # head that optimizes for MSE.
      head=tf.contrib.estimator.regression_head(
          loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE),

      # Define the generator, which defines our search space of subnetworks
      # to train as candidates to add to the final AdaNet model.
      subnetwork_generator=SimpleDNNGenerator(
          optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
          learn_mixture_weights=learn_mixture_weights,
          seed=RANDOM_SEED),

      # Lambda is the strength of complexity regularization. A larger
      # value will penalize more complex subnetworks.
      adanet_lambda=adanet_lambda,

      # The number of train steps per iteration.
      max_iteration_steps=TRAIN_STEPS // ADANET_ITERATIONS,

      # The evaluator will evaluate the model on the full training set to
      # compute the overall AdaNet loss (train loss + complexity
      # regularization) to select the best candidate to include in the
      # final AdaNet model.
      evaluator=adanet.Evaluator(
          input_fn=input_fn("train", training=False, batch_size=BATCH_SIZE)),

      # Configuration for Estimators.
      config=tf.estimator.RunConfig(
          save_summary_steps=5000,
          save_checkpoints_steps=5000,
          tf_random_seed=RANDOM_SEED,
          model_dir=model_dir))

  # Train and evaluate using the tf.estimator tooling.
  train_spec = tf.estimator.TrainSpec(
      input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
      max_steps=TRAIN_STEPS)
  eval_spec = tf.estimator.EvalSpec(
      input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
      steps=None,
      start_delay_secs=1,
      throttle_secs=1,
  )
  return tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)


def ensemble_architecture(result):
  """Extracts the ensemble architecture from evaluation results."""

  architecture = result["architecture/adanet/ensembles"]
  # The architecture is a serialized Summary proto for TensorBoard.
  summary_proto = tf.summary.Summary.FromString(architecture)
  return summary_proto.value[0].tensor.string_val[0]
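As a quick sanity check on the step budget, the arithmetic behind `max_iteration_steps` can be worked out by hand. The sketch below uses lowercase copies of the default parameter values from the cell above so that it does not overwrite the notebook's globals.

```python
# Default values from the parameter cell above.
train_steps = 60000
adanet_iterations = 3

# Each AdaNet iteration receives an equal share of the training steps.
max_iteration_steps = train_steps // adanet_iterations

# Global steps at which one iteration ends and the next can begin.
iteration_boundaries = [
    max_iteration_steps * (i + 1) for i in range(adanet_iterations)
]
print(max_iteration_steps, iteration_boundaries)
```

With these defaults, each iteration trains its candidates for 20000 steps, so new candidates are generated at global steps 20000 and 40000 and training stops at step 60000.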

In [11]:
results, _ = train_and_evaluate("uniform_average_ensemble_baseline")
print("Loss:", results["average_loss"])
print("Architecture:", ensemble_architecture(results))

INFO:tensorflow:Using config: {'_model_dir': '/tmp/models/uniform_average_ensemble_baseline', '_tf_random_seed': 42, '_save_summary_steps': 5000, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff8720039b0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start tra

INFO:tensorflow:loss = 0.0619735, step = 3600 (0.161 sec)
INFO:tensorflow:global_step/sec: 461.044
INFO:tensorflow:loss = 0.02754, step = 3700 (0.217 sec)
INFO:tensorflow:global_step/sec: 513.318
INFO:tensorflow:loss = 0.0260662, step = 3800 (0.195 sec)
INFO:tensorflow:global_step/sec: 668.856
INFO:tensorflow:loss = 0.0227778, step = 3900 (0.150 sec)
INFO:tensorflow:global_step/sec: 670.475
INFO:tensorflow:loss = 0.0608849, step = 4000 (0.149 sec)
INFO:tensorflow:global_step/sec: 669.397
INFO:tensorflow:loss = 0.0275511, step = 4100 (0.149 sec)
INFO:tensorflow:global_step/sec: 495.74
INFO:tensorflow:loss = 0.0118034, step = 4200 (0.202 sec)
INFO:tensorflow:global_step/sec: 431.196
INFO:tensorflow:loss = 0.0336256, step = 4300 (0.232 sec)
INFO:tensorflow:global_step/sec: 686.568
INFO:tensorflow:loss = 0.0324305, step = 4400 (0.146 sec)
INFO:tensorflow:global_step/sec: 638.29
INFO:tensorflow:loss = 0.0284791, step = 4500 (0.157 sec)
INFO:tensorflow:global_step/sec: 432.017
INFO:tensorflo

INFO:tensorflow:loss = 0.0216826, step = 8300 (0.188 sec)
INFO:tensorflow:global_step/sec: 615.514
INFO:tensorflow:loss = 0.0301984, step = 8400 (0.163 sec)
INFO:tensorflow:global_step/sec: 445.262
INFO:tensorflow:loss = 0.0391429, step = 8500 (0.225 sec)
INFO:tensorflow:global_step/sec: 635.402
INFO:tensorflow:loss = 0.0157975, step = 8600 (0.157 sec)
INFO:tensorflow:global_step/sec: 479.081
INFO:tensorflow:loss = 0.0214923, step = 8700 (0.209 sec)
INFO:tensorflow:global_step/sec: 594.741
INFO:tensorflow:loss = 0.0233191, step = 8800 (0.168 sec)
INFO:tensorflow:global_step/sec: 659.825
INFO:tensorflow:loss = 0.0418531, step = 8900 (0.152 sec)
INFO:tensorflow:global_step/sec: 668.222
INFO:tensorflow:loss = 0.0204977, step = 9000 (0.149 sec)
INFO:tensorflow:global_step/sec: 673.363
INFO:tensorflow:loss = 0.033745, step = 9100 (0.149 sec)
INFO:tensorflow:global_step/sec: 671.266
INFO:tensorflow:loss = 0.0415148, step = 9200 (0.149 sec)
INFO:tensorflow:global_step/sec: 665.41
INFO:tensorf

INFO:tensorflow:loss = 0.0408655, step = 13300 (0.161 sec)
INFO:tensorflow:global_step/sec: 622.899
INFO:tensorflow:loss = 0.00818089, step = 13400 (0.161 sec)
INFO:tensorflow:global_step/sec: 660.431
INFO:tensorflow:loss = 0.0175235, step = 13500 (0.151 sec)
INFO:tensorflow:global_step/sec: 660.669
INFO:tensorflow:loss = 0.013067, step = 13600 (0.151 sec)
INFO:tensorflow:global_step/sec: 668.759
INFO:tensorflow:loss = 0.0261159, step = 13700 (0.150 sec)
INFO:tensorflow:global_step/sec: 684.289
INFO:tensorflow:loss = 0.0563093, step = 13800 (0.146 sec)
INFO:tensorflow:global_step/sec: 506.25
INFO:tensorflow:loss = 0.0266691, step = 13900 (0.198 sec)
INFO:tensorflow:global_step/sec: 627.723
INFO:tensorflow:loss = 0.0186643, step = 14000 (0.159 sec)
INFO:tensorflow:global_step/sec: 624.116
INFO:tensorflow:loss = 0.0139172, step = 14100 (0.160 sec)
INFO:tensorflow:global_step/sec: 667.783
INFO:tensorflow:loss = 0.0105233, step = 14200 (0.150 sec)
INFO:tensorflow:global_step/sec: 651.673
I

INFO:tensorflow:global_step/sec: 426.908
INFO:tensorflow:loss = 0.0226313, step = 18300 (0.234 sec)
INFO:tensorflow:global_step/sec: 552.612
INFO:tensorflow:loss = 0.0143345, step = 18400 (0.181 sec)
INFO:tensorflow:global_step/sec: 633.529
INFO:tensorflow:loss = 0.0182505, step = 18500 (0.158 sec)
INFO:tensorflow:global_step/sec: 444.69
INFO:tensorflow:loss = 0.0156011, step = 18600 (0.225 sec)
INFO:tensorflow:global_step/sec: 621.205
INFO:tensorflow:loss = 0.0246773, step = 18700 (0.161 sec)
INFO:tensorflow:global_step/sec: 682.087
INFO:tensorflow:loss = 0.0106525, step = 18800 (0.147 sec)
INFO:tensorflow:global_step/sec: 451.408
INFO:tensorflow:loss = 0.0188893, step = 18900 (0.222 sec)
INFO:tensorflow:global_step/sec: 605.254
INFO:tensorflow:loss = 0.0131465, step = 19000 (0.165 sec)
INFO:tensorflow:global_step/sec: 662.863
INFO:tensorflow:loss = 0.0262485, step = 19100 (0.151 sec)
INFO:tensorflow:global_step/sec: 648.378
INFO:tensorflow:loss = 0.0144431, step = 19200 (0.154 sec)
I

INFO:tensorflow:Building subnetwork '1_layer_dnn'
INFO:tensorflow:Building subnetwork '2_layer_dnn'
INFO:tensorflow:Overwriting checkpoint with new graph for iteration 1 to /tmp/models/uniform_average_ensemble_baseline/model.ckpt-20000
Instructions for updating:
Use standard file utilities to get mtimes.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
INFO:tensorflow:Finished bookkeeping phase for iteration 0
INFO:tensorflow:Beginning training AdaNet iteration 1
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Importing architecture from /tmp/models/uniform_average_ensemble_baseline/architecture-0.txt: ['0:1_layer_dnn'].
INFO:tensorflow:Rebuilding iteration 0
INFO:tensorflow:Building subnetwork '1_layer_dnn'
INFO:tensorflow:Building iteration 1
INFO:tensorflow:Building subnetwork '1_layer_dnn'
INFO:tensorflow:Building subnetwork '2_layer_dnn'
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was

INFO:tensorflow:Saving candidate 't1_1_layer_dnn' dict for global step 25000: architecture/adanet/ensembles = b'\no\n>adanet/iteration_1/ensemble_t1_1_layer_dnn/architecture/adanetB#\x08\x07\x12\x00B\x1d| 1_layer_dnn | 1_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0409389, average_loss/adanet/subnetwork = 0.0511917, average_loss/adanet/uniform_average_ensemble = 0.0409389, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weighted_ensemble = 0.054904, loss/adanet/subnetwork = 0.0732483, loss/adanet/uniform_average_ensemble = 0.054904, prediction/mean/adanet/adanet_weighted_ensemble = 3.12688, prediction/mean/adanet/subnetwork = 3.1233, prediction/mean/adanet/uniform_average_ensemble = 3.12688
INFO:tensorflow:Saving candidate 't1_2_layer_dnn' dict for global step 25000: architecture/adanet/ensembles = b'\no\n>adanet/iteration_1/ense

INFO:tensorflow:Starting evaluation at 2019-01-05T11:12:01Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/models/uniform_average_ensemble_baseline/model.ckpt-30000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving candidate 't0_1_layer_dnn' dict for global step 30000: architecture/adanet/ensembles = b'\na\n>adanet/iteration_0/ensemble_t0_1_layer_dnn/architecture/adanetB\x15\x08\x07\x12\x00B\x0f| 1_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0344671, average_loss/adanet/subnetwork = 0.0344671, average_loss/adanet/uniform_average_ensemble = 0.0344671, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weighted_ensemble = 0.0422994, loss/adanet/subnetwork = 0.0422994, loss/adanet/uniform_average_ensemble = 0.0422994, prediction/mean/adanet/ada

INFO:tensorflow:global_step/sec: 408.807
INFO:tensorflow:loss = 0.0133163, step = 34400 (0.245 sec)
INFO:tensorflow:global_step/sec: 537.449
INFO:tensorflow:loss = 0.00476117, step = 34500 (0.186 sec)
INFO:tensorflow:global_step/sec: 465.49
INFO:tensorflow:loss = 0.0101089, step = 34600 (0.215 sec)
INFO:tensorflow:global_step/sec: 398.881
INFO:tensorflow:loss = 0.00926355, step = 34700 (0.251 sec)
INFO:tensorflow:global_step/sec: 536.418
INFO:tensorflow:loss = 0.00818098, step = 34800 (0.186 sec)
INFO:tensorflow:global_step/sec: 413.946
INFO:tensorflow:loss = 0.011169, step = 34900 (0.242 sec)
INFO:tensorflow:Saving checkpoints for 35000 into /tmp/models/uniform_average_ensemble_baseline/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Importing architecture from /tmp/models/uniform_average_ensemble_baseline/architecture-0.txt: ['0:1_layer_dnn'].
INFO:tensorflow:Rebuilding iteration 0
INFO:tensorflow:Building subnetwork '1_layer_dnn'
INFO:tensorflow:Building iteration 1
IN

INFO:tensorflow:loss = 0.00677477, step = 38200 (0.226 sec)
INFO:tensorflow:global_step/sec: 438.205
INFO:tensorflow:loss = 0.0155496, step = 38300 (0.228 sec)
INFO:tensorflow:global_step/sec: 527.254
INFO:tensorflow:loss = 0.0112253, step = 38400 (0.190 sec)
INFO:tensorflow:global_step/sec: 473.08
INFO:tensorflow:loss = 0.0156263, step = 38500 (0.212 sec)
INFO:tensorflow:global_step/sec: 513.797
INFO:tensorflow:loss = 0.00959016, step = 38600 (0.195 sec)
INFO:tensorflow:global_step/sec: 517.656
INFO:tensorflow:loss = 0.0143097, step = 38700 (0.193 sec)
INFO:tensorflow:global_step/sec: 490.5
INFO:tensorflow:loss = 0.00566421, step = 38800 (0.204 sec)
INFO:tensorflow:global_step/sec: 513.165
INFO:tensorflow:loss = 0.00861143, step = 38900 (0.195 sec)
INFO:tensorflow:global_step/sec: 536.468
INFO:tensorflow:loss = 0.00832968, step = 39000 (0.186 sec)
INFO:tensorflow:global_step/sec: 515.391
INFO:tensorflow:loss = 0.013808, step = 39100 (0.194 sec)
INFO:tensorflow:global_step/sec: 536.6
I

INFO:tensorflow:Warm-starting variable: adanet/iteration_0/ensemble_t0_1_layer_dnn/weighted_subnetwork_0/subnetwork/dense_1/kernel; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/ensemble_t0_1_layer_dnn/weighted_subnetwork_0/subnetwork/dense_1/bias; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/ensemble_t0_1_layer_dnn/weighted_subnetwork_0/logits/mixture_weight; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/ensemble_t0_1_layer_dnn/bias; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/candidate_t0_1_layer_dnn/adanet_loss; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/candidate_t0_1_layer_dnn/adanet/iteration_0/candidate_t0_1_layer_dnn/adanet_loss/biased; prev_var_name: Unchanged
INFO:tensorflow:Warm-starting variable: adanet/iteration_0/candidate_t0_1_layer_dnn/adanet/iteration_0/candidate_t0_1_layer_dn

INFO:tensorflow:global_step/sec: 473.773
INFO:tensorflow:loss = 0.0112171, step = 43100 (0.211 sec)
INFO:tensorflow:global_step/sec: 490.396
INFO:tensorflow:loss = 0.00700136, step = 43200 (0.204 sec)
INFO:tensorflow:global_step/sec: 489.867
INFO:tensorflow:loss = 0.00731963, step = 43300 (0.204 sec)
INFO:tensorflow:global_step/sec: 490.62
INFO:tensorflow:loss = 0.0078993, step = 43400 (0.204 sec)
INFO:tensorflow:global_step/sec: 484.772
INFO:tensorflow:loss = 0.00561361, step = 43500 (0.206 sec)
INFO:tensorflow:global_step/sec: 487.828
INFO:tensorflow:loss = 0.0118251, step = 43600 (0.205 sec)
INFO:tensorflow:global_step/sec: 488.25
INFO:tensorflow:loss = 0.0134412, step = 43700 (0.205 sec)
INFO:tensorflow:global_step/sec: 485.857
INFO:tensorflow:loss = 0.00950726, step = 43800 (0.206 sec)
INFO:tensorflow:global_step/sec: 487.95
INFO:tensorflow:loss = 0.00763447, step = 43900 (0.205 sec)
INFO:tensorflow:global_step/sec: 477.256
INFO:tensorflow:loss = 0.00590193, step = 44000 (0.210 se

INFO:tensorflow:global_step/sec: 462.916
INFO:tensorflow:loss = 0.00460268, step = 46800 (0.216 sec)
INFO:tensorflow:global_step/sec: 484.504
INFO:tensorflow:loss = 0.00562541, step = 46900 (0.206 sec)
INFO:tensorflow:global_step/sec: 482.217
INFO:tensorflow:loss = 0.00740233, step = 47000 (0.207 sec)
INFO:tensorflow:global_step/sec: 485.587
INFO:tensorflow:loss = 0.00787901, step = 47100 (0.206 sec)
INFO:tensorflow:global_step/sec: 484.433
INFO:tensorflow:loss = 0.00550168, step = 47200 (0.206 sec)
INFO:tensorflow:global_step/sec: 456.617
INFO:tensorflow:loss = 0.00488928, step = 47300 (0.219 sec)
INFO:tensorflow:global_step/sec: 490.705
INFO:tensorflow:loss = 0.00651755, step = 47400 (0.204 sec)
INFO:tensorflow:global_step/sec: 487.281
INFO:tensorflow:loss = 0.00876663, step = 47500 (0.205 sec)
INFO:tensorflow:global_step/sec: 482.142
INFO:tensorflow:loss = 0.00614077, step = 47600 (0.207 sec)
INFO:tensorflow:global_step/sec: 481.485
INFO:tensorflow:loss = 0.00703282, step = 47700 (0

INFO:tensorflow:global_step/sec: 473.063
INFO:tensorflow:loss = 0.009572, step = 50500 (0.211 sec)
INFO:tensorflow:global_step/sec: 481.333
INFO:tensorflow:loss = 0.0119475, step = 50600 (0.208 sec)
INFO:tensorflow:global_step/sec: 482.998
INFO:tensorflow:loss = 0.00751737, step = 50700 (0.207 sec)
INFO:tensorflow:global_step/sec: 484.955
INFO:tensorflow:loss = 0.00859794, step = 50800 (0.206 sec)
INFO:tensorflow:global_step/sec: 472.238
INFO:tensorflow:loss = 0.00595855, step = 50900 (0.212 sec)
INFO:tensorflow:global_step/sec: 467.06
INFO:tensorflow:loss = 0.00900679, step = 51000 (0.214 sec)
INFO:tensorflow:global_step/sec: 478.734
INFO:tensorflow:loss = 0.00474099, step = 51100 (0.209 sec)
INFO:tensorflow:global_step/sec: 483.247
INFO:tensorflow:loss = 0.00513857, step = 51200 (0.207 sec)
INFO:tensorflow:global_step/sec: 473.94
INFO:tensorflow:loss = 0.00417655, step = 51300 (0.211 sec)
INFO:tensorflow:global_step/sec: 490.251
INFO:tensorflow:loss = 0.00754387, step = 51400 (0.204 

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 55000: /tmp/models/uniform_average_ensemble_baseline/model.ckpt-55000
INFO:tensorflow:global_step/sec: 28.6728
INFO:tensorflow:loss = 0.00510286, step = 55000 (3.488 sec)
INFO:tensorflow:global_step/sec: 420.85
INFO:tensorflow:loss = 0.00512285, step = 55100 (0.237 sec)
INFO:tensorflow:global_step/sec: 440.207
INFO:tensorflow:loss = 0.0045343, step = 55200 (0.228 sec)
INFO:tensorflow:global_step/sec: 405.494
INFO:tensorflow:loss = 0.00591409, step = 55300 (0.245 sec)
INFO:tensorflow:global_step/sec: 397.091
INFO:tensorflow:loss = 0.00509956, step = 55400 (0.252 sec)
INFO:tensorflow:global_step/sec: 397.104
INFO:tensorflow:loss = 0.007003, step = 55500 (0.252 sec)
INFO:tensorflow:global_step/sec: 378.823
INFO:tensorflow:loss = 0.00399513, step = 55600 (0.264 sec)
INFO:tensorflow:global_step/sec: 425.58
INFO:tensorflow:loss = 0.00425756, step = 55700 (0.235 sec)
INFO:tensorflow:global_step/sec: 474.504
INFO:tensorflow:loss 

INFO:tensorflow:Finished evaluation at 2019-01-05-11:13:31
INFO:tensorflow:Saving dict for global step 60000: average_loss = 0.0368088, average_loss/adanet/adanet_weighted_ensemble = 0.0368088, average_loss/adanet/subnetwork = 0.0474099, average_loss/adanet/uniform_average_ensemble = 0.0368088, global_step = 60000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0477215, loss/adanet/adanet_weighted_ensemble = 0.0477215, loss/adanet/subnetwork = 0.0628075, loss/adanet/uniform_average_ensemble = 0.0477215, prediction/mean = 3.11894, prediction/mean/adanet/adanet_weighted_ensemble = 3.11894, prediction/mean/adanet/subnetwork = 3.12847, prediction/mean/adanet/uniform_average_ensemble = 3.11894
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 60000: /tmp/models/uniform_average_ensemble_baseline/model.ckpt-60000
INFO:tensorflow:Loss for final step

These hyperparameters produce a model that achieves **0.0393** MSE on the test
set (the exact MSE will vary depending on the hardware you use to train the
model). Notice that the ensemble is composed of 3 subnetworks, each one hidden
layer deeper than the previous. The most complex subnetwork has 3 hidden
layers.

`SimpleDNNGenerator` produces subnetworks of varying complexity, and our model
gives each one an equal weight. At each iteration, AdaNet therefore selected
the subnetwork that most lowered the ensemble's training loss. This is likely
the one with the most hidden layers, since it has the most capacity and we
aren't penalizing more complex subnetworks (yet).
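
The selection step described above can be illustrated with a small NumPy sketch. It is not the `adanet` implementation: the function names, the candidate names, and the prediction values are all hypothetical, chosen only to show how a uniform average ensemble is scored and how the candidate with the lowest ensemble loss wins.

```python
import numpy as np


def uniform_ensemble_mse(member_preds, labels):
    """MSE of an ensemble that gives every member prediction equal weight."""
    ensemble_pred = np.mean(member_preds, axis=0)
    return float(np.mean((ensemble_pred - labels) ** 2))


def select_candidate(current_members, candidates, labels):
    """Returns the candidate whose addition yields the lowest ensemble MSE."""
    losses = {
        name: uniform_ensemble_mse(current_members + [pred], labels)
        for name, pred in candidates.items()
    }
    return min(losses, key=losses.get), losses


# Hypothetical labels and predictions, for illustration only.
labels = np.array([1.0, 2.0, 3.0, 4.0])
current_members = [labels + np.array([0.3, -0.3, 0.3, -0.3])]
candidates = {
    "same_depth": labels + np.array([0.2, 0.2, -0.2, -0.2]),
    "one_deeper": labels + np.array([-0.1, 0.1, -0.1, 0.1]),
}

chosen, losses = select_candidate(current_members, candidates, labels)
print("Selected:", chosen)
print("Ensemble MSE per candidate:", losses)
```

In this toy setup nothing penalizes the deeper candidate, so the choice depends only on how much each candidate reduces the ensemble's error.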

Next, instead of assigning equal weight to each subnetwork, let's learn the
mixture weights as a convex optimization problem using SGD:

In [12]:
#@test {"skip": true}
results, _ = train_and_evaluate("learn_mixture_weights", learn_mixture_weights=True)
print("Loss:", results["average_loss"])
print("Uniform average loss:", results["average_loss/adanet/uniform_average_ensemble"])
print("Architecture:", ensemble_architecture(results))

INFO:tensorflow:Using config: {'_model_dir': '/tmp/models/learn_mixture_weights', '_tf_random_seed': 42, '_save_summary_steps': 5000, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff5df1c3da0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evalu

INFO:tensorflow:Finished evaluation at 2019-01-05-11:15:17
INFO:tensorflow:Saving dict for global step 5000: average_loss = 0.0483934, average_loss/adanet/adanet_weighted_ensemble = 0.0483934, average_loss/adanet/subnetwork = 0.0483964, average_loss/adanet/uniform_average_ensemble = 0.0483964, global_step = 5000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0644798, loss/adanet/adanet_weighted_ensemble = 0.0644798, loss/adanet/subnetwork = 0.0645082, loss/adanet/uniform_average_ensemble = 0.0645082, prediction/mean = 3.11186, prediction/mean/adanet/adanet_weighted_ensemble = 3.11186, prediction/mean/adanet/subnetwork = 3.11214, prediction/mean/adanet/uniform_average_ensemble = 3.11214
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /tmp/models/learn_mixture_weights/model.ckpt-5000
INFO:tensorflow:global_step/sec: 40.8562
INFO:tenso

INFO:tensorflow:Saving dict for global step 10000: average_loss = 0.050968, average_loss/adanet/adanet_weighted_ensemble = 0.050968, average_loss/adanet/subnetwork = 0.0506364, average_loss/adanet/uniform_average_ensemble = 0.0506364, global_step = 10000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0560723, loss/adanet/adanet_weighted_ensemble = 0.0560723, loss/adanet/subnetwork = 0.0558819, loss/adanet/uniform_average_ensemble = 0.0558819, prediction/mean = 3.01671, prediction/mean/adanet/adanet_weighted_ensemble = 3.01671, prediction/mean/adanet/subnetwork = 3.01854, prediction/mean/adanet/uniform_average_ensemble = 3.01854
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/models/learn_mixture_weights/model.ckpt-10000
INFO:tensorflow:global_step/sec: 61.125
INFO:tensorflow:loss = 0.0386869, step = 10000 (1.636 sec)
INFO:tens

INFO:tensorflow:Saving dict for global step 15000: average_loss = 0.0447519, average_loss/adanet/adanet_weighted_ensemble = 0.0447519, average_loss/adanet/subnetwork = 0.0444429, average_loss/adanet/uniform_average_ensemble = 0.0444429, global_step = 15000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0455886, loss/adanet/adanet_weighted_ensemble = 0.0455886, loss/adanet/subnetwork = 0.0453769, loss/adanet/uniform_average_ensemble = 0.0453769, prediction/mean = 3.02335, prediction/mean/adanet/adanet_weighted_ensemble = 3.02335, prediction/mean/adanet/subnetwork = 3.02517, prediction/mean/adanet/uniform_average_ensemble = 3.02517
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 15000: /tmp/models/learn_mixture_weights/model.ckpt-15000
INFO:tensorflow:global_step/sec: 55.7935
INFO:tensorflow:loss = 0.0256926, step = 15000 (1.794 sec)
INFO:t

INFO:tensorflow:Saving dict for global step 20000: average_loss = 0.0345221, average_loss/adanet/adanet_weighted_ensemble = 0.0345221, average_loss/adanet/subnetwork = 0.0344671, average_loss/adanet/uniform_average_ensemble = 0.0344671, global_step = 20000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.042434, loss/adanet/adanet_weighted_ensemble = 0.042434, loss/adanet/subnetwork = 0.0422994, loss/adanet/uniform_average_ensemble = 0.0422994, prediction/mean = 3.13169, prediction/mean/adanet/adanet_weighted_ensemble = 3.13169, prediction/mean/adanet/subnetwork = 3.13045, prediction/mean/adanet/uniform_average_ensemble = 3.13045
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20000: /tmp/models/learn_mixture_weights/model.ckpt-20000
INFO:tensorflow:Loss for final step: 0.0215809.
INFO:tensorflow:Finished training Adanet iteration 0
INFO:te

INFO:tensorflow:global_step/sec: 480.744
INFO:tensorflow:loss = 0.0069699, step = 23200 (0.208 sec)
INFO:tensorflow:global_step/sec: 487.82
INFO:tensorflow:loss = 0.0108084, step = 23300 (0.205 sec)
INFO:tensorflow:global_step/sec: 457.674
INFO:tensorflow:loss = 0.00903264, step = 23400 (0.218 sec)
INFO:tensorflow:global_step/sec: 486.017
INFO:tensorflow:loss = 0.00747694, step = 23500 (0.206 sec)
INFO:tensorflow:global_step/sec: 475.393
INFO:tensorflow:loss = 0.0160814, step = 23600 (0.210 sec)
INFO:tensorflow:global_step/sec: 483.734
INFO:tensorflow:loss = 0.0172106, step = 23700 (0.207 sec)
INFO:tensorflow:global_step/sec: 480.465
INFO:tensorflow:loss = 0.0135797, step = 23800 (0.208 sec)
INFO:tensorflow:global_step/sec: 466.508
INFO:tensorflow:loss = 0.0128129, step = 23900 (0.214 sec)
INFO:tensorflow:global_step/sec: 489.067
INFO:tensorflow:loss = 0.0190506, step = 24000 (0.204 sec)
INFO:tensorflow:global_step/sec: 479.893
INFO:tensorflow:loss = 0.00907893, step = 24100 (0.208 sec

INFO:tensorflow:global_step/sec: 474.745
INFO:tensorflow:loss = 0.0118227, step = 27100 (0.211 sec)
INFO:tensorflow:global_step/sec: 478.448
INFO:tensorflow:loss = 0.00810991, step = 27200 (0.209 sec)
INFO:tensorflow:global_step/sec: 492.967
INFO:tensorflow:loss = 0.00860655, step = 27300 (0.203 sec)
INFO:tensorflow:global_step/sec: 448.968
INFO:tensorflow:loss = 0.00784027, step = 27400 (0.223 sec)
INFO:tensorflow:global_step/sec: 485.142
INFO:tensorflow:loss = 0.0127357, step = 27500 (0.206 sec)
INFO:tensorflow:global_step/sec: 488.527
INFO:tensorflow:loss = 0.0117622, step = 27600 (0.205 sec)
INFO:tensorflow:global_step/sec: 475.437
INFO:tensorflow:loss = 0.0100407, step = 27700 (0.210 sec)
INFO:tensorflow:global_step/sec: 490.359
INFO:tensorflow:loss = 0.011281, step = 27800 (0.204 sec)
INFO:tensorflow:global_step/sec: 463.883
INFO:tensorflow:loss = 0.0122046, step = 27900 (0.216 sec)
INFO:tensorflow:global_step/sec: 483.219
INFO:tensorflow:loss = 0.0082148, step = 28000 (0.207 sec

INFO:tensorflow:global_step/sec: 465.557
INFO:tensorflow:loss = 0.013513, step = 31000 (0.215 sec)
INFO:tensorflow:global_step/sec: 474.044
INFO:tensorflow:loss = 0.00479635, step = 31100 (0.211 sec)
INFO:tensorflow:global_step/sec: 484.341
INFO:tensorflow:loss = 0.0118226, step = 31200 (0.206 sec)
INFO:tensorflow:global_step/sec: 484.079
INFO:tensorflow:loss = 0.00794757, step = 31300 (0.207 sec)
INFO:tensorflow:global_step/sec: 483.568
INFO:tensorflow:loss = 0.0150284, step = 31400 (0.207 sec)
INFO:tensorflow:global_step/sec: 488.702
INFO:tensorflow:loss = 0.0121822, step = 31500 (0.205 sec)
INFO:tensorflow:global_step/sec: 488.454
INFO:tensorflow:loss = 0.00950074, step = 31600 (0.205 sec)
INFO:tensorflow:global_step/sec: 486.772
INFO:tensorflow:loss = 0.0100899, step = 31700 (0.205 sec)
INFO:tensorflow:global_step/sec: 484.461
INFO:tensorflow:loss = 0.0134653, step = 31800 (0.206 sec)
INFO:tensorflow:global_step/sec: 491.517
INFO:tensorflow:loss = 0.015575, step = 31900 (0.203 sec)

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 35000: /tmp/models/learn_mixture_weights/model.ckpt-35000
INFO:tensorflow:global_step/sec: 40.0505
INFO:tensorflow:loss = 0.00972248, step = 35000 (2.498 sec)
INFO:tensorflow:global_step/sec: 477.479
INFO:tensorflow:loss = 0.0106463, step = 35100 (0.208 sec)
INFO:tensorflow:global_step/sec: 440.148
INFO:tensorflow:loss = 0.00598201, step = 35200 (0.227 sec)
INFO:tensorflow:global_step/sec: 485.058
INFO:tensorflow:loss = 0.00663923, step = 35300 (0.206 sec)
INFO:tensorflow:global_step/sec: 484.029
INFO:tensorflow:loss = 0.0104822, step = 35400 (0.207 sec)
INFO:tensorflow:global_step/sec: 478.719
INFO:tensorflow:loss = 0.00684492, step = 35500 (0.209 sec)
INFO:tensorflow:global_step/sec: 487.012
INFO:tensorflow:loss = 0.0103031, step = 35600 (0.205 sec)
INFO:tensorflow:global_step/sec: 475.834
INFO:tensorflow:loss = 0.0104701, step = 35700 (0.210 sec)
INFO:tensorflow:global_step/sec: 489.688
INFO:tensorflow:loss = 0.0129404

INFO:tensorflow:Finished evaluation at 2019-01-05-11:16:45
INFO:tensorflow:Saving dict for global step 40000: average_loss = 0.0341454, average_loss/adanet/adanet_weighted_ensemble = 0.0341454, average_loss/adanet/subnetwork = 0.036496, average_loss/adanet/uniform_average_ensemble = 0.0338111, global_step = 40000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0413429, loss/adanet/adanet_weighted_ensemble = 0.0413429, loss/adanet/subnetwork = 0.0473107, loss/adanet/uniform_average_ensemble = 0.0426334, prediction/mean = 3.08747, prediction/mean/adanet/adanet_weighted_ensemble = 3.08747, prediction/mean/adanet/subnetwork = 3.0979, prediction/mean/adanet/uniform_average_ensemble = 3.11418
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 40000: /tmp/models/learn_mixture_weights/model.ckpt-40000
INFO:tensorflow:Loss for final step: 0.00585763.


INFO:tensorflow:loss = 0.00406611, step = 40100 (0.552 sec)
INFO:tensorflow:global_step/sec: 428.599
INFO:tensorflow:loss = 0.00932836, step = 40200 (0.233 sec)
INFO:tensorflow:global_step/sec: 428.598
INFO:tensorflow:loss = 0.00852714, step = 40300 (0.233 sec)
INFO:tensorflow:global_step/sec: 437.631
INFO:tensorflow:loss = 0.00584516, step = 40400 (0.229 sec)
INFO:tensorflow:global_step/sec: 416.058
INFO:tensorflow:loss = 0.00899395, step = 40500 (0.240 sec)
INFO:tensorflow:global_step/sec: 436.089
INFO:tensorflow:loss = 0.00561061, step = 40600 (0.229 sec)
INFO:tensorflow:global_step/sec: 399.387
INFO:tensorflow:loss = 0.00902176, step = 40700 (0.251 sec)
INFO:tensorflow:global_step/sec: 426.553
INFO:tensorflow:loss = 0.0121505, step = 40800 (0.234 sec)
INFO:tensorflow:global_step/sec: 406.815
INFO:tensorflow:loss = 0.00987363, step = 40900 (0.246 sec)
INFO:tensorflow:global_step/sec: 429.234
INFO:tensorflow:loss = 0.00656554, step = 41000 (0.233 sec)
INFO:tensorflow:global_step/sec:

INFO:tensorflow:Finished evaluation at 2019-01-05-11:17:09
INFO:tensorflow:Saving dict for global step 45000: average_loss = 0.0347311, average_loss/adanet/adanet_weighted_ensemble = 0.0347311, average_loss/adanet/subnetwork = 0.0525496, average_loss/adanet/uniform_average_ensemble = 0.0370273, global_step = 45000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0434979, loss/adanet/adanet_weighted_ensemble = 0.0434979, loss/adanet/subnetwork = 0.0701633, loss/adanet/uniform_average_ensemble = 0.0485679, prediction/mean = 3.09727, prediction/mean/adanet/adanet_weighted_ensemble = 3.09727, prediction/mean/adanet/subnetwork = 3.20452, prediction/mean/adanet/uniform_average_ensemble = 3.14429
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 45000: /tmp/models/learn_mixture_weights/model.ckpt-45000
INFO:tensorflow:global_step/sec: 26.7507
INFO:t

INFO:tensorflow:Saving candidate 't2_3_layer_dnn' dict for global step 50000: architecture/adanet/ensembles = b'\n}\n>adanet/iteration_2/ensemble_t2_3_layer_dnn/architecture/adanetB1\x08\x07\x12\x00B+| 1_layer_dnn | 2_layer_dnn | 3_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0367825, average_loss/adanet/subnetwork = 0.0531309, average_loss/adanet/uniform_average_ensemble = 0.0368988, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weighted_ensemble = 0.0443333, loss/adanet/subnetwork = 0.0555319, loss/adanet/uniform_average_ensemble = 0.0437967, prediction/mean/adanet/adanet_weighted_ensemble = 3.06822, prediction/mean/adanet/subnetwork = 3.01709, prediction/mean/adanet/uniform_average_ensemble = 3.08182
INFO:tensorflow:Finished evaluation at 2019-01-05-11:17:24
INFO:tensorflow:Saving dict for global step 50000: average_loss = 0.

INFO:tensorflow:Saving candidate 't2_2_layer_dnn' dict for global step 55000: architecture/adanet/ensembles = b'\n}\n>adanet/iteration_2/ensemble_t2_2_layer_dnn/architecture/adanetB1\x08\x07\x12\x00B+| 1_layer_dnn | 2_layer_dnn | 2_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0353376, average_loss/adanet/subnetwork = 0.0424374, average_loss/adanet/uniform_average_ensemble = 0.0335247, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weighted_ensemble = 0.0438527, loss/adanet/subnetwork = 0.039453, loss/adanet/uniform_average_ensemble = 0.0378332, prediction/mean/adanet/adanet_weighted_ensemble = 3.08637, prediction/mean/adanet/subnetwork = 3.0255, prediction/mean/adanet/uniform_average_ensemble = 3.08462
INFO:tensorflow:Saving candidate 't2_3_layer_dnn' dict for global step 55000: architecture/adanet/ensembles = b'\n}\n>adanet/iter

INFO:tensorflow:Building subnetwork '2_layer_dnn'
INFO:tensorflow:Building subnetwork '3_layer_dnn'
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-05T11:17:53Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/models/learn_mixture_weights/model.ckpt-60000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving candidate 't1_2_layer_dnn' dict for global step 60000: architecture/adanet/ensembles = b'\no\n>adanet/iteration_1/ensemble_t1_2_layer_dnn/architecture/adanetB#\x08\x07\x12\x00B\x1d| 1_layer_dnn | 2_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0341454, average_loss/adanet/subnetwork = 0.036496, average_loss/adanet/uniform_average_ensemble = 0.0338111, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weight

Learning the mixture weights produces a model with **0.0391** MSE, a bit better
than the uniform average model, which the `adanet.Estimator` always computes as
a baseline. The mixture weights were learned without regularization, so they
risk overfitting the training set.
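
To make "learning the mixture weights" more concrete, here is a minimal NumPy sketch of the idea, not the code that `adanet.Estimator` runs. It assumes the subnetworks are frozen, so their predictions are fixed inputs, and it minimizes the unregularized MSE of the weighted sum with minibatch SGD. With the subnetworks fixed, MSE is convex in the weights. The function names, the synthetic data, and the step size are assumptions made for illustration.

```python
import numpy as np


def learn_mixture_weights(subnetwork_preds, labels, steps=5000, lr=0.02,
                          batch_size=32, seed=0):
    """Minibatch SGD on the MSE of a weighted sum of frozen predictions.

    subnetwork_preds has shape (num_examples, num_subnetworks).
    """
    rng = np.random.default_rng(seed)
    num_examples, num_subnetworks = subnetwork_preds.shape
    # Start from the uniform average ensemble.
    w = np.full(num_subnetworks, 1.0 / num_subnetworks)
    for _ in range(steps):
        idx = rng.integers(0, num_examples, size=batch_size)
        h, y = subnetwork_preds[idx], labels[idx]
        # Gradient of mean((h @ w - y)^2) with respect to w.
        grad = 2.0 * h.T @ (h @ w - y) / batch_size
        w -= lr * grad
    return w


def mse(w, subnetwork_preds, labels):
    return float(np.mean((subnetwork_preds @ w - labels) ** 2))


# Synthetic stand-ins for three subnetworks of differing quality.
data_rng = np.random.default_rng(42)
labels = data_rng.normal(size=500)
preds = np.stack(
    [labels + data_rng.normal(scale=s, size=500) for s in (0.2, 0.5, 1.0)],
    axis=1)

uniform_weights = np.full(3, 1.0 / 3)
learned_weights = learn_mixture_weights(preds, labels)
print("Uniform average MSE:", mse(uniform_weights, preds, labels))
print("Learned weights MSE:", mse(learned_weights, preds, labels))
print("Learned weights:", learned_weights)
```

Because the weights here are fit and scored on the same synthetic data, the improvement over the uniform average is a training-set improvement, which is exactly where the overfitting risk mentioned above comes from.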

Observe that AdaNet learned the same ensemble composition as the previous run.
Without complexity regularization, AdaNet will favor more complex subnetworks,
which may have worse generalization despite improving the empirical error.
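
The effect of the penalty term in the AdaNet objective can be sketched with a few lines of NumPy. The helper below evaluates $F(w)$ from a given empirical loss, mixture weights, and complexities $r(h_j)$. Everything else is hypothetical: the two candidate ensembles, their losses and weights, and the use of the square root of the depth as the complexity measure are assumptions for illustration, not values from the runs in this notebook.

```python
import numpy as np


def adanet_objective(empirical_loss, weights, complexities,
                     adanet_lambda=0.0, adanet_beta=0.0):
    """Empirical loss plus sum_j (lambda * r(h_j) + beta) * |w_j|."""
    weights = np.asarray(weights, dtype=float)
    complexities = np.asarray(complexities, dtype=float)
    penalty = np.sum(
        (adanet_lambda * complexities + adanet_beta) * np.abs(weights))
    return float(empirical_loss + penalty)


# Hypothetical candidate ensembles: a 1-layer subnetwork plus either a
# 2-layer or a 3-layer subnetwork. Complexity is assumed to be sqrt(depth).
candidates = {
    "2_layer_dnn": dict(empirical_loss=0.0350, weights=[0.5, 0.5],
                        complexities=[1.0, np.sqrt(2.0)]),
    "3_layer_dnn": dict(empirical_loss=0.0340, weights=[0.5, 0.5],
                        complexities=[1.0, np.sqrt(3.0)]),
}


def best_candidate(adanet_lambda):
    """Name of the candidate with the lowest objective for this lambda."""
    return min(
        candidates,
        key=lambda name: adanet_objective(
            adanet_lambda=adanet_lambda, **candidates[name]))


print("lambda = 0:    ", best_candidate(0.0))
print("lambda = 0.015:", best_candidate(0.015))
```

With $\lambda = 0$ the deeper candidate wins on empirical loss alone. With a positive $\lambda$, its small gain in empirical loss no longer covers its larger complexity penalty, so the shallower candidate has the lower objective.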

Finally, let's apply some **complexity regularization** by using $\lambda > 0$.
Since this will penalize more complex subnetworks, AdaNet will select the
candidate subnetwork that most improves the objective for its marginal
complexity:

In [13]:
#@test {"skip": true}
results, _ = train_and_evaluate("learn_mixture_weights_with_complexity_regularization", learn_mixture_weights=True, adanet_lambda=.015)
print("Loss:", results["average_loss"])
print("Uniform average loss:", results["average_loss/adanet/uniform_average_ensemble"])
print("Architecture:", ensemble_architecture(results))

INFO:tensorflow:Using config: {'_model_dir': '/tmp/models/learn_mixture_weights_with_complexity_regularization', '_tf_random_seed': 42, '_save_summary_steps': 5000, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7ff5d2a60208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:t

INFO:tensorflow:Finished evaluation at 2019-01-05-11:18:39
INFO:tensorflow:Saving dict for global step 5000: average_loss = 0.0492677, average_loss/adanet/adanet_weighted_ensemble = 0.0492677, average_loss/adanet/subnetwork = 0.0491596, average_loss/adanet/uniform_average_ensemble = 0.0491596, global_step = 5000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0618374, loss/adanet/adanet_weighted_ensemble = 0.0618374, loss/adanet/subnetwork = 0.061841, loss/adanet/uniform_average_ensemble = 0.061841, prediction/mean = 3.08488, prediction/mean/adanet/adanet_weighted_ensemble = 3.08488, prediction/mean/adanet/subnetwork = 3.08741, prediction/mean/adanet/uniform_average_ensemble = 3.08741
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-5000
INFO:tensorflow:global

INFO:tensorflow:Finished evaluation at 2019-01-05-11:18:51
INFO:tensorflow:Saving dict for global step 10000: average_loss = 0.0499138, average_loss/adanet/adanet_weighted_ensemble = 0.0499138, average_loss/adanet/subnetwork = 0.0499111, average_loss/adanet/uniform_average_ensemble = 0.0499111, global_step = 10000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0640858, loss/adanet/adanet_weighted_ensemble = 0.0640858, loss/adanet/subnetwork = 0.064067, loss/adanet/uniform_average_ensemble = 0.064067, prediction/mean = 3.11206, prediction/mean/adanet/adanet_weighted_ensemble = 3.11206, prediction/mean/adanet/subnetwork = 3.11176, prediction/mean/adanet/uniform_average_ensemble = 3.11176
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-10000
INFO:tensorflow:gl

INFO:tensorflow:Finished evaluation at 2019-01-05-11:19:01
INFO:tensorflow:Saving dict for global step 15000: average_loss = 0.0499413, average_loss/adanet/adanet_weighted_ensemble = 0.0499413, average_loss/adanet/subnetwork = 0.0499403, average_loss/adanet/uniform_average_ensemble = 0.0499403, global_step = 15000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0634246, loss/adanet/adanet_weighted_ensemble = 0.0634246, loss/adanet/subnetwork = 0.0634311, loss/adanet/uniform_average_ensemble = 0.0634311, prediction/mean = 3.10368, prediction/mean/adanet/adanet_weighted_ensemble = 3.10368, prediction/mean/adanet/subnetwork = 3.10382, prediction/mean/adanet/uniform_average_ensemble = 3.10382
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 15000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-15000
INFO:tensorflow:

INFO:tensorflow:Finished evaluation at 2019-01-05-11:19:11
INFO:tensorflow:Saving dict for global step 20000: average_loss = 0.0344167, average_loss/adanet/adanet_weighted_ensemble = 0.0344167, average_loss/adanet/subnetwork = 0.0344671, average_loss/adanet/uniform_average_ensemble = 0.0344671, global_step = 20000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0421717, loss/adanet/adanet_weighted_ensemble = 0.0421717, loss/adanet/subnetwork = 0.0422994, loss/adanet/uniform_average_ensemble = 0.0422994, prediction/mean = 3.12925, prediction/mean/adanet/adanet_weighted_ensemble = 3.12925, prediction/mean/adanet/subnetwork = 3.13045, prediction/mean/adanet/uniform_average_ensemble = 3.13045
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-20000
INFO:tensorflow:

INFO:tensorflow:global_step/sec: 445.635
INFO:tensorflow:loss = 0.024911, step = 22900 (0.224 sec)
INFO:tensorflow:global_step/sec: 467.464
INFO:tensorflow:loss = 0.0180074, step = 23000 (0.214 sec)
INFO:tensorflow:global_step/sec: 460.894
INFO:tensorflow:loss = 0.0181891, step = 23100 (0.217 sec)
INFO:tensorflow:global_step/sec: 451.71
INFO:tensorflow:loss = 0.00844219, step = 23200 (0.221 sec)
INFO:tensorflow:global_step/sec: 457.496
INFO:tensorflow:loss = 0.0128498, step = 23300 (0.219 sec)
INFO:tensorflow:global_step/sec: 461.731
INFO:tensorflow:loss = 0.0107765, step = 23400 (0.216 sec)
INFO:tensorflow:global_step/sec: 452.35
INFO:tensorflow:loss = 0.0101247, step = 23500 (0.221 sec)
INFO:tensorflow:global_step/sec: 463.107
INFO:tensorflow:loss = 0.0207482, step = 23600 (0.216 sec)
INFO:tensorflow:global_step/sec: 418.749
INFO:tensorflow:loss = 0.021596, step = 23700 (0.239 sec)
INFO:tensorflow:global_step/sec: 456.785
INFO:tensorflow:loss = 0.0158563, step = 23800 (0.219 sec)
INF

INFO:tensorflow:global_step/sec: 440.036
INFO:tensorflow:loss = 0.0153991, step = 26700 (0.227 sec)
INFO:tensorflow:global_step/sec: 440.275
INFO:tensorflow:loss = 0.00664225, step = 26800 (0.227 sec)
INFO:tensorflow:global_step/sec: 275.942
INFO:tensorflow:loss = 0.00828389, step = 26900 (0.362 sec)
INFO:tensorflow:global_step/sec: 342.503
INFO:tensorflow:loss = 0.0107016, step = 27000 (0.292 sec)
INFO:tensorflow:global_step/sec: 434.043
INFO:tensorflow:loss = 0.0115486, step = 27100 (0.230 sec)
INFO:tensorflow:global_step/sec: 437.056
INFO:tensorflow:loss = 0.00834976, step = 27200 (0.229 sec)
INFO:tensorflow:global_step/sec: 452.481
INFO:tensorflow:loss = 0.00904393, step = 27300 (0.221 sec)
INFO:tensorflow:global_step/sec: 446.031
INFO:tensorflow:loss = 0.00836775, step = 27400 (0.224 sec)
INFO:tensorflow:global_step/sec: 450.021
INFO:tensorflow:loss = 0.0132565, step = 27500 (0.222 sec)
INFO:tensorflow:global_step/sec: 429.141
INFO:tensorflow:loss = 0.0126525, step = 27600 (0.233 

INFO:tensorflow:global_step/sec: 341.241
INFO:tensorflow:loss = 0.0139281, step = 30500 (0.293 sec)
INFO:tensorflow:global_step/sec: 457.336
INFO:tensorflow:loss = 0.0160698, step = 30600 (0.219 sec)
INFO:tensorflow:global_step/sec: 375.904
INFO:tensorflow:loss = 0.0150846, step = 30700 (0.266 sec)
INFO:tensorflow:global_step/sec: 391.257
INFO:tensorflow:loss = 0.00950522, step = 30800 (0.255 sec)
INFO:tensorflow:global_step/sec: 353.125
INFO:tensorflow:loss = 0.0114827, step = 30900 (0.283 sec)
INFO:tensorflow:global_step/sec: 414.94
INFO:tensorflow:loss = 0.0147881, step = 31000 (0.241 sec)
INFO:tensorflow:global_step/sec: 334.619
INFO:tensorflow:loss = 0.00551273, step = 31100 (0.299 sec)
INFO:tensorflow:global_step/sec: 408.607
INFO:tensorflow:loss = 0.0130365, step = 31200 (0.245 sec)
INFO:tensorflow:global_step/sec: 333.03
INFO:tensorflow:loss = 0.00880915, step = 31300 (0.300 sec)
INFO:tensorflow:global_step/sec: 437.03
INFO:tensorflow:loss = 0.0133297, step = 31400 (0.229 sec)


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 35000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-35000
INFO:tensorflow:global_step/sec: 38.0375
INFO:tensorflow:loss = 0.0107538, step = 35000 (2.629 sec)
INFO:tensorflow:global_step/sec: 445.681
INFO:tensorflow:loss = 0.00995159, step = 35100 (0.224 sec)
INFO:tensorflow:global_step/sec: 454.721
INFO:tensorflow:loss = 0.0071594, step = 35200 (0.220 sec)
INFO:tensorflow:global_step/sec: 454.453
INFO:tensorflow:loss = 0.00723665, step = 35300 (0.220 sec)
INFO:tensorflow:global_step/sec: 458.734
INFO:tensorflow:loss = 0.0110164, step = 35400 (0.218 sec)
INFO:tensorflow:global_step/sec: 448.426
INFO:tensorflow:loss = 0.00891222, step = 35500 (0.223 sec)
INFO:tensorflow:global_step/sec: 466.739
INFO:tensorflow:loss = 0.00961117, step = 35600 (0.214 sec)
INFO:tensorflow:global_step/sec: 430.056
INFO:tensorflow:loss = 0.0120052, step = 35700 (0.233 sec)
INFO:tensorflow:global_step/sec: 459.489
I

INFO:tensorflow:Finished evaluation at 2019-01-05-11:20:13
INFO:tensorflow:Saving dict for global step 40000: average_loss = 0.0343947, average_loss/adanet/adanet_weighted_ensemble = 0.0343947, average_loss/adanet/subnetwork = 0.036496, average_loss/adanet/uniform_average_ensemble = 0.0338111, global_step = 40000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0402376, loss/adanet/adanet_weighted_ensemble = 0.0402376, loss/adanet/subnetwork = 0.0473107, loss/adanet/uniform_average_ensemble = 0.0426334, prediction/mean = 3.07975, prediction/mean/adanet/adanet_weighted_ensemble = 3.07975, prediction/mean/adanet/subnetwork = 3.0979, prediction/mean/adanet/uniform_average_ensemble = 3.11418
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 40000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-40000
INFO:tensorflow:Lo

INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 40000 into /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt.
INFO:tensorflow:loss = 0.00639489, step = 40000
INFO:tensorflow:global_step/sec: 162.367
INFO:tensorflow:loss = 0.00515783, step = 40100 (0.616 sec)
INFO:tensorflow:global_step/sec: 389.275
INFO:tensorflow:loss = 0.0111799, step = 40200 (0.257 sec)
INFO:tensorflow:global_step/sec: 398.554
INFO:tensorflow:loss = 0.00910988, step = 40300 (0.251 sec)
INFO:tensorflow:global_step/sec: 367.948
INFO:tensorflow:loss = 0.00745093, step = 40400 (0.272 sec)
INFO:tensorflow:global_step/sec: 393.808
INFO:tensorflow:loss = 0.0111087, step = 40500 (0.254 sec)
INFO:tensorflow:global_step/sec: 383.839
INFO:tensorflow:loss = 0.00850278, step = 40600 (0.261 sec)
INFO:tensorflow:global_step/sec: 403.941
INFO:tensorflow:loss = 0.0108078, step = 40700 (0.248 sec)
INFO:tensorflow:global_step/sec: 402.124
INFO:tensorflow:loss = 0.0144247, st

INFO:tensorflow:Finished evaluation at 2019-01-05-11:20:38
INFO:tensorflow:Saving dict for global step 45000: average_loss = 0.0349834, average_loss/adanet/adanet_weighted_ensemble = 0.0349834, average_loss/adanet/subnetwork = 0.045771, average_loss/adanet/uniform_average_ensemble = 0.0354388, global_step = 45000, label/mean = 3.10495, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss = 0.0418847, loss/adanet/adanet_weighted_ensemble = 0.0418847, loss/adanet/subnetwork = 0.0596194, loss/adanet/uniform_average_ensemble = 0.0456684, prediction/mean = 3.07493, prediction/mean/adanet/adanet_weighted_ensemble = 3.07493, prediction/mean/adanet/subnetwork = 3.1881, prediction/mean/adanet/uniform_average_ensemble = 3.13882
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 45000: /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-45000
INFO:tensorflow:gl

INFO:tensorflow:Saving candidate 't2_3_layer_dnn' dict for global step 50000: architecture/adanet/ensembles = b'\n}\n>adanet/iteration_2/ensemble_t2_3_layer_dnn/architecture/adanetB1\x08\x07\x12\x00B+| 1_layer_dnn | 2_layer_dnn | 3_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.035349, average_loss/adanet/subnetwork = 0.0531309, average_loss/adanet/uniform_average_ensemble = 0.0368988, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weighted_ensemble = 0.0421257, loss/adanet/subnetwork = 0.0555319, loss/adanet/uniform_average_ensemble = 0.0437967, prediction/mean/adanet/adanet_weighted_ensemble = 3.07052, prediction/mean/adanet/subnetwork = 3.01709, prediction/mean/adanet/uniform_average_ensemble = 3.08182
INFO:tensorflow:Finished evaluation at 2019-01-05-11:20:56
INFO:tensorflow:Saving dict for global step 50000: average_loss = 0.0

INFO:tensorflow:Saving candidate 't2_2_layer_dnn' dict for global step 55000: architecture/adanet/ensembles = b'\n}\n>adanet/iteration_2/ensemble_t2_2_layer_dnn/architecture/adanetB1\x08\x07\x12\x00B+| 1_layer_dnn | 2_layer_dnn | 2_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0345905, average_loss/adanet/subnetwork = 0.0424374, average_loss/adanet/uniform_average_ensemble = 0.0335247, label/mean/adanet/adanet_weighted_ensemble = 3.10495, label/mean/adanet/subnetwork = 3.10495, label/mean/adanet/uniform_average_ensemble = 3.10495, loss/adanet/adanet_weighted_ensemble = 0.0420342, loss/adanet/subnetwork = 0.039453, loss/adanet/uniform_average_ensemble = 0.0378332, prediction/mean/adanet/adanet_weighted_ensemble = 3.08276, prediction/mean/adanet/subnetwork = 3.0255, prediction/mean/adanet/uniform_average_ensemble = 3.08462
INFO:tensorflow:Saving candidate 't2_3_layer_dnn' dict for global step 55000: architecture/adanet/ensembles = b'\n}\n>adanet/iter

INFO:tensorflow:Building subnetwork '2_layer_dnn'
INFO:tensorflow:Building iteration 2
INFO:tensorflow:Building subnetwork '2_layer_dnn'
INFO:tensorflow:Building subnetwork '3_layer_dnn'
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-01-05T11:21:27Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/models/learn_mixture_weights_with_complexity_regularization/model.ckpt-60000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving candidate 't1_2_layer_dnn' dict for global step 60000: architecture/adanet/ensembles = b'\no\n>adanet/iteration_1/ensemble_t1_2_layer_dnn/architecture/adanetB#\x08\x07\x12\x00B\x1d| 1_layer_dnn | 2_layer_dnn |J\x08\n\x06\n\x04text', average_loss/adanet/adanet_weighted_ensemble = 0.0343947, average_loss/adanet/subnetwork = 0.036496, average_loss/adanet/uniform_average_ensemble = 0.0338111, label/mean/adanet/adanet_weighted_ensemble = 3.10495, l

Learning the mixture weights with $\lambda > 0$ produces a model with **0.0354**
MSE.

Inspecting the ensemble architecture demonstrates the effects of complexity
regularization on candidate selection. The selected subnetworks are relatively
less complex: unlike in previous runs, the deepest subnetwork has only 2 hidden layers.
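
To see why a positive $\lambda$ can tip candidate selection toward shallower subnetworks, the sketch below evaluates the AdaNet objective for two hypothetical candidate ensembles. The loss values and mixture weights are made up for illustration and are not taken from the logs above. The complexity measure $r(h) = \sqrt{\text{depth}}$ is likewise an assumption of this sketch.

```python
import math


def adanet_objective(surrogate_loss, mixture_weights, complexities, lam, beta=0.0):
    """F(w) = surrogate loss + sum_j (lam * r(h_j) + beta) * |w_j|."""
    penalty = sum((lam * r + beta) * abs(w)
                  for w, r in zip(mixture_weights, complexities))
    return surrogate_loss + penalty


def pick_candidate(candidates, lam, beta=0.0):
    """Return the name of the candidate ensemble with the lowest objective."""
    scores = {
        name: adanet_objective(c["loss"], c["weights"], c["complexities"], lam, beta)
        for name, c in candidates.items()
    }
    return min(scores, key=scores.get), scores


def r(depth):
    # Assumed complexity measure for this sketch: square root of the depth.
    return math.sqrt(depth)


# Hypothetical candidates at one iteration. Both extend the same previous
# ensemble (a 1-layer and a 2-layer subnetwork) with one new subnetwork.
# Losses and weights are illustrative numbers only.
candidates = {
    "t2_2_layer_dnn": {
        "loss": 0.0350,
        "weights": [0.3, 0.3, 0.4],
        "complexities": [r(1), r(2), r(2)],
    },
    "t2_3_layer_dnn": {
        "loss": 0.0345,
        "weights": [0.3, 0.3, 0.4],
        "complexities": [r(1), r(2), r(3)],
    },
}

best_unregularized, _ = pick_candidate(candidates, lam=0.0)
best_regularized, _ = pick_candidate(candidates, lam=0.015)
print("lambda = 0     ->", best_unregularized)
print("lambda = 0.015 ->", best_regularized)
```

With these made-up numbers, the deeper candidate has the slightly lower surrogate loss and wins when $\lambda = 0$. Its larger complexity penalty outweighs that advantage once $\lambda$ is positive, so the shallower candidate is selected instead.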

## Conclusion

In this tutorial, you explored training an AdaNet model's mixture weights with
$\lambda \ge 0$. You also compared the result against an ensemble formed by
always choosing the best candidate subnetwork at each iteration, based on its
ability to improve the ensemble's loss on the training set, and averaging the
subnetworks' results.

Uniform average ensembles work remarkably well in practice, yet learning the
mixture weights with the correct values of $\lambda$ and $\beta$ should always
produce a better model when candidates have varying complexity. However, this
requires additional hyperparameter tuning. In practice, you can first train an
AdaNet with the default mixture weights and $\lambda=0$. Once you have confirmed
that the subnetworks are training correctly, you can tune the mixture weight
hyperparameters.
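
The difference between a uniform average and learned mixture weights can be sketched on toy data. The example below is a minimal pure-Python illustration, not the AdaNet library's implementation: it fits the mixture weights of two fixed subnetworks by proximal gradient descent on the MSE plus the $(\lambda r(h_j) + \beta)|w_j|$ penalty. The predictions, labels, complexities, learning rate, and function names are all assumptions made for this sketch.

```python
import math


def ensemble_mse(weights, predictions, labels):
    """Mean squared error of the weighted sum of subnetwork predictions."""
    total = 0.0
    for i, y in enumerate(labels):
        y_hat = sum(w * p[i] for w, p in zip(weights, predictions))
        total += (y_hat - y) ** 2
    return total / len(labels)


def regularized_objective(weights, predictions, labels, complexities, lam, beta):
    """MSE plus the complexity-weighted L1 penalty on the mixture weights."""
    penalty = sum((lam * r + beta) * abs(w) for w, r in zip(weights, complexities))
    return ensemble_mse(weights, predictions, labels) + penalty


def learn_mixture_weights(predictions, labels, complexities, lam, beta,
                          learning_rate=0.02, steps=2000):
    """Proximal gradient descent, starting from the uniform average."""
    n, m = len(predictions), len(labels)
    w = [1.0 / n] * n
    for _ in range(steps):
        residuals = [sum(w[j] * predictions[j][i] for j in range(n)) - labels[i]
                     for i in range(m)]
        new_w = []
        for j in range(n):
            grad = 2.0 / m * sum(residuals[i] * predictions[j][i] for i in range(m))
            step = w[j] - learning_rate * grad
            # Soft-thresholding handles the non-smooth |w_j| term.
            threshold = learning_rate * (lam * complexities[j] + beta)
            new_w.append(math.copysign(max(abs(step) - threshold, 0.0), step))
        w = new_w
    return w


# Toy data: the first subnetwork is accurate, the second is noisier and is
# assigned a higher (assumed) complexity.
labels = [1.0, 2.0, 3.0, 4.0]
predictions = [
    [1.1, 1.9, 3.1, 3.9],
    [1.5, 1.2, 3.8, 3.1],
]
complexities = [math.sqrt(1), math.sqrt(3)]

uniform = [0.5, 0.5]
learned = learn_mixture_weights(predictions, labels, complexities, lam=0.0, beta=0.0)
learned_reg = learn_mixture_weights(predictions, labels, complexities, lam=0.01, beta=0.0)

print("uniform MSE:", ensemble_mse(uniform, predictions, labels))
print("learned MSE:", ensemble_mse(learned, predictions, labels))
print("weights with lambda = 0.01:", learned_reg)
```

Because the optimization starts from the uniform weights and uses a small enough step size, the learned weights cannot do worse than the uniform average on the training objective here. Whether that carries over to held-out data depends on choosing $\lambda$ and $\beta$ well, which is the tuning step described above.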

While this example explored a regression task, these observations also apply to
using AdaNet on other tasks, such as binary classification and multi-class
classification.