# Starter notebook: Fast or Slow with TensorFlow GNN

This tutorial walks competitors in [predict-ai-model-runtime](https://kaggle.com/competitions/predict-ai-model-runtime) through the dataset and through building models with [TensorFlow-GNN](https://github.com/tensorflow/gnn).

In summary, you will:

- `pip install` libraries
- import helper functions from another project, for reading data (`{layout, tile}_data`) and for easier programming of GNN models (`implicit`).
- read batches of graphs from the dataset, print them on screen, and explain them.
- go through the details of writing a GNN model and training it.
- produce an inference `csv` file on the test set.


In [1]:
!pip install tensorflow_gnn --pre
!pip install tensorflow_ranking

Collecting tensorflow_gnn
  Downloading tensorflow_gnn-0.6.0-py3-none-any.whl (803 kB)
Collecting google-vizier>=0.0.13 (from tensorflow_gnn)
  Downloading google_vizier-0.1.12-py3-none-any.whl (733 kB)
Collecting ml-collections (from tensorflow_gnn)
  Downloading ml_collections-0.1.1.tar.gz (77 kB)
  Preparing metadata (setup.py) ... done
Collecting dill<0.3.2,>=0.3.1.1 (from apache-beam<2.47.0->tensorflow_gnn)
  Downloading dill-0.3.1.1.tar.gz (151 kB)

In [2]:
!nvidia-smi

Wed Oct 18 03:20:15 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.47                 Driver Version: 531.68       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3090 Ti      On | 00000000:01:00.0  On |                  Off |
|  0%   46C    P5               33W / 450W|   1765MiB / 24564MiB |     30%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [3]:
# Import standard modules.

import os
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_gnn as tfgnn
import tensorflow_ranking as tfr

2023-10-18 03:20:22.934643: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-18 03:20:23.127417: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


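The utility modules are based on code from [github](). They must be importable from this notebook (e.g., attached to it as utility scripts); otherwise, the import cell fails with a `ModuleNotFoundError`.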


In [8]:
# Import utility modules.

import tpugraphsv1_layout_data_py as layout_data  # Used below by the Layout pipeline.
# import tpugraphsv1_tile_data_py as tile_data  # Only needed for the Tile pipeline.
import tpugraphsv1_implicit_py as implicit

ModuleNotFoundError: No module named 'tpugraphsv1_implicit_py'

# Training Pipelines

The following code is organized as:

1.  Helper functions: MLP (`_mlp`) and embedding layer (`_OpEmbedding`). The embedding layer adds a feature named `op_e` to the `op` nodes by embedding their integer op IDs.
1.  Pipeline code for training on the Layout collections.
1.  Pipeline code for training on the Tile collection.
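Conceptually, the embedding step in `_OpEmbedding` is a row lookup into a `num_ops x embed_d` table. The NumPy sketch below shows only that lookup, assuming op IDs are valid indices in `[0, num_ops)`; the sizes, the node op codes, and `embed_op_ids` are hypothetical, and the Keras layer additionally trains the table and applies an L2 activity regularizer.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
num_ops, embed_d = 120, 16

rng = np.random.default_rng(0)
table = rng.normal(size=(num_ops, embed_d))  # One row per op code (trainable in Keras).


def embed_op_ids(op_ids: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Looks up one embedding row per node, given the nodes' integer op IDs."""
    return table[op_ids.astype(np.int32)]  # The cast mirrors tf.cast(..., tf.int32).


op_ids = np.array([3, 3, 17, 0])  # Op codes of 4 made-up nodes.
op_e = embed_op_ids(op_ids, table)
print(op_e.shape)  # (4, 16)
```

Nodes with the same op code share the same embedding row, which is what lets the model generalize across graphs.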


## Helper functions for both the Layout and Tile collections


In [None]:
def _mlp(dims, hidden_activation, l2reg=1e-4, use_bias=True):
    """Helper function for multi-layer perceptron (MLP)."""
    layers = []
    for i, dim in enumerate(dims):
        if i > 0:
            layers.append(tf.keras.layers.Activation(hidden_activation))
        layers.append(
            tf.keras.layers.Dense(
                dim,
                kernel_regularizer=tf.keras.regularizers.l2(l2reg),
                use_bias=use_bias,
            )
        )
    return tf.keras.Sequential(layers)


class _OpEmbedding(tf.keras.Model):
    """Embeds GraphTensor.node_sets['op']['op'] nodes into feature 'op_e'."""

    def __init__(self, num_ops: int, embed_d: int, l2reg: float = 1e-4):
        super().__init__()
        self.embedding_layer = tf.keras.layers.Embedding(
            num_ops, embed_d, activity_regularizer=tf.keras.regularizers.l2(l2reg)
        )

    def call(
        self, graph: tfgnn.GraphTensor, training: bool = False
    ) -> tfgnn.GraphTensor:
        op_features = dict(graph.node_sets["op"].features)
        op_features["op_e"] = self.embedding_layer(
            tf.cast(graph.node_sets["op"]["op"], tf.int32)
        )
        return graph.replace_features(node_sets={"op": op_features})

# Layout Training Pipeline

We start by defining constants:

1. Batch sizes: the number of graphs, the number of sampled nodes per graph, and the number of configurations per graph.
1. The collection to train on: source (`xla` versus `nlp`) and search strategy (`random` versus `default`).

Then, we write boilerplate code to prepare the datasets.

Then, we dive deeper into the dataset examples (a batch of graphs from the layout collection).

Finally, we go into the details of defining a model.


## Define constants and choose subcollection

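We load `BATCH_SIZE` graphs per batch. Each graph comes with `CONFIGS_PER_GRAPH` sampled configurations (feature values and target runtimes), and segment dropout keeps at most `MAX_KEEP_NODES` of its nodes.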


In [None]:
LAYOUT_DATA_ROOT = "/kaggle/input/predict-ai-model-runtime/npz_all/npz/layout"
SOURCE = "xla"  # Can be "xla" or "nlp"
SEARCH = "random"  # Can be "random" or "default"

# Batch size information.
BATCH_SIZE = 16  # Number of graphs per batch.
CONFIGS_PER_GRAPH = 5  # Number of configurations (features and target values) per graph.
MAX_KEEP_NODES = 1000  # Maximum number of nodes kept per graph by segment dropout.
# `MAX_KEEP_NODES` only matters if the model uses the edge sets "sampled_config"
# and "sampled_feed"; it has no effect on the full-graph edge sets "config" and "feed".

## Prepare `tf.data.Dataset` instances

Specifically, we create `layout_train_ds` and `layout_valid_ds`.

The first run can take ~10 minutes while the caches are created.


In [None]:
layout_data_root_dir = os.path.join(
    os.path.expanduser(LAYOUT_DATA_ROOT), SOURCE, SEARCH
)

layout_npz_dataset = layout_data.get_npz_dataset(
    layout_data_root_dir,
    min_train_configs=CONFIGS_PER_GRAPH,
    max_train_configs=500,  # Graphs with more configurations than this are filtered out (speeds up loading and training).
    cache_dir="cache",
)


def pair_layout_graph_with_label(graph: tfgnn.GraphTensor):
    """Extracts label from graph (`tfgnn.GraphTensor`) and returns a pair of `(graph, label)`"""
    # Divide runtimes by a large constant: only the ranking matters, and the
    # raw runtimes are in the ~100K range.
    label = tf.cast(graph.node_sets["g"]["runtimes"], tf.float32) / 1e7
    return graph, label


layout_train_ds = (
    layout_npz_dataset.train.get_graph_tensors_dataset(
        CONFIGS_PER_GRAPH, max_nodes=MAX_KEEP_NODES
    )
    .shuffle(100, reshuffle_each_iteration=True)
    .batch(BATCH_SIZE, drop_remainder=False)
    .map(tfgnn.GraphTensor.merge_batch_to_components)
    .map(pair_layout_graph_with_label)
)

layout_valid_ds = (
    layout_npz_dataset.validation.get_graph_tensors_dataset(CONFIGS_PER_GRAPH)
    .batch(BATCH_SIZE, drop_remainder=False)
    .map(tfgnn.GraphTensor.merge_batch_to_components)
    .map(pair_layout_graph_with_label)
)
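Dividing by a positive constant cannot change the order of the runtimes, and the order is all a ranking loss sees. A quick NumPy check on made-up runtimes (the values below are not from the dataset):

```python
import numpy as np

# Made-up runtimes for one graph with 5 configurations (not real data).
runtimes = np.array([[412_000, 198_500, 305_250, 198_700, 876_000]], dtype=np.int64)
scaled = runtimes.astype(np.float32) / 1e7  # Same scaling as pair_layout_graph_with_label.

# The per-graph ranking of configurations is unchanged by the scaling.
same_order = np.array_equal(np.argsort(runtimes, axis=-1), np.argsort(scaled, axis=-1))
print(same_order)  # True
```

The order is preserved as long as float32 can still tell neighboring runtimes apart, which holds comfortably for runtimes in the ~100K range.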

### Familiarize yourself with the data

Let's obtain an example from the dataset `layout_train_ds`, i.e., an instance of `GraphTensor` that encodes a batch
of graphs. Luckily, with TF-GNN we can describe our model as if it operates on a single graph, and the
model naturally extends to multiple graphs!

Let's take one example (containing a batch) and print it.


In [None]:
graph_batch, config_runtimes = next(iter(layout_train_ds.take(1)))

print("graph_batch = ")
print(graph_batch)
print("\n\n")
print("config_runtimes=")
print(config_runtimes)

**< Crash-course on TF-GNN >**

Each `GraphTensor` contains three fields:

1. `node_sets` can be thought of as a `dict` from node type (str) in the graph (batch) to the feature tensors for that node type.
1. `edge_sets` can be thought of as a `dict` from edge type (str) in the graph (batch) to adjacency, stored as two integer vectors: source IDs and target IDs. All edges are directed, unless the model explicitly treats them as undirected. If edge set `e` connects node set `n1` to node set `n2`, and `graph.edge_sets["e"].adjacency.source = [0, 13, ...]` and `graph.edge_sets["e"].adjacency.target = [1, 14, ...]` (which must have equal length), then node `0` of node set `n1` points to node `1` of node set `n2`, node `13` points to node `14`, and so on. The IDs are zero-based and index into the feature tensors at `graph.node_sets["n1"]` and `graph.node_sets["n2"]`.
1. `context` contains information per graph in the batch. We do not use it for the layout collection; instead, each graph has a singleton node set named `"g"` (with features accessible as `graph.node_sets["g"]`).

**</ Crash-course on TF-GNN >**
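To make the `source`/`target` convention concrete, here is a plain-NumPy analogue (not the TF-GNN API) of a tiny graph with two node sets and one edge set `e` from `n1` to `n2`. Gathering features by source IDs and summing them at target IDs is the basic message-passing pattern. All names and values are made up.

```python
import numpy as np

# Made-up toy graph: node set "n1" has 3 nodes, node set "n2" has 2 nodes.
node_sets = {
    "n1": {"feats": np.array([[1.0], [2.0], [3.0]])},
    "n2": {"feats": np.array([[10.0], [20.0]])},
}
# Edge set "e" points from "n1" nodes to "n2" nodes; source and target have equal length.
edge_sets = {"e": {"source": np.array([0, 2, 1]), "target": np.array([1, 1, 0])}}

src = edge_sets["e"]["source"]
tgt = edge_sets["e"]["target"]

# Zero-based IDs index directly into the feature arrays of each endpoint's node set.
messages = node_sets["n1"]["feats"][src]  # One message per edge.

# Sum the incoming messages at each target node of "n2".
pooled = np.zeros_like(node_sets["n2"]["feats"])
np.add.at(pooled, tgt, messages)
print(pooled.tolist())  # [[2.0], [4.0]]
```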


Now, let's print the context and the node sets of the example `graph_batch`:


In [None]:
# The `graph_batch` contains node sets and edge sets.
# There are no context features for the layout collection.
print("graph_batch.context =", graph_batch.context)
# Note: graph_batch.context.sizes should have one entry per graph, i.e., BATCH_SIZE entries.
# Let's print out all features of all node sets.

for node_set_name in sorted(graph_batch.node_sets.keys()):
    print(f'\n\n #####  NODE SET "{node_set_name}" #########')
    print("** Has sizes: ", graph_batch.node_sets[node_set_name].sizes)
    for feature_name in graph_batch.node_sets[node_set_name].features.keys():
        print(f'\n Feature "{feature_name}" has values')
        print(graph_batch.node_sets[node_set_name][feature_name])

**The node set `'g'` corresponds to the graph level.** Since `BATCH_SIZE==16`, each tensor in `'g'` should have a leading dimension of `16`. The `graph_id` feature contains model names. Since `CONFIGS_PER_GRAPH=5`, the feature `'runtimes'` must have shape `(16, 5)`, with `graph_batch.node_sets['g']['runtimes'][i, j]` holding the runtime of graph `i` compiled with configuration `j`. The corresponding configuration feature values are stored in the `nconfig` node set, as explained next.

**Let's look at the nodes of each graph.** For instance, node set `op` contains the operation nodes of the computation graph (e.g., element-wise add, matrix multiply, etc.). Op codes are stored in `graph_batch.node_sets['op']['op']`. Since the graphs have varying numbers of nodes, the array `graph_batch.node_sets['op'].sizes` gives the number of `op` nodes in each of the `16` graphs.
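Assuming the nodes of the merged batch are stored graph by graph, in order (which is how we understand merging batch elements into components to lay them out), the `sizes` vector alone recovers which graph each `op` node belongs to and where each graph's nodes start. A NumPy sketch with made-up sizes:

```python
import numpy as np

sizes = np.array([4, 2, 3])  # Made-up `op` node counts for a batch of 3 graphs.

# Graph index of every node in the merged batch, in storage order.
graph_of_node = np.repeat(np.arange(len(sizes)), sizes)
# Global index of each graph's first node: local node j of graph k is global node offsets[k] + j.
offsets = np.concatenate([[0], np.cumsum(sizes)[:-1]])

print(graph_of_node)  # [0 0 0 0 1 1 2 2 2]
print(offsets)        # [0 4 6]
```

TF-GNN does this bookkeeping for you, which is why model code can ignore `sizes`.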

Some nodes are configurable. The (_virtual_) node-set `nconfig` contains features for configurable nodes. The features are in `graph_batch.node_sets['nconfig']['feats']`.

The edge set `'config'` (printed next) indicates the correspondence between `nconfig` features and `op` nodes. Specifically, each (_virtual_) `nconfig` node has degree 1 and each `op` node has degree 0 or 1 (on edge set `'config'`).
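The degree constraints can be checked directly from the adjacency index vectors. This NumPy sketch uses a made-up edge set; on the real batch, you could pass `graph_batch.edge_sets["config"].adjacency.source` and `.target` (converted with `.numpy()`) instead.

```python
import numpy as np

num_op, num_nconfig = 6, 3
# Made-up "config" edges: nconfig node k -> op node target[k].
source = np.arange(num_nconfig)
target = np.array([1, 4, 5])

nconfig_degree = np.bincount(source, minlength=num_nconfig)
op_degree = np.bincount(target, minlength=num_op)

assert np.all(nconfig_degree == 1)  # Every nconfig node has exactly one edge.
assert np.all(op_degree <= 1)       # Every op node has zero or one edge.
print(op_degree)  # [0 1 0 0 1 1]
```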


Let's print out the `config`, `g_op`, and `g_config` edge sets.


In [None]:
print("\n config edge set: ", graph_batch.edge_sets["config"])
print("\n config source nodes: ", graph_batch.edge_sets["config"].adjacency.source)
print("\n config target nodes: ", graph_batch.edge_sets["config"].adjacency.target)
print("\n g_op edge set: ", graph_batch.edge_sets["g_op"])
print("\n g_config edge set: ", graph_batch.edge_sets["g_config"])

The edge set `'config'` pairs each `"nconfig"` node with one `"op"` node. To list the correspondences, print `.adjacency.source` and `.adjacency.target`:


In [None]:
# Holds directed adjacency as a list of index pairs: nconfig -> op.
print(graph_batch.edge_sets["config"])
# nconfig indices (should be a range()).
print(graph_batch.edge_sets["config"].adjacency.source)
# Corresponding `op` indices.
print(graph_batch.edge_sets["config"].adjacency.target)

Other than `config` edges, the remainder of the edge-sets are:

```
'feed', 'g_op', 'g_config', 'sampled_config', 'sampled_feed'
```

The first (`feed`) is the actual computation graph: `op` nodes feed into `op` nodes. **Note: the transpose of this (implicit) adjacency matrix indicates the direction of information flow (models come later in the tutorial).** The second (`g_op`) and third (`g_config`) group, respectively, the `op` nodes and the (virtual) `nconfig` nodes by graph. These edge sets can be helpful for global pooling operations.
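Global pooling over `g_op` amounts to a per-graph segment sum (or mean) of `op` features. A NumPy sketch, assuming each made-up `g_op` edge links a graph index (source) to one of that graph's `op` nodes (target):

```python
import numpy as np

# Made-up "g_op" edges for 2 graphs: source = graph index, target = op index.
g_src = np.array([0, 0, 0, 1, 1])
op_tgt = np.array([0, 1, 2, 3, 4])
op_feats = np.array([[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [5.0, 5.0], [7.0, 1.0]])

num_graphs = 2
sum_pool = np.zeros((num_graphs, op_feats.shape[1]))
np.add.at(sum_pool, g_src, op_feats[op_tgt])  # Sum of op features per graph.

counts = np.bincount(g_src, minlength=num_graphs)[:, None]
mean_pool = sum_pool / counts  # Mean pooling: divide by each graph's op count.

print(sum_pool.tolist())   # [[6.0, 6.0], [12.0, 6.0]]
print(mean_pool.tolist())  # [[2.0, 2.0], [6.0, 3.0]]
```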

_Segment-level training_: Finally, to implement a version of **dropout**, the `sampled_config` and `sampled_feed` edge sets contain edges to randomly sampled `op` nodes. For full-graph training or inference, you may use `config` and `feed`. For training with segment dropout (e.g., a naive version of https://arxiv.org/abs/2308.13490, to appear at NeurIPS'23), you may use `sampled_config` and `sampled_feed`. You can adjust the number of **kept** nodes by setting `MAX_KEEP_NODES`. An edge survives in `sampled_feed` only if both of its endpoints survived (segment-level) dropout. In our naive implementation, the kept nodes have contiguous indices; you are welcome to implement a better segmentation strategy.
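The naive segment sampling described above can be sketched in NumPy as follows. This is an illustration under assumptions, not the `layout_data` implementation: the helper names are hypothetical, the edge list is a made-up chain, and the real loader may choose the window differently.

```python
import numpy as np


def sample_contiguous_segment(num_nodes, max_keep, rng):
    """Keeps up to `max_keep` nodes with contiguous indices (naive segment sampling)."""
    if num_nodes <= max_keep:
        return np.ones(num_nodes, dtype=bool)
    start = rng.integers(0, num_nodes - max_keep + 1)  # Random window start.
    keep = np.zeros(num_nodes, dtype=bool)
    keep[start:start + max_keep] = True
    return keep


def filter_edges(source, target, keep):
    """An edge survives only if both of its endpoints were kept."""
    mask = keep[source] & keep[target]
    return source[mask], target[mask]


rng = np.random.default_rng(0)
# Made-up "feed" edges of a 7-node chain: 0 -> 1 -> ... -> 6.
source = np.array([0, 1, 2, 3, 4, 5])
target = np.array([1, 2, 3, 4, 5, 6])

keep = sample_contiguous_segment(7, 3, rng)
s, t = filter_edges(source, target, keep)
print(np.flatnonzero(keep), s, t)
```

Any window of 3 contiguous nodes on this chain keeps exactly 2 of the 6 edges.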

_NOTE: When using TF-GNN (models to follow), you don't have to worry about `sizes`: just write your model code as if it operates on a single graph, and it naturally extends to a batch of graphs._


## Modeling

Before we define the full model (`ResModel`), let's run some of its building blocks. For example, let's embed the op codes.

The dataset has `layout_npz_dataset.num_ops` unique op codes, which determines the number of rows of the embedding table.


In [None]:
num_ops = layout_npz_dataset.num_ops
print("number of ops in the dataset=", num_ops)

embedding_layer = _OpEmbedding(num_ops, 16)  # 16-dimensional embedding, for demonstration.
graph_batch_embedded_ops = embedding_layer(graph_batch)

print('\n\n Before embedding, node-set "op"=\n', graph_batch.node_sets["op"])
print(
    '\n\n After embedding, node-set "op"=\n', graph_batch_embedded_ops.node_sets["op"]
)

_Note: after embedding, an additional feature `"op_e"` shows up._

Now, let's concatenate the configuration features with the embedding features.


In [None]:
op_e = graph_batch_embedded_ops.node_sets["op"]["op_e"]
config_features = graph_batch_embedded_ops.node_sets["nconfig"]["feats"]

print("op_e.shape ==", op_e.shape)
print("config_features.shape ==", config_features.shape)

The shapes differ in two ways, which we reconcile before concatenating:

1. `op_e` has more rows: every node has an op code, but not every node is configurable. We first resize the leading dimension of `config_features` to match the leading dimension of `op_e`, filling in zeros for nodes that are not configurable.
1. `config_features` is a cuboid (3D tensor). Its middle dimension identifies the configuration: there are `CONFIGS_PER_GRAPH` of them.

For the first, we can multiply by the (sparse) `"config"` adjacency matrix: a binary matrix in which every nonzero row is one-hot and most rows are zero. If the adjacency entry at `[i, j]` is set, then `graph.node_sets["nconfig"]["feats"][j]` contains the configuration features for node `i` of `graph.node_sets["op"]`.
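Here is the resize step on a toy example with a dense adjacency, in NumPy. The edge list and feature values are made up. The multiplication contracts the `nconfig` axis, so the configuration and feature axes pass through unchanged, and the rows of non-configurable ops come out as zeros.

```python
import numpy as np

num_op, num_configs, feat_dim = 5, 3, 2
# Made-up "config" edges: nconfig node k -> op node op_of_nconfig[k].
op_of_nconfig = np.array([1, 3])
num_nconfig = len(op_of_nconfig)
config_feats = np.arange(num_nconfig * num_configs * feat_dim, dtype=float).reshape(
    num_nconfig, num_configs, feat_dim
)

# Dense binary adjacency of shape (num_op, num_nconfig): row i is one-hot if op i is configurable.
adj = np.zeros((num_op, num_nconfig))
adj[op_of_nconfig, np.arange(num_nconfig)] = 1.0

# Contract over the nconfig axis; the config and feature axes ride along.
resized = np.einsum("ij,jcf->icf", adj, config_feats)
print(resized.shape)  # (5, 3, 2)
```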


In [None]:
config_adj = implicit.AdjacencyMultiplier(graph_batch_embedded_ops, "config")
print("config_adj.shape =", config_adj.shape)
resized_config_features = config_adj @ config_features
print("resized_config_features.shape =", resized_config_features.shape)

Now, we broadcast the `op_e` feature matrix to a cuboid by replicating it along a new middle dimension, so that we can finally combine the config features with the op embeddings.


In [None]:
broadcasted_op_e = tf.stack([op_e] * CONFIGS_PER_GRAPH, axis=1)

combined_features = tf.concat([broadcasted_op_e, resized_config_features], axis=-1)

print("combined_features.shape = ", combined_features.shape)
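`tf.stack([op_e] * CONFIGS_PER_GRAPH, axis=1)` materializes one copy of `op_e` per configuration. The NumPy sketch below checks that a broadcast view holds the same values (in TensorFlow, `tf.broadcast_to` plays a similar role) and shows the shape after concatenation. All sizes are made up, and the config part is zero-filled for simplicity.

```python
import numpy as np

num_op, embed_d, num_configs, config_d = 5, 4, 3, 2
op_e = np.random.default_rng(0).normal(size=(num_op, embed_d))
resized_config = np.zeros((num_op, num_configs, config_d))  # Placeholder resized config features.

stacked = np.stack([op_e] * num_configs, axis=1)  # Mirrors the tf.stack call above.
broadcast = np.broadcast_to(op_e[:, None, :], (num_op, num_configs, embed_d))  # Same values, no copy.

combined = np.concatenate([broadcast, resized_config], axis=-1)
print(combined.shape)  # (5, 3, 6)
```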

Now, we want to apply a graph convolution layer (i.e., message passing followed by a non-linearity) over the `feed` edges. Usually, this is done by left-multiplying the feature tensor with **some form** of adjacency matrix; the exact form determines the pooling (e.g., sum vs. average). Let us use the symmetrically normalized adjacency matrix with self-connections added (Kipf & Welling, ICLR'17).

We can compute such a matrix $\widehat{A}$ as:

$$A_\textrm{undirected.w.selfconnections} \leftarrow A + A^\top + I$$

$$D \leftarrow \textrm{diag}\left(\mathbf{1}^\top A_\textrm{undirected.w.selfconnections}\right)$$

$$\widehat{A} \leftarrow D^{-\frac{1}{2}} (A_\textrm{undirected.w.selfconnections}) D^{-\frac{1}{2}} $$

This is achievable with the following code:


In [None]:
adj_op_op = implicit.AdjacencyMultiplier(graph_batch_embedded_ops, "feed")  # op->op
adj_config = implicit.AdjacencyMultiplier(
    graph_batch_embedded_ops, "config"
)  # nconfig->op

adj_op_op_hat = (adj_op_op + adj_op_op.transpose()).add_eye()
adj_op_op_hat = adj_op_op_hat.normalize_symmetric()
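For small graphs, the same normalization can be written with dense matrices, which is handy for checking results by hand. This NumPy sketch computes $D^{-\frac{1}{2}} (A + A^\top + I) D^{-\frac{1}{2}}$ directly; we assume `normalize_symmetric()` computes the same quantity on the implicit (sparse) representation. The function name and the 3-node chain are made up.

```python
import numpy as np


def normalize_symmetric_dense(adj: np.ndarray) -> np.ndarray:
    """Returns D^{-1/2} (A + A^T + I) D^{-1/2} for a dense directed adjacency A."""
    a_hat = adj + adj.T + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=0)          # Column sums (equal to row sums, since a_hat is symmetric).
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # Self-connections guarantee deg >= 1, so no division by zero.
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


# Made-up "feed" edges of a 3-node chain: 0 -> 1 -> 2.
adj = np.zeros((3, 3))
adj[0, 1] = adj[1, 2] = 1.0
a_norm = normalize_symmetric_dense(adj)
print(np.round(a_norm, 3))
```

Here the degrees are `[2, 3, 2]`, so, for example, the diagonal entry of node 0 is `1/2` and the entry between nodes 0 and 1 is `1/sqrt(6)`.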

Finally, the message passing can be written as:


In [None]:
A_times_X = adj_op_op_hat @ combined_features
print("A_times_x.shape =", A_times_X.shape)

Now, we put everything above together into a model class, `ResModel` (next), which adds a few more concepts:

1. Adjacency for `"g_op"` and `"g_config"`, used to pool information from all ops, and from configurable ops, to the graph level.
1. Residual connections.
1. Segment dropout: a forward pass is computed on the entire graph (with `tf.stop_gradient` applied to its input features). Then, another forward pass is computed using only the sampled edge sets (`forward()` sets `edgeset_prefix` to `"sampled_"`). Selected (kept) `op` nodes take their features from the second pass; all other nodes take them from the first.
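Before reading `ResModel`, it may help to see its core loop in isolation. The NumPy sketch below mirrors the structure of `_node_level_forward` (config features re-concatenated before every layer, message passing with a normalized adjacency, a non-linearity, and a residual add) and the `tf.where` mixing in `forward`. It is a simplification under stated assumptions: a single random dense layer stands in for each MLP, the adjacency and the two passes' outputs are placeholders, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)


def leaky_relu(x, alpha=0.2):
    """Same default slope as tf.nn.leaky_relu."""
    return np.where(x > 0, x, alpha * x)


num_op, num_configs, hidden, config_d, num_layers = 4, 3, 8, 2, 2
a_norm = np.full((num_op, num_op), 1.0 / num_op)           # Placeholder normalized adjacency.
x = rng.normal(size=(num_op, num_configs, hidden))         # Node states after the "prenet".
config = rng.normal(size=(num_op, num_configs, config_d))  # Resized config features.
weights = [rng.normal(size=(config_d + hidden, hidden)) * 0.1 for _ in range(num_layers)]

for w in weights:
    y = np.concatenate([config, x], axis=-1)  # Re-inject config features at every layer.
    y = np.einsum("ij,jcf->icf", a_norm, y)   # Message passing: A_hat @ y.
    y = leaky_relu(y @ w)                     # One dense layer standing in for the MLP.
    x = x + y                                 # Residual connection.

# Segment-dropout mixing: kept nodes use the sampled-graph pass, others the full-graph pass.
is_selected = np.array([True, False, True, False])
x_full, x_backprop = x, x + 1.0  # Placeholder outputs of the two passes.
mixed = np.where(is_selected[:, None, None], x_backprop, x_full)
```

The two `[:, None, None]` axes play the same role as the two `tf.expand_dims` calls in `forward`.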

Without further ado, `ResModel`:


In [None]:
class ResModel(tf.keras.Model):
    """GNN with residual connections."""

    def __init__(
        self,
        num_configs: int,
        num_ops: int,
        op_embed_dim: int = 32,
        num_gnns: int = 2,
        mlp_layers: int = 2,
        hidden_activation: str = "leaky_relu",
        hidden_dim: int = 32,
        reduction: str = "sum",
    ):
        super().__init__()
        self._num_configs = num_configs
        self._num_ops = num_ops
        self._op_embedding = _OpEmbedding(num_ops, op_embed_dim)
        self._prenet = _mlp([hidden_dim] * mlp_layers, hidden_activation)
        self._gc_layers = []
        for _ in range(num_gnns):
            self._gc_layers.append(_mlp([hidden_dim] * mlp_layers, hidden_activation))
        self._postnet = _mlp([hidden_dim, 1], hidden_activation, use_bias=False)

    def call(self, graph: tfgnn.GraphTensor, training: bool = False):
        del training
        return self.forward(graph, self._num_configs)

    def _node_level_forward(
        self,
        node_features: tf.Tensor,
        config_features: tf.Tensor,
        graph: tfgnn.GraphTensor,
        num_configs: int,
        edgeset_prefix="",
    ) -> tf.Tensor:
        adj_op_op = implicit.AdjacencyMultiplier(
            graph, edgeset_prefix + "feed"
        )  # op->op
        adj_config = implicit.AdjacencyMultiplier(
            graph, edgeset_prefix + "config"
        )  # nconfig->op

        adj_op_op_hat = (adj_op_op + adj_op_op.transpose()).add_eye()
        adj_op_op_hat = adj_op_op_hat.normalize_symmetric()

        x = node_features

        x = tf.stack([x] * num_configs, axis=1)
        config_features = 100 * (adj_config @ config_features)
        x = tf.concat([config_features, x], axis=-1)
        x = self._prenet(x)
        x = tf.nn.leaky_relu(x)

        for layer in self._gc_layers:
            y = x
            y = tf.concat([config_features, y], axis=-1)
            y = tf.nn.leaky_relu(layer(adj_op_op_hat @ y))
            x += y
        return x

    def forward(
        self, graph: tfgnn.GraphTensor, num_configs: int, backprop=True
    ) -> tf.Tensor:
        graph = self._op_embedding(graph)

        config_features = graph.node_sets["nconfig"]["feats"]
        node_features = tf.concat(
            [graph.node_sets["op"]["feats"], graph.node_sets["op"]["op_e"]], axis=-1
        )

        x_full = self._node_level_forward(
            node_features=tf.stop_gradient(node_features),
            config_features=tf.stop_gradient(config_features),
            graph=graph,
            num_configs=num_configs,
        )

        if backprop:
            x_backprop = self._node_level_forward(
                node_features=node_features,
                config_features=config_features,
                graph=graph,
                num_configs=num_configs,
                edgeset_prefix="sampled_",
            )

            is_selected = graph.node_sets["op"]["selected"]
            # Need to expand twice as `is_selected` is a vector (num_nodes) but
            # x_{backprop, full} are 3D tensors (num_nodes, num_configs, num_feats).
            is_selected = tf.expand_dims(is_selected, axis=-1)
            is_selected = tf.expand_dims(is_selected, axis=-1)
            x = tf.where(is_selected, x_backprop, x_full)
        else:
            x = x_full

        adj_config = implicit.AdjacencyMultiplier(graph, "config")

        # Features for configurable nodes.
        config_feats = adj_config.transpose() @ x

        # Global pooling
        adj_pool_op_sum = implicit.AdjacencyMultiplier(graph, "g_op").transpose()
        adj_pool_op_mean = adj_pool_op_sum.normalize_right()
        adj_pool_config_sum = implicit.AdjacencyMultiplier(
            graph, "g_config"
        ).transpose()
        x = self._postnet(
            tf.concat(
                [
                    # (A D^-1) @ Features
                    adj_pool_op_mean @ x,
                    # l2_normalize( A @ Features )
                    tf.nn.l2_normalize(adj_pool_op_sum @ x, axis=-1),
                    # l2_normalize( A @ Features )
                    tf.nn.l2_normalize(adj_pool_config_sum @ config_feats, axis=-1),
                ],
                axis=-1,
            )
        )

        x = tf.squeeze(x, -1)

        return x

## Training loop

Create a model, objective function, and optimizer.


In [None]:
model = ResModel(CONFIGS_PER_GRAPH, layout_npz_dataset.num_ops)

loss = tfr.keras.losses.ListMLELoss()  # (temperature=10)
opt = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=0.5)

model.compile(
    loss=loss,
    optimizer=opt,
    metrics=[
        tfr.keras.metrics.OPAMetric(name="opa_metric"),
    ],
)
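For intuition about what `compile` optimizes and reports, here is a NumPy sketch of the listwise ListMLE loss and of ordered pair accuracy (OPA) for a single list, following their standard definitions. It is not TF-Ranking's implementation (which also handles batching, padding, and ties), and the function names are hypothetical. Since the labels here are runtimes, ListMLE pushes slower configurations toward higher scores.

```python
import numpy as np


def ordered_pair_accuracy(labels, scores):
    """Fraction of pairs with labels[i] > labels[j] that the scores order the same way."""
    labels, scores = np.asarray(labels, float), np.asarray(scores, float)
    label_gt = labels[:, None] > labels[None, :]
    score_gt = scores[:, None] > scores[None, :]
    total = label_gt.sum()
    if total == 0:
        raise ValueError("OPA is undefined when all labels are equal.")
    return (label_gt & score_gt).sum() / total


def list_mle_loss(labels, scores):
    """Negative Plackett-Luce log-likelihood of the permutation sorting labels in descending order."""
    order = np.argsort(-np.asarray(labels, float), kind="stable")
    s = np.asarray(scores, float)[order]
    # suffix_lse[i] = log(sum_{j >= i} exp(s[j])), accumulated from the end of the list.
    suffix_lse = np.logaddexp.accumulate(s[::-1])[::-1]
    return float(np.sum(suffix_lse - s))


print(ordered_pair_accuracy([3, 2, 1], [0.9, 0.5, 0.1]))  # 1.0
```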

### Train for a few epochs.


In [None]:
early_stop = 5  # If validation OPA did not increase in this many epochs, terminate training.
best_params = None  # Stores parameters corresponding to best validation OPA, to restore to them after training.
best_val_opa = -1  # Tracks best validation OPA
best_val_at_epoch = -1  # At which epoch.
epochs = 1  # Total number of training epochs.

for i in range(epochs):
    history = model.fit(
        layout_train_ds,
        epochs=1,
        verbose=1,
        validation_data=layout_valid_ds,
        validation_freq=1,
    )

    train_loss = history.history["loss"][-1]
    train_opa = history.history["opa_metric"][-1]
    val_loss = history.history["val_loss"][-1]
    val_opa = history.history["val_opa_metric"][-1]
    if val_opa > best_val_opa:
        best_val_opa = val_opa
        best_val_at_epoch = i
        best_params = {v.ref(): v + 0 for v in model.trainable_variables}  # `v + 0` snapshots the value.
        print(" * [@%i] Validation (NEW BEST): %s" % (i, str(val_opa)))
    elif early_stop > 0 and i - best_val_at_epoch >= early_stop:
        print(
            "[@%i] Best validation OPA was attained at epoch %i. Stopping."
            % (i, best_val_at_epoch)
        )
        break

# Restore best parameters.
print("Restoring parameters corresponding to the best validation OPA.")
assert best_params is not None
for v in model.trainable_variables:
    v.assign(best_params[v.ref()])
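The patience logic in the loop above can be factored into a small pure-Python helper, which makes it easy to test in isolation. `should_stop` is a hypothetical name; like the loop, it treats only a strictly higher score as a new best, so the earliest of tied best epochs counts.

```python
def should_stop(val_history, patience):
    """Returns True once the best validation score is `patience` or more epochs old."""
    if patience <= 0 or not val_history:
        return False
    # max() returns the first index among ties, matching the strict `>` in the loop.
    best_epoch = max(range(len(val_history)), key=lambda i: val_history[i])
    return len(val_history) - 1 - best_epoch >= patience


print(should_stop([0.5, 0.6, 0.7], patience=5))  # False
```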

## Make Submission CSV file for this task


In [None]:
import tqdm

_INFERENCE_CONFIGS_BATCH_SIZE = 50

output_csv_filename = f"inference_layout_{SOURCE}_{SEARCH}.csv"
print("\n\n   Running inference on test set ...\n\n")
test_rankings = []

assert layout_npz_dataset.test.graph_id is not None
for graph in tqdm.tqdm(
    layout_npz_dataset.test.iter_graph_tensors(),
    total=layout_npz_dataset.test.graph_id.shape[-1],
    desc="Inference",
):
    num_configs = graph.node_sets["g"]["runtimes"].shape[-1]
    all_scores = []
    for i in tqdm.tqdm(range(0, num_configs, _INFERENCE_CONFIGS_BATCH_SIZE)):
        end_i = min(i + _INFERENCE_CONFIGS_BATCH_SIZE, num_configs)
        # Take a cut of the configs.
        node_set_g = graph.node_sets["g"]
        subconfigs_graph = tfgnn.GraphTensor.from_pieces(
            edge_sets=graph.edge_sets,
            node_sets={
                "op": graph.node_sets["op"],
                "nconfig": tfgnn.NodeSet.from_fields(
                    sizes=graph.node_sets["nconfig"].sizes,
                    features={
                        "feats": graph.node_sets["nconfig"]["feats"][:, i:end_i],
                    },
                ),
                "g": tfgnn.NodeSet.from_fields(
                    sizes=tf.constant([1]),
                    features={
                        "graph_id": node_set_g["graph_id"],
                        "runtimes": node_set_g["runtimes"][:, i:end_i],
                        "kept_node_ratio": node_set_g["kept_node_ratio"],
                    },
                ),
            },
        )
        h = model.forward(subconfigs_graph, num_configs=(end_i - i), backprop=False)
        all_scores.append(h[0])
    all_scores = tf.concat(all_scores, axis=0)
    graph_id = graph.node_sets["g"]["graph_id"][0].numpy().decode()
    sorted_indices = (
        tf.strings.join(tf.strings.as_string(tf.argsort(all_scores)), ";")
        .numpy()
        .decode()
    )
    test_rankings.append((graph_id, sorted_indices))

with tf.io.gfile.GFile(output_csv_filename, "w") as fout:
    fout.write("ID,TopConfigs\n")
    for graph_id, ranks in test_rankings:
        fout.write(f"layout:{SOURCE}:{SEARCH}:{graph_id},{ranks}\n")
print("\n\n   ***  Wrote", output_csv_filename, "\n\n")
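The chunking and ranking logic of the inference cell can be isolated as below. Assuming the model gives lower scores to faster configurations (as training ListMLE on runtime labels encourages), the ascending `tf.argsort(all_scores)` lists the predicted fastest configuration first. The helper names are hypothetical.

```python
import numpy as np


def config_chunks(num_configs, chunk_size):
    """Yields (start, end) slices covering range(num_configs) in chunks of at most chunk_size."""
    for start in range(0, num_configs, chunk_size):
        yield start, min(start + chunk_size, num_configs)


def top_configs_string(scores):
    """Config indices sorted by ascending predicted score, joined with ';'."""
    return ";".join(str(i) for i in np.argsort(scores, kind="stable"))


print(list(config_chunks(120, 50)))               # [(0, 50), (50, 100), (100, 120)]
print(top_configs_string([0.3, -1.2, 0.9, 0.0]))  # 1;3;0;2
```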

## Combine submission CSVs from all layout collections into one CSV file

Finally, after running on all collections, you need to combine the CSV files together (e.g., by concatenation), to prepare the final submission. Specifically, you can modify the constants:

```
SOURCE = 'xla'  # Can be "xla" or "nlp"
SEARCH = 'random'  # Can be "random" or "default"
```

(from a few cells ago) and run for all 4 combinations: SOURCE=("xla", "nlp") x SEARCH=("random", "default"), then combine all inferences into one file:


In [None]:
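# Plain `cat` would repeat the "ID,TopConfigs" header once per file; keep only the first one.
!head -n 1 inference_layout_xla_random.csv > inference_layout_all.csv
!tail -q -n +2 inference_layout_xla_random.csv inference_layout_xla_default.csv inference_layout_nlp_random.csv inference_layout_nlp_default.csv >> inference_layout_all.csv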

This produces the file `inference_layout_all.csv`, which combines the predictions for all layout subcollections. Finally, this file should be combined with the CSV for the tile collection, as explained next.
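If you prefer to merge the files in Python, a small standard-library helper can check that all files share the same header and write it only once. `combine_csvs` is a hypothetical name, and the sketch assumes every non-empty file starts with the same header line.

```python
from pathlib import Path


def combine_csvs(paths, output_path):
    """Concatenates CSV files that share one header line, writing that header only once."""
    header = None
    rows = []
    for path in paths:
        lines = Path(path).read_text().splitlines()
        if not lines:
            continue  # Skip empty files.
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError(f"Header mismatch in {path}: {lines[0]!r}")
        rows.extend(line for line in lines[1:] if line)  # Drop the header and blank lines.
    if header is None:
        raise ValueError("All input files are empty.")
    Path(output_path).write_text("\n".join([header, *rows]) + "\n")


# Usage, assuming the four per-collection files written above exist:
# layout_csvs = [
#     f"inference_layout_{source}_{search}.csv"
#     for source in ("xla", "nlp")
#     for search in ("random", "default")
# ]
# combine_csvs(layout_csvs, "inference_layout_all.csv")
```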


# Tile Training Pipeline

This section will be written by the end of September. We prioritized getting this notebook out as soon as possible, because the Layout section above is (1) trickier and (2) accounts for most of the score.
