# Sparse Operation Kit #
---
This notebook introduces the Sparse Operation Kit and shows how to use it to accelerate the training of recommender systems.

Sparse Operation Kit (hereafter SOK) is a toolkit that wraps efficient algorithms and implementations used in recommendation scenarios, many of which are sparse operations, into a user-friendly library. Users who want to leverage these GPU-accelerated algorithms to speed up their applications can get started quickly with this Python toolkit.

## Documents ##
Documentation: https://nvidia-merlin.github.io/HugeCTR/sparse_operation_kit/master/index.html

## Menu ##
1. **Installation**
2. **Single-node, Multi-GPUs synchronized training**
3. **Multi-node, Multi-GPUs synchronized training**

### Installation ###
+ **Requirements**
    - TensorFlow 2.x


+ **Get SOK from NGC** <br>
The SparseOperationKit is preinstalled in the [Merlin TensorFlow Training Container](https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-tensorflow-training): `nvcr.io/nvidia/merlin/merlin-tensorflow-training:21.12`. <br>
After launching the container, you can verify that the required libraries are present by running the following command.
```shell
$ python3 -c "import sparse_operation_kit as sok"
```
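
The same check can also be done from inside a Python session without triggering the import itself. The following is a minimal sketch using only the standard library; the helper name `module_available` is hypothetical and not part of SOK.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# Report whether the two packages this notebook relies on can be found.
for name in ("tensorflow", "sparse_operation_kit"):
    status = "found" if module_available(name) else "MISSING"
    print(f"{name}: {status}")
```

If either line reports `MISSING`, the environment is probably not the Merlin container, or SOK still has to be built from source.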

+ **Build SOK from Source Code** <br>
If you want to build SparseOperationKit from the source code instead of using the NGC container, please refer to [Setup development environment](../../docs/hugectr_contributor_guide.md#build-sparse-operation-kit-sok-from-source).

### Single-node, Multi-GPUs synchronized training ###

Firstly, specify the hyperparameters.

In [1]:
%reset -f

args = dict()

args["gpu_num"] = 8                               # the number of available GPUs
args["iter_num"] = 50                             # the number of training iterations
args["max_vocabulary_size_per_gpu"] = 1024
args["slot_num"] = 10                             # the number of feature fields in this embedding layer
args["max_nnz"] = 4                               # the maximum number of valid features in each slot
args["embedding_vec_size"] = 4                    # the dimension of embedding vectors
args["combiner"] = "mean"                         # how the embeddings within a slot are reduced, it can be [mean, sum]
args["global_batch_size"] = 65536                 # the global batch size across all GPUs
args["optimizer"] = "plugin_adam"                 # the optimizer used for training, it can be [plugin_adam, adam, sgd]
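
Several sizes that appear later in the notebook follow directly from these hyperparameters. The sketch below recomputes them with plain arithmetic; `demo_args` is a hypothetical copy of the relevant entries so that the notebook's own `args` is left untouched.

```python
# Hypothetical copy of the relevant hyperparameters from the cell above.
demo_args = {
    "gpu_num": 8,
    "max_vocabulary_size_per_gpu": 1024,
    "slot_num": 10,
    "embedding_vec_size": 4,
    "global_batch_size": 65536,
}

# The global batch must split evenly across the GPUs.
assert demo_args["global_batch_size"] % demo_args["gpu_num"] == 0

# Number of samples each GPU (replica) processes per step.
per_replica_batch_size = demo_args["global_batch_size"] // demo_args["gpu_num"]
# Total number of distinct keys the embedding table can hold across all GPUs.
total_vocabulary_size = demo_args["gpu_num"] * demo_args["max_vocabulary_size_per_gpu"]
# Width of one sample after the per-slot embeddings are flattened.
flattened_width = demo_args["slot_num"] * demo_args["embedding_vec_size"]

print(per_replica_batch_size)  # 8192
print(total_vocabulary_size)   # 8192
print(flattened_width)         # 40
```

These values agree with the tensor shapes printed during training: `(65536, 40)` for the single-process TensorFlow model and `(8192, 40)` per replica for SOK.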

Secondly, import the required modules.

In [2]:
import sys, os, json
import sparse_operation_kit as sok
import tensorflow as tf
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, range(args["gpu_num"])))
import numpy as np

[INFO]: sparse_operation_kit is imported


Thirdly, define a DNN model using TensorFlow.

In [3]:
class TfDemo(tf.keras.models.Model):
    def __init__(self, 
                 init_tensors, 
                 combiner, 
                 global_batch_size,
                 slot_num, 
                 embedding_vec_size,
                 **kwargs):
        super(TfDemo, self).__init__(**kwargs)
        self.combiner = combiner
        self.global_batch_size = global_batch_size
        self.slot_num = slot_num
        self.embedding_vec_size = embedding_vec_size

        self.init_tensors = init_tensors
        self.params = tf.Variable(initial_value=tf.concat(self.init_tensors, axis=0))

        self.dense_layer = tf.keras.layers.Dense(units=1, activation=None,
                                                 kernel_initializer="ones",
                                                 bias_initializer="zeros")

    def call(self, inputs, training=True):
        # [batchsize * slot_num, embedding_vec_size]
        embedding_vector = tf.nn.embedding_lookup_sparse(params=self.params, sp_ids=inputs,
                                                        sp_weights=None, combiner=self.combiner)

        # [batchsize, slot_num * embedding_vec_size]
        embedding_vector = tf.reshape(embedding_vector, shape=[self.global_batch_size, self.slot_num * self.embedding_vec_size])
        logit = self.dense_layer(embedding_vector)
        return logit, embedding_vector
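
To make the role of `combiner` concrete, the following NumPy sketch reduces the embedding rows of a single slot by hand. It is an illustration of the idea only, not the TensorFlow implementation; the helper `combine_slot` is hypothetical, and returning zeros for an empty slot is an assumption of this sketch.

```python
import numpy as np


def combine_slot(table, keys, combiner="mean"):
    """Reduce the embedding rows selected by one slot's keys; -1 marks padding."""
    valid = [k for k in keys if k != -1]
    if not valid:
        # Assumption of this sketch: an empty slot contributes a zero vector.
        return np.zeros(table.shape[1], dtype=table.dtype)
    rows = table[valid]
    return rows.mean(axis=0) if combiner == "mean" else rows.sum(axis=0)


# A tiny table with 6 keys and 2-dimensional embeddings: row k is [2k, 2k + 1].
table = np.arange(12, dtype=np.float32).reshape(6, 2)

mean_vec = combine_slot(table, [1, 3, -1, -1], "mean")  # mean of rows 1 and 3: [4, 5]
sum_vec = combine_slot(table, [1, 3, -1, -1], "sum")    # sum of rows 1 and 3: [8, 10]
print(mean_vec, sum_vec)
```

With the `mean` combiner and an all-ones table, every non-empty slot reduces to a vector of ones, which is consistent with the all-ones embedding vectors printed for the first iteration.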

Fourthly, define the same DNN model using SOK.

In [4]:
class SOKDemo(tf.keras.models.Model):
    def __init__(self,
                 combiner,
                 max_vocabulary_size_per_gpu,
                 slot_num,
                 max_nnz,
                 embedding_vec_size, 
                 **kwargs):
        super(SOKDemo, self).__init__(**kwargs)

        self.combiner = combiner
        self.max_vocabulary_size_per_gpu = max_vocabulary_size_per_gpu
        self.slot_num = slot_num
        self.max_nnz = max_nnz
        self.embedding_vec_size = embedding_vec_size

        self.embedding_layer = sok.DistributedEmbedding(combiner=self.combiner,
                                                           max_vocabulary_size_per_gpu=self.max_vocabulary_size_per_gpu,
                                                           embedding_vec_size=self.embedding_vec_size,
                                                           slot_num=self.slot_num,
                                                           max_nnz=self.max_nnz)

        self.dense_layer = tf.keras.layers.Dense(units=1, activation=None,
                                                 kernel_initializer="ones",
                                                 bias_initializer="zeros")

    def call(self, inputs, training=True):
        # [batchsize, slot_num, embedding_vec_size]
        embedding_vector = self.embedding_layer(inputs, training=training)
        # [batchsize, slot_num * embedding_vec_size]
        embedding_vector = tf.reshape(embedding_vector, shape=[-1, self.slot_num * self.embedding_vec_size])
        # [batchsize, 1]
        logit = self.dense_layer(embedding_vector)
        return logit, embedding_vector
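
The two models reshape their embedding outputs from different starting layouts, as noted in the shape comments. The NumPy sketch below suggests why both end up with the same `[batchsize, slot_num * embedding_vec_size]` matrix, assuming the flat lookup result is ordered sample by sample and, within each sample, slot by slot.

```python
import numpy as np

batch, slot_num, vec = 2, 3, 4

# TfDemo-style layout: one row per (sample, slot) pair.
flat_rows = np.arange(batch * slot_num * vec, dtype=np.float32).reshape(batch * slot_num, vec)
# SOKDemo-style layout: the same values viewed as [batch, slot_num, vec].
cube = flat_rows.reshape(batch, slot_num, vec)

# TfDemo reshapes with an explicit batch size, SOKDemo lets -1 infer it.
tf_style = flat_rows.reshape(batch, slot_num * vec)
sok_style = cube.reshape(-1, slot_num * vec)

print(np.array_equal(tf_style, sok_style))  # True
```

Because both reshapes only reinterpret row-major memory, the dense layer sees the same input in either model when the embedding values agree.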

Fifthly, generate a synthetic dataset and the initial values that are used to initialize the embedding parameters.

In [5]:
# import utility python script
sys.path.append("../unit_test/test_scripts/tf2/")
import utils

In [6]:
# -1 is used to represent the invalid keys
random_samples = utils.generate_random_samples(num_of_samples=args["global_batch_size"] * args["iter_num"],
                                               vocabulary_size=args["gpu_num"] * args["max_vocabulary_size_per_gpu"],
                                               slot_num=args["slot_num"],
                                               max_nnz=args["max_nnz"])

[INFO]: begin to generate random samples
[INFO]: generated random samples


In [7]:
# check random_samples
random_samples

(array([[[ 237,   38,   -1,   -1],
         [1255,  921,   -1,   -1],
         [2139,   -1,   -1,   -1],
         ...,
         [  -1,   -1,   -1,   -1],
         [7018, 6987,   -1,   -1],
         [7875,   -1,   -1,   -1]],
 
        [[ 749,  718,  680,  642],
         [1606,  894,   -1,   -1],
         [1859, 1782,   -1,   -1],
         ...,
         [6455, 6384, 6321,   -1],
         [6589,   -1,   -1,   -1],
         [7890,   -1,   -1,   -1]],
 
        [[ 653,  582,   -1,   -1],
         [1063,  929,  858,   -1],
         [1953,   -1,   -1,   -1],
         ...,
         [6209, 6084, 5866, 5741],
         [6942,   -1,   -1,   -1],
         [7886, 7410,   -1,   -1]],
 
        ...,
 
        [[ 166,   -1,   -1,   -1],
         [1580, 1210, 1058,  913],
         [2326, 2174, 1796,   -1],
         ...,
         [5917,   -1,   -1,   -1],
         [7227, 7066, 6738,   -1],
         [8056,   -1,   -1,   -1]],
 
        [[ 356,   -1,   -1,   -1],
         [1346, 1177,   -1,   -1],
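
The keys above are dense arrays padded with `-1`, while `tf.nn.embedding_lookup_sparse` expects sparse ids. The sketch below shows one way such a conversion to coordinate (COO) form could look in NumPy. It is an assumption about what `to_sparse_tensor=True` does in `utils.tf_dataset`, not a copy of that code, and `dense_keys_to_coo` is a hypothetical name.

```python
import numpy as np


def dense_keys_to_coo(keys):
    """Convert [batch, slot_num, max_nnz] keys padded with -1 to COO parts."""
    batch, slot_num, max_nnz = keys.shape
    # One row per (sample, slot) pair, matching the flat lookup layout.
    rows = keys.reshape(batch * slot_num, max_nnz)
    row_idx, col_idx = np.nonzero(rows != -1)
    indices = np.stack([row_idx, col_idx], axis=1)
    values = rows[row_idx, col_idx]
    dense_shape = (batch * slot_num, max_nnz)
    return indices, values, dense_shape


# One sample with three slots, using key patterns taken from the output above.
keys = np.array([[[237, 38, -1, -1],
                  [2139, -1, -1, -1],
                  [-1, -1, -1, -1]]])
indices, values, dense_shape = dense_keys_to_coo(keys)
print(indices.tolist())  # [[0, 0], [0, 1], [1, 0]]
print(values.tolist())   # [237, 38, 2139]
print(dense_shape)       # (3, 4)
```

The three returned pieces correspond to the `indices`, `values`, and `dense_shape` arguments of `tf.sparse.SparseTensor`.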
       

In [8]:
# generate initial value for embedding parameters
init_tensors = utils.get_ones_tensor(max_vocab_size_per_gpu=args["max_vocabulary_size_per_gpu"],
                                     embedding_vec_size=args["embedding_vec_size"],
                                     num=args["gpu_num"])

In [9]:
# check init_tensors
init_tensors

[array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        ...,
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=float32),
 array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        ...,
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=float32),
 array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        ...,
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=float32),
 array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        ...,
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=float32),
 array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        ...,
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=float32),
 array([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1
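
Judging from the output above, `init_tensors` is a list with one all-ones `float32` array per GPU. A hypothetical NumPy stand-in for `utils.get_ones_tensor` (the real helper may differ) makes the shapes explicit:

```python
import numpy as np


def ones_init_tensors(max_vocab_size_per_gpu, embedding_vec_size, num):
    """Hypothetical stand-in: one all-ones [vocab, vec] array per GPU."""
    return [np.ones((max_vocab_size_per_gpu, embedding_vec_size), dtype=np.float32)
            for _ in range(num)]


init = ones_init_tensors(max_vocab_size_per_gpu=1024, embedding_vec_size=4, num=8)
# TfDemo concatenates the per-GPU pieces into one table along axis 0.
full_table = np.concatenate(init, axis=0)

print(len(init), init[0].shape, full_table.shape)  # 8 (1024, 4) (8192, 4)
```

The concatenated table matches what `TfDemo` builds with `tf.concat(self.init_tensors, axis=0)`, while the SOK model receives the list as-is through `load_embedding_values`.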

Sixthly, define the training loops for TensorFlow and SOK.

In [10]:
def test_tf_demo(args, init_tensors, *random_samples):
    dataset = utils.tf_dataset(*random_samples, batchsize=args["global_batch_size"], to_sparse_tensor=True, repeat=1)

    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

    tf_demo = TfDemo(init_tensors, args["combiner"], args["global_batch_size"], 
                     args["slot_num"], args["embedding_vec_size"])

    optimizer = utils.get_dense_optimizer(args["optimizer"])(learning_rate=0.1)

    @tf.function
    def _train_step(inputs, labels):
        with tf.GradientTape() as tape:
            logit, embedding_vector = tf_demo(inputs, training=True)
            loss = loss_fn(labels, logit)
        grads = tape.gradient(loss, tf_demo.trainable_variables)
        optimizer.apply_gradients(zip(grads, tf_demo.trainable_variables))
        return logit, embedding_vector

    tf_results = list()

    for i, (sparse_tensors, labels) in enumerate(dataset):
        print("-"*30, str(i), "-"*30)
        logit, embedding_vector = _train_step(sparse_tensors, labels)
        print("[INFO]: embedding_vector:\n", embedding_vector)
        tf_results.append(embedding_vector)

        # FIXME: the SOK demo sleeps after every step, so the same sleep is
        # added here only to mirror the structure of that loop.
        import time
        time.sleep(0.2) # seconds

    return tf_results

In [11]:
def test_sok_demo(args, init_tensors, *random_samples):
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        result = sok.Init(global_batch_size=args["global_batch_size"])

        plugin_demo = SOKDemo(combiner=args["combiner"], 
                                 max_vocabulary_size_per_gpu=args["max_vocabulary_size_per_gpu"],
                                 slot_num=args["slot_num"], max_nnz=args["max_nnz"],
                                 embedding_vec_size=args["embedding_vec_size"])

        emb_opt = utils.get_embedding_optimizer(args["optimizer"])(learning_rate=0.1)
        dense_opt = utils.get_dense_optimizer(args["optimizer"])(learning_rate=0.1)

    plugin_saver = sok.Saver()

    plugin_saver.load_embedding_values(plugin_demo.embedding_layer.embedding_variable, init_tensors)

    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
    def _replica_loss(labels, logits):
        loss = loss_fn(labels, logits)
        return tf.nn.compute_average_loss(loss, global_batch_size=args["global_batch_size"])

    @tf.function
    def _train_step(inputs, labels):
        with tf.GradientTape() as tape:
            logit, embedding_vector = plugin_demo(inputs, training=True)
            loss = _replica_loss(labels, logit)
        embedding_variables, other_variable = sok.split_embedding_variable_from_others(plugin_demo.trainable_variables)
        grads, emb_grads = tape.gradient(loss, [other_variable, embedding_variables])
        if 'plugin' not in args["optimizer"]:
            with sok.OptimizerScope(embedding_variables):
                emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
                                        experimental_aggregate_gradients=False)
        else:
            emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
                                    experimental_aggregate_gradients=False)
        dense_opt.apply_gradients(zip(grads, other_variable))
        return logit, embedding_vector

    sok_results = list()

    def _dataset_fn(input_context):
        replica_batch_size = input_context.get_per_replica_batch_size(args["global_batch_size"])
        dataset = utils.tf_dataset(*random_samples, batchsize=replica_batch_size, to_sparse_tensor=True, repeat=1)
        dataset = dataset.shard(input_context.num_input_pipelines, input_context.input_pipeline_id)
        return dataset

    dataset = strategy.distribute_datasets_from_function(_dataset_fn)
    
    for i, (sparse_tensors, replica_labels) in enumerate(dataset):
        print("-" * 30, "step ", str(i), "-" * 30)
        logit, embedding_vector = strategy.run(_train_step, args=(sparse_tensors, replica_labels))
        print("[INFO]: embedding_vector\n", embedding_vector)
        sok_results.append(embedding_vector)

        # FIXME: when the forward computation is too fast, there may be
        # conflicts with the data reader, which cause the program to hang.
        import time
        time.sleep(0.2) # seconds

    return sok_results
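
The SOK loop computes an unreduced loss and scales it with `tf.nn.compute_average_loss`, which divides the summed per-example loss by the global batch size. The NumPy sketch below, with made-up labels and logits, checks that adding up these scaled per-replica losses gives the same number as the mean over the whole global batch, which is what the single-process TensorFlow loop optimizes.

```python
import numpy as np


def bce_from_logits(labels, logits):
    """Numerically stable per-example binary cross-entropy on logits."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))


rng = np.random.default_rng(0)
global_batch_size, num_replicas = 16, 4
labels = rng.integers(0, 2, size=global_batch_size).astype(np.float64)
logits = rng.normal(size=global_batch_size)

per_example = bce_from_logits(labels, logits)
global_mean = per_example.mean()

# Each replica sums its own shard and divides by the GLOBAL batch size.
replica_losses = [shard.sum() / global_batch_size
                  for shard in np.split(per_example, num_replicas)]

print(np.isclose(sum(replica_losses), global_mean))  # True
```

Dividing by the per-replica batch size instead would make the summed gradients larger by a factor equal to the number of replicas.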

Seventhly, start the training process and check whether the embedding vectors obtained from TensorFlow and SOK are consistent in all iterations.
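
Once both runs have finished, one possible way to do the comparison is to gather the per-replica SOK outputs and compare them to the TensorFlow output within a tolerance. The sketch below uses NumPy arrays as stand-ins; `results_match` is a hypothetical helper, the tolerances are illustrative, and it assumes that concatenating the replicas in order reproduces the global batch order. With real results, the per-replica tensors could be obtained with `strategy.experimental_local_results`.

```python
import numpy as np


def results_match(reference, per_replica, atol=1e-4, rtol=1e-4):
    """Compare a [global_batch, width] array with a list of per-replica shards."""
    gathered = np.concatenate(per_replica, axis=0)
    return gathered.shape == reference.shape and np.allclose(
        reference, gathered, atol=atol, rtol=rtol)


# Stand-in data using two values printed for iteration 16 in this notebook:
# -0.22411308 from the TensorFlow run and -0.22410838 from SOK replica 0.
reference = np.full((8, 40), -0.22411308, dtype=np.float32)
shards = [np.full((2, 40), -0.22410838, dtype=np.float32) for _ in range(4)]

print(results_match(reference, shards))  # True
```

Small differences of this size are expected in `float32` arithmetic, so an exact equality test would be too strict.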

In [12]:
# train the TensorFlow demo model; this prints the embedding vector of each iteration
tf_results = test_tf_demo(args, init_tensors, *random_samples)

2021-12-07 04:48:26.388375: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-12-07 04:48:26.388450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13631 MB memory:  -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2021-12-07 04:48:26.390576: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-12-07 04:48:26.390609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14631 MB memory:  -> device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:07:00.0, compute capability: 7.0
2021-12-07 04:48:26.392713: W tensorflow/cor

------------------------------ 0 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]], shape=(65536, 40), dtype=float32)
------------------------------ 1 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[0.9005389  0.9005389  0.9005389  ... 0.9004773  0.9004773  0.9004773 ]
 [0.9004776  0.9004776  0.9004776  ... 0.90067196 0.90067196 0.90067196]
 [0.90046406 0.90046406 0.90046406 ... 0.9004576  0.9004576  0.9004576 ]
 ...
 [0.9005024  0.9005024  0.9005024  ... 0.9005311  0.9005311  0.9005311 ]
 [0.9005841  0.9005841  0.9005841  ... 0.90050334 0.90050334 0.90050334]
 [0.90044    0.90044    0.90044    ... 0.90049696 0.90049696 0.90049696]], shape=(65536, 40), dtype=float32)
------------------------------ 2 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[0.8012184  0.8012184  

------------------------------ 15 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.15322745 -0.15322745 -0.15322745 ... -0.17870769 -0.17870769
  -0.17870769]
 [-0.15844432 -0.15844432 -0.15844432 ... -0.17112882 -0.17112882
  -0.17112882]
 [-0.14272878 -0.14272878 -0.14272878 ... -0.18848768 -0.18848768
  -0.18848768]
 ...
 [-0.17903242 -0.17903242 -0.17903242 ... -0.21368095 -0.21368095
  -0.21368095]
 [-0.18603516 -0.18603516 -0.18603516 ... -0.18102147 -0.18102147
  -0.18102147]
 [-0.16306126 -0.16306126 -0.16306126 ... -0.18238421 -0.18238421
  -0.18238421]], shape=(65536, 40), dtype=float32)
------------------------------ 16 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.22411308 -0.22411308 -0.22411308 ... -0.19928825 -0.19928825
  -0.19928825]
 [-0.20470376 -0.20470376 -0.20470376 ... -0.19559422 -0.19559422
  -0.19559422]
 [-0.19532926 -0.19532926 -0.19532926 ... -0.22929913 -0.22929913
  -0.22929913]
 ...
 [-0.21242538 -0.21

------------------------------ 29 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.15862423 -0.15862423 -0.15862423 ... -0.17314598 -0.17314598
  -0.17314598]
 [-0.1752002  -0.1752002  -0.1752002  ... -0.17943908 -0.17943908
  -0.17943908]
 [-0.20004535 -0.20004535 -0.20004535 ... -0.15819792 -0.15819792
  -0.15819792]
 ...
 [-0.17449039 -0.17449039 -0.17449039 ... -0.17067263 -0.17067263
  -0.17067263]
 [-0.19266994 -0.19266994 -0.19266994 ... -0.17156011 -0.17156011
  -0.17156011]
 [-0.20928861 -0.20928861 -0.20928861 ... -0.13899939 -0.13899939
  -0.13899939]], shape=(65536, 40), dtype=float32)
------------------------------ 30 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.17543076 -0.17543076 -0.17543076 ... -0.1694403  -0.1694403
  -0.1694403 ]
 [-0.17624065 -0.17624065 -0.17624065 ... -0.19092642 -0.19092642
  -0.19092642]
 [-0.17132854 -0.17132854 -0.17132854 ... -0.15731235 -0.15731235
  -0.15731235]
 ...
 [-0.15872258 -0.158

------------------------------ 43 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.26315174 -0.26315174 -0.26315174 ... -0.20632924 -0.20632924
  -0.20632924]
 [-0.19329208 -0.19329208 -0.19329208 ... -0.20625876 -0.20625876
  -0.20625876]
 [-0.23012948 -0.23012948 -0.23012948 ... -0.24424371 -0.24424371
  -0.24424371]
 ...
 [-0.2262593  -0.2262593  -0.2262593  ... -0.22234231 -0.22234231
  -0.22234231]
 [-0.20353846 -0.20353846 -0.20353846 ... -0.20778775 -0.20778775
  -0.20778775]
 [-0.19702505 -0.19702505 -0.19702505 ... -0.21742004 -0.21742004
  -0.21742004]], shape=(65536, 40), dtype=float32)
------------------------------ 44 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.20941475 -0.20941475 -0.20941475 ... -0.22879735 -0.22879735
  -0.22879735]
 [-0.21896069 -0.21896069 -0.21896069 ... -0.23172814 -0.23172814
  -0.23172814]
 [-0.21667118 -0.21667118 -0.21667118 ... -0.21302357 -0.21302357
  -0.21302357]
 ...
 [-0.20077366 -0.20

In [13]:
# train the SOK demo model; this prints the embedding vector of each iteration
sok_results = test_sok_demo(args, init_tensors, *random_samples)

INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3', '/job:localhost/replica:0/task:0/device:GPU:4', '/job:localhost/replica:0/task:0/device:GPU:5', '/job:localhost/replica:0/task:0/device:GPU:6', '/job:localhost/replica:0/task:0/device:GPU:7')
You are using the plugin with MirroredStrategy.
2021-12-07 04:48:48.852528: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:99] Mapping from local_replica_id to device_id:
2021-12-07 04:48:48.852528: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 0 -> 0
2021-12-07 04:48:48.852528: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 1 -> 1
2021-12-07 04:48:48.852528: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 2 -> 2
2021-12-07 04:48:48.852528: I sparse_operation_

------------------------------ step  0 ------------------------------
INFO:tensorflow:batch_all_reduce: 2 all-reduces with algorithm = nccl, num_packs = 1
INFO:tensorflow:batch_all_reduce: 2 all-reduces with algorithm = nccl, num_packs = 1
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]], shape=(8192, 40), dtype=float32),
  2: tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]], shape=(8192, 40), dtype=float32),
  3: tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1.

------------------------------ step  3 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[0.70422626 0.70422626 0.70422626 ... 0.7052028  0.7052028  0.7052028 ]
 [0.7036482  0.7036482  0.7036482  ... 0.7032463  0.7032463  0.7032463 ]
 [0.701295   0.701295   0.701295   ... 0.7022344  0.7022344  0.7022344 ]
 ...
 [0.7025738  0.7025738  0.7025738  ... 0.7037034  0.7037034  0.7037034 ]
 [0.7022596  0.7022596  0.7022596  ... 0.7078545  0.7078545  0.7078545 ]
 [0.70445216 0.70445216 0.70445216 ... 0.704602   0.704602   0.704602  ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.7023215  0.7023215  0.7023215  ... 0.70324075 0.70324075 0.70324075]
 [0.702116   0.702116   0.702116   ... 0.7016717  0.7016717  0.7016717 ]
 [0.7035055  0.7035055  0.7035055  ... 0.7026157  0.7026157  0.7026157 ]
 ...
 [0.7045913  0.7045913  0.7045913  ... 0.70173776 0.70173776 0.70173776]
 [0.702466   0.702466   0.702466   ... 0.7023295  0.7023295  0.7023295 ]
 [0.702456

------------------------------ step  6 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[0.42153543 0.42153543 0.42153543 ... 0.42519388 0.42519388 0.42519388]
 [0.4170929  0.4170929  0.4170929  ... 0.41547838 0.41547838 0.41547838]
 [0.4118647  0.4118647  0.4118647  ... 0.4137937  0.4137937  0.4137937 ]
 ...
 [0.41602445 0.41602445 0.41602445 ... 0.4223327  0.4223327  0.4223327 ]
 [0.41451913 0.41451913 0.41451913 ... 0.4199844  0.4199844  0.4199844 ]
 [0.41644222 0.41644222 0.41644222 ... 0.41922688 0.41922688 0.41922688]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.4137619  0.4137619  0.4137619  ... 0.41629443 0.41629443 0.41629443]
 [0.4158877  0.4158877  0.4158877  ... 0.42044166 0.42044166 0.42044166]
 [0.4148597  0.4148597  0.4148597  ... 0.41802114 0.41802114 0.41802114]
 ...
 [0.41918486 0.41918486 0.41918486 ... 0.41373587 0.41373587 0.41373587]
 [0.41620752 0.41620752 0.41620752 ... 0.420725   0.420725   0.420725  ]
 [0.422563

------------------------------ step  9 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[0.16157222 0.16157222 0.16157222 ... 0.16512923 0.16512923 0.16512923]
 [0.17077649 0.17077649 0.17077649 ... 0.16518483 0.16518483 0.16518483]
 [0.16769904 0.16769904 0.16769904 ... 0.1654256  0.1654256  0.1654256 ]
 ...
 [0.16069056 0.16069056 0.16069056 ... 0.16277106 0.16277106 0.16277106]
 [0.16398567 0.16398567 0.16398567 ... 0.17398588 0.17398588 0.17398588]
 [0.15237144 0.15237144 0.15237144 ... 0.16489398 0.16489398 0.16489398]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.18537274 0.18537274 0.18537274 ... 0.17068884 0.17068884 0.17068884]
 [0.16184442 0.16184442 0.16184442 ... 0.16080418 0.16080418 0.16080418]
 [0.16586392 0.16586392 0.16586392 ... 0.1633567  0.1633567  0.1633567 ]
 ...
 [0.14759749 0.14759749 0.14759749 ... 0.1542436  0.1542436  0.1542436 ]
 [0.16890895 0.16890895 0.16890895 ... 0.15605208 0.15605208 0.15605208]
 [0.155914

------------------------------ step  12 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.02505599 -0.02505599 -0.02505599 ... -0.02554284 -0.02554284
  -0.02554284]
 [-0.02333529 -0.02333529 -0.02333529 ... -0.03851294 -0.03851294
  -0.03851294]
 [-0.03780506 -0.03780506 -0.03780506 ... -0.03587885 -0.03587885
  -0.03587885]
 ...
 [-0.02959952 -0.02959952 -0.02959952 ... -0.02874575 -0.02874575
  -0.02874575]
 [-0.03088887 -0.03088887 -0.03088887 ... -0.02098713 -0.02098713
  -0.02098713]
 [-0.03072194 -0.03072194 -0.03072194 ... -0.02162267 -0.02162267
  -0.02162267]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.03369994 -0.03369994 -0.03369994 ... -0.02528584 -0.02528584
  -0.02528584]
 [-0.03591504 -0.03591504 -0.03591504 ... -0.00925754 -0.00925754
  -0.00925754]
 [-0.02865777 -0.02865777 -0.02865777 ... -0.03381521 -0.03381521
  -0.03381521]
 ...
 [-0.03260084 -0.03260084 -0.03260084 ... -0.0396219  -0.0396219
  -0.0396219 ]
 [-

------------------------------ step  14 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.13518184 -0.13518184 -0.13518184 ... -0.12721187 -0.12721187
  -0.12721187]
 [-0.11077808 -0.11077808 -0.11077808 ... -0.12843534 -0.12843534
  -0.12843534]
 [-0.10311799 -0.10311799 -0.10311799 ... -0.13690932 -0.13690932
  -0.13690932]
 ...
 [-0.13113362 -0.13113362 -0.13113362 ... -0.13058433 -0.13058433
  -0.13058433]
 [-0.13294964 -0.13294964 -0.13294964 ... -0.14016397 -0.14016397
  -0.14016397]
 [-0.12572125 -0.12572125 -0.12572125 ... -0.13133661 -0.13133661
  -0.13133661]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.1087272  -0.1087272  -0.1087272  ... -0.14244205 -0.14244205
  -0.14244205]
 [-0.13119288 -0.13119288 -0.13119288 ... -0.13623011 -0.13623011
  -0.13623011]
 [-0.1257813  -0.1257813  -0.1257813  ... -0.1191218  -0.1191218
  -0.1191218 ]
 ...
 [-0.1338479  -0.1338479  -0.1338479  ... -0.15440923 -0.15440923
  -0.15440923]
 [-

------------------------------ step  16 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.22410838 -0.22410838 -0.22410838 ... -0.19928336 -0.19928336
  -0.19928336]
 [-0.20469892 -0.20469892 -0.20469892 ... -0.19558935 -0.19558935
  -0.19558935]
 [-0.19532439 -0.19532439 -0.19532439 ... -0.22929418 -0.22929418
  -0.22929418]
 ...
 [-0.2157424  -0.2157424  -0.2157424  ... -0.19569314 -0.19569314
  -0.19569314]
 [-0.21398807 -0.21398807 -0.21398807 ... -0.2135714  -0.2135714
  -0.2135714 ]
 [-0.19648087 -0.19648087 -0.19648087 ... -0.23419078 -0.23419078
  -0.23419078]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.20335314 -0.20335314 -0.20335314 ... -0.21006052 -0.21006052
  -0.21006052]
 [-0.21110621 -0.21110621 -0.21110621 ... -0.24195579 -0.24195579
  -0.24195579]
 [-0.21984792 -0.21984792 -0.21984792 ... -0.21060213 -0.21060213
  -0.21060213]
 ...
 [-0.21619621 -0.21619621 -0.21619621 ... -0.21057378 -0.21057378
  -0.21057378]
 [-

------------------------------ step  18 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.28225255 -0.28225255 -0.28225255 ... -0.2652623  -0.2652623
  -0.2652623 ]
 [-0.29595184 -0.29595184 -0.29595184 ... -0.24435014 -0.24435014
  -0.24435014]
 [-0.2778905  -0.2778905  -0.2778905  ... -0.28574687 -0.28574687
  -0.28574687]
 ...
 [-0.24807943 -0.24807943 -0.24807943 ... -0.27287102 -0.27287102
  -0.27287102]
 [-0.26325783 -0.26325783 -0.26325783 ... -0.28209847 -0.28209847
  -0.28209847]
 [-0.28699368 -0.28699368 -0.28699368 ... -0.26445645 -0.26445645
  -0.26445645]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.25206006 -0.25206006 -0.25206006 ... -0.27569014 -0.27569014
  -0.27569014]
 [-0.2672004  -0.2672004  -0.2672004  ... -0.23190486 -0.23190486
  -0.23190486]
 [-0.25691646 -0.25691646 -0.25691646 ... -0.26783794 -0.26783794
  -0.26783794]
 ...
 [-0.26268327 -0.26268327 -0.26268327 ... -0.27244312 -0.27244312
  -0.27244312]
 [-

------------------------------ step  20 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.30255    -0.30255    -0.30255    ... -0.28497857 -0.28497857
  -0.28497857]
 [-0.29824895 -0.29824895 -0.29824895 ... -0.3098632  -0.3098632
  -0.3098632 ]
 [-0.29218602 -0.29218602 -0.29218602 ... -0.30056703 -0.30056703
  -0.30056703]
 ...
 [-0.29464787 -0.29464787 -0.29464787 ... -0.2930225  -0.2930225
  -0.2930225 ]
 [-0.31403974 -0.31403974 -0.31403974 ... -0.2974615  -0.2974615
  -0.2974615 ]
 [-0.31429642 -0.31429642 -0.31429642 ... -0.3054897  -0.3054897
  -0.3054897 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.35771415 -0.35771415 -0.35771415 ... -0.31378573 -0.31378573
  -0.31378573]
 [-0.3228782  -0.3228782  -0.3228782  ... -0.31570673 -0.31570673
  -0.31570673]
 [-0.2947835  -0.2947835  -0.2947835  ... -0.28730863 -0.28730863
  -0.28730863]
 ...
 [-0.30530882 -0.30530882 -0.30530882 ... -0.29983908 -0.29983908
  -0.29983908]
 [-0.2

------------------------------ step  22 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.3002702  -0.3002702  -0.3002702  ... -0.32113218 -0.32113218
  -0.32113218]
 [-0.31445616 -0.31445616 -0.31445616 ... -0.31821615 -0.31821615
  -0.31821615]
 [-0.28149    -0.28149    -0.28149    ... -0.3089064  -0.3089064
  -0.3089064 ]
 ...
 [-0.2978853  -0.2978853  -0.2978853  ... -0.28884187 -0.28884187
  -0.28884187]
 [-0.3159441  -0.3159441  -0.3159441  ... -0.299038   -0.299038
  -0.299038  ]
 [-0.31383234 -0.31383234 -0.31383234 ... -0.28291753 -0.28291753
  -0.28291753]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.30163795 -0.30163795 -0.30163795 ... -0.30443287 -0.30443287
  -0.30443287]
 [-0.30127174 -0.30127174 -0.30127174 ... -0.29014465 -0.29014465
  -0.29014465]
 [-0.30801424 -0.30801424 -0.30801424 ... -0.3141367  -0.3141367
  -0.3141367 ]
 ...
 [-0.31994525 -0.31994525 -0.31994525 ... -0.30602917 -0.30602917
  -0.30602917]
 [-0.3

------------------------------ step  24 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.26861715 -0.26861715 -0.26861715 ... -0.2935297  -0.2935297
  -0.2935297 ]
 [-0.27740425 -0.27740425 -0.27740425 ... -0.257448   -0.257448
  -0.257448  ]
 [-0.2925169  -0.2925169  -0.2925169  ... -0.30540457 -0.30540457
  -0.30540457]
 ...
 [-0.2813642  -0.2813642  -0.2813642  ... -0.27291226 -0.27291226
  -0.27291226]
 [-0.2631308  -0.2631308  -0.2631308  ... -0.29082248 -0.29082248
  -0.29082248]
 [-0.25941348 -0.25941348 -0.25941348 ... -0.2675142  -0.2675142
  -0.2675142 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.27076694 -0.27076694 -0.27076694 ... -0.28881466 -0.28881466
  -0.28881466]
 [-0.26801455 -0.26801455 -0.26801455 ... -0.273818   -0.273818
  -0.273818  ]
 [-0.29190683 -0.29190683 -0.29190683 ... -0.25628245 -0.25628245
  -0.25628245]
 ...
 [-0.27664906 -0.27664906 -0.27664906 ... -0.27869886 -0.27869886
  -0.27869886]
 [-0.286

------------------------------ step  26 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.21730551 -0.21730551 -0.21730551 ... -0.23798622 -0.23798622
  -0.23798622]
 [-0.19460799 -0.19460799 -0.19460799 ... -0.24834624 -0.24834624
  -0.24834624]
 [-0.2510563  -0.2510563  -0.2510563  ... -0.22299616 -0.22299616
  -0.22299616]
 ...
 [-0.26554498 -0.26554498 -0.26554498 ... -0.21597014 -0.21597014
  -0.21597014]
 [-0.24453472 -0.24453472 -0.24453472 ... -0.25932938 -0.25932938
  -0.25932938]
 [-0.2257804  -0.2257804  -0.2257804  ... -0.24375796 -0.24375796
  -0.24375796]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.28554806 -0.28554806 -0.28554806 ... -0.24119747 -0.24119747
  -0.24119747]
 [-0.22624932 -0.22624932 -0.22624932 ... -0.21133211 -0.21133211
  -0.21133211]
 [-0.23551771 -0.23551771 -0.23551771 ... -0.22453736 -0.22453736
  -0.22453736]
 ...
 [-0.235907   -0.235907   -0.235907   ... -0.24049407 -0.24049407
  -0.24049407]
 [

------------------------------ step  28 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.1896804  -0.1896804  -0.1896804  ... -0.17670514 -0.17670514
  -0.17670514]
 [-0.21344793 -0.21344793 -0.21344793 ... -0.1897239  -0.1897239
  -0.1897239 ]
 [-0.19733383 -0.19733383 -0.19733383 ... -0.17628865 -0.17628865
  -0.17628865]
 ...
 [-0.199254   -0.199254   -0.199254   ... -0.19903034 -0.19903034
  -0.19903034]
 [-0.16208838 -0.16208838 -0.16208838 ... -0.19566785 -0.19566785
  -0.19566785]
 [-0.18742242 -0.18742242 -0.18742242 ... -0.18132702 -0.18132702
  -0.18132702]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.18553065 -0.18553065 -0.18553065 ... -0.20956054 -0.20956054
  -0.20956054]
 [-0.20415564 -0.20415564 -0.20415564 ... -0.20312211 -0.20312211
  -0.20312211]
 [-0.2039993  -0.2039993  -0.2039993  ... -0.18802145 -0.18802145
  -0.18802145]
 ...
 [-0.17797412 -0.17797412 -0.17797412 ... -0.19347072 -0.19347072
  -0.19347072]
 [-

------------------------------ step  30 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.17542297 -0.17542297 -0.17542297 ... -0.16943279 -0.16943279
  -0.16943279]
 [-0.17623335 -0.17623335 -0.17623335 ... -0.19091883 -0.19091883
  -0.19091883]
 [-0.17132157 -0.17132157 -0.17132157 ... -0.15730482 -0.15730482
  -0.15730482]
 ...
 [-0.14190184 -0.14190184 -0.14190184 ... -0.15729855 -0.15729855
  -0.15729855]
 [-0.17775045 -0.17775045 -0.17775045 ... -0.15107052 -0.15107052
  -0.15107052]
 [-0.18589711 -0.18589711 -0.18589711 ... -0.19471425 -0.19471425
  -0.19471425]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.17151028 -0.17151028 -0.17151028 ... -0.14649442 -0.14649442
  -0.14649442]
 [-0.16956015 -0.16956015 -0.16956015 ... -0.17339367 -0.17339367
  -0.17339367]
 [-0.18287492 -0.18287492 -0.18287492 ... -0.1673139  -0.1673139
  -0.1673139 ]
 ...
 [-0.20457551 -0.20457551 -0.20457551 ... -0.14892933 -0.14892933
  -0.14892933]
 [-

------------------------------ step  32 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.1515274  -0.1515274  -0.1515274  ... -0.14542866 -0.14542866
  -0.14542866]
 [-0.11194907 -0.11194907 -0.11194907 ... -0.15331486 -0.15331486
  -0.15331486]
 [-0.13846326 -0.13846326 -0.13846326 ... -0.15362307 -0.15362307
  -0.15362307]
 ...
 [-0.16697419 -0.16697419 -0.16697419 ... -0.13752745 -0.13752745
  -0.13752745]
 [-0.14540687 -0.14540687 -0.14540687 ... -0.15984985 -0.15984985
  -0.15984985]
 [-0.14668098 -0.14668098 -0.14668098 ... -0.12983885 -0.12983885
  -0.12983885]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.12333038 -0.12333038 -0.12333038 ... -0.17307444 -0.17307444
  -0.17307444]
 [-0.1086438  -0.1086438  -0.1086438  ... -0.13135241 -0.13135241
  -0.13135241]
 [-0.16049421 -0.16049421 -0.16049421 ... -0.15183878 -0.15183878
  -0.15183878]
 ...
 [-0.13899213 -0.13899213 -0.13899213 ... -0.1536586  -0.1536586
  -0.1536586 ]
 [-

------------------------------ step  34 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.13053373 -0.13053373 -0.13053373 ... -0.16829923 -0.16829923
  -0.16829923]
 [-0.12267791 -0.12267791 -0.12267791 ... -0.13874787 -0.13874787
  -0.13874787]
 [-0.15151384 -0.15151384 -0.15151384 ... -0.09777969 -0.09777969
  -0.09777969]
 ...
 [-0.1267448  -0.1267448  -0.1267448  ... -0.10234334 -0.10234334
  -0.10234334]
 [-0.16809106 -0.16809106 -0.16809106 ... -0.16737422 -0.16737422
  -0.16737422]
 [-0.14517534 -0.14517534 -0.14517534 ... -0.1307665  -0.1307665
  -0.1307665 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.08599437 -0.08599437 -0.08599437 ... -0.1546773  -0.1546773
  -0.1546773 ]
 [-0.13887423 -0.13887423 -0.13887423 ... -0.20112933 -0.20112933
  -0.20112933]
 [-0.16667657 -0.16667657 -0.16667657 ... -0.1563475  -0.1563475
  -0.1563475 ]
 ...
 [-0.09819798 -0.09819798 -0.09819798 ... -0.14136678 -0.14136678
  -0.14136678]
 [-0.

------------------------------ step  36 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.06405328 -0.06405328 -0.06405328 ... -0.13683882 -0.13683882
  -0.13683882]
 [-0.12286855 -0.12286855 -0.12286855 ... -0.10786526 -0.10786526
  -0.10786526]
 [-0.14316519 -0.14316519 -0.14316519 ... -0.12777558 -0.12777558
  -0.12777558]
 ...
 [-0.1289829  -0.1289829  -0.1289829  ... -0.12792471 -0.12792471
  -0.12792471]
 [-0.15074895 -0.15074895 -0.15074895 ... -0.1170946  -0.1170946
  -0.1170946 ]
 [-0.1457327  -0.1457327  -0.1457327  ... -0.13504033 -0.13504033
  -0.13504033]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.11454844 -0.11454844 -0.11454844 ... -0.14470962 -0.14470962
  -0.14470962]
 [-0.08909258 -0.08909258 -0.08909258 ... -0.17917094 -0.17917094
  -0.17917094]
 [-0.13585643 -0.13585643 -0.13585643 ... -0.16493043 -0.16493043
  -0.16493043]
 ...
 [-0.16052023 -0.16052023 -0.16052023 ... -0.12015436 -0.12015436
  -0.12015436]
 [-

------------------------------ step  38 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.10463198 -0.10463198 -0.10463198 ... -0.16859077 -0.16859077
  -0.16859077]
 [-0.18326606 -0.18326606 -0.18326606 ... -0.153182   -0.153182
  -0.153182  ]
 [-0.13773112 -0.13773112 -0.13773112 ... -0.15307528 -0.15307528
  -0.15307528]
 ...
 [-0.20615347 -0.20615347 -0.20615347 ... -0.172165   -0.172165
  -0.172165  ]
 [-0.17421284 -0.17421284 -0.17421284 ... -0.17899726 -0.17899726
  -0.17899726]
 [-0.12812185 -0.12812185 -0.12812185 ... -0.13657543 -0.13657543
  -0.13657543]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.17891292 -0.17891292 -0.17891292 ... -0.14485744 -0.14485744
  -0.14485744]
 [-0.15008801 -0.15008801 -0.15008801 ... -0.17702065 -0.17702065
  -0.17702065]
 [-0.1377445  -0.1377445  -0.1377445  ... -0.15741232 -0.15741232
  -0.15741232]
 ...
 [-0.13904417 -0.13904417 -0.13904417 ... -0.15257436 -0.15257436
  -0.15257436]
 [-0.1

------------------------------ step  40 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.16127078 -0.16127078 -0.16127078 ... -0.18304515 -0.18304515
  -0.18304515]
 [-0.1922871  -0.1922871  -0.1922871  ... -0.14632958 -0.14632958
  -0.14632958]
 [-0.1974465  -0.1974465  -0.1974465  ... -0.15940674 -0.15940674
  -0.15940674]
 ...
 [-0.17839476 -0.17839476 -0.17839476 ... -0.23484176 -0.23484176
  -0.23484176]
 [-0.1537202  -0.1537202  -0.1537202  ... -0.17789915 -0.17789915
  -0.17789915]
 [-0.16126381 -0.16126381 -0.16126381 ... -0.19991472 -0.19991472
  -0.19991472]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.15530169 -0.15530169 -0.15530169 ... -0.15739058 -0.15739058
  -0.15739058]
 [-0.18126467 -0.18126467 -0.18126467 ... -0.1831001  -0.1831001
  -0.1831001 ]
 [-0.16429624 -0.16429624 -0.16429624 ... -0.1760344  -0.1760344
  -0.1760344 ]
 ...
 [-0.15476167 -0.15476167 -0.15476167 ... -0.17273518 -0.17273518
  -0.17273518]
 [-0

------------------------------ step  42 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.20266193 -0.20266193 -0.20266193 ... -0.17308879 -0.17308879
  -0.17308879]
 [-0.16007192 -0.16007192 -0.16007192 ... -0.20874861 -0.20874861
  -0.20874861]
 [-0.22338642 -0.22338642 -0.22338642 ... -0.19393001 -0.19393001
  -0.19393001]
 ...
 [-0.18364424 -0.18364424 -0.18364424 ... -0.21327041 -0.21327041
  -0.21327041]
 [-0.19914481 -0.19914481 -0.19914481 ... -0.19567479 -0.19567479
  -0.19567479]
 [-0.26409876 -0.26409876 -0.26409876 ... -0.20579208 -0.20579208
  -0.20579208]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.20218132 -0.20218132 -0.20218132 ... -0.21695873 -0.21695873
  -0.21695873]
 [-0.20778793 -0.20778793 -0.20778793 ... -0.20841631 -0.20841631
  -0.20841631]
 [-0.18760791 -0.18760791 -0.18760791 ... -0.2056439  -0.2056439
  -0.2056439 ]
 ...
 [-0.23459375 -0.23459375 -0.23459375 ... -0.19046655 -0.19046655
  -0.19046655]
 [-

------------------------------ step  44 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.20941171 -0.20941171 -0.20941171 ... -0.22879487 -0.22879487
  -0.22879487]
 [-0.21895784 -0.21895784 -0.21895784 ... -0.23172522 -0.23172522
  -0.23172522]
 [-0.21666765 -0.21666765 -0.21666765 ... -0.21301958 -0.21301958
  -0.21301958]
 ...
 [-0.21469995 -0.21469995 -0.21469995 ... -0.21231383 -0.21231383
  -0.21231383]
 [-0.21488464 -0.21488464 -0.21488464 ... -0.12033444 -0.12033444
  -0.12033444]
 [-0.20021415 -0.20021415 -0.20021415 ... -0.22670025 -0.22670025
  -0.22670025]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.23484477 -0.23484477 -0.23484477 ... -0.18834177 -0.18834177
  -0.18834177]
 [-0.21303031 -0.21303031 -0.21303031 ... -0.22244355 -0.22244355
  -0.22244355]
 [-0.2199068  -0.2199068  -0.2199068  ... -0.22984855 -0.22984855
  -0.22984855]
 ...
 [-0.20269692 -0.20269692 -0.20269692 ... -0.19387221 -0.19387221
  -0.19387221]
 [

------------------------------ step  46 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.23491383 -0.23491383 -0.23491383 ... -0.24627867 -0.24627867
  -0.24627867]
 [-0.23250745 -0.23250745 -0.23250745 ... -0.21174543 -0.21174543
  -0.21174543]
 [-0.20785967 -0.20785967 -0.20785967 ... -0.2619608  -0.2619608
  -0.2619608 ]
 ...
 [-0.24293315 -0.24293315 -0.24293315 ... -0.25826    -0.25826
  -0.25826   ]
 [-0.23931171 -0.23931171 -0.23931171 ... -0.20098595 -0.20098595
  -0.20098595]
 [-0.22243682 -0.22243682 -0.22243682 ... -0.24403623 -0.24403623
  -0.24403623]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.2253752  -0.2253752  -0.2253752  ... -0.23098066 -0.23098066
  -0.23098066]
 [-0.2145882  -0.2145882  -0.2145882  ... -0.22161697 -0.22161697
  -0.22161697]
 [-0.24860863 -0.24860863 -0.24860863 ... -0.23111928 -0.23111928
  -0.23111928]
 ...
 [-0.22765952 -0.22765952 -0.22765952 ... -0.24573767 -0.24573767
  -0.24573767]
 [-0.2

------------------------------ step  48 ------------------------------
[INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[-0.24711709 -0.24711709 -0.24711709 ... -0.28307924 -0.28307924
  -0.28307924]
 [-0.2479815  -0.2479815  -0.2479815  ... -0.2596137  -0.2596137
  -0.2596137 ]
 [-0.23282254 -0.23282254 -0.23282254 ... -0.24117479 -0.24117479
  -0.24117479]
 ...
 [-0.27805972 -0.27805972 -0.27805972 ... -0.24752554 -0.24752554
  -0.24752554]
 [-0.2415568  -0.2415568  -0.2415568  ... -0.2516385  -0.2516385
  -0.2516385 ]
 [-0.2403656  -0.2403656  -0.2403656  ... -0.24829827 -0.24829827
  -0.24829827]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.24093132 -0.24093132 -0.24093132 ... -0.2616806  -0.2616806
  -0.2616806 ]
 [-0.2341744  -0.2341744  -0.2341744  ... -0.23874548 -0.23874548
  -0.23874548]
 [-0.23052762 -0.23052762 -0.23052762 ... -0.23824754 -0.23824754
  -0.23824754]
 ...
 [-0.2503054  -0.2503054  -0.2503054  ... -0.2665841  -0.2665841
  -0.2665841 ]
 [-0.2

Finally, check the consistency of the embedding vectors obtained from TensorFlow and SOK.

In [14]:
if (len(sok_results) != len(tf_results)):
    raise ValueError("The length of sok results is not equal to that of TensorFlow.")
if (len(tf_results) != args["iter_num"]):
    raise ValueError("The length of embedding vectors: %d is not equal to iteration number: %d."
                    %(len(tf_results), args["iter_num"]))
    
for i, sok_vector in enumerate(sok_results):
    if args["gpu_num"] != 1:
        sok_vector = tf.stack(sok_vector.values, axis=0)
    tf.debugging.assert_near(tf.reshape(sok_vector,
                                        shape=[-1, tf.shape(sok_vector)[-1]]),
                             tf_results[i],
                             atol=1e-4,
                             rtol=1e-4,
                             message="The embedding vectors obtained from TF and SOK vary in iteration: %d" %i)
    
# No exception was raised, so the embedding vectors are consistent for all iterations.
print(("\n[INFO]: With MirroredStrategy, when %d GPUs are used, the embedding vectors obtained from TensorFlow " 
       "and SOK are consistent for %d iterations") 
      %(args["gpu_num"], args["iter_num"]))


[INFO]: With MirroredStrategy, when 8 GPUs are used, the embedding vectors obtained from TensorFlow and SOK are consistent for 50 iterations
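
The check above relies on `tf.debugging.assert_near`, which is documented to accept a pair of values when `|x - y| <= atol + rtol * |y|`. The following is a minimal pure-Python sketch of that rule, using the same tolerances as the cell above. The helpers `is_near` and `all_near` are hypothetical names introduced for illustration, and the perturbations are made-up values.

```python
def is_near(x, y, atol=1e-4, rtol=1e-4):
    """Element-wise closeness rule documented for tf.debugging.assert_near."""
    return abs(x - y) <= atol + rtol * abs(y)


def all_near(xs, ys, atol=1e-4, rtol=1e-4):
    """Apply the rule to two equally sized flat sequences."""
    if len(xs) != len(ys):
        raise ValueError("length mismatch")
    return all(is_near(x, y, atol, rtol) for x, y in zip(xs, ys))


# Reference values copied from the step-24 output; the offsets are hypothetical.
reference = [-0.26861715, -0.27740425, -0.2925169]
close = [v + 5e-5 for v in reference]   # inside the tolerance band
far = [v + 1e-3 for v in reference]     # outside the tolerance band

print(all_near(close, reference))  # True
print(all_near(far, reference))    # False
```

With `atol = rtol = 1e-4` and values around 0.27, the accepted difference is roughly `1.27e-4`, which is why small floating-point deviations between the two implementations do not fail the comparison.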


### Multi-node, Multi-GPUs synchronized training ###

**Restart the Jupyter notebook kernel before running this section.**

Firstly, specify the hyperparameters.

In [1]:
%reset -f

args = dict()

args["iter_num"] = 50                             # the number of training iterations
args["max_vocabulary_size_per_gpu"] = 1024
args["slot_num"] = 10                             # the number of feature fields in this embedding layer
args["max_nnz"] = 4                               # the maximum number of valid features in each slot
args["embedding_vec_size"] = 4                    # the dimension of embedding vectors
args["combiner"] = "mean"                         # the reduction applied within each slot: "mean" or "sum"
args["global_batch_size"] = 65536                 # the global batch size across all GPUs
args["optimizer"] = "plugin_adam"                 # the optimizer used for training: "plugin_adam", "adam", or "sgd"
args["ips"] = ["localhost", "localhost"]          # the IP address of each node; both entries are localhost here, so
                                                  # different GPUs on a single node simulate multiple nodes
args["worker_num"] = len(args["ips"])             # the number of workers in synchronized training

In [2]:
import sys, os, json
import sparse_operation_kit as sok
import tensorflow as tf
import numpy as np
# import utility python script
sys.path.append("../unit_test/test_scripts/tf2/")
import utils

[INFO]: sparse_operation_kit is imported


In [3]:
total_gpu_num = utils.get_local_gpu_count()
print("[INFO]: There are %d GPUs in total" %total_gpu_num)
if (total_gpu_num % args["worker_num"] != 0):
    raise RuntimeError("total_gpu_num: %d is not divisible by worker_num: %d" %(total_gpu_num, args["worker_num"]))
    
per_worker_gpu_num = total_gpu_num // args["worker_num"]
args["local_gpu_num"] = per_worker_gpu_num # the number of available GPUs in each process

[INFO]: There are 8 GPUs in total
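
The sizes that appear later in the output follow directly from these settings. The sketch below works through the arithmetic in plain Python, assuming the global batch is split evenly across all replicas and that 8 GPUs are detected, as reported above. The variable names `replica_batch_size`, `flat_width`, and `total_vocabulary` are introduced here for illustration only.

```python
args = {
    "global_batch_size": 65536,
    "slot_num": 10,
    "embedding_vec_size": 4,
    "max_vocabulary_size_per_gpu": 1024,
    "worker_num": 2,
}
total_gpu_num = 8  # value reported by the cell above

# GPUs handled by each simulated node.
local_gpu_num = total_gpu_num // args["worker_num"]
# Samples processed by one GPU per iteration.
replica_batch_size = args["global_batch_size"] // total_gpu_num
# Width of the flattened embedding vector fed to the dense layer.
flat_width = args["slot_num"] * args["embedding_vec_size"]
# Number of distinct keys the random samples are drawn from.
total_vocabulary = total_gpu_num * args["max_vocabulary_size_per_gpu"]

print(local_gpu_num)       # 4
print(replica_batch_size)  # 8192
print(flat_width)          # 40
print(total_vocabulary)    # 8192
```

The pair `(8192, 40)` matches the `shape=(8192, 40)` printed for each replica in the training output.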


Secondly, define the DNN models using TensorFlow and SOK.

In [4]:
class TfDemo(tf.keras.models.Model):
    def __init__(self, 
                 init_tensors, 
                 combiner, 
                 global_batch_size,
                 slot_num, 
                 embedding_vec_size,
                 **kwargs):
        super(TfDemo, self).__init__(**kwargs)
        self.combiner = combiner
        self.global_batch_size = global_batch_size
        self.slot_num = slot_num
        self.embedding_vec_size = embedding_vec_size

        self.init_tensors = init_tensors
        self.params = tf.Variable(initial_value=tf.concat(self.init_tensors, axis=0))

        self.dense_layer = tf.keras.layers.Dense(units=1, activation=None,
                                                 kernel_initializer="ones",
                                                 bias_initializer="zeros")

    def call(self, inputs, training=True):
        # [batchsize * slot_num, embedding_vec_size]
        embedding_vector = tf.nn.embedding_lookup_sparse(params=self.params, sp_ids=inputs,
                                                        sp_weights=None, combiner=self.combiner)

        # [batchsize, slot_num * embedding_vec_size]
        embedding_vector = tf.reshape(embedding_vector, shape=[self.global_batch_size, self.slot_num * self.embedding_vec_size])
        logit = self.dense_layer(embedding_vector)
        return logit, embedding_vector
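
To make the role of `combiner` concrete, here is a minimal pure-Python sketch of what a sparse embedding lookup computes for one sample: the rows selected by each slot's keys are reduced to a single vector, and the slot vectors are then flattened. The function `combine`, the toy table, and the sample keys are all hypothetical; this illustrates the idea and is not how `tf.nn.embedding_lookup_sparse` is implemented.

```python
def combine(table, keys, combiner="mean"):
    """Reduce the embedding rows selected by `keys` into one vector."""
    if not keys:
        raise ValueError("a slot needs at least one key in this sketch")
    rows = [table[k] for k in keys]
    summed = [sum(col) for col in zip(*rows)]
    if combiner == "sum":
        return summed
    if combiner == "mean":
        return [v / len(rows) for v in summed]
    raise ValueError("combiner must be 'mean' or 'sum'")


# Toy table: 4 keys, embedding_vec_size = 2 (values are made up).
table = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]

# One sample with slot_num = 2; each slot holds up to max_nnz keys.
sample = [[0, 2], [3]]

slot_vectors = [combine(table, keys, "mean") for keys in sample]
# Flatten [slot_num, embedding_vec_size] into [slot_num * embedding_vec_size].
flattened = [v for vec in slot_vectors for v in vec]

print(slot_vectors)  # [[2.0, 3.0], [6.0, 7.0]]
print(flattened)     # [2.0, 3.0, 6.0, 7.0]
```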

In [5]:
class SOKDemo(tf.keras.models.Model):
    def __init__(self,
                 combiner,
                 max_vocabulary_size_per_gpu,
                 slot_num,
                 max_nnz,
                 embedding_vec_size, 
                 **kwargs):
        super(SOKDemo, self).__init__(**kwargs)

        self.combiner = combiner
        self.max_vocabulary_size_per_gpu = max_vocabulary_size_per_gpu
        self.slot_num = slot_num
        self.max_nnz = max_nnz
        self.embedding_vec_size = embedding_vec_size

        self.embedding_layer = sok.DistributedEmbedding(combiner=self.combiner,
                                                           max_vocabulary_size_per_gpu=self.max_vocabulary_size_per_gpu,
                                                           embedding_vec_size=self.embedding_vec_size,
                                                           slot_num=self.slot_num,
                                                           max_nnz=self.max_nnz)

        self.dense_layer = tf.keras.layers.Dense(units=1, activation=None,
                                                 kernel_initializer="ones",
                                                 bias_initializer="zeros")

    def call(self, inputs, training=True):
        # [batchsize, slot_num, embedding_vec_size]
        embedding_vector = self.embedding_layer(inputs, training=training)
        # [batchsize, slot_num * embedding_vec_size]
        embedding_vector = tf.reshape(embedding_vector, shape=[-1, self.slot_num * self.embedding_vec_size])
        # [batchsize, 1]
        logit = self.dense_layer(embedding_vector)
        return logit, embedding_vector

In [6]:
def test_tf_demo(args, init_tensors, *random_samples):
    dataset = utils.tf_dataset(*random_samples, batchsize=args["global_batch_size"], to_sparse_tensor=True, repeat=1)

    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)

    tf_demo = TfDemo(init_tensors, args["combiner"], args["global_batch_size"], 
                     args["slot_num"], args["embedding_vec_size"])

    optimizer = utils.get_dense_optimizer(args["optimizer"])(learning_rate=0.1)

    @tf.function
    def _train_step(inputs, labels):
        with tf.GradientTape() as tape:
            logit, embedding_vector = tf_demo(inputs, training=True)
            loss = loss_fn(labels, logit)
        grads = tape.gradient(loss, tf_demo.trainable_variables)
        optimizer.apply_gradients(zip(grads, tf_demo.trainable_variables))
        return logit, embedding_vector

    tf_results = list()

    for i, (sparse_tensors, labels) in enumerate(dataset):
        print("-"*30, str(i), "-"*30)
        logit, embedding_vector = _train_step(sparse_tensors, labels)
        print("[INFO]: embedding_vector:\n", embedding_vector)
        tf_results.append(embedding_vector)

    return tf_results
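
The dense layer has no activation, so the model returns raw logits and the loss is created with `from_logits=True`. As an illustration of what that means, the sketch below evaluates binary cross-entropy directly from a logit using the numerically stable form `max(x, 0) - x * z + log(1 + exp(-|x|))` documented for `tf.nn.sigmoid_cross_entropy_with_logits`. The function names are hypothetical and the inputs are made-up values; this is not Keras's implementation.

```python
import math


def sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy computed directly from a logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def naive_cross_entropy(logit, label):
    """Reference: apply the sigmoid first, then the textbook formula."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Both forms agree for moderate logits.
for logit, label in [(0.0, 1.0), (2.0, 0.0), (-1.5, 1.0)]:
    stable = sigmoid_cross_entropy(logit, label)
    naive = naive_cross_entropy(logit, label)
    assert abs(stable - naive) < 1e-9

# For a very large logit the naive form would evaluate log(0) and fail,
# while the stable form still returns a finite value.
print(sigmoid_cross_entropy(1000.0, 0.0))  # 1000.0
```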

Thirdly, define the multi-node training loop for SOK.

In [7]:
def test_sok_demo(args, task_id, init_tensors, *random_samples):
    physical_devices = tf.config.list_physical_devices('GPU')
    print("[INFO]: physical_devices on task %d:" %task_id, physical_devices)
    
    port = 12345
    os.environ["TF_CONFIG"] = json.dumps({
        'cluster': {"worker": [args["ips"][i] + ":" + str(port + i) for i in range(args["worker_num"])] },
        'task': {"type": 'worker', "index": task_id}
    })
    strategy = tf.distribute.MultiWorkerMirroredStrategy()
    with strategy.scope():
        sok.Init(global_batch_size=args["global_batch_size"])

        sok_demo = SOKDemo(combiner=args["combiner"], 
                            max_vocabulary_size_per_gpu=args["max_vocabulary_size_per_gpu"],
                            slot_num=args["slot_num"], max_nnz=args["max_nnz"],
                            embedding_vec_size=args["embedding_vec_size"])

        emb_opt = utils.get_embedding_optimizer(args["optimizer"])(learning_rate=0.1)
        dense_opt = utils.get_dense_optimizer(args["optimizer"])(learning_rate=0.1)
    
    sok_saver = sok.Saver()
    sok_saver.load_embedding_values(sok_demo.embedding_layer.embedding_variable, init_tensors)

    loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
    def _replica_loss(labels, logits):
        loss = loss_fn(labels, logits)
        return tf.nn.compute_average_loss(loss, global_batch_size=args["global_batch_size"])

    @tf.function
    def _train_step(inputs, labels):
        with tf.GradientTape() as tape:
            logit, embedding_vector = sok_demo(inputs, training=True)
            loss = _replica_loss(labels, logit)
        embedding_variables, other_variable = sok.split_embedding_variable_from_others(sok_demo.trainable_variables)
        grads, emb_grads = tape.gradient(loss, [other_variable, embedding_variables])
        if "plugin" not in args["optimizer"]:
            with sok.OptimizerScope(embedding_variables):
                emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
                                        experimental_aggregate_gradients=False)
        else:
            emb_opt.apply_gradients(zip(emb_grads, embedding_variables),
                                    experimental_aggregate_gradients=False)
        dense_opt.apply_gradients(zip(grads, other_variable))
        return logit, embedding_vector

    sok_results = list()

    def _dataset_fn(input_context):
        replica_batch_size = input_context.get_per_replica_batch_size(args["global_batch_size"])
        dataset = utils.tf_dataset(*random_samples, batchsize=replica_batch_size, to_sparse_tensor=True, repeat=1)
        # Each worker has its own data source, so the dataset does not need to be sharded.
        return dataset

    dataset = strategy.distribute_datasets_from_function(_dataset_fn)

    for i, (sparse_tensors, replica_labels) in enumerate(dataset):
        print("-" * 30, "step ", str(i), "-" * 30)
        logit, embedding_vector = strategy.run(_train_step, args=(sparse_tensors, replica_labels))
        print("[INFO]: embedding_vector\n", embedding_vector)
        sok_results.append(embedding_vector)

    return sok_results
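
The loss in `_replica_loss` is computed with `reduction=NONE` and then scaled by the global batch size rather than the per-replica batch size. The reason is that the strategy sums contributions across replicas, so each replica should contribute `sum(local losses) / global_batch_size`. The following plain-Python sketch checks that arithmetic on made-up numbers; `replica_loss` is a hypothetical stand-in for the scaling done by `tf.nn.compute_average_loss` without sample weights.

```python
def replica_loss(per_example_losses, global_batch_size):
    """Scale a replica's summed loss by the global batch size."""
    return sum(per_example_losses) / global_batch_size


# Hypothetical per-example losses on two replicas (global batch of 4).
replica_0 = [0.2, 0.4]
replica_1 = [0.6, 0.8]
global_batch_size = len(replica_0) + len(replica_1)

# Summing the per-replica values reproduces the mean over the global batch.
summed = replica_loss(replica_0, global_batch_size) + replica_loss(replica_1, global_batch_size)
global_mean = sum(replica_0 + replica_1) / global_batch_size

assert abs(summed - global_mean) < 1e-12
print(round(summed, 6))  # 0.5
```

Dividing by the per-replica batch size instead would make the summed result grow with the number of replicas.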

Fourthly, define the subprocess work function used to simulate multi-node synchronized training.

In [8]:
def compare_sok_with_tf(args, task_id):
    if (args["global_batch_size"] % args["local_gpu_num"] != 0):
        raise ValueError("global_batch_size: %d is not divisible by local_gpu_num: %d"
                            %(args["global_batch_size"], args["local_gpu_num"]))
    if (args["global_batch_size"] % args["worker_num"] != 0):
        raise ValueError("global_batch_size: %d is not divisible by worker_num: %d"
                            %(args["global_batch_size"], args["worker_num"]))

    # each worker generates a different dataset
    worker_batch_size = args["global_batch_size"] // args["worker_num"]
    random_samples_local = utils.generate_random_samples(num_of_samples=worker_batch_size * args["iter_num"],
                                                         vocabulary_size=args["local_gpu_num"] * args["max_vocabulary_size_per_gpu"] * args["worker_num"],
                                                         slot_num=args["slot_num"],
                                                         max_nnz=args["max_nnz"])
    utils.save_to_file(r"./random_samples_" + str(task_id) + r".file", *random_samples_local)

    # each worker generates the same init tensors, because each worker filters them by itself
    init_tensors = utils.get_ones_tensor(max_vocab_size_per_gpu=args["max_vocabulary_size_per_gpu"],
                                            embedding_vec_size=args["embedding_vec_size"],
                                            num=args["local_gpu_num"] * args["worker_num"])

    sok_results_local = test_sok_demo(args, task_id, init_tensors, *random_samples_local)
    # save this worker's forward embedding vectors to a file
    utils.save_to_file(r"./sok_embedding_vectors_" + str(task_id) + r".file", *sok_results_local)

    # aggregate the datasets from all workers
    dataset_filenames = [r"./random_samples_" + str(task_id) + r".file"
                         for task_id in range(args["worker_num"])]
    random_samples_total = [list() for _ in range(args["iter_num"])]
    random_labels_total = [list() for _ in range(args["iter_num"])]
    local_batch_size = args["global_batch_size"] // args["worker_num"]
    for work_id in range(args["worker_num"]):
        samples, labels = utils.restore_from_file(dataset_filenames[work_id])
        for i in range(args["iter_num"]):
            random_samples_total[i].extend(samples[i * local_batch_size : (i + 1) * local_batch_size])
            random_labels_total[i].extend(labels[i * local_batch_size : (i + 1) * local_batch_size])
    random_samples_total = np.concatenate(random_samples_total, axis=0)
    random_labels_total = np.concatenate(random_labels_total, axis=0)

    tf_results = test_tf_demo(args, init_tensors, random_samples_total, random_labels_total)

    # aggregate the forward embedding vectors from all workers
    sok_results_filenames = [r"./sok_embedding_vectors_" + str(task_id) + r".file"
                             for task_id in range(args["worker_num"])]
    sok_results_total = list()
    for file_name in sok_results_filenames:
        sok_results_local = utils.restore_from_file(file_name)
        sok_results_total.append(sok_results_local)

    if (len(sok_results_total[0]) != len(tf_results)):
        raise ValueError("The length of results obtained from sok: %d is not equal to that of tensorflow: %d."
                        %(len(sok_results_total[0]), len(tf_results)))
    if (len(tf_results) != args["iter_num"]):
        raise ValueError("The length of embedding vectors: %d is not equal to iteration number: %d."
                         %(len(tf_results), args["iter_num"]))

    # compare the embedding vectors of every iteration, concatenated across all workers
    for i in range(args["iter_num"]):
        if args["local_gpu_num"] != 1:
            sok_vector = tf.concat([tf.concat(sok_results_total[task_id][i].values, axis=0)
                                    for task_id in range(args["worker_num"])], axis=0)
        else:
            sok_vector = tf.concat([sok_results_total[task_id][i]
                                    for task_id in range(args["worker_num"])], axis=0)
        tf.debugging.assert_near(tf.reshape(sok_vector, 
                                            shape=[-1, tf.shape(sok_vector)[-1]]),
                                 tf_results[i],
                                 atol=1e-4,
                                 rtol=1e-4)

    print(("\n[INFO]: With MultiWorkerMirroredStrategy, when %d GPUs are used for each node and %d GPUs in total, "
           "the embedding vectors obtained from TensorFlow and SOK are consistent for %d iterations")
          %(args["local_gpu_num"], args["local_gpu_num"] * args["worker_num"], args["iter_num"]))
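
The aggregation step in `compare_sok_with_tf` rebuilds each global batch by taking the i-th slice from every worker's dataset and concatenating the slices in worker order. A minimal pure-Python sketch of that slicing, with a hypothetical helper `build_global_batches` and toy string samples in place of real keys:

```python
def build_global_batches(worker_samples, iter_num, worker_batch_size):
    """Concatenate the i-th slice of every worker into the i-th global batch."""
    batches = []
    for i in range(iter_num):
        batch = []
        for samples in worker_samples:
            start = i * worker_batch_size
            batch.extend(samples[start:start + worker_batch_size])
        batches.append(batch)
    return batches


# Two hypothetical workers, 2 iterations, 2 samples per worker per iteration.
worker_0 = ["a0", "a1", "a2", "a3"]
worker_1 = ["b0", "b1", "b2", "b3"]

batches = build_global_batches([worker_0, worker_1], iter_num=2, worker_batch_size=2)
print(batches)  # [['a0', 'a1', 'b0', 'b1'], ['a2', 'a3', 'b2', 'b3']]
```

Keeping the same worker order when concatenating the SOK embedding vectors is what lets them be compared row by row with the single-process TensorFlow results.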

Fifthly, create CPU subprocesses to simulate multi-node synchronized training.
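
Each simulated node differs only in two environment variables: `CUDA_VISIBLE_DEVICES`, which restricts the process to its own block of GPUs, and `TF_CONFIG`, which is set inside `test_sok_demo` to describe the cluster and the task index. The sketch below shows how both values can be derived for every task. The helper `task_environment` is hypothetical, and the port numbering simply mirrors the `port + i` scheme used above.

```python
import json


def task_environment(task_id, ips, gpus_per_worker, base_port=12345):
    """Return the environment variables one simulated node would use."""
    gpu_ids = [str(gpus_per_worker * task_id + i) for i in range(gpus_per_worker)]
    cluster = [ip + ":" + str(base_port + i) for i, ip in enumerate(ips)]
    return {
        "CUDA_VISIBLE_DEVICES": ",".join(gpu_ids),
        "TF_CONFIG": json.dumps({
            "cluster": {"worker": cluster},
            "task": {"type": "worker", "index": task_id},
        }),
    }


ips = ["localhost", "localhost"]
for task_id in range(len(ips)):
    env = task_environment(task_id, ips, gpus_per_worker=4)
    workers = json.loads(env["TF_CONFIG"])["cluster"]["worker"]
    print(task_id, env["CUDA_VISIBLE_DEVICES"], workers)
# 0 0,1,2,3 ['localhost:12345', 'localhost:12346']
# 1 4,5,6,7 ['localhost:12345', 'localhost:12346']
```

Because every worker shares the host name `localhost`, distinct ports are what keep the two simulated nodes apart.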

In [9]:
from multiprocessing import Process

processes = list()
for task_id in range(args["worker_num"]):
    available_gpus = ",".join([str(per_worker_gpu_num * task_id + i)
                              for i in range(per_worker_gpu_num)])
    print("[INFO]: on task: %d, its available GPUs are: %s" %(task_id, available_gpus))
    
    os.environ["CUDA_VISIBLE_DEVICES"] = available_gpus
    process = Process(target=compare_sok_with_tf, args=(args, task_id))
    process.start()
    processes.append(process)
    
    
for process in processes:
    if process.is_alive():
        process.join()

[INFO]: on task: 0, its available GPUs are: 0,1,2,3
[INFO]: on task: 1, its available GPUs are: 4,5,6,7
[INFO]: begin to generate random samples
[INFO]: begin to generate random samples
[INFO]: generated random samples
[INFO]: generated random samples
[INFO]: dumpped items to file ./random_samples_1.file
[INFO]: physical_devices on task 1: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
[INFO]: dumpped items to file ./random_samples_0.file
[INFO]: physical_devices on task 0: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
INFO:tensorflow:Enabled multi-worker col

2021-12-07 04:50:11.099010: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-12-07 04:50:11.099107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14131 MB memory:  -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:85:00.0, compute capability: 7.0
2021-12-07 04:50:11.101215: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2021-12-07 04:50:11.101268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14631 MB memory:  -> device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:86:00.0, compute capability: 7.0
2021-12-07 04:50:11.103196: W tensorflow/cor

INFO:tensorflow:Cluster is ready.
INFO:tensorflow:Cluster is ready.
INFO:tensorflow:MultiWorkerMirroredStrategy with cluster_spec = {'worker': ['localhost:12345', 'localhost:12346']}, task_type = 'worker', task_id = 1, num_workers = 2, local_devices = ('/job:worker/task:1/device:GPU:0', '/job:worker/task:1/device:GPU:1', '/job:worker/task:1/device:GPU:2', '/job:worker/task:1/device:GPU:3'), communication = CommunicationImplementation.AUTO
INFO:tensorflow:MultiWorkerMirroredStrategy with cluster_spec = {'worker': ['localhost:12345', 'localhost:12346']}, task_type = 'worker', task_id = 0, num_workers = 2, local_devices = ('/job:worker/task:0/device:GPU:0', '/job:worker/task:0/device:GPU:1', '/job:worker/task:0/device:GPU:2', '/job:worker/task:0/device:GPU:3'), communication = CommunicationImplementation.AUTO


2021-12-07 04:50:12.670152: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-12-07 04:50:12.713944: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: cond/branch_executed/_6
2021-12-07 04:50:13.649949: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-12-07 04:50:13.850567: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:907] Skipping loop optimization for Merge node with control input: replica_3/cond_1/branch_executed/_55


2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:99] Mapping from local_replica_id to device_id:
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 0 -> 0
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 1 -> 1
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 2 -> 2
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:101] 3 -> 3
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:77] Global seed is 783670919
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:78] Local GPU Count: 4
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/kit_cc_infra/src/resources/manager.cc:79] Global GPU Count: 8
2021-12-07 04:50:13.852613: I sparse_operation_kit/kit_cc/ki

------------------------------------------------------------  step step   00  ------------------------------------------------------------

INFO:tensorflow:Collective all_reduce tensors: 2 all_reduces, num_devices = 4, group_size = 8, implementation = AUTO, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 2 all_reduces, num_devices = 4, group_size = 8, implementation = AUTO, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 2 all_reduces, num_devices = 4, group_size = 8, implementation = AUTO, num_packs = 1
INFO:tensorflow:Collective all_reduce tensors: 2 all_reduces, num_devices = 4, group_size = 8, implementation = AUTO, num_packs = 1
[INFO]: embedding_vector
 [INFO]: embedding_vector
 PerReplica:{
  0: tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1

}PerReplica:{
  0: tf.Tensor(
[[0.8010758  0.8010758  0.8010758  ... 0.80136186 0.80136186 0.80136186]
 [0.8019467  0.8019467  0.8019467  ... 0.8021935  0.8021935  0.8021935 ]
 [0.8017717  0.8017717  0.8017717  ... 0.80201644 0.80201644 0.80201644]
 ...
 [0.8017008  0.8017008  0.8017008  ... 0.80107605 0.80107605 0.80107605]
 [0.8011438  0.8011438  0.8011438  ... 0.8013276  0.8013276  0.8013276 ]
 [0.8024448  0.8024448  0.8024448  ... 0.8031342  0.8031342  0.8031342 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.8012887  0.8012887  0.8012887  ... 0.80096495 0.80096495 0.80096495]
 [0.8042176  0.8042176  0.8042176  ... 0.80138516 0.80138516 0.80138516]
 [0.8019333  0.8019333  0.8019333  ... 0.80153835 0.80153835 0.80153835]
 ...
 [0.8017067  0.8017067  0.8017067  ... 0.80109566 0.80109566 0.80109566]
 [0.8018422  0.8018422  0.8018422  ... 0.80205244 0.80205244 0.80205244]
 [0.80099267 0.80099267 0.80099267 ... 0.8023248  0.8023248  0.8023248 ]], shape=(8192, 40), dtype=float3

}PerReplica:{
  0: tf.Tensor(
[[0.60496116 0.60496116 0.60496116 ... 0.60659146 0.60659146 0.60659146]
 [0.6058155  0.6058155  0.6058155  ... 0.60560036 0.60560036 0.60560036]
 [0.606745   0.606745   0.606745   ... 0.60665023 0.60665023 0.60665023]
 ...
 [0.6053973  0.6053973  0.6053973  ... 0.6064053  0.6064053  0.6064053 ]
 [0.60366404 0.60366404 0.60366404 ... 0.60544264 0.60544264 0.60544264]
 [0.6058608  0.6058608  0.6058608  ... 0.6104212  0.6104212  0.6104212 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.6063523  0.6063523  0.6063523  ... 0.60864097 0.60864097 0.60864097]
 [0.6086781  0.6086781  0.6086781  ... 0.60800236 0.60800236 0.60800236]
 [0.60317314 0.60317314 0.60317314 ... 0.6095214  0.6095214  0.6095214 ]
 ...
 [0.6087903  0.6087903  0.6087903  ... 0.6043147  0.6043147  0.6043147 ]
 [0.60209495 0.60209495 0.60209495 ... 0.6071436  0.6071436  0.6071436 ]
 [0.6031012  0.6031012  0.6031012  ... 0.6070004  0.6070004  0.6070004 ]], shape=(8192, 40), dtype=float3

}PerReplica:{
  0: tf.Tensor(
[[0.41855583 0.41855583 0.41855583 ... 0.4185282  0.4185282  0.4185282 ]
 [0.41285908 0.41285908 0.41285908 ... 0.41414613 0.41414613 0.41414613]
 [0.42163068 0.42163068 0.42163068 ... 0.41743124 0.41743124 0.41743124]
 ...
 [0.417592   0.417592   0.417592   ... 0.41985178 0.41985178 0.41985178]
 [0.41547334 0.41547334 0.41547334 ... 0.4225157  0.4225157  0.4225157 ]
 [0.41381347 0.41381347 0.41381347 ... 0.42141372 0.42141372 0.42141372]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.4239341  0.4239341  0.4239341  ... 0.4192538  0.4192538  0.4192538 ]
 [0.4242925  0.4242925  0.4242925  ... 0.41270453 0.41270453 0.41270453]
 [0.41973093 0.41973093 0.41973093 ... 0.4197301  0.4197301  0.4197301 ]
 ...
 [0.4161967  0.4161967  0.4161967  ... 0.4190143  0.4190143  0.4190143 ]
 [0.4124771  0.4124771  0.4124771  ... 0.4176667  0.4176667  0.4176667 ]
 [0.41803324 0.41803324 0.41803324 ... 0.41557956 0.41557956 0.41557956]], shape=(8192, 40), dtype=float3

}PerReplica:{
  0: tf.Tensor(
[[0.23506397 0.23506397 0.23506397 ... 0.23980045 0.23980045 0.23980045]
 [0.24067622 0.24067622 0.24067622 ... 0.2388439  0.2388439  0.2388439 ]
 [0.23719361 0.23719361 0.23719361 ... 0.24090225 0.24090225 0.24090225]
 ...
 [0.24921389 0.24921389 0.24921389 ... 0.2496515  0.2496515  0.2496515 ]
 [0.24006933 0.24006933 0.24006933 ... 0.24064118 0.24064118 0.24064118]
 [0.23475248 0.23475248 0.23475248 ... 0.24043626 0.24043626 0.24043626]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.24576017 0.24576017 0.24576017 ... 0.24465412 0.24465412 0.24465412]
 [0.24436861 0.24436861 0.24436861 ... 0.24672823 0.24672823 0.24672823]
 [0.2503479  0.2503479  0.2503479  ... 0.22655067 0.22655067 0.22655067]
 ...
 [0.24408603 0.24408603 0.24408603 ... 0.22687855 0.22687855 0.22687855]
 [0.23251489 0.23251489 0.23251489 ... 0.24404973 0.24404973 0.24404973]
 [0.26572958 0.26572958 0.26572958 ... 0.2428817  0.2428817  0.2428817 ]], shape=(8192, 40), dtype=float3

}PerReplica:{
  0: tf.Tensor(
[[0.09687094 0.09687094 0.09687094 ... 0.06834569 0.06834569 0.06834569]
 [0.08395098 0.08395098 0.08395098 ... 0.08104706 0.08104706 0.08104706]
 [0.10349726 0.10349726 0.10349726 ... 0.08655937 0.08655937 0.08655937]
 ...
 [0.09472247 0.09472247 0.09472247 ... 0.09190756 0.09190756 0.09190756]
 [0.09533982 0.09533982 0.09533982 ... 0.10539721 0.10539721 0.10539721]
 [0.08562803 0.08562803 0.08562803 ... 0.09929486 0.09929486 0.09929486]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[0.08737717 0.08737717 0.08737717 ... 0.08538679 0.08538679 0.08538679]
 [0.0906418  0.0906418  0.0906418  ... 0.09964925 0.09964925 0.09964925]
 [0.08544888 0.08544888 0.08544888 ... 0.09801837 0.09801837 0.09801837]
 ...
 [0.09678078 0.09678078 0.09678078 ... 0.08906878 0.08906878 0.08906878]
 [0.088358   0.088358   0.088358   ... 0.09481758 0.09481758 0.09481758]
 [0.0847678  0.0847678  0.0847678  ... 0.09385458 0.09385458 0.09385458]], shape=(8192, 40), dtype=float3

}PerReplica:{
  0: tf.Tensor(
[[-0.03126866 -0.03126866 -0.03126866 ... -0.03134074 -0.03134074
  -0.03134074]
 [-0.02751561 -0.02751561 -0.02751561 ... -0.03708369 -0.03708369
  -0.03708369]
 [-0.0407884  -0.0407884  -0.0407884  ... -0.04872361 -0.04872361
  -0.04872361]
 ...
 [-0.03105154 -0.03105154 -0.03105154 ... -0.01474717 -0.01474717
  -0.01474717]
 [-0.02871628 -0.02871628 -0.02871628 ... -0.06141364 -0.06141364
  -0.06141364]
 [-0.0453866  -0.0453866  -0.0453866  ... -0.05230027 -0.05230027
  -0.05230027]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.03245939 -0.03245939 -0.03245939 ... -0.03913233 -0.03913233
  -0.03913233]
 [-0.02579645 -0.02579645 -0.02579645 ... -0.04029911 -0.04029911
  -0.04029911]
 [-0.03592315 -0.03592315 -0.03592315 ... -0.03873451 -0.03873451
  -0.03873451]
 ...
 [-0.01905179 -0.01905179 -0.01905179 ... -0.02560707 -0.02560707
  -0.02560707]
 [-0.02032471 -0.02032471 -0.02032471 ... -0.03271369 -0.03271369
  -0.03271369]
 [-0.03547594 -0.

}PerReplica:{
  0: tf.Tensor(
[[-0.11252594 -0.11252594 -0.11252594 ... -0.12579718 -0.12579718
  -0.12579718]
 [-0.1192105  -0.1192105  -0.1192105  ... -0.1308043  -0.1308043
  -0.1308043 ]
 [-0.12866615 -0.12866615 -0.12866615 ... -0.11974386 -0.11974386
  -0.11974386]
 ...
 [-0.12223078 -0.12223078 -0.12223078 ... -0.12265706 -0.12265706
  -0.12265706]
 [-0.1313988  -0.1313988  -0.1313988  ... -0.12434806 -0.12434806
  -0.12434806]
 [-0.14779508 -0.14779508 -0.14779508 ... -0.11236013 -0.11236013
  -0.11236013]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.14241144 -0.14241144 -0.14241144 ... -0.13149698 -0.13149698
  -0.13149698]
 [-0.1594122  -0.1594122  -0.1594122  ... -0.09576359 -0.09576359
  -0.09576359]
 [-0.11345179 -0.11345179 -0.11345179 ... -0.13191287 -0.13191287
  -0.13191287]
 ...
 [-0.15396285 -0.15396285 -0.15396285 ... -0.13534078 -0.13534078
  -0.13534078]
 [-0.14552562 -0.14552562 -0.14552562 ... -0.13196358 -0.13196358
  -0.13196358]
 [-0.12925743 -0.1

}PerReplica:{
  0: tf.Tensor(
[[-0.18557853 -0.18557853 -0.18557853 ... -0.2294093  -0.2294093
  -0.2294093 ]
 [-0.19239408 -0.19239408 -0.19239408 ... -0.21053383 -0.21053383
  -0.21053383]
 [-0.19028439 -0.19028439 -0.19028439 ... -0.21640424 -0.21640424
  -0.21640424]
 ...
 [-0.20881002 -0.20881002 -0.20881002 ... -0.20857424 -0.20857424
  -0.20857424]
 [-0.19312575 -0.19312575 -0.19312575 ... -0.1961278  -0.1961278
  -0.1961278 ]
 [-0.22845505 -0.22845505 -0.22845505 ... -0.17724025 -0.17724025
  -0.17724025]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.21108857 -0.21108857 -0.21108857 ... -0.21608493 -0.21608493
  -0.21608493]
 [-0.17435628 -0.17435628 -0.17435628 ... -0.20663387 -0.20663387
  -0.20663387]
 [-0.2124694  -0.2124694  -0.2124694  ... -0.21103959 -0.21103959
  -0.21103959]
 ...
 [-0.21794951 -0.21794951 -0.21794951 ... -0.23521954 -0.23521954
  -0.23521954]
 [-0.23233305 -0.23233305 -0.23233305 ... -0.19545047 -0.19545047
  -0.19545047]
 [-0.19717279 -0.19

}PerReplica:{
  0: tf.Tensor(
[[-0.25776958 -0.25776958 -0.25776958 ... -0.2531976  -0.2531976
  -0.2531976 ]
 [-0.25639272 -0.25639272 -0.25639272 ... -0.26465923 -0.26465923
  -0.26465923]
 [-0.26687717 -0.26687717 -0.26687717 ... -0.26470613 -0.26470613
  -0.26470613]
 ...
 [-0.28487194 -0.28487194 -0.28487194 ... -0.2820628  -0.2820628
  -0.2820628 ]
 [-0.25619847 -0.25619847 -0.25619847 ... -0.2687176  -0.2687176
  -0.2687176 ]
 [-0.29643747 -0.29643747 -0.29643747 ... -0.3000363  -0.3000363
  -0.3000363 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.3045588  -0.3045588  -0.3045588  ... -0.26813287 -0.26813287
  -0.26813287]
 [-0.27863112 -0.27863112 -0.27863112 ... -0.273818   -0.273818
  -0.273818  ]
 [-0.27770215 -0.27770215 -0.27770215 ... -0.27316087 -0.27316087
  -0.27316087]
 ...
 [-0.27075255 -0.27075255 -0.27075255 ... -0.27963242 -0.27963242
  -0.27963242]
 [-0.26409572 -0.26409572 -0.26409572 ... -0.3014991  -0.3014991
  -0.3014991 ]
 [-0.26489216 -0.2648921

}PerReplica:{
  0: tf.Tensor(
[[-0.30004576 -0.30004576 -0.30004576 ... -0.30129188 -0.30129188
  -0.30129188]
 [-0.27407452 -0.27407452 -0.27407452 ... -0.29746908 -0.29746908
  -0.29746908]
 [-0.32805672 -0.32805672 -0.32805672 ... -0.3282205  -0.3282205
  -0.3282205 ]
 ...
 [-0.2584542  -0.2584542  -0.2584542  ... -0.30776444 -0.30776444
  -0.30776444]
 [-0.30064857 -0.30064857 -0.30064857 ... -0.30684966 -0.30684966
  -0.30684966]
 [-0.29471582 -0.29471582 -0.29471582 ... -0.2930435  -0.2930435
  -0.2930435 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.28290823 -0.28290823 -0.28290823 ... -0.3102165  -0.3102165
  -0.3102165 ]
 [-0.3113042  -0.3113042  -0.3113042  ... -0.28651857 -0.28651857
  -0.28651857]
 [-0.31294394 -0.31294394 -0.31294394 ... -0.30894366 -0.30894366
  -0.30894366]
 ...
 [-0.3335798  -0.3335798  -0.3335798  ... -0.30469698 -0.30469698
  -0.30469698]
 [-0.30331498 -0.30331498 -0.30331498 ... -0.32693917 -0.32693917
  -0.32693917]
 [-0.33106196 -0.331

}PerReplica:{
  0: tf.Tensor(
[[-0.3227744  -0.3227744  -0.3227744  ... -0.3298269  -0.3298269
  -0.3298269 ]
 [-0.27946788 -0.27946788 -0.27946788 ... -0.2973371  -0.2973371
  -0.2973371 ]
 [-0.3116777  -0.3116777  -0.3116777  ... -0.31368592 -0.31368592
  -0.31368592]
 ...
 [-0.31977898 -0.31977898 -0.31977898 ... -0.28217465 -0.28217465
  -0.28217465]
 [-0.293393   -0.293393   -0.293393   ... -0.2974066  -0.2974066
  -0.2974066 ]
 [-0.33780164 -0.33780164 -0.33780164 ... -0.2792967  -0.2792967
  -0.2792967 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.31916696 -0.31916696 -0.31916696 ... -0.32584828 -0.32584828
  -0.32584828]
 [-0.305668   -0.305668   -0.305668   ... -0.317971   -0.317971
  -0.317971  ]
 [-0.30147827 -0.30147827 -0.30147827 ... -0.32699552 -0.32699552
  -0.32699552]
 ...
 [-0.3158459  -0.3158459  -0.3158459  ... -0.31856397 -0.31856397
  -0.31856397]
 [-0.2945599  -0.2945599  -0.2945599  ... -0.3068571  -0.3068571
  -0.3068571 ]
 [-0.33652583 -0.3365258

}PerReplica:{
  0: tf.Tensor(
[[-0.2783395  -0.2783395  -0.2783395  ... -0.26254717 -0.26254717
  -0.26254717]
 [-0.28537008 -0.28537008 -0.28537008 ... -0.29203886 -0.29203886
  -0.29203886]
 [-0.30213028 -0.30213028 -0.30213028 ... -0.30334422 -0.30334422
  -0.30334422]
 ...
 [-0.29492742 -0.29492742 -0.29492742 ... -0.30114916 -0.30114916
  -0.30114916]
 [-0.28009176 -0.28009176 -0.28009176 ... -0.26556987 -0.26556987
  -0.26556987]
 [-0.3019367  -0.3019367  -0.3019367  ... -0.27844056 -0.27844056
  -0.27844056]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.28777435 -0.28777435 -0.28777435 ... -0.26763028 -0.26763028
  -0.26763028]
 [-0.2730748  -0.2730748  -0.2730748  ... -0.28799504 -0.28799504
  -0.28799504]
 [-0.28461662 -0.28461662 -0.28461662 ... -0.257796   -0.257796
  -0.257796  ]
 ...
 [-0.27208477 -0.27208477 -0.27208477 ... -0.28623772 -0.28623772
  -0.28623772]
 [-0.28056982 -0.28056982 -0.28056982 ... -0.30582836 -0.30582836
  -0.30582836]
 [-0.30676466 -0.30

}PerReplica:{
  0: tf.Tensor(
[[-0.20331605 -0.20331605 -0.20331605 ... -0.24649958 -0.24649958
  -0.24649958]
 [-0.2649289  -0.2649289  -0.2649289  ... -0.2227771  -0.2227771
  -0.2227771 ]
 [-0.23291008 -0.23291008 -0.23291008 ... -0.27009806 -0.27009806
  -0.27009806]
 ...
 [-0.23927948 -0.23927948 -0.23927948 ... -0.24216457 -0.24216457
  -0.24216457]
 [-0.22370991 -0.22370991 -0.22370991 ... -0.2237774  -0.2237774
  -0.2237774 ]
 [-0.22651038 -0.22651038 -0.22651038 ... -0.21587408 -0.21587408
  -0.21587408]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.25728786 -0.25728786 -0.25728786 ... -0.26064253 -0.26064253
  -0.26064253]
 [-0.24723393 -0.24723393 -0.24723393 ... -0.21807657 -0.21807657
  -0.21807657]
 [-0.25506443 -0.25506443 -0.25506443 ... -0.24831721 -0.24831721
  -0.24831721]
 ...
 [-0.27103567 -0.27103567 -0.27103567 ... -0.22737162 -0.22737162
  -0.22737162]
 [-0.22401662 -0.22401662 -0.22401662 ... -0.24344382 -0.24344382
  -0.24344382]
 [-0.25443858 -0.25

}PerReplica:{
  0: tf.Tensor(
[[-0.15625171 -0.15625171 -0.15625171 ... -0.22421753 -0.22421753
  -0.22421753]
 [-0.18535993 -0.18535993 -0.18535993 ... -0.19527435 -0.19527435
  -0.19527435]
 [-0.16356798 -0.16356798 -0.16356798 ... -0.20920229 -0.20920229
  -0.20920229]
 ...
 [-0.15948589 -0.15948589 -0.15948589 ... -0.19727905 -0.19727905
  -0.19727905]
 [-0.20972228 -0.20972228 -0.20972228 ... -0.2057939  -0.2057939
  -0.2057939 ]
 [-0.22022368 -0.22022368 -0.22022368 ... -0.19431953 -0.19431953
  -0.19431953]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.19160831 -0.19160831 -0.19160831 ... -0.18935217 -0.18935217
  -0.18935217]
 [-0.19834432 -0.19834432 -0.19834432 ... -0.17992145 -0.17992145
  -0.17992145]
 [-0.16641375 -0.16641375 -0.16641375 ... -0.18862331 -0.18862331
  -0.18862331]
 ...
 [-0.20468545 -0.20468545 -0.20468545 ... -0.18401584 -0.18401584
  -0.18401584]
 [-0.18739638 -0.18739638 -0.18739638 ... -0.19530898 -0.19530898
  -0.19530898]
 [-0.21415822 -0.2

}PerReplica:{
  0: tf.Tensor(
[[-0.19174416 -0.19174416 -0.19174416 ... -0.1662325  -0.1662325
  -0.1662325 ]
 [-0.16288634 -0.16288634 -0.16288634 ... -0.17257543 -0.17257543
  -0.17257543]
 [-0.16131045 -0.16131045 -0.16131045 ... -0.16490233 -0.16490233
  -0.16490233]
 ...
 [-0.15292597 -0.15292597 -0.15292597 ... -0.1585483  -0.1585483
  -0.1585483 ]
 [-0.16649374 -0.16649374 -0.16649374 ... -0.16409777 -0.16409777
  -0.16409777]
 [-0.12612556 -0.12612556 -0.12612556 ... -0.1791183  -0.1791183
  -0.1791183 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.151345   -0.151345   -0.151345   ... -0.1565192  -0.1565192
  -0.1565192 ]
 [-0.17895174 -0.17895174 -0.17895174 ... -0.1429563  -0.1429563
  -0.1429563 ]
 [-0.1588629  -0.1588629  -0.1588629  ... -0.15774553 -0.15774553
  -0.15774553]
 ...
 [-0.18394932 -0.18394932 -0.18394932 ... -0.13967031 -0.13967031
  -0.13967031]
 [-0.14761049 -0.14761049 -0.14761049 ... -0.14577779 -0.14577779
  -0.14577779]
 [-0.18036704 -0.18036

}PerReplica:{
  0: tf.Tensor(
[[-0.16815151 -0.16815151 -0.16815151 ... -0.1458141  -0.1458141
  -0.1458141 ]
 [-0.13497877 -0.13497877 -0.13497877 ... -0.13760954 -0.13760954
  -0.13760954]
 [-0.15989488 -0.15989488 -0.15989488 ... -0.14937398 -0.14937398
  -0.14937398]
 ...
 [-0.17722663 -0.17722663 -0.17722663 ... -0.1427409  -0.1427409
  -0.1427409 ]
 [-0.147671   -0.147671   -0.147671   ... -0.13292259 -0.13292259
  -0.13292259]
 [-0.16486892 -0.16486892 -0.16486892 ... -0.1675607  -0.1675607
  -0.1675607 ]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.13432895 -0.13432895 -0.13432895 ... -0.13488185 -0.13488185
  -0.13488185]
 [-0.07660555 -0.07660555 -0.07660555 ... -0.12876102 -0.12876102
  -0.12876102]
 [-0.11597651 -0.11597651 -0.11597651 ... -0.14008886 -0.14008886
  -0.14008886]
 ...
 [-0.16307572 -0.16307572 -0.16307572 ... -0.14733616 -0.14733616
  -0.14733616]
 [-0.15181512 -0.15181512 -0.15181512 ... -0.1278985  -0.1278985
  -0.1278985 ]
 [-0.1458592  -0.1458

}PerReplica:{
  0: tf.Tensor(
[[-0.14747244 -0.14747244 -0.14747244 ... -0.1309759  -0.1309759
  -0.1309759 ]
 [-0.13157403 -0.13157403 -0.13157403 ... -0.13023405 -0.13023405
  -0.13023405]
 [-0.1297859  -0.1297859  -0.1297859  ... -0.11079148 -0.11079148
  -0.11079148]
 ...
 [-0.14686443 -0.14686443 -0.14686443 ... -0.14289051 -0.14289051
  -0.14289051]
 [-0.11602416 -0.11602416 -0.11602416 ... -0.05856589 -0.05856589
  -0.05856589]
 [-0.16118762 -0.16118762 -0.16118762 ... -0.13944644 -0.13944644
  -0.13944644]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.09866805 -0.09866805 -0.09866805 ... -0.11413323 -0.11413323
  -0.11413323]
 [-0.1725307  -0.1725307  -0.1725307  ... -0.16141652 -0.16141652
  -0.16141652]
 [-0.13714427 -0.13714427 -0.13714427 ... -0.13591951 -0.13591951
  -0.13591951]
 ...
 [-0.1346918  -0.1346918  -0.1346918  ... -0.13579184 -0.13579184
  -0.13579184]
 [-0.15924418 -0.15924418 -0.15924418 ... -0.16946225 -0.16946225
  -0.16946225]
 [-0.12373075 -0.1

}
PerReplica:{
  0: tf.Tensor(
[[-0.13050523 -0.13050523 -0.13050523 ... -0.12057085 -0.12057085
  -0.12057085]
 [-0.13114573 -0.13114573 -0.13114573 ... -0.1748514  -0.1748514
  -0.1748514 ]
 [-0.12738082 -0.12738082 -0.12738082 ... -0.12301777 -0.12301777
  -0.12301777]
 ...
 [-0.13155879 -0.13155879 -0.13155879 ... -0.14655757 -0.14655757
  -0.14655757]
 [-0.09117307 -0.09117307 -0.09117307 ... -0.1276281  -0.1276281
  -0.1276281 ]
 [-0.20856729 -0.20856729 -0.20856729 ... -0.10124613 -0.10124613
  -0.10124613]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.09517786 -0.09517786 -0.09517786 ... -0.10599483 -0.10599483
  -0.10599483]
 [-0.13572377 -0.13572377 -0.13572377 ... -0.12271081 -0.12271081
  -0.12271081]
 [-0.14163588 -0.14163588 -0.14163588 ... -0.1844615  -0.1844615
  -0.1844615 ]
 ...
 [-0.1270605  -0.1270605  -0.1270605  ... -0.11611266 -0.11611266
  -0.11611266]
 [-0.12610337 -0.12610337 -0.12610337 ... -0.15052158 -0.15052158
  -0.15052158]
 [-0.18152963 -0.18

}PerReplica:{
  0: tf.Tensor(
[[-0.1500794  -0.1500794  -0.1500794  ... -0.11097371 -0.11097371
  -0.11097371]
 [-0.18026407 -0.18026407 -0.18026407 ... -0.14487176 -0.14487176
  -0.14487176]
 [-0.11632041 -0.11632041 -0.11632041 ... -0.17557865 -0.17557865
  -0.17557865]
 ...
 [-0.13093327 -0.13093327 -0.13093327 ... -0.14237332 -0.14237332
  -0.14237332]
 [-0.1271075  -0.1271075  -0.1271075  ... -0.16766535 -0.16766535
  -0.16766535]
 [-0.16502622 -0.16502622 -0.16502622 ... -0.18193355 -0.18193355
  -0.18193355]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.14253503 -0.14253503 -0.14253503 ... -0.16362204 -0.16362204
  -0.16362204]
 [-0.16887572 -0.16887572 -0.16887572 ... -0.17062588 -0.17062588
  -0.17062588]
 [-0.16398257 -0.16398257 -0.16398257 ... -0.2138996  -0.2138996
  -0.2138996 ]
 ...
 [-0.13861682 -0.13861682 -0.13861682 ... -0.17234063 -0.17234063
  -0.17234063]
 [-0.18770203 -0.18770203 -0.18770203 ... -0.13176277 -0.13176277
  -0.13176277]
 [-0.15724567 -0.1

}PerReplica:{
  0: tf.Tensor(
[[-0.14670214 -0.14670214 -0.14670214 ... -0.1523164  -0.1523164
  -0.1523164 ]
 [-0.18418339 -0.18418339 -0.18418339 ... -0.16919132 -0.16919132
  -0.16919132]
 [-0.1578069  -0.1578069  -0.1578069  ... -0.1820644  -0.1820644
  -0.1820644 ]
 ...
 [-0.14243269 -0.14243269 -0.14243269 ... -0.17708308 -0.17708308
  -0.17708308]
 [-0.17807823 -0.17807823 -0.17807823 ... -0.1514872  -0.1514872
  -0.1514872 ]
 [-0.16545098 -0.16545098 -0.16545098 ... -0.19027108 -0.19027108
  -0.19027108]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.23063153 -0.23063153 -0.23063153 ... -0.14859116 -0.14859116
  -0.14859116]
 [-0.17656955 -0.17656955 -0.17656955 ... -0.1687172  -0.1687172
  -0.1687172 ]
 [-0.17838474 -0.17838474 -0.17838474 ... -0.19209486 -0.19209486
  -0.19209486]
 ...
 [-0.15956108 -0.15956108 -0.15956108 ... -0.16459085 -0.16459085
  -0.16459085]
 [-0.17470922 -0.17470922 -0.17470922 ... -0.187361   -0.187361
  -0.187361  ]
 [-0.18290927 -0.182909

}PerReplica:{
  0: tf.Tensor(
[[-0.21465573 -0.21465573 -0.21465573 ... -0.22436312 -0.22436312
  -0.22436312]
 [-0.22589427 -0.22589427 -0.22589427 ... -0.21576276 -0.21576276
  -0.21576276]
 [-0.19167176 -0.19167176 -0.19167176 ... -0.14702143 -0.14702143
  -0.14702143]
 ...
 [-0.19376385 -0.19376385 -0.19376385 ... -0.17944437 -0.17944437
  -0.17944437]
 [-0.17556955 -0.17556955 -0.17556955 ... -0.1653204  -0.1653204
  -0.1653204 ]
 [-0.20857863 -0.20857863 -0.20857863 ... -0.20476031 -0.20476031
  -0.20476031]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.16983932 -0.16983932 -0.16983932 ... -0.20705335 -0.20705335
  -0.20705335]
 [-0.2066825  -0.2066825  -0.2066825  ... -0.18662593 -0.18662593
  -0.18662593]
 [-0.22156033 -0.22156033 -0.22156033 ... -0.18917611 -0.18917611
  -0.18917611]
 ...
 [-0.20086372 -0.20086372 -0.20086372 ... -0.21794896 -0.21794896
  -0.21794896]
 [-0.20554101 -0.20554101 -0.20554101 ... -0.22674328 -0.22674328
  -0.22674328]
 [-0.16745889 -0.1

}PerReplica:{
  0: tf.Tensor(
[[-0.23361634 -0.23361634 -0.23361634 ... -0.22457527 -0.22457527
  -0.22457527]
 [-0.26063493 -0.26063493 -0.26063493 ... -0.21891987 -0.21891987
  -0.21891987]
 [-0.24077351 -0.24077351 -0.24077351 ... -0.24446118 -0.24446118
  -0.24446118]
 ...
 [-0.19945773 -0.19945773 -0.19945773 ... -0.21475297 -0.21475297
  -0.21475297]
 [-0.2226859  -0.2226859  -0.2226859  ... -0.21949983 -0.21949983
  -0.21949983]
 [-0.25492153 -0.25492153 -0.25492153 ... -0.23160473 -0.23160473
  -0.23160473]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.24050513 -0.24050513 -0.24050513 ... -0.19125937 -0.19125937
  -0.19125937]
 [-0.24115127 -0.24115127 -0.24115127 ... -0.27125758 -0.27125758
  -0.27125758]
 [-0.26922867 -0.26922867 -0.26922867 ... -0.2017139  -0.2017139
  -0.2017139 ]
 ...
 [-0.19779147 -0.19779147 -0.19779147 ... -0.26071554 -0.26071554
  -0.26071554]
 [-0.26236874 -0.26236874 -0.26236874 ... -0.20335548 -0.20335548
  -0.20335548]
 [-0.18621954 -0.1

}PerReplica:{
  0: tf.Tensor(
[[-0.22543876 -0.22543876 -0.22543876 ... -0.23053543 -0.23053543
  -0.23053543]
 [-0.2663606  -0.2663606  -0.2663606  ... -0.24494916 -0.24494916
  -0.24494916]
 [-0.2087196  -0.2087196  -0.2087196  ... -0.25625402 -0.25625402
  -0.25625402]
 ...
 [-0.2351336  -0.2351336  -0.2351336  ... -0.24896231 -0.24896231
  -0.24896231]
 [-0.23828721 -0.23828721 -0.23828721 ... -0.30302948 -0.30302948
  -0.30302948]
 [-0.26398286 -0.26398286 -0.26398286 ... -0.22848614 -0.22848614
  -0.22848614]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.24402103 -0.24402103 -0.24402103 ... -0.20474172 -0.20474172
  -0.20474172]
 [-0.23009908 -0.23009908 -0.23009908 ... -0.21608531 -0.21608531
  -0.21608531]
 [-0.24618314 -0.24618314 -0.24618314 ... -0.22316566 -0.22316566
  -0.22316566]
 ...
 [-0.23506153 -0.23506153 -0.23506153 ... -0.20431197 -0.20431197
  -0.20431197]
 [-0.27290025 -0.27290025 -0.27290025 ... -0.20350437 -0.20350437
  -0.20350437]
 [-0.22526461 -0.

}PerReplica:{
  0: tf.Tensor(
[[-0.26984927 -0.26984927 -0.26984927 ... -0.24529219 -0.24529219
  -0.24529219]
 [-0.25112462 -0.25112462 -0.25112462 ... -0.25174344 -0.25174344
  -0.25174344]
 [-0.22615665 -0.22615665 -0.22615665 ... -0.23483276 -0.23483276
  -0.23483276]
 ...
 [-0.23128834 -0.23128834 -0.23128834 ... -0.26323932 -0.26323932
  -0.26323932]
 [-0.25968194 -0.25968194 -0.25968194 ... -0.25421366 -0.25421366
  -0.25421366]
 [-0.22657561 -0.22657561 -0.22657561 ... -0.30627003 -0.30627003
  -0.30627003]], shape=(8192, 40), dtype=float32),
  1: tf.Tensor(
[[-0.17424306 -0.17424306 -0.17424306 ... -0.23624444 -0.23624444
  -0.23624444]
 [-0.21892571 -0.21892571 -0.21892571 ... -0.26961914 -0.26961914
  -0.26961914]
 [-0.24676792 -0.24676792 -0.24676792 ... -0.22595768 -0.22595768
  -0.22595768]
 ...
 [-0.26171774 -0.26171774 -0.26171774 ... -0.20032802 -0.20032802
  -0.20032802]
 [-0.22521436 -0.22521436 -0.22521436 ... -0.22760205 -0.22760205
  -0.22760205]
 [-0.28012815 -0.

 [1. 1. 1. ... 1. 1. 1.]], shape=(65536, 40), dtype=float32)
------------------------------ 1 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[0.8010717  0.8010717  0.8010717  ... 0.80135775 0.80135775 0.80135775]
 [0.8019426  0.8019426  0.8019426  ... 0.80218947 0.80218947 0.80218947]
 [0.8017676  0.8017676  0.8017676  ... 0.80201226 0.80201226 0.80201226]
 ...
 [0.80214393 0.80214393 0.80214393 ... 0.8014156  0.8014156  0.8014156 ]
 [0.8008634  0.8008634  0.8008634  ... 0.8021169  0.8021169  0.8021169 ]
 [0.80100465 0.80100465 0.80100465 ... 0.80109835 0.80109835 0.80109835]], shape=(65536, 40), dtype=float32)
[INFO]: embedding_vector:
 tf.Tensor(
[[0.90052605 0.90052605 0.90052605 ... 0.9005313  0.9005313  0.9005313 ]
 [0.90055907 0.90055907 0.90055907 ... 0.9005337  0.9005337  0.9005337 ]
 [0.90046984 0.90046984 0.90046984 ... 0.9006437  0.9006437  0.9006437 ]
 ...
 [0.90049905 0.90049905 0.90049905 ... 0.90063065 0.90063065 0.90063065]
 [0.9005899  0.9005899 

------------------------------
------------------------------ 9 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[0.23505837 0.23505837 0.23505837 ... 0.23979488 0.23979488 0.23979488]
 [0.24067074 0.24067074 0.24067074 ... 0.23883832 0.23883832 0.23883832]
 [0.23718816 0.23718816 0.23718816 ... 0.2408967  0.2408967  0.2408967 ]
 ...
 [0.24934453 0.24934453 0.24934453 ... 0.2331647  0.2331647  0.2331647 ]
 [0.2427354  0.2427354  0.2427354  ... 0.23341992 0.23341992 0.23341992]
 [0.25642774 0.25642774 0.25642774 ... 0.23338681 0.23338681 0.23338681]], shape=(65536, 40), dtype=float32)
------------------------------ 9 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[0.16279979 0.16279979 0.16279979 ... 0.17118676 0.17118676 0.17118676]
 [0.16621807 0.16621807 0.16621807 ... 0.16611832 0.16611832 0.16611832]
 [0.1661632  0.1661632  0.1661632  ... 0.16512689 0.16512689 0.16512689]
 ...
 [0.17154846 0.17154846 0.17154846 ... 0.16619283 0.16619283 0

  -0.1809609 ]], shape=(65536, 40), dtype=float32)
------------------------------ 16 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.1743881  -0.1743881  -0.1743881  ... -0.16145132 -0.16145132
  -0.16145132]
 [-0.13499284 -0.13499284 -0.13499284 ... -0.15988389 -0.15988389
  -0.15988389]
 [-0.16099232 -0.16099232 -0.16099232 ... -0.17622605 -0.17622605
  -0.17622605]
 ...
 [-0.1669551  -0.1669551  -0.1669551  ... -0.16675307 -0.16675307
  -0.16675307]
 [-0.1811203  -0.1811203  -0.1811203  ... -0.16348746 -0.16348746
  -0.16348746]
 [-0.1748159  -0.1748159  -0.1748159  ... -0.1809609  -0.1809609
  -0.1809609 ]], shape=(65536, 40), dtype=float32)
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.18558341 -0.18558341 -0.18558341 ... -0.22941414 -0.22941414
  -0.22941414]
 [-0.19239876 -0.19239876 -0.19239876 ... -0.21053864 -0.21053864
  -0.21053864]
 [-0.19028914 -0.19028914 -0.19028914 ... -0.2164091  -0.2164091
  -0.2164091 ]
 ...
 [-0.17878489 -0.17878489 -0.178784

------------------------------ 21 ------------------------------
------------------------------ 23 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.319013   -0.319013   -0.319013   ... -0.31606743 -0.31606743
  -0.31606743]
 [-0.3031985  -0.3031985  -0.3031985  ... -0.29684377 -0.29684377
  -0.29684377]
 [-0.3093907  -0.3093907  -0.3093907  ... -0.3269143  -0.3269143
  -0.3269143 ]
 ...
 [-0.31462306 -0.31462306 -0.31462306 ... -0.3337177  -0.3337177
  -0.3337177 ]
 [-0.30985603 -0.30985603 -0.30985603 ... -0.3014966  -0.3014966
  -0.3014966 ]
 [-0.30994752 -0.30994752 -0.30994752 ... -0.33786583 -0.33786583
  -0.33786583]], shape=(65536, 40), dtype=float32)
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.3010753  -0.3010753  -0.3010753  ... -0.26137453 -0.26137453
  -0.26137453]
 [-0.2920072  -0.2920072  -0.2920072  ... -0.26624942 -0.26624942
  -0.26624942]
 [-0.29443598 -0.29443598 -0.29443598 ... -0.31122905 -0.31122905
  -0.31122905]
 ...
 [-0.2656673  -0.26566

  -0.18044093]], shape=(65536, 40), dtype=float32)
[INFO]: embedding_vector:
------------------------------  30 tf.Tensor(
[[-0.20522626 -0.20522626 -0.20522626 ... -0.17386943 -0.17386943
  -0.17386943]
 [-0.2058292  -0.2058292  -0.2058292  ... -0.17246705 -0.17246705
  -0.17246705]
 [-0.21884426 -0.21884426 -0.21884426 ... -0.21649659 -0.21649659
  -0.21649659]
 ...
 [-0.21075189 -0.21075189 -0.21075189 ... -0.21303983 -0.21303983
  -0.21303983]
 [-0.23248424 -0.23248424 -0.23248424 ... -0.22775568 -0.22775568
  -0.22775568]
 [-0.22323845 -0.22323845 -0.22323845 ... -0.22112778 -0.22112778
  -0.22112778]], shape=(65536, 40), dtype=float32)------------------------------

------------------------------ 28 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.19175145 -0.19175145 -0.19175145 ... -0.16623998 -0.16623998
  -0.16623998]
 [-0.16289337 -0.16289337 -0.16289337 ... -0.1725828  -0.1725828
  -0.1725828 ]
 [-0.16131778 -0.16131778 -0.16131778 ... -0.16490965 -

  -0.11739247]], shape=(65536, 40), dtype=float32)
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.15180266 -0.15180266 -0.15180266 ... -0.15956396 -0.15956396
  -0.15956396]
 [-0.18056567 -0.18056567 -0.18056567 ... -0.11601666 -0.11601666
  -0.11601666]
 [-0.10885726 -0.10885726 -0.10885726 ... -0.17777488 -0.17777488
  -0.17777488]
 ...
 [-0.12706445 -0.12706445 -0.12706445 ... -0.12976485 -0.12976485
  -0.12976485]
 [-0.12794422 -0.12794422 -0.12794422 ... -0.13762495 -0.13762495
  -0.13762495]
 [-0.11606186 -0.11606186 -0.11606186 ... -0.1576132  -0.1576132
  -0.1576132 ]], shape=(65536, 40), dtype=float32)
------------------------------ 37 ------------------------------
------------------------------ 34 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.11994702 -0.11994702 -0.11994702 ... -0.16632766 -0.16632766
  -0.16632766]
 [-0.12410761 -0.12410761 -0.12410761 ... -0.17561823 -0.17561823
  -0.17561823]
 [-0.13765712 -0.13765712 -0.13765712 ... -0.15519011 -

  -0.15296361]], shape=(65536, 40), dtype=float32)
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.21763018 -0.21763018 -0.21763018 ... -0.21754324 -0.21754324
  -0.21754324]
 [-0.19721556 -0.19721556 -0.19721556 ... -0.20997626 -0.20997626
  -0.20997626]
 [-0.18108685 -0.18108685 -0.18108685 ... -0.17802556 -0.17802556
  -0.17802556]
 ...
 [-0.17719774 -0.17719774 -0.17719774 ... -0.1929565  -0.1929565
  -0.1929565 ]
 [-0.20807157 -0.20807157 -0.20807157 ... -0.18814465 -0.18814465
  -0.18814465]
 [-0.21611315 -0.21611315 -0.21611315 ... -0.1911238  -0.1911238
  -0.1911238 ]], shape=(65536, 40), dtype=float32)
------------------------------ 40 ------------------------------
------------------------------ 44 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.1847917  -0.1847917  -0.1847917  ... -0.18993592 -0.18993592
  -0.18993592]
 [-0.17649817 -0.17649817 -0.17649817 ... -0.17080042 -0.17080042
  -0.17080042]
 [-0.18804726 -0.18804726 -0.18804726 ... -0.15337943 -0

------------------------------ 46 ------------------------------
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.22544068 -0.22544068 -0.22544068 ... -0.23053782 -0.23053782
  -0.23053782]
 [-0.26636156 -0.26636156 -0.26636156 ... -0.24495119 -0.24495119
  -0.24495119]
 [-0.20872188 -0.20872188 -0.20872188 ... -0.25625733 -0.25625733
  -0.25625733]
 ...
 [-0.2380555  -0.2380555  -0.2380555  ... -0.26428485 -0.26428485
  -0.26428485]
 [-0.2280464  -0.2280464  -0.2280464  ... -0.22280021 -0.22280021
  -0.22280021]
 [-0.25092384 -0.25092384 -0.25092384 ... -0.2429116  -0.2429116
  -0.2429116 ]], shape=(65536, 40), dtype=float32)
------------------------------ 47 ------------------------------
[INFO] loadded from file ./sok_embedding_vectors_0.file
[INFO]: embedding_vector:
 tf.Tensor(
[[-0.22355214 -0.22355214 -0.22355214 ... -0.23591082 -0.23591082
  -0.23591082]
 [-0.27213818 -0.27213818 -0.27213818 ... -0.25591666 -0.25591666
  -0.25591666]
 [-0.27956653 -0.27956653 -0.27956653 ... -0.24207

If no exception is raised and the embedding vectors obtained from TensorFlow and SOK are consistent, a message similar to the following is printed.
```shell
"[INFO]: With MultiWorkerMirroredStrategy, when args["local_gpu_num"] GPUs are used for each node and total_gpu_num GPUs in total, the embedding vectors obtained from TensorFlow and SOK are consistent for 50 iterations"
```
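
The consistency check behind this message can be pictured as an element-wise comparison within a tolerance. The following is only a minimal sketch, not the notebook's actual implementation: the helper name `embeddings_consistent`, the tolerance values, and the sample arrays are all assumptions made for illustration. It assumes the per-iteration embedding vectors from both runs have already been collected as NumPy arrays.

```python
import numpy as np


def embeddings_consistent(sok_vectors, tf_vectors, atol=1e-4, rtol=1e-4):
    """Hypothetical helper: compare two lists of per-iteration embedding arrays.

    Returns True only if both lists have the same length and every pair of
    arrays has the same shape and matches element-wise within the tolerances.
    """
    if len(sok_vectors) != len(tf_vectors):
        return False
    for sok, ref in zip(sok_vectors, tf_vectors):
        sok, ref = np.asarray(sok), np.asarray(ref)
        if sok.shape != ref.shape:
            return False
        if not np.allclose(sok, ref, atol=atol, rtol=rtol):
            return False
    return True


# Illustrative data only: two iterations of small arrays that differ slightly,
# the way floating-point results from two implementations typically do.
sok_run = [np.full((4, 3), 0.80107, dtype=np.float32),
           np.full((4, 3), 0.60496, dtype=np.float32)]
tf_run = [np.full((4, 3), 0.80108, dtype=np.float32),
          np.full((4, 3), 0.60497, dtype=np.float32)]

if embeddings_consistent(sok_run, tf_run):
    print(f"[INFO]: embedding vectors are consistent for {len(sok_run)} iterations")
```

A tolerance-based comparison is used here instead of exact equality because the two implementations may accumulate floating-point rounding differently. The tolerances above are placeholders and should be tuned to the data type and number of iterations.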