# HS04 model

## Phase 1

> The semantic component consisted of the 1,989 semantic features described above. These units were all connected to 50 units in the semantic cleanup apparatus...

- *50 sem_cleanup*

> The phonological representation consisted of the 200 phonological units (eight slots of 25 units each), which projected onto a set of 50 phonological cleanup units. These...

- *50 pho_cleanup*

> The semantic component mapped onto the phonological component via a set of 500 hidden units. There was feedback in both directions. 

- *500 sem_pho_hidden_units*
- *500 pho_sem_hidden_units*
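The sketch below gives a sense of the Phase 1 network's scale by counting connection weights. It uses the pool sizes from the parameters block further down (2,446 semantic and 250 phonological units), not the paper's 1,989 and 200. It assumes full connectivity between the pools listed above, ignores biases, and ignores any extra connections `modeling.HS04Model` may add. The pathway names are illustrative only.

```python
# Pool sizes taken from the parameters block of this notebook
SIZES = {
    "sem": 2446,
    "pho": 250,
    "sem_cleanup": 50,
    "pho_cleanup": 50,
    "hidden_sp": 500,  # semantics -> phonology hidden units
    "hidden_ps": 500,  # phonology -> semantics hidden units
}

# Assumed fully connected projections (source, target); biases are ignored
PROJECTIONS = [
    ("sem", "sem_cleanup"), ("sem_cleanup", "sem"),
    ("pho", "pho_cleanup"), ("pho_cleanup", "pho"),
    ("sem", "hidden_sp"), ("hidden_sp", "pho"),
    ("pho", "hidden_ps"), ("hidden_ps", "sem"),
]


def count_weights(sizes, projections):
    """Return the number of weights per projection and in total."""
    per_projection = {
        f"{src}->{dst}": sizes[src] * sizes[dst] for src, dst in projections
    }
    return per_projection, sum(per_projection.values())


per_projection, total = count_weights(SIZES, PROJECTIONS)
for name, n in per_projection.items():
    print(f"{name:>20}: {n:,}")
print(f"{'total':>20}: {total:,}")
```

Under these assumptions, the two projections between the semantic layer and the hidden units dominate the count, because the semantic layer is by far the largest pool.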

> The phonological form of the target word was clamped on the phonological units for 2.66 units of time. Then a target signal was provided for the next 1.33 units of time, in which the network was required to retain the phonological pattern in the absence of external clamping. 

- *4 output_ticks* 
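With continuous-time units, each tick advances time by `tau`. The sketch below converts the clamp/free schedule above into ticks, using `tau = 1/3` and `max_unit_time = 4.0` from the parameters block.

```python
tau = 1 / 3          # time step size (from the parameters block)
max_unit_time = 4.0  # total unit time per trial

n_timesteps = round(max_unit_time / tau)  # ticks per trial
clamped_ticks = round(2.66 / tau)         # about 7.98, i.e. 8 clamped ticks
free_ticks = round(1.33 / tau)            # about 3.99, i.e. 4 unclamped ticks

print(n_timesteps, clamped_ticks, free_ticks)
```

The 4 unclamped ticks line up with `output_ticks = 4`. The target is only compared against the last 4 of the 12 ticks.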

> In Harm and Seidenberg (1999), auto-connections were used to give the units a tendency to retain their value but gradually decay. To accomplish the task, the network had to learn enough of the statistical regularities of the representations to prevent this decay. In the current simulations, the idea is the same, but because continuous time units were used, auto-connections were not necessary to provide the units with a tendency to gradually decay; this was part of the units’ normal processing dynamics.
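The decay mentioned above comes from how continuous-time units integrate their input. The sketch assumes a common discretization; check `modeling.py` for the exact form. Each tick, the unit's running net input moves a fraction `tau` toward the new input. Once the external input is removed and nothing else drives the unit, the accumulated input shrinks by a factor of `1 - tau` per tick. Its activation therefore drifts back toward `sigmoid(0) = 0.5`, and the attractor has to learn to counteract this.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def run_unit(net_inputs, tau=1 / 3, x0=0.0):
    """Integrate one continuous-time unit over a sequence of net inputs.

    Assumed update: x(t) = tau * net(t) + (1 - tau) * x(t - 1),
    with activation = sigmoid(x(t)).
    """
    x = x0
    acts = []
    for net in net_inputs:
        x = tau * net + (1 - tau) * x
        acts.append(sigmoid(x))
    return np.array(acts)


# 8 ticks driven by a strong input, then 4 ticks with no input at all
acts = run_unit([6.0] * 8 + [0.0] * 4)
print(np.round(acts, 3))
```

The 8 + 4 split mirrors the clamp/free schedule. On the 4 free ticks the activation falls steadily, although it has not yet returned to 0.5.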

> HS99: This makes it easier to read weights as correlations between units. Each phonological unit has an auto-connection: a weight set to 0.75 and frozen to that value.

- *No frozen auto-connections*

> HS04: These trials were devoted to training the semantic attractor. This task was constructed to be analogous to the phonological task: The pattern of semantic units corresponding to the selected word was clamped onto the units for 2.66 units of time, and the network was allowed to cycle. Then the semantic units were unclamped, and the network’s task was to maintain their activity in the face of the tendency of the units’ activity to decay for 1.33 units of time. 

- *Attractor clamped for 8 steps, free for the last 4 steps*
- Implemented only in modeling.py, not in the generator: the generator still produces inputs for every time tick, and the model simply drops the extra ones.
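An attractor trial can be pictured as a per-tick input schedule. The word's pattern is supplied for the first 8 ticks and withheld for the last 4, and the target is only checked on the free ticks. The sketch also mimics the note above: it takes a generator-style sequence with one input per tick and drops the inputs for the free ticks. The helper name `attractor_schedule` and the use of `None` for "unclamped" are illustrative choices. They are not how `modeling.py` or the generator actually encode the schedule.

```python
import numpy as np


def attractor_schedule(pattern, n_timesteps=12, clamped_ticks=8):
    """Split a generator-style input sequence into clamped and free ticks.

    The generator is assumed to repeat the same pattern on every tick.
    Inputs on the free ticks are dropped (None means "not clamped").
    """
    generated = [pattern] * n_timesteps  # one copy per tick, as a generator would yield
    inputs = generated[:clamped_ticks] + [None] * (n_timesteps - clamped_ticks)
    scored_ticks = list(range(clamped_ticks, n_timesteps))  # ticks compared with the target
    return inputs, scored_ticks


pattern = np.array([1, 0, 1, 1, 0], dtype=np.float32)  # toy 5-unit pattern
inputs, scored = attractor_schedule(pattern)
print([x is not None for x in inputs])
print(scored)
```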

```python
%reload_ext lab_black
import pickle, os
import tensorflow as tf
import numpy as np
import pandas as pd
import altair as alt
from IPython.display import clear_output

import meta, data_wrangling, modeling, metrics, evaluate

# meta.set_gpu_mem_cap()
```

# Parameters block (for papermill)

```python
code_name = "testing_testset_metrics"
tf_root = "/home/jupyter/tf"

# Model architecture
ort_units = 119
pho_units = 250
sem_units = 2446

hidden_os_units = 500  # P2
hidden_op_units = 100  # P2
hidden_ps_units = 500
hidden_sp_units = 500

pho_cleanup_units = 50
sem_cleanup_units = 50

pho_noise_level = 0.0  # P3
sem_noise_level = 0.0  # P3

activation = "sigmoid"
tau = 1 / 3
max_unit_time = 4.0
output_ticks = 4

# Training
sample_name = "hs04"

rng_seed = 2021
learning_rate = 0.01
n_mil_sample = 1.5
batch_size = 100
save_freq = 10
```
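The training loop below does not reference `save_freq` directly. It hard-codes the same value of 10, saving weights after each of the first 10 epochs and then every 10th epoch. The hypothetical helper below (not part of `meta`) mirrors that condition and lists the resulting checkpoint epochs.

```python
def checkpoint_epochs(total_epochs, save_freq=10, warmup=10):
    """1-indexed epochs at which weights are saved, mirroring the loop's condition:
    every epoch during the first `warmup` epochs, then every `save_freq`-th epoch."""
    return [
        epoch + 1
        for epoch in range(total_epochs)
        if epoch < warmup or (epoch + 1) % save_freq == 0
    ]


print(checkpoint_epochs(35))
```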

```python
# cfg = meta.ModelConfig.from_json(os.path.join("models", code_name, "model_config.json"))
```

```python
# Load global cfg variables into a dictionary for feeding into ModelConfig()
# Core configs are mandatory: a missing variable raises KeyError here
config_dict = {v: globals()[v] for v in meta.CORE_CONFIGS}

# Optional configs are only included if they are defined in this notebook
config_dict.update({v: globals()[v] for v in meta.OPTIONAL_CONFIGS if v in globals()})

# Construct ModelConfig object
cfg = meta.ModelConfig(**config_dict)
cfg.save()
del config_dict
```

```text
init from scratch
Saved config json to /home/jupyter/tf/models/testing_testset_metrics/model_config.json
```


# Build model and all supporting components

```python
tf.random.set_seed(cfg.rng_seed)
data = data_wrangling.MyData()
model = modeling.HS04Model(cfg)

sampler = data_wrangling.FastSampling(cfg, data)
generators = {
    "pho_sem": sampler.sample_generator(x="pho", y="sem"),
    "sem_pho": sampler.sample_generator(x="sem", y="pho"),
    "pho_pho": sampler.sample_generator(x="pho", y="pho"),
    "sem_sem": sampler.sample_generator(x="sem", y="sem"),
}

# Instantiate optimizer for each task
optimizers = {
    "pho_pho": tf.keras.optimizers.Adam(learning_rate=cfg.learning_rate),
    "sem_sem": tf.keras.optimizers.Adam(learning_rate=cfg.learning_rate),
    "pho_sem": tf.keras.optimizers.Adam(learning_rate=cfg.learning_rate),
    "sem_pho": tf.keras.optimizers.Adam(learning_rate=cfg.learning_rate),
}

# Instantiate loss_fn for each task
loss_fns = {
    "pho_pho": tf.keras.losses.BinaryCrossentropy(),
    "sem_sem": tf.keras.losses.BinaryCrossentropy(),
    "pho_sem": tf.keras.losses.BinaryCrossentropy(),
    "sem_pho": tf.keras.losses.BinaryCrossentropy(),
}

# Mean loss (for TensorBoard)
train_losses = {
    "pho_pho": tf.keras.metrics.Mean("train_loss_pho_pho", dtype=tf.float32),
    "sem_sem": tf.keras.metrics.Mean("train_loss_sem_sem", dtype=tf.float32),
    "pho_sem": tf.keras.metrics.Mean("train_loss_pho_sem", dtype=tf.float32),
    "sem_pho": tf.keras.metrics.Mean("train_loss_sem_pho", dtype=tf.float32),
}

# Train metrics
train_acc = {
    "pho_pho": metrics.PhoAccuracy("acc_pho_pho"),
    "sem_sem": metrics.RightSideAccuracy("acc_sem_sem"),
    "pho_sem": metrics.RightSideAccuracy("acc_pho_sem"),
    "sem_pho": metrics.PhoAccuracy("acc_sem_pho"),
}
```

## Train step for each task

```python
# Each sub-task has its own optimizer state, so it must be trained with a separate
# optimizer instead of sharing one instance (https://github.com/tensorflow/tensorflow/issues/27120)


def get_train_step():
    """Create a new tf.function train step (called once per task, so each task gets its own trace)"""

    @tf.function
    def train_step(x, y, model, task, loss_fn, optimizer, train_metric, train_loss):

        # Only update the weights that belong to the active task
        train_weights_name = [name + ":0" for name in modeling.WEIGHTS_AND_BIASES[task]]
        train_weights = [w for w in model.weights if w.name in train_weights_name]

        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            loss_value = loss_fn(y, y_pred)

        grads = tape.gradient(loss_value, train_weights)
        optimizer.apply_gradients(zip(grads, train_weights))

        # Mean loss for TensorBoard
        train_loss.update_state(loss_value)

        # Metric for the last time tick (output first dimension is time ticks, from -cfg.output_ticks to the end)
        train_metric.update_state(tf.cast(y[-1], tf.float32), y_pred[-1])

    return train_step


train_steps = {
    "pho_pho": get_train_step(),
    "pho_sem": get_train_step(),
    "sem_sem": get_train_step(),
    "sem_pho": get_train_step(),
}
```
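`train_step` above only updates the weights that belong to the active task. It appends `":0"` to each name in `modeling.WEIGHTS_AND_BIASES[task]` because TensorFlow variable names carry an output-index suffix. The plain-Python sketch below isolates that filtering step. The variable names and the task-to-weights mapping are made up for illustration and do not reflect the real contents of `modeling.WEIGHTS_AND_BIASES`.

```python
# Hypothetical mapping from task to the weight names it trains
WEIGHTS_AND_BIASES = {
    "pho_sem": ["w_ps", "w_hs", "bias_hps", "bias_s"],
    "sem_sem": ["w_ss", "w_sc", "w_cs", "bias_s", "bias_css"],
}

# Hypothetical full list of model variable names, with TF's ":0" suffix
ALL_VARIABLE_NAMES = [
    "w_ps:0", "w_hs:0", "w_ss:0", "w_sc:0", "w_cs:0",
    "w_sp:0", "bias_hps:0", "bias_s:0", "bias_css:0",
]


def select_trainable(task, all_names, mapping):
    """Return the variable names that receive gradients for `task`."""
    wanted = {name + ":0" for name in mapping[task]}
    return [name for name in all_names if name in wanted]


print(select_trainable("sem_sem", ALL_VARIABLE_NAMES, WEIGHTS_AND_BIASES))
```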

## Test step

```python
# Test metrics
test_metrics = {}
test_metrics["homophone"] = {
    "pho_sem": metrics.RightSideAccuracy("homophone_acc_pho_sem"),
    "sem_pho": metrics.PhoAccuracy("homophone_acc_sem_pho"),
}

test_metrics["non_homophone"] = {
    "pho_sem": metrics.RightSideAccuracy("non_homophone_acc_pho_sem"),
    "sem_pho": metrics.PhoAccuracy("non_homophone_acc_sem_pho"),
}


@tf.function
def test_step(tasks, testsets):
    # The Python loops over these lists are unrolled during tracing,
    # so set_active_task() only runs when the function is traced
    for testset in testsets:
        for task in tasks:
            model.set_active_task(task)
            x_name, y_name = task.split("_")
            # Present the same input on every time tick
            x = [data.testsets[testset][x_name]] * cfg.n_timesteps
            y = data.testsets[testset][y_name]

            pred_y = model(x, training=False)
            # Score the last time tick only
            test_metrics[testset][task].update_state(y, pred_y[-1])
```

# Train model

```python
import time

model.build()
phase1_tasks = ["pho_sem", "sem_pho", "pho_pho", "sem_sem"]
phase1_tasks_probability = [0.4, 0.4, 0.1, 0.1]

# TensorBoard writer
train_summary_writer = tf.summary.create_file_writer(cfg.path["tensorboard_folder"])

for epoch in range(cfg.total_number_of_epoch):
    start_time = time.time()

    for step in range(cfg.steps_per_epoch):
        # Intermix tasks (draw a new task at each step)
        task = np.random.choice(phase1_tasks, p=phase1_tasks_probability)
        x_batch_train, y_batch_train = next(generators[task])
        model.set_active_task(task)  # task switching must be done outside train_step

        train_steps[task](
            x_batch_train,
            y_batch_train,
            model,
            task,
            loss_fns[task],
            optimizers[task],
            train_acc[task],
            train_losses[task],
        )

    # End of epoch operations

    ## Write logs to TensorBoard
    with train_summary_writer.as_default():
        ### Losses
        for x in train_losses.keys():
            tf.summary.scalar(f"loss_{x}", train_losses[x].result(), step=epoch)

        ### Metrics
        for x in train_acc.keys():
            tf.summary.scalar(f"acc_{x}", train_acc[x].result(), step=epoch)

        ### Weight histograms
        for w in model.weights:
            tf.summary.histogram(w.name, w, step=epoch)

    ## Print status
    compute_time = time.time() - start_time
    print(f"Epoch {epoch + 1} trained for {compute_time:.0f}s")
    print(
        "Losses:",
        [f"{x}: {train_losses[x].result().numpy()}" for x in phase1_tasks],
    )
    clear_output(wait=True)

    ## Save weights (every epoch for the first 10 epochs, then every 10th epoch)
    if (epoch < 10) or ((epoch + 1) % 10 == 0):
        weight_path = cfg.path["weights_checkpoint_fstring"].format(epoch=epoch + 1)
        model.save_weights(weight_path, overwrite=True, save_format="tf")

    ## Reset metrics and losses
    for x in train_losses.keys():
        train_losses[x].reset_states()
    for x in train_acc.keys():
        train_acc[x].reset_states()

    # Test model
    test_sets = ["homophone", "non_homophone"]
    test_tasks = ["pho_sem", "sem_pho"]

    # Run evaluation with tf.function
    test_step(tasks=test_tasks, testsets=test_sets)

    # Log to TensorBoard
    with train_summary_writer.as_default():
        for testset in test_sets:
            for task in test_tasks:
                # Write
                tf.summary.scalar(
                    f"acc_{testset}_{task}",
                    test_metrics[testset][task].result(),
                    step=epoch,
                )

                # Reset
                test_metrics[testset][task].reset_states()


# End of training ops
# model.save(cfg.path["save_model_folder"])
print("Done")
```

```text
Done
```
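Each training step draws its task independently from `phase1_tasks_probability`, so over many steps the mix converges to 40/40/10/10. The sketch below checks this with NumPy's `Generator.choice`. The loop above uses the global `np.random.choice`, which `tf.random.set_seed` does not seed. Unless NumPy is seeded separately, the task sequence will differ from run to run.

```python
import numpy as np

phase1_tasks = ["pho_sem", "sem_pho", "pho_pho", "sem_sem"]
phase1_tasks_probability = [0.4, 0.4, 0.1, 0.1]

rng = np.random.default_rng(2021)  # seeded explicitly, unlike the loop above
draws = rng.choice(phase1_tasks, size=100_000, p=phase1_tasks_probability)

# Empirical share of steps spent on each task
tasks, counts = np.unique(draws, return_counts=True)
share = dict(zip(tasks, counts / counts.sum()))
for task in phase1_tasks:
    print(f"{task}: {share[task]:.3f}")
```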


# Evaluate model

```python
import tensorboard as tb

experiment_id = "1UgPJK4xQTmeJ7uWVXcWCw"
experiment = tb.data.experimental.ExperimentFromDev(experiment_id)
df = experiment.get_scalars()

selection = alt.selection_multi(fields=["tag"], bind="legend")

alt.Chart(df).mark_line().encode(
    x="step:Q",
    y="value",
    color="tag",
    opacity=alt.condition(selection, alt.value(1), alt.value(0)),
).add_selection(selection).properties(title="All metrics").interactive()
```

```python
# SSH tunnel from the local machine to TensorBoard running on the cloud VM
# gcloud compute ssh tensorflow-2-4-20210120-000018 --zone us-east4-b -- -L 6006:localhost:6006
# Upload the logs to TensorBoard.dev
# !tensorboard dev upload --logdir tensorboard_log
```