In [1]:

import pandas as pd
from src.learner import *
import time

2024-02-21 20:24:18.986531: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-21 20:24:19.086042: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX_VNNI, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Some tuning for the `kidwords` set of words.

In [2]:
# inputs and outputs
X = np.genfromtxt('data/kidwords/orth-kid.csv', delimiter=",")
Y = np.genfromtxt('data/kidwords/phon-kid.csv', delimiter=",")
words = pd.read_csv('data/kidwords/kidwords.csv', header=None)[0].tolist()

For tuning we will use a random sample of the same size that our samples will eventually be. This involves allocating 600 words for test, and the rest for train - but not using our pre-allocated samples for the purpose.

In [3]:
np.random.seed(982)

target_train_size = 300

train_n = X.shape[0]
test_n = train_n-target_train_size

sample = np.full(train_n, True, dtype=bool)

indices = np.random.choice(train_n, test_n, replace=False)

# Set chosen indices to True because they select the test items not the train items
sample[indices] = False

Limited search across HPs...

In [5]:
seed = 387

In [None]:
with open('outputs/tune_kidwords_1.csv', 'w') as f:
    f.write("{},{},{},{},{},{},{},{},{},{},{}\n".format(
                                            "hidden_units",
                                            "learning_rate",
                                             "batch_size",
                                             "epochs",
                                             "loss_train",
                                             "accuracy_train",
                                             "mse_train",
                                             "loss_test",
                                             "accuracy_test",
                                             "mse_test",
                                             "time"))
    for learning_rate in [.01, .025, .05, .075, .1, .15, .2, .25, None]: 
        for batch_size in [16, 32, 64, 96, 128, 256]:
            for epochs in [50, 100, 150, 200, 250, 300]:
                for hidden in [80, 100, 120]:

                    if learning_rate is not None:
                        optimizer = Adam(learning_rate=learning_rate)
                    if learning_rate is None:
                        optimzer = None

                    model = learner(X, Y, seed, hidden, optimizer=None)
                    
                    start_time = time.time()


                    model.fit(X[sample], Y[sample], epochs=epochs, batch_size=batch_size, verbose=False)

                    end_time = time.time()
                    runtime = end_time - start_time

                    loss_train, accuracy_train, mse_train = model.evaluate(X[sample], Y[sample], verbose=0) 
                    loss_test, accuracy_test, mse_test = model.evaluate(X[~sample], Y[~sample], verbose=0) 

                    f.write("{},{},{},{},{},{},{},{},{},{},{}\n".format(
                                                    hidden,
                                                    learning_rate,
                                                    batch_size,
                                                    epochs,
                                                    loss_train,
                                                    accuracy_train,
                                                    mse_train,
                                                    loss_test,
                                                    accuracy_test,
                                                    mse_test,
                                                    runtime))
f.close()

The following configuration is the peak performer for models with 100 hidden units. These models are trivially different in performance than those for the 120 hidden unit versions, and outperform the 120 hidden unit versions on the holdout set By a very small amount. See `tune_kidwords.Rmd` for a summary of performance.

train_accuracy = 0.997  
test_accuracy = 0.986  
time = 4.74 seconds

Instead of 300 epochs, for speed we will go with 50...the differences in end performance are trivial (~ .0005 on binary accuracy difference)

In [8]:
hidden = 100
learning_rate = 0.01
batch_size = 16
epochs = 50

Get train and test indices

In [9]:
train_indices = np.where(sample)[0]
test_indices = np.where(~sample)[0]

In [10]:
split = []

for i, word in enumerate(words):
    if i in train_indices:
        split.append('train')
    elif i in test_indices:
        split.append('test')

In [16]:
model = learner(X, Y, seed=seed, hidden=hidden, optimizer=Adam(learning_rate=learning_rate))
        
start_time = time.time()

model.fit(X[sample], Y[sample], epochs=50, batch_size=batch_size, verbose=True)

end_time = time.time()
runtime = end_time - start_time
print("Run time...", runtime)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Run time... 1.3481016159057617


This configuration will do for brute force runs. I'll run those with 10K iterations across several values for hidden units and see where that gets us. See brute_force_1.ipynb for the next step.