In [67]:
import tensorflow as tf
import pandas as pd
import numpy as np
import random

tf.data.experimental.enable_debug_mode()

Load the saved model

In [68]:
model = tf.keras.models.load_model('./dnn_model_keras')

I will make up some new inputs. These inputs are purposefully extreme so that we can verify that the model has been updated. I am initilializing an array of inputs with very low salaries for low years of experience. This should result in lower predictions for low years of experience after retraining.

In [63]:
# 10,000 inputs with 1 years exp and random salary between 20,000 and 30,000 
new_inputs_count = 10_000
new_inputs_labels = pd.Series([random.randint(20_000, 30_000)
                    for x in range(new_inputs_count)])
new_inputs_features = pd.DataFrame([{"YearsCode": 1, "YearsCodePro": 1} for x in range(new_inputs_count)])

This is what the original model predicts for someone with 1 years coding and coding professionally experience.

In [64]:
def predict1YOE():
    return model.predict(np.array([1, 1]))[0][0]

predict1YOE()

40783.863

Now I will refit the model using the new inputs.

Fitting additional data to the tensorflow model will update (not override) the currently model. This is documented in several online discussions including by the creator Chollet, here https://github.com/keras-team/keras/issues/4446

This should also be obvious from the result - the original prediction for someone with 1 year of experience coding and coding professionally was about $41,000. Our new inputs are 30,000 at maximum. So if it overrode the model it should not predict above 30,000. 

We can also verify that the new data with extremely low salaries for low experience makes lower predictions for someone with 1 years of exerpience coding and coding professionally.

In [65]:
model.fit(
    new_inputs_labels,
    new_inputs_features,
    validation_split=0.2,
    verbose=0, epochs=10)

<keras.callbacks.History at 0x7fd200a064a0>

In [66]:
predict1YOE()

37617.83

This outcome verifies that our model was updated succesfully. The prediction is lower than $41,000 but clearly not based solely off the new data ranging beween 20,000 and 30,000.

This notebook is essentially what the `retrainer` service will do, but we will get the new inputs from the database and save the resulting model so that it can be reused.