Let's look at a simple example of using transfer learning to build upon a previously trained neural network. We will begin by training a neural network for Fisher's Iris Dataset. This network takes four measurements and classifies each observation into one of three iris species. However, what if we later received a dataset that included the same four measurements, plus a cost as the target? This dataset does not contain the species, but it uses the same four inputs as the base model.

We can take our previously trained iris network and transfer its weights to a new neural network that will learn to predict the cost. Also of note, the original neural network was a classification network, yet we now use it to build a regression neural network. Such a transformation is common in transfer learning. As a reference point, I randomly created this iris cost dataset.

In [2]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv", 
    na_values=['NA', '?'])

# Convert to numpy - Classification
x = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y = dummies.values


# Build neural network
model = Sequential()
model.add(Dense(50, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y.shape[1],activation='softmax')) # Output

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x,y,verbose=2,epochs=100)

Epoch 1/100
5/5 - 1s - loss: 0.9395 - 578ms/epoch - 116ms/step
Epoch 2/100
5/5 - 0s - loss: 0.8442 - 12ms/epoch - 2ms/step
Epoch 3/100
5/5 - 0s - loss: 0.7790 - 11ms/epoch - 2ms/step
Epoch 4/100
5/5 - 0s - loss: 0.7190 - 9ms/epoch - 2ms/step
Epoch 5/100
5/5 - 0s - loss: 0.6685 - 11ms/epoch - 2ms/step
Epoch 6/100
5/5 - 0s - loss: 0.6252 - 12ms/epoch - 2ms/step
Epoch 7/100
5/5 - 0s - loss: 0.5894 - 13ms/epoch - 3ms/step
Epoch 8/100
5/5 - 0s - loss: 0.5588 - 10ms/epoch - 2ms/step
Epoch 9/100
5/5 - 0s - loss: 0.5289 - 9ms/epoch - 2ms/step
Epoch 10/100
5/5 - 0s - loss: 0.5050 - 9ms/epoch - 2ms/step
Epoch 11/100
5/5 - 0s - loss: 0.4828 - 9ms/epoch - 2ms/step
Epoch 12/100
5/5 - 0s - loss: 0.4610 - 8ms/epoch - 2ms/step
Epoch 13/100
5/5 - 0s - loss: 0.4352 - 9ms/epoch - 2ms/step
Epoch 14/100
5/5 - 0s - loss: 0.4137 - 8ms/epoch - 2ms/step
Epoch 15/100
5/5 - 0s - loss: 0.3908 - 8ms/epoch - 2ms/step
Epoch 16/100
5/5 - 0s - loss: 0.3705 - 9ms/epoch - 2ms/step
Epoch 17/100
5/5 - 0s - loss: 0.3459 - 

<keras.callbacks.History at 0x7eff1b038b10>

To keep this example simple, we are not setting aside a validation set. The goal of this example is to show how to create a multi-layer neural network, where we transfer the weights to another network. We begin by evaluating the accuracy of the network on the training set.

In [3]:
from sklearn.metrics import accuracy_score
pred = model.predict(x)
predict_classes = np.argmax(pred,axis=1)
expected_classes = np.argmax(y,axis=1)
correct = accuracy_score(expected_classes,predict_classes)
print(f"Training Accuracy: {correct}")

Training Accuracy: 0.98
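
The accuracy calculation above depends on `np.argmax` to turn both the softmax outputs and the one-hot targets back into class indices. The following is a minimal sketch of that step on its own; the `pred` and `y` arrays are made-up values for illustration, not output from the iris network.

```python
import numpy as np

# Hypothetical softmax outputs for four observations and three classes.
pred = np.array([
    [0.90, 0.07, 0.03],
    [0.10, 0.80, 0.10],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
])

# Hypothetical one-hot targets for the same four observations.
y = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
])

predict_classes = np.argmax(pred, axis=1)   # index of the largest probability
expected_classes = np.argmax(y, axis=1)     # index of the 1 in each one-hot row

# Accuracy is the fraction of rows where the two indices agree.
accuracy = np.mean(predict_classes == expected_classes)
print(predict_classes, expected_classes, accuracy)
```

In this sketch the last observation is misclassified, so three of the four predictions match.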


# Create a New Iris Network
Now that we've trained a neural network on the iris dataset, we can transfer the knowledge of this neural network to other neural networks. It is possible to create a new neural network from some or all of the layers of this neural network. We will create a new neural network that is essentially a clone of the first neural network to demonstrate the technique. We now transfer all of the layers from the original neural network into the new one.

In [4]:
model2 = Sequential()
for layer in model.layers:
    model2.add(layer)
model2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
 dense_2 (Dense)             (None, 3)                 78        
                                                                 
Total params: 1,603
Trainable params: 1,603
Non-trainable params: 0
_________________________________________________________________
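
One detail worth noting is that adding existing layers to a new `Sequential` model reuses the same layer objects, so both models share one set of weights. If an independent copy is wanted instead, one possible approach is `clone_model` followed by `set_weights`. The sketch below assumes a small stand-in network named `base` with random weights rather than the trained iris model.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential, clone_model
from tensorflow.keras.layers import Dense

# Stand-in for the trained iris network (weights are random here).
base = Sequential([
    tf.keras.Input(shape=(4,)),
    Dense(50, activation='relu'),
    Dense(25, activation='relu'),
    Dense(3, activation='softmax'),
])

# Adding existing layers to a new model shares the layer objects.
shared = Sequential()
for layer in base.layers:
    shared.add(layer)
shares_layers = shared.layers[0] is base.layers[0]

# clone_model creates new layers; set_weights then copies the values across.
independent = clone_model(base)
independent.set_weights(base.get_weights())
is_separate = independent.layers[0] is not base.layers[0]
same_values = all(
    np.array_equal(a, b)
    for a, b in zip(base.get_weights(), independent.get_weights())
)
print(shares_layers, is_separate, same_values)
```

Sharing is usually fine for transfer learning, but it means later changes to a shared layer, such as further training, are visible in both models.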


# Transferring to a Regression Network

The Iris Cost Dataset has measurements for samples of these flowers that conform to the predictors contained in the original iris dataset: sepal width, sepal length, petal width, and petal length.

In [5]:
df_cost = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris_cost.csv", 
    na_values=['NA', '?'])

df_cost

Unnamed: 0,sepal_l,sepal_w,petal_l,petal_w,cost
0,7.8,3.0,6.2,2.0,10.740
1,5.0,2.2,1.7,1.5,2.710
2,6.9,2.6,3.7,1.4,4.624
3,5.9,2.2,3.7,2.4,6.558
4,5.1,3.9,6.8,0.7,7.395
...,...,...,...,...,...
245,4.7,2.1,4.0,2.3,5.721
246,7.2,3.0,4.3,1.1,5.266
247,6.6,3.4,4.6,1.4,5.776
248,5.7,3.7,3.1,0.4,2.233


For transfer learning to be effective, the input for the new neural network must closely conform to the input of the first neural network that we transfer from.
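
As a rough illustration of this requirement, the sketch below selects the feature columns by name from two data frames so that both input arrays have the same width and column order. The one-row data frames `df_iris` and `df_new` are hypothetical stand-ins, not the actual datasets.

```python
import pandas as pd

feature_cols = ['sepal_l', 'sepal_w', 'petal_l', 'petal_w']

# Hypothetical one-row stand-ins for the two datasets.
df_iris = pd.DataFrame([[5.1, 3.5, 1.4, 0.2, 'setosa']],
                       columns=feature_cols + ['species'])
df_new = pd.DataFrame([[7.8, 3.0, 6.2, 2.0, 10.74]],
                      columns=feature_cols + ['cost'])

# Selecting by name gives both arrays the same column order.
x_base = df_iris[feature_cols].values
x_new = df_new[feature_cols].values

# The transferred layers expect the same number of input features.
compatible = x_base.shape[1] == x_new.shape[1]
print(compatible)
```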

We will strip away the final output layer, which uses the softmax activation function to perform the classification, and replace it with a new output layer that predicts the cost. We will mark the first two layers as non-trainable and train only the weights in the new layer. The hope is that the first few layers have learned to abstract the raw input data in a way that is also helpful to the new neural network. We accomplish this by looping over the first two layers and adding them to the new neural network. Note that the loop adds the original layer objects rather than copies, so marking them non-trainable also affects the original model. We output a summary of the new neural network to verify that it no longer contains the previous output layer.

In [6]:
model3 = Sequential()
for i in range(2):
    layer = model.layers[i]
    layer.trainable = False
    model3.add(layer)
model3.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
Total params: 1,525
Trainable params: 0
Non-trainable params: 1,525
_________________________________________________________________
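
The parameter counts in these summaries can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced only for this arithmetic.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

hidden1 = dense_params(4, 50)      # 4 inputs -> 50 units
hidden2 = dense_params(50, 25)     # 50 units -> 25 units
old_output = dense_params(25, 3)   # softmax layer that was left behind
new_output = dense_params(25, 1)   # single-unit regression output

frozen = hidden1 + hidden2
print(hidden1, hidden2, old_output, new_output, frozen)
```

The two transferred layers account for the 1,525 non-trainable parameters, and a single-unit output layer on top of the 25-unit layer contributes 26 trainable ones.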


In [7]:
model3.add(Dense(1)) # Output

model3.compile(loss='mean_squared_error', optimizer='adam')
model3.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                250       
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
 dense_3 (Dense)             (None, 1)                 26        
                                                                 
Total params: 1,551
Trainable params: 26
Non-trainable params: 1,525
_________________________________________________________________
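
One way to confirm that only the new output layer learns is to compare weights before and after a short training run. The following sketch assumes a stand-in network `base` with random weights and random placeholder data (`x_demo`, `y_demo`); none of these are the tutorial's actual model or dataset.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Stand-in base network; in the text this would be the trained iris model.
base = Sequential([
    tf.keras.Input(shape=(4,)),
    Dense(50, activation='relu'),
    Dense(25, activation='relu'),
    Dense(3, activation='softmax'),
])

# Transfer the two hidden layers as frozen layers, then add a new output.
new_model = Sequential([tf.keras.Input(shape=(4,))])
for layer in base.layers[:2]:
    layer.trainable = False
    new_model.add(layer)
new_model.add(Dense(1))
new_model.compile(loss='mean_squared_error', optimizer='adam')

# Random placeholder data, used only to run a few training steps.
rng = np.random.default_rng(0)
x_demo = rng.random((64, 4)).astype('float32')
y_demo = rng.random(64).astype('float32')

frozen_before = [w.copy() for w in new_model.layers[0].get_weights()]
head_before = [w.copy() for w in new_model.layers[-1].get_weights()]
new_model.fit(x_demo, y_demo, epochs=2, verbose=0)

# Frozen weights should be identical; the new layer's weights should differ.
frozen_same = all(np.array_equal(a, b) for a, b in
                  zip(frozen_before, new_model.layers[0].get_weights()))
head_changed = any(not np.array_equal(a, b) for a, b in
                   zip(head_before, new_model.layers[-1].get_weights()))
print(frozen_same, head_changed)
```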


In [8]:
# Convert to numpy - Regression
x = df_cost[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
y = df_cost.cost.values

# Train the last layer of the network
model3.fit(x,y,verbose=2,epochs=100)

Epoch 1/100
8/8 - 1s - loss: 60.2365 - 624ms/epoch - 78ms/step
Epoch 2/100
8/8 - 0s - loss: 57.2644 - 24ms/epoch - 3ms/step
Epoch 3/100
8/8 - 0s - loss: 54.3914 - 23ms/epoch - 3ms/step
Epoch 4/100
8/8 - 0s - loss: 51.5861 - 42ms/epoch - 5ms/step
Epoch 5/100
8/8 - 0s - loss: 48.9371 - 24ms/epoch - 3ms/step
Epoch 6/100
8/8 - 0s - loss: 46.4227 - 23ms/epoch - 3ms/step
Epoch 7/100
8/8 - 0s - loss: 43.9635 - 25ms/epoch - 3ms/step
Epoch 8/100
8/8 - 0s - loss: 41.6181 - 26ms/epoch - 3ms/step
Epoch 9/100
8/8 - 0s - loss: 39.4982 - 34ms/epoch - 4ms/step
Epoch 10/100
8/8 - 0s - loss: 37.3114 - 21ms/epoch - 3ms/step
Epoch 11/100
8/8 - 0s - loss: 35.3462 - 24ms/epoch - 3ms/step
Epoch 12/100
8/8 - 0s - loss: 33.4820 - 24ms/epoch - 3ms/step
Epoch 13/100
8/8 - 0s - loss: 31.6481 - 23ms/epoch - 3ms/step
Epoch 14/100
8/8 - 0s - loss: 29.9383 - 17ms/epoch - 2ms/step
Epoch 15/100
8/8 - 0s - loss: 28.3541 - 22ms/epoch - 3ms/step
Epoch 16/100
8/8 - 0s - loss: 26.8030 - 37ms/epoch - 5ms/step
Epoch 17/100
8/

<keras.callbacks.History at 0x7eff1af329d0>

In [9]:
pred = model3.predict(x)
# mean_squared_error expects (y_true, y_pred)
score = np.sqrt(metrics.mean_squared_error(y, pred))
print(f"Final score (RMSE): {score}")

Final score (RMSE): 1.5586627443709196
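
For reference, RMSE is the square root of the mean squared difference between predictions and targets. The sketch below computes it directly with NumPy on made-up values. It also flattens the predictions first, because a single-output Keras model returns an array shaped `(n, 1)` while the targets here are shaped `(n,)`.

```python
import numpy as np

# Hypothetical predictions shaped (n, 1), as a Dense(1) output would be,
# and hypothetical targets shaped (n,).
pred = np.array([[2.0], [4.0], [6.0]])
y = np.array([1.0, 4.0, 8.0])

# Flatten first: subtracting (n,) from (n, 1) broadcasts to an (n, n) matrix.
errors = pred.ravel() - y
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)

# Shape produced when the flatten step is skipped.
broadcast_shape = (pred - y).shape
print(broadcast_shape)
```

The `metrics.mean_squared_error` call used above handles this shape difference itself, so the flatten step only matters when computing the error by hand.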
