
Transfer learning & Fine-tuning #17

Open
Li-ZhuoHan opened this issue Mar 3, 2022 · 8 comments

@Li-ZhuoHan

Hi, Henry. I've got a well-trained astroNN model, but I want to do some transfer learning to make it adaptable to another survey. What I've done is remove the top dense layer of the base model and build a new dense layer, but now it can only be treated like an ordinary Keras model. By the way, the base model itself is a custom model under the parent class `BayesianCNNBase`.

I'm wondering:

  1. What should I do if I want to build a new astroNN model on top of an astroNN base model? Should I build a new class, say `transfer_model`, under `BayesianCNNBase` and load the base model in my new `model()` function?
  2. How can I do the fine-tuning step? (`fit_on_batch` seems not to be enough.)

Thank you!

@Li-ZhuoHan (Author)

Now my code is:

Building

from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Dense, Input, concatenate
from tensorflow.keras.models import Model

from astroNN.models import load_folder
from astroNN.models.base_bayesian_cnn import BayesianCNNBase
from astroNN.nn.losses import mse_lin_wrapper, mse_var_wrapper

class Noah_transfer(BayesianCNNBase):
    def __init__(self, lr=0.0005, dropout_rate=0.2):
        super().__init__()
        self.initializer = RandomNormal(mean=0.0, stddev=0.05)
        self.max_epochs = 50
        self.lr = lr
        self.reduce_lr_epsilon = 0.00005
        self.reduce_lr_min = 1e-8
        self.reduce_lr_patience = 2
        self.l2 = 1e-9
        self.dropout_rate = dropout_rate
        self.input_norm_mode = 3
        self.task = 'regression'

    def model(self):
        input_tensor = Input(shape=self._input_shape['input'], name='input')
        labels_err_tensor = Input(shape=self._labels_shape['output'], name='labels_err')
        noah = load_folder('Noah_giant')
        base_model = Model(inputs=noah.keras_model.input,
                           outputs=noah.keras_model.get_layer('dense_1').output)
        base_model.trainable = False
        x = base_model([input_tensor], training=False)
        output = Dense(units=self._labels_shape['output'],
                       activation='linear',
                       name='output')(x)
        variance_output = Dense(units=self._labels_shape['output'],
                                activation='linear',
                                name='variance_output')(x)
        model = Model(inputs=[input_tensor, labels_err_tensor], outputs=[output, variance_output])
        model_prediction = Model(inputs=[input_tensor], outputs=concatenate([output, variance_output]))

        variance_loss = mse_var_wrapper(output, labels_err_tensor)
        output_loss = mse_lin_wrapper(variance_output, labels_err_tensor)

        return model, model_prediction, output_loss, variance_loss

Training

noah_transfer = Noah_transfer()
noah_transfer.task = 'regression'
noah_transfer.fit(input_data=x_train,
                  labels=y_train,
                  inputs_err=x_train_err,
                  labels_err=y_train_err)

Both `model` and `model_prediction` can be printed with `summary()`, but training raises an error:
Layer "model_2" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 4500, 1) dtype=float32>]

Call arguments received:
  • inputs={'input': 'tf.Tensor(shape=(None, 4500, 1), dtype=float32)', 'input_err': 'tf.Tensor(shape=(None, None, None), dtype=float32)', 'labels_err': 'tf.Tensor(shape=(None, 11), dtype=float32)'}
  • training=True
  • mask=None

It seems that `labels` hasn't been taken into the training, and `model_2` (which means `model` in this code) received only one input (which seems to be `x_train`).
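
A plausible reading of the error (an assumption, not something confirmed later in this thread): `noah.keras_model` was itself built with two inputs (`input` and `labels_err`), so the sliced `base_model` inherits both and cannot be called with `[input_tensor]` alone. A minimal sketch of a workaround, assuming the loaded model names its spectra input layer `'input'`:

# sketch: slice the frozen feature extractor from the spectra input tensor only,
# so the resulting submodel expects a single input rather than inheriting
# both of noah.keras_model's inputs
spectra_input = noah.keras_model.get_layer('input').input
base_model = Model(inputs=spectra_input,
                   outputs=noah.keras_model.get_layer('dense_1').output)
base_model.trainable = False
x = base_model(input_tensor, training=False)  # now one tensor in, one tensor out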

@henrysky (Owner)

henrysky commented Mar 7, 2022

Sorry for the late reply.

I have added a function as a first step to solve your issue. The new function transfer_weights() should transfer all the weights to a new model (except the input and possibly the output layers) and set the transferred weights as non-trainable, so when you train on the other survey, only the input/output layers are trained and the middle layers are frozen. To use this new function, do a git pull to get the latest commit onto your computer.

Here is an example:

from astroNN.models import ApogeeBCNN

# a model trained on the original survey
bneuralnet = ApogeeBCNN()
bneuralnet.fit(xdata, ydata)

# another astroNN model
bneuralnet2 = ApogeeBCNN()

# just to initialize the model with the correct input and output shape
bneuralnet2.max_epochs = 1
bneuralnet2.fit(xdata_another_survey, ydata_another_survey)
# transfer all the weights except layers with incompatible shape
bneuralnet2.transfer_weights(bneuralnet)

# training for real, the middle part of the model is not trainable
bneuralnet2.max_epochs = 60
bneuralnet2.fit(xdata_another_survey, ydata_another_survey)

# now bneuralnet2 is your new astroNN model, transferred to another survey
# with the same architecture as the original

@Li-ZhuoHan (Author)

Thank you for your reply.

The two of us seem to have different ideas: your way is to transfer the weights of the base model, while mine is to transfer the whole base model. transfer_weights() is a clever and effective way to do transfer learning, and it should be enough for me for now.
But I still have some doubts:

  1. Why does the training step go wrong while all the associated models (noah, base_model, model, model_prediction) can be printed by summary() just fine?
  2. What if I want to splice two models together, or add new layers directly after a base model? (See the plain-Keras sketch below.)
    This may have something to do with your architecture and could be complicated to implement; I'm not sure. Anyway, thanks to your efforts it works now, and forgive me for leaving these doubts with you. I hope you can keep making astroNN better so it benefits more users.
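
For reference, in plain Keras (outside astroNN's BayesianCNNBase training loop) splicing new layers after a frozen base looks roughly like the following minimal sketch; the base model here is a hypothetical single-input stand-in, not the actual Noah_giant model:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential

# hypothetical single-input feature extractor standing in for a trained base
base = Sequential([Dense(64, activation='relu', input_shape=(4500,))])
base.trainable = False  # freeze BEFORE compiling the spliced model

inp = Input(shape=(4500,))
features = base(inp, training=False)  # training=False keeps dropout/BN in inference mode
out = Dense(11, activation='linear')(features)

spliced = Model(inputs=inp, outputs=out)
spliced.compile(optimizer='adam', loss='mse')  # only the new Dense layer is trainable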

@Li-ZhuoHan (Author)

There are still some bugs.

When the output layer of my transferred model and the base model have the same number of nodes, the summary says that all of my params are non-trainable, but the weights of the transferred model's output layer should be trainable. The funny thing is that, if that were true, my loss should stay the same during training, yet the loss keeps getting smaller, which means the weights are still being trained. This behavior departs both from what I want and from the model summary.
On the other hand, when my output layer's node count differs from the original model's, the trainable params count is the sum of the params in the output and variance_output layers, which is right. But during the training step, it seems that all the params are still trained.
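
A minimal way to check which weights actually move during training (a diagnostic sketch, assuming the astroNN model exposes its underlying Keras model as .keras_model, as BayesianCNNBase models do):

import numpy as np

# snapshot every weight, run a short fit, then compare
km = noah_transfer.keras_model
before = [w.numpy().copy() for w in km.weights]
noah_transfer.max_epochs = 1
noah_transfer.fit(input_data=x_train, labels=y_train,
                  inputs_err=x_train_err, labels_err=y_train_err)
for w, b in zip(km.weights, before):
    print(f"{w.name}: trainable={w.trainable}, "
          f"changed={not np.allclose(b, w.numpy())}")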

@henrysky (Owner)

Yes, it does seem that supposedly non-trainable parameters still get trained somehow. I am still investigating what is going on, but most likely I need to set them to non-trainable before compiling the model.
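
For context, this matches documented Keras behavior: the trainable attribute is taken into account when the model is compiled, so flipping it afterwards has no effect until compile() is called again. A minimal, self-contained illustration (toy shapes, not astroNN):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=(3,)),
    tf.keras.layers.Dense(1),
])

# freeze BEFORE compiling (or recompile after changing `trainable`)
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='mse')
model.fit(np.random.rand(8, 3), np.random.rand(8, 1), verbose=0)  # first Dense stays fixed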

As for the output layer, the current strategy is to transfer all weights with a compatible shape (i.e. if the shape of a layer's weights is the same, those weights are transferred). I think what you want is to only train the input layer? Or you can force a different output shape so that the output layers won't get transferred (e.g. train on T_eff and log(g) for one survey and fe_h for another, so the output shapes are different). I think there could also be a case where you have a small overlap between two surveys; then you could use the spectra from survey B but only train the input layer with labels from the original survey A.

Regarding your questions from a few days ago, what do you mean by the training step going wrong? And yes, splicing/adding layers probably requires more work, but it's not undoable per se; we need to make the simplest case work correctly first...

@Li-ZhuoHan (Author)

Thank you for your patience and reply.

The training-step failure a few days ago happened because of model splicing, but as you said, we should make the simplest case work first, so let's talk about it later.
What really matters is that I want to train both the input layer and the output layer, whether or not the output layers have the same shape. (For now they are the same, so those weights are transferred and "locked".)
The case is that I have a model trained on spectra from survey A but labels from survey B, and now I want to transfer this model and train it on spectra from survey C with labels from survey B. I don't know if it will work, but I want to make an attempt.

@henrysky (Owner)

I think I have fixed the issue of weights still being trained even after setting trainable=False. I have also added an argument exclusion_output (default False) so you can exclude the output weights when transferring with transfer_weights(). You can check out the latest commit to see if it works for you.
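
Continuing the earlier ApogeeBCNN example, usage of the new flag would look like this sketch (per the description above: output weights are skipped, so the output layers stay trainable):

# transfer every compatible weight except the output layers,
# leaving the output layers free to train on the new survey
bneuralnet2.transfer_weights(bneuralnet, exclusion_output=True)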

@Li-ZhuoHan (Author)

Thank you for all the effort, it works now.
