
Transfer learning & Fine-tuning #17

Open
Li-ZhuoHan opened this issue Mar 3, 2022 · 8 comments

@Li-ZhuoHan

Hi, Henry. I've got a well-trained astroNN model, but I want to do some transfer learning to make it adaptable to another survey. What I've done is remove the top dense layer of the base model and build a new dense layer, but now it can only be treated like an ordinary Keras model. By the way, the base model itself is a custom model under the parent class `BayesianCNNBase`.

I'm wondering:

  1. What should I do if I want to build a new astroNN model on top of an astroNN base model? Should I build a new class, say `transfer_model`, under `BayesianCNNBase` and load the base model in my new `model()` function?
  2. How can I do the fine-tuning step? (`fit_on_batch` seems not to be enough.)

Thank you!

@Li-ZhuoHan (Author)

Now my code is:

Building

from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Dense, Input, concatenate
from tensorflow.keras.models import Model

from astroNN.models import load_folder
from astroNN.models.base_bayesian_cnn import BayesianCNNBase
from astroNN.nn.losses import mse_lin_wrapper, mse_var_wrapper

class Noah_transfer(BayesianCNNBase):
    def __init__(self, lr=0.0005, dropout_rate=0.2):
        super().__init__()
        self.initializer = RandomNormal(mean=0.0, stddev=0.05)
        self.max_epochs = 50
        self.lr = lr
        self.reduce_lr_epsilon = 0.00005
        self.reduce_lr_min = 1e-8
        self.reduce_lr_patience = 2
        self.l2 = 1e-9
        self.dropout_rate = dropout_rate
        self.input_norm_mode = 3
        self.task = 'regression'

    def model(self):
        input_tensor = Input(shape=self._input_shape['input'], name='input')
        labels_err_tensor = Input(shape=self._labels_shape['output'], name='labels_err')
        noah = load_folder('Noah_giant')
        base_model = Model(inputs=noah.keras_model.input,
                           outputs=noah.keras_model.get_layer('dense_1').output)
        base_model.trainable = False
        x = base_model([input_tensor], training=False)
        output = Dense(units=self._labels_shape['output'],
                       activation='linear',
                       name='output')(x)
        variance_output = Dense(units=self._labels_shape['output'],
                                activation='linear',
                                name='variance_output')(x)
        model = Model(inputs=[input_tensor, labels_err_tensor], outputs=[output, variance_output])
        model_prediction = Model(inputs=[input_tensor], outputs=concatenate([output, variance_output]))

        variance_loss = mse_var_wrapper(output, labels_err_tensor)
        output_loss = mse_lin_wrapper(variance_output, labels_err_tensor)

        return model, model_prediction, output_loss, variance_loss

Training

noah_transfer = Noah_transfer()
noah_transfer.task = 'regression'
noah_transfer.fit(input_data=x_train,
                  labels=y_train,
                  inputs_err=x_train_err,
                  labels_err=y_train_err)

Both `model` and `model_prediction` can be printed with `summary()`, but training raises an error:
Layer "model_2" expects 2 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, 4500, 1) dtype=float32>]

Call arguments received:
  • inputs={'input': 'tf.Tensor(shape=(None, 4500, 1), dtype=float32)', 'input_err': 'tf.Tensor(shape=(None, None, None), dtype=float32)', 'labels_err': 'tf.Tensor(shape=(None, 11), dtype=float32)'}
  • training=True
  • mask=None

It seems that `labels` hasn't been taken into the training, and `model_2` (which means `model` in this code) received only one input (which seems to be `x_train`).
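
A plausible reading of the error (an assumption, not something confirmed later in this thread): `noah.keras_model` was itself built with two inputs (`input` and `labels_err`), so the sliced `base_model` inherits both and cannot be called with `[input_tensor]` alone. A minimal sketch of a workaround, assuming the loaded model names its spectra input layer `'input'`:

# sketch: slice the frozen feature extractor from the spectra input tensor only,
# so the resulting submodel expects a single input rather than inheriting
# both of noah.keras_model's inputs
spectra_input = noah.keras_model.get_layer('input').input
base_model = Model(inputs=spectra_input,
                   outputs=noah.keras_model.get_layer('dense_1').output)
base_model.trainable = False
x = base_model(input_tensor, training=False)  # now one tensor in, one tensor out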

@henrysky (Owner)

henrysky commented Mar 7, 2022

Sorry for the late reply.

I have added a function as a first step to solve your issue. The new function transfer_weights() should transfer all the weights to a new model (except the input and possibly the output layers) and set the transferred weights as non-trainable, so when you train on the other survey, only the input/output layers are trained and the middle layers are frozen. To use this new function, do a git pull to get the latest commit onto your computer.

Here is an example:

from astroNN.models import ApogeeBCNN

# a model trained on the original survey
bneuralnet = ApogeeBCNN()
bneuralnet.fit(xdata, ydata)

# another astroNN model
bneuralnet2 = ApogeeBCNN()

# just to initialize the model with the correct input and output shape
bneuralnet2.max_epochs = 1
bneuralnet2.fit(xdata_another_survey, ydata_another_survey)
# transfer all the weights except layers with incompatible shape
bneuralnet2.transfer_weights(bneuralnet)

# training for real, the middle part of the model is not trainable
bneuralnet2.max_epochs = 60
bneuralnet2.fit(xdata_another_survey, ydata_another_survey)

# now bneuralnet2 is your new astroNN model, transferred to another survey
# with the same architecture as the original

@Li-ZhuoHan (Author)

Thank you for your reply.

The two of us seem to have different ideas: your way is to transfer the weights of the base model, while mine is to transfer the whole base model. transfer_weights() is a clever and effective way to do transfer learning, and it should be enough for me for now.
But I still have some doubts:

  1. Why does the training step go wrong while all the associated models (noah, base_model, model, model_prediction) can be printed by summary() just fine?
  2. What if I want to splice two models together, or add new layers directly after a base model? (See the plain-Keras sketch below.)
    This may have something to do with your architecture and could be complicated to implement; I'm not sure. Anyway, thanks to your efforts it works now, and forgive me for leaving these doubts with you. I hope you can keep making astroNN better so it benefits more users.
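
For reference, in plain Keras (outside astroNN's BayesianCNNBase training loop) splicing new layers after a frozen base looks roughly like the following minimal sketch; the base model here is a hypothetical single-input stand-in, not the actual Noah_giant model:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential

# hypothetical single-input feature extractor standing in for a trained base
base = Sequential([Dense(64, activation='relu', input_shape=(4500,))])
base.trainable = False  # freeze BEFORE compiling the spliced model

inp = Input(shape=(4500,))
features = base(inp, training=False)  # training=False keeps dropout/BN in inference mode
out = Dense(11, activation='linear')(features)

spliced = Model(inputs=inp, outputs=out)
spliced.compile(optimizer='adam', loss='mse')  # only the new Dense layer is trainable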

@Li-ZhuoHan (Author)

There are still some bugs.

When the output layer of my transferred model and the base model have the same number of nodes, the summary says that all of my params are non-trainable, but the weights of the transferred model's output layer should be trainable. The funny thing is that, if that were true, my loss should stay the same during training, yet the loss keeps getting smaller, which means the weights are still being trained. This behavior departs both from what I want and from the model summary.
On the other hand, when my output layer's node count differs from the original model's, the trainable params count is the sum of the params in the output and variance_output layers, which is right. But during the training step, it seems that all the params are still trained.
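
A minimal way to check which weights actually move during training (a diagnostic sketch, assuming the astroNN model exposes its underlying Keras model as .keras_model, as BayesianCNNBase models do):

import numpy as np

# snapshot every weight, run a short fit, then compare
km = noah_transfer.keras_model
before = [w.numpy().copy() for w in km.weights]
noah_transfer.max_epochs = 1
noah_transfer.fit(input_data=x_train, labels=y_train,
                  inputs_err=x_train_err, labels_err=y_train_err)
for w, b in zip(km.weights, before):
    print(f"{w.name}: trainable={w.trainable}, "
          f"changed={not np.allclose(b, w.numpy())}")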

@henrysky (Owner)

Yes, it does seem that supposedly non-trainable parameters still get trained somehow. I am still investigating what is going on, but most likely I need to set them to non-trainable before compiling the model.
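
For context, this matches documented Keras behavior: the trainable attribute is taken into account when the model is compiled, so flipping it afterwards has no effect until compile() is called again. A minimal, self-contained illustration (toy shapes, not astroNN):

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=(3,)),
    tf.keras.layers.Dense(1),
])

# freeze BEFORE compiling (or recompile after changing `trainable`)
model.layers[0].trainable = False
model.compile(optimizer='adam', loss='mse')
model.fit(np.random.rand(8, 3), np.random.rand(8, 1), verbose=0)  # first Dense stays fixed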

As for the output layer, the current strategy is to transfer all weights with a compatible shape (i.e. if the shape of a layer's weights is the same, those weights are transferred). I think what you want is to only train the input layer? Or you can force a different output shape so that the output layers won't get transferred (e.g. train on T_eff and log(g) for one survey and fe_h for another, so the output shapes are different). I think there could also be a case where you have a small overlap between two surveys; then you could use the spectra from survey B but only train the input layer with labels from the original survey A.

Regarding your questions from a few days ago, what do you mean by the training step going wrong? And yes, splicing/adding layers probably requires more work, but it's not undoable per se; we need to make the simplest case work correctly first...

@Li-ZhuoHan (Author)

Thank you for your patience and reply.

The training-step failure a few days ago happened because of model splicing, but as you said, we should make the simplest case work first, so let's talk about it later.
What really matters is that I want to train both the input layer and the output layer, whether or not the output layers have the same shape. (For now they are the same, so those weights are transferred and "locked".)
The case is that I have a model trained on spectra from survey A but labels from survey B, and now I want to transfer this model and train it on spectra from survey C with labels from survey B. I don't know if it will work, but I want to make an attempt.

@henrysky (Owner)

I think I have fixed the issue of weights still being trained even after setting trainable=False. I have also added an argument exclusion_output (default False) so you can exclude the output weights when transferring with transfer_weights(). You can check out the latest commit to see if it works for you.
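
Continuing the earlier ApogeeBCNN example, usage of the new flag would look like this sketch (per the description above: output weights are skipped, so the output layers stay trainable):

# transfer every compatible weight except the output layers,
# leaving the output layers free to train on the new survey
bneuralnet2.transfer_weights(bneuralnet, exclusion_output=True)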

@Li-ZhuoHan (Author)

Thank you for all the effort, it works now.
