Custom loss and optimizers for BERT models through the ktrain Transformers call? #228
Hi, I looked through the FAQ and closed issues as much as possible, so apologies if this has already been answered.

Can I add different loss functions and optimizers for pre-trained BERT models? To give you my use case: I have an imbalanced dataset, so I'm looking to use focal loss. Thanks for the very useful library!

Comments
Hi, the answer is yes, as ktrain is just a lightweight wrapper around tf.keras. The models returned by ktrain (e.g., by `text.text_classifier` or `Transformer.get_classifier`) are regular Keras models, so they can be recompiled with a custom loss function and optimizer before being wrapped in a Learner. Also, as an aside, all of the fit methods (e.g., `fit_onecycle`) accept a `class_weight` argument, which can also help with imbalanced datasets.
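For example, recompiling the classifier with a custom loss and optimizer might look like the following sketch (the model name, the data variables, and `my_custom_loss` are placeholders for illustration, not from the original thread):

```python
import tensorflow as tf
import ktrain
from ktrain import text

# Placeholder setup: x_train, y_train, x_test, y_test, class_names,
# and my_custom_loss are assumed to exist.
t = text.Transformer('distilbert-base-uncased', maxlen=128, class_names=class_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)

model = t.get_classifier()

# Recompile the underlying Keras model with a custom loss and optimizer
# before wrapping it in a ktrain Learner.
model.compile(loss=my_custom_loss,
              optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              metrics=['accuracy'])

learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(5e-5, 4)
```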
Great, thank you. I will check out the class weights as well. I'm trying to incorporate focal loss with ktrain, and I'm getting hit with the following error. The focal loss implementation was taken from the above link.

I'm debugging it, but I'd appreciate it if you could take a look as well if you have some time, since you have an idea of how the downstream LR-tuning methods work.
Sorted out that issue, but my loss is currently `nan`. Here is the updated focal loss code: `def focal_loss(gamma=2., alpha=4.): ...`
Have you verified that this works outside of ktrain with a simple baseline model?
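A baseline check along those lines might look like this minimal sketch (the toy model and data are invented for illustration; `focal_loss` is the function being debugged, used here with a softmax output so it receives probabilities):

```python
import numpy as np
import tensorflow as tf

# Toy data: if the loss is implemented correctly, training should report
# a finite loss, not nan.
num_classes = 3
x = np.random.rand(64, 10).astype('float32')
y = tf.keras.utils.to_categorical(
    np.random.randint(0, num_classes, size=64), num_classes)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(num_classes, activation='softmax'),  # probabilities
])
model.compile(loss=focal_loss(gamma=2., alpha=4.), optimizer='adam')
model.fit(x, y, epochs=2)
```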
Unlike other models in ktrain, the Hugging Face transformers models output raw logits rather than probabilities, which is why the `-log` in your loss was producing `nan`. Here is a modified version of your focal loss with a `from_logits` switch that applies a softmax first:

```python
import tensorflow as tf
from tensorflow.keras import activations


def focal_loss(gamma=2., alpha=4., from_logits=False):
    gamma = float(gamma)
    alpha = float(alpha)

    def focal_loss_fixed(y_true, y_pred):
        """Focal loss for multi-class classification.

        FL(p_t) = -alpha * (1 - p_t)^gamma * ln(p_t)

        Notice: y_pred is the model output BEFORE softmax if from_logits is True.
        The gradient is d(FL)/d(p_t), not d(FL)/d(x) as described in the paper:
        d(FL)/d(p_t) * [p_t * (1 - p_t)] = d(FL)/d(x)

        Focal Loss for Dense Object Detection
        https://arxiv.org/abs/1708.02002

        Arguments:
            y_true {tensor} -- ground truth labels, shape of [batch_size, num_cls]
            y_pred {tensor} -- model's output, shape of [batch_size, num_cls]

        Keyword Arguments:
            gamma {float} -- (default: {2.0})
            alpha {float} -- (default: {4.0})

        Returns:
            [tensor] -- loss.
        """
        epsilon = 1.e-9
        y_true = tf.cast(y_true, dtype=tf.float32)
        y_pred = tf.cast(y_pred, dtype=tf.float32)
        # Convert raw logits to probabilities so the log below is well-defined.
        if from_logits:
            y_pred = activations.softmax(y_pred)
        model_out = tf.add(y_pred, epsilon)
        ce = tf.multiply(y_true, -tf.math.log(model_out))
        weight = tf.multiply(y_true, tf.pow(tf.subtract(1., model_out), gamma))
        fl = tf.multiply(alpha, tf.multiply(weight, ce))
        reduced_fl = tf.reduce_max(fl, axis=1)
        return tf.reduce_mean(reduced_fl)

    return focal_loss_fixed
```
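To use this with a transformers model in ktrain, you would then compile with `from_logits=True`, since the model outputs raw logits. A minimal sketch, reusing the placeholder `t`, `trn`, and `val` from the earlier example:

```python
model = t.get_classifier()
model.compile(loss=focal_loss(gamma=2., alpha=4., from_logits=True),
              optimizer='adam',
              metrics=['accuracy'])

learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(5e-5, 4)
```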
Thank you. I was actually just in the process of sending over the Colab notebook. This works on my end, and I'm sure it will be helpful for anyone in the future working with imbalanced datasets: class weights and focal loss are two of the better ways to handle imbalance without the need for synthetic sampling.

One question: can we get class weights to work with `learner.lr_find()` as well? Currently they work with the fit methods as you described, e.g., `learner.fit_onecycle(2e-6, 4, class_weight=class_weight_dict)`.
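For reference, a `class_weight_dict` like the one above can be built with scikit-learn (a sketch; `y_train` is assumed to be the integer-encoded training labels):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weight each class inversely to its frequency in the training set.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight_dict = dict(zip(classes, weights))

# class_weight works with the fit methods:
learner.fit_onecycle(2e-6, 4, class_weight=class_weight_dict)
```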
Thanks -