Custom loss function example using kad_op functions #52
Does your cost function take a new form, or does it simply combine mse, ce, etc.? If the former, you will need a new operator and will have to implement backward propagation yourself. If the latter, you can chain kad_op functions. You may have a look at the implementation of kann_layer_cost() to get an idea: you need to provide a truth vector and label the relevant nodes with the appropriate KANN_F_* flags.
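For illustration, here is a rough sketch of that pattern (paraphrased from memory, not the exact kann_layer_cost() source; the helper name is made up): flag the prediction node as the output, create a feed for the truth vector and flag it KANN_F_TRUTH, build the cost from kad_* operators, and flag the cost node KANN_F_COST. A custom loss would replace the kad_mse() call with its own chain of kad operators, and automatic differentiation takes care of the backward pass.

#include "kann.h"

/* Hedged sketch: attach a cost node to a prediction node `t`.
 * The function name is illustrative; kad_mse() stands in for any
 * chain of differentiable kad_* operators. */
kad_node_t *attach_custom_cost(kad_node_t *t, int n_out)
{
    kad_node_t *truth, *cost;
    t->ext_flag |= KANN_F_OUT;            /* network prediction */
    truth = kad_feed(2, 1, n_out);        /* truth vector of length n_out */
    truth->ext_flag |= KANN_F_TRUTH;
    cost = kad_mse(t, truth);             /* replace with a custom kad_* chain */
    cost->ext_flag |= KANN_F_COST;
    return cost;
}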
Thanks for answering. I'm trying to implement Hinton's Forward-forward algorithm. I tried to implement this as a layer in the graph as follows:
I have two questions. Here is my attempt at writing the operator:
I'm now trying to figure out the derivative of the loss function for the backward computation.
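For reference, writing the goodness as g = \sum_i h_i^2 and taking the loss form that comes up later in this thread, L = log(1 + exp(y(g - c))) with y in {-1, +1}, the chain rule gives (a sketch, with \sigma(z) = 1/(1+e^{-z})):

L = \log\bigl(1 + e^{\,y\,(g - c)}\bigr), \qquad g = \sum_i h_i^2
\frac{\partial L}{\partial g} = y\,\sigma\bigl(y\,(g - c)\bigr), \qquad
\frac{\partial L}{\partial h_i} = \frac{\partial L}{\partial g}\cdot 2 h_i = 2\,y\,h_i\,\sigma\bigl(y\,(g - c)\bigr)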
In Hinton's preprint, it seems that he is using the logistic function. Why do you choose log(1+exp)? Couldn't it go to infinity in theory?
Yes, to be frank, I've seen log(1+exp) in an implementation I found on GitHub. I compared it with 1/(1+exp), and log(1+exp) works better for my dataset. I have a working Python implementation of both. Anyway, in theory, yes, it can go to infinity, but in practice the negative samples limit that behavior. I've made some progress today with the loss: I can compute the forward loss correctly (it matches the reference Python implementation). Now I'm working on the backward pass. I'm stuck on the
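For reference, a minimal sketch of what a dedicated operator for this loss could look like, following the calling convention of the existing kad_op_* functions in kautodiff.c (a single callback dispatched on the KAD_SYNC_DIM / KAD_FORWARD / KAD_BACKWARD actions). The operator name, the child layout (child[0] = activations h, child[1] = label y in {-1, +1}) and passing the threshold c through p->ptr are assumptions; the new op would also have to be registered in kad_op_list[] / kad_op_name[] and given a constructor, which is not shown here.

#include <math.h>
#include "kautodiff.h"

/* Hedged sketch of a forward-forward loss operator:
 * L = log(1 + exp(y * (sum_i h_i^2 - c))) */
int kad_op_ff_loss(kad_node_t *p, int action)
{
    kad_node_t *h = p->child[0], *y = p->child[1];
    float c = *(float*)p->ptr;       /* assumption: threshold carried in p->ptr */
    int i, n = kad_len(h);
    if (action == KAD_SYNC_DIM) {
        p->n_d = 0;                  /* scalar output */
    } else if (action == KAD_FORWARD) {
        float g = 0.0f, z;
        for (i = 0; i < n; ++i) g += h->x[i] * h->x[i];   /* goodness: sum h^2 */
        z = y->x[0] * (g - c);
        p->x[0] = logf(1.0f + expf(z));
    } else if (action == KAD_BACKWARD && kad_is_back(h)) {
        float g = 0.0f, z, s;
        for (i = 0; i < n; ++i) g += h->x[i] * h->x[i];
        z = y->x[0] * (g - c);
        s = 1.0f / (1.0f + expf(-z));                     /* sigma(z) = dL/dz */
        for (i = 0; i < n; ++i)                           /* dL/dh_i = 2*y*h_i*sigma(z) */
            h->g[i] += p->g[0] * s * y->x[0] * 2.0f * h->x[i];
    }
    return 0;
}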
If I understand correctly, if you want to implement this on your own, you may consider implementing a dedicated operator for it. Alternatively, you can compose the loss from existing kad operators, for example:

kann_t *model_gen(float c_val)
{
    kad_node_t *x_in, *w, *b, *h2, *c, *y, *t, *cost;
    int n_h = 100;                                             // hidden width; adjust as needed
    x_in = kad_feed(2, 1, 128), x_in->ext_flag |= KANN_F_IN;   // input
    t = kann_layer_layernorm(x_in), t->ext_label = 11;         // layer normalization
    w = kann_new_weight(n_h, 128);                             // dense weights (n_h x 128)
    b = kann_new_bias(n_h);                                    // dense bias
    t = kad_add(kad_cmul(t, w), b), t->ext_label = 12;         // linear layer
    t = kad_relu(t), t->ext_label = 13;                        // activations h
    h2 = kad_reduce_sum(kad_square(t), 1);                     // \sum h^2
    c = kann_new_scalar(KAD_CONST, c_val), c->ext_label = 31;  // threshold constant
    t = kad_sub(h2, c), t->ext_flag |= KANN_F_OUT;             // \sum h^2 - c
    y = kad_feed(2, 1, 1), y->ext_flag |= KANN_F_TRUTH;        // truth label
    t = kad_exp(kad_cmul(y, t));                               // exp(y * (\sum h^2 - c))
    cost = kad_log(kad_add(kann_new_scalar(KAD_CONST, 1.0f), t)); // log(1 + exp(...))
    cost->ext_flag |= KANN_F_COST;
    return kann_new(cost, 0);
}
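A hedged usage sketch for completeness (not from the thread): once the cost node carries KANN_F_COST and the label feed carries KANN_F_TRUTH, the generic trainer can drive the model; n_samples, x (the 128-float inputs) and y (the per-sample labels), as well as the hyper-parameters, are placeholders.

kann_t *ann = model_gen(2.0f);                   /* c_val = 2.0 chosen arbitrarily */
kann_train_fnn1(ann, 0.001f, 64, 50, 10, 0.1f,   /* lr, minibatch, epochs, drop streak, validation fraction */
                n_samples, x, y);                /* x: n_samples x 128, y: n_samples x 1 */
kann_delete(ann);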
Hi again, I have a working implementation of both BP and FF for my dataset. As an example, I have a 3-layer MLP with 100 neurons in each layer; my input vector has a length of 120. I compile the application in debug mode and iterate over the nodes using the kad_print_graph function. At the end of each line, I added
Here I see two oddities. In short, I'd like to estimate the memory footprint for training this network. Am I on the right track?
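One way to make that estimate programmatically is to walk the compiled graph and add up the buffers that are actually allocated. A hedged sketch (the helper name is mine; kad_len() and the x/g/ptr_size fields are as declared in kautodiff.h):

#include "kann.h"

/* Rough memory estimate for a compiled network: per-node forward values,
 * gradients and op-specific parameters. Transient scratch (gtmp) and the
 * node structs themselves are not counted. */
static size_t graph_mem_bytes(const kann_t *ann)
{
    size_t tot = 0;
    int i;
    for (i = 0; i < ann->n; ++i) {
        kad_node_t *p = ann->v[i];
        size_t len = (size_t)kad_len(p);
        if (p->x) tot += len * sizeof(float);   /* forward values */
        if (p->g) tot += len * sizeof(float);   /* gradients */
        tot += p->ptr_size;                     /* conv/pool etc. parameters */
    }
    return tot;
}

Calling this right after building the model gives a lower bound; buffers that are allocated lazily (e.g. gradient scratch before the first backward pass) only show up once they exist.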
Hi,
I'm trying to implement a custom loss function with a simple MLP.
Is there an example of using the kad_op functions to accomplish this so that I benefit from automatic differentiation?
I don't want to explicitly write the backward computation as is the case for the currently implemented loss functions (mse, ce, etc).
Or is this approach not feasible (for memory consumption reasons) as it will require the computation and storage of the gradients for each operation in the loss function?
I'd greatly appreciate any help/feedback/example!
Thanks!