More comparison with existing methods? #24
Comments
Hi. Thanks for raising the issue. Comparisons/benchmarks have been presented in this paper/repository for Mish vs. Swish, GELU, and SELU. PAU and X-Units are not on my priority list to compare against, but I can definitely run some experiments in the coming week. Additionally, xUnit is a block and not a function, so the more sensible approach would be to replace the non-linearity in xUnit with Mish and compare the two variants.
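For anyone following along, here is a minimal NumPy sketch of the four activations mentioned above, using their standard closed forms (the SELU constants are the published values from the SELU paper; the GELU here is the common tanh approximation):

```python
import numpy as np

def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

def mish(x):
    # Mish(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x); beta=1 is the common fixed variant
    return x / (1.0 + np.exp(-beta * x))

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # SELU with the standard published constants
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

All four pass through the origin, and Mish/Swish/GELU approach the identity for large positive inputs, which is one reason they benchmark so similarly.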
It is their exotic nature that makes them interesting to compare, as they may contain hidden information regarding what an optimal activation function should or should not look like. I would like to look into the issue as well. For reference, the paper "Searching for Activation Functions" has a GitHub repository at https://github.com/Neoanarika/Searching-for-activation-functions and it might be possible to integrate that into the tests.
@DonaldTsang "Searching for Activation Functions" is the paper for Swish. All of my tests have compared Mish with Swish. What do you mean exactly by tests? |
@digantamisra98 the paper itself did list other "exotic forms" (not Swish itself) in Table 2 that are not on the table in the readME of the Mish, I would assume it is due to differences in naming schemes? If it is not just a difference in naming schemes, and that there are some activation functions that could be integrated into the repo for benchmarks, that would be great. |
@DonaldTsang The authors of that paper used a reinforcement learning algorithm to search the function space for the best possible non-linear function that qualifies as an activation function. Of all the candidates found in that search, Swish performed the best, so I used Swish as a comparison benchmark against Mish rather than the other activations the algorithm found in that paper.
@digantamisra98 So the other candidate functions in https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/src/rnn_controller.py#L22 might not be as useful or as common, but worth exploring, I would assume? Or are you saying that the activation functions listed in the paper itself are "filler"?
@DonaldTsang The other activations found by the search in that paper were not as efficient as Swish.
@DonaldTsang My current work with Mish is focused more on Mean Field Theory, helping to find the Edge of Chaos and the Rate of Convergence for Mish. These are more relevant since they will help us understand more about what an ideal activation function looks like.
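As a rough illustration of the mean-field style of analysis mentioned above (a Monte Carlo sketch under simplified assumptions of unit weight variance and zero bias, not the actual analysis in this repository): one can estimate the variance recursion q_{l+1} = E_{z ~ N(0, q_l)}[mish(z)^2] and iterate it to see how pre-activation variance propagates through depth.

```python
import numpy as np

def mish(x):
    # Mish(x) = x * tanh(softplus(x)), with a stable softplus
    return x * np.tanh(np.logaddexp(0.0, x))

def variance_map(q, n_samples=200_000, rng=None):
    # One step of a simplified mean-field variance recursion
    # (unit weight variance, zero bias):
    #   q_next = E_{z ~ N(0, q)}[mish(z)^2]
    if rng is None:
        rng = np.random.default_rng(0)
    z = rng.normal(0.0, np.sqrt(q), n_samples)
    return float(np.mean(mish(z) ** 2))

# Iterate the map from q = 1 to watch where the variance settles.
q = 1.0
for _ in range(10):
    q = variance_map(q)
print(f"variance estimate after 10 layers: {q:.4f}")
```

Since |mish(x)| < |x| everywhere, this simplified map is contractive; in a full analysis the weight and bias variances rescale the map, which is what determines the Edge of Chaos.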
Just wondering if all activation functions have been addressed in the README.