KAT uses a variance-preserving initialization, following the formulation of Kaiming initialization, for its learnable rational activations. This requires computing the second moment of a rational function, which has a complicated closed form. We show that the second moment becomes easy to compute when the activation is expressed in a basis of orthogonal functions. As examples, we use orthogonal polynomials (Hermite) and trigonometric functions (Fourier) and show that they achieve better results on image classification on ImageNet with ConvNeXt and on next-token prediction on OpenWebText with GPT-2.
📄 Paper: Learnable Polynomial, Trigonometric, and Tropical Activations
💻 Code: torchortho on GitHub
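To illustrate why orthogonality makes the second moment easy to compute, here is a minimal sketch in plain Python. It is not the torchortho API, and the paper's actual initialization may differ. It assumes a standard normal input and an activation written as f(x) = Σ aₙ Heₙ(x) in probabilists' Hermite polynomials, for which E[Heₘ(x) Heₙ(x)] = n! when m = n and 0 otherwise, so E[f(x)²] = Σ aₙ² n!. The function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def hermite_e(n_max, x):
    """Return [He_0(x), ..., He_n_max(x)] using He_{n+1} = x*He_n - n*He_{n-1}."""
    values = [1.0, x]
    for n in range(1, n_max):
        values.append(x * values[n] - n * values[n - 1])
    return values[: n_max + 1]


def activation(coeffs, x):
    """Evaluate f(x) = sum_n coeffs[n] * He_n(x)."""
    basis = hermite_e(len(coeffs) - 1, x)
    return sum(a * h for a, h in zip(coeffs, basis))


def second_moment_closed_form(coeffs):
    """E[f(x)^2] for x ~ N(0, 1), using orthogonality: sum_n a_n^2 * n!."""
    return sum(a * a * math.factorial(n) for n, a in enumerate(coeffs))


def second_moment_numeric(coeffs, half_width=12.0, steps=24000):
    """Cross-check: trapezoidal integration of f(x)^2 against the N(0, 1) density."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * activation(coeffs, x) ** 2 * pdf
    return total * dx


def normalize(coeffs):
    """Rescale coefficients so that E[f(x)^2] = 1 (assumed variance-preserving target)."""
    scale = math.sqrt(second_moment_closed_form(coeffs))
    return [a / scale for a in coeffs]


coeffs = [0.5, 1.0, -0.3, 0.2]  # hypothetical coefficients for He_0 .. He_3
closed = second_moment_closed_form(coeffs)
numeric = second_moment_numeric(coeffs)
unit_coeffs = normalize(coeffs)
```

The closed form needs only the coefficients, whereas the numerical check has to integrate over the input distribution. An analogous identity holds for other orthogonal families under their own weight functions, which is the property the description above relies on.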