SELU values for a truncated normal distribution #10
Comments
@carlthome I don't really trust the truncation either. I was looking at how truncated normal weights might actually change the distribution, using a modified version of SNNs/getSELUparameters. These are the numbers I get for truncated vs. non-truncated normally distributed weights; they seem quite different.

Normal weights:

Truncated normal weights:
Here's the modified cell of SNNs/getSELUparameters I used:

```python
from __future__ import absolute_import, division, print_function

import numpy as np
import tensorflow as tf

# selu and dropout_selu as defined in SNNs/selu.py
from selu import selu, dropout_selu

# SELU fixed point (set earlier in the notebook): zero mean, unit variance
myFixedPointMean = 0.0
myFixedPointVar = 1.0

in_data = tf.random_normal([10000, 50000], mean=myFixedPointMean,
                           stddev=np.sqrt(myFixedPointVar))
# Truncated normal weights
weights = tf.truncated_normal([50000, 1], mean=0., stddev=1 / np.sqrt(50000))
# Normal weights
#weights = tf.random_normal([50000, 1], mean=0., stddev=1 / np.sqrt(50000))

x = tf.matmul(in_data, weights)
w = selu(x)
y = dropout_selu(w, 0.2, training=True)

init = tf.global_variables_initializer()
gpu_options = tf.GPUOptions(allow_growth=True)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    in_data, x, w, y = sess.run([in_data, x, w, y])
    print("mean/var should be at:  ", myFixedPointMean, "/", myFixedPointVar)
    print("Input data mean/var:    ", "{:.12f}".format(np.mean(in_data)), "/", "{:.12f}".format(np.var(in_data)))
    print("After selu:             ", "{:.12f}".format(np.mean(w)), "/", "{:.12f}".format(np.var(w)))
    print("After dropout mean/var: ", "{:.12f}".format(np.mean(y)), "/", "{:.12f}".format(np.var(y)))
```
It's simply because the variance is incorrect. tf.truncated_normal takes its stddev parameter as the standard deviation *before* truncation, but what we actually want is a standard deviation of sqrt(1/n) *after* truncation. My solution was to solve the expression for the variance of the truncated Gaussian for the variance of the non-truncated Gaussian.
(See SNNs/selu.py, line 31 at commit f992b22.)
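For concreteness, here is a minimal sketch of that correction, assuming tf.truncated_normal's behavior of redrawing samples that fall more than two standard deviations from the mean; the variable names are mine, not from the repo:

```python
import numpy as np
from scipy.stats import norm
import tensorflow as tf

# tf.truncated_normal redraws samples beyond 2 stddevs, so the realised
# variance is smaller than stddev**2. For a standard normal truncated to
# [-a, a]: Var = 1 - 2*a*phi(a) / (2*Phi(a) - 1); with a = 2 this is ~0.7737.
a = 2.0
shrink = 1.0 - 2.0 * a * norm.pdf(a) / (2.0 * norm.cdf(a) - 1.0)  # ~0.77374
correction = 1.0 / np.sqrt(shrink)                                # ~1.13687

n = 50000
target_stddev = 1.0 / np.sqrt(n)  # stddev we want *after* truncation
weights = tf.truncated_normal([n, 1], mean=0.,
                              stddev=correction * target_stddev)
```

The implied divisor, sqrt(0.77374) ≈ 0.8796, matches the constant that newer TensorFlow/Keras variance-scaling initializers divide by when sampling from a truncated normal.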
I read in the paper that "Uniform and truncated Gaussian distributions with these moments led to networks with similar behavior," but this feels unsatisfactory to me. Maybe a small discrepancy becomes really problematic for deeper networks? That would align with my experience that it's still beneficial to use batchnorm/layernorm with SELU.