using float32 in gradient checking may be not appropriate due to large rounding error #4283

Closed
pengli09 opened this issue Sep 21, 2017 · 1 comment

Comments

@pengli09
Contributor

Major problem: float32's rounding error is large enough that it may dominate the difference between the numerical and analytical gradients, which causes a relatively large relative error in gradient checking. As a consequence, the gradient checker used in unit tests may be unreliable.

Potential solution:

  1. Choosing epsilon carefully to keep the rounding error reasonable. However, this is a challenging task. See https://en.wikipedia.org/wiki/Numerical_differentiation and the experiments at the end of this issue.
  2. Using float64 instead of float32. Reference: http://cs231n.github.io/neural-networks-3/
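
A rough starting point for option 1 can be sketched with the common rule of thumb for central differences, which takes the step near the cube root of machine epsilon times the input scale. This is a heuristic that assumes a smooth function with derivatives of ordinary size, and the helper name `suggested_step` is made up for illustration, not part of any library.

```python
import numpy as np


def suggested_step(dtype, x_scale=1.0):
    """Rule-of-thumb central-difference step for a given float type.

    Hypothetical helper: uses step ~ cbrt(machine epsilon) * |x_scale|.
    """
    eps = float(np.finfo(dtype).eps)  # machine epsilon of the dtype
    return eps ** (1.0 / 3.0) * abs(x_scale)


for dtype in (np.float32, np.float64):
    print(dtype.__name__, suggested_step(dtype))
```

Under this heuristic the step comes out near 5e-3 for float32 and near 6e-6 for float64, so float32 leaves far less room for a small step than float64 does.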

Experiments
The differences between the numerical and analytical gradients of the linear function f(x, y) = x^T * y are shown below. We can conclude that

  1. Although the linear function is very simple, the absolute and relative errors are unacceptably large if float32 is used with a small epsilon (about 0.2 at epsilon = 1e-5 for 200 elements).
  2. The errors are very small if float64 is used.
  3. If the scale of epsilon is comparable to that of x and y, the errors will be small. But I'm not sure whether this conclusion generalizes to more complicated functions.
x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	5.96046e-08          	6.27301e-08          	5.96046e-08          	0.485359             	0.532412             
 1000.000000000000000	0                    	0                    	0                    	0.485359             	0.532412             
  100.000000000000000	5.96046e-08          	6.27301e-08          	5.96046e-08          	0.485359             	0.532412             
   10.000000000000000	2.98023e-07          	3.13651e-07          	2.98023e-07          	0.485359             	0.532412             
    1.000000000000000	1.19209e-07          	1.2546e-07           	1.19209e-07          	0.485359             	0.532412             
    0.100000000000000	7.7486e-06           	8.15491e-06          	7.7486e-06           	0.485359             	0.532412             
    0.010000000000000	6.49691e-05          	6.83758e-05          	6.49691e-05          	0.485359             	0.532412             
    0.050000000000000	2.68221e-05          	2.82285e-05          	2.68221e-05          	0.485359             	0.532412             
    0.001000000000000	0.00031656           	0.00033316           	0.00031656           	0.485359             	0.532412             
    0.000100000000000	0.0034982            	0.00368163           	0.0034982            	0.485359             	0.532412             
    0.000010000000000	0.194233             	0.204418             	0.194233             	0.485359             	0.532412             

x_shape (1, 200) y_shape (200, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
 1000.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
  100.000000000000000	0                    	0                    	0                    	0.485358             	0.532412             
   10.000000000000000	2.22045e-16          	2.33688e-16          	2.22045e-16          	0.485358             	0.532412             
    1.000000000000000	2.66454e-15          	2.80425e-15          	2.66454e-15          	0.485358             	0.532412             
    0.100000000000000	2.39808e-14          	2.52383e-14          	2.39808e-14          	0.485358             	0.532412             
    0.010000000000000	3.79252e-13          	3.99139e-13          	3.79252e-13          	0.485358             	0.532412             
    0.050000000000000	9.50351e-14          	1.00018e-13          	9.50351e-14          	0.485358             	0.532412             
    0.001000000000000	3.31291e-13          	3.48662e-13          	3.31291e-13          	0.485358             	0.532412             
    0.000100000000000	1.38796e-11          	1.46074e-11          	1.38796e-11          	0.485358             	0.532412             
    0.000010000000000	1.28229e-10          	1.34953e-10          	1.28229e-10          	0.485358             	0.532412             

---------------------------------------------------------------
x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
 1000.000000000000000	2.98023e-08          	1.10408e-07          	2.98023e-08          	0.475109             	0.482829             
  100.000000000000000	2.98023e-08          	1.10408e-07          	2.98023e-08          	0.475109             	0.482829             
   10.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
    1.000000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.100000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.010000000000000	3.80576e-05          	0.000140992          	3.80576e-05          	0.475109             	0.482829             
    0.050000000000000	8.9407e-08           	3.31225e-07          	8.9407e-08           	0.475109             	0.482829             
    0.001000000000000	3.80576e-05          	0.000140992          	3.80576e-05          	0.475109             	0.482829             
    0.000100000000000	0.00289908           	0.0107402            	0.00289908           	0.475109             	0.482829             
    0.000010000000000	0.079193             	0.293386             	0.079193             	0.475109             	0.482829             

x_shape (1, 84) y_shape (84, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.475109             	0.482829             
 1000.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
  100.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
   10.000000000000000	5.55112e-17          	2.05652e-16          	5.55112e-17          	0.475109             	0.482829             
    1.000000000000000	6.66134e-16          	2.46782e-15          	6.66134e-16          	0.475109             	0.482829             
    0.100000000000000	7.77156e-15          	2.87912e-14          	7.77156e-15          	0.475109             	0.482829             
    0.010000000000000	6.10623e-14          	2.26217e-13          	6.10623e-14          	0.475109             	0.482829             
    0.050000000000000	9.99201e-15          	3.70173e-14          	9.99201e-15          	0.475109             	0.482829             
    0.001000000000000	4.16334e-13          	1.54239e-12          	4.16334e-13          	0.475109             	0.482829             
    0.000100000000000	6.68909e-12          	2.4781e-11           	6.68909e-12          	0.475109             	0.482829             
    0.000010000000000	1.84325e-10          	6.82867e-10          	1.84325e-10          	0.475109             	0.482829             

---------------------------------------------------------------
x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
 1000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
  100.000000000000000	2.98023e-08          	7.10943e-08          	2.98023e-08          	0.314629             	0.419932             
   10.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    1.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    0.100000000000000	1.78814e-07          	4.26566e-07          	1.78814e-07          	0.314629             	0.419932             
    0.010000000000000	1.01328e-06          	2.4172e-06           	1.01328e-06          	0.314629             	0.419932             
    0.050000000000000	1.78814e-07          	4.26566e-07          	1.78814e-07          	0.314629             	0.419932             
    0.001000000000000	4.91738e-06          	1.17306e-05          	4.91738e-06          	0.314629             	0.419932             
    0.000100000000000	0.000173867          	0.000414764          	0.000173867          	0.314629             	0.419932             
    0.000010000000000	0.00399846           	0.00953843           	0.00399846           	0.314629             	0.419932             

x_shape (1, 10) y_shape (10, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	5.55112e-17          	1.32423e-16          	5.55112e-17          	0.314629             	0.419932             
 1000.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
  100.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
   10.000000000000000	0                    	0                    	0                    	0.314629             	0.419932             
    1.000000000000000	1.11022e-16          	2.64847e-16          	1.11022e-16          	0.314629             	0.419932             
    0.100000000000000	9.99201e-16          	2.38362e-15          	9.99201e-16          	0.314629             	0.419932             
    0.010000000000000	7.66054e-15          	1.82744e-14          	7.66054e-15          	0.314629             	0.419932             
    0.050000000000000	1.22125e-15          	2.91331e-15          	1.22125e-15          	0.314629             	0.419932             
    0.001000000000000	9.64784e-14          	2.30152e-13          	9.64784e-14          	0.314629             	0.419932             
    0.000100000000000	5.40568e-13          	1.28954e-12          	5.40568e-13          	0.314629             	0.419932             
    0.000010000000000	8.34116e-12          	1.98981e-11          	8.34116e-12          	0.314629             	0.419932             

---------------------------------------------------------------
x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float32'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
 1000.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
  100.000000000000000	5.96046e-08          	8.27469e-08          	5.96046e-08          	0.417022             	0.720325             
   10.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    1.000000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.100000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.010000000000000	8.9407e-07           	1.2412e-06           	8.9407e-07           	0.417022             	0.720325             
    0.050000000000000	0                    	0                    	0                    	0.417022             	0.720325             
    0.001000000000000	2.44379e-06          	3.39262e-06          	2.44379e-06          	0.417022             	0.720325             
    0.000100000000000	2.38419e-06          	3.30988e-06          	2.38419e-06          	0.417022             	0.720325             
    0.000010000000000	0.000891685          	0.00123789           	0.000891685          	0.417022             	0.720325             

x_shape (1, 1) y_shape (1, 1)
<type 'numpy.float64'>
epsilon              	max diff             	max relative diff    	avg_abs_diff         	avg_abs_x            	avg_abs_y            
10000.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
 1000.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
  100.000000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
   10.000000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    1.000000000000000	0                    	0                    	0                    	0.417022             	0.720324             
    0.100000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    0.010000000000000	1.22125e-15          	1.69541e-15          	1.22125e-15          	0.417022             	0.720324             
    0.050000000000000	1.11022e-16          	1.54128e-16          	1.11022e-16          	0.417022             	0.720324             
    0.001000000000000	1.23235e-14          	1.71082e-14          	1.23235e-14          	0.417022             	0.720324             
    0.000100000000000	1.23346e-13          	1.71236e-13          	1.23346e-13          	0.417022             	0.720324             
    0.000010000000000	4.31877e-13          	5.99559e-13          	4.31877e-13          	0.417022             	0.720324             

---------------------------------------------------------------
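
One way to read the float32 rows above: f(x + e) and f(x - e) are stored floating-point numbers, so their difference is a whole number of float spacings near f, and the central difference can only report the gradient in increments of spacing(f) / (2 * epsilon). The sketch below computes that increment. The value 51.68 is an assumption, estimated as 200 * avg_abs_x * avg_abs_y from the (1, 200) table rather than measured, and `gradient_resolution` is a hypothetical helper name.

```python
import numpy as np


def gradient_resolution(dtype, f_value, step):
    """Smallest nonzero increment a central difference can report.

    f_value is the approximate magnitude of the function output and
    step is the perturbation (epsilon) used in the difference.
    """
    ulp = float(np.spacing(dtype(f_value)))  # gap between adjacent floats near f
    return ulp / (2.0 * step)


# Assumed output magnitude for the (1, 200) case; see the note above.
f_value = 200 * 0.485359 * 0.532412
for dtype in (np.float32, np.float64):
    print(dtype.__name__, gradient_resolution(dtype, f_value, 1e-5))
```

With these assumed inputs the increment is about 0.19 for float32 and about 3.6e-10 for float64 at epsilon = 1e-5, the same order as the max diff values of 0.194233 and 1.28229e-10 reported in the (1, 200) tables.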

code

from __future__ import print_function  # lets the script run on Python 2 and 3

import numpy as np


def print_diff(dtype, x_shape, y_shape):
    np.random.seed(1)

    x = np.random.random(x_shape).astype(dtype)
    y = np.random.random(y_shape).astype(dtype)

    def f(e):
        # f is linear in x, so the central difference has no truncation error;
        # any difference from the analytical gradient is rounding error.
        return np.matmul(x + e, y)

    e = np.zeros(x_shape).astype(dtype)

    # Analytical gradient of f with respect to x[0, 0], which is y[0, 0].
    one = e.copy()
    one[0, 0] = 1
    target = np.dot(one, y)

    print('%-21s\t%-21s\t%-21s\t%-21s\t%-21s\t%-21s'
          % ('delta', 'max diff', 'max relative diff',
             'avg_abs_diff', 'avg_abs_x', 'avg_abs_y'))
    # Full sweep, matching the epsilon values in the tables above:
    #for delta in [10000, 1000, 100, 10, 1, 0.1, 0.01, 0.05, 0.001, 0.0001, 0.00001]:
    for delta in [0.01, 0.05, 0.001, 0.0001, 0.00001]:
        #delta = np.abs(x).sum() / x.size
        e[0, 0] = delta
        # Central difference along x[0, 0].
        grad = (f(e) - f(-e)) / 2 / delta
        #grad = np.matmul(e, y) / delta

        diff = grad - target

        # Guard against dividing by a near-zero gradient.
        target_ = np.abs(target)
        target_[target_ < 1e-3] = 1
        relative_diff = np.abs(diff) / target_

        print('%21.15f\t%-21g\t%-21g\t%-21g\t%-21g\t%-21g'
              % (delta,
                 np.abs(diff).max(),
                 np.abs(relative_diff).max(),
                 np.abs(diff).mean(),
                 np.abs(x).mean(),
                 np.abs(y).mean()))


for x_shape, y_shape in [((1, 200), (200, 1)), ((1, 84), (84, 1)), ((1, 10), (10, 1))]:
    for dtype in (np.float32, np.float64):
        print('x_shape', x_shape, 'y_shape', y_shape)
        print(dtype)
        print_diff(dtype, x_shape, y_shape)
        print('')

    print('-' * 63)
@pengli09
Contributor Author

Conclusion

  1. float64 should be used for gradient checking unless it is not supported.
  2. Computing the full Jacobian matrix is a safer choice.
  3. Smaller, but still reasonably sized, tensors can be used to reduce testing time.
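
The first two points can be sketched together as a small float64 checker that builds the full numerical Jacobian by central differences and compares it with the analytical one. This is a minimal illustration, not the checker used in this repository; the names `numerical_jacobian` and `check_gradient` and the default step and tolerances are assumptions chosen for the example.

```python
import numpy as np


def numerical_jacobian(f, x, step=1e-6):
    """Central-difference Jacobian of f at x, computed in float64."""
    x = np.array(x, dtype=np.float64)  # private float64 copy of the input
    out_size = np.asarray(f(x)).size
    jac = np.zeros((out_size, x.size), dtype=np.float64)
    for j in range(x.size):
        orig = x.flat[j]
        x.flat[j] = orig + step
        y_plus = np.array(f(x), dtype=np.float64).ravel()
        x.flat[j] = orig - step
        y_minus = np.array(f(x), dtype=np.float64).ravel()
        x.flat[j] = orig  # restore the perturbed entry
        jac[:, j] = (y_plus - y_minus) / (2.0 * step)
    return jac


def check_gradient(f, analytic_jacobian, x, step=1e-6, rtol=1e-5, atol=1e-7):
    """Return True if the analytical Jacobian matches the numerical one."""
    numeric = numerical_jacobian(f, x, step)
    analytic = np.asarray(analytic_jacobian(x), dtype=np.float64)
    return np.allclose(numeric, analytic, rtol=rtol, atol=atol)


# Small tensors keep the test fast: f(x) = x . y has Jacobian y^T.
rng = np.random.RandomState(1)
x = rng.random_sample((1, 10))
y = rng.random_sample((10, 1))
ok = check_gradient(lambda v: v.dot(y), lambda v: y.T, x)
print('gradient check passed:', ok)
```

The loop costs two function evaluations per input element, which is one reason the third point about smaller tensors matters.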

Evidence
PyTorch

  1. float64 is used: https://github.com/pytorch/pytorch/blob/cf7e28de8eeafda25dad0e120fbb4e0d701a52f4/test/test_autograd.py#L2038
  2. Jacobian matrix is used: https://github.com/pytorch/pytorch/blob/cf7e28de8eeafda25dad0e120fbb4e0d701a52f4/test/test_autograd.py#L15, https://github.com/pytorch/pytorch/blob/cf7e28de8eeafda25dad0e120fbb4e0d701a52f4/test/test_autograd.py#L2141, https://github.com/pytorch/pytorch/blob/cf7e28de8eeafda25dad0e120fbb4e0d701a52f4/test/test_autograd.py#L2206

TensorFlow (code: tensorflow/python/ops)

  1. float64 is widely used
  2. Jacobian matrix is used
