Newly introduced vector-vector product bug #1240
Comments
More details:
This only appears to impact symbolic vectors. The nose tests don't use symbolic vectors; they convert numpy arrays to tensor variables, which I guess is why the tests pass. For example, this code works where the symbolic version above fails:

import numpy as np
import theano
import theano.tensor as tt

testval = np.arange(5).astype(theano.config.floatX)
testvar = tt.as_tensor_variable(testval)
f_val = theano.function([], tt.dot(testvar, testvar))
print f_val()
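[Editor's note] As a sanity check, the expected value of this dot product can be verified with plain NumPy alone; this sketch does not involve Theano at all:

```python
import numpy as np

# Same data as above: [0, 1, 2, 3, 4] in single precision.
testval = np.arange(5, dtype=np.float32)

# The value Theano should reproduce: 0*0 + 1*1 + 2*2 + 3*3 + 4*4 = 30.
expected = np.dot(testval, testval)
print(expected)  # 30.0
```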
For me, both versions work. Can you tell me the BLAS version used? The Theano flag blas.ldflags? Your OS version? Your python/numpy/scipy versions? Can you run your test again with this line added, and give me its output: theano.printing.debugprint(f, print_type=True)? I get this output:
Sure -- debug output:
OS: Mac OS X 10.8
Theano flags: floatX=float32
blas.ldflags: -lblas
numpy version: 1.8.0.dev-c932553

I'm going to see if it's possibly a numpy incompatibility, since I'm using a dev version, by uninstalling/reinstalling numpy/scipy/theano. I'll update once that's done.
What is the error message in DebugMode? It might be informative.
Good point -- this is the DebugMode error:

>>> v = tt.vector()
>>> f = theano.function([v], tt.dot(v, v), mode='DebugMode')
>>> testval = np.arange(5).astype(theano.config.floatX)
>>> print f(testval)
WARNING: ('Stride mismatch', ((1, 5), (1, 5), (20, 4), (4, 4), 'DimShuffle{1,0}'))
---------------------------------------------------------------------------
BadThunkOutput Traceback (most recent call last)
<ipython-input-1-e15c4b233b89> in <module>()
2 f = theano.function([v], tt.dot(v, v), mode='DebugMode')
3 testval = np.arange(5).astype(theano.config.floatX)
----> 4 print f(testval)
/Users/jlowin/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
578 t0_fn = time.time()
579 try:
--> 580 outputs = self.fn()
581 except Exception:
582 if hasattr(self.fn, 'position_of_error'):
/Users/jlowin/git/Theano/theano/compile/debugmode.pyc in deco()
2065 TensorType.filter_checks_isfinite = self.maker.mode.check_isfinite
2066 try:
-> 2067 return f()
2068 finally:
2069 # put back the filter_checks_isfinite
/Users/jlowin/git/Theano/theano/compile/debugmode.pyc in f()
1944 thunk1='perform', val1=r_vals[r],
1945 thunk2='c_code', val2=storage_map[r][0],
-> 1946 inputs_val=inputs_val)
1947 else:
1948 #print >> sys.stderr, i, "DEBUGMODE storing reference output %x" % id(storage_map[r][0])
BadThunkOutput: BadThunkOutput
variable : CGemv{no_inplace}.0
Outputs Type: TensorType(float32, (True,))
Outputs Shape: (1,)
Outputs Strides: (4,)
Inputs Type : [TensorType(float32, (True,)), TensorType(float32, scalar), TensorType(float32, row), TensorType(float32, vector), TensorType(float32, scalar)]
Inputs Shape: [(1,), (), (1, 5), (5,), ()]
Inputs Strides: [(4,), (), (20, 4), (4,), ()]
Apply : CGemv{no_inplace}(TensorConstant{(1,) of 0.0}, TensorConstant{1.0}, InplaceDimShuffle{x,0}.0, <TensorType(float32, vector)>, TensorConstant{0.0})
thunk1 : perform
thunk2 : c_code
val1 : [ 30.]
val2 : [ 0.]
op : <class 'theano.tensor.blas_c.CGemv'>
Value 1 : shape, dtype, strides, min, max, n_inf, n_nan: (1,) float32 (4,) 30.0 30.0 0 0
Value 2 : shape, dtype, strides, min, max, n_inf, n_nan: (1,) float32 (4,) 0.0 0.0 0 0
Max Abs Diff: 30.0
Mean Abs Diff: 30.0
Median Abs Diff: 30.0
Std Abs Diff: 0.0
Max Rel Diff: 1.0
Mean Rel Diff: 1.0
Median Rel Diff: 1.0
Std Rel Diff: 0.0
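[Editor's note] For readers unfamiliar with the Apply node above: Theano rewrites dot(v, v) into a CGemv call whose arguments read as (y, alpha, A, x, beta), with A the (1, 5) row produced by the DimShuffle. A NumPy sketch of the value this node should compute:

```python
import numpy as np

v = np.arange(5, dtype=np.float32)

# Arguments as they appear in the Apply node above:
y = np.zeros(1, dtype=np.float32)   # TensorConstant{(1,) of 0.0}
alpha = np.float32(1.0)             # TensorConstant{1.0}
A = v[np.newaxis, :]                # InplaceDimShuffle{x,0}: shape (1, 5)
x = v                               # <TensorType(float32, vector)>
beta = np.float32(0.0)              # TensorConstant{0.0}

# gemv: z = beta * y + alpha * (A @ x).
# The 'perform' thunk computed [30.] (correct); the buggy c_code path gave [0.].
z = beta * y + alpha * A.dot(x)
print(z)
```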
I returned to stable NumPy 1.7 but am still having this issue.
So, from my understanding, the problem is related to your BLAS library. To confirm this, can you test with this Theano flag: blas.ldflags= (i.e., empty)? This will force Theano to use NumPy calls instead; it should work then. Do you have another BLAS available? I see two possible causes: either your BLAS doesn't like how we call it and raises an error that we don't catch, or your BLAS library is buggy. In any case, we will need to add your code as a test, to make sure this is detected when people run Theano's tests.
So the good news is that with that flag, it does work. The bad news is that my BLAS is the standard one that Apple ships with every Mac in the Accelerate framework, so I've never touched it.
It may be because we are using BLAS calls with undefined behaviour that works correctly where we tested it, but is incorrect in general.
Of course, let me know anything you want me to run. I will try to find some other Macs as well. I should note that on the two machines I am testing on right now, check_blas.py runs about 20% slower with that flag.
It is normal that this flag slows things down: it means we don't call BLAS directly, but always go through the NumPy Python interface. If you can tell us the exact values of the parameters passed to sdot_(), it would help us, as @lamblin thinks this could be the issue.
Sure -- I'm not too familiar with C, though. Maybe you could write the line here for me to paste into blas_c at the appropriate spot, and I'll tell you the result? Sorry for the inconvenience.
What about this line? You will need to modify the c_code_cache_version method of the same op to return an empty tuple. This means the op is always recompiled, so if needed you can change more than that line without bumping the version each time.
The doubled %% and // are needed because Python will interpret the line before it gets passed to g++.
It would also be nice to have the actual
This is the output, if it helps:
I made an error in my line; can you use this one:
I got:
I have:
@lamblin do you see a problem with the function inputs? I don't. The pointers are aligned and the other parameters seem good. Does one of you know how to check for BLAS errors?
Regarding BLAS errors, I remember that you can declare a function with an appropriate name, and then this function will be used instead of the default BLAS error-checking one (which does not do much). I remember having tried to do that in the past, but it's quite blurry in my mind.
@jlowin can you test with 2 different vectors? Does this fix this case?
Unfortunately I still get 0.0 with this code:

import numpy as np
import theano
import theano.tensor as tt
v = tt.vector()
u = tt.vector()
f = theano.function([v, u], tt.dot(v, u))
testval = np.arange(5).astype(theano.config.floatX)
print f(testval, testval)
@lamblin here are more test cases. I get 0.0 for all of them with floatX=float32.

import numpy as np
import theano
import theano.tensor as tt
v = tt.vector()
u = tt.vector()
f = theano.function([v, u], tt.dot(v, u))
for n in [3, 5, 7, 12]:
    testval1 = np.arange(n).astype(theano.config.floatX)
    testval2 = 10 + np.arange(n).astype(theano.config.floatX)
    print f(testval1, testval2)
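[Editor's note] For reference, the non-zero values these four cases should print, computed with plain NumPy:

```python
import numpy as np

# Reference results for the four test cases, with no Theano involved.
results = []
for n in [3, 5, 7, 12]:
    testval1 = np.arange(n, dtype=np.float32)
    testval2 = 10 + np.arange(n, dtype=np.float32)
    results.append(float(np.dot(testval1, testval2)))
print(results)  # [35.0, 130.0, 301.0, 1166.0]
```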
OK, thanks.
Hm, so strange. One of the machines I'm testing on is almost brand new; I can't think of anything that would have affected the BLAS, but I will keep trying to find something off. FWIW, I'm using Homebrew to set up my environment. theano.config.blas.ldflags is simply -lblas. At the moment the only active flag in my .theanorc is floatX
(and there are a couple of other flags, like device, that I've commented out while working on this problem).
@lamblin how did you install python/numpy/blas? Can you make sure it uses Apple's BLAS? I think that if you used EPD, it would work, as that uses another BLAS.
@nouiz: It's not EPD, but I'm not sure how it was installed. theano.config.blas.ldflags is -lblas too; I don't know how to check how it's resolved.
@nouiz: so, "otool -L" (the equivalent of ldd) gives, in particular:
which looks like Apple's BLAS, but it's probably an older one.
@lamblin here's the output for the four test cases
OK, so the call to sdot_ itself returns 0.0, which is puzzling. I'll try to come up with a pure C file that you can compile and run, to see if it reproduces the problem. In the meantime, can you add the following printing statement? I just want to check that the actual data being used is not 0.
Sure:
Sorry about the iterative debugging process; I did not realize that you still have x != y.
That's ok, I appreciate the help. Here's the output. It appears that the data is there.
Thanks! So, there really seems to be something wrong with your BLAS. Here is a small C program that you can compile (linking with -lblas) and run:

#include <stdio.h>
float sdot_(int*, float*, int*, float*, int*);
int main(int argc, char** argv)
{
int Nx = 5;
int Sx = 1;
float x[5] = {0, 1, 2, 3, 4};
float r = sdot_(&Nx, x, &Sx, x, &Sx);
printf("r: %f\n", r);
return 0;
}
It may be related to http://stackoverflow.com/questions/6887229/problem-on-multiplying-a-matrix-and-a-vector-with-veclib-framework-of-mac-os-x-1. |
Thank you -- so this fails; it returns 0.000000. It looks like you're definitely right and there is a problem with sdot. With these clues, I found some other people experiencing something similar with sdot in recent versions of Mac OS. See for example Homebrew/legacy-homebrew#6649 and http://www.macresearch.org/lapackblas-fortran-106. This is all a little over my head, but here's one response in particular (from the second link), with respect to Apple's implementation of sdot:
So I don't know if that's right or not, but it seems to fit the symptoms. Maybe some unique circumstances are making this affect only a small number of people. I will continue to see what I can learn.
Thanks, that would make sense. Can you try declaring it as:

double sdot_(int*, float*, int*, float*, int*);
Sorry for the delay -- yes, declaring it as double does work. Maybe that confirms the description of the problem above.
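[Editor's note] One way to see why a float/double return-convention mismatch can surface as exactly 0.0: in the IEEE-754 encoding of 30.0 as a 64-bit double, the low four bytes are all zero, so reading back only single-precision-sized bits yields 0.0. This Python sketch of the bit patterns is only illustrative; the real mismatch happens in the calling convention, not in memory:

```python
import struct

# 30.0 encoded as a little-endian 64-bit double.
double_bytes = struct.pack('<d', 30.0)

# Misreading only the first four bytes as a 32-bit float gives 0.0 --
# the same wrong answer reported above.
(misread,) = struct.unpack('<f', double_bytes[:4])
print(misread)  # 0.0
```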
So, if I try the same (defining
So, when you have time, can you try the following?

#include <stdio.h>
float sdsdot_(int*, float*, float*, int*, float*, int*);
int main(int argc, char** argv)
{
int Nx = 5;
int Sx = 1;
float x[5] = {0, 1, 2, 3, 4};
float z = 0;
float r = sdsdot_(&Nx, &z, x, &Sx, x, &Sx);
printf("z: %f, r: %f\n", z, r);
return 0;
}

z should still be 0 at the end.
Sorry for the delay; I'm travelling today and tomorrow. This did not work (still 0). Also, if you're running 10.6.8, that bears out that this was a change in Lion (10.7). Based on the suggestions in the stackoverflow post you mentioned, as well as here, I tried calling the Accelerate framework's own sdot, and it did actually work, returning 30.0. (need to add
That seems a good solution to me. We could change the BLAS headers for Mac to
The thing is, however, that we only want to do that when the specified blas flags are "-lblas", because other versions of BLAS would not have that problem, and we don't want to use Accelerate in that case.
Also, do you have to specify different linking flags to use the cblas interface, like
No -- I used the '-lblas' flag. Actually, exactly the same as your original compile instruction.
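[Editor's note] As an aside, SciPy (assuming it is installed) exposes BLAS wrappers that resolve the correct sdot signature against the local library, which gives a quick cross-check of single-precision dot from Python:

```python
import numpy as np
from scipy.linalg.blas import sdot  # single-precision dot from the local BLAS

x = np.arange(5, dtype=np.float32)
print(sdot(x, x))  # should print 30.0 on a healthy BLAS
```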
@jlowin Cool, then I have a plan to solve the problem in an unobtrusive way.
Great to hear! Thanks for your help too!
They should detect the bug described in Theano gh-1240.
@lamblin -- unfortunately it's not there yet (the assertion fails and kills my Python). I'm tracing it as best I can -- it looks like the first code snippet compiles but gives the wrong answer (as expected), but the second code snippet fails to compile. Do we need to link to the Accelerate library explicitly?
Oops, sorry, I think I know what's wrong. I'll update the PR shortly.
@jlowin Updated.
@lamblin Ok, getting closer! This time it failed because the 'fabs' function was not declared, so I replaced it with the direct comparison from the first snippet. After that, it fails with the following:
However, before you added the
Right, I forgot to add the declaration of cblas_sdot in the actual header text; I just corrected that.
I'm getting the correct results from my test code -- so I think this is a working solution. However, I'm a little hesitant to declare "it works" based only on my experience -- hopefully we will find a way to test it on some other machines as well. Thanks so much for all the time you put into this!
I was able to test this on a second machine, FWIW, and it works there too.
gh-1250 fixed this, so I'm closing. Thanks!
I'm experiencing what appears to be a serious bug. According to git bisect, the recent commit 14af439 is the culprit, merged in gh-1202. The following code now gives the wrong result on my machine. It should be 30.0, but after this commit it results in 0.0.