
Newly introduced vector-vector product bug #1240

Closed
jlowin opened this issue Feb 20, 2013 · 57 comments

jlowin commented Feb 20, 2013

I'm experiencing what appears to be a serious bug. According to git bisect, the recent commit 14af439 is the culprit, merged in gh-1202.

The following code now gives the wrong result on my machine. It should be 30.0, but after this commit results in 0.0.

import numpy as np

import theano
import theano.tensor as tt
v = tt.vector()
f = theano.function([v], tt.dot(v, v))
testval = np.arange(5).astype(theano.config.floatX)
print f(testval)

jlowin commented Feb 20, 2013

More details:

  • The example code fails on the CPU with theano.config.floatX = 'float32'
  • The code works on the CPU with theano.config.floatX = 'float64'
  • The code works on the GPU


jlowin commented Feb 20, 2013

This only appears to impact symbolic vectors. The nose tests don't use symbolic vectors; they convert numpy arrays to tensor variables, and I guess that's why the tests pass.

For example, this code works when the symbolic version above fails:

import numpy as np
import theano
import theano.tensor as tt

testval = np.arange(5).astype(theano.config.floatX)
testvar = tt.as_tensor_variable(testval)
f_val = theano.function([], tt.dot(testvar, testvar))
print f_val()


nouiz commented Feb 20, 2013

For me, both versions work.

Can you tell me the BLAS version used? The Theano flag blas.ldflags? Your OS version? Your python/numpy/scipy versions?

Can you run your tests with this line added, and give me its output:

theano.printing.debugprint(f, print_type=True)

I have this output:

>>> theano.printing.debugprint(f, print_type=True)
InplaceDimShuffle{} [@A] <TensorType(float32, scalar)> ''   2
 |CGemv{no_inplace} [@B] <TensorType(float32, (True,))> ''   1
   |TensorConstant{(1,) of 0.0} [@C] <TensorType(float32, (True,))>
   |TensorConstant{1.0} [@D] <TensorType(float32, scalar)>
   |InplaceDimShuffle{x,0} [@E] <TensorType(float32, row)> ''   0
   | |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |TensorConstant{0.0} [@G] <TensorType(float32, scalar)>


jlowin commented Feb 20, 2013

Sure --

debug output:

InplaceDimShuffle{} [@A] <TensorType(float32, scalar)> ''   2
 |CGemv{no_inplace} [@B] <TensorType(float32, (True,))> ''   1
   |TensorConstant{(1,) of 0.0} [@C] <TensorType(float32, (True,))>
   |TensorConstant{1.0} [@D] <TensorType(float32, scalar)>
   |InplaceDimShuffle{x,0} [@E] <TensorType(float32, row)> ''   0
   | |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |TensorConstant{0.0} [@G] <TensorType(float32, scalar)>

OS: Mac OS X 10.8

Theano flags: floatX = float32

blas.ldflags: -lblas

numpy version : 1.8.0.dev-c932553
scipy version : 0.11.0
The BLAS is the built-in Apple Accelerate framework. Using otool to look at the linked file:

$ otool -L /usr/local/lib/python2.7/site-packages/numpy/core/_dotblas.so
/usr/local/lib/python2.7/site-packages/numpy/core/_dotblas.so:
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

I'm going to see if it's possibly a numpy incompatibility since I'm using a dev version by uninstalling/reinstalling numpy/scipy/theano. I'll update once that's done.


lamblin commented Feb 20, 2013

What is the error message in DebugMode? It might be informative.


jlowin commented Feb 20, 2013

Good point -- this is the DebugMode error:

>>> v = tt.vector()
>>> f = theano.function([v], tt.dot(v, v), mode='DebugMode')
>>> testval = np.arange(5).astype(theano.config.floatX)
>>> print f(testval)
WARNING: ('Stride mismatch', ((1, 5), (1, 5), (20, 4), (4, 4), 'DimShuffle{1,0}'))
---------------------------------------------------------------------------
BadThunkOutput                            Traceback (most recent call last)
<ipython-input-1-e15c4b233b89> in <module>()
      2 f = theano.function([v], tt.dot(v, v), mode='DebugMode')
      3 testval = np.arange(5).astype(theano.config.floatX)
----> 4 print f(testval)

/Users/jlowin/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    578         t0_fn = time.time()
    579         try:
--> 580             outputs = self.fn()
    581         except Exception:
    582             if hasattr(self.fn, 'position_of_error'):

/Users/jlowin/git/Theano/theano/compile/debugmode.pyc in deco()
   2065                 TensorType.filter_checks_isfinite = self.maker.mode.check_isfinite
   2066                 try:
-> 2067                     return f()
   2068                 finally:
   2069                     # put back the filter_checks_isfinite

/Users/jlowin/git/Theano/theano/compile/debugmode.pyc in f()
   1944                                             thunk1='perform', val1=r_vals[r],
   1945                                             thunk2='c_code', val2=storage_map[r][0],
-> 1946                                             inputs_val=inputs_val)
   1947                             else:
   1948                                 #print >> sys.stderr, i, "DEBUGMODE storing reference output %x" % id(storage_map[r][0])

BadThunkOutput: BadThunkOutput
  variable    : CGemv{no_inplace}.0
  Outputs Type: TensorType(float32, (True,))
  Outputs Shape: (1,)
  Outputs Strides: (4,)
  Inputs Type : [TensorType(float32, (True,)), TensorType(float32, scalar), TensorType(float32, row), TensorType(float32, vector), TensorType(float32, scalar)]
  Inputs Shape: [(1,), (), (1, 5), (5,), ()]
  Inputs Strides: [(4,), (), (20, 4), (4,), ()]
  Apply   : CGemv{no_inplace}(TensorConstant{(1,) of 0.0}, TensorConstant{1.0}, InplaceDimShuffle{x,0}.0, <TensorType(float32, vector)>, TensorConstant{0.0})
  thunk1  : perform
  thunk2  : c_code
  val1    : [ 30.]
  val2    : [ 0.]
  op      : <class 'theano.tensor.blas_c.CGemv'>
  Value 1 : shape, dtype, strides, min, max, n_inf, n_nan: (1,) float32 (4,) 30.0 30.0 0 0
  Value 2 : shape, dtype, strides, min, max, n_inf, n_nan: (1,) float32 (4,) 0.0 0.0 0 0
  Max Abs Diff:  30.0
  Mean Abs Diff:  30.0
  Median Abs Diff:  30.0
  Std Abs Diff:  0.0
  Max Rel Diff:  1.0
  Mean Rel Diff:  1.0
  Median Rel Diff:  1.0
  Std Rel Diff:  0.0


jlowin commented Feb 20, 2013

I returned to stable NumPy 1.7 but am still having this issue.


nouiz commented Feb 20, 2013

So, from my understanding, the problem is related to your BLAS library. To confirm this, can you test with this Theano flag?

blas.ldflags=

This will force Theano to use the NumPy calls instead, so it should work. Do you have another BLAS? I see two possible causes: either the library doesn't like how we call it and raises an error that we don't catch, or your BLAS library is buggy.

In any case, we will need to add your code as a test, to make sure this is detected when people run Theano's tests.
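For reference, the flag can be set per-run via the THEANO_FLAGS environment variable (THEANO_FLAGS=blas.ldflags=), or persistently in .theanorc (a sketch of the relevant section):

```ini
# .theanorc -- an empty ldflags disables the compiled BLAS bindings,
# so Theano falls back to NumPy for dot products
[blas]
ldflags =
```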


jlowin commented Feb 20, 2013

So the good news is that with that flag, it does work.

The bad news is that my blas is the standard one that Apple ships with every Mac in the Accelerate framework, so I've never touched it.


lamblin commented Feb 20, 2013

It may be because we are using BLAS calls with undefined behaviour that happen to work correctly where we tested them, but that are incorrect in general.
For instance, I'm not sure the values we pass as Sx1 and Sx0 are correct for a call to dot_, because they were mucked with to be acceptable for gemv_ in the case where x was a row or column vector.
I'll try to have access to a machine that can reproduce the bug. Otherwise, could you help me test instrumented versions of the code, so we understand exactly what's happening?


jlowin commented Feb 20, 2013

Of course, let me know anything you want to run. I will try to find some other Macs as well.

I should note that on the two machines I am testing on right now, check_blas.py runs about 20% slower with blas.ldflags=, so if it's possible to find a better solution, that's definitely preferable.


nouiz commented Feb 20, 2013

It is normal that this flag slows things down. It means we don't call BLAS directly; we always go through the NumPy Python interface.

If you can tell us the exact values of the parameters passed to sdot_(), it would help us, as @lamblin thinks this could be the issue.


jlowin commented Feb 20, 2013

Sure -- I'm not too familiar with C, though. Maybe you could write a line here for me to paste into blas_c at the appropriate spot, and I'll tell you the result? Sorry for the inconvenience.


nouiz commented Feb 20, 2013

What about this line?

You need to modify the c_code_cache_version method of the same op to return an empty tuple. That means it will always be recompiled, so if needed you can change more than that line without bumping the version each time.

printf("?dot_ params: %%d %%p %%d %%p %%d\\n", &Nx1, (float*)(PyArray_DATA(%(xx)s)), &Sx1, (float*)yy_data, &Sy);

The doubled %% and \\ are needed because Python will interpret the line before it gets passed to g++.


lamblin commented Feb 20, 2013

It would also be nice to have the actual PyArray_STRIDES(%(xx)s)[0] and [1], in addition to Sx0 and Sx1.


jlowin commented Feb 20, 2013

This is the output, if it helps:

?dot_ params: 1430203228 0x7f8c5ad0e9f0 1430203220 0x7f8c5ad0e9f0 1430203212
?dot_ strides: 20 4


nouiz commented Feb 20, 2013

I made an error in my line; can you use this one instead:

 printf("?dot_ params: %%d %%p %%d %%p %%d\\n", Nx1, (float*)(PyArray_DATA(%(xx)s)), Sx1, (float*)yy_data, Sy);

I got:

?dot_ params: 5 0x4025560 1 0x4025560 1
?dot_ strides: 20 4


jlowin commented Feb 20, 2013

I have:

?dot_ params: 5 0x7fb38a4475a0 1 0x7fb38a4475a0 1
?dot_ strides: 20 4


nouiz commented Feb 20, 2013

@lamblin do you see a problem with the function inputs? I don't. The pointers are aligned and the other parameters seem good.

Does one of you know how to check for BLAS errors?


lamblin commented Feb 20, 2013

Regarding BLAS errors, I remember that you can declare a function with an appropriate name, and then that function will be used instead of BLAS's own error-checking one (which does not do much). I remember having tried that in the past, but it's quite blurry in my mind.
Regarding the values, let me think about it for a while. The problem might be that the memory is shared between both arguments of dot_.


nouiz commented Feb 20, 2013

@jlowin can you test with two different vectors? Does that fix this case?


jlowin commented Feb 20, 2013

Unfortunately I still get 0.0 with this code:

import numpy as np

import theano
import theano.tensor as tt
v = tt.vector()
u = tt.vector()
f = theano.function([v, u], tt.dot(v, u))
testval = np.arange(5).astype(theano.config.floatX)
print f(testval, testval)


lamblin commented Feb 20, 2013

The error handling function seems to be xerbla. You may have to define it as xerbla_.
@jlowin Can you also try with something other than range(5)? Maybe range(3), range(7), range(12).
Also, I think that it would be worth testing it with two different ndarrays, for instance f(testval, testval+1).


jlowin commented Feb 20, 2013

@lamblin here are more test cases. I get 0.0 for all of them with floatX=='float32'.

import numpy as np

import theano
import theano.tensor as tt

v = tt.vector()
u = tt.vector()
f = theano.function([v, u], tt.dot(v, u))

for n in [3, 5, 7, 12]:
    testval1 = np.arange(n).astype(theano.config.floatX)
    testval2 = 10 + np.arange(n).astype(theano.config.floatX)
    print f(testval1, testval2)


lamblin commented Feb 20, 2013

OK, thanks.
I tried to reproduce your problem on a Mac, but I have not succeeded yet.
What is in theano.config.blas.ldflags exactly? Did you change anything else in theano.config (besides floatX, I guess)?


jlowin commented Feb 20, 2013

Hm, so strange. One of the machines I'm testing on is almost brand new; I can't think of anything that would have affected the BLAS, but I will keep trying to find something off. FWIW, I'm using Homebrew to set up my environment.

theano.config.blas.ldflags is simply '-lblas'

At the moment the only active flag in my .theanorc is floatX:

[global]
floatX = float32

(and there are a couple other flags, like device, that I've commented out while working on this problem)


nouiz commented Feb 20, 2013

@lamblin how did you install python/numpy/blas? Can you make sure it uses Apple's BLAS?

I think that if you used EPD, it would work, as it uses another BLAS.


lamblin commented Feb 20, 2013

@nouiz: It's not EPD, but I'm not sure how it was installed. theano.config.blas.ldflags is -lblas too, and I don't know how to check how it's resolved.
@jlowin: can you add the following printf statements, to help us understand where things go wrong? The first ones should be the same as before.

printf("?dot_ params: %%d %%p %%d %%p %%d\\n", Nx1, (float*)(PyArray_DATA(%(xx)s)), Sx1, (float*)yy_data, Sy);
printf("?dot_ strides: %%d %%d\\n", PyArray_STRIDES(%(xx)s)[0], PyArray_STRIDES(%(xx)s)[1]);
printf("?dot_ dims: %%d %%d\\n", PyArray_DIMS(%(xx)s)[0], PyArray_DIMS(%(xx)s)[1]);
printf("?dot_ alpha: %%f fbeta %%f dbeta\\n", alpha, fbeta, dbeta);
float dot_result = sdot_(&Nx1, (float*)(PyArray_DATA(%(xx)s)), &Sx1, (float*)yy_data, &Sy);
printf("?dot_ result: %%f\\n", dot_result);


lamblin commented Feb 20, 2013

@nouiz: so, "otool -L" (the equivalent of ldd) gives, in particular:

/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 219.0.0)

which looks like Apple's BLAS, but it's probably an older one.


jlowin commented Feb 20, 2013

@lamblin here's the output for the four test cases

?dot_ params: 3 0x7fac3e1c5800 1 0x7fac3e1d13c0 1
?dot_ strides: 12 4
?dot_ dims: 1 3
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0

?dot_ params: 5 0x7fac3e1b66a0 1 0x7fac3e1b4400 1
?dot_ strides: 20 4
?dot_ dims: 1 5
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0

?dot_ params: 7 0x7fac3e1b4fd0 1 0x7fac3e1b6c10 1
?dot_ strides: 28 4
?dot_ dims: 1 7
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0

?dot_ params: 12 0x7fac3e09abf0 1 0x7fac3e1e7cf0 1
?dot_ strides: 48 4
?dot_ dims: 1 12
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0


lamblin commented Feb 20, 2013

OK, so the call to sdot_ itself returns 0.0, which is puzzling. I'll try to come up with a pure C file that you can compile and run, to see if it reproduces the problem.

In the meantime, can you add the following print statement? I just want to check that the actual data being used is not 0.

printf("?dot_ data[:3]: %%f %%f %%f\\n", ((float*)(PyArray_DATA(%(xx)s)))[0], ((float*)(PyArray_DATA(%(xx)s)))[1], ((float*)(PyArray_DATA(%(xx)s)))[2]);


jlowin commented Feb 20, 2013

Sure:

?dot_ params: 3 0x7fc644500570 1 0x7fc644500c80 1
?dot_ strides: 12 4
?dot_ dims: 1 3
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0

?dot_ params: 5 0x7fc644500c60 1 0x7fc644503430 1
?dot_ strides: 20 4
?dot_ dims: 1 5
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0

?dot_ params: 7 0x7fc6445014f0 1 0x7fc644501460 1
?dot_ strides: 28 4
?dot_ dims: 1 7
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0

?dot_ params: 12 0x7fc644500550 1 0x7fc644500000 1
?dot_ strides: 48 4
?dot_ dims: 1 12
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0


lamblin commented Feb 20, 2013

Sorry about the iterative debugging process; I did not realize that you still have x != y.
Could you add this last print statement?

printf("?dot_ y data[:3]: %%f %%f %%f\\n", ((float*)yy_data)[0], ((float*)yy_data)[1], ((float*)yy_data)[2]);


jlowin commented Feb 20, 2013

That's ok, I appreciate the help. Here's the output. It appears that the data is there.

?dot_ params: 3 0x7fc6ee01d2e0 1 0x7fc6ee0c4380 1
?dot_ strides: 12 4
?dot_ dims: 1 3
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0

?dot_ params: 5 0x7fc6ee442270 1 0x7fc6ee44b220 1
?dot_ strides: 20 4
?dot_ dims: 1 5
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0

?dot_ params: 7 0x7fc6ee32fc70 1 0x7fc6ee3d61d0 1
?dot_ strides: 28 4
?dot_ dims: 1 7
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0

?dot_ params: 12 0x7fc6ee443200 1 0x7fc6ee44b220 1
?dot_ strides: 48 4
?dot_ dims: 1 12
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0


lamblin commented Feb 20, 2013

Thanks! So, there really seems to be something wrong with your sdot_ function.

Here is a small C program that you can compile with gcc blas_dot_test.c -o blas_dot_test -lblas. It is completely independent of Python and Theano. When I compile and run it, it prints r: 30.000000. Can you check whether it fails for you? If so, is there any message printed on stdout or stderr?

#include <stdio.h>
float sdot_(int*, float*, int*, float*, int*);

int main(int argc, char** argv)
{
    int Nx = 5;
    int Sx = 1;
    float x[5] = {0, 1, 2, 3, 4};
    float r = sdot_(&Nx, x, &Sx, x, &Sx);
    printf("r: %f\n", r);
    return 0;
}


lamblin commented Feb 20, 2013

It may be related to http://stackoverflow.com/questions/6887229/problem-on-multiplying-a-matrix-and-a-vector-with-veclib-framework-of-mac-os-x-1.
So, maybe we should not use sdot_ on MacOS, or maybe there is another lib than veclib that we should link with first.


jlowin commented Feb 20, 2013

Thank you -- so this fails: it returns 0.000000. It looks like you're definitely right, and there is a problem with sdot.

With these clues, I found some other people experiencing something similar with sdot on recent versions of Mac OS. See for example Homebrew/legacy-homebrew#6649 and http://www.macresearch.org/lapackblas-fortran-106. This is all a little over my head, but here's one response in particular (from the second link) with respect to Apple's implementation of sdot:

The issue with SDOT on 64-bit Intel is a legitimate bug with the BLAS not properly obeying the g77/f2c calling conventions. Specifically, the g77/f2c convention is for single precision results to be returned as double precision, but the OS X BLAS is incorrectly returning the result in single precision. When that single precision result is interpreted as a double precision value by the calling routine, the result is a very tiny (denormal) double-precision value, which rounds to exactly zero when converted to single.

So I don't know if that's right or not, but it seems to fit the symptoms. Maybe some unique circumstances are making this only affect a small number of people. I will continue to see what I can learn.


lamblin commented Feb 20, 2013

Thanks, that would make sense.
The obvious solution is not to use sdot_ anymore.
If we want to continue using it, we'll have to figure out how to make it work with recent MacOS versions.
Does it work if you replace the prototype at the top of the C file with the following?

double sdot_(int*, float*, int*, float*, int*);


jlowin commented Feb 21, 2013

Sorry for the delay -- yes, declaring it as double does work. Maybe that confirms the description of the problem above.


lamblin commented Feb 21, 2013

So, if I try the same thing (declaring sdot_ as returning a double) on Linux, I get a "symbol not defined" error.
If I try it on the Mac I have access to (Mac OS X 10.6.8), I actually get 0.00...
One solution would be to test which prototype should be used, at each run, by compiling and running code similar to the sample above.
Simply reverting the commit in question would not be enough, because sdot_ is also used in other ops, where it does not have an easy alternative.
Another solution (if it works for you) would be to use sdsdot_ instead of sdot_, and remove the prototype for sdot_ from the BLAS headers included in Theano.


lamblin commented Feb 21, 2013

So, when you have time, can you try the following?

#include <stdio.h>
float sdsdot_(int*, float*, float*, int*, float*, int*);

int main(int argc, char** argv)
{
    int Nx = 5;
    int Sx = 1;
    float x[5] = {0, 1, 2, 3, 4};
    float z = 0;
    float r = sdsdot_(&Nx, &z, x, &Sx, x, &Sx);
    printf("z: %f, r: %f\n", z, r);
    return 0;
}

z should still be 0 at the end.
Edit: corrected sdot_ -> sdsdot_.


jlowin commented Feb 21, 2013

Sorry for the delay, I'm travelling today and tomorrow. This did not work (still 0). Also, since you're running 10.6.8, it bears out that this was a change in Lion (10.7).

Based on the suggestions in the stackoverflow post you mentioned, as well as here, I tried calling the Accelerate framework's own sdot, and it did actually work, returning 30.0 (you need to add #include <Accelerate/Accelerate.h> and call cblas_sdot(Nx, x, Sx, x, Sx)). Now, I suppose that might be a Mac-only solution, and perhaps this isn't useful at all, but I'll mention it just in case.


lamblin commented Feb 21, 2013

That seems like a good solution to me. We could change the BLAS headers for Mac to #include <Accelerate/Accelerate.h> and redefine sdot_ to call cblas_sdot, instead of just having the forward declaration.


lamblin commented Feb 21, 2013

The thing is, however, that we only want to do that if the specified BLAS flags are "-lblas", because other versions of BLAS would not have that problem, and we don't want to use Accelerate in that case.


lamblin commented Feb 22, 2013

Also, do you have to specify different linking flags to use the cblas interface, like -lcblas? That would make it more difficult to integrate cleanly.


jlowin commented Feb 22, 2013

No -- I used the '-lblas' flag, actually exactly the same as your original compile instruction.


lamblin commented Feb 22, 2013

@jlowin Cool, then I have a plan to solve the problem in an unobtrusive way.
I'll start a development branch for that on Monday. Thanks for your help!


jlowin commented Feb 22, 2013

Great to hear! Thanks for your help too!

lamblin added a commit to lamblin/Theano that referenced this issue Feb 25, 2013
They should detect the bug described in Theano gh-1240

lamblin commented Feb 25, 2013

@jlowin: When you have time, can you check out the branch at gh-1250, and tell me if it correctly solves your problem? I'm still not able to test the fix here.


jlowin commented Feb 25, 2013

@lamblin -- unfortunately it's not there yet (the assertion fails and kills my Python). I'm tracing it as best I can -- it looks like the first code snippet compiles but gives the wrong answer (as expected), but the second code snippet fails to compile. Do we need to link to the Accelerate library explicitly?


lamblin commented Feb 25, 2013

Oops, sorry, I think I know what's wrong. I'll update the PR shortly.


lamblin commented Feb 25, 2013

@jlowin Updated.


jlowin commented Feb 25, 2013

@lamblin Ok, getting closer! This time it failed because the 'fabs' function was not declared, so I replaced it with the direct comparison from the first snippet. After that, it fails with the following:

/Users/jlowin/.theano/compiledir_Darwin-12.2.1-x86_64-i386-64bit-i386-2.7.3/tmpMc8DVL/mod.cpp: In function ‘float sdot_(int*, float*, int*, float*, int*)’:
/Users/jlowin/.theano/compiledir_Darwin-12.2.1-x86_64-i386-64bit-i386-2.7.3/tmpMc8DVL/mod.cpp:214: error: ‘cblas_sdot’ was not declared in this scope

However, before you added the extern... line, it quit Python; now it just raises this error in Python but doesn't quit.


lamblin commented Feb 25, 2013

Right, I forgot to add the declaration of cblas_sdot in the actual header text, I just corrected that.
So, now the detection itself seems to work correctly.
Regarding quitting Python, I intended that as a last-resort measure if everything else fails, so that at least no incorrect computation is done. I'm not sure what I can do that would not prevent compilation and use of other BLAS functions, but would really prevent sdot_ from being called.


jlowin commented Feb 26, 2013

I'm getting the correct results from my test code -- so I think this is a working solution.

However, I'm a little hesitant to declare "it works" based only on my experience -- hopefully we will find a way to test it on some other machines as well.

Thanks so much for all the time you put into this!


jlowin commented Feb 27, 2013

I was able to test this on a second machine, FWIW, and it works there too.


nouiz commented Feb 27, 2013

gh-1250 fixed this, so I'm closing.

Thanks!

nouiz closed this as completed Feb 27, 2013