
Newly introduced vector-vector product bug #1240

Closed
jlowin opened this issue Feb 20, 2013 · 57 comments

jlowin commented Feb 20, 2013

I'm experiencing what appears to be a serious bug. According to git bisect, the recent commit 14af439 is the culprit, merged in gh-1202.

The following code now gives the wrong result on my machine. It should be 30.0, but after this commit results in 0.0.

import numpy as np

import theano
import theano.tensor as tt
v = tt.vector()
f = theano.function([v], tt.dot(v, v))
testval = np.arange(5).astype(theano.config.floatX)
print f(testval)

jlowin commented Feb 20, 2013

More details:

  • The example code fails on the CPU with theano.config.floatX = 'float32'
  • The code works on the CPU with theano.config.floatX = 'float64'
  • The code works on the GPU


jlowin commented Feb 20, 2013

This only appears to impact symbolic vectors. The nose tests don't use symbolic vectors; they convert numpy arrays to tensor variables, and I guess that's why the tests pass.

For example, this code works when the symbolic version above fails:

import numpy as np
import theano
import theano.tensor as tt

testval = np.arange(5).astype(theano.config.floatX)
testvar = tt.as_tensor_variable(testval)
f_val = theano.function([], tt.dot(testvar, testvar))
print f_val()


nouiz commented Feb 20, 2013

For me, both versions work.

Can you tell me the BLAS version used? The Theano flag blas.ldflags? Your OS version? Your python/numpy/scipy versions?

Can you run your tests with this line added, and give me its output:

theano.printing.debugprint(f, print_type=True)

I have this output:

>>> theano.printing.debugprint(f, print_type=True)
InplaceDimShuffle{} [@A] <TensorType(float32, scalar)> ''   2
 |CGemv{no_inplace} [@B] <TensorType(float32, (True,))> ''   1
   |TensorConstant{(1,) of 0.0} [@C] <TensorType(float32, (True,))>
   |TensorConstant{1.0} [@D] <TensorType(float32, scalar)>
   |InplaceDimShuffle{x,0} [@E] <TensorType(float32, row)> ''   0
   | |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |TensorConstant{0.0} [@G] <TensorType(float32, scalar)>


jlowin commented Feb 20, 2013

Sure --

debug output:

InplaceDimShuffle{} [@A] <TensorType(float32, scalar)> ''   2
 |CGemv{no_inplace} [@B] <TensorType(float32, (True,))> ''   1
   |TensorConstant{(1,) of 0.0} [@C] <TensorType(float32, (True,))>
   |TensorConstant{1.0} [@D] <TensorType(float32, scalar)>
   |InplaceDimShuffle{x,0} [@E] <TensorType(float32, row)> ''   0
   | |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |<TensorType(float32, vector)> [@F] <TensorType(float32, vector)>
   |TensorConstant{0.0} [@G] <TensorType(float32, scalar)>

OS: Mac OS X 10.8

Theano flags: floatX = float32

blas.ldflags: -lblas

numpy version : 1.8.0.dev-c932553
scipy version : 0.11.0
The BLAS is the built-in Apple Accelerate framework. Using otool to look at the linked file:

$ otool -L /usr/local/lib/python2.7/site-packages/numpy/core/_dotblas.so
/usr/local/lib/python2.7/site-packages/numpy/core/_dotblas.so:
    /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate (compatibility version 1.0.0, current version 4.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

I'm going to see if it's possibly a numpy incompatibility since I'm using a dev version by uninstalling/reinstalling numpy/scipy/theano. I'll update once that's done.


lamblin commented Feb 20, 2013

What is the error message in DebugMode? It might be informative.


jlowin commented Feb 20, 2013

Good point -- this is the DebugMode error:

>>> v = tt.vector()
>>> f = theano.function([v], tt.dot(v, v), mode='DebugMode')
>>> testval = np.arange(5).astype(theano.config.floatX)
>>> print f(testval)
WARNING: ('Stride mismatch', ((1, 5), (1, 5), (20, 4), (4, 4), 'DimShuffle{1,0}'))
---------------------------------------------------------------------------
BadThunkOutput                            Traceback (most recent call last)
<ipython-input-1-e15c4b233b89> in <module>()
      2 f = theano.function([v], tt.dot(v, v), mode='DebugMode')
      3 testval = np.arange(5).astype(theano.config.floatX)
----> 4 print f(testval)

/Users/jlowin/git/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    578         t0_fn = time.time()
    579         try:
--> 580             outputs = self.fn()
    581         except Exception:
    582             if hasattr(self.fn, 'position_of_error'):

/Users/jlowin/git/Theano/theano/compile/debugmode.pyc in deco()
   2065                 TensorType.filter_checks_isfinite = self.maker.mode.check_isfinite
   2066                 try:
-> 2067                     return f()
   2068                 finally:
   2069                     # put back the filter_checks_isfinite

/Users/jlowin/git/Theano/theano/compile/debugmode.pyc in f()
   1944                                             thunk1='perform', val1=r_vals[r],
   1945                                             thunk2='c_code', val2=storage_map[r][0],
-> 1946                                             inputs_val=inputs_val)
   1947                             else:
   1948                                 #print >> sys.stderr, i, "DEBUGMODE storing reference output %x" % id(storage_map[r][0])

BadThunkOutput: BadThunkOutput
  variable    : CGemv{no_inplace}.0
  Outputs Type: TensorType(float32, (True,))
  Outputs Shape: (1,)
  Outputs Strides: (4,)
  Inputs Type : [TensorType(float32, (True,)), TensorType(float32, scalar), TensorType(float32, row), TensorType(float32, vector), TensorType(float32, scalar)]
  Inputs Shape: [(1,), (), (1, 5), (5,), ()]
  Inputs Strides: [(4,), (), (20, 4), (4,), ()]
  Apply   : CGemv{no_inplace}(TensorConstant{(1,) of 0.0}, TensorConstant{1.0}, InplaceDimShuffle{x,0}.0, <TensorType(float32, vector)>, TensorConstant{0.0})
  thunk1  : perform
  thunk2  : c_code
  val1    : [ 30.]
  val2    : [ 0.]
  op      : <class 'theano.tensor.blas_c.CGemv'>
  Value 1 : shape, dtype, strides, min, max, n_inf, n_nan: (1,) float32 (4,) 30.0 30.0 0 0
  Value 2 : shape, dtype, strides, min, max, n_inf, n_nan: (1,) float32 (4,) 0.0 0.0 0 0
  Max Abs Diff:  30.0
  Mean Abs Diff:  30.0
  Median Abs Diff:  30.0
  Std Abs Diff:  0.0
  Max Rel Diff:  1.0
  Mean Rel Diff:  1.0
  Median Rel Diff:  1.0
  Std Rel Diff:  0.0


jlowin commented Feb 20, 2013

I returned to stable NumPy 1.7 but am still having this issue.


nouiz commented Feb 20, 2013

So, from my understanding, the problem is related to your BLAS library. To confirm this, can you test with this Theano flag?

blas.ldflags=

This will force Theano to use the NumPy calls instead, so it should work. Do you have another BLAS? I see two possible causes: either the library doesn't like how we call it and raises an error that we don't catch, or your BLAS library is buggy.

In any case, we will need to add your code as a test, to make sure this is detected when people run Theano's tests.
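For reference, the flag can be set per-run via the THEANO_FLAGS environment variable (THEANO_FLAGS=blas.ldflags=), or persistently in .theanorc (a sketch of the relevant section):

```ini
# .theanorc -- an empty ldflags disables the compiled BLAS bindings,
# so Theano falls back to NumPy for dot products
[blas]
ldflags =
```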


jlowin commented Feb 20, 2013

So the good news is that with that flag, it does work.

The bad news is that my blas is the standard one that Apple ships with every Mac in the Accelerate framework, so I've never touched it.


lamblin commented Feb 20, 2013

It may be because we are using BLAS calls with undefined behaviour that happen to work correctly where we tested them, but that are incorrect in general.
For instance, I'm not sure the values we pass as Sx1 and Sx0 are correct for a call to dot_, because they were mucked with to be acceptable for gemv_ in the case where x was a row or column vector.
I'll try to have access to a machine that can reproduce the bug. Otherwise, could you help me test instrumented versions of the code, so we understand exactly what's happening?


jlowin commented Feb 20, 2013

Of course, let me know anything you want to run. I will try to find some other Macs as well.

I should note that on the two machines I am testing on right now, check_blas.py runs about 20% slower with blas.ldflags=, so if it's possible to find a better solution, that's definitely preferable.


nouiz commented Feb 20, 2013

It is normal that this flag slows things down. It means we don't call BLAS directly; we always go through the NumPy Python interface.

If you can tell us the exact values of the parameters passed to sdot_(), it would help us, as @lamblin thinks this could be the issue.


jlowin commented Feb 20, 2013

Sure -- I'm not too familiar with C, though. Maybe you could write a line here for me to paste into blas_c at the appropriate spot, and I'll tell you the result? Sorry for the inconvenience.


nouiz commented Feb 20, 2013

What about this line?

You need to modify the c_code_cache_version method of the same op to return an empty tuple. That means it will always be recompiled, so if needed you can change more than that line without bumping the version each time.

printf("?dot_ params: %%d %%p %%d %%p %%d\\n", &Nx1, (float*)(PyArray_DATA(%(xx)s)), &Sx1, (float*)yy_data, &Sy);

The doubled %% and \\ are needed because Python will interpret the line before it gets passed to g++.


lamblin commented Feb 20, 2013

It would also be nice to have the actual PyArray_STRIDES(%(xx)s)[0] and [1], in addition to Sx0 and Sx1.


jlowin commented Feb 20, 2013

This is the output, if it helps:

?dot_ params: 1430203228 0x7f8c5ad0e9f0 1430203220 0x7f8c5ad0e9f0 1430203212
?dot_ strides: 20 4


nouiz commented Feb 20, 2013

I made an error in my line; can you use this one instead:

 printf("?dot_ params: %%d %%p %%d %%p %%d\\n", Nx1, (float*)(PyArray_DATA(%(xx)s)), Sx1, (float*)yy_data, Sy);

I got:

?dot_ params: 5 0x4025560 1 0x4025560 1
?dot_ strides: 20 4


jlowin commented Feb 20, 2013

I have:

?dot_ params: 5 0x7fb38a4475a0 1 0x7fb38a4475a0 1
?dot_ strides: 20 4


nouiz commented Feb 20, 2013

@lamblin do you see a problem with the function inputs? I don't. The pointers are aligned and the other parameters seem good.

Does one of you know how to check for BLAS errors?


lamblin commented Feb 20, 2013

Regarding BLAS errors, I remember that you can declare a function with an appropriate name, and then that function will be used instead of BLAS's own error-checking one (which does not do much). I remember having tried that in the past, but it's quite blurry in my mind.
Regarding the values, let me think about it for a while. The problem might be that the memory is shared between both arguments of dot_.


nouiz commented Feb 20, 2013

@jlowin can you test with two different vectors? Does that fix this case?


jlowin commented Feb 20, 2013

Unfortunately I still get 0.0 with this code:

import numpy as np

import theano
import theano.tensor as tt
v = tt.vector()
u = tt.vector()
f = theano.function([v, u], tt.dot(v, u))
testval = np.arange(5).astype(theano.config.floatX)
print f(testval, testval)


lamblin commented Feb 20, 2013

The error handling function seems to be xerbla. You may have to define it as xerbla_.
@jlowin Can you also try with something other than range(5)? Maybe range(3), range(7), range(12).
Also, I think that it would be worth testing it with two different ndarrays, for instance f(testval, testval+1).


jlowin commented Feb 20, 2013

@lamblin here are more test cases. I get 0.0 for all of them with floatX=='float32'.

import numpy as np

import theano
import theano.tensor as tt

v = tt.vector()
u = tt.vector()
f = theano.function([v, u], tt.dot(v, u))

for n in [3, 5, 7, 12]:
    testval1 = np.arange(n).astype(theano.config.floatX)
    testval2 = 10 + np.arange(n).astype(theano.config.floatX)
    print f(testval1, testval2)


lamblin commented Feb 20, 2013

OK, thanks.
I tried to reproduce your problem on a Mac, but I have not succeeded yet.
What is in theano.config.blas.ldflags exactly? Did you change anything else in theano.config (besides floatX, I guess)?


jlowin commented Feb 20, 2013

Hm, so strange. One of the machines I'm testing on is almost brand new; I can't think of anything that would have affected the BLAS, but I will keep trying to find something off. FWIW, I'm using Homebrew to set up my environment.

theano.config.blas.ldflags is simply '-lblas'

At the moment the only active flag in my .theanorc is floatX:

[global]
floatX = float32

(and there are a couple other flags, like device, that I've commented out while working on this problem)


nouiz commented Feb 20, 2013

@lamblin how did you install python/numpy/blas? Can you make sure it uses Apple's BLAS?

I think that if you used EPD, it would work, as it uses another BLAS.


lamblin commented Feb 20, 2013

@nouiz: It's not EPD, but I'm not sure how it was installed. theano.config.blas.ldflags is -lblas too, and I don't know how to check how it's resolved.
@jlowin: can you add the following printf statements, to help us understand where things go wrong? The first ones should be the same as before.

printf("?dot_ params: %%d %%p %%d %%p %%d\\n", Nx1, (float*)(PyArray_DATA(%(xx)s)), Sx1, (float*)yy_data, Sy);
printf("?dot_ strides: %%d %%d\\n", PyArray_STRIDES(%(xx)s)[0], PyArray_STRIDES(%(xx)s)[1]);
printf("?dot_ dims: %%d %%d\\n", PyArray_DIMS(%(xx)s)[0], PyArray_DIMS(%(xx)s)[1]);
printf("?dot_ alpha: %%f fbeta %%f dbeta\\n", alpha, fbeta, dbeta);
float dot_result = sdot_(&Nx1, (float*)(PyArray_DATA(%(xx)s)), &Sx1, (float*)yy_data, &Sy);
printf("?dot_ result: %%f\\n", dot_result);


lamblin commented Feb 20, 2013

@nouiz: so, "otool -L" (the equivalent of ldd) gives, in particular:

/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 219.0.0)

which looks like Apple's BLAS, but it's probably an older one.


jlowin commented Feb 20, 2013

@lamblin here's the output for the four test cases

?dot_ params: 3 0x7fac3e1c5800 1 0x7fac3e1d13c0 1
?dot_ strides: 12 4
?dot_ dims: 1 3
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0

?dot_ params: 5 0x7fac3e1b66a0 1 0x7fac3e1b4400 1
?dot_ strides: 20 4
?dot_ dims: 1 5
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0

?dot_ params: 7 0x7fac3e1b4fd0 1 0x7fac3e1b6c10 1
?dot_ strides: 28 4
?dot_ dims: 1 7
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0

?dot_ params: 12 0x7fac3e09abf0 1 0x7fac3e1e7cf0 1
?dot_ strides: 48 4
?dot_ dims: 1 12
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
0.0


lamblin commented Feb 20, 2013

OK, so the call to sdot_ itself returns 0.0, which is puzzling. I'll try to come up with a pure C file that you can compile and run, to see if it reproduces the problem.

In the meantime, can you add the following print statement? I just want to check that the actual data being used is not 0.

printf("?dot_ data[:3]: %%f %%f %%f\\n", ((float*)(PyArray_DATA(%(xx)s)))[0], ((float*)(PyArray_DATA(%(xx)s)))[1], ((float*)(PyArray_DATA(%(xx)s)))[2]);


jlowin commented Feb 20, 2013

Sure:

?dot_ params: 3 0x7fc644500570 1 0x7fc644500c80 1
?dot_ strides: 12 4
?dot_ dims: 1 3
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0

?dot_ params: 5 0x7fc644500c60 1 0x7fc644503430 1
?dot_ strides: 20 4
?dot_ dims: 1 5
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0

?dot_ params: 7 0x7fc6445014f0 1 0x7fc644501460 1
?dot_ strides: 28 4
?dot_ dims: 1 7
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0

?dot_ params: 12 0x7fc644500550 1 0x7fc644500000 1
?dot_ strides: 48 4
?dot_ dims: 1 12
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ data[:3]: 10.000000 11.000000 12.000000
0.0


lamblin commented Feb 20, 2013

Sorry about the iterative debugging process; I did not realize that you still have x != y.
Could you add this last print statement?

printf("?dot_ y data[:3]: %%f %%f %%f\\n", ((float*)yy_data)[0], ((float*)yy_data)[1], ((float*)yy_data)[2]);


jlowin commented Feb 20, 2013

That's ok, I appreciate the help. Here's the output. It appears that the data is there.

?dot_ params: 3 0x7fc6ee01d2e0 1 0x7fc6ee0c4380 1
?dot_ strides: 12 4
?dot_ dims: 1 3
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0

?dot_ params: 5 0x7fc6ee442270 1 0x7fc6ee44b220 1
?dot_ strides: 20 4
?dot_ dims: 1 5
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0

?dot_ params: 7 0x7fc6ee32fc70 1 0x7fc6ee3d61d0 1
?dot_ strides: 28 4
?dot_ dims: 1 7
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0

?dot_ params: 12 0x7fc6ee443200 1 0x7fc6ee44b220 1
?dot_ strides: 48 4
?dot_ dims: 1 12
?dot_ alpha: 1.000000 fbeta 0.000000 dbeta
?dot_ result: 0.000000
?dot_ x data[:3]: 10.000000 11.000000 12.000000
?dot_ y data[:3]: 0.000000 1.000000 2.000000
0.0


lamblin commented Feb 20, 2013

Thanks! So, there really seems to be something wrong with your sdot_ function.

Here is a small C program that you can compile with gcc blas_dot_test.c -o blas_dot_test -lblas. It is completely independent of Python and Theano. When I compile and run it, it prints r: 30.000000. Can you check whether it fails for you? If so, is there any message printed on stdout or stderr?

#include <stdio.h>
float sdot_(int*, float*, int*, float*, int*);

int main(int argc, char** argv)
{
    int Nx = 5;
    int Sx = 1;
    float x[5] = {0, 1, 2, 3, 4};
    float r = sdot_(&Nx, x, &Sx, x, &Sx);
    printf("r: %f\n", r);
    return 0;
}


lamblin commented Feb 20, 2013

It may be related to http://stackoverflow.com/questions/6887229/problem-on-multiplying-a-matrix-and-a-vector-with-veclib-framework-of-mac-os-x-1.
So, maybe we should not use sdot_ on MacOS, or maybe there is another lib than veclib that we should link with first.


jlowin commented Feb 20, 2013

Thank you -- so this fails: it returns 0.000000. It looks like you're definitely right, and there is a problem with sdot.

With these clues, I found some other people experiencing something similar with sdot on recent versions of Mac OS. See for example Homebrew/legacy-homebrew#6649 and http://www.macresearch.org/lapackblas-fortran-106. This is all a little over my head, but here's one response in particular (from the second link) with respect to Apple's implementation of sdot:

The issue with SDOT on 64-bit Intel is a legitimate bug with the BLAS not properly obeying the g77/f2c calling conventions. Specifically, the g77/f2c convention is for single precision results to be returned as double precision, but the OS X BLAS is incorrectly returning the result in single precision. When that single precision result is interpreted as a double precision value by the calling routine, the result is a very tiny (denormal) double-precision value, which rounds to exactly zero when converted to single.

So I don't know if that's right or not, but it seems to fit the symptoms. Maybe some unique circumstances are making this only affect a small number of people. I will continue to see what I can learn.


lamblin commented Feb 20, 2013

Thanks, that would make sense.
The obvious solution is not to use sdot_ anymore.
If we want to continue using it, we'll have to figure out how to make it work with recent MacOS versions.
Does it work if you replace the prototype at the top of the C file with the following?

double sdot_(int*, float*, int*, float*, int*);


jlowin commented Feb 21, 2013

Sorry for the delay -- yes, declaring it as double does work. Maybe that confirms the description of the problem above.


lamblin commented Feb 21, 2013

So, if I try the same thing (declaring sdot_ as returning a double) on Linux, I get a "symbol not defined" error.
If I try it on the Mac I have access to (Mac OS X 10.6.8), I actually get 0.00...
One solution would be to test which prototype should be used, at each run, by compiling and running code similar to the sample above.
Simply reverting the commit in question would not be enough, because sdot_ is also used in other ops, where it does not have an easy alternative.
Another solution (if it works for you) would be to use sdsdot_ instead of sdot_, and remove the prototype for sdot_ from the BLAS headers included in Theano.


lamblin commented Feb 21, 2013

So, when you have time, can you try the following?

#include <stdio.h>
float sdsdot_(int*, float*, float*, int*, float*, int*);

int main(int argc, char** argv)
{
    int Nx = 5;
    int Sx = 1;
    float x[5] = {0, 1, 2, 3, 4};
    float z = 0;
    float r = sdsdot_(&Nx, &z, x, &Sx, x, &Sx);
    printf("z: %f, r: %f\n", z, r);
    return 0;
}

z should still be 0 at the end.
Edit: corrected sdot_ -> sdsdot_.


jlowin commented Feb 21, 2013

Sorry for the delay, I'm travelling today and tomorrow. This did not work (still 0). Also, since you're running 10.6.8, it bears out that this was a change in Lion (10.7).

Based on the suggestions in the stackoverflow post you mentioned, as well as here, I tried calling the Accelerate framework's own sdot, and it did actually work, returning 30.0 (you need to add #include <Accelerate/Accelerate.h> and call cblas_sdot(Nx, x, Sx, x, Sx)). Now, I suppose that might be a Mac-only solution, and perhaps this isn't useful at all, but I'll mention it just in case.


lamblin commented Feb 21, 2013

That seems like a good solution to me. We could change the BLAS headers for Mac to #include <Accelerate/Accelerate.h> and redefine sdot_ to call cblas_sdot, instead of just having the forward declaration.


lamblin commented Feb 21, 2013

The thing is, however, that we only want to do that if the specified BLAS flags are "-lblas", because other versions of BLAS would not have that problem, and we don't want to use Accelerate in that case.


lamblin commented Feb 22, 2013

Also, do you have to specify different linking flags to use the cblas interface, like -lcblas? That would make it more difficult to integrate cleanly.


jlowin commented Feb 22, 2013

No -- I used the '-lblas' flag, actually exactly the same as your original compile instruction.


lamblin commented Feb 22, 2013

@jlowin Cool, then I have a plan to solve the problem in an unobtrusive way.
I'll start a development branch for that on Monday. Thanks for your help!


jlowin commented Feb 22, 2013

Great to hear! Thanks for your help too!

lamblin added a commit to lamblin/Theano that referenced this issue Feb 25, 2013
They should detect the bug described in Theano gh-1240

lamblin commented Feb 25, 2013

@jlowin: When you have time, can you check out the branch at gh-1250, and tell me if it correctly solves your problem? I'm still not able to test the fix here.


jlowin commented Feb 25, 2013

@lamblin -- unfortunately it's not there yet (the assertion fails and kills my Python). I'm tracing it as best I can -- it looks like the first code snippet compiles but gives the wrong answer (as expected), but the second code snippet fails to compile. Do we need to link to the Accelerate library explicitly?


lamblin commented Feb 25, 2013

Oops, sorry, I think I know what's wrong. I'll update the PR shortly.


lamblin commented Feb 25, 2013

@jlowin Updated.


jlowin commented Feb 25, 2013

@lamblin Ok, getting closer! This time it failed because the 'fabs' function was not declared, so I replaced it with the direct comparison from the first snippet. After that, it fails with the following:

/Users/jlowin/.theano/compiledir_Darwin-12.2.1-x86_64-i386-64bit-i386-2.7.3/tmpMc8DVL/mod.cpp: In function ‘float sdot_(int*, float*, int*, float*, int*)’:
/Users/jlowin/.theano/compiledir_Darwin-12.2.1-x86_64-i386-64bit-i386-2.7.3/tmpMc8DVL/mod.cpp:214: error: ‘cblas_sdot’ was not declared in this scope

However, before you added the extern... line, it quit Python; now it just raises this error in Python but doesn't quit.


lamblin commented Feb 25, 2013

Right, I forgot to add the declaration of cblas_sdot in the actual header text, I just corrected that.
So, now the detection itself seems to work correctly.
Regarding quitting Python, I intended that as a last-resort measure if everything else fails, so that at least no incorrect computation is done. I'm not sure what I can do that would not prevent compilation and use of other BLAS functions, but would really prevent sdot_ from being called.


jlowin commented Feb 26, 2013

I'm getting the correct results from my test code -- so I think this is a working solution.

However, I'm a little hesitant to declare "it works" based only on my experience -- hopefully we will find a way to test it on some other machines as well.

Thanks so much for all the time you put into this!


jlowin commented Feb 27, 2013

I was able to test this on a second machine, FWIW, and it works there too.


nouiz commented Feb 27, 2013

gh-1250 fixed this, so I'm closing.

Thanks!

nouiz closed this as completed Feb 27, 2013