train_mnist.py doesn't work on Mac OS X's vecLib #704

Closed
tkng opened this Issue Nov 30, 2015 · 7 comments

tkng commented Nov 30, 2015

When I run train_mnist.py on Mac OS X, the accuracy is far too low. Here's the result:

load MNIST dataset
epoch 1
graph generated
/Users/tkng/.pyenv/versions/miniconda3-3.9.1/lib/python3.4/site-packages/chainer/functions/loss/softmax_cross_entropy.py:51: RuntimeWarning: divide by zero encountered in log
  y = (numpy.log(p) * (t.flat != self.ignore_label)).sum(keepdims=True) \
train mean loss=inf, accuracy=0.6420666663100322
test  mean loss=inf, accuracy=0.5777999994158745
epoch 2
train mean loss=inf, accuracy=0.6492999997238318
test  mean loss=inf, accuracy=0.7515000009536743
(snip)
epoch 20
train mean loss=inf, accuracy=0.7512333329518636
test  mean loss=inf, accuracy=0.7049000000953675

I suspect that vecLib (the default BLAS library on Mac OS X) is the root cause of this problem, because after I installed OpenBLAS, training accuracy went up to around 99% and training seemed to work correctly.

Maybe this problem is related to #584, but I'm not sure.

If this problem is reproducible, it would be nice to add a note to the installation procedure recommending another BLAS library.

I tested the latest Chainer (1.5.0.2) in CPU mode (I have no GPU). The OS is Mac OS X 10.11 (El Capitan).
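One way to probe the BLAS independently of Chainer is to compare a single-precision matrix product against a double-precision reference. The sketch below is only an assumption about where to look: the thread never identifies which routine misbehaves, and the function name `max_matmul_error`, the matrix size, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np


def max_matmul_error(n=256, seed=0):
    """Return the largest absolute difference between a float32 matrix
    product and the same product computed in float64.

    Hypothetical helper, not part of Chainer or NumPy.
    """
    rng = np.random.RandomState(seed)
    a = rng.standard_normal((n, n)).astype(np.float32)
    b = rng.standard_normal((n, n)).astype(np.float32)

    # float32 product: normally dispatched to the BLAS single-precision GEMM.
    fast = a.dot(b)

    # float64 product of the same inputs, used as the reference.
    # Assumption: the double-precision path is not affected.
    reference = a.astype(np.float64).dot(b.astype(np.float64))

    return float(np.abs(fast - reference).max())


if __name__ == "__main__":
    print("max abs error:", max_matmul_error())
```

On a healthy installation the reported error should be tiny compared with the size of the entries (ordinary float32 rounding). A large or non-finite value would point at the linear algebra backend rather than at the training script.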

unnonouno commented Dec 2, 2015

I made a numerically stable version of softmax cross entropy in #712. Can you try it?
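For readers unfamiliar with the "divide by zero encountered in log" warning in the first log: it appears when a softmax probability underflows to zero before the logarithm is taken. The sketch below illustrates the general log-sum-exp technique for avoiding that. It is not the code from #712, and both function names are hypothetical.

```python
import math


def naive_cross_entropy(logits, label):
    """Softmax followed by log: p[label] can underflow to 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    p = exps[label] / sum(exps)
    # NumPy would warn and produce an infinite loss for log(0);
    # math.log raises instead, so the infinite loss is made explicit here.
    return math.inf if p == 0.0 else -math.log(p)


def stable_cross_entropy(logits, label):
    """Log-sum-exp form: never takes the log of an underflowed probability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    extreme = [0.0, -1000.0]
    print(naive_cross_entropy(extreme, 1))   # infinite loss
    print(stable_cross_entropy(extreme, 1))  # finite loss
```

A stable formulation only keeps the loss finite when the logits are extreme. It cannot repair logits that are already wrong, which is consistent with the follow-up results in this thread: the loss is no longer inf, but it still grows from epoch to epoch.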

tkng commented Dec 2, 2015

Hi, I tried #712. Here's the result.

load MNIST dataset
epoch 1
graph generated
train mean loss=70.7536150876681, accuracy=0.5026999987661839
test  mean loss=178.1525563812256, accuracy=0.4335999980568886
epoch 2
train mean loss=437.69503204981487, accuracy=0.4809999985123674
test  mean loss=71.92813982486724, accuracy=0.7940999990701676
(snip)
epoch 20
train mean loss=24089.5984366862, accuracy=0.6338166661560536
test  mean loss=36033.73759765625, accuracy=0.4923999959230423

Unfortunately, the results are still bad... How does it behave in your environment?

unnonouno commented Dec 6, 2015

In my environment it seems to work well:

epoch 1
graph generated
train mean loss=0.19052520078917345, accuracy=0.9424500016309321
test  mean loss=0.11244978020898998, accuracy=0.9639000070095062
epoch 2
train mean loss=0.07452765207039193, accuracy=0.9771833428740502
test  mean loss=0.07413073327858001, accuracy=0.9763000059127808
epoch 3
train mean loss=0.047960470671532675, accuracy=0.9848000114162763
test  mean loss=0.08244042314967373, accuracy=0.9765000087022782
epoch 4
train mean loss=0.034554468141674684, accuracy=0.9886333421866099
test  mean loss=0.06305672405585938, accuracy=0.9816000062227249

I uninstalled MKL and OpenBLAS.

Here is my numpy.show_config():

% python
Python 3.4.1 (default, Sep 15 2014, 13:51:31)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.show_config()
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_compile_args = ['-msse3']
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
mkl_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE

tkng commented Dec 7, 2015

Here's my result:

python
Python 3.4.3 |Continuum Analytics, Inc.| (default, Oct 20 2015, 14:27:51) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.show_config()
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
atlas_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
atlas_3_10_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE

I noticed one difference between your setup and mine: I'm using Miniconda's Python.
So I uninstalled Miniconda and reinstalled Python 3.4.3 with pyenv (pyenv install 3.4.3), but the situation didn't change.

Here's my new result of numpy.show_config():

python
Python 3.4.3 (default, Dec  7 2015, 19:21:52)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.show_config()
atlas_3_10_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
lapack_opt_info:
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_compile_args = ['-msse3']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE

The result of train_mnist.py is still bad... (accuracy is always under 0.7, even on the training data)

I have no clue for now; I hope someone will resolve the issue in the near future.
Until then, let's be satisfied with the workaround (install OpenBLAS!).

unnonouno commented Dec 8, 2015

My LLVM version (5.1) differs from yours (7.0.0). Could that be related?

tkng commented Dec 11, 2015

Maybe so, maybe not.

Since it's hard to install LLVM 5.1, I installed gcc (5.2.0) from Homebrew instead. Then I removed OpenBLAS, Python, and numpy, and reinstalled Python (via pyenv) and numpy (from source). This means all code other than vecLib was compiled by gcc.

The result of train_mnist.py is still bad (accuracy for both train and test is around 0.7).

Then I installed OpenBLAS and built numpy from source. Surprisingly, the result of train_mnist.py is bad with that setup, too.
The result of train_mnist.py was good with OpenBLAS once, but now I cannot reproduce it.

Maybe it's a good time to reinstall my OS...

unnonouno added this to the 1.7.1 milestone Mar 2, 2016

unnonouno self-assigned this Mar 2, 2016

unnonouno commented Mar 2, 2016

Other people who use Mac OS X have faced the same problem. We recommend using OpenBLAS instead of vecLib.
I'll write a tip about this problem.
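To see at a glance whether a NumPy build links against Accelerate/vecLib, the output of numpy.show_config() can be captured and searched. This is a rough sketch: the helper names are hypothetical, the text printed by show_config() varies between NumPy versions, and a match is only a hint, since one of the configurations shown in this thread links Accelerate and still trains correctly.

```python
import contextlib
import io

import numpy as np


def numpy_build_config_text():
    """Capture what numpy.show_config() prints and return it as a string."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


def mentions_accelerate(config_text):
    """Return True if the build configuration text names Accelerate or vecLib."""
    text = config_text.lower()
    return "accelerate" in text or "veclib" in text


if __name__ == "__main__":
    if mentions_accelerate(numpy_build_config_text()):
        print("This NumPy build appears to link against Accelerate/vecLib.")
    else:
        print("No mention of Accelerate/vecLib in the build configuration.")
```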
