This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

Minimally upgrade blosc 1.7.0 #285

Merged
merged 14 commits into from Dec 16, 2015


alimanfoo
Contributor

This PR contains the bare minimum required to upgrade c-blosc to 1.7.0.

The test suite passes (5 tests skipped) on my Linux laptop, so I don't believe any code changes are required within bcolz itself to accommodate this upgrade.

@alimanfoo
Contributor Author

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
bcolz version:     0.12.2.dev11
bcolz git info:    b'0.12.1-11-g2ec4081'
NumPy version:     1.10.1
Blosc version:     1.7.0 ($Date:: 2015-07-05 #$)
Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']
Numexpr version:   2.4.6
Python version:    3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010]
Platform:          linux-x86_64
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing the complete test suite!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
........................................................................................................................................................................................................................................................................s...................................................................................................................................................................................................................................................................................................................ssss......................................................................................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 966 tests in 19.881s

OK (skipped=5)

This was referenced Dec 14, 2015
@alimanfoo
Contributor Author

Er, help! How do I get "/arch:sse2" added to the list of compiler options when compiling under Windows? Not via CFLAGS, I guess; this is from the latest appveyor build, where the flag is missing:

C:\Users\appveyor\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -DHAVE_LZ4=1 -DHAVE_SNAPPY=1 -DHAVE_ZLIB=1 -Ibcolz -Ic-blosc\blosc -Ic-blosc/internal-complibs\lz4-1.7.0 -Ic-blosc/internal-complibs\snappy-1.1.1 -Ic-blosc/internal-complibs\zlib-1.2.8 -IC:\Python26_32\lib\site-packages\numpy\core\include -IC:\Python26_32\include -IC:\Python26_32\PC /Tcc-blosc/blosc\bitshuffle-sse2.c /Fobuild\temp.win32-2.7\Release\c-blosc/blosc\bitshuffle-sse2.obj

@FrancescAlted
Member

Hmm, perhaps the problem is that the output of platform.machine() does not match 'i.86' on win32?

@alimanfoo
Contributor Author

One way to find out...

@alimanfoo
Contributor Author

...so it looks like platform.machine() is returning "AMD64" on all appveyor builds. I guess this is the underlying machine, whereas we want to know if we're building against 32-bit or 64-bit Python?
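The mismatch can be shown directly: a pattern like `i.86` (assumed here to be applied with `re.match` to `platform.machine()` in the build script) matches `i686` but not the `AMD64` that the appveyor builds report, while the interpreter's pointer width answers the question we actually care about. A minimal sketch; `is_x86_32_machine` and `interpreter_bits` are hypothetical helpers, not bcolz's setup.py code.

```python
import platform
import re
import struct


def is_x86_32_machine(machine):
    """Hypothetical check mirroring an 'i.86' regex on platform.machine()."""
    return re.match(r"i.86", machine) is not None


def interpreter_bits():
    """Pointer size of the *running interpreter*, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    machine = platform.machine()
    # Describes the host, e.g. 'AMD64' on the appveyor builds, even for 32-bit Python.
    print("platform.machine():", machine)
    print("matches i.86:", is_x86_32_machine(machine))
    print("interpreter bits:", interpreter_bits())
```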

@FrancescAlted
Member

Hmm, tricky. Perhaps this may help: https://docs.python.org/2/library/platform.html

@FrancescAlted
Member

Here is a reliable way to detect whether the Python interpreter is 32- or 64-bit:

In []: import ctypes
In []: 32 if ctypes.sizeof(ctypes.c_voidp) == 4 else 64, 'bit CPU'
Out[]: (64, 'bit CPU')
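Once the interpreter's bitness is known, the SSE2 flag choice can key off it instead of `platform.machine()`. A minimal sketch, assuming a hypothetical `sse2_compile_args` helper, the MSVC spelling `/arch:SSE2` (only relevant to 32-bit x86 builds, since x64 already implies SSE2) and `-msse2` for GCC-style compilers on x86; this is not bcolz's actual setup.py logic.

```python
import ctypes


def interpreter_bits():
    # Same test as above: 4-byte pointers mean a 32-bit interpreter.
    return 32 if ctypes.sizeof(ctypes.c_void_p) == 4 else 64


def sse2_compile_args(compiler_type, bits):
    """Hypothetical helper: extra compile args to enable SSE2 code paths.

    compiler_type uses distutils naming ('msvc' or 'unix').
    """
    if compiler_type == "msvc":
        # x64 targets always include SSE2; only 32-bit builds need the switch.
        return ["/arch:SSE2"] if bits == 32 else []
    # GCC/clang on x86 (assumed).
    return ["-msse2"]


print(sse2_compile_args("msvc", interpreter_bits()))
```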

@alimanfoo
Contributor Author

The appveyor builds now detect 32-bit correctly, but the compiler option '/arch:sse2' is being ignored, producing the "SSE2 is not supported by the target architecture/platform and/or this compiler" error. From previous experiments, the machine the build runs on is AMD64; I don't know whether this makes a difference. Any suggestions?

@alimanfoo
Contributor Author

This project seems to be doing a 32-bit build with /arch:sse2; I'm not sure what's different.

@FrancescAlted
Member

Yes, that's really bizarre (especially because the project you referred to has almost the same setup!). Well, I suppose this is a quirk of the appveyor environment, so do not spend much time on this. Just tell me when you are done with the tests and I will merge.

@alimanfoo
Contributor Author

The test suite passes reliably; however, when running the scripts in the bench/ directory I get intermittent segfaults. For example, running bench/iter.py repeatedly, I get ~10 runs that are fine, then a segfault; it is not reliably reproducible. When recompiling with -O0 or -O1, the problem goes away. I'm sorry, I don't have the skills to debug a segfault, so I'll leave this here for now.

Just in case we lose the thread, the current status of this PR is two unresolved issues:

  • Failing build under Windows with 32-bit Python because the /arch:sse2 option is not recognised by the compiler
  • Intermittent segfaults when compiled with -O2 (default)

@alimanfoo alimanfoo mentioned this pull request Dec 15, 2015
@FrancescAlted
Member

I have spent quite a long time exercising tests and benchmarks on two Linux boxes (Ubuntu 15.04 + gcc 4.9.2 and Gentoo 2.2 + gcc 4.9.3) and I have not been able to see a single glitch. Which platform and compiler are you using?

All in all, this looks pretty good to me, so I want to merge soon (unless you think it is still a bit green). We can catch up with the Windows issues later on.

@alimanfoo
Contributor Author

I'm on Ubuntu 15.10, gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010.

FWIW I'm in favour of merging, there's nothing else I can add. Would be useful to have in master so others can also run tests and benchmarks.

FrancescAlted added a commit that referenced this pull request Dec 16, 2015
@FrancescAlted FrancescAlted merged commit 562fd30 into Blosc:master Dec 16, 2015
@FrancescAlted
Member

This has been merged, but we should dig a bit into what's going on with gcc 5.2.1. @alimanfoo, could you please try with a gcc version earlier than 5.0? Trying a clang compiler could also be useful.

@alimanfoo
Contributor Author

Btw, I found that numpy's https://github.com/numpy/numpy/blob/master/numpy/distutils/cpuinfo.py has methods for SSE2 detection (although not AVX2, sadly).

@FrancescAlted
Member

Yes. I wonder how difficult it would be to add AVX2 detection to this.
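On Linux, the simplest CPU-side check is the `flags` line of `/proc/cpuinfo`, which lists `sse2` and, on CPUs that have it, `avx2`. A minimal, Linux-only sketch; `cpu_flags` and `read_cpu_flags` are hypothetical helpers, and other platforms would need CPUID or a library like the ones discussed in this thread.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels use 'flags'; ARM kernels use 'Features'.
        if sep and key.strip() in ("flags", "Features"):
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    try:
        with open(path) as f:
            return cpu_flags(f.read())
    except OSError:  # not Linux, or /proc unavailable
        return set()


if __name__ == "__main__":
    flags = read_cpu_flags()
    print("sse2:", "sse2" in flags, "avx2:", "avx2" in flags)
```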

Francesc Alted

@alimanfoo
Contributor Author

I also found https://github.com/workhorsy/py-cpuinfo, which may enable AVX2 detection; will try it out.

Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Email: alimanfoo@googlemail.com alimanfoo@gmail.com
Tel: +44 (0)1865 287721

@FrancescAlted
Member

Looks pretty good. A shame that AVX2 and NEON (for ARM) are not there yet, but that should be easy to add (especially the former).

@alimanfoo
Contributor Author

For the record, another possible approach is to try to compile a short spike program, as used e.g. here to detect OpenMP support in the compiler. I don't think this is the right way to go for SSE2 or AVX2, however, as I think you want to know whether the CPU on the target system supports the required feature, not just the compiler. Out of my depth here though; comments welcome.
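For comparison, the spike-program approach looks roughly like this: write a tiny C file that uses an SSE2 intrinsic and check whether the compiler builds it with a given flag. As noted, success only shows that the compiler can target SSE2, not that the CPU the extension will later run on supports it. A minimal sketch, assuming a GCC-compatible compiler reachable as `cc`; `compiler_accepts` is a hypothetical helper.

```python
import os
import subprocess
import tempfile

# Tiny C program that only compiles if SSE2 intrinsics are available.
SPIKE_SRC = """
#include <emmintrin.h>
int main(void) {
    __m128i x = _mm_setzero_si128();
    (void)x;
    return 0;
}
"""


def compiler_accepts(flag, cc="cc"):
    """Return True if `cc flag spike.c -o spike` succeeds."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "spike.c")
        with open(src, "w") as f:
            f.write(SPIKE_SRC)
        try:
            result = subprocess.run(
                [cc, flag, src, "-o", os.path.join(tmp, "spike")],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:  # compiler not installed or not on PATH
            return False
        return result.returncode == 0


if __name__ == "__main__":
    print("cc accepts -msse2:", compiler_accepts("-msse2"))
```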

@FrancescAlted
Member

After actually trying it out (and not just reading the code, which is beautiful), it turns out that https://github.com/workhorsy/py-cpuinfo works like a charm and detects sse2 and avx2 (neon would probably be detected on ARM processors too). Since py-cpuinfo comes with an MIT license, I think the best solution would be to include the cpuinfo/cpuinfo.py sources in bcolz.
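Whichever detector is used (py-cpuinfo's `get_cpu_info()` returns a dict that includes a `'flags'` entry), the build only has to turn the detected flags into compile-time switches. A minimal sketch producing entries in the `define_macros` format of a setuptools `Extension`; the macro names `SHUFFLE_SSE2_ENABLED` and `SHUFFLE_AVX2_ENABLED` are assumed from the c-blosc sources and should be checked, and `simd_macros` is a hypothetical helper.

```python
def simd_macros(flags):
    """Map detected CPU feature flags to define_macros entries.

    Macro names are an assumption based on c-blosc's shuffle sources.
    """
    flags = set(flags)
    macros = []
    if "sse2" in flags:
        macros.append(("SHUFFLE_SSE2_ENABLED", None))
    if "avx2" in flags:
        macros.append(("SHUFFLE_AVX2_ENABLED", None))
    return macros


# Abridged flags of the i7-3667U laptop in this thread: AVX but no AVX2.
laptop = "fpu sse sse2 ssse3 sse4_1 sse4_2 avx f16c".split()
print(simd_macros(laptop))  # [('SHUFFLE_SSE2_ENABLED', None)]
```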

@alimanfoo
Contributor Author

Awesome! I did ask @workhorsy about AVX2 detection (workhorsy/py-cpuinfo#29): apparently AVX2 detection via CPUID is not implemented, but CPUID is only one of several detection methods available, so if it works via the other methods then we're good.

@alimanfoo
Contributor Author

Sorry for the slow response regarding segfaults and gcc. Building current master with gcc 4.9, I see no segfaults. However, with gcc 5.2 I still see random intermittent segfaults on my laptop, about 6 in 100 runs of bench/iter.py:

$ gcc-4.9 --version
gcc-4.9 (Ubuntu 4.9.3-5ubuntu1) 4.9.3
$ x86_64-linux-gnu-gcc --version
x86_64-linux-gnu-gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010

@FrancescAlted
Member

Do you have a minimal example that segfaults so that I can have a look?

@alimanfoo
Contributor Author

Running any of these bench scripts triggers an intermittent segfault: arange.py, column-iter.py, concat.py, ctable-query.py, eval-profile.py, expression.py, iter.py, query.py.

I haven't seen a segfault running the other bench scripts, but the segfaults are fairly rare (~6 in 100 runs), so I may not have run them enough times.
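To put "may not have run enough times" in numbers: if each run crashes independently with probability p (about 0.06 here, taking the ~6 in 100 estimate at face value, which is an assumption), the chance of zero crashes in n runs is (1 - p)^n, so about ln(1 - c) / ln(1 - p) runs are needed to see at least one crash with confidence c. A small sketch:

```python
import math


def p_no_crash(p, runs):
    """Probability of zero crashes in `runs` independent runs."""
    return (1 - p) ** runs


def runs_needed(p, confidence):
    """Smallest n with P(at least one crash in n runs) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(runs_needed(0.06, 0.95))          # 49
print(round(p_no_crash(0.06, 100), 4))  # 0.0021
```

So a script would need roughly 50 clean runs before a ~6% crash rate becomes unlikely (under 5%) to have gone unnoticed.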

For something minimal, this will intermittently trigger a segfault:

import numpy as np
import bcolz

N = 1e6
a = np.arange(N)
b = bcolz.carray(a)

# iterate from index 2 to the end in steps of 3, summing values below 10
s = sum((v for v in b.iter(2, None, 3) if v < 10))
print(s)

...and so will this:

import numpy as np
import bcolz

N = 1e8
dtype = 'i4'
start, stop, step = 5, N, 4
ac = bcolz.arange(start, stop, step, dtype=dtype)

@alimanfoo
Contributor Author

In case it helps:

$ cat /proc/cpuinfo 
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 58
model name  : Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
stepping    : 9
microcode   : 0x12
cpu MHz     : 968.261
cache size  : 4096 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 2
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt
bugs        :
bogomips    : 4988.63
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
$ python -c "import bcolz; bcolz.test(heavy=True)"
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
bcolz version:     0.12.2.dev24
bcolz git info:    b'0.12.1-24-g9fb6e1c'
NumPy version:     1.10.1
Blosc version:     1.7.0 ($Date:: 2015-07-05 #$)
Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']
Numexpr version:   2.4.6
Python version:    3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010]
Platform:          linux-x86_64
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing the complete test suite!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
........................................................................................................................................................................................................................................................................s.......................................................................................................................................................................................................................................................................................................................ssss......................................................................................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 970 tests in 20.049s

OK (skipped=5)

@FrancescAlted
Member

Hmm, can you try with c-blosc in master? I have made some changes since 1.7.0 that could help here.

@FrancescAlted
Member

Just fixed this in c-blosc master (Blosc/c-blosc@c7b6720). Also, I did some cleanup in the chunk extension for bcolz. I cannot reproduce any segfault, but I am only using gcc 4.9.2 and clang 3.6 here.

@alimanfoo
Contributor Author

OK, thanks, will try on my machine.

@alimanfoo
Contributor Author

Still getting segfaults with latest bcolz and c-blosc (38a3e2a and Blosc/c-blosc@c7b6720).

@FrancescAlted
Member

Uh. OK, I will try to set up an Ubuntu box with gcc 5.x and have a try myself.

@FrancescAlted
Member

Finally, I have set up a Debian (unstable) box and compiled the latest c-blosc and bcolz from master using gcc-5 (Debian 5.3.1-7) 5.3.1 20160121, and I cannot get a single segfault, either by re-running the benchmarks many times or by running the scripts that you attached. For example, for the second example that you pasted:

$ for n in {1..100}; do python p2.py; done
$ echo $?
0

I am using a vagrant box (via VirtualBox), but SSE2 is enabled there:

$ cat /proc/cpuinfo
[clip]
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl pni ssse3 lahf_lm
[clip]

So I would say that your segfaults are likely related to your compiler version, or something else.
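One caveat on the shell loop above: `echo $?` after a `for` loop reports only the exit status of the last iteration, so a crash in an earlier run would show up only in the terminal output. A minimal sketch that instead runs a script repeatedly and counts runs killed by SIGSEGV (on POSIX, `subprocess` reports this as return code `-signal.SIGSEGV`); `count_crashes` is a hypothetical helper and `p2.py` is just the script name used above.

```python
import signal
import subprocess
import sys


def count_crashes(cmd, runs=100):
    """Run `cmd` `runs` times; return (segfaults, other_failures)."""
    segfaults = other = 0
    for _ in range(runs):
        rc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL).returncode
        if rc == -signal.SIGSEGV:  # child was killed by SIGSEGV
            segfaults += 1
        elif rc != 0:
            other += 1
    return segfaults, other


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_crashes.py p2.py
    print(count_crashes([sys.executable, sys.argv[1]]))
```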

@alimanfoo
Contributor Author

FWIW, I've tried this on two different machines with slightly different hardware and get segfaults on both; both run the same OS (Ubuntu 15.10), gcc version, etc. I guess it must be the compiler version; doesn't the fact that the segfaults go away when compiled with -O0 or -O1 suggest so?

@FrancescAlted
Member

Yes, the fact that -O0 or -O1 makes the segfaults go away in gcc 5.2 is a good indication that something might be broken with the compiler optimizer. I have even tried with clang 3.7 and 3.8 on my Debian box and I was unable to reproduce the segfault.
