This repository has been archived by the owner on Dec 11, 2023. It is now read-only.
bcolz ctable seems to suffer from a numpy bottleneck:
import numpy as np
import bcolz
a = bcolz.open(rootdir='mydata.bcolz')
a
ctable((8769282,), [('date', 'S10'), ('valueSignificand', '<i4'), (
'valueExponent', 'i1'), ('id', 'S12'), ('location', 'S20'), ... some other columns ...)
nbytes: 1.05 GB; cbytes: 106.91 MB; ratio: 10.01
cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
rootdir := 'mydata.bcolz'
... snip ...
%timeit -r10 test = a[['id', 'date', 'location', 'valueSignificand', 'valueExponent']][:]
1 loops, best of 10: 2.59 s per loop
%%timeit -r10
test = np.ndarray(shape=(len(a),), dtype=a[['id', 'date', 'location', 'valueSignificand', 'valueExponent']].dtype)
test['id'] = a['id'][:]
test['date'] = a['date'][:]
test['location'] = a['location'][:]
test['valueSignificand'] = a['valueSignificand'][:]
test['valueExponent'] = a['valueExponent'][:]
1 loops, best of 10: 2.59 s per loop
%%timeit -r10
test1 = a['id'][:]
test2 = a['date'][:]
test3 = a['location'][:]
test4 = a['valueSignificand'][:]
test5 = a['valueExponent'][:]
1 loops, best of 10: 1.16 s per loop
bcolz.print_versions()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
bcolz version: 0.9.0-dev
NumPy version: 1.9.2
Blosc version: 1.5.5.dev ($Date:: 2015-04-14 #$)
Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']
Numexpr version: 2.3.1
Python version: 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014, 17:00:07) [MSC v.1500 32 bit (Intel)]
Byte-ordering: little
Detected cores: 2
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
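It appears that assigning the columns into the numpy structured array makes the read more than twice as slow (2.59 s versus 1.16 s for reading the same columns into separate arrays). In addition, on my machine at least, this prevents ctables from using the multi-core processor efficiently. This is especially a shame since multi-core decompression works fairly well for carrays.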
Is this a known issue with ctable? Can anybody think of a way of fixing this bottleneck? I would be happy to help.
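To separate the numpy part from bcolz itself, here is a minimal sketch that reproduces only the assignment step from the second timing above, using plain in-memory arrays instead of a ctable. The helper name `columns_to_structured` and the synthetic column values are made up for illustration; only the column names and dtypes come from the table above. In a packed structured array the fields of each record are interleaved, so every per-column assignment is a strided write rather than one contiguous copy, which may account for part of the extra time.

```python
import numpy as np


def columns_to_structured(columns):
    """Copy a dict of equally long 1-D arrays into one structured array.

    Hypothetical helper: it mirrors the per-column assignment from the
    timing above, but does not involve bcolz at all.
    """
    dtype = np.dtype([(name, col.dtype) for name, col in columns.items()])
    n = len(next(iter(columns.values())))
    out = np.empty(n, dtype=dtype)
    for name, col in columns.items():
        # Fields are interleaved record by record, so this writes one
        # field every `out.dtype.itemsize` bytes (a strided copy).
        out[name] = col
    return out


# Synthetic stand-ins for the decompressed columns (values are invented).
n = 1000
columns = {
    'id': np.full(n, b'abc', dtype='S12'),
    'date': np.full(n, b'2015-04-14', dtype='S10'),
    'valueSignificand': np.arange(n, dtype='<i4'),
    'valueExponent': np.zeros(n, dtype='i1'),
}

table = columns_to_structured(columns)

# Each field view steps over whole records, not over its own item size.
print(table.dtype.itemsize, table['valueSignificand'].strides)
```

Timing `columns_to_structured(columns)` against simply holding the separate arrays, with `n` raised to the real row count, should show how much of the gap is due to numpy alone rather than to decompression.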