Selecting rows using 'log10' in the query returns wrong results #534

Closed
153957 opened this issue Mar 29, 2016 · 3 comments

Comments

@153957
Contributor

153957 commented Mar 29, 2016

I have PyTables (v3.2.2) running on multiple systems (OS X, Scientific Linux CERN, and CentOS).
On the Mac, numpy 1.9.4 and numexpr 2.4.4 are installed.
On the SLC machine, numpy 1.10.4 and numexpr 2.4.6 are installed.

I have a table (sims) with a column (energy) containing values ranging from 1e12 to 1e18. Most values are whole powers of ten (e.g. 1e13 and 1e14) or halfway in between on a log scale (e.g. 10**13.5 and 10**15.5). To query these values easily, I use the log10 function in PyTables queries. The following query works on the Mac, but returns the wrong rows on the SLC machine:

import numpy
numpy.log10(sims.read_where('log10(energy) == 12')['energy'][:9])
# On Mac: array([ 12.,  12.,  12.,  12.,  12.,  12.,  12.,  12.,  12.], dtype=float32)
# On SLC: array([ 15.,  12.,  16.,  15.,  17.,  13.,  15.,  15.,  15.], dtype=float32)

So this may not be a PyTables bug per se; perhaps the fault lies with Numexpr. However, I encountered it while using PyTables. It would be good for PyTables to require a properly working version of Numexpr. Also, what is the source of this bug: Numexpr, Numpy, or something else?

We encountered a similar bug/inconsistency recently concerning the accuracy of the query when using log10: HiSPARC/sapphire#114

(Our issue on this problem: HiSPARC/sapphire#116)
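
On the separate accuracy point: an exact `==` comparison on the log10 of a float32 value is fragile even when the query engine works correctly. Below is a minimal NumPy sketch of a tolerance-based selection. The sample energies, the helper name `select_by_log10`, and the tolerance of 1e-4 are all made up for illustration, and this does not work around the wrong-rows problem reported here.

```python
import numpy

# Made-up sample energies, stored as float32 like the column in the report.
energy = numpy.array([1e12, 10**13.5, 1e15, 1e12, 10**15.5],
                     dtype=numpy.float32)


def select_by_log10(values, exponent, tol=1e-4):
    """Return the values whose log10 lies within `tol` of `exponent`.

    `tol` is a hypothetical tolerance, chosen to be much larger than
    float32 rounding error but much smaller than the 0.5 spacing
    between the exponents used in the table.
    """
    mask = numpy.abs(numpy.log10(values) - exponent) < tol
    return values[mask]


selected = select_by_log10(energy, 13.5)
print(selected)
```

In a PyTables condition string the same idea could presumably be written as a range, for example `(log10(energy) > 13.4999) & (log10(energy) < 13.5001)`; that form is an assumption and was not run against the table from this report.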

@153957
Contributor Author

153957 commented Mar 29, 2016

It appears that Numexpr 2.4.6 is the culprit. We can downgrade to 2.4.4. Unfortunately, upgrading to 2.5.0 is not an option because that also breaks some parts of the code: HiSPARC/publicdb#131

@FrancescAlted
Member

The problem seems to be in the conda package. Closing this.

@tomkooij
Contributor

This needs reopening.

When using Intel MKL with numexpr (the default in conda as of numexpr=2.4.6), all table.read_where() queries that use (VML) functions are broken.
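
To check whether a given installation is in the affected configuration, one can ask numexpr whether it was built with VML. This is a small diagnostic sketch: the helper name `vml_status` is made up, and it returns `None` when numexpr is not installed.

```python
def vml_status():
    """Report whether the installed numexpr was built with Intel VML.

    Returns None if numexpr cannot be imported. `vml_status` is a
    hypothetical helper name, not part of numexpr or PyTables.
    """
    try:
        import numexpr
    except ImportError:
        return None
    return {
        "numexpr_version": numexpr.__version__,
        # True when numexpr was compiled against MKL/VML.
        "use_vml": numexpr.use_vml,
    }


print(vml_status())
```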

Numexpr uses VML functions, which need contiguous memory, but table rows are read as numpy.rec.array()s, whose individual columns are non-contiguous (strided) views.

This can be fixed by getting and passing the right arguments to NumExpr.run(). I have a fix and will open a PR.
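
The non-contiguity mentioned above can be seen with plain NumPy, without PyTables or numexpr. In this sketch the two-column dtype is a made-up stand-in for a table row. Copying the column with `numpy.ascontiguousarray` is only one way to obtain contiguous memory and is not necessarily what the proposed fix does.

```python
import numpy

# Hypothetical two-column row layout standing in for a table.
rows = numpy.zeros(5, dtype=[('energy', 'f4'), ('id', 'i4')])
rows['energy'] = [1e12, 1e13, 1e14, 1e15, 1e16]

# Selecting one field gives a view into the record buffer: consecutive
# elements are one full row (8 bytes) apart, not one float32 (4 bytes).
column = rows['energy']
print(column.strides, column.itemsize)
print(column.flags['C_CONTIGUOUS'])

# A packed copy of the same values occupies contiguous memory.
packed = numpy.ascontiguousarray(column)
print(packed.strides, packed.flags['C_CONTIGUOUS'])
```

Code that assumes contiguous input and walks the `column` buffer with a stride of `itemsize` would read bytes belonging to the neighbouring `id` field, which is consistent with the wrong values seen in the original report.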
