Weird error with a comparison #278

Closed
doris75 opened this issue Aug 21, 2013 · 4 comments

@doris75
doris75 commented Aug 21, 2013

I'm new to PyTables and wanted to run some tests on reading big tables.
So I created a PyTables table with 64000000 entries. Because the table is big, I set expectedrows. If I set this parameter to a value higher than the actual size (say, ten times higher), I get the following error when performing some queries, and I really don't understand why. When I set expectedrows to less than the actual size, all queries are OK.

print "Selecting values for channel :TEST-COMM-FC:VAR1-CH1",
nrows = 0; t1 = time()
for row in table.where('physicsname == "TEST-COMM-FC:VAR1-CH1"'):
    nrows += 1
print nrows
print "Done --- Time:", round((time()-t1), 3), "sec"

print "Variable TEST-COMM-FC:VAR1-CH1 value between start time and start time+ 10sec:",
nrows = 0; t1 = time()
for row in table.where('(physicsname == "TEST-COMM-FC:VAR1-CH1") & (smpl_nb <1000)'):
    nrows += 1
print nrows
print "Done --- Time:", round((time()-t1), 3), "sec"

Selecting values for channel :TEST-COMM-FC:VAR1-CH1 2000000
Done --- Time: 38.338 sec
Variable TEST-COMM-FC:VAR1-CH1 value between start time and start time+ 10sec:
Traceback (most recent call last):
  File "testboardwithcompression.py", line 80, in <module>
    for row in table.where('(physicsname == "TEST-COMM-FC:VAR1-CH1") & (smpl_nb <1000)'):
  File "tableextension.pyx", line 891, in tables.tableextension.Row.next (tables/tableextension.c:9448)
  File "tableextension.pyx", line 951, in tables.tableextension.Row.__next__indexed (tables/tableextension.c:9930)
  File "tableextension.pyx", line 437, in tables.tableextension.Table._convert_types (tables/tableextension.c:5469)
  File "tableextension.pyx", line 461, in tables.tableextension.Table._convert_types (tables/tableextension.c:5355)
  File "tableextension.pyx", line 431, in tables.tableextension.Table.convert_time64 (tables/tableextension.c:4975)
ZeroDivisionError: integer division or modulo by zero
Closing remaining open files: /tmp/measurement-2.h5... done

The table declaration is:

class Measurement(IsDescription):
    #event_id = Int32Col(pos=1, indexed=True) # event id (indexed)
    start_time = Time64Col(pos=1)
    physicsname = StringCol(41, pos=2) # CBS1-CBS2-CBS3:VARNAME
    smpl_nb = Int32Col(pos=3) # event id (not indexed)
    rawval = Float64Col(pos=4) # particle id in the event
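
The report above could be condensed into a scaled-down script along the following lines. This is only a sketch, written in Python 3 syntax: the function name `build_and_query`, the row counts, the file path, and the zlib level 9 compression are assumptions made for illustration, and whether it triggers the error on a given PyTables version is not known. It combines the ingredients mentioned in the report: an `expectedrows` value ten times the actual size, indexed columns (the traceback goes through `__next__indexed`), and the compound query.

```python
import os
import tempfile
import time

import tables as tb


class Measurement(tb.IsDescription):
    start_time = tb.Time64Col(pos=1)
    physicsname = tb.StringCol(41, pos=2)  # channel name
    smpl_nb = tb.Int32Col(pos=3)           # sample number
    rawval = tb.Float64Col(pos=4)          # measured value


def build_and_query(path, nsamples=2000, nchannels=4, oversize=10):
    """Write nsamples * nchannels rows, index two columns, run the compound query."""
    actual_rows = nsamples * nchannels
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table(
            "/", "test", Measurement, "A table",
            filters=tb.Filters(complevel=9),      # assumption: zlib, level 9
            expectedrows=actual_rows * oversize,  # deliberately too large
        )
        row = table.row
        for i in range(nsamples):
            for j in range(nchannels):
                row["start_time"] = time.time()
                row["physicsname"] = ("TEST-COMM-FC:VAR1-CH%d" % j).encode("ascii")
                row["smpl_nb"] = i
                row["rawval"] = float(i)
                row.append()
        table.flush()
        table.cols.physicsname.create_index()
        table.cols.smpl_nb.create_index()

    with tb.open_file(path, mode="r") as h5:
        table = h5.root.test
        condition = "(physicsname == name) & (smpl_nb < limit)"
        condvars = {"name": b"TEST-COMM-FC:VAR1-CH1", "limit": 1000}
        # Count the matching rows instead of printing them.
        return sum(1 for _ in table.where(condition, condvars))
```

Calling `build_and_query(os.path.join(tempfile.mkdtemp(), "measurement-small.h5"))` should count one matching row per sample number below 1000 for the chosen channel. Raising `nsamples` and `nchannels` brings the sketch closer to the 64000000-row table described above.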

@avalentino
Member

Hi @doris75,
sorry, I can't reproduce this issue with PyTables 3.0 on Python 2.
I used the following test program:

from time import time
import tables as tb
from tables import IsDescription, Time64Col, StringCol, Int32Col, Float64Col

FILENAME = 'measurement-2.h5'

class Measurement(IsDescription):
    #event_id = Int32Col(pos=1, indexed=True) # event id (indexed)
    start_time = Time64Col(pos=1)
    physicsname = StringCol(41, pos=2) # CBS1-CBS2-CBS3:VARNAME
    smpl_nb = Int32Col(pos=3) # event id (not indexed)
    rawval = Float64Col(pos=4) # particle id in the event

def test1(table):
    print "Selecting values for channel :TEST-COMM-FC:VAR1-CH1"
    nrows = 0; t1 = time()
    for row in table.where('physicsname == "TEST-COMM-FC:VAR1-CH1"'):
        nrows += 1
        print nrows
    print "Done --- Time:", round((time()-t1), 3), "sec"

def test2(table):
    print "Variable TEST-COMM-FC:VAR1-CH1 value between start time and start time+ 10sec:"
    nrows = 0; t1 = time()
    for row in table.where('(physicsname == "TEST-COMM-FC:VAR1-CH1") & (smpl_nb <1000)'):
        nrows += 1
        print nrows
    print "Done --- Time:", round((time()-t1), 3), "sec"

def write():
    N = 64000000
    #N = 10
    with tb.open_file(FILENAME, 'w') as fd:
        t = fd.create_table('/', 'table', description=Measurement)
        row = t.row
        for i in range(N):
            row['start_time'] = i
            row['physicsname'] = b'TEST-COMM-FC:VAR1-CH1'
            row['smpl_nb'] = i + 995
            row['rawval'] = i / 10.
            row.append()
        t.flush()

def main():
    write()

    with tb.open_file(FILENAME) as fd:
        test1(fd.root.table)

    with tb.open_file(FILENAME) as fd:
        test2(fd.root.table)

if __name__ == '__main__':
    main()

Can you please provide more info about your platform (e.g. using tables.print_versions())?
Also, can you please confirm whether the above test program triggers the issue for you?

@doris75
Author

doris75 commented Aug 22, 2013

Hi,
your program works fine, but my program still fails with the error shown further down.
Machine: virtual machine
[abadiel@sddtest examples]$ uname -a
Linux sddtest 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[abadiel@sddtest examples]$

[abadiel@sddtest examples]$ python
Python 2.6.6 (r266:84292, May 1 2012, 13:52:17)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tables
>>> tables.print_versions()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version: 3.0.0
HDF5 version: 1.8.11
NumPy version: 1.7.0
Numexpr version: 2.1 (not using Intel's VML/MKL)
Zlib version: 1.2.3 (in Python interpreter)
BZIP2 version: 1.0.5 (10-Dec-2007)
Blosc version: 1.2.3 (2013-05-17)
Cython version: 0.19.1
Python version: 2.6.6 (r266:84292, May 1 2012, 13:52:17)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)]
Platform: linux2-x86_64
Byte-ordering: little
Detected cores: 4
Default encoding: ascii
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The error I get:

Variable TEST-COMM-FC:VAR1-CH1 value between start time and start time+ 10sec:
Traceback (most recent call last):
  File "testboardwithcompression.py", line 80, in <module>
    for row in table.where('(physicsname == "TEST-COMM-FC:VAR1-CH1") & (smpl_nb <1000)'):
  File "tableextension.pyx", line 891, in tables.tableextension.Row.next (tables/tableextension.c:9448)
  File "tableextension.pyx", line 951, in tables.tableextension.Row.__next__indexed (tables/tableextension.c:9930)
  File "tableextension.pyx", line 437, in tables.tableextension.Table._convert_types (tables/tableextension.c:5469)
  File "tableextension.pyx", line 461, in tables.tableextension.Table._convert_types (tables/tableextension.c:5355)
  File "tableextension.pyx", line 431, in tables.tableextension.Table.convert_time64 (tables/tableextension.c:4975)
ZeroDivisionError: integer division or modulo by zero
Closing remaining open files: /tmp/measurement-comp.h5... done
[abadiel@sddtest examples]$
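
The traceback ends in convert_time64 with a ZeroDivisionError, which suggests a division whose divisor became zero, for example a record count of zero for one of the buffers handled during the indexed query. That is only a guess from the traceback, not a reading of the PyTables source. The sketch below is plain Python with hypothetical names (`elements_per_record`, `convert_buffer`, `guard`) and only illustrates how an empty buffer can produce this exception and how a guard avoids it.

```python
def elements_per_record(total_elements, nrecords):
    """Hypothetical helper: number of time values held by each record."""
    return total_elements // nrecords  # fails when nrecords == 0


def convert_buffer(buffer, guard=False):
    """Pretend conversion step over one query buffer (a list of records).

    Returns the number of elements per record that would be converted.
    """
    if guard and len(buffer) == 0:
        return 0  # nothing to convert, so skip the division
    total = sum(len(record) for record in buffer)
    return elements_per_record(total, len(buffer))
```

With a non-empty buffer such as `[[1.0], [2.0]]` the division is harmless. With an empty buffer, `convert_buffer([])` raises ZeroDivisionError, while `convert_buffer([], guard=True)` returns 0.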

Here is my program:

from __future__ import division
from time import time # use clock for Win
import numpy
from tables import *

#NEVENTS = 10000
NEVENTS = 2000000
MAX_PARTICLES_PER_EVENT = 32
SHOTNUMBER=14555
SAMPLINGSTEP=0.01
EXPECTEDROWS=NEVENTS*MAX_PARTICLES_PER_EVENT

# Particle description
class Measurement(IsDescription):
    #event_id = Int32Col(pos=1, indexed=True) # event id (indexed)
    start_time = Time64Col(pos=1)
    physicsname = StringCol(41, pos=2) # CBS1-CBS2-CBS3:VARNAME
    smpl_nb = Int32Col(pos=3) # event id (not indexed)
    rawval = Float64Col(pos=4) # particle id in the event

# Create a new table for events
t1 = time()
print "Creating a table with %s entries aprox.. Wait please..." % \
    (int(NEVENTS*(MAX_PARTICLES_PER_EVENT/2.)))
fileh = open_file("/tmp/measurement-comp.h5", mode = "w")
group = fileh.create_group(fileh.root, "measurements")
#table = fileh.create_table(group, 'test', Measurement, "A table", Filters(0))
# Choose this line if you want data compression
table = fileh.create_table(group, 'test', Measurement, "A table", Filters(9),EXPECTEDROWS)

# Fill the table with events
numpy.random.seed(1) # In order to have reproducible results
particle = table.row
for i in xrange(NEVENTS):
    for j in xrange(MAX_PARTICLES_PER_EVENT):
        particle['start_time'] = time()
        seq = ("TEST-COMM-FC:VAR1-CH",str(j))
        particle['physicsname'] = ''.join(seq)
        particle['smpl_nb'] = i
        particle['rawval'] = numpy.random.normal(5.0, 2.0)
        # This injects the row values.
        particle.append()
table.flush()
print "Added %s entries --- Time: %s sec" % (table.nrows, round((time()-t1), 3))

t1 = time()
print "Creating index..."
table.cols.physicsname.create_index()
print "Index created --- Time: %s sec" % (round((time()-t1), 3))
print "Creating index..."
table.cols.smpl_nb.create_index()
print "Index created --- Time: %s sec" % (round((time()-t1), 3))

# Add the number of events as an attribute
table.attrs.stoptime = time()
table.attrs.shotnumber= SHOTNUMBER
table.attrs.samplingstep = SAMPLINGSTEP
fileh.close()

# Open the file in read-only mode and start selections
print "Selecting events..."
fileh = open_file("/tmp/measurement-comp.h5", mode = "r")
table = fileh.root.measurements.test

print "Selecting values for channel :TEST-COMM-FC:VAR1-CH1",
nrows = 0; t1 = time()
for row in table.where('physicsname == "TEST-COMM-FC:VAR1-CH1"'):
    nrows += 1
print nrows
print "Done --- Time:", round((time()-t1), 3), "sec"

print "Variable TEST-COMM-FC:VAR1-CH1 value between start time and start time+ 10sec:",
nrows = 0; t1 = time()
for row in table.where('(physicsname == "TEST-COMM-FC:VAR1-CH1") & (smpl_nb <1000)'):
    nrows += 1
print nrows
print "Done --- Time:", round((time()-t1), 3), "sec"

fileh.close()

Thanks, and let me know if you need more information.
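
As a quick sanity check on the sizes in the posted script, the arithmetic can be worked through in a few lines. The constants come from the script; the variable names `rows_written`, `rows_announced`, `expected_rows`, and `rows_per_channel` are introduced here for illustration only.

```python
NEVENTS = 2000000
MAX_PARTICLES_PER_EVENT = 32

# Both loops run to completion, so every event writes 32 rows.
rows_written = NEVENTS * MAX_PARTICLES_PER_EVENT

# The "Creating a table with ... entries" message halves the inner count.
rows_announced = int(NEVENTS * (MAX_PARTICLES_PER_EVENT / 2.))

# Value passed to create_table as expectedrows in the posted script.
expected_rows = NEVENTS * MAX_PARTICLES_PER_EVENT

# Each of the 32 channel names appears once per event.
rows_per_channel = rows_written // MAX_PARTICLES_PER_EVENT

print(rows_written, rows_announced, expected_rows, rows_per_channel)
```

The script writes 64000000 rows, matching the size given in the original report, while the progress message announces 32000000. The per-channel count of 2000000 agrees with the output of the first query. In the script as posted, EXPECTEDROWS equals the number of rows written, so reproducing the case described in the report would presumably need a larger value, such as ten times that number.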

@doris75
Author

doris75 commented Aug 22, 2013

It seems the program didn't display properly.

(Sent by email in reply to Antonio Valentino's comment of 2013/8/21.)


@FrancescAlted
Member

This is a bit old, so I will close it. Feel free to reopen it if needed.
