random error with postgresql data source #1191

wavexx · 2015-07-31T14:08:05Z

I'm new to blaze, so pardon my ignorance here. I have no idea if I have to report this to odo/datashape or something else.

I'm using blaze.Data on a postgresql table ("postgresql://"). When I try to fetch some rows from the table with list(head(10)), I get this error about 50% of the time (with no changes to the db):

  File "/usr/local/lib/python2.7/dist-packages/odo/into.py", line 122, in curried_into
    return into(o, other, **merge(kwargs2, kwargs1))
  File "/usr/local/lib/python2.7/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/blaze/interactive.py", line 309, in into
    return into(a, result, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/odo/into.py", line 25, in into_type
    return convert(a, b, dshape=dshape, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/odo/core.py", line 30, in __call__
    return _transform(self.graph, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/odo/core.py", line 46, in _transform
    x = f(x, excluded_edges=excluded_edges, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/odo/convert.py", line 21, in dataframe_to_numpy
    dtype = dshape_to_numpy(dshape or discover(df))
  File "/usr/local/lib/python2.7/dist-packages/odo/numpy_dtype.py", line 55, in dshape_to_numpy
    for name, typ in zip(ds.names, ds.types)])
  File "/usr/local/lib/python2.7/dist-packages/odo/numpy_dtype.py", line 26, in unit_to_dtype
    return unit_to_dtype(str(ds).replace('int', 'float').replace('?', ''))
  File "/usr/local/lib/python2.7/dist-packages/odo/numpy_dtype.py", line 22, in unit_to_dtype
    ds = dshape(ds)
  File "/usr/local/lib/python2.7/dist-packages/datashape/util.py", line 49, in dshape
    ds = parser.parse(o, type_symbol_table.sym)
  File "/usr/local/lib/python2.7/dist-packages/datashape/parser.py", line 575, in parse
    dsp.raise_error('Invalid datashape')
  File "/usr/local/lib/python2.7/dist-packages/datashape/parser.py", line 57, in raise_error
    self.ds_str, errmsg)
datashape.error.DataShapeSyntaxError: 

  File <nofile>, line 1
    float16
    ^

DataShapeSyntaxError: Invalid datashape

I actually wonder why this error is not reproducible. Looks like odo is randomly choosing a different conversion/coercion route? In fact, it's so random I cannot even determine whether there's a specific column type that could cause the issue.
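
As a toy illustration of that hypothesis (not odo's actual code, and not verified as the cause here): in a graph of conversions where two routes have the same total cost, a shortest-path search can return either route depending on the order in which the edges are visited. The node names, the costs, and the cheapest_path helper below are all hypothetical.

```python
import heapq
from itertools import count


def cheapest_path(edges, source, target):
    """Dijkstra over (src, dst, cost) edges.

    Ties between equal-cost routes go to whichever edge was listed first,
    which is the point of this sketch.
    """
    graph = {}
    for src, dst, cost in edges:
        graph.setdefault(src, []).append((dst, cost))

    tie = count()  # unique tie-breaker so the heap never compares node names
    heap = [(0, next(tie), source, [source])]
    seen = set()
    while heap:
        total, _, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (total + cost, next(tie), nxt, path + [nxt]))
    return None


# Hypothetical conversion graph: two routes of equal cost to "Iterator".
edges_a = [
    ("Selectable", "ndarray", 1),
    ("Selectable", "list", 1),
    ("ndarray", "Iterator", 1),
    ("list", "Iterator", 1),
]
# Same edges, with the first two listed in the opposite order.
edges_b = [edges_a[1], edges_a[0]] + edges_a[2:]

path_a = cheapest_path(edges_a, "Selectable", "Iterator")
path_b = cheapest_path(edges_b, "Selectable", "Iterator")
print(path_a)
print(path_b)
```

If the edge order came from an unordered container, the chosen route could differ from one process to the next even though the data and the expression are identical.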

cpcloud · 2015-07-31T14:19:48Z

Looks like it hasn't been added to the parser.

In [5]: import datashape as ds

In [6]: ds.__version__
Out[6]: '0.4.6+67.ge3431f9'

In [7]: ds.float16
Out[7]: ctype("float16")

In [8]: ds.dshape('float16')
  File "<nofile>", line 1
    float16
DataShapeSyntaxError: Invalid datashape

I'm on it!

wavexx · 2015-07-31T14:21:26Z

On 31/07/15 16:19, Phillip Cloud wrote:

Looks like it hasn't been added to the parser.

In [5]: import datashape as ds
In [6]: ds.__version__
Out[6]: '0.4.6+67.ge3431f9'
In [7]: ds.float16
Out[7]: ctype("float16")
In [8]: ds.dshape('float16')
  File "<nofile>", line 1
    float16
DataShapeSyntaxError: Invalid datashape

I'm on it!

Why does it happen randomly though?

cpcloud · 2015-07-31T14:22:45Z

@wavexx Can you show

Data('postgresql://your uri here').dshape

wavexx · 2015-07-31T14:29:53Z

On 31/07/15 16:22, Phillip Cloud wrote:

@wavexx Can you show

Data('postgresql://your uri here').dshape

One example:

dshape("""var * {
  aid: ?string[13],
  sid: ?int32,
  date: ?date,
  bdate: ?date,
  gender: ?int16,
  place: ?string[100],
  bplace: ?string[50],
  appointment_type_id: ?int32
  }""")
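
Looking at this dshape next to the traceback: the string rewrite visible at odo/numpy_dtype.py line 26 (`str(ds).replace('int', 'float').replace('?', '')`) would turn the optional `?int16` column into `float16`, which is exactly the datashape the parser rejects. A minimal sketch, assuming that rewrite is applied to each optional integer column; the promote_optional name is hypothetical and only the string operation is taken from the traceback.

```python
def promote_optional(ds_str):
    # Same string rewrite as the line shown in the traceback:
    # optional ints become floats (so a missing value can be NaN),
    # and the '?' option marker is dropped.
    return ds_str.replace('int', 'float').replace('?', '')


# The optional integer columns from the dshape above.
int_columns = {
    'sid': '?int32',
    'gender': '?int16',
    'appointment_type_id': '?int32',
}

promoted = {name: promote_optional(typ) for name, typ in int_columns.items()}
for name in sorted(promoted):
    print(name, promoted[name])
```

Under that assumption only `gender` ends up as `float16`; the `?int32` columns become `float32`.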

cpcloud · 2015-07-31T14:33:12Z

What is the exact expression you are trying to run? I don't need to see the database connection string, just the actual line of code that is randomly giving the error.

cpcloud · 2015-07-31T14:33:47Z

@wavexx when it doesn't fail, does it show you the correct data?

wavexx · 2015-07-31T14:34:18Z

On 31/07/15 16:33, Phillip Cloud wrote:

@wavexx when it doesn't fail, does it show
you the correct data?

It seems to, yes.

wavexx · 2015-07-31T14:35:36Z

On 31/07/15 16:33, Phillip Cloud wrote:

What is the exact expression you are trying to run? I don't need to see
the database connection string, just the actual line of code that
is randomly giving the error.

It's a slice on both columns and rows:

list(data[data.fields[a:b]][c:d])

cpcloud · 2015-07-31T14:39:38Z

are a, b, c and d always the same (not equal to each other, but across runs of whatever function these are in)? i.e., do you get random errors when repeatedly running the exact same expression?

wavexx · 2015-07-31T14:46:24Z

On 31/07/15 16:39, Phillip Cloud wrote:

are a, b, c and d always the same (not equal to each other, but
across runs of whatever function these are in)? i.e., do you get random
errors when repeatedly running the exact same expression?

You can try the full code yourself if you want:

https://github.com/wavexx/gtabview

PYTHONPATH=$PWD ./bin/gtabview postgresql://something/db::table

The slicing occurs in gtabview/models.py:157

For the first query, though, the slice is almost always:

list(data[data.fields[0:len(data.fields)]][0:min(16384, int(data.nrows))])

So I assume it's constant if int(data.nrows) and len(data.fields) return
the same value. [never checked, but the db is read-only]

Despite being constant, it still fails randomly.

wavexx · 2015-07-31T14:58:54Z

On 31/07/15 16:46, Yuri D'Elia wrote:

On 31/07/15 16:39, Phillip Cloud wrote:

are a, b, c and d always the same (not equal to each other, but
across runs of whatever function these are in)? i.e., do you get random
errors when repeatedly running the exact same expression?

However, do I infer that an empty slice wouldn't be valid?

I do expect data[...][0:0] to return an empty list.

cpcloud · 2015-07-31T15:01:54Z

However, do I infer that an empty slice wouldn't be valid?

Yes this is valid.

@wavexx Can you show:

import blaze, odo, datashape

blaze.__version__
odo.__version__
datashape.__version__

cpcloud · 2015-07-31T15:05:52Z

FWIW, the only "random" element here is that the order of your result set is undefined without an ORDER BY clause. So, an operation like slicing e.g., table[5:15], is implemented with select * from table limit 10 offset 5 and could potentially have a different set of rows. This doesn't strike me as the issue, since the types of the columns will be the same no matter what result set you get back.
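
To make the slicing point concrete, here is a small sketch of `table[5:15]` expressed as LIMIT/OFFSET, with an ORDER BY added so the rows come back in a defined order. It uses an in-memory SQLite database as a stand-in for PostgreSQL; the table `t`, its columns, and the slice_rows helper are made up for the example and are not what blaze generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO t (id, val) VALUES (?, ?)",
    [(i, "row%d" % i) for i in range(20)],
)


def slice_rows(conn, start, stop):
    """Emulate table[start:stop] with LIMIT/OFFSET.

    ORDER BY makes the result set well defined; without it the database
    is free to return any `stop - start` rows.
    """
    cur = conn.execute(
        "SELECT id, val FROM t ORDER BY id LIMIT ? OFFSET ?",
        (stop - start, start),
    )
    return cur.fetchall()


rows = slice_rows(conn, 5, 15)
empty = slice_rows(conn, 0, 0)  # an empty slice is just LIMIT 0
print(len(rows), rows[0], rows[-1])
print(empty)
```

Either way the column types are the same for every row, so the ordering alone would not explain a float16 appearing only some of the time.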

cpcloud · 2015-07-31T15:16:54Z

@wavexx What do you get when you run the following code? I'm using IPython, but you can use vanilla Python as well

In [30]: from blaze import Data

In [31]: d = Data('postgresql://localhost::table')

In [32]: d.head()

wavexx · 2015-07-31T15:24:46Z

On 31/07/15 17:17, Phillip Cloud wrote:

@wavexx What do you get when you run the
following code? I'm using IPython, but you can use vanilla Python as well

In [30]: from blaze import Data, odo, compute
In [31]: d = Data('postgresql://localhost::table')
In [32]: d.head()

If I run it in IPython, right now it works every time.

If I stick it in a file:

from blaze import Data, odo, compute
d = Data('postgresql://host/db::table')
print(list(d.head()))

and run it with:

python test.py

it fails with float16 not being recognized (same error as reported before).

Now here's the interesting part: if I import os and sys at the beginning, it works:

import os, sys
from blaze import Data, odo, compute
d = Data('postgresql://host/db::table')
print(list(d.head()))

I did that only because that's what I do in the ipython startup.

cpcloud · 2015-07-31T15:26:36Z

in both cases can you show the versions of blaze, odo and datashape?

cpcloud · 2015-07-31T15:47:18Z

@wavexx also, d.head() and list(d.head()) are doing very different things. It's unrelated to whether it's in a file or not.

In the first case you're seeing something like this

print(repr(odo(compute(d.head()).execute().fetchall(), pd.DataFrame)))

Which works, because sqlalchemy works.

In the second case something like [x for x in odo(d.head(), Iterator)] is called, and the conversion path from sqlalchemy.Selectables to Iterators involves a conversion to a numpy array. This should go directly to an iterator after calling selectable.execute(), so that looks like a bug.

I have no idea why importing os and sys would make this work.

cpcloud · 2015-07-31T15:50:21Z

I have no explanation for the randomness. I'd need you to break into a debugger right before the expression is converted to see what the issue is.

cpcloud · 2015-07-31T15:51:00Z

I'm pretty sure that blaze/datashape#163 will fix all of these errors.

wavexx · 2015-07-31T16:13:45Z

On 31/07/15 17:47, Phillip Cloud wrote:

@wavexx also, d.head() and
list(d.head()) are doing very different things. It's unrelated to
whether it's in a file or not.

Sure, I just noticed now because of the ipython startup I had.

In the first case you're seeing something like this

print(repr(odo(compute(d.head()).execute().fetchall(), pd.DataFrame)))

Which works, because sqlalchemy works.

Which is why I explicitly do list().

In the second case something like [x for x in odo(d.head(), Iterator)]
is called, and the conversion path from sqlalchemy.Selectables to
Iterators involves a conversion to a numpy array. This should go
directly to an iterator after calling selectable.execute(), so that
looks like a bug.

I have no idea why importing os and sys would make this work.

The source I'm using is:

import blaze, odo, datashape
from blaze import Data
print(blaze.__version__)
print(odo.__version__)
print(datashape.__version__)
d = Data('postgresql://host/db::table')
print(list(d.head()))

When run with python:

$ python test.py
0.8.2
0.3.3
0.4.6

  File <nofile>, line 1
    float16
    ^
[and the rest of the traceback]

Your mention of iterators made me think. It seems that adding

from __future__ import generators

is sufficient:

$ python test.py
0.8.2
0.3.3
0.4.6
[(......

Could it be that you expect some builtins to return generators somewhere?

wavexx · 2015-07-31T16:30:41Z

On 31/07/15 17:50, Phillip Cloud wrote:

I have no explanation for the randomness. I'd need you to break
into a debugger right before the expression is converted to see what the
issue is.

I need some guidance.
Where exactly would you inspect for differences?

I need at least some function names and/or things to look for.

cpcloud · 2015-07-31T17:08:42Z

what are o and other in this line?

  File "/usr/local/lib/python2.7/dist-packages/odo/into.py", line 122, in curried_into
    return into(o, other, **merge(kwargs2, kwargs1))

cpcloud · 2015-12-04T20:00:23Z

@wavexx closing. pls reopen if this is still an issue

cpcloud self-assigned this Jul 31, 2015

cpcloud added this to the 0.8.3 milestone Jul 31, 2015

cpcloud added the bug label Jul 31, 2015

cpcloud mentioned this issue Jul 31, 2015

Allow float16 in the parser blaze/datashape#163

Merged

wavexx mentioned this issue Aug 6, 2015

UnboundLocalError: local variable 'intermediate' referenced before assignment TabViewer/gtabview#18

Closed

cpcloud modified the milestones: 0.8.3, 0.9.0 Sep 15, 2015

cpcloud removed this from the 0.8.3 milestone Sep 15, 2015

cpcloud closed this as completed Dec 4, 2015
