
Blaze support #127

Open
scls19fr opened this issue Jul 31, 2015 · 3 comments

Comments

@scls19fr
Contributor

Hello,

@wavexx did excellent work providing Blaze support to gtabview;
see TabViewer/gtabview#10

It's now possible to connect to any database supported by SQLAlchemy and display a table (even a very long one) using a table URI: http://blaze.pydata.org/en/latest/uri.html
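To illustrate the idea of resolving a database table into a displayable DataFrame (this sketch uses the stdlib sqlite3 module and pandas directly rather than Blaze, and the table name "measurements" is made up):

```python
import sqlite3
import pandas as pd

# Build a throwaway in-memory SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.executemany("INSERT INTO measurements VALUES (?, ?)",
                 [(i, i * 0.5) for i in range(5)])

# Pull the table into a DataFrame, much as a viewer would after
# resolving a table URI
df = pd.read_sql_query("SELECT * FROM measurements", conn)
print(df.shape)  # (5, 2)
```

With Blaze the same table would instead be addressed by a URI string, so the viewer never needs to know which backend is behind it.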

Some other improvements have been exposed in #116
(especially about Pandas DataFrames with a MultiIndex)

It would be nice to have such Blaze support on the tabview side as well, because it would make it possible to display the content of very long tables over an SSH connection (for example).

Kind regards

@firecat53
Collaborator

I'll see what I can do. It should be combined with working on splitting out the common code from tabview and gtabview.

I was a little disappointed in the file:// behavior for large files. One very large file (~800M with a couple million rows, I believe) never opened at all in gtabview after working on it for about 5 minutes. A smaller file (380M) opened, but there was an almost 10-second lag each time it loaded a new section of the file. It also didn't work at all for a Latin-1 encoded file. I tried it with mysql tables, and of course most of my tables have a DECIMAL(10,2) data type... which isn't yet supported by odo (blaze/odo#206). Just a little frustrating that it wasn't handling the data I was throwing at it very well!

Scott

@scls19fr
Contributor Author

scls19fr commented Aug 6, 2015

About the DECIMAL issue: it's more a datashape issue than an odo issue, see blaze/datashape#118

I'll just add here some code to create a big random CSV file:

import pandas as pd
import numpy as np

# 4 million rows x 10 columns of random floats
(rows, cols) = (4000000, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)

filename = "big_random.csv"
df.to_csv(filename, index=False)

I tried both

  • gtabview file://big_random.csv
  • gtabview big_random.csv

and you are right, that's not usable with big files!

@wavexx
Member

wavexx commented Aug 6, 2015

AFAIK blaze is just reading the file in chunks somehow. It initially opens quicker, but then it's just as slow each time it requires a new chunk.
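That chunked-reading pattern can be sketched with pandas itself (read_csv's chunksize parameter; the in-memory CSV here stands in for a large file):

```python
import io
import pandas as pd

# A small stand-in for a big CSV file
csv_data = io.StringIO("a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10)))

# Iterate in fixed-size chunks instead of loading everything at once;
# each chunk is a DataFrame of at most 4 rows
total_rows = 0
for chunk in pd.read_csv(csv_data, chunksize=4):
    total_rows += len(chunk)
print(total_rows)  # 10
```

This keeps peak memory bounded to one chunk, but as noted above, each new chunk still costs a fresh parse, which is where the per-section lag comes from.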

And if I'm not mistaken, pandas' read_csv is just the csv module in disguise, without the little tweaks we added in tabview.

We could do much better than that by assuming files and sequential reads: i.e. read only n lines (exactly one chunk) when the file size is beyond a certain threshold. For files (not streams) we could do that both forward and in reverse to avoid allocating memory, at the expense of extra I/O.
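A minimal sketch of that threshold idea (the function name, window interface, and the 1 MiB threshold are all invented for illustration):

```python
import os
from itertools import islice

def read_window(path, start, n_lines, size_threshold=1 << 20):
    """Return up to n_lines lines starting at line `start`.

    Small files are read whole; beyond the size threshold we stream
    and slice instead, so memory stays bounded to one window of the
    file at the expense of re-reading on each forward seek.
    """
    with open(path, newline="") as f:
        if os.path.getsize(path) <= size_threshold:
            return f.readlines()[start:start + n_lines]
        return list(islice(f, start, start + n_lines))
```

Reverse windows would need an extra backward scan from a known offset, which is the extra I/O mentioned above.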
