UnboundLocalError: local variable 'intermediate' referenced before assignment #18

Closed
scls19fr opened this issue Aug 6, 2015 · 8 comments


scls19fr commented Aug 6, 2015

Hi Yuri,

I created a big CSV file to reproduce what @firecat53 noticed: opening the file and scrolling through the data take a very long time (see TabViewer/tabview#127).

import pandas as pd
import numpy as np
(rows, cols) = (4000000, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)
filename = "big_random.csv"
df.to_csv(filename, index=False)

It should create a file of about 770 MB!

Then I tried to open this file using gtabview:
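
As a rough sanity check on that figure, the size can be extrapolated from a small sample instead of writing all 4,000,000 rows. This is only a sketch: `estimate_csv_bytes` is a hypothetical helper, and it assumes every value is written the way Python 3's `repr` prints a float, which may not match what pandas does in every version.

```python
import random


def estimate_csv_bytes(rows, cols, sample_rows=2000, seed=0):
    """Extrapolate the size of a CSV of random floats from a small sample.

    Assumption: each value is written as repr(float), one row per line.
    """
    rng = random.Random(seed)
    # Header line as in the thread: 0,1,2,...,cols-1
    header = ",".join(str(c) for c in range(cols)) + "\n"
    sample_bytes = 0
    for _ in range(sample_rows):
        line = ",".join(repr(rng.random()) for _ in range(cols)) + "\n"
        sample_bytes += len(line.encode("ascii"))
    # Scale the sampled bytes up to the full row count.
    return len(header) + sample_bytes * rows // sample_rows


estimate = estimate_csv_bytes(4_000_000, 10)
# Print both decimal megabytes and binary mebibytes, since tools differ.
print(f"{estimate / 1e6:.0f} MB ({estimate / 2**20:.0f} MiB)")
```

The two units are printed side by side because a size quoted in MB and one reported by a tool that counts in units of 2**20 bytes differ by about 5% for the same file.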

$ gtabview file://big_random.csv
Traceback (most recent call last):
  File "//anaconda/bin/gtabview", line 4, in <module>
    __import__('pkg_resources').run_script('gtabview==0.3', 'gtabview')
  File "//anaconda/lib/python3.4/site-packages/setuptools-18.0.1-py3.4.egg/pkg_resources/__init__.py", line 735, in run_script
  File "//anaconda/lib/python3.4/site-packages/setuptools-18.0.1-py3.4.egg/pkg_resources/__init__.py", line 1652, in run_script
  File "/anaconda/lib/python3.4/site-packages/gtabview-0.3-py3.4.egg/EGG-INFO/scripts/gtabview", line 94, in <module>
    transpose=args.transpose, metavar=args.filename)
  File "//anaconda/lib/python3.4/site-packages/gtabview-0.3-py3.4.egg/gtabview/__init__.py", line 144, in view
    model = as_model(data, hdr_rows=hdr_rows, idx_cols=idx_cols, transpose=transpose)
  File "//anaconda/lib/python3.4/site-packages/gtabview-0.3-py3.4.egg/gtabview/models.py", line 237, in as_model
    model = ExtBlazeModel(data)
  File "//anaconda/lib/python3.4/site-packages/gtabview-0.3-py3.4.egg/gtabview/models.py", line 147, in __init__
    self._shape = (int(data.nrows), len(data.fields))
  File "//anaconda/lib/python3.4/site-packages/blaze-0.8.2+66.g1a8acda-py3.4.egg/blaze/interactive.py", line 343, in <lambda>
    Expr.__int__ = lambda x: convert_base(int, x)
  File "//anaconda/lib/python3.4/site-packages/blaze-0.8.2+66.g1a8acda-py3.4.egg/blaze/interactive.py", line 336, in convert_base
    x = compute(x)
  File "//anaconda/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "//anaconda/lib/python3.4/site-packages/blaze-0.8.2+66.g1a8acda-py3.4.egg/blaze/interactive.py", line 172, in compute
    return compute(expr, resources, **kwargs)
  File "//anaconda/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "//anaconda/lib/python3.4/site-packages/blaze-0.8.2+66.g1a8acda-py3.4.egg/blaze/compute/core.py", line 471, in compute
    result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
  File "//anaconda/lib/python3.4/site-packages/blaze-0.8.2+66.g1a8acda-py3.4.egg/blaze/compute/core.py", line 159, in top_then_bottom_then_top_again_etc
    return compute_down(expr, *leaf_data, **kwargs)
  File "//anaconda/lib/python3.4/site-packages/multipledispatch/dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "//anaconda/lib/python3.4/site-packages/blaze-0.8.2+66.g1a8acda-py3.4.egg/blaze/compute/chunks.py", line 55, in compute_down
    return compute(agg_expr, {agg: intermediate})
UnboundLocalError: local variable 'intermediate' referenced before assignment

Not sure if that's a gtabview issue or a Blaze issue.
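
For anyone unfamiliar with this error class, here is a minimal, generic sketch of how an UnboundLocalError arises. It is not Blaze's code: `combine` and its branches are made up for illustration. The point is only that a local name assigned in some branches but not others is unbound when none of those branches runs.

```python
def combine(parts):
    """Hypothetical helper: merge chunks, but only for the types it knows."""
    first = parts[0]
    if isinstance(first, list):
        intermediate = [x for part in parts for x in part]
    elif isinstance(first, str):
        intermediate = "".join(parts)
    # No else branch: for any other type `intermediate` is never assigned,
    # so the return below raises UnboundLocalError.
    return intermediate


print(combine([[1, 2], [3]]))      # handled type: works

try:
    combine([(1, 2), (3,)])        # tuples are not handled by any branch
except UnboundLocalError as exc:
    print(type(exc).__name__, exc)
```

If something similar happens in the last frame of the traceback, the error would depend on what the chunks look like rather than on gtabview itself, but that is a guess.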

Pinging also @cpcloud and @llllllllll

Kind regards

Member

wavexx commented Aug 6, 2015

Looks like an issue in blaze.

By the way @scls19fr, I've also run into this: blaze/blaze#1191 but didn't have time to debug it yet.

Do you have anything similar happening?

Member

wavexx commented Aug 6, 2015

It generates a 573MB file for me, but I could open it using python 2.7 and blaze 0.8.2 (from PyPI).
It initially opens quicker than gtabview big_random.csv, but after that it's amazingly slow. Unusable, actually.

Author

scls19fr commented Aug 6, 2015

It's odd that you don't get the same file size:

pc:~ scls$ cat big.py
import pandas as pd
import numpy as np
(rows, cols) = (4000000, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)
filename = "big_random.csv"
df.to_csv(filename, index=False)
pc:~ scls$ python big.py
pc:~ scls$ ls -lh big_random.csv
-rw-r--r--  1 scls  staff   735M  6 aoû 13:37 big_random.csv
pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.21274487359371408,0.9837956452361427,0.15916720143813157,0.2681755865886024,0.7204710887278248,0.3086394030805869,0.30062424542067534,0.6960646137570186,0.34166666946589364,0.27586304644665727
0.8777946375156523,0.6033697123338008,0.1327706266615769,0.19529643231130522,0.27477054259777434,0.4468524316143998,0.940254670593807,0.18968403819945623,0.2738538547944517,0.12564400838744338
0.2018934919749089,0.07524548034574063,0.6473819252708584,0.6071002176130551,0.40511265167956656,0.2791859033387186,0.7154128345443975,0.4866797736287697,0.4584847407677841,0.3798229634416679
0.9011497780314796,0.5777840362131448,0.3499451294403626,0.4070743759854154,0.7087747090990143,0.34894823904330574,0.33488167867742125,0.39637388267588536,0.40657046018943,0.1805436010295245
0.19026708133181092,0.5247328762094844,0.021502947916826387,0.7580506570759334,0.5779723788378057,0.6127493575936307,0.8011351193298644,0.6636015321535718,0.4607859110565661,0.08490276375289674
0.7143217456084715,0.011198040471145032,0.8892333967777504,0.6768191157336442,0.42295595169840083,0.8769479341732865,0.9891525199717826,0.9647959264864102,0.3240608535624976,0.210874737113377
0.21672596123550258,0.3482696140148287,0.7101869395685214,0.6474932686786607,0.16354057335375938,0.3052394529802829,0.7360537292259517,0.3575203114582275,0.9447179623804465,0.03532260562656109
0.6407757887342225,0.06897946464244908,0.4520628499915391,0.22465134543324095,0.7808744507260172,0.005931638090803992,0.8193511179065976,0.5469973751275239,0.4012570157732708,0.9510566112687189
0.43224384198381016,0.681428966272423,0.10416321326939937,0.2879100716695391,0.8998485262708976,0.4314634776128088,0.0885892489077732,0.11030100124975784,0.6841513022708292,0.6409559413160515

Member

wavexx commented Aug 7, 2015

Looks like your decimals are 4 digits longer than mine ;)

Author

scls19fr commented Aug 9, 2015

I noticed a 6-digit difference between Python 3 and Python 2: see pandas-dev/pandas#10777.
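
To make the digit difference concrete, here is a small sketch comparing two ways of printing the same float. Python 2's `str(float)` kept 12 significant digits (equivalent to `"%.12g"`), while Python 3 prints the shortest string that round-trips. Whether pandas used exactly these code paths when writing the CSV is an assumption here, and `bytes_for` is a hypothetical helper.

```python
x = 0.21274487359371408  # first value from the `head` output in this thread

long_form = repr(x)        # Python 3 style: shortest round-tripping string
short_form = "%.12g" % x   # 12 significant digits, like Python 2's str(float)

print(long_form, len(long_form))
print(short_form, len(short_form))


def bytes_for(rows, cols, chars_per_value):
    """Rough file size: every value is followed by one comma or newline."""
    return rows * cols * (chars_per_value + 1)


# Size of the 4,000,000 x 10 file if every value had the short form.
short_total = bytes_for(4_000_000, 10, len(short_form))
print(short_total, "bytes =", round(short_total / 2**20), "MiB")
```

With 14 characters per value the total comes to 600,000,000 bytes, about 572 MiB, which is in the same range as the 573MB reported above, so the shorter float formatting alone could plausibly explain the smaller file.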

To be honest, I can live with this issue (and with the long time it takes to open a big CSV file) for now.

I'd prefer to have a tabview version which could also handle Blaze (and share most of gtabview's code) before mid-September.

@llllllllll

Can you put together an MWE for the UnboundLocalError and open an issue on blaze? I will try to look into it.

Author

scls19fr commented Aug 9, 2015

I thought that

$ gtabview file://big_random.csv

was a minimal enough (not) working example ;-)

I tried:

import blaze
dat = blaze.Data("big_random.csv")
chunk_size = 16384
cols = dat.columns
list(dat[cols][0:chunk_size])

but I wasn't able to reproduce it.
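
One possible reason the snippet above does not reproduce it: the traceback fails inside `int(data.nrows)` (gtabview/models.py, line 147), that is, while counting rows, whereas slicing the first 16384 rows never asks Blaze for a row count. A closer attempt might look like the untested sketch below. `count_rows` is a hypothetical helper; `blaze.Data`, `nrows` and `fields` are taken from the snippet and traceback in this thread, and blaze 0.8.2 is assumed.

```python
def count_rows(uri):
    """Mimic the call that fails in the traceback: int(data.nrows).

    blaze is imported lazily so this sketch can be loaded without it;
    it assumes blaze 0.8.2, the version shown in the traceback.
    """
    import blaze

    data = blaze.Data(uri)
    # Same expression as gtabview/models.py line 147 in the traceback.
    return int(data.nrows), len(data.fields)
```

Calling `count_rows("file://big_random.csv")` and `count_rows("big_random.csv")` would also show whether the `file://` form used on the command line makes a difference, which is unknown from this thread.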

Member

wavexx commented Dec 22, 2020

Closing this, since there's not much we can do about this here.

@wavexx wavexx closed this as completed Dec 22, 2020