
Support for dict, ndarray and pandas objects #116

Closed
wants to merge 20 commits

Conversation

mdbartos

Overview

The additions shown in the following pull request implement support for dicts, pandas objects and numpy ndarrays.

In this implementation, the process_data function handles the majority of the data formatting. Instead of returning the data and headers together as a single object, the process_data function now returns a dict with two entries:

  • 'data': which contains the actual data
  • 'header': which contains the header for the data*

This allows the Viewer class to make fewer assumptions about the data that it is receiving. It also allows for faster (vectorized) formatting of data for pandas and numpy objects by avoiding computationally expensive list comprehensions.
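A minimal sketch of that contract, with illustrative names rather than the exact tabview implementation, looks like this: process_data hands back the data and the header as separate entries of a dict.

```python
# Hypothetical sketch of the process_data contract described above; the
# real tabview implementation handles more input types and edge cases.
def process_data(data):
    if isinstance(data, dict):
        # column view: dict keys become the header, values become columns
        header = [str(k) for k in data]
        rows = [list(r) for r in zip(*data.values())]
        return {'data': rows, 'header': header}
    # default: a list of lists, with the first row taken as the header
    return {'data': data[1:], 'header': data[0]}

result = process_data({'a': [1, 2], 'b': [3, 4]})
print(result['header'])  # ['a', 'b']
print(result['data'])    # [[1, 3], [2, 4]]
```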

In a previous pull request, some contributors raised concerns about whether support for pandas objects should be part of the pandas library, rather than part of the tabview library. Personally, I would prefer a standalone curses library with some support for pandas objects. I regularly work with pandas, numpy, dict and csv objects, and it would be nice to have a curses viewer that can deal with all of these cases in an interactive setting. Moreover, because numpy and pandas have become the de facto standard libraries for working with array-like data in python, I think it makes sense to add support for at least these libraries.

Below, I have tested the new implementation with lists of lists, dicts, pandas objects (Series, DataFrame and Panel) and numpy ndarrays. You can see the output by running my fork of tabview (branch feat):

import tabview
import pandas as pd
import numpy as np
import pandas.io.data as web
import datetime

#### LISTS

# 3-row list; header defaults to the first row.

sample_list = [[5,4,3,2,1], [1,2,3,4,5], ['a', 'b', 'c', 'd', 'e']]
tabview.view(sample_list)

# 1-row list; an integer header is appended automatically.

sample_list_2 = [[1,2,3,4,5]]
tabview.view(sample_list_2)

#### DICTS

sample_dict = {1:[0,1,2,3,4,5], 2:[5,4,3,2,1,0]}

# column view (default); dictionary keys become the header.
tabview.view(sample_dict)

# index view; dictionary keys become the first column (index).
tabview.view(sample_dict, orient='index')

#### PANDAS OBJECTS

# DataFrame
df = pd.read_csv('../sample/data_ohlcv.csv')
tabview.view(df)

# Series
tabview.view(df['Date'])

# Panel (from previous pull request)
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2013, 1, 27)
panel = web.DataReader(["F", "YHOO"], 'yahoo', start, end)
tabview.view(panel)

#### NUMPY ARRAYS

# 1-dimensional array
test_array_1 = np.array([1,3,5,3,5,np.nan])
tabview.view(test_array_1)

# 2-dimensional array
tabview.view(df.values)

#### CSV (STILL WORKS)

tabview.view('../sample/data_ohlcv.csv')
tabview.view('../sample/unicode-example-utf8.txt')

Summary of changes:

  • Viewer now accepts dicts containing (1) data and (2) header information.
  • data_list_or_file has been renamed to the more general input_type.
  • process_data now formats dicts, pandas objects, and ndarrays, and returns a dict of data/header info.
  • view: added exception handling for locale.getlocale(locale.LC_ALL).

*Note on headers: There needs to be some sort of parameter for header preferences in tabview.view. In this implementation, I just guessed at what users would most likely want.

@firecat53
Collaborator

Wow, this is a lot to digest for me, especially given I haven't worked with numpy or pandas. It's going to take some time for me to work through. @wavexx and @scls19fr, if you guys have a chance can you also take a look at this PR?

Would you be able to add your test cases to the unit tests as well?

Thanks very much for the interest and work on this!!

@scls19fr
Contributor

I wasn't against the idea of having Pandas support inside tabview (rather than on the Pandas side). That's a very interesting PR.

@scls19fr
Contributor

I don't understand why using (for example)

if data.__class__.__name__ == 'Series'

and not

if isinstance(data, pd.Series)

but that's probably to avoid importing Pandas... and that's a good idea

In the process_data method, I would store input_type(data) in a variable (data_type, for example) and use an if/elif/else chain:

if data_type == 'dict':
    ...
elif data_type == 'file':
    ...
elif data_type == 'pandas':
    ...
elif data_type == 'numpy':
    ...
else: # default data_type
    ...

Inside the else we should have the default behavior (covering types not supported for now): the "default" data_type should be a list (a list of lists, in fact), so any unknown data type is treated as a list of lists.
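As a rough sketch of this dispatch (the function name and return strings here are illustrative, not the exact tabview code), the class name and module can be inspected without importing pandas or numpy:

```python
def input_type(data):
    """Classify the input without importing pandas/numpy (illustrative sketch)."""
    name = type(data).__name__
    module = type(data).__module__
    if isinstance(data, dict):
        return 'dict'
    if isinstance(data, str):
        return 'file'          # a path to a csv/text file
    if module.startswith('pandas') or name in ('Series', 'DataFrame', 'Panel'):
        return 'pandas'
    if name == 'ndarray':
        return 'numpy'
    return 'list'              # default: treat anything else as a list of lists

print(input_type({'a': [1]}))   # dict
print(input_type([[1, 2]]))     # list
```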

@wavexx
Member

wavexx commented Mar 28, 2015

On 03/28/2015 02:15 AM, Matthew Bartos wrote:

Overview The additions shown in the following pull request implement support for dicts, pandas objects and numpy ndarrays.

In this implementation, the process_data function handles the
majority of the data formatting. Instead of returning the data and
headers together as a single object, the process_data function now
returns a dict with two entries:

  • 'data': which contains the actual data
  • 'header': which contains the header for the data*

This allows the Viewer class to make fewer assumptions about the data
that it is receiving. It also allows for faster (vectorized)
formatting of data for pandas and numpy objects by avoiding
computationally expensive list comprehensions.

Somehow I was hoping for the resulting 'data' to be just the contents of
pd.DataFrame.values, or at least DF.iloc, but many
copies/manipulations are still performed (just a remark, not a criticism).

This makes me think that we should really consider avoiding that initial
conversion to string, and perform it later in the draw function. This
would avoid a flat copy in some cases, which is currently necessary.

Now, some suggestions.

For the 'pandas' type, in process_data(), there's a weird apply().
I think you can easily replace that line with just
data.reset_index().astype(str).fillna('')

About the 'orient' keyword argument. I see why it's there, but I'd
rather make the code simpler and have a key for transposition later,
which is actually easier to implement. Especially considering I hope to
implement fixed columns for the index itself, orient='' would then be
just a regular transposition and is redundant.
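The transposition referred to here is trivial in each representation; an illustrative one-liner for the list-of-lists case (numpy has arr.T and pandas has df.T for the other types):

```python
# Transposing a list of lists with zip; rows become columns and vice versa.
rows = [[1, 2, 3], [4, 5, 6]]
transposed = [list(col) for col in zip(*rows)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```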

There is some whitespace/indentation to cleanup (some spurious empty
lines, some changes of indentation). Also, commented dead code should be
removed.

Your concern about the header is valid. We need such an argument for the
cli tool and the view() function. We cannot reliably determine whether the
input has a header for a simple list. For numpy/pandas it should
probably be ignored. For tabview I would just add a flag to disable the
header (something like -H). This can be done in a second PR.

The code does not depend on pandas/numpy, which is great for keeping
tabview an independent utility. I totally agree with numpy/pandas
basically being a standard for python.

As commented by @scls19fr, in process_data an if/elif chain would be
better, and a default to 'list' is important.

If you can fix the above, I would pull this in for further improvement.

@mdbartos
Author

Thanks for the positive feedback!

There's a lot to cover here, so I'll respond to @firecat53 first. I've added tests to test_tabview.py that validate tabview integration for dict, 1D ndarray, 2D ndarray, pandas Series and pandas DataFrames.

The tests work on my machine, but currently fail during the Travis build because they require numpy and pandas to be installed (note: these modules are required only for test_tabview.py, not for the tabview.py module itself). numpy and pandas can be a pain to install on some systems, so I don't want to include them as required dependencies. I will work on getting the build to complete successfully; however, I'm new to Travis, so it might take a couple of commits before I figure this issue out.

@scls19fr : You're right; the reason I chose to use if data.__class__.__name__ == 'Series' is because I want to avoid importing pandas. The __name__ attribute should be pretty stable, and I can't really think of a better way of checking the input type. I definitely agree that input_type should be stored as a variable.

@wavexx : Thanks for your suggestions. The reason I chose to convert to string in process_data is because that way I can take advantage of vectorized transformations (astype(), vectorize(), apply(), etc...) afforded by the numpy and pandas libraries. Vectorized operations are supposed to be faster than for loops, especially for large datasets (although with the sample data, it makes little difference).

Huh, I didn't know that there was a DataFrame.astype() method. I was under the impression that it was only a Series method. That's good to know. The reason I use astype(object) first is because astype(str) raises an exception when applied to datetime objects directly. I believe that fillna('') has to be called before the data type is converted to string.

The orient keyword can be removed if you want to implement a transposition key. However, this would probably require an 'index' argument into Viewer, which isn't implemented yet.

I should be able to address these issues shortly. I'm not sure I want to take on the 'header' issue in this pull request. It's a pretty significant structural issue and I think it's worthy of its own pull request.

@scls19fr
Contributor

Hello,

setup.py can have a key named extras_require

extras_require = {
    'dev': ['check-manifest', 'nose'],
    'test': ['coverage', 'nose'],
},

maybe it could be a way to explore to add Pandas and Numpy dependencies (only for test and/or dev)

To avoid a very long continuous integration process because of the Pandas install, we could probably use Miniconda as the Python distribution and conda as the package installer.
It would avoid compiling Pandas on the Travis side (which is quite a slow process).

See for example .travis.yml file of pandas-datareader https://github.com/pydata/pandas-datareader/

@wavexx
Member

wavexx commented Mar 29, 2015

On 03/29/2015 01:25 AM, Matthew Bartos wrote:

@scls19fr : You're right; the reason I chose to use if data.__class__.__name__ == 'Series' is because I want to avoid
importing pandas. The __name__ attribute should be pretty stable,
and I can't really think of a better way of checking the input type.
I definitely agree that input_type should be stored as a variable.

I think that's the right approach without adding a dependency.

Huh, I didn't know that there was a DataFrame.astype() method. I
was under the impression that it was only a Series method. That's
good to know. The reason I use astype(object) first is because
astype(str) raises an exception when applied to datetime objects
directly. I believe that fillna('') has to be called before the
datatype is converted to string.

I understand the logic.

If there's a type conversion failure, I'd like to have a test for it.
Can you replicate it easily?

I haven't come across any issues with type conversions to str so far, but
if there is one, it might be from py_datetime or DatetimeIndex (pandas'
internal date type). In that case, I want to actually report the bug to
pandas if I can replicate it.

As for fillna, doing it later should be fine as well. The trick is that
some types (such as int) don't have nullable variants, but str is
intrinsically stored in a Series of 'object' type. So doing it before
might not be possible, but with str it certainly is.

The orient keyword can be removed if you want to implement a
transposition key. However, this would probably require an 'index'
argument into Viewer, which isn't implemented yet.

I would, yes. If you're using ndarrays or dataframes, a transposition is
trivial to perform and is more explicit about what you want to do
with the index/labels.

And yes, index will be added later as soon as we have fixed columns.

I should be able to address these issues shortly. I'm not sure I want
to take on the 'header' issue in this pull request. It's a pretty
significant structural issue and I think it's worthy of its own pull
request.

Fine for me.

@mdbartos
Author

Summary of revisions

  • The process_data function has been reorganized into an if/elif chain. The 'list' case is now the default.
  • All additions have been made compatible with both python 2 and python 3.
  • Travis has been edited to use conda, and to build with both pandas and numpy.
  • DataFrame type conversion now takes place in two parts: (1) if any columns are datetime columns, they are first converted using astype(object); (2) next, all columns are converted to unicode using DataFrame.astype(str).
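A sketch of that two-part conversion (assuming a pandas install; the exact tabview code may differ, and the fillna placement follows the discussion above):

```python
import pandas as pd

# Two-part conversion: datetime columns can't be cast straight to str in
# older pandas (0.15 raised TypeError), so cast everything to object first,
# fill missing values, then cast every column to string.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-03-27', '2015-03-28']),
    'Close': [16.2, None],
})
converted = df.astype(object).fillna('').astype(str)
print(converted['Close'].tolist())  # ['16.2', '']
```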

Restructuring of Travis

Travis now builds using conda to avoid the lengthy compilation procedure required by numpy and pandas. Huge thanks to @scls19fr for the heads up. Python 3.2 does not appear to be supported by conda, so I did not include it in .travis.yml.

Pandas error converting datetime to string

@wavexx : I've included a traceback of the error that occurs when trying to convert datetime64 to string using astype().

For string

In [1]: import sys

In [2]: import pandas as pd

In [3]: sys.version
Out[3]: '2.7.6 (default, Mar 22 2014, 22:59:56) \n[GCC 4.8.2]'

In [4]: pd.__version__
Out[4]: '0.15.2'

In [5]: df = pd.read_csv('./sample/data_ohlcv.csv')

In [6]: df['Date'] = pd.to_datetime(df['Date'])

In [7]: df.astype(str)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-962ee1d96c0d> in <module>()
----> 1 df.astype(str)

/usr/local/lib/python2.7/dist-packages/pandas/core/generic.pyc in astype(self, dtype, copy, raise_on_error)
   2212 
   2213         mgr = self._data.astype(
-> 2214             dtype=dtype, copy=copy, raise_on_error=raise_on_error)
   2215         return self._constructor(mgr).__finalize__(self)
   2216 

/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in astype(self, dtype, **kwargs)
   2500 
   2501     def astype(self, dtype, **kwargs):
-> 2502         return self.apply('astype', dtype=dtype, **kwargs)
   2503 
   2504     def convert(self, **kwargs):

/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in apply(self, f, axes, filter, do_integrity_check, **kwargs)
   2455                                                  copy=align_copy)
   2456 
-> 2457             applied = getattr(b, f)(**kwargs)
   2458 
   2459             if isinstance(applied, list):

/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in astype(self, dtype, copy, raise_on_error, values)
    369     def astype(self, dtype, copy=False, raise_on_error=True, values=None):
    370         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 371                             values=values)
    372 
    373     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

/usr/local/lib/python2.7/dist-packages/pandas/core/internals.pyc in _astype(self, dtype, copy, raise_on_error, values, klass)
    399             if values is None:
    400                 # _astype_nansafe works fine with 1-d only
--> 401                 values = com._astype_nansafe(self.values.ravel(), dtype, copy=True)
    402                 values = values.reshape(self.values.shape)
    403             newb = make_block(values,

/usr/local/lib/python2.7/dist-packages/pandas/core/common.pyc in _astype_nansafe(arr, dtype, copy)
   2589         elif dtype != _NS_DTYPE:
   2590             raise TypeError("cannot astype a datetimelike from [%s] to [%s]" %
-> 2591                             (arr.dtype, dtype))
   2592         return arr.astype(_NS_DTYPE)
   2593     elif is_timedelta64_dtype(arr):

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [|S0]

For unicode

In [8]: df.astype(unicode)

... <SAME TRACEBACK> ...

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [<U0]

Bizarre, inconsistent errors in python 2 test

I've been getting some bizarre errors when running tests on 2d ndarrays and pandas DataFrames that are read from utf-8 encoded CSVs. So far, the error only occurs when running tests on unicode-example-utf8.txt after it is read into a pandas dataframe.

The traceback for the error is as follows:

======================================================================
ERROR: test_tabview_array_2d (__main__.TestTabviewIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/test_tabview.py", line 167, in test_tabview_array_2d
    search_str=None)
  File "/usr/lib/python2.7/curses/wrapper.py", line 43, in wrapper
    return func(stdscr, *args, **kwds)
  File "tests/test_tabview.py", line 115, in main
    v = t.Viewer(stdscr, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tabview-1.4.0-py2.7.egg/tabview/tabview.py", line 121, in __init__
    self.display()
  File "/usr/local/lib/python2.7/dist-packages/tabview-1.4.0-py2.7.egg/tabview/tabview.py", line 822, in display
    addstr(self.scr, yc, xc, s, attr)
  File "/usr/local/lib/python2.7/dist-packages/tabview-1.4.0-py2.7.egg/tabview/tabview.py", line 39, in addstr
    return scr.addstr(*args)
error: addstr() returned ERR

----------------------------------------------------------------------
Ran 13 tests in 0.076s

FAILED (errors=1)

The strange part is that these errors do not occur consistently. Rebooting my machine will sometimes stop them from occurring; however, they may occur later in a session. The only way that I have been able to stop these errors is to comment out test_tabview_array_2d. Uncommenting the test_tabview_2d_array function will result in errors occurring in either test_tabview_pandas_dataframe or test_tabview_array_2d.

I have not been able to reproduce the error in an interactive session; it only occurs during the tests. For instance, running:

tabview.view(pd.read_csv('sample/unicode-example-utf8.txt', encoding='utf-8').values)

will execute successfully without raising any exceptions in the interactive python shell.

I'm guessing that these errors might have something to do with the default system encoding on my machine ('ANSI_X3.4-1968'). Or maybe some encoding value is being saved between test_tabview_2d_array and test_tabview_pandas_dataframe. Any help in resolving this issue would be much appreciated.

@mdbartos
Author

mdbartos commented Apr 1, 2015

Support for different encodings in pandas and numpy objects

I have added some new routines to handle different encodings for numpy and pandas data structures. Previously, the encoding had to be specified up-front:

df = pd.read_csv('./sample/unicode-example-utf8.txt', encoding='utf-8')

The process_data function has been modified to decode the DataFrame when the codec is unknown. The following code should now work.

df = pd.read_csv('./sample/unicode-example-utf8.txt')
tabview.view(df)

These routines rely on the detect_encoding function, and are somewhat slower than just calling astype(str) or np.vectorize(str). For this reason, the 'quick' approach is tried first, and if an exception is encountered, it resorts to the slower encoding routine.
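The quick-path/slow-path idea can be sketched as follows (an illustrative outline, not the tabview code; detect_encoding here is a stand-in for tabview's real helper, which sniffs the codec from the raw bytes):

```python
def format_cells(rows, detect_encoding=lambda rows: 'utf-8'):
    """Try the fast conversion first; fall back to codec detection on failure."""
    try:
        # quick path: assume plain ascii/str cells
        return [[cell.decode('ascii') if isinstance(cell, bytes) else str(cell)
                 for cell in row] for row in rows]
    except UnicodeDecodeError:
        # slow path: sniff the codec once, then decode any byte cells with it
        codec = detect_encoding(rows)
        return [[cell.decode(codec) if isinstance(cell, bytes) else str(cell)
                 for cell in row] for row in rows]

print(format_cells([[b'caf\xc3\xa9', 1.5]]))  # [['café', '1.5']]
```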

addstr() error solved (kind of)

The addstr() error doesn't appear to be related to pandas/numpy data structures or their implementation in this pull request. From what I understand, it's a common curses error that occurs when there's too much content to fit on one screen. I was able to stop it from occurring by setting column_width='mode' in test_tabview. This was the column_width value used for the unicode test (which uses the same dataset as the pandas and numpy 2d tests).

I think this pull request is getting to a pretty good point. Let me know if there are any additional tests that you think I should run.

@wavexx
Member

wavexx commented Apr 2, 2015

On 04/01/2015 12:16 PM, Matthew Bartos wrote:

I think this pull request is getting to a pretty good point. Let me
know if there are any additional tests that you think I should run.

I'll hopefully have a look in the next few days.

I reported the astype(str) issue to pandas as well (see pandas issue 9757),
although for now astype(object).astype(str) is the fastest way to go for
datetime objects.

@scls19fr
Contributor

scls19fr commented Sep 8, 2015

yes or tabviewers

@wavexx
Member

wavexx commented Sep 8, 2015

On 07/09/15 23:58, Scott Hansen wrote:

What about using 'tabviewer' as the organization name? Doesn't seem to
be taken from what I can find. Then continue on with the tabview-common,
tabview-curses, gtabview, etc.

I like that.

@interrogator

Hi gang, amazing work. Would we be able to get an update on the state of this?

@firecat53
Collaborator

@interrogator I'm afraid there's nothing new to report. I unfortunately haven't done much with this project in the last year other than a few bugfixes. I think @wavexx has gone farther with his GUI fork 'gtabview', as he has added Blaze support and multi-index support. It's on my radar to play with this more but....the usual time, family, work, blah blah blah 😉

Scott

@interrogator

Fair enough. I'd love to see Pandas support enter the master, and would be happy to develop things a bit more on the Pandas side if that were happening!

@interrogator

@firecat53 My fork of tabview, master branch, is based on this PR. It now supports a few features I've seen requested. I'm only targeting and testing for Pandas data, but other types should/could work too. Main features implemented:

  • Index is frozen in place (i.e. does not scroll off screen) and bold by default
  • Pandas MultiIndex fully and nicely supported
  • Sparse index (if a cell in the index is the same as the cell directly above, don't print it)
  • Index name appearing nicely in the top line
  • Align cells right (this is implemented as a keyword argument)
  • HLINE and VLINE between header, index, data (just an aesthetic preference)

Example screenshot:

[screenshot: 2016-11-23 16:04:09]

Anyway, just thought I'd let you know.

@scls19fr
Contributor

@interrogator you might have a look at @wavexx code in https://github.com/wavexx/gtabview
because it provides very nice features such as PostgreSQL, MySQL, and SQLite database support (any Blaze-supported URI, in fact)

@interrogator

@scls19fr Thanks. I've had a look, and it's cool, but I'm not really after graphical or Qt dependencies. I'm fine just working off my fork, but just wanted to let people know here, since there are a number of issues asking for index pinned to the left, Pandas support, etc.

@firecat53
Collaborator

@interrogator Nice work! I think @scls19fr was just referring to the Blaze dependency which allows access to multiple data types. We could definitely integrate that into the tabview code for use in the curses viewer as well.

@wavexx
Member

wavexx commented Nov 23, 2016

On Wed, Nov 23 2016, scls19fr wrote:

@interrogator you might have a look at @wavexx code in
https://github.com/wavexx/gtabview because it provides very nice
features such as PostgreSQL, MySQL, SQLite databases support (any
Blaze supported URI in fact)

I've been stalled on this front for quite some time now.

Technically the code in gtabview is reusable, but sorting and reindexing
are two things that need to be handled as well, and there's zero support
for this in gtabview [and I do miss this, btw]. My aim in gtabview was
for zero-copy, but since we already massage the data in tabview, I
wouldn't be so picky.

The PR looks very clean to me.
We can build on this.

What do you say?

@interrogator

I'd love to be able to rely on a tabview that was in development, rather than use my own fork. My two cents: I'd just use pandas.read_csv() for CSV data, and work only with DataFrame and Series objects. That'd reduce the data reading code a lot, and speed it up. The list of lists model is not very scalable anyway.

All the features I added could easily be kwargs for the view method, or passed in on the command line. Alignment, where to put HLINE and VLINE, and frozen index/header stand out as things people might want. I'd be happy to contribute that, but not if it meant writing new code for every possible input format.

Blaze though, I don't know anything about, so I might be headed in the wrong direction totally.

Thanks for the thoughts, everyone.

@saulpw

saulpw commented Nov 24, 2016

I'm actively developing a curses data browser (https://github.com/saulpw/visidata), which currently has most of the functionality of tabview (and a lot more is planned over the next couple of months). I don't mean to take away from tabview's userbase, but maybe someone can make use of VisiData or steer its development to be useful to them.

@firecat53
Collaborator

@saulpw Neat project! Originally saw it via reddit. I think you're going for a more full-featured viewer/editor than I'm aiming for. Choices and competition are always a good thing 😄

@firecat53 firecat53 mentioned this pull request Nov 27, 2016
@firecat53
Collaborator

firecat53 commented Nov 27, 2016

Hey all, I merged @interrogator's fork into tabview master (on the interrogator-pandas branch) and opened PR #137 so we can hack on that.

@wavexx
Member

wavexx commented Apr 6, 2017

Ok guys, I created TabViewer and moved gtabview into it. Since I've never created an org in github before, we'll see how this goes.

@scls19fr
Contributor

scls19fr commented Apr 6, 2017

Hi Yuri, you need to send an invite to people to be member of TabViewer org https://github.com/orgs/TabViewer/people

@scls19fr
Contributor

scls19fr commented Apr 6, 2017

Maybe @firecat53 should also consider transferring ownership of tabview to this TabViewer org.
It can be done using https://github.com/firecat53/tabview/settings
("Transfer ownership: Transfer this repository to another user or to an organization where you have the ability to create repositories.")
But @wavexx must first allow @firecat53 to create repositories in the TabViewer org.

Another step will be to split the gtabview code into

  • tabview_common (or an other name to define)
  • gtabview (just GUI stuff)

@firecat53
Collaborator

Thanks @wavexx! I went ahead and transferred tabview ownership to TabViewer. Hopefully this will garner it some more love. I no longer use it almost daily for a second job, so I haven't shown it much love for a while 😞.

@JPFrancoia

JPFrancoia commented Aug 27, 2018

Hi guys, where are we with this PR? It seems really interesting.

EDIT: I'm actually using it already. The ability to view a pandas dataset is amazing. Quick question: does it load the dataset twice in memory? (Once when created with pandas, a second time when displayed with tabview?)

@JPFrancoia JPFrancoia mentioned this pull request Aug 27, 2018
@interrogator

In my case, the maintainer didn't seem too interested in pandas structures, so I just forked to suit my own personal needs (not for general use, unfortunately; it will probably just break). Personally, I think pandas support is the way this project should head. I even got my fork handling MultiIndex and running queries when you press on a cell; it's really quite flexible!

@JPFrancoia

@interrogator: totally agree with you. For now I have quite basic needs, so I just forked this PR. I'm worried about memory though, as I'll be working with big datasets. Do you know if the data is loaded an additional time for the display?

@interrogator

For my fork I assume it does; there is the pandas object and then the hacky list-of-lists kind of thing that tabview uses underneath. If I remember correctly, there was once a PR to integrate tabview into Pandas, and it was rejected on memory grounds.

If memory was a concern I'd love to see this project forked to be pandas first, and memory-aware. Right now I think it's neither, because the dev had no need or interest in either, which is fair enough.

I am not going to bother, but the way I would approach this is by adding a keyword argument to the view method, more_data_callback or whatever, which happens when the user goes beyond the default n lines loaded into tabview. Then in that function, outside of tabview, you just do a simple pandas slice and refresh. This would at least cut your memory usage down to less than 2x, since most users probably won't scroll that far.
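That idea can be sketched like so (entirely hypothetical; more_data_callback and its signature are the suggestion above, not an existing tabview API, and a plain list stands in for a DataFrame here):

```python
CHUNK = 2  # rows handed over per request (tiny here for illustration)

def make_more_data_callback(dataset):
    """Build the callback the viewer would invoke when the user scrolls
    past the rows loaded so far. dataset can be anything sliceable: a
    list here, or a DataFrame via df.iloc[start:start + CHUNK]."""
    def more_data(start):
        return dataset[start:start + CHUNK]  # cheap slice, no full copy
    return more_data

more_data = make_more_data_callback(list(range(10)))
print(more_data(4))  # [4, 5]
```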

@firecat53
Collaborator

firecat53 commented Aug 27, 2018

Yeah, I'm sorry all...I just don't have time or inclination to work much on this project anymore in the foreseeable future! If someone has skills and interest, myself or @wavexx would probably be happy to add you to the Tabviewer organization as a developer/maintainer.

I also feel like the VisiData project has eclipsed pretty much all the goals I originally had for this project. I don't know how much Pandas/numpy integration it has, but it's definitely worth checking. VisiData has a much larger scope, but it still seemed plenty zippy the few times I've tested it.

7 participants