Support for dict, ndarray and pandas objects #116
…ifferent inputs. Changed data_list_to_file into more generalized input_type. The process_data function now passes a dict of data and header information into Viewer.
Wow, this is a lot to digest for me, especially given I haven't worked with numpy or pandas. It's going to take some time for me to work through. @wavexx and @scls19fr, if you guys have a chance, can you also take a look at this PR? Would you be able to add your test cases to the unit tests as well? Thanks very much for the interest and work on this!!
I wasn't against the idea of having Pandas support inside tabview (rather than on the Pandas side). That's a very interesting PR.
I don't understand why using (for example)
and not
but that's probably to avoid importing Pandas... and that's a good idea.
In
Inside
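In the spirit of that comment, one way to branch on the input type without importing pandas or numpy is to look at the top-level package of the object's class. This is a hypothetical sketch: the name detect_input_type and the rule that a plain string means a file path are assumptions, not tabview code.

```python
def detect_input_type(obj):
    """Classify viewer input without importing pandas or numpy.

    Hypothetical helper: pandas/numpy objects are recognised by the top-level
    package their class lives in, so neither library has to be installed.
    """
    root = type(obj).__module__.split(".")[0]
    if root in ("pandas", "numpy"):
        return root
    if isinstance(obj, dict):
        return "dict"
    if isinstance(obj, (list, tuple)):
        return "list"
    if isinstance(obj, str):
        return "file"  # assumption: a plain string is a path to a CSV file
    raise TypeError("unsupported input type: %s" % type(obj).__name__)
```

The trade-off is that a DataFrame subclass defined in some third-party package would not be recognised, whereas an isinstance() check would catch it but requires the import.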
On 03/28/2015 02:15 AM, Matthew Bartos wrote:
Somehow I was hoping for the resulting 'data' to be just the contents of
This makes me think that we should really consider avoiding that initial
Now, some suggestions. For the 'pandas' type, in process_data(), there's a weird apply().
About the 'orient' keyword argument. I see why it's there, but I'd
There is some whitespace/indentation to clean up (some spurious empty
Your concern about the header is valid. We need such an argument for the
The code does not depend on pandas/numpy, which is great for having
As commented by @scls19fr, in process_data an if/elif chain would be
If you can fix the above, I would pull this in for further improvement.
regarding ndarray.
Thanks for the positive feedback! There's a lot to cover here, so I'll respond to @firecat53 first. I've added tests to
The tests work on my machine, but currently fail during the Travis build because they require
@scls19fr: You're right; the reason I chose to use
@wavexx: Thanks for your suggestions. The reason I chose to convert to string in
Huh, I didn't know that there was a
The
I should be able to address these issues shortly. I'm not sure I want to take on the 'header' issue in this pull request. It's a pretty significant structural issue, and I think it's worthy of its own pull request.
Hello,
maybe it would be worth exploring adding Pandas and NumPy dependencies (only for testing and/or development).
To avoid a very long continuous integration process caused by the Pandas install, we could probably use Miniconda as the Python distribution and
See for example
On 03/29/2015 01:25 AM, Matthew Bartos wrote:
I think that's the right approach without adding a dependency.
I understand the logic. If there's a type conversion failure, I'd like to have a test for it. I haven't come across any issues with type conversions to str so far, but
As for fillna, doing it later should be fine as well. The trick is that
I would, yes. If you're using ndarrays or dataframes, a transposition is
And yes, the index will be added later, as soon as we have fixed columns.
Fine for me.
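The 'orient' keyword discussed in this exchange decides whether dict keys name columns or rows. A stdlib-only sketch, assuming a hypothetical dict_to_rows helper that mirrors the 'columns'/'index' convention of pandas' DataFrame.from_dict (this is not the PR's actual signature):

```python
def dict_to_rows(d, orient="columns"):
    """Flatten a dict into (header, rows); hypothetical helper.

    orient="columns": keys are column names, values are column sequences.
    orient="index":   keys are row labels, values are row sequences.
    """
    if orient == "columns":
        header = [str(key) for key in d]
        # zip(*columns) transposes columns into rows (stops at the shortest column)
        rows = [list(row) for row in zip(*d.values())]
    elif orient == "index":
        header = None  # no column names; the row label becomes the first cell
        rows = [[key, *values] for key, values in d.items()]
    else:
        raise ValueError("orient must be 'columns' or 'index', got %r" % (orient,))
    return header, rows
```

For plain Python lists the transposition goes through zip(); ndarrays and DataFrames expose the same operation as .T, which is why the choice of orientation matters less for them.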
process_data. Cleaned up extraneous spaces.
Summary of revisions
Restructuring of Travis
Travis now builds using conda to avoid the lengthy compilation procedure required by numpy and pandas. Huge thanks to @scls19fr for the heads up. Python 3.2 does not appear to be supported by conda, so I did not include it in
Pandas error converting datetime to string
@wavexx: I've included a traceback of the error that occurs when trying to convert datetime64 to string using astype(). For string
For unicode
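The failure reported here comes from a vectorized astype() call. As a slower but more forgiving fallback, cells could be formatted one at a time. This stdlib-only sketch is an illustration, not the PR's code: the names cell_to_str and rows_to_str and the empty-string na_rep are assumptions. It also covers the fillna point by mapping None and NaN to na_rep.

```python
import datetime
import math


def cell_to_str(value, na_rep=""):
    """Format a single cell for display; missing values become na_rep."""
    if value is None:
        return na_rep
    if isinstance(value, float) and math.isnan(value):
        return na_rep
    if isinstance(value, (datetime.datetime, datetime.date)):
        # isoformat() gives an unambiguous, sortable representation
        return value.isoformat()
    return str(value)


def rows_to_str(rows, na_rep=""):
    """Apply cell_to_str to every cell of a list of lists."""
    return [[cell_to_str(v, na_rep) for v in row] for row in rows]
```

A per-cell loop gives up the speed of vectorized formatting, so it would make sense only as a fallback when astype() raises.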
Bizarre, inconsistent errors in Python 2 test
I've been getting some bizarre errors when running tests on 2d ndarrays and pandas DataFrames that are read from utf-8 encoded CSVs. So far, the error only occurs when running tests on
The traceback for the error is as follows:
The strange part is that these errors do not occur consistently. Rebooting my machine will sometimes stop them from occurring; however, they may occur later in a session. The only way that I have been able to stop these errors from occurring is to comment out
I have not been able to reproduce the error in an interactive session; it only occurs during the test. For instance, running:
Will execute successfully without raising any exceptions in the interactive Python shell. I'm guessing that these errors might have something to do with the default system encoding on my machine ('ANSI_X3.4-1968'). Or maybe some encoding value is being saved between
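'ANSI_X3.4-1968' is the name glibc reports for plain ASCII, which is what a C/POSIX locale yields. A Python 3 sketch for checking which encodings a process sees, plus a helper that decodes byte cells with an explicit encoding instead of relying on that default (both helper names are hypothetical):

```python
import locale
import sys


def describe_encodings():
    """Report the encodings a Python process would fall back on."""
    return {
        # typically the default that open() uses for text files
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def to_text(value, encoding="utf-8"):
    """Decode bytes with an explicit encoding; undecodable bytes become U+FFFD."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value if isinstance(value, str) else str(value)
```

Running describe_encodings() inside the test runner and in the interactive shell would show whether the two environments actually see different defaults.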
dataframe. Encoding for dataframes no longer needs to be specified.
ndarray to correspond to unicode test column width.
Support for different encodings in pandas and numpy objects
I have added some new routines to handle different encodings for numpy and pandas data structures. Previously, the encoding had to be specified up-front:
The
These routines rely on the
addstr() error solved (kind of)
The addstr() error doesn't appear to be related to pandas/numpy data structures or their implementation in this pull request. From what I understand, it's a common curses error that occurs when there's too much content to fit on one screen. I was able to stop it from occurring by setting
I think this pull request is getting to a pretty good point. Let me know if there are any additional tests that you think I should run.
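The setting that silenced the error is cut off here, so as a separate illustration: curses raises an error when addstr() writes past the edge of a window, including into its bottom-right cell. A viewer can avoid that by clipping each string to the space left on its row. The pure function below (a hypothetical name) does only the arithmetic, so it can be checked without a terminal.

```python
def clip_for_addstr(text, y, x, max_y, max_x):
    """Return the part of `text` that fits when written at (y, x).

    Hypothetical helper. curses raises an error when addstr() writes outside
    the window, including into the bottom-right cell, so that cell is skipped.
    Counts code points, not display columns.
    """
    if y < 0 or x < 0 or y >= max_y or x >= max_x:
        return ""
    room = max_x - x
    if y == max_y - 1:
        room -= 1  # leave the very last cell of the window empty
    return text[:max(room, 0)]
```

In a draw loop this would be used as stdscr.addstr(y, x, clip_for_addstr(s, y, x, *stdscr.getmaxyx())), since getmaxyx() returns (rows, columns). Wide East Asian characters occupy two cells and would still need extra handling.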
On 04/01/2015 12:16 PM, Matthew Bartos wrote:
I'll hopefully have a look in the next few days. I reported the astype(str) issue to pandas as well (see 9757) although
yes or
On 07/09/15 23:58, Scott Hansen wrote:
I like that.
Hi gang, amazing work. Would we be able to get an update on the state of this?
@interrogator I'm afraid there's nothing new to report. I unfortunately haven't done much with this project in the last year other than a few bugfixes. I think @wavexx has gone farther with his GUI fork 'gtabview', as he has added Blaze support and multi-index support. It's on my radar to play with this more but... the usual time, family, work, blah blah blah 😉 Scott
Fair enough. I'd love to see Pandas support enter the master, and would be happy to develop things a bit more on the Pandas side if that were happening!
@firecat53 My fork of tabview, master branch, is based on this PR. It now supports a few features I've seen requested. I'm only targeting and testing for Pandas data, but other types should/could work too. Main features implemented:
Anyway, just thought I'd let you know.
@interrogator you might have a look at @wavexx code in https://github.com/wavexx/gtabview
@scls19fr Thanks. I've had a look, and it's cool, but I'm not really after graphical or Qt dependencies. I'm fine just working off my fork, but just wanted to let people know here, since there are a number of issues asking for index pinned to the left, Pandas support, etc.
@interrogator Nice work! I think @scls19fr was just referring to the Blaze dependency which allows access to multiple data types. We could definitely integrate that into the tabview code for use in the curses viewer as well.
On Wed, Nov 23 2016, scls19fr wrote:
I've been stalled on this front for quite some time now. Technically the code in gtabview is reusable, but sorting and reindexing
The PR looks very clean to me. What do you say?
I'd love to be able to rely on a tabview that was under active development, rather than use my own fork. My two cents: I'd just use
All the features I added could easily be kwargs for the
Blaze, though, I don't know anything about, so I might be headed in the wrong direction entirely. Thanks for the thoughts, everyone.
I'm actively developing a curses data browser (https://github.com/saulpw/visidata), which currently has most of the functionality of tabview (and a lot more is planned over the next couple of months). I don't mean to take away from tabview's userbase, but maybe someone can make use of VisiData or steer its development to be useful to them.
@saulpw Neat project! Originally saw it via reddit. I think you're going for a more full-featured viewer/editor than I'm aiming for. Choices and competition are always a good thing 😄
Hey all, I merged @interrogator's fork into tabview master (on the interrogator-pandas branch) and opened PR #137 so we can hack on that.
Ok guys, I created TabViewer and moved gtabview into it. Since I've never created an org in github before, we'll see how this goes.
Hi Yuri, you need to send invites for people to become members of the TabViewer org https://github.com/orgs/TabViewer/people
Maybe @firecat53 should also consider transferring ownership of tabview to this TabViewer org. Another step will be to split the gtabview code into
Thanks @wavexx! I went ahead and transferred tabview ownership to TabViewer. Hopefully this will garner it some more love. I no longer use it almost daily for a second job, so I haven't shown it much love for a while 😞.
Hi guys, where are we with this PR? It seems really interesting. EDIT: I'm actually using it already. The ability to view a pandas dataset is amazing. Quick question: does it load the dataset into memory twice? (Once when it is created with pandas, and a second time when it is displayed with tabview?)
In my case, the maintainer didn't seem too interested in pandas structures, so I just forked to suit my own personal needs (unfortunately not for general use; it will probably just break). Personally, I think pandas support is the way this project should head. I even got my fork handling multiindex and running queries when you press on a cell; it's really quite flexible!
@interrogator: totally agree with you. For now I have quite basic needs, so I just forked this PR. I'm worried about memory, though, since I'll be working with big datasets. Do you know if the data is loaded an additional time for the display?
For my fork I assume it does: there is the pandas object and then the hacky list-of-lists kind of thing that tabview uses underneath. If I remember correctly, there was once a PR to integrate tabview into Pandas and it was rejected on memory grounds. If memory were a concern, I'd love to see this project forked to be pandas-first and memory-aware. Right now I think it's neither, because the dev had no need or interest in either, which is fair enough. I am not going to bother, but the way I would approach this is by adding a keyword argument to the view method,
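On the memory question: keeping both the pandas object and a fully formatted list of lists means holding two copies of the data. One memory-aware alternative, sketched here as a hypothetical class rather than anything in tabview or the fork, is a read-only view that formats only the rows actually requested, for example the ones currently on screen.

```python
from collections.abc import Sequence


class LazyRows(Sequence):
    """Read-only list-of-lists view that formats rows on demand.

    Hypothetical: `source` is any object supporting len() and integer
    indexing that returns a row (e.g. a list of tuples); no formatted copy
    of the whole dataset is ever built.
    """

    def __init__(self, source, fmt=str):
        self._source = source
        self._fmt = fmt

    def __len__(self):
        return len(self._source)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # format only the rows inside the requested window
            return [self[i] for i in range(*index.indices(len(self)))]
        return [self._fmt(value) for value in self._source[index]]
```

The trade-off is that column widths and sorting can no longer be computed once from a formatted copy; each would need its own strategy, such as sampling rows to estimate widths.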
Yeah, I'm sorry all... I just don't have the time or inclination to work much on this project. I also feel like the VisiData project has eclipsed pretty much all the goals I originally had for the project. I don't know how much Pandas/numpy integration it has, but I'd definitely check. VisiData has a much larger scope, but it still seemed plenty zippy the few times I've tested it.
Overview
The additions shown in the following pull request implement support for dicts, pandas objects and numpy ndarrays.
In this implementation, the process_data function handles the majority of the data formatting. Instead of returning the data and headers together as a single object, the process_data function now returns a dict with two entries, one for the data and one for the header information. This allows the Viewer class to make fewer assumptions about the data that it is receiving. It also allows faster (vectorized) formatting of pandas and numpy data by avoiding computationally expensive list comprehensions.
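A minimal sketch of that return shape, handling only plain Python inputs with an if/elif chain. The function name, the key names 'data' and 'header', and the choice of the first row as the header for lists are assumptions for illustration; the PR's actual process_data also covers pandas and numpy objects.

```python
def process_data_sketch(obj):
    """Hypothetical sketch: normalise input into {'data': rows, 'header': names}."""
    if isinstance(obj, dict):
        # dict of columns: keys become the header, values are transposed into rows
        header = [str(key) for key in obj]
        data = [list(row) for row in zip(*obj.values())]
    elif isinstance(obj, (list, tuple)):
        rows = [list(row) for row in obj]
        # assumption: the first row holds the column names
        header, data = (rows[0], rows[1:]) if rows else ([], [])
    else:
        raise TypeError("unsupported input type: %s" % type(obj).__name__)
    return {"data": data, "header": header}
```

Because Viewer receives the header as a separate entry, it no longer has to guess whether the first data row is a header.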
In a previous pull request, some contributors raised concerns about whether support for pandas objects should be part of the pandas library, rather than part of the tabview library. Personally, I would prefer a standalone curses library with some support for pandas objects. I regularly work with pandas, numpy, dict and csv objects, and it would be nice to have a curses viewer that can deal with all of these cases in an interactive setting. Moreover, because numpy and pandas have become the de facto standard libraries for working with array-like data in python, I think it makes sense to add support for at least these libraries.
Below, I have tested the new implementation with lists of lists, dicts, pandas objects (Series, DataFrame and Panel) and numpy ndarrays. You can see the output by running my fork of tabview (branch feat):
Summary of changes:
- Viewer now accepts dicts containing (1) data and (2) header information.
- data_list_or_file changed to the more generalized input_type.
- process_data now formats dicts, pandas objects, and ndarrays and returns a dict of data/header info.
- view: exception handling for locale.getlocale(locale.LC_ALL)
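The locale item above can be sketched as follows. locale.getlocale() raises TypeError when asked about LC_ALL while the categories are set to different locales, and ValueError when it cannot parse the locale name, so a defensive wrapper falls back to a default. The helper name and the 'UTF-8' default are assumptions, not the PR's code.

```python
import locale


def locale_encoding(default="UTF-8"):
    """Return the encoding of the current LC_ALL locale, or `default`.

    getlocale(LC_ALL) raises TypeError for mixed per-category locales and
    ValueError for locale names it cannot parse; both fall back to `default`.
    """
    try:
        _language, encoding = locale.getlocale(locale.LC_ALL)
    except (TypeError, ValueError):
        encoding = None
    return encoding or default
```

In an unconfigured C locale getlocale() typically returns (None, None), which this sketch also maps to the default.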
*Note on headers: There needs to be some sort of parameter for header preferences in tabview.view. In this implementation, I just guessed as to what users would most likely want.
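One possible shape for such a header preference, shown as a hypothetical helper for list-of-lists input. The parameter name and the meaning of each value are assumptions, not an existing tabview API.

```python
def split_header(rows, header=True):
    """Hypothetical header policy for list-of-lists input.

    header=True        -> the first row is the header
    header=False/None  -> no header row; columns are numbered from 0
    header=<sequence>  -> explicit column names; all rows are data
    Returns (header_names, data_rows).
    """
    rows = [list(row) for row in rows]
    if header is True:
        return (rows[0] if rows else []), rows[1:]
    if header is False or header is None:
        width = max((len(row) for row in rows), default=0)
        return [str(i) for i in range(width)], rows
    return [str(name) for name in header], rows
```

Making the default header=True would preserve the current "first row is the header" guess while letting callers override it.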