Sync modifications in GUI to iPython & record history #20

pandichef · 2020-07-14T03:37:51Z

I noticed that you're looking for feedback... and I think this project is on to something...

My dream DataFrame viewer would work like this:

from pandasgui import DataFrameViewer
viewer = DataFrameViewer()
viewer.open()  # opens a GUI window
viewer.bind(df1)  # tab 1 in the viewer
viewer.bind(df2)  # tab 2 in the viewer
viewer.bind(df3)  # tab 3 in the viewer

Now when you change a cell value in the GUI, the application "pushes" the code to the python/ipython shell i.e., you see df1.at['row', 'col'] = 10 in the shell.

Moreover, if you type df1.at['row', 'col'] = 20 in the shell, the value automatically changes in the GUI. I think such an application would be 1000+ Github stars in no time.

The text was updated successfully, but these errors were encountered:

adamerose · 2020-07-14T19:38:49Z

@pandichef Thanks for the feedback.

the application "pushes" the code to the python/ipython shell

Just want to make sure I understand this part. I think it would not be explicitly adding or sending anything new to the shell, right? I'd just have it so any modifications to a DataFrame in the GUI would be done in-place, and the DataFrame object you're seeing in the GUI would be the same one you have in iPython so all in-place modifications would be seen in both, let me know if you had something else in mind.

Here are some other thoughts

I'll add a field to the settings argument (maybe called sync or inplace) when creating the PandasGui that enables in-place modifications. It should be False by default so people aren't accidentally changing their data
PyQt has an update signal that needs to be emitted to rerender when changes are made to the underlying data, and I can't think of a good way to trigger this when the user makes changes in iPython other than either...
1. Having a timer that continuously triggers it every few seconds (may impact performance) or
2. Requiring the user to explicitly call some method on the Gui instance after making a change to the data (would be annoying)
My understanding is that Pandas best practice is to not do in-place modification of DataFrames. And I think a lot of people including me often use method chaining which also does not go well with inplace modification. https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not
Not sure what implications this has on this, if any, since this syncing feature will be optional.
Any operations done by the GUI that use functions without an inplace argument can just do an in-place update using the update method on the resulting DataFrame.
You also gave an example of taking an existing GUI instance and adding new DataFrames to it, that is another thing I can look at supporting.
Another feature I want to add is creating new DataFrames within the GUI like by doing pivots, and a question related to this is how I should get those available in iPython. I was thinking on initial creation I could return a dictionary that holds all the DataFrame objects in the GUI, so you'd do my_dfs = PandasGUI() and then when you create a new one in the GUI you'd access it in iPython like my_dfs['new_df_name']

pandichef · 2020-07-14T21:30:38Z

I think it would not be explicitly adding or sending anything new to the shell, right?

It would be. The idea is that you can edit cells/columns/rows in the GUI and those commands would be "recorded" in iPython. So if you type %history, you can see the commands that were implicitly executed. Then you can copy these into a script if you like. This is very common workflow i.e., you play with the data manually until you get it just right and then you convert those steps into a repeatable process. Another option is to just create a console in the GUI that records the underlying commands... and these can be copied/pasted into a .py file.

As for inplace, it's not best practice, but it doesn't matter too much either way. df = df.do_something() is equivalent to df.do_something(inplace=True).

By the way, you might want to check out the Stata or SAS applications. The Stata editor is amazing, but unfortunately only support Stata's native DataFrame object. Stata data editor example... notice how all the commands are "recorded" in the upper left corner.

adamerose · 2020-07-15T16:40:31Z

It would be. The idea is that you can edit cells/columns/rows in the GUI and those commands would be "recorded" in iPython.

Okay I see what you mean, I think these are two separate features

Make DataFrame modification in the GUI in-place so your objects get modified in iPython as well
Log a history of the Python commands corresponding to any changes made within the GUI.

So if you type %history, you can see the commands that were implicitly executed. Another option is to just create a console in the GUI that records the underlying commands... and these can be copied/pasted into a .py file.

Seems like modifying iPython %history programmatically is not possible and would be a bit of a hack anyways since the user is not actually running that code directly, so I would go with keeping a separate history of commands executed by the GUI.

I'm not sure the best way to approach this... Simplest thing would be having a plain string of Python code corresponding to each possible action in the GUI and adding it to a command history (with string substitution to denote the DataFrame names it was applied to). If anyone has a code example of other projects implementing something like to this, please share a link.

df = df.do_something() is equivalent to df.do_something(inplace=True).

That's incorrect. The 1st way is assigning a new object to your df variable, the 2nd way is modifying the original object in-place.

import pandas as pd
df = pd.DataFrame({'a':[1,2,None], 'b':[4,5,None]})
print(id(df))
gui_df = df
df.dropna(inplace=True)
print(id(df))
print(gui_df)

1460280587336
1460280587336
     a    b
0  1.0  4.0
1  2.0  5.0

import pandas as pd
df = pd.DataFrame({'a':[1,2,None], 'b':[4,5,None]})
print(id(df))
gui_df = df
df = df.dropna()
print(id(df))
print(gui_df)

1460280618312
1460280778504
     a    b
0  1.0  4.0
1  2.0  5.0
2  NaN  NaN

By the way, you might want to check out the Stata or SAS applications. The Stata editor is amazing, but unfortunately only support Stata's native DataFrame object. Stata data editor example... notice how all the commands are "recorded" in the upper left corner.

I've used JMP which is similar (I've been working on adding graphing similar to JMP's Graph Builder), I'll check out Stata too - thanks for the link.

pandichef · 2020-07-15T16:47:28Z

Ah. I stand corrected about inplace. All good points. I've used JMP too and really like it.

Indeed, I think the first step is to write a utility that maps GUI clicks to python code (as a string). I can try to help with that, but I don't know enough about your implementation of the GUI.

adamerose · 2020-07-15T17:59:15Z

I can try to help with that, but I don't know enough about your implementation of the GUI.

Yeah I've been meaning to add some documentation on how things are structured. But first I want to do some planned refactoring to clean the code up before trying to get more popularity & requesting contributors. A quick rundown of how it is right now:

Everything is built on PyQt5.
The main GUI object is PandasGUI(QtWidgets.QMainWindow) in /gui.py, which keeps a dictionary called self.df_dicts holding the names of each DataFrame in the GUI along with a corresponding dataframe_explorer object.
The dataframe_explorer (DataFrameExplorer(QtWidgets.QTabWidget) in widgets/dataframe_explorer.py) is just a QTabWidget with the various UI tabs corresponding the each DataFrame. The first tab contains an instance of a dataframe_viewer (DataFrameViewer(QtWidgets.QWidget) in widgets/dataframe_explorer.py), which is the widget that displays a DataFrame as a spreadsheet
Right now all editing happens inside DataFrameViewer methods on its local copy of the DataFrame, but that's subject to change

Indeed, I think the first step is to write a utility that maps GUI clicks to python code (as a string).

That sounds good, I still want to do some thinking about this. One thing that would be useful is a list of all current and future potential actions a person might do in the GUI that affects a DataFrame so I can make sure they'd work in my design. If you have any other ideas, add them to the list

Current

Edit DataFrame by double clicking a single cell and typing a new value
Edit DataFrame by pasting. (single or multiple cells, comma delimited format).. this is broken right now, need to fix
Sort DataFrame by a column by left clicking the header

Future

Modifying DataFrames
- column reorder
- change column data types
- dropna
- date parsing
- fillna
- outlier detection
- reshape (melt, wide_to_long, pivot, pivot_table, stack, unstack, etc.)
Creating new DataFrames:
- reshaped existing ones
- import CSV
Delete DataFrames from the GUI
Filters that can be toggled on and off with an option to lock them in (overwrite original or create new DataFrame with the filters applied). I'd store two copies of each DataFrame in the GUI, the main one and one with enabled filters applied. An edge case is what happens if you do filtering like this and then edit it, I think the changes should propagate back to the original DataFrame
JMP-style 'calculated' columns which are columns with values defined by a function, that get automatically recalculated after any change to the DataFrame
Lasso selection that filters the DataFrame

pandichef · 2020-07-15T19:25:06Z

This looks good. You might also want to include a "how to contribute" section in README.md.

I wouldn't prioritize these tbh:

calculated columns: JMP tries to be like a hybrid between MS Excel and SAS, which is why a calculated column makes sense. However, data scientists usually want to use the GUI just to prototype... but ultimately want a script to repeat the actions taken in the GUI.
graphing: graphing is pretty straightforward in iPython, so I don't think graphing capabilities will offer as much value to users as the DataFrame editor with "recording".

adamerose · 2020-08-30T05:35:20Z

UPDATE

Record history
I added a basic history log. You can find it in the latest dev commits, it's under the "Debug" menu. This required some refactoring but now and going forward any action you take in the GUI will occur by calling some method in pandasgui.store.PandasGuiDataFrame and if it has the @track_history decorator it will be recorded. This doesn't give copy-pastable code to re-create your actions in Python, I found that required too much maintenance. I considered just dumping the code for each method into the log using something like inspect.getsource but that was messy and I think most functions are simple enough to reproduce or people can just look at my source code directly.

Sync modifications
I decided not to add this feature since inplace modification was messier to implement, goes against Pandas best practice, and would probably would annoy more users than it would help (playing around in GUI with your dataset then realizing you messed up the original). I added an alternative though, you can call .get_dataframes() on the PandasGui object (returned by show) and get a dictionary of the dataframes in the GUI with your modifications.

The code base is also cleaner than last time and since all DataFrame modification happens in that one class it should be easier to follow what's going on.

KobaKhit · 2020-10-26T14:36:49Z

A quick note about the recorded history of operations on a data frame in the GUI. How are you applying the filter queries to a dataframe. Through parsing them into pandas operations? You could use the same mechanism to reapply the filters given the filter queries in your top left section. This will make operations performed in the gui reproducible using pandas gui own filtering routine. You can save the operation history string (list of strings) as an attribute in the gui class and should be straight forward to update every time a filter is added or cell is edited.

Then reapply the filters will become something like

from pandasgui import apply_filters
...
h = gui.history
h
['60<hp<70','Name == Abra']

df_processed = apply_filters(h, df)

adamerose · 2020-10-26T14:58:18Z

A quick note about the recorded history of operations on a data frame in the GUI. How are you applying the filter queries to a dataframe. Through parsing them into pandas operations? You could use the same mechanism to reapply the filters given the filter queries in your top left section. This will make operations performed in the gui reproducible using pandas gui own filtering routine. You can save the operation history string (list of strings) as an attribute in the gui class and should be straight forward to update every time a filter is added or cell is edited.

Then reapply the filters will become something like
from pandasgui import apply_filters
...
h = gui.history
h
['60<hp<70','Name == Abra']

df_processed = apply_filters(h, df)

The part that actually applies the filters is just a loop with df = df.query(filt.expr), that method does the parsing and applying the operations. It happens in store.py like this:

    @track_history
    def add_filter(self, expr: str, enabled=True):
        filt = Filter(expr=expr, enabled=enabled, failed=False)
        self.filters.append(filt)
        self.apply_filters()

    @track_history
    def remove_filter(self, index: int):
        self.filters.pop(index)
        self.apply_filters()

    @track_history
    def edit_filter(self, index: int, expr: str):
        filt = self.filters[index]
        filt.expr = expr
        filt.failed = False
        self.apply_filters()

    @track_history
    def toggle_filter(self, index: int):
        self.filters[index].enabled = not self.filters[index].enabled
        self.apply_filters()

    def apply_filters(self):

        df = self.dataframe_original
        for ix, filt in enumerate(self.filters):
            if filt.enabled and not filt.failed:
                try:
                    df = df.query(filt.expr)
                except Exception as e:
                    self.filters[ix].failed = True
                    logger.exception(e)
        self.dataframe = df
        self.update()

And those @track_history decorated methods will show up in the history like this

HistoryItem(name='add_filter', args=(), kwargs={'expr': 'HP > 56'}, time='10:46:09')
HistoryItem(name='add_filter', args=(), kwargs={'expr': '`Type 1` == "Grass"'}, time='10:46:18')
HistoryItem(name='toggle_filter', args=(1,), kwargs={}, time='10:46:21')
HistoryItem(name='toggle_filter', args=(1,), kwargs={}, time='10:46:22')
HistoryItem(name='remove_filter', args=(1,), kwargs={}, time='10:47:10')

So I'm already storing a history of these methods and arguments, it's just going beyond this and being able to generate runnable code is where things get tricky, since these methods often reference self or the data store which wouldn't exist in a user's script

KobaKhit · 2020-10-26T15:03:50Z

Does not have to bu runnable code. Can be a just a function or method as long as it allows to convenietnly reproduce the processed data frame. If you are open to commits I could look into creating a convenient way to reapply to filters from history.

adamerose changed the title ~~Suggestion~~ Sync modifications in GUI to iPython Jul 14, 2020

adamerose changed the title ~~Sync modifications in GUI to iPython~~ Sync modifications in GUI to iPython & record history Jul 15, 2020

adamerose closed this as completed Aug 30, 2020

adamerose mentioned this issue Nov 28, 2020

Support to add new feature using python #69

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync modifications in GUI to iPython & record history #20

Sync modifications in GUI to iPython & record history #20

pandichef commented Jul 14, 2020

adamerose commented Jul 14, 2020 •

edited

pandichef commented Jul 14, 2020 •

edited

adamerose commented Jul 15, 2020

pandichef commented Jul 15, 2020

adamerose commented Jul 15, 2020

pandichef commented Jul 15, 2020

adamerose commented Aug 30, 2020

KobaKhit commented Oct 26, 2020

adamerose commented Oct 26, 2020 •

edited

KobaKhit commented Oct 26, 2020

Sync modifications in GUI to iPython & record history #20

Sync modifications in GUI to iPython & record history #20

Comments

pandichef commented Jul 14, 2020

adamerose commented Jul 14, 2020 • edited

pandichef commented Jul 14, 2020 • edited

adamerose commented Jul 15, 2020

pandichef commented Jul 15, 2020

adamerose commented Jul 15, 2020

pandichef commented Jul 15, 2020

adamerose commented Aug 30, 2020

KobaKhit commented Oct 26, 2020

adamerose commented Oct 26, 2020 • edited

KobaKhit commented Oct 26, 2020

adamerose commented Jul 14, 2020 •

edited

pandichef commented Jul 14, 2020 •

edited

adamerose commented Oct 26, 2020 •

edited