Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync modifications in GUI to iPython & record history #20

Closed
pandichef opened this issue Jul 14, 2020 · 10 comments
Closed

Sync modifications in GUI to iPython & record history #20

pandichef opened this issue Jul 14, 2020 · 10 comments

Comments

@pandichef
Copy link

I noticed that you're looking for feedback... and I think this project is on to something...

My dream DataFrame viewer would work like this:

from pandasgui import DataFrameViewer
viewer = DataFrameViewer()
viewer.open()  # opens a GUI window
viewer.bind(df1)  # tab 1 in the viewer
viewer.bind(df2)  # tab 2 in the viewer
viewer.bind(df3)  # tab 3 in the viewer

Now when you change a cell value in the GUI, the application "pushes" the code to the python/ipython shell i.e., you see df1.at['row', 'col'] = 10 in the shell.

Moreover, if you type df1.at['row', 'col'] = 20 in the shell, the value automatically changes in the GUI. I think such an application would be 1000+ Github stars in no time.

@adamerose
Copy link
Owner

adamerose commented Jul 14, 2020

@pandichef Thanks for the feedback.

the application "pushes" the code to the python/ipython shell

Just want to make sure I understand this part. I think it would not be explicitly adding or sending anything new to the shell, right? I'd just have it so any modifications to a DataFrame in the GUI would be done in-place, and the DataFrame object you're seeing in the GUI would be the same one you have in iPython so all in-place modifications would be seen in both, let me know if you had something else in mind.

Here are some other thoughts

  • I'll add a field to the settings argument (maybe called sync or inplace) when creating the PandasGui that enables in-place modifications. It should be False by default so people aren't accidentally changing their data

  • PyQt has an update signal that needs to be emitted to rerender when changes are made to the underlying data, and I can't think of a good way to trigger this when the user makes changes in iPython other than either...

    1. Having a timer that continuously triggers it every few seconds (may impact performance) or
    2. Requiring the user to explicitly call some method on the Gui instance after making a change to the data (would be annoying)
  • My understanding is that Pandas best practice is to not do in-place modification of DataFrames. And I think a lot of people including me often use method chaining which also does not go well with inplace modification. https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not
    Not sure what implications this has on this, if any, since this syncing feature will be optional.

  • Any operations done by the GUI that use functions without an inplace argument can just do an in-place update using the update method on the resulting DataFrame.

  • You also gave an example of taking an existing GUI instance and adding new DataFrames to it, that is another thing I can look at supporting.

  • Another feature I want to add is creating new DataFrames within the GUI like by doing pivots, and a question related to this is how I should get those available in iPython. I was thinking on initial creation I could return a dictionary that holds all the DataFrame objects in the GUI, so you'd do my_dfs = PandasGUI() and then when you create a new one in the GUI you'd access it in iPython like my_dfs['new_df_name']

@adamerose adamerose changed the title Suggestion Sync modifications in GUI to iPython Jul 14, 2020
@pandichef
Copy link
Author

pandichef commented Jul 14, 2020

I think it would not be explicitly adding or sending anything new to the shell, right?

It would be. The idea is that you can edit cells/columns/rows in the GUI and those commands would be "recorded" in iPython. So if you type %history, you can see the commands that were implicitly executed. Then you can copy these into a script if you like. This is very common workflow i.e., you play with the data manually until you get it just right and then you convert those steps into a repeatable process. Another option is to just create a console in the GUI that records the underlying commands... and these can be copied/pasted into a .py file.

As for inplace, it's not best practice, but it doesn't matter too much either way. df = df.do_something() is equivalent to df.do_something(inplace=True).

By the way, you might want to check out the Stata or SAS applications. The Stata editor is amazing, but unfortunately only support Stata's native DataFrame object. Stata data editor example... notice how all the commands are "recorded" in the upper left corner.

@adamerose
Copy link
Owner

It would be. The idea is that you can edit cells/columns/rows in the GUI and those commands would be "recorded" in iPython.

Okay I see what you mean, I think these are two separate features

  1. Make DataFrame modification in the GUI in-place so your objects get modified in iPython as well
  2. Log a history of the Python commands corresponding to any changes made within the GUI.

So if you type %history, you can see the commands that were implicitly executed. Another option is to just create a console in the GUI that records the underlying commands... and these can be copied/pasted into a .py file.

Seems like modifying iPython %history programmatically is not possible and would be a bit of a hack anyways since the user is not actually running that code directly, so I would go with keeping a separate history of commands executed by the GUI.

I'm not sure the best way to approach this... Simplest thing would be having a plain string of Python code corresponding to each possible action in the GUI and adding it to a command history (with string substitution to denote the DataFrame names it was applied to). If anyone has a code example of other projects implementing something like to this, please share a link.

df = df.do_something() is equivalent to df.do_something(inplace=True).

That's incorrect. The 1st way is assigning a new object to your df variable, the 2nd way is modifying the original object in-place.

import pandas as pd
df = pd.DataFrame({'a':[1,2,None], 'b':[4,5,None]})
print(id(df))
gui_df = df
df.dropna(inplace=True)
print(id(df))
print(gui_df)
1460280587336
1460280587336
     a    b
0  1.0  4.0
1  2.0  5.0
import pandas as pd
df = pd.DataFrame({'a':[1,2,None], 'b':[4,5,None]})
print(id(df))
gui_df = df
df = df.dropna()
print(id(df))
print(gui_df)
1460280618312
1460280778504
     a    b
0  1.0  4.0
1  2.0  5.0
2  NaN  NaN

By the way, you might want to check out the Stata or SAS applications. The Stata editor is amazing, but unfortunately only support Stata's native DataFrame object. Stata data editor example... notice how all the commands are "recorded" in the upper left corner.

I've used JMP which is similar (I've been working on adding graphing similar to JMP's Graph Builder), I'll check out Stata too - thanks for the link.

@adamerose adamerose changed the title Sync modifications in GUI to iPython Sync modifications in GUI to iPython & record history Jul 15, 2020
@pandichef
Copy link
Author

Ah. I stand corrected about inplace. All good points. I've used JMP too and really like it.

Indeed, I think the first step is to write a utility that maps GUI clicks to python code (as a string). I can try to help with that, but I don't know enough about your implementation of the GUI.

@adamerose
Copy link
Owner

I can try to help with that, but I don't know enough about your implementation of the GUI.

Yeah I've been meaning to add some documentation on how things are structured. But first I want to do some planned refactoring to clean the code up before trying to get more popularity & requesting contributors. A quick rundown of how it is right now:

  • Everything is built on PyQt5.
  • The main GUI object is PandasGUI(QtWidgets.QMainWindow) in /gui.py, which keeps a dictionary called self.df_dicts holding the names of each DataFrame in the GUI along with a corresponding dataframe_explorer object.
  • The dataframe_explorer (DataFrameExplorer(QtWidgets.QTabWidget) in widgets/dataframe_explorer.py) is just a QTabWidget with the various UI tabs corresponding the each DataFrame. The first tab contains an instance of a dataframe_viewer (DataFrameViewer(QtWidgets.QWidget) in widgets/dataframe_explorer.py), which is the widget that displays a DataFrame as a spreadsheet
  • Right now all editing happens inside DataFrameViewer methods on its local copy of the DataFrame, but that's subject to change

Indeed, I think the first step is to write a utility that maps GUI clicks to python code (as a string).

That sounds good, I still want to do some thinking about this. One thing that would be useful is a list of all current and future potential actions a person might do in the GUI that affects a DataFrame so I can make sure they'd work in my design. If you have any other ideas, add them to the list

Current

  • Edit DataFrame by double clicking a single cell and typing a new value
  • Edit DataFrame by pasting. (single or multiple cells, comma delimited format).. this is broken right now, need to fix
  • Sort DataFrame by a column by left clicking the header

Future

  • Modifying DataFrames
    • column reorder
    • change column data types
    • dropna
    • date parsing
    • fillna
    • outlier detection
    • reshape (melt, wide_to_long, pivot, pivot_table, stack, unstack, etc.)
  • Creating new DataFrames:
    • reshaped existing ones
    • import CSV
  • Delete DataFrames from the GUI
  • Filters that can be toggled on and off with an option to lock them in (overwrite original or create new DataFrame with the filters applied). I'd store two copies of each DataFrame in the GUI, the main one and one with enabled filters applied. An edge case is what happens if you do filtering like this and then edit it, I think the changes should propagate back to the original DataFrame
  • JMP-style 'calculated' columns which are columns with values defined by a function, that get automatically recalculated after any change to the DataFrame
  • Lasso selection that filters the DataFrame

@pandichef
Copy link
Author

This looks good. You might also want to include a "how to contribute" section in README.md.

I wouldn't prioritize these tbh:

  • calculated columns: JMP tries to be like a hybrid between MS Excel and SAS, which is why a calculated column makes sense. However, data scientists usually want to use the GUI just to prototype... but ultimately want a script to repeat the actions taken in the GUI.
  • graphing: graphing is pretty straightforward in iPython, so I don't think graphing capabilities will offer as much value to users as the DataFrame editor with "recording".

@adamerose
Copy link
Owner

UPDATE

Record history
I added a basic history log. You can find it in the latest dev commits, it's under the "Debug" menu. This required some refactoring but now and going forward any action you take in the GUI will occur by calling some method in pandasgui.store.PandasGuiDataFrame and if it has the @track_history decorator it will be recorded. This doesn't give copy-pastable code to re-create your actions in Python, I found that required too much maintenance. I considered just dumping the code for each method into the log using something like inspect.getsource but that was messy and I think most functions are simple enough to reproduce or people can just look at my source code directly.

image

Sync modifications
I decided not to add this feature since inplace modification was messier to implement, goes against Pandas best practice, and would probably would annoy more users than it would help (playing around in GUI with your dataset then realizing you messed up the original). I added an alternative though, you can call .get_dataframes() on the PandasGui object (returned by show) and get a dictionary of the dataframes in the GUI with your modifications.

image

The code base is also cleaner than last time and since all DataFrame modification happens in that one class it should be easier to follow what's going on.

@KobaKhit
Copy link

A quick note about the recorded history of operations on a data frame in the GUI. How are you applying the filter queries to a dataframe. Through parsing them into pandas operations? You could use the same mechanism to reapply the filters given the filter queries in your top left section. This will make operations performed in the gui reproducible using pandas gui own filtering routine. You can save the operation history string (list of strings) as an attribute in the gui class and should be straight forward to update every time a filter is added or cell is edited.

Then reapply the filters will become something like

from pandasgui import apply_filters
...
h = gui.history
h
['60<hp<70','Name == Abra']

df_processed = apply_filters(h, df)

@adamerose
Copy link
Owner

adamerose commented Oct 26, 2020

A quick note about the recorded history of operations on a data frame in the GUI. How are you applying the filter queries to a dataframe. Through parsing them into pandas operations? You could use the same mechanism to reapply the filters given the filter queries in your top left section. This will make operations performed in the gui reproducible using pandas gui own filtering routine. You can save the operation history string (list of strings) as an attribute in the gui class and should be straight forward to update every time a filter is added or cell is edited.

Then reapply the filters will become something like

from pandasgui import apply_filters
...
h = gui.history
h
['60<hp<70','Name == Abra']

df_processed = apply_filters(h, df)

The part that actually applies the filters is just a loop with df = df.query(filt.expr), that method does the parsing and applying the operations. It happens in store.py like this:

    @track_history
    def add_filter(self, expr: str, enabled=True):
        filt = Filter(expr=expr, enabled=enabled, failed=False)
        self.filters.append(filt)
        self.apply_filters()

    @track_history
    def remove_filter(self, index: int):
        self.filters.pop(index)
        self.apply_filters()

    @track_history
    def edit_filter(self, index: int, expr: str):
        filt = self.filters[index]
        filt.expr = expr
        filt.failed = False
        self.apply_filters()

    @track_history
    def toggle_filter(self, index: int):
        self.filters[index].enabled = not self.filters[index].enabled
        self.apply_filters()

    def apply_filters(self):

        df = self.dataframe_original
        for ix, filt in enumerate(self.filters):
            if filt.enabled and not filt.failed:
                try:
                    df = df.query(filt.expr)
                except Exception as e:
                    self.filters[ix].failed = True
                    logger.exception(e)
        self.dataframe = df
        self.update()

And those @track_history decorated methods will show up in the history like this

HistoryItem(name='add_filter', args=(), kwargs={'expr': 'HP > 56'}, time='10:46:09')
HistoryItem(name='add_filter', args=(), kwargs={'expr': '`Type 1` == "Grass"'}, time='10:46:18')
HistoryItem(name='toggle_filter', args=(1,), kwargs={}, time='10:46:21')
HistoryItem(name='toggle_filter', args=(1,), kwargs={}, time='10:46:22')
HistoryItem(name='remove_filter', args=(1,), kwargs={}, time='10:47:10')

So I'm already storing a history of these methods and arguments, it's just going beyond this and being able to generate runnable code is where things get tricky, since these methods often reference self or the data store which wouldn't exist in a user's script

@KobaKhit
Copy link

Does not have to bu runnable code. Can be a just a function or method as long as it allows to convenietnly reproduce the processed data frame. If you are open to commits I could look into creating a convenient way to reapply to filters from history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants