Auto complete columns #95

Merged: 5 commits into adamerose:develop, Feb 8, 2021


@fdion (Contributor) commented Feb 5, 2021

Purpose

This adds autocompletion to the filter QLineEdit widgets, mainly to get feedback. I think this is probably better than copy/paste or drag and drop. If it works OK, I'd like to expand it in a future PR to properly filter the valid values based on the column.

This doesn't really close #88, as I think column header filtering is probably what a lot of people really want, but that is more than my time budget allows at the moment.

Added:

  • constant for max cardinality of categoricals to be included in autocompletion (should be a user setting...)

  • checkbox below the filter box to enable/disable autocompletion

  • QCompleter based on a list of columns (backtick delimited) and values for low cardinality categoricals (quote delimited)

  • case insensitive suggestions

  • start with backtick to get list of columns only

  • type the numexpr as usual, i.e. ==, in (), !=, then use a double quote (") to select an available value
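
As a rough illustration of the suggestion list described in the bullets above, here is a minimal sketch in plain Python. It is not the code from this PR: `build_suggestions`, the dict-of-lists input (a stand-in for a DataFrame) and the sample data are all assumptions made for the example. Only the backtick and double-quote delimiters and the idea of a cardinality cap come from the description.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for the constant in constants.py.
CATEGORICAL_THRESHOLD = 50


def build_suggestions(columns: Dict[str, Sequence[object]],
                      threshold: int = CATEGORICAL_THRESHOLD) -> List[str]:
    """Return backtick-delimited column names followed by double-quoted
    values of low-cardinality string columns."""
    suggestions = [f"`{name}`" for name in columns]
    for values in columns.values():
        # dict.fromkeys de-duplicates while keeping first-seen order.
        unique = list(dict.fromkeys(v for v in values if isinstance(v, str)))
        if 0 < len(unique) <= threshold:
            suggestions.extend(f'"{v}"' for v in unique)
    return suggestions


# Made-up sample data, loosely modelled on a country/continent table.
data = {
    "country": ["Canada", "France", "Canada"],
    "continent": ["Americas", "Europe", "Americas"],
    "pop": [37, 67, 37],
}
suggestions = build_suggestions(data)
```

A list like this could presumably be handed to a `QCompleter`, with `setCaseSensitivity(Qt.CaseInsensitive)` giving the case-insensitive matching mentioned above. The Qt wiring is left out so the sketch stays runnable without a GUI.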

@fdion (Contributor Author) commented Feb 5, 2021

As a side note, you can't run filter_viewer.py directly, but that's not something I broke.

@adamerose (Owner)

Cool feature! Still needs some tweaks though I think

When I type country it suggests values in continent
[screenshot]

This should autosuggest only values in the country column, not other column names:
[screenshot]

This should suggest other column names:
[screenshot]

> as a side note, you can't run filter_viewer.py directly, but that's not something I broke.

Just pushed a fix for that

@fdion (Contributor Author) commented Feb 8, 2021

> When I type country it suggests values in continent

This won't work until a full custom completer is built. That would be a major project, and I'll have to let it simmer for a while to see how I could even implement it. The reason is that:

  • QCompleter doesn't have a way to handle that
  • the hierarchical model assumes a single separator, and I can't see a way to make this work, especially since starting with a backtick twice in a row means we are at the same level of the hierarchy. QCompleter can't cope with a model that complex
  • some kind of state machine support, or something similar, is needed
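
To make the state machine idea in the last bullet concrete, here is a minimal sketch in plain Python. It is an assumption about what such a helper could look like, not something in this PR or in Qt: `completion_context` is a hypothetical name, and escaped quotes or backticks inside names are not handled. It scans the filter text and reports whether the cursor is inside an open backtick (suggest column names) or an open double quote (suggest values of the most recently named column).

```python
from typing import Optional, Tuple


def completion_context(text: str) -> Tuple[str, Optional[str], str]:
    """Scan text left to right and return (kind, column, prefix).

    kind is "column" inside an unclosed backtick, "value" inside an
    unclosed double quote, and "none" otherwise. column is the most
    recently closed backticked name (or None). prefix is what has been
    typed since the opening delimiter.
    """
    state = "none"
    last_column: Optional[str] = None
    buffer = ""
    for ch in text:
        if state == "none":
            if ch == "`":
                state, buffer = "column", ""
            elif ch == '"':
                state, buffer = "value", ""
        elif state == "column":
            if ch == "`":
                # Closing backtick: remember the column for later values.
                last_column, state = buffer, "none"
            else:
                buffer += ch
        else:  # state == "value"
            if ch == '"':
                state = "none"
            else:
                buffer += ch
    prefix = buffer if state != "none" else ""
    return state, last_column, prefix
```

With this, a "value" context that carries `country` as its column could be limited to that column's values, which is the behaviour requested above. A custom completer would still have to swap its model based on the result, and that part is not sketched here.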

Also, as to why the country names are not showing up: cardinality is 187, but I've set in constants.py:

CATEGORICAL_THRESHOLD = 50

I've pushed some code to detect categoricals. If you load a parquet file with country as categorical, or set it in the notebook before starting pandasgui, it'll work and find the country names. I could also bump the constant to 200, but how much is enough? We might hit a case with 201, or 300. I didn't want to bog down the suggestion speed too much, so there is a tradeoff here.
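
The tradeoff can be sketched as a small predicate. This is a guess at the shape of the heuristic, not the code that was pushed: `should_suggest_values` and its `declared_categorical` flag are hypothetical, and the 187 country names are generated placeholders. With pandas, the flag could plausibly come from a check such as `isinstance(series.dtype, pd.CategoricalDtype)`.

```python
from typing import Sequence

CATEGORICAL_THRESHOLD = 50  # value quoted from constants.py above


def should_suggest_values(values: Sequence[object],
                          declared_categorical: bool = False,
                          threshold: int = CATEGORICAL_THRESHOLD) -> bool:
    """Suggest a column's values if it is explicitly categorical, or if
    its number of distinct values is within the threshold."""
    if declared_categorical:
        return True
    return len(set(values)) <= threshold


# Placeholder data: 187 distinct names, matching the cardinality mentioned above.
countries = [f"country_{i}" for i in range(187)]
continents = ["Africa", "Americas", "Asia", "Europe", "Oceania"]
```

Under these assumptions the plain `countries` column is skipped and `continents` is kept. Marking `countries` as categorical opts it back in regardless of the threshold.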

Also fixed a few bugs that were eating spaces etc.

> This should suggest other column names:

Yep, that now works. I switched the split to a regex that includes both backtick and double-quote matches.

Unfortunately, this doesn't support unicode at the moment, since pandas doesn't handle unicode column names with numexpr (it does everywhere else). I'll have to think about how to handle this. A PR against pandas is unlikely to get merged anytime soon...

I think the current state is usable, let me know what you think.

@adamerose (Owner)

> I've pushed some code to detect categoricals. If you load a parquet file with country as categorical, or set it in the notebook before starting pandasgui, it'll work and find the country names. I could also bump the constant to 200, but how much is enough? Might hit a case with 201. or 300. Didn't want to bog down too much the suggestion speed, so there is a tradeoff here.

Yeah thanks. We can improve this heuristic later.

> Yep, that now works, I switched the split to regex to include both backtick and double quote matches.

Still doesn't work for me. To clarify what I meant: when you already have a backticked column name in the text, it won't suggest more column names if you type a third backtick. For example, when someone types this, it should suggest more column names:

`country`==`

> I think the current state is usable, let me know what you think.

Sure I'll merge, the improvements can be added if you have time.

@adamerose adamerose merged commit 6f13517 into adamerose:develop Feb 8, 2021

@fdion (Contributor Author) commented Feb 8, 2021

> Still doesn't work for me. To clarify what I meant, when you have a backticked column name already in the text it won't suggest more column names if you type a third backtick. Like when someone types this it should suggest more column names

You have to have a space before the backtick or the double quote for this to trigger again. That's how the `re` split works. I could probably design a stronger regex to tackle that, but then you get into more and more corner cases. Also, my typical flow is: start typing, down arrow, then space, then the operator, then space, then backtick or double quote, down arrow to select, then tab (which highlights the button) and space to add the filter (or enter). This does bring up the point of documenting all of that, and documentation in general...

@fdion (Contributor Author) commented Feb 8, 2021

> you have to have a space before the backtick or the double quote for this to trigger again

Or a comma, if you write something like unit in ( "something","something else" ), where unit is a column.
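
The trigger rule described in the last two comments can be sketched as follows. The pattern is a reconstruction from that description, not the regex used in the PR, and `TOKEN_START` and `current_token` are hypothetical names. A new completion token starts at a backtick or double quote only when it sits at the start of the text or right after a space or comma.

```python
import re

# Hypothetical pattern: zero-width match in front of a backtick or double
# quote that is at the start of the text or preceded by a space or comma.
TOKEN_START = re.compile(r'(?:^|(?<=[ ,]))(?=[`"])')


def current_token(text: str) -> str:
    """Return the trailing piece of text that completion would match against."""
    starts = [m.start() for m in TOKEN_START.finditer(text)]
    return text[starts[-1]:] if starts else text
```

Under this rule, `current_token` returns just the final backtick for the input `` `country` == ` ``, so column names can be suggested. Without the spaces, as in `` `country`==` ``, the whole string comes back as a single token and nothing matches, which is the behaviour reported above.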

Successfully merging this pull request may close these issues.

Easy selection of variables for filters / filtering in general
2 participants