Feature Statistics - open issues #6173

Closed
3 of 4 tasks
lanzagar opened this issue Oct 14, 2022 · 2 comments · Fixed by #6294
Labels: snack (This will take an hour or two), wish

Comments

@lanzagar
Contributor

lanzagar commented Oct 14, 2022

After PR #6158 there are some open issues left that should be discussed:

  • Compute Mode for numeric variables?
  • Show Mode in the widget to make it consistent with the new (and improved) output? Currently the mode is squeezed into the Median column, which would otherwise be empty for categorical variables. But numeric variables could have both...
  • Documentation should be updated
  • There are some warnings which could cause issues in the future:
Orange/statistics/util.py:510: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  res = scipy.stats.stats.mode(x, axis)
orange-widget-base/orangewidget/gui.py:2068: UserWarning: decorate OWFeatureStatistics.commit with @gui.deferred and then explicitly call commit.now or commit.deferred.
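For the first warning, a minimal sketch of the fix the message itself suggests: pass `keepdims` explicitly. This assumes SciPy 1.9 or newer (where `keepdims` is available) and calls `scipy.stats.mode` rather than going through the `scipy.stats.stats` namespace; the array `x` is made-up sample data, not anything from Orange.

```python
import numpy as np
from scipy import stats

# Made-up sample data: two rows, mode taken along each row.
x = np.array([[1.0, 2.0, 2.0],
              [3.0, 3.0, 4.0]])

# keepdims=True preserves the reduced axis (the old default behaviour).
res = stats.mode(x, axis=1, keepdims=True)

# keepdims=False drops the reduced axis (the announced future default).
flat = stats.mode(x, axis=1, keepdims=False)

print(res.mode.shape, flat.mode.shape)
```

Whichever value is chosen, code that indexes into the result has to agree with the resulting shape.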
lanzagar added the "needs discussion" (Core developers need to discuss the issue) label on Oct 14, 2022
@wvdvegte

If it doesn't require a lot of effort, I think computing mode for numeric variables makes sense - especially if they are integers. And even if they're real numbers it makes sense - among them, there are sometimes round numbers like 0 and 1 that appear in a dataset more often than other numbers. This could be useful information.
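To illustrate the point about round numbers, here is a small sketch of computing the mode of a numeric column with NumPy. The helper `numeric_mode` and the sample column are hypothetical and not part of Orange; missing values are assumed to be encoded as NaN and are skipped.

```python
import numpy as np

def numeric_mode(column):
    """Return (mode, count) of a numeric column, ignoring NaN values."""
    values = np.asarray(column, dtype=float)
    values = values[~np.isnan(values)]
    if values.size == 0:
        return np.nan, 0
    # np.unique returns sorted unique values; on ties, argmax picks the smallest.
    uniques, counts = np.unique(values, return_counts=True)
    best = np.argmax(counts)
    return float(uniques[best]), int(counts[best])

# Real-valued column in which the round numbers 0 and 1 recur.
col = [0.0, 0.37, 1.0, 0.0, 0.82, np.nan, 0.0, 1.0]
mode, count = numeric_mode(col)
print(mode, count)
```

For columns where every value is distinct the mode is not informative, so a count alongside the value helps the reader judge it.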

@janezd
Contributor

janezd commented Oct 14, 2022

Showing mode for numeric variables is trivial. How would it look in the output? The current output variable 'mode' is a string variable because it contains values of different variables (which is what #6185 was mostly about). If this same variable also contained the mode for numeric variables, would those be stored as strings? Or would it again be a separate column? @lanzagar?

[Screenshot: Screen Shot 2022-10-14 at 18 41 48]
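The two options raised in the question can be sketched roughly as follows. The feature names and mode values are invented for illustration, and this is plain NumPy rather than Orange's actual table-building code.

```python
import numpy as np

# Hypothetical per-feature modes: a label for a categorical feature,
# numbers for numeric features.
modes = {"species": "setosa", "petal_count": 5.0, "width": 0.2}

# Option A: one shared column; everything has to become a string.
as_strings = np.array([str(m) for m in modes.values()])

# Option B: a separate numeric column, NaN where the feature is not numeric.
numeric_col = np.array(
    [m if isinstance(m, float) else np.nan for m in modes.values()]
)

print(as_strings)
print(numeric_col)
```

Option A keeps a single column but loses numeric typing; option B keeps the numbers usable downstream at the cost of an extra, partly empty column.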

As for the second warning: after introducing decorators for deferred commits, I changed almost all widgets to use them (#5495). I remember skipping this one, but forgot to document the reason. It may have been that I wanted to avoid recommitting the "Statistics" output when only "Reduced Data" has changed. I'll look into it.

janezd added the "wish" and "snack" (This will take an hour or two) labels and removed the "needs discussion" (Core developers need to discuss the issue) label on Jan 10, 2023
@janezd janezd self-assigned this Jan 13, 2023

3 participants