Suggestion: mention most frequent value #47

Closed
paulfeitsma opened this issue Nov 1, 2018 · 4 comments

@paulfeitsma

In the data frame summary, if a column contains 115 distinct values (such as countries) and 99% of the values are one specific country, it would be very useful to mention what the most frequent country is. In general, I believe it is useful to display the most frequent values.

@dcomtois
Owner

dcomtois commented Nov 3, 2018

Isn't that what it does already? The most frequent occurrences are listed along with their frequencies. Am I misunderstanding your suggestion?

@paulfeitsma
Author

Let me try to explain this with an example (code below), using a sample data set of salaries. I have found that there are often a few salaries that appear much more frequently than others. In my example these are 1594.20 (the minimum monthly wage in The Netherlands) and 2000. This will create the following data frame summary.

[Screenshot: df_summary_example_salaries]

With regard to the 2nd column, it would be handy if the summary mentioned that the values 1594.20 and 2000 appear most frequently. The first one can be seen in the graph, but the value 2000 would be overlooked.

library(summarytools)
minimum_wage <- 1594.20
df <- data.frame(PersonID = 1:1000,
                 salary = c(sample(x = minimum_wage:5000, size = 600), rep(minimum_wage, 200), rep(2000, 200)))
view(dfSummary(df))
head(sort(table(df$salary), decreasing = TRUE))
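For illustration, the kind of output being asked for could be sketched in base R as below. The helper name `top_values` and its `n` argument are hypothetical and are not part of summarytools; the seed is arbitrary and only makes the sample reproducible.

```r
# Hypothetical helper (not part of summarytools): returns the n most
# frequent values of a vector with their counts and their share of the
# non-missing values.
top_values <- function(x, n = 3) {
  counts <- sort(table(x), decreasing = TRUE)    # table() drops NA by default
  top <- counts[seq_len(min(n, length(counts)))] # keep at most n entries
  data.frame(
    value = as.character(names(top)),
    count = as.integer(top),
    share = as.integer(top) / sum(counts),
    stringsAsFactors = FALSE
  )
}

set.seed(1)  # arbitrary seed, assumption for reproducibility only
minimum_wage <- 1594.20
salary <- c(sample(x = minimum_wage:5000, size = 600),
            rep(minimum_wage, 200),
            rep(2000, 200))

# Lists the two dominant salaries with their counts and shares
top_values(salary, n = 2)
```

Note that `table()` converts the values to character labels, so the `value` column holds strings such as "1594.2" rather than numbers.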

@dcomtois
Owner

dcomtois commented Nov 6, 2018

Ah ok, I see now, thanks for clarifying... Deserves some thinking for sure!

@dcomtois
Owner

After giving it some thought, I think this would be a bit of overkill. The mode will be shown for binary variables (issue #48), but otherwise I think there is already a lot of info for numerical variables. I'll reopen if the feature is requested again in the future. Thx.
