Indicate column where "Number of Distinct values" = "Total Number of rows" #170

dangus-aktivbo · 2022-08-25T07:34:24Z

When scanning numeric columns, I want to quickly find out which columns have a distinct value on every row.

The usefulness of dfSummary lies in scanning columns quickly and figuring out the structural and statistical properties of each column. Normally, when I dig into datasets, I try to quickly find out whether natural keys, like social security number, housing address, customer id, etc., are duplicated. The simplest way right now is to do a count-distinct (e.g. n_distinct(x) in dplyr) and compare the number of distinct values to the number of rows in the data frame. I'm using dfSummary a lot, and I think this would be a super enhancement.
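For illustration, the manual check described above could be sketched roughly as follows. This is only an assumption about how one might do it by hand: `all_distinct_cols` is a hypothetical helper, not part of summarytools, and the `customers` data frame is made up. It uses base R's `length(unique(x))` in place of `dplyr::n_distinct(x)` so that it runs without extra packages.

```r
# Hypothetical helper (not part of summarytools): report, for each column,
# whether the number of distinct values equals the number of rows.
all_distinct_cols <- function(df) {
  n_rows <- nrow(df)
  # length(unique(x)) is the base R analogue of dplyr::n_distinct(x);
  # by default both count NA as one distinct value.
  n_dist <- vapply(df, function(x) length(unique(x)), integer(1))
  data.frame(
    column       = names(df),
    n_distinct   = unname(n_dist),
    pct_distinct = round(100 * unname(n_dist) / n_rows, 1),
    all_distinct = unname(n_dist) == n_rows,
    stringsAsFactors = FALSE
  )
}

# Made-up example data: customer_id is a candidate key, city is not.
customers <- data.frame(
  customer_id = c(101, 102, 103, 104),
  city        = c("Oslo", "Bergen", "Oslo", "Oslo")
)

result <- all_distinct_cols(customers)
print(result)
```

Columns where `all_distinct` is `TRUE` are candidates for a natural key; the `pct_distinct` column is one possible way to express the same information as a percentage.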

One possible solution is to add a "% distinct" value on the relevant columns, since you already have a "(% of valid)" in the column header. Another is a "flag", such as a string saying "Unique" or "(all unique)" or something similar. Right now I have to check the Freqs against the row count, which of course is just a minor inconvenience.

dcomtois · 2022-09-20T05:16:29Z

This is a good idea. I'd go for "All distinct values". However, a new term ("All") will need to be added to the translations dataset, which will require some work. Help is always welcome.

dcomtois added the help wanted label Sep 20, 2022
