Question: add TOP x column values and distribution. #65

Open
diegodewilde opened this issue Feb 9, 2023 · 3 comments

Comments

@diegodewilde

Hi,

I was looking at this project and I must say: it's awesome, and it's something dbt docs is currently missing.

One thing that came to mind: why isn't there an option to add the top X column values and their distribution? Is there a reason not to include this in the docs?

For example, showing the top 2 values per column:

| Column Name | Top 1 Value | Distribution | Top 2 Value | Distribution |
|-------------|-------------|--------------|-------------|--------------|
| Column 1 | Value 1 A | 0.50 ("number"/"total") | Value 1 B | 0.20 ("number"/"total") |
| Column 2 | Value 2 A | 0.50 ("number"/"total") | Value 2 B | 0.30 ("number"/"total") |
| Column 3 | Value 3 A | 0.10 ("number"/"total") | Value 3 B | 0.05 ("number"/"total") |
| Column 4 | Value 4 A | 0.10 ("number"/"total") | Value 4 B | 0.05 ("number"/"total") |
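
To make the "number"/"total" idea concrete, here is a minimal sketch in plain Python (not the package's SQL/Jinja, and not an existing macro). The function name `top_values` and the sample data are hypothetical; it simply counts each value and divides by the row count.

```python
from collections import Counter


def top_values(values, n=2):
    """Return the n most common values, each paired with its share of all rows.

    Hypothetical helper for illustration only; None is counted like any
    other value here, which a real profiler may want to handle differently.
    """
    total = len(values)
    if total == 0:
        return []
    counts = Counter(values)
    # most_common(n) yields (value, count) pairs, highest count first
    return [(value, count / total) for value, count in counts.most_common(n)]


# Assumed sample column: 5 x "A", 2 x "B", 3 x "C"
rows = ["A"] * 5 + ["B"] * 2 + ["C"] * 3
print(top_values(rows, n=2))  # [('A', 0.5), ('C', 0.3)]
```

Each pair corresponds to one "Top k Value" / "Distribution" cell pair in the table above.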

Looking forward to your thoughts!

@stumelius
Contributor

@diegodewilde I've thought about adding a "mode" (most common value) profiling metric to the package but never got around to implementing it. This proposal expands the mode concept into the N most common values, and I think it's a good idea.

Just throwing thoughts here:

  • What would be a sensible default for the number of top values? 1, 2, 3?
  • How should we name the columns? top_1_value, top_1_value_proportion, top_2_value, top_2_value_proportion, etc?
  • Is there a better way to display the distributions than the (value, proportion) pairs for each top value?
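
One way to picture these questions is the rough Python sketch below, which flattens the top N values into columns following the `top_1_value` / `top_1_value_proportion` pattern. The function name `top_value_columns`, the default of `n=3`, and the choice to pad missing ranks with `None` are all assumptions for illustration, not decisions made in this thread or behavior of the package.

```python
from collections import Counter


def top_value_columns(values, n=3):
    """Build one flat record with top_<k>_value / top_<k>_value_proportion keys.

    Hypothetical helper: n is configurable, and ranks beyond the number of
    distinct values are filled with None so every record has the same keys.
    """
    total = len(values)
    ranked = Counter(values).most_common(n)
    row = {}
    for rank in range(1, n + 1):
        if rank <= len(ranked):
            value, count = ranked[rank - 1]
            row[f"top_{rank}_value"] = value
            row[f"top_{rank}_value_proportion"] = count / total
        else:
            # Fewer distinct values than requested ranks
            row[f"top_{rank}_value"] = None
            row[f"top_{rank}_value_proportion"] = None
    return row


profile = top_value_columns(["x", "x", "x", "y"], n=3)
print(profile)
```

Keeping the key set fixed for a given `n` means every profiled column yields the same output columns, which matters if the results end up in a single table.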

Would you be interested in implementing this? :)

@diegodewilde
Author

Hi stumelius,

  • It would make sense to make this dynamic, so you can choose the number of top values you want to see in your docs. Not sure if that's possible?
  • Sounds like a good suggestion, I actually think this goes hand-in-hand with the visualization you would like to see here.

@stumelius
Contributor

@diegodewilde Circling back to this. Are you still interested in this feature, and if so, would you like to contribute? :)
